
Reinforcement learning for content caching and crowdsourcing leveraging context awareness

posted on 30.11.2021, 14:30 by Xingchi Liu
With the evolution of content demand characteristics and the emergence of crowdsourced streaming services, Internet video traffic, including on-demand and live videos, has grown explosively. This creates new challenges for video streaming systems in controlling core network congestion and meeting the quality of service (QoS) requirements of users. Leveraging edge computing and storage resources to enable content caching and transcoding has become a promising way to address these problems. For on-demand videos, when popular contents are cached at the network edge, user requests for those contents are not transmitted to the core network and can therefore be served with less delay. Thus, the backhaul link load and the network congestion can be alleviated. However, how to determine the optimal cache placement under content popularity dynamics so as to maximize caching efficiency remains an open issue. On the other hand, to ensure adaptive streaming of live videos, versions in various formats and qualities need to be transcoded concurrently. Utilizing the abundant computational resources at the user end (UE) is a promising way to provide adaptive streaming and meet the stringent latency requirements of those services. The goal of this thesis is to study how reinforcement learning (RL) can solve online decision-making problems in content caching and video transcoding systems at the network edge by leveraging contextual information.
First, we study how to dynamically update the content placement at the edge server assuming an unknown and time-varying content popularity profile. The caching decision problem is modelled as a non-stationary Markov decision process (MDP) with varying states and transition probabilities. A context-aware popularity learning algorithm is designed to learn the time-varying file popularities via an incremental clustering scheme. With the assistance of the learned knowledge, an RL-based content caching scheme is designed via state-action-reward-state-action (SARSA) and linear function approximation. Next, inspired by the RL-based caching scheme, a reactive caching algorithm is proposed to reduce the computational complexity by directly comparing the popularities of the requested file and the cached files to make the cache replacement decision.
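The reactive caching idea described above can be illustrated with a minimal sketch. It is not the algorithm from the thesis: the function name `reactive_cache_update`, the `popularity` dictionary of already-learned estimates, and the rule of evicting the least popular cached file are all assumptions made for illustration.

```python
def reactive_cache_update(cache, requested, popularity, capacity):
    """Return (new_cache, hit) after serving one request.

    cache      -- set of currently cached file ids
    requested  -- id of the requested file
    popularity -- dict of estimated popularities (assumed to be learned elsewhere)
    capacity   -- maximum number of files the cache can hold
    """
    new_cache = set(cache)
    if requested in new_cache:
        return new_cache, True           # cache hit, nothing to replace
    if capacity <= 0:
        return new_cache, False          # nothing can be cached
    if len(new_cache) < capacity:
        new_cache.add(requested)         # free slot available, just admit the file
        return new_cache, False
    # Assumed rule: compare the requested file with the least popular cached file.
    victim = min(new_cache, key=lambda f: popularity.get(f, 0.0))
    if popularity.get(requested, 0.0) > popularity.get(victim, 0.0):
        new_cache.remove(victim)
        new_cache.add(requested)
    return new_cache, False
```

Each request costs one pass over the cached files, which is the kind of saving a direct popularity comparison offers over evaluating a learned value function for every candidate action.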
Secondly, an edge-assisted transcoding system is proposed for crowdsourced live streaming services, and a new quality of experience metric is defined which considers the influence of both the quality and the genre of the received live video. The transcoding task assignment and viewer association problem is formulated as a non-convex integer optimization problem that aims to maximize the network utility of the transcoding system, and is then solved by the computationally attractive complementary geometric programming (CGP).
Thirdly, a more complex edge transcoding system is studied, taking into account the delay requirements of the viewers. To identify the risk of choosing highly unstable transcoders while learning the transcoding capabilities of transcoders, we first study a risk-aware multi-armed bandit (MAB) problem with refined upper confidence bounds (UCBs) of the arms’ variances. Based on the UCBs, a risk-aware contextual learning algorithm is designed to decide which transcoders are more stable and more efficient. In addition, an epoch-based transcoding task assignment and viewer association algorithm is proposed to maximize the network utility while keeping the transcoding task switching cost low.
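To make the risk-aware bandit idea concrete, the sketch below scores each arm (transcoder) by an optimistic mean minus a penalty on an upper bound of its variance. This is a generic mean-variance index, not the refined bounds derived in the thesis: the bonus term `c * sqrt(log(t) / n)`, the risk weight `rho`, and the function names are assumptions chosen for illustration.

```python
import math
from statistics import mean, pvariance


def risk_aware_index(rewards, t, rho=0.5, c=1.0):
    """Score one arm from its observed rewards at round t.

    rho -- assumed weight on the variance penalty (higher = more risk-averse)
    c   -- assumed scale of the confidence bonus
    """
    n = len(rewards)
    if n == 0:
        return math.inf                  # force every arm to be tried once
    bonus = c * math.sqrt(math.log(max(t, 2)) / n)
    mean_ucb = mean(rewards) + bonus     # optimistic estimate of the mean
    var_ucb = pvariance(rewards) + bonus # upper bound on the variance (illustrative)
    return mean_ucb - rho * var_ucb


def select_arm(history, t, rho=0.5, c=1.0):
    """Pick the arm with the largest risk-aware index.

    history -- list of reward lists, one per arm
    """
    return max(range(len(history)),
               key=lambda i: risk_aware_index(history[i], t, rho, c))
```

With two arms that have the same empirical mean and the same number of pulls, the index prefers the one with the smaller empirical variance, which is the behaviour a risk-aware transcoder selection needs.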
Finally, a structured bandit problem is studied to solve the transcoder selection problem from a different perspective. Here, we assume that there are performance correlations among multiple transcoders but that the context information used to build the correlations is not available. To tackle the structured bandit problem, which assumes the arm rewards are functions of globally shared parameters, an enhanced Thompson sampling (TS)-based algorithm is designed to sequentially select fog transcoders while handling the exploration-exploitation (EE) dilemma.
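For readers unfamiliar with Thompson sampling, the following sketch shows the standard Beta-Bernoulli version with independent arms. It is a baseline only: it does not model the shared parameters of the structured bandit or the enhancements proposed in the thesis, and the function name, the uniform Beta(1, 1) prior, and the simulated Bernoulli rewards are assumptions.

```python
import random


def thompson_sampling(true_probs, rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return pull counts per arm.

    true_probs -- success probability of each simulated arm (hypothetical values)
    rounds     -- number of sequential selections
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k                      # Beta(1, 1) prior: successes + 1
    beta = [1] * k                       # Beta(1, 1) prior: failures + 1
    pulls = [0] * k
    for _ in range(rounds):
        # Sample a plausible success rate for each arm from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward             # posterior update on success
        beta[arm] += 1 - reward          # posterior update on failure
        pulls[arm] += 1
    return pulls
```

Sampling from the posterior handles exploration and exploitation together: uncertain arms occasionally produce high samples and get tried, while arms with well-established high rewards are selected most of the time.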



  • Mechanical, Electrical and Manufacturing Engineering


Loughborough University

Rights holder

© Xingchi Liu

A thesis submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.




Mahsa Derakhshani ; Sangarapillai Lambotharan
