This paper investigates asynchronous reinforcement learning algorithms for joint buffer-aided relay selection and power allocation in a non-orthogonal multiple access (NOMA) relay network. Under hybrid NOMA/OMA transmission, relay selection and power allocation are jointly optimized to maximize the throughput subject to a delay constraint. To solve this high-dimensional optimization problem, we propose two asynchronous reinforcement learning-based schemes: an asynchronous deep Q-learning network (ADQN)-based scheme and an asynchronous advantage actor-critic (A3C)-based scheme. The A3C-based scheme achieves better performance and robustness when the action space is large, while the ADQN-based scheme converges faster when the action space is small. Moreover, a priori information is exploited to improve the convergence of the proposed schemes. Simulation results show that the proposed asynchronous learning-based schemes can learn from the environment and achieve good convergence.
Funding
Communications Signal Processing Based Solutions for Massive Machine-to-Machine Networks (M3NETs)
Engineering and Physical Sciences Research Council