We study the autonomous operation of device-to-device (D2D) pairs in a heterogeneous cellular network with multiple base stations (BSs). The spectrum bands of the BSs, which may overlap with each other, comprise sets of orthogonal wireless channels. We consider two spectrum usage scenarios: 1) the D2D pairs transmit over dedicated frequency bands and 2) the D2D pairs operate on shared cellular/D2D channels. The goal of each D2D pair is to jointly select a wireless channel and a power level that maximize its reward, defined as the difference between the achieved throughput and the cost of power consumption, subject to its minimum tolerable signal-to-interference-plus-noise ratio (SINR) requirement. We formulate this problem as a stochastic non-cooperative game with multiple players (D2D pairs), in which each player is a learning agent whose task is to learn its best strategy from locally observed information. We then develop a fully autonomous multi-agent Q-learning algorithm that converges to a mixed-strategy Nash equilibrium. The proposed learning method is implemented in a Long Term Evolution-Advanced (LTE-Advanced) network and evaluated via OPNET-based simulations. The algorithm shows relatively fast convergence and near-optimal performance after a small number of iterations.
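To make the reward and learning step concrete, the following is a minimal sketch from the point of view of a single D2D pair. It is not the algorithm of the paper: it assumes a Shannon-capacity throughput model, a zero reward whenever the signal-to-interference-plus-noise ratio (SINR) falls below the minimum requirement, the textbook single-agent Q-learning update, and a Boltzmann (softmax) rule for turning Q-values into a mixed strategy. All function and parameter names (`reward`, `boltzmann_policy`, `q_update`, `power_cost`, `temperature`) are hypothetical.

```python
import math

def reward(bandwidth_hz, sinr, power_w, power_cost, sinr_min):
    """Throughput minus power cost for one (channel, power) choice.

    Assumption: a violated SINR constraint yields zero reward.
    Assumption: throughput follows the Shannon formula B * log2(1 + SINR).
    """
    if sinr < sinr_min:
        return 0.0
    throughput = bandwidth_hz * math.log2(1.0 + sinr)
    return throughput - power_cost * power_w

def boltzmann_policy(q_row, temperature):
    """Map the Q-values of one state to a mixed strategy (softmax)."""
    q_max = max(q_row)  # subtract the maximum for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in q_row]
    total = sum(weights)
    return [w / total for w in weights]

def q_update(q_row, action, r, alpha, gamma, next_value):
    """Standard Q-learning update, applied in place to one state's row.

    next_value is the maximum Q-value of the next state.
    """
    q_row[action] += alpha * (r + gamma * next_value - q_row[action])
    return q_row
```

Each entry of `q_row` would correspond to one joint (channel, power level) action, and the pair would sample its next action from `boltzmann_policy(q_row, temperature)` using only its locally observed SINR.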
School: Science
Department: Computer Science
Published in: IEEE Transactions on Communications
Volume: 64
Issue: 9
Pages: 3996-4012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)