Posted on 2021-01-07, 11:52. Authored by Xingchi Liu, Mahsa Derakhshani, Sangarapillai Lambotharan, Mihaela Van der Schaar
The classical multi-armed bandit (MAB) framework studies the exploration-exploitation dilemma of the decision-making problem and always treats the arm with the highest expected reward as the optimal choice. However, in some applications, an arm with a high expected reward can be risky to play if its variance is also high. Hence, the variation of the reward should be considered to make the arm-selection process risk-aware. In this paper, the mean-variance metric is investigated to measure the uncertainty of the received rewards. We first study a risk-aware MAB problem in which the rewards follow a Gaussian distribution, and we develop a concentration inequality on the variance to design a Gaussian risk-aware upper confidence bound algorithm. Furthermore, we extend this algorithm to a novel asymptotic risk-aware upper confidence bound algorithm by developing an upper confidence bound on the variance based on the asymptotic distribution of the sample variance. Theoretical analysis proves that both proposed algorithms achieve O(log(T)) regret. Finally, numerical results demonstrate that our algorithms outperform several existing risk-aware MAB algorithms.
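To make the risk-aware arm-selection rule concrete, the sketch below shows one common way such a loop can be structured. It is a minimal sketch under stated assumptions, not the paper's algorithm: it uses the widely adopted mean-variance index MV_i = sigma_i^2 - rho * mu_i (lower is better) and a generic sqrt(log(t)/n_i) exploration bonus. The function name risk_aware_ucb, the bonus constant b, and the optimistic index are illustrative; the paper's Gaussian and asymptotic risk-aware UCB algorithms instead use confidence bounds on the variance derived from the concentration inequality and asymptotic distribution described above.

```python
import numpy as np

# Minimal sketch of a risk-aware (mean-variance) UCB-style bandit loop.
# Assumptions (not from the paper): mean-variance index
# MV_i = var_i - rho * mean_i (lower is better), Gaussian rewards, and an
# illustrative exploration bonus b * sqrt(log(t) / n_i) applied to both the
# sample mean and the sample variance.
def risk_aware_ucb(reward_fns, T, rho=1.0, b=2.0, rng=None):
    rng = np.random.default_rng(rng)
    K = len(reward_fns)
    counts = np.zeros(K)
    means = np.zeros(K)
    m2 = np.zeros(K)          # running sum of squared deviations (Welford)
    history = []
    for t in range(1, T + 1):
        if t <= K:            # play each arm once to initialize statistics
            arm = t - 1
        else:
            var = m2 / np.maximum(counts - 1, 1)
            bonus = b * np.sqrt(np.log(t) / counts)
            # Optimistic (low) estimate of each arm's mean-variance index:
            # lower bound on the variance, upper bound on the mean.
            index = (var - bonus) - rho * (means + bonus)
            arm = int(np.argmin(index))
        x = reward_fns[arm](rng)
        counts[arm] += 1
        delta = x - means[arm]
        means[arm] += delta / counts[arm]
        m2[arm] += delta * (x - means[arm])
        history.append(arm)
    return history

# Example: two arms with equal means but different variances; a risk-aware
# learner should concentrate its plays on the low-variance arm.
arms = [lambda g: g.normal(1.0, 0.1), lambda g: g.normal(1.0, 2.0)]
plays = risk_aware_ucb(arms, T=2000, rho=1.0, rng=0)
print("low-variance arm share:", plays.count(0) / len(plays))
```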
Funding
Royal Academy of Engineering under the Leverhulme Trust Research Fellowship scheme (Derakhshani, LTRF1920\16\67)
History
School
Mechanical, Electrical and Manufacturing Engineering
Published in
IEEE Signal Processing Letters
Volume
28
Pages
269-273
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.