Loughborough University
Deep Multi-Critic Network for accelerating Policy Learning in multi-agent environments

journal contribution
posted on 2020-05-12, 10:24 authored by Joosep Hook, Varuna De-Silva, Ahmet Kondoz
Humans live among other humans, not in isolation. Therefore, the ability to learn and behave in multi-agent environments is essential for any autonomous system that intends to interact with people. Because a multi-agent learning environment contains multiple simultaneous learners, the Markov assumption used for single-agent environments is not tenable, necessitating the development of new Policy Learning algorithms. Recent Actor-Critic algorithms proposed for multi-agent environments, such as Multi-Agent Deep Deterministic Policy Gradients and Counterfactual Multi-Agent Policy Gradients, retain the mathematical framework of single-agent environments by augmenting the Critic with extra information. However, this extra information can slow down the learning process and afflict the Critic with the curse of dimensionality. To combat this, we propose a novel Deep Neural Network configuration called Deep Multi-Critic Network. This architecture takes a weighted sum over the outputs of multiple critic networks of varying complexity and size. The configuration was tested on data collected from a real-world multi-agent environment. The results illustrate that with the Deep Multi-Critic Network, less data is needed to reach the same level of performance than without it. This suggests that, because the configuration learns faster from less data, the Critic may be able to learn Q-values faster, accelerating Actor training as well.
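
The weighted sum over several critics described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the weights are obtained, so the softmax-normalised mixing weights, the tanh hidden layers, the input size, and the three hidden-layer configurations below are all assumptions chosen for illustration, and the networks are untrained.

```python
import math
import random


def make_mlp(sizes, rng):
    """Create weights for a small fully connected network, e.g. sizes=[6, 8, 1]."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def mlp_forward(layers, x):
    """Forward pass: tanh on hidden layers, linear scalar output."""
    h = list(x)
    for i, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            h = [math.tanh(v) for v in h]
    return h[0]


def softmax(logits):
    """Numerically stable softmax, used here to keep mixing weights positive."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_critic_q(critics, mix_logits, state_action):
    """Combine the scalar outputs of several critics with a weighted sum."""
    mix = softmax(mix_logits)  # assumption: weights are normalised to sum to 1
    outputs = [mlp_forward(c, state_action) for c in critics]
    q_value = sum(w * q for w, q in zip(mix, outputs))
    return q_value, outputs, mix


rng = random.Random(0)
input_dim = 6  # hypothetical size of the concatenated state-action features
hidden_configs = [[8], [32, 32], [64, 64, 64]]  # hypothetical small/medium/large critics
critics = [make_mlp([input_dim] + hidden + [1], rng) for hidden in hidden_configs]
mix_logits = [0.0, 0.0, 0.0]  # equal weights initially; would be learned in practice

x = [rng.uniform(-1, 1) for _ in range(input_dim)]
q_value, outputs, mix = multi_critic_q(critics, mix_logits, x)
print("per-critic outputs:", outputs)
print("combined Q estimate:", q_value)
```

With equal mixing logits the combined estimate is simply the mean of the three critic outputs. In a trained system the mixing weights would presumably be learned alongside the critics, so that smaller critics can dominate early, when data is scarce, and larger ones later.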

Funding

Engineering and Physical Sciences Research Council in the United Kingdom, under grant number EP/T000783/1: MIMIC: Multimodal Imitation Learning in Multi-Agent Environments.

History

School

  • Loughborough University London

Published in

Neural Networks

Volume

128

Issue

August 2020

Pages

97 - 106

Publisher

Elsevier

Version

  • VoR (Version of Record)

Rights holder

© The Authors

Publisher statement

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Acceptance date

2020-04-27

Publication date

2020-05-04

Copyright date

2020

ISSN

0893-6080

Language

  • en

Depositor

Joosep Hook. Deposit date: 11 May 2020
