Loughborough University

Policy generation from latent embeddings for reinforcement learning

conference contribution
posted on 2023-06-16, 15:15 authored by Corentin Artaud, Rafael Moreira-Pina, Xiyu Shi, Varuna De-Silva
The human brain endows us with extraordinary capabilities that enable us to create, imagine, and generate anything we desire. Specifically, we have fascinating imaginative skills that allow us to generate fundamental knowledge from abstract concepts. Motivated by these traits, numerous areas of machine learning, notably unsupervised learning and reinforcement learning, have started using such ideas at their core. Nevertheless, these methods do not come without fault. A fundamental issue with reinforcement learning, especially now that it is used with neural networks as function approximators, is its limited achievable optimality compared to its uses from tabula rasa. Due to the nature of learning with neural networks, the behaviours achievable for each task are inconsistent, and a unified approach that enables such optimal policies to exist within a parameter space would facilitate both the learning procedure and the behaviour outcomes. Consequently, we are interested in discovering whether unsupervised learning methods can facilitate reinforcement learning in a way that alleviates this shortcoming. This work aims to analyse the feasibility of using generative models to extract learnt reinforcement learning policies (i.e. model parameters), with the intention of conditionally sampling the learnt policy-latent space to generate new policies. We demonstrate that, under the currently proposed architecture, these models are able to recreate policies on simple tasks but fail on more complex ones. We therefore provide a critical analysis of these failures and discuss further improvements that would aid the proliferation of this work.

School

  • Loughborough University London

Published in

Intelligent Systems and Pattern Recognition

Volume

1941

Pages

155–168

Source

The International Conference on Intelligent Systems & Pattern Recognition

Publisher

Springer

Version

  • AM (Accepted Manuscript)

Rights holder

© The Author(s), under exclusive license to Springer Nature Switzerland AG

Publisher statement

This version of the contribution has been accepted for publication, after peer review (when applicable) but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-031-46338-9_12. Use of this Accepted Version is subject to the publisher’s Accepted Manuscript terms of use https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms

Publication date

2023-11-05

Copyright date

2024

ISBN

9783031463372; 9783031463389

Book series

Communications in Computer and Information Science

Language

  • en

Editor(s)

Akram Bennour; Ahmed Bouridane; Lotfi Chaari

Location

Hammamet, Tunisia

Event dates

11th May 2023 - 13th May 2023

Depositor

Dr Xiyu Shi. Deposit date: 15 June 2023
