Loughborough University
fnbot-14-578675.pdf (2.5 MB)

Detecting changes and avoiding catastrophic forgetting in dynamic partially observable environments

journal contribution
posted on 2021-01-13, 09:25 authored by Jeff Dick, Pawel Ladosz, Eseoghene Ben-Iwhiwhu, Hideyasu Shimadzu, Peter Kinnell, Praveen K Pilly, Soheil Kolouri, Andrea Soltoggio
The ability of an agent to detect changes in an environment is key to successful adaptation. This ability involves at least two phases: learning a model of an environment, and detecting that a change is likely to have occurred when this model is no longer accurate. This task is particularly challenging in partially observable environments, such as those modeled with partially observable Markov decision processes (POMDPs). Some predictive learners are able to infer the state from observations and thus perform better with partial observability. Predictive state representations (PSRs) and neural networks are two such tools that can be trained to predict the probabilities of future observations. However, most existing methods of this kind focus on static problems in which only one environment is learned. In this paper, we propose an algorithm that uses statistical tests to estimate the probability that each of several predictive models fits the current environment. We exploit the underlying probability distributions of predictive models to provide a fast and explainable method to assess and justify the model's beliefs about the current environment. Crucially, by doing so, the method can label incoming data as fitting different models, and thus can continuously train separate models in different environments. This new method is shown to prevent catastrophic forgetting when new environments, or tasks, are encountered. The method can also be of use when AI-informed decisions require justifications, because its beliefs are based on statistical evidence from observations. We empirically demonstrate the benefit of the novel method with simulations in a set of POMDP environments.
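
The model-selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say which statistical tests the paper uses, so the sketch assumes a Pearson chi-square goodness-of-fit test, and it simplifies each predictive model to a fixed categorical distribution over observation symbols rather than a history-conditioned PSR or neural network prediction. The names `fit_p_value`, `label_window`, `env_A`, `env_B` and the threshold `alpha` are hypothetical.

```python
import math
from collections import Counter


def chi2_sf_even_df(x, df):
    """Chi-square survival function P(X >= x), closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fit_p_value(observations, predicted_probs):
    """Pearson goodness-of-fit p-value of a window of observations against the
    observation probabilities predicted by one model (assumed illustration).
    Assumes every observation is a symbol the model assigns a probability to."""
    n = len(observations)
    counts = Counter(observations)
    stat = sum(
        (counts.get(sym, 0) - n * p) ** 2 / (n * p)
        for sym, p in predicted_probs.items()
    )
    return chi2_sf_even_df(stat, len(predicted_probs) - 1)


def label_window(observations, models, alpha=0.01):
    """Return the best-fitting model name, or None if every model is rejected
    at level alpha (which would suggest a new, unseen environment)."""
    p_values = {name: fit_p_value(observations, probs)
                for name, probs in models.items()}
    best = max(p_values, key=p_values.get)
    if p_values[best] < alpha:
        return None, p_values
    return best, p_values


# Hypothetical models: each predicts a distribution over three observation symbols.
models = {
    "env_A": {"x": 0.7, "y": 0.2, "z": 0.1},
    "env_B": {"x": 0.1, "y": 0.2, "z": 0.7},
}
window_a = ["x"] * 70 + ["y"] * 20 + ["z"] * 10   # matches env_A's prediction
window_new = ["y"] * 100                          # matches neither model

label_a, p_a = label_window(window_a, models)
label_new, p_new = label_window(window_new, models)
```

In this sketch a window labeled `env_A` would be routed to that model's training data only, which is the mechanism the abstract credits with avoiding catastrophic forgetting, while a `None` label would prompt the creation of a new model. The p-values themselves are the statistical evidence that could be reported to justify the decision.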

Funding

This material was based upon work supported by the United States Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) under Contract No. FA8750-18-C-0103.

History

School

  • Mechanical, Electrical and Manufacturing Engineering
  • Science

Department

  • Computer Science
  • Mathematical Sciences

Published in

Frontiers in Neurorobotics

Volume

14

Publisher

Frontiers Media SA

Version

  • VoR (Version of Record)

Rights holder

© The Authors

Publisher statement

This is an Open Access Article. It is published under the Creative Commons Attribution 4.0 International Licence (CC BY 4.0). Full details of this licence are available at: https://creativecommons.org/licenses/by/4.0/

Acceptance date

2020-11-20

Publication date

2020-12-23

Copyright date

2020

ISSN

1662-5218

eISSN

1662-5218

Language

  • en

Depositor

Dr Andrea Soltoggio. Deposit date: 12 January 2021

Article number

578675
