Loughborough University

Causally aware reinforcement learning agents for autonomous cyber defence

journal contribution
posted on 2024-09-26, 10:03 authored by Tom Purves, Kostas Kyriakopoulos, Sian Jenkins, Iain Phillips, Tim Dudman

Artificial Intelligence (AI) is seen as a disruptive solution to the ever-increasing security threats facing network infrastructures. To automate the defence of networked environments against such threats, approaches such as Reinforcement Learning (RL) have been used to train agents in cyber adversarial games. One primary challenge is how to integrate contextual information into RL models so that agents adapt their behaviour to the adversarial posture. Two desirable characteristics identified for such models are that they should be interpretable and causal.

To address this challenge, we propose an approach that integrates a causal rewards model with a modified Proximal Policy Optimisation (PPO) agent in Meta’s MBRL-Lib framework. Our RL agents are trained and evaluated against a range of cyber-relevant scenarios in the Dstl YAWNING-TITAN (YT) environment. We have constructed and experimented with two types of reward functions to facilitate the agent’s learning process. Evaluation metrics include, among others, games won by the defence agent (blue wins), episode length, healthy nodes and isolated nodes.

Results show that, across all scenarios, our causally aware agent outperforms causally blind state-of-the-art benchmarks on the above evaluation metrics. In particular, with our proposed High Value Target (HVT) reward function, which aims not to disrupt HVT nodes, the number of isolated nodes is improved by 17% and 18% against the model-free and Neural Network (NN) model-based agents, respectively, across all scenarios. More importantly, on the blue wins metric, the overall performance improvement over the model-free and NN model-based agents was 40% and 17%, respectively, across all scenarios.
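To make the idea of a reward that "aims not to disrupt HVT nodes" more concrete, the following is a minimal Python sketch under stated assumptions. It is not the reward function from the paper, and it does not use the YAWNING-TITAN or MBRL-Lib APIs. The `NodeState` class, the `hvt_reward` function and the `isolation_penalty` and `hvt_multiplier` parameters are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class NodeState:
    """Hypothetical per-node snapshot (not the YAWNING-TITAN data model)."""
    compromised: bool
    isolated: bool
    high_value: bool


def hvt_reward(
    nodes: Iterable[NodeState],
    isolation_penalty: float = 1.0,  # assumed cost of isolating any node
    hvt_multiplier: float = 5.0,     # assumed extra weight for HVT nodes
) -> float:
    """Illustrative reward: +1 for each uncompromised node, minus a penalty
    for each isolated node, scaled up when that node is a high value target."""
    reward = 0.0
    for node in nodes:
        if not node.compromised:
            reward += 1.0
        if node.isolated:
            weight = hvt_multiplier if node.high_value else 1.0
            reward -= isolation_penalty * weight
    return reward


# Two otherwise identical network states: one isolates the HVT node,
# the other isolates an ordinary node.
isolate_hvt = [
    NodeState(compromised=False, isolated=False, high_value=False),
    NodeState(compromised=False, isolated=True, high_value=True),
    NodeState(compromised=True, isolated=False, high_value=False),
]
isolate_ordinary = [
    NodeState(compromised=False, isolated=False, high_value=True),
    NodeState(compromised=False, isolated=True, high_value=False),
    NodeState(compromised=True, isolated=False, high_value=False),
]
```

Under these assumed weights, `hvt_reward(isolate_hvt)` is lower than `hvt_reward(isolate_ordinary)`, so an agent maximising this signal would prefer defensive actions that leave HVT nodes connected. The actual reward definitions are given in the published article.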

Funding

Frazer-Nash Consultancy Ltd. on behalf of the Defence Science and Technology Laboratory (Dstl)

History

School

  • Mechanical, Electrical and Manufacturing Engineering

Published in

Knowledge-Based Systems

Volume

304

Publisher

Elsevier B.V.

Version

  • VoR (Version of Record)

Rights holder

© The Authors

Publisher statement

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Acceptance date

2024-09-13

Publication date

2024-09-19

Copyright date

2024

ISSN

0950-7051

eISSN

1872-7409

Language

  • en

Depositor

Dr Kostas Kyriakopoulos. Deposit date: 13 September 2024

Article number

112521