Loughborough University

'I don't want to play with you anymore': Dynamic partner judgements in moody reinforcement learners playing the prisoner's dilemma

journal contribution
posted on 2025-02-12, 15:37 authored by Grace Feehan, Syeda Fatima
Emerging reinforcement learning algorithms that utilize human traits as part of their conceptual architecture have been demonstrated to encourage cooperation in social dilemmas when compared to their unaltered origins. In particular, the addition of a mood mechanism facilitates more cooperative behaviour in multi-agent iterated prisoner's dilemma (IPD) games, for both static and dynamic network contexts. Mood-altered agents also exhibit humanlike behavioural trends when environmental aspects of the dilemma are altered, such as the structure of the payoff matrix used. It is possible that other environmental effects from both human and agent-based research will interact with moody structures in previously unstudied ways. As the literature on these interactions is currently small, we seek to expand on previous research by introducing two more environmental dimensions: voluntary interaction in dynamic networks, and stability of interaction through varied network restructuring. From an initial Erdős–Rényi random network, we manipulate the structure of a network IPD according to existing methodology in human-based research, to investigate possible replication of their findings. We also facilitated strategic selection of opponents through the introduction of two partner evaluation mechanisms and tested two selection thresholds for each. We found that even minimally strategic play termination in dynamic networks is enough to enhance cooperation above a static level, though the thresholds for these strategic decisions are critical to desired outcomes. More forgiving thresholds lead to better maintenance of cooperation between kinder strategies than stricter ones, despite overall cooperation levels being relatively low.
Additionally, moody reinforcement learning combined with certain play termination decision strategies can mimic trends in human cooperation affected by structural changes to the IPD played on dynamic networks, as can kind and simplistic strategies such as Tit-For-Tat. The implications of these findings in comparison with human data are discussed, and suggestions for diversification of further testing are made.
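
To make the setup in the abstract concrete, the following is a minimal Python sketch of an IPD played on an Erdős–Rényi network in which links are cut when a partner falls below a cooperation threshold. It is an illustration under stated assumptions, not the article's implementation: the payoff values, the cooperation-rate evaluation rule, and the names `erdos_renyi_edges`, `simulate` and `threshold` are all hypothetical, and fixed strategies (Tit-For-Tat, always defect) stand in for the moody reinforcement learners.

```python
import itertools
import random

# Conventional prisoner's dilemma payoffs (T=5, R=3, P=1, S=0); an assumption,
# not necessarily the matrix used in the article.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def erdos_renyi_edges(n, p, rng):
    """Return the edge set of a G(n, p) random graph on nodes 0..n-1."""
    return {(i, j) for i, j in itertools.combinations(range(n), 2)
            if rng.random() < p}


def tit_for_tat(last_partner_move):
    """Cooperate first, then copy the partner's previous move."""
    return last_partner_move or "C"


def always_defect(last_partner_move):
    """Defect regardless of history."""
    return "D"


def simulate(strategies, edges, rounds, threshold):
    """Play an IPD on every edge; cut a link once either endpoint has seen
    its partner cooperate in less than `threshold` of their rounds.

    This evaluation rule is a hypothetical stand-in for the article's
    partner evaluation mechanisms.
    Returns (overall cooperation rate, surviving edges, total scores).
    """
    edges = set(edges)
    last = {}   # (observer, partner) -> partner's previous move
    seen = {}   # (observer, partner) -> [partner cooperations, rounds played]
    scores = [0] * len(strategies)
    coop = moves = 0
    for _ in range(rounds):
        for i, j in sorted(edges):
            mi = strategies[i](last.get((i, j)))
            mj = strategies[j](last.get((j, i)))
            pi, pj = PAYOFF[(mi, mj)]
            scores[i] += pi
            scores[j] += pj
            last[(i, j)], last[(j, i)] = mj, mi
            for key, move in (((i, j), mj), ((j, i), mi)):
                record = seen.setdefault(key, [0, 0])
                record[0] += move == "C"
                record[1] += 1
            coop += (mi == "C") + (mj == "C")
            moves += 2
        # Partner evaluation: keep a link only if both sides are satisfied.
        edges = {(i, j) for i, j in edges
                 if seen[(i, j)][0] / seen[(i, j)][1] >= threshold
                 and seen[(j, i)][0] / seen[(j, i)][1] >= threshold}
    rate = coop / moves if moves else 0.0
    return rate, edges, scores
```

In this sketch, a population made up only of Tit-For-Tat players keeps every link and cooperates throughout, whereas a defector facing a threshold of 0.5 loses all of its links after the first round. Setting `threshold` to 0 disables link cutting and so corresponds to play on a static network.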

Funding

UKRI

History

School

  • Science

Published in

Knowledge Engineering Review

Volume

39

Pages

1 - 29

Publisher

Cambridge University Press (CUP)

Version

  • VoR (Version of Record)

Rights holder

© The Author(s)

Publisher statement

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.

Acceptance date

2024-02-20

Publication date

2024-03-26

Copyright date

2024

ISSN

0269-8889

eISSN

1469-8005

Language

  • en

Depositor

Dr Syeda Fatima. Deposit date: 21 June 2024

Article number

e2
