Approaching realism in social dilemmas: moody reinforcement learners can conditionally replicate human models
More obscure components of affect, such as mood, are frequently undervalued in artificial agent modelling, despite evidence that they may influence human social processes. Artificial mood in particular has been shown to increase the proportion of prosocial behaviours when combined with typically self-focused machine learning algorithms, though research at this stage is preliminary and has much scope for expansion. Social dilemmas provide a simplified and succinct domain in which to test such mechanisms whilst adding novel experimental elements from human research. Using the Iterated Prisoner’s Dilemma, we evaluate a promising existing model of moody reinforcement learning across a broad range of network structures and environmental manipulations. With the end goal of identifying flaws in the model and suggesting improvements, we first review what it means to model such human structures, what psychological research tells us about those structures, and which alternative models exist to the one deployed here. From these, we design a formal framework of critique and analysis to review the current state of the algorithm. We then present three clusters of quantitative experiments across two custom-designed multi-agent network simulations, one static and one dynamic, with a number of additional experimental factors taken from prior agent work and existing human psychological research. These factors include manipulation of the interaction structure, the payoff matrix used, the proportions of game-playing strategies present, the ability to reject game partners, the method by which game partners are evaluated for this rejection, and variables controlling the restructuring of the network itself. Overall, we find that all but one of these factors provide methods by which we can further enhance the algorithm’s naturally cooperative nature.
In particular, the most illuminating aspects with regard to modelling human behavioural trends are the algorithm’s natural reactivity to a summary variable of the payoff matrix structure, the Cooperation Index, and to certain dynamic network restructuring factors, both of which are completely novel to the literature on this algorithm. The latter is facilitated only through one particular partner-rejection evaluation strategy, despite its simple structure, and motivates further research. Lastly, we provide in-depth analysis of the algorithm’s strengths and flaws, with clear outlines for judging its successes and recommendations for its next stages of development. Combining these, we hope to contribute to the development of this particular computational affect model in new ways.
- Computer Science
Rights holder: © Grace Feehan
Notes: A doctoral thesis submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of Loughborough University.