Knowledge decomposition towards generative multi-agent learning
Humans are, in essence, a multitude of agents navigating a complex social environment, one that requires interaction and coordination with other humans to achieve our goals. It is this inherent social characteristic that drives human development and the success of our species: our ability to cooperate, communicate, and build integrated groups allows us to solve problems and embrace opportunities that no individual could tackle alone. The division of labor and the specialization enabled by human social arrangements, for instance, have been pivotal in the development of sophisticated technologies and cultural achievements. In many ways, the challenges and opportunities presented by multi-agent systems in the natural world mirror those studied in the field of artificial intelligence. Recognizing these similarities, multi-agent reinforcement learning (MARL) has emerged as a paradigm for crafting intelligent agents capable of performing well in complex, interactive environments.
While real-world multi-agent systems are inherently generative, allowing agents to interact, learn, and adapt within dynamic environments, replicating the complex social structures of humans and applying prior knowledge to new situations without extensive training remains a significant challenge in MARL. Humans are generative in nature: we have the ability to form new ideas, such as creating new technologies by combining existing knowledge with novel insights. In a similar manner, multi-agent systems should be capable of drawing upon their accumulated knowledge, reasoning abilities, and creative capacities to devise efficient, near-optimal solutions. By embedding this human-inspired capability to generate near-optimal decisions, MARL algorithms could be better equipped to handle the constraints and challenges of real-world multi-agent systems, in contrast to requiring agents to learn effective coordination strategies from scratch through repeated interaction. Throughout this thesis, we present a principled framework of Generative Policy Networks (GPN), instilling agents with the knowledge and decision-making capabilities needed to thrive in complex environments under both the single- and multi-agent paradigms. The presented approach addresses a significant gap in current research, particularly in the sample efficiency, stability, and overall performance of MARL systems deployed in real-world applications where the ability to adapt and coordinate quickly is crucial.
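To make the idea concrete, the sketch below illustrates one minimal way a generative policy could be structured: the agent samples a latent "intention" from a learned, observation-conditioned prior and decodes it into an action distribution, generating candidate decisions rather than mapping observations to actions directly. This is an illustrative assumption only; the class and parameter names (GenerativePolicyNetwork, latent_dim, and so on) are hypothetical and do not reflect the GPN architecture defined in the thesis.

```python
# Minimal, illustrative sketch of a "generative" policy: sample a latent
# intention z from an observation-conditioned prior, then decode (obs, z)
# into an action distribution. All names here are assumptions for
# illustration, not the architecture proposed in the thesis.
import torch
import torch.nn as nn

class GenerativePolicyNetwork(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int, latent_dim: int = 16):
        super().__init__()
        # Prior over latent intentions, conditioned on the observation.
        self.prior = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 2 * latent_dim)
        )
        # Decoder that turns (observation, sampled intention) into action logits.
        self.decoder = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 64), nn.ReLU(), nn.Linear(64, action_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        mu, log_std = self.prior(obs).chunk(2, dim=-1)
        z = mu + log_std.exp() * torch.randn_like(mu)   # reparameterised sample
        logits = self.decoder(torch.cat([obs, z], dim=-1))
        return torch.distributions.Categorical(logits=logits)

# Usage: sample an action for a single agent observation.
policy = GenerativePolicyNetwork(obs_dim=8, action_dim=5)
action = policy(torch.randn(1, 8)).sample()
```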
Transferring knowledge between tasks is another fundamental challenge in MARL, owing to the general complexity of interactions, the non-stationarity of the learning environment, and the curse of dimensionality in high-dimensional state and action spaces. Task heterogeneity, partial observability in real-world scenarios, and emergent complex behaviors further complicate the transfer process. Moreover, scalability issues arise as more agents are introduced into a given setting, and generalizing learned strategies across diverse multi-agent tasks and environments becomes difficult. Together, these challenges make it hard to adapt knowledge gained in one MARL context to another and call for sophisticated approaches to effectively capture and transfer the learned information. A comprehensive understanding of the impact of each component within the general training objective of MARL algorithms is crucial for developing algorithms capable of effective knowledge transfer. These components include (1) state representation, where we examine how different encoding methods affect knowledge transfer; (2) action space, investigating how discrete representations influence transferability; and (3) the learning algorithm, concentrating on the problem of centralized training with decentralized execution (CTDE). To tackle the major issue of knowledge transfer, the presented work analyzes the decomposition of each component that makes up the MARL training process. Building upon these foundational components, we further explore task structure and its crucial role in knowledge transfer. Task structure refers to the underlying patterns, hierarchies, and relationships present in different multi-agent scenarios. To this end, we propose Dynamic Intra-Option Mixtures (DIOMIX), a task decomposition technique that breaks complex multi-agent tasks down into simpler, more manageable sub-tasks. Through the examination of task structure, we uncover generalizable principles that can guide the development of more robust and adaptable MARL systems, capable of efficiently transferring knowledge across a diverse range of multi-agent scenarios.
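As a point of reference for the learning-algorithm component above, the sketch below shows the basic CTDE value-mixing pattern: each agent computes a utility from its local observation, and a centralized mixer, conditioned on the global state in the spirit of QMIX-style monotonic mixing, combines them into a joint value used only during training. The option-conditioning and dynamic intra-option mixtures that characterize DIOMIX are not reproduced here; all names (AgentUtility, Mixer, n_agents, and so on) are illustrative assumptions.

```python
# Illustrative CTDE value-mixing sketch: decentralised per-agent utilities,
# centralised state-conditioned mixing during training only. Not the DIOMIX
# architecture; names and dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class AgentUtility(nn.Module):
    """Decentralised per-agent utility Q_i(o_i, a) over a discrete action set."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class Mixer(nn.Module):
    """Centralised mixer: monotonically combines agent utilities given the global state."""
    def __init__(self, n_agents: int, state_dim: int):
        super().__init__()
        self.hyper_w = nn.Linear(state_dim, n_agents)  # state-conditioned weights
        self.hyper_b = nn.Linear(state_dim, 1)         # state-conditioned bias

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        w = torch.abs(self.hyper_w(state))             # non-negative weights => monotonic mixing
        return (w * agent_qs).sum(dim=-1, keepdim=True) + self.hyper_b(state)

# Centralised training step (illustrative): mix chosen-action utilities into Q_tot.
n_agents, obs_dim, n_actions, state_dim = 3, 8, 5, 12
agents = [AgentUtility(obs_dim, n_actions) for _ in range(n_agents)]
obs = torch.randn(n_agents, obs_dim)
state = torch.randn(1, state_dim)
chosen = torch.randint(0, n_actions, (n_agents,))
qs = torch.stack([agents[i](obs[i])[chosen[i]] for i in range(n_agents)]).unsqueeze(0)
q_tot = Mixer(n_agents, state_dim)(qs, state)          # joint value, used only during training
```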
The presented work tackles two major difficulties associated with MARL: enabling agents to exhibit human-like generative decision-making capabilities, and transferring knowledge between tasks in complex, dynamic environments. The resulting systems not only learn from their environments but also leverage generative reasoning to rapidly devise near-optimal solutions, reflecting a key aspect of human cognition. This thesis presents comprehensive novel techniques for overcoming the inherent challenges of multi-agent systems, aiming to push the boundaries of MARL’s practical applications in real-world, dynamic environments.
School
- Loughborough University, London
Publisher
- Loughborough University
Rights holder
- © Corentin Artaud
Publication date
- 2025
Notes
- A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.
Language
- en
Supervisor(s)
- Varuna De-Silva; Xiyu Shi
Qualification name
- PhD
Qualification level
- Doctoral