📄 Abstract
Adaptive cooperation in multi-agent reinforcement learning (MARL) requires
policies to express homogeneous, specialised, or mixed behaviours, yet
achieving this adaptivity remains a critical challenge. While parameter sharing
(PS) is standard for efficient learning, it notoriously suppresses the
behavioural diversity required for specialisation. This failure is largely due
to cross-agent gradient interference, a problem we find is surprisingly
exacerbated by the common practice of coupling agent IDs with observations.
Existing remedies typically add complexity through altered objectives, manual
preset diversity levels, or sequential updates -- raising a fundamental
question: can shared policies adapt without these intricacies? We propose a
solution built on a key insight: an agent-conditioned hypernetwork can generate
agent-specific parameters and decouple observation- and agent-conditioned
gradients, directly countering the interference from coupling agent IDs with
observations. Our resulting method, HyperMARL, avoids the complexities of prior
work and empirically reduces policy gradient variance. Across diverse MARL
benchmarks (22 scenarios, up to 30 agents), HyperMARL achieves performance
competitive with six key baselines while preserving behavioural diversity
comparable to non-parameter sharing methods, establishing it as a versatile and
principled approach for adaptive MARL. The code is publicly available at
https://github.com/KaleabTessera/HyperMARL.
Authors (4)
Kale-ab Abebe Tessera
Arrasy Rahman
Amos Storkey
Stefano V. Albrecht
Submitted
December 5, 2024
Key Contributions
This paper introduces HyperMARL, a hypernetwork-based approach to multi-agent reinforcement learning (MARL) that lets a shared policy express homogeneous, specialised, or mixed behaviours without altered objectives, manually preset diversity levels, or sequential updates. An agent-conditioned hypernetwork generates agent-specific parameters, which decouples observation-conditioned gradients from agent-conditioned gradients and counters the interference caused by coupling agent IDs with observations.
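To make the mechanism concrete, here is a minimal sketch of an agent-conditioned hypernetwork in plain Python. It is not the authors' implementation: the linear hypernetwork, the single-layer softmax policy, the random initialisation, and every name (make_hypernetwork, generate_policy_params, policy) are assumptions chosen for illustration. What it shows is the separation of inputs: the agent ID enters only through the hypernetwork, and the observation enters only through the generated policy layer.

```python
import math
import random


def make_hypernetwork(n_agents, embed_dim, obs_dim, n_actions, seed=0):
    """Build per-agent embeddings and a linear hypernetwork (hypothetical layout)."""
    rng = random.Random(seed)
    # The hypernetwork outputs every weight and bias of one linear policy layer.
    n_target = obs_dim * n_actions + n_actions
    embeddings = [[rng.gauss(0, 1) for _ in range(embed_dim)]
                  for _ in range(n_agents)]
    hyper_w = [[rng.gauss(0, 0.1) for _ in range(embed_dim)]
               for _ in range(n_target)]
    return {"embeddings": embeddings, "hyper_w": hyper_w,
            "obs_dim": obs_dim, "n_actions": n_actions}


def generate_policy_params(hnet, agent_id):
    """Map an agent's embedding to that agent's policy weights and biases."""
    e = hnet["embeddings"][agent_id]
    flat = [sum(w * x for w, x in zip(row, e)) for row in hnet["hyper_w"]]
    obs_dim, n_actions = hnet["obs_dim"], hnet["n_actions"]
    split = obs_dim * n_actions
    weights = [flat[a * obs_dim:(a + 1) * obs_dim] for a in range(n_actions)]
    biases = flat[split:]
    return weights, biases


def policy(hnet, agent_id, obs):
    """Return action probabilities; obs is NOT concatenated with the agent ID."""
    weights, biases = generate_policy_params(hnet, agent_id)
    logits = [sum(w * o for w, o in zip(row, obs)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [x / total for x in exps]


if __name__ == "__main__":
    hnet = make_hypernetwork(n_agents=3, embed_dim=4, obs_dim=5, n_actions=2)
    obs = [0.5, -1.0, 0.25, 2.0, 0.0]
    for agent_id in range(3):
        print(agent_id, policy(hnet, agent_id, obs))
```

All agents share the same hypernetwork weights, so learning is still shared, yet each agent receives its own policy parameters for the same observation. In this sketch the agents differ only through their embeddings; a trained system would learn both the embeddings and the hypernetwork weights.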
Business Value
The approach supports the development of more adaptable multi-agent systems, which matters for applications such as coordinated drone swarms, autonomous vehicle platooning, and complex robotic teams.