MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games

📄 Abstract

Developing Large Language Models (LLMs) to cooperate and compete effectively within multi-agent systems is a critical step towards more advanced intelligence. While reinforcement learning (RL) has proven effective for enhancing reasoning in single-agent tasks, its extension to multi-turn, multi-agent scenarios remains underexplored due to the challenges of long-horizon credit assignment and agent-specific advantage estimation. To address these challenges, we introduce MARS, an end-to-end RL framework that incentivizes Multi-Agent Reasoning of LLMs through Self-play in both cooperative and competitive games. MARS features a turn-level advantage estimator that aligns learning signals with each interaction for credit assignment, and an agent-specific advantage normalization to stabilize multi-agent training. By learning with self-play across cooperative and competitive games, the MARS agent trained from Qwen3-4B develops strong strategic abilities that generalize to held-out games with up to 28.7% performance improvements. More importantly, the capability acquired through self-play generalizes beyond games, yielding consistent performance gains of multi-agent systems in reasoning benchmarks. When integrated into leading multi-agent systems, our MARS agent achieves significant performance gains of 10.0% on AIME and 12.5% on GPQA-Diamond. These results establish end-to-end RL training with self-play in strategic games as a powerful approach for developing generalizable multi-agent reasoning capabilities in LLMs. Our code and models are publicly available at https://github.com/thu-nics/MARS.
Authors (13): Huining Yuan, Zelai Xu, Zheyue Tan, Xiangmin Yi, Mo Guang, Kaiwen Long, +7 more
Submitted: October 17, 2025
arXiv Category: cs.AI

Key Contributions

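MARS is an end-to-end RL framework that strengthens the multi-agent reasoning of LLMs through self-play in cooperative and competitive strategic games. It targets two obstacles in multi-turn, multi-agent RL: a turn-level advantage estimator handles long-horizon credit assignment by aligning the learning signal with each interaction, and agent-specific advantage normalization stabilizes training. Trained this way from Qwen3-4B, the agent develops strategic abilities that generalize to held-out games.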
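
The two ideas named above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a turn-level signal can be approximated by a discounted return-to-go computed per turn, and that agent-specific normalization means standardizing values with statistics computed separately for each agent. The function names, the `gamma` and `eps` parameters, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def turn_level_returns(rewards, gamma=1.0):
    """Discounted return-to-go for every turn of one agent's trajectory.

    Assumption: one scalar reward per turn; each turn gets its own signal
    instead of a single value for the whole episode.
    """
    returns, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    return returns[::-1]


def normalize_per_agent(samples, eps=1e-8):
    """Standardize values using mean/std computed separately per agent.

    samples: list of (agent_id, value) pairs.
    Returns the normalized values in the same order as the input.
    """
    by_agent = defaultdict(list)
    for agent, value in samples:
        by_agent[agent].append(value)
    # Population statistics per agent (hypothetical choice for this sketch).
    stats = {a: (mean(v), pstdev(v)) for a, v in by_agent.items()}
    return [
        (value - stats[agent][0]) / (stats[agent][1] + eps)
        for agent, value in samples
    ]


# Usage: a sparse terminal reward is spread back over earlier turns, and two
# agents with very different value scales end up on a comparable scale.
per_turn = turn_level_returns([0.0, 0.0, 1.0], gamma=0.5)
scaled = normalize_per_agent(
    [("player_0", 1.0), ("player_0", 3.0), ("player_1", 10.0), ("player_1", 30.0)]
)
```

Normalizing per agent rather than over the pooled batch keeps an agent whose returns are systematically larger from dominating the update, which is one plausible reading of why such a step would stabilize multi-agent training.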

Business Value

MARS could support the development of more sophisticated multi-agent systems for applications such as autonomous vehicle coordination, complex game AI, and collaborative robotics.