MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games

📄 Abstract

Developing Large Language Models (LLMs) to cooperate and compete effectively within multi-agent systems is a critical step towards more advanced intelligence. While reinforcement learning (RL) has proven effective for enhancing reasoning in single-agent tasks, its extension to multi-turn, multi-agent scenarios remains underexplored due to the challenges of long-horizon credit assignment and agent-specific advantage estimation. To address these challenges, we introduce MARS, an end-to-end RL framework that incentivizes Multi-Agent Reasoning of LLMs through Self-play in both cooperative and competitive games. MARS features a turn-level advantage estimator that aligns learning signals with each interaction for credit assignment, and an agent-specific advantage normalization to stabilize multi-agent training. By learning with self-play across cooperative and competitive games, the MARS agent trained from Qwen3-4B develops strong strategic abilities that generalize to held-out games with up to 28.7% performance improvements. More importantly, the capability acquired through self-play generalizes beyond games, yielding consistent performance gains of multi-agent systems in reasoning benchmarks. When integrated into leading multi-agent systems, our MARS agent achieves significant performance gains of 10.0% on AIME and 12.5% on GPQA-Diamond. These results establish end-to-end RL training with self-play in strategic games as a powerful approach for developing generalizable multi-agent reasoning capabilities in LLMs. Our code and models are publicly available at https://github.com/thu-nics/MARS.

Authors (13)

Huining Yuan

Zelai Xu

Zheyue Tan

Xiangmin Yi

Mo Guang

Kaiwen Long

+7 more

Submitted

October 17, 2025

arXiv Category

cs.AI

Key Contributions

MARS is an end-to-end RL framework that enhances Multi-Agent Reasoning of LLMs through self-play in strategic games. It addresses challenges in MARL with a turn-level advantage estimator for credit assignment and agent-specific advantage normalization for stable training, enabling LLMs to develop strong strategic abilities.
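
The two ideas named above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give formulas, so the sketch assumes that a turn's raw advantage is simply its discounted reward-to-go (no learned baseline) and that "agent-specific normalization" means standardizing advantages separately for each agent. The function names, the `gamma` and `eps` parameters, and the toy rewards are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def turn_level_returns(rewards, gamma=1.0):
    """Discounted reward-to-go for every turn of one agent's trajectory.

    Assumption: this stands in for a turn-level advantage signal; the
    paper's estimator may use a baseline or a different formulation.
    """
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize_per_agent(samples, eps=1e-8):
    """Standardize advantages separately for each agent.

    samples: list of (agent_id, raw_advantage) pairs.
    Returns the normalized advantages in the same order as the input.
    """
    by_agent = defaultdict(list)
    for agent_id, advantage in samples:
        by_agent[agent_id].append(advantage)
    # Per-agent mean and population standard deviation.
    stats = {a: (mean(v), pstdev(v)) for a, v in by_agent.items()}
    return [
        (advantage - stats[agent_id][0]) / (stats[agent_id][1] + eps)
        for agent_id, advantage in samples
    ]


# Toy two-player episode with hypothetical per-turn rewards.
episode = {"player_0": [0.0, 0.0, 1.0], "player_1": [0.0, -1.0]}
samples = [
    (agent_id, g)
    for agent_id, rewards in episode.items()
    for g in turn_level_returns(rewards, gamma=0.9)
]
advantages = normalize_per_agent(samples)
```

Normalizing per agent rather than over the whole batch keeps an agent whose rewards are systematically larger (or of opposite sign, as in a zero-sum game) from dominating the scale of the learning signal for the other agent.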

Business Value

Enables the development of more sophisticated multi-agent systems, crucial for applications like autonomous vehicle coordination, complex game AI, and collaborative robotics.

Paper Metadata

Innovation Type

Framework/Methodological

Deployment Feasibility

Moderate. Requires significant computational resources for self-play and RL training. Integration with specific LLMs is needed.

Limitations Addressed

Challenges in extending RL to multi-turn, multi-agent scenarios; Long-horizon credit assignment; Agent-specific advantage estimation; Developing cooperative and competitive LLM agents

Performance Gains

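Up to 28.7% improvement on held-out games for the MARS agent trained from Qwen3-4B. When integrated into multi-agent systems, gains of 10.0% on AIME and 12.5% on GPQA-Diamond.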

Technical Tags

Multi-Agent Reinforcement Learning (MARL), LLM Cooperation, LLM Competition, Self-Play, Strategic Games, Long-horizon Credit Assignment, Agent-Specific Advantage Estimation, Turn-level Advantage Estimator, Agent-Specific Advantage Normalization, MARS Framework

Research Topics

Multi-Agent Systems, Reinforcement Learning, LLM Reasoning, Game Theory, Cooperative AI

Methods & Architectures

MARS Framework, Self-Play, Turn-level Advantage Estimator, Agent-Specific Advantage Normalization, LLM-based Agents, Reinforcement Learning Agents

Applications & Tasks

Game Playing, Robotics Coordination, Multi-agent Simulation, Autonomous Systems
Enhancing LLM cooperation and competition, Training LLMs for strategic games, Improving multi-agent reasoning, Developing advanced multi-agent systems

Related Fields

Reinforcement Learning, Artificial Intelligence, Game Theory, Multi-Agent Systems, Natural Language Processing

Keywords

multi-agent RL, LLM, cooperation, competition, self-play, strategic games, MARS, credit assignment, advantage estimation, reasoning, Qwen3-4B

Academic Context

#Multi-Agent Systems, #Reinforcement Learning, #LLM Reasoning, #Game Theory, #Cooperative AI

Technology Stack

Frameworks & Libraries

RL, MARS

Commercial Potential

Potential Products

Advanced Game AI, Multi-Agent Coordination Systems, Simulated Training Environments for Agents

Target Industries

Gaming, Robotics, Autonomous Systems, Defense

Use Case Examples

Developing AI opponents for complex strategy games, Coordinating fleets of autonomous drones, Simulating team dynamics for training purposes

Competitive Edge

Addresses specific challenges in MARL for LLMs, offering a more stable and effective training methodology compared to naive extensions of single-agent RL.

Market Opportunity

Large and growing, driven by advancements in AI and multi-agent systems.

Revenue Models

Licensing of the MARS framework; development of specialized multi-agent AI solutions.

Resource Requirements

Compute Needs

Very high, due to self-play and extensive RL training.

Data Requirements

Environments for strategic games (cooperative and competitive).

Deployment Constraints

Requires significant computational resources for training. The abstract reports generalization to held-out games and reasoning benchmarks; transfer to complex real-world scenarios is not evaluated there and remains a challenge.

Scalability

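Scalability is a major challenge in MARL because the joint state-action space grows exponentially with the number of agents. The proposed turn-level advantage estimator and agent-specific normalization aim to improve credit assignment and training stability.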
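
The exponential growth mentioned here can be made concrete with a small calculation. This is a generic illustration, not taken from the paper: it assumes every agent chooses from the same number of discrete actions, and the function name and the figure of 10 actions per agent are hypothetical.

```python
def joint_action_space_size(actions_per_agent: int, num_agents: int) -> int:
    """Number of joint actions when each agent picks one of its actions independently."""
    return actions_per_agent ** num_agents


# Assumed setting: 10 discrete actions per agent.
for num_agents in (1, 2, 4, 8):
    print(num_agents, joint_action_space_size(10, num_agents))
```

Each added agent multiplies the number of joint actions by the per-agent action count, which is why methods that reason over the full joint space become impractical as the number of agents rises.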

Production Readiness

Maturity Level

Research/Development

Time to Market

2-4 years

Patent Potential

High, for the MARS framework, turn-level advantage estimator, and normalization techniques.
