Research Paper · Relevant to: RL Researchers, MAS Developers, Robotics Engineers, AI System Architects

Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization

Abstract

To combat the prohibitive communication costs of "free-for-all" multi-agent systems (MAS), we introduce Agent-GSPO, a framework that directly optimizes for token economy using sequence-level reinforcement learning. Agent-GSPO leverages the stable and memory-efficient Group Sequence Policy Optimization (GSPO) algorithm to train agents on a communication-aware reward that explicitly penalizes verbosity. Across seven reasoning benchmarks, Agent-GSPO not only achieves new state-of-the-art performance but does so with a fraction of the token consumption of existing methods. By fostering emergent strategies like "strategic silence," our approach provides a practical blueprint for developing scalable and economically viable multi-agent systems.
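The abstract does not give the exact form of the communication-aware reward. As a minimal sketch, assuming a task reward of 1 for a correct final answer minus a linear penalty `lam` per token exchanged, the code below shows how such a reward can make "strategic silence" pay off. The names `Episode`, `count_tokens`, `communication_aware_reward`, the whitespace tokenizer and the value of `lam` are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """Hypothetical record of one multi-agent rollout."""
    messages: List[str]  # one entry per agent turn; "" means the agent stayed silent
    correct: bool        # whether the system's final answer solved the task


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (whitespace split); a real system would use the LLM's tokenizer.
    return len(text.split())


def communication_aware_reward(ep: Episode, lam: float = 0.01) -> float:
    # ASSUMED form: task reward minus a linear penalty on total tokens exchanged.
    task_reward = 1.0 if ep.correct else 0.0
    total_tokens = sum(count_tokens(m) for m in ep.messages)
    return task_reward - lam * total_tokens


verbose = Episode(["let me restate the whole problem in detail first",
                   "agreed, the answer is 42"], correct=True)   # 14 tokens
silent = Episode(["", "the answer is 42"], correct=True)         # 4 tokens
silent_wrong = Episode(["", ""], correct=False)                  # 0 tokens

for name, ep in [("verbose", verbose), ("silent", silent), ("silent_wrong", silent_wrong)]:
    print(name, round(communication_aware_reward(ep), 2))
```

The loop prints 0.86, 0.96 and 0.0. Because a wrong answer earns no task reward however few tokens it uses, silence is only favored when it does not cost accuracy; where that balance falls depends on the (hypothetical) penalty weight `lam`.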
Authors (4): Yijia Fan, Jusheng Zhang, Jing Yang, Keze Wang
Submitted: October 26, 2025
arXiv Category: cs.MA

Key Contributions

Agent-GSPO improves communication efficiency in multi-agent systems by training agents directly for token economy: it applies sequence-level RL through the GSPO algorithm, using a communication-aware reward that explicitly penalizes verbosity. Across seven reasoning benchmarks it reports state-of-the-art performance while consuming a fraction of the tokens used by existing methods, and training gives rise to emergent strategies such as 'strategic silence'.
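GSPO, as defined in the original Group Sequence Policy Optimization work, replaces token-level importance ratios with a single length-normalized ratio per sampled response, clips that ratio at the sequence level, and uses group-normalized rewards as advantages. The sketch below computes this clipped objective for one group using plain floats; it illustrates the formula and is not Agent-GSPO's training code. A real trainer would evaluate it on autodiff tensors and take gradients. The function names and the `clip_eps` value are illustrative assumptions.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    # Group-relative advantage: normalize each response's reward by the
    # mean and standard deviation of its group of G sampled responses.
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_ratio(logp_new: Sequence[float], logp_old: Sequence[float]) -> float:
    # Length-normalized sequence-level importance ratio:
    #   s_i = exp((1/|y_i|) * sum_t [log pi_theta(y_t) - log pi_old(y_t)])
    # Every response needs at least one token (e.g. an end-of-sequence token,
    # even for a "silent" turn), otherwise |y_i| = 0.
    if len(logp_new) == 0 or len(logp_new) != len(logp_old):
        raise ValueError("need equal, non-empty per-token log-prob lists")
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


def gspo_objective(logp_new_group: Sequence[Sequence[float]],
                   logp_old_group: Sequence[Sequence[float]],
                   rewards: Sequence[float],
                   clip_eps: float = 0.2) -> float:
    # Clipped surrogate averaged over the group (to be maximized):
    #   J = (1/G) * sum_i min(s_i * A_i, clip(s_i, 1 - eps, 1 + eps) * A_i)
    # clip_eps here is illustrative only, not a value from either paper.
    advs = group_advantages(rewards)
    terms = []
    for lp_new, lp_old, adv in zip(logp_new_group, logp_old_group, advs):
        s = sequence_ratio(lp_new, lp_old)
        s_clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(s * adv, s_clipped * adv))
    return sum(terms) / len(terms)


# Usage: a group of G = 2 responses to the same prompt.
old = [[-1.0], [-1.0]]
new = [[0.0], [-1.0]]   # the policy now favors response 0 much more (s_0 = e)
print(round(gspo_objective(new, old, rewards=[1.0, 0.0], clip_eps=0.2), 4))
```

The usage line prints 0.1: response 0 has a positive advantage, so its ratio of e is clipped to 1.2, which caps how far a single update can push the policy toward it. Computing one ratio per response keeps the unit of importance weighting and clipping the same as the unit of reward (a whole sequence), which is roughly the rationale the GSPO authors give for its stability relative to token-level ratios.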

Business Value

By substantially reducing communication (token) costs, the approach could make multi-agent systems more economically viable and scalable, which matters for applications such as swarm robotics, distributed AI, and complex simulations.