Agent-GSPO is a framework that optimizes communication efficiency in multi-agent systems by training directly for token economy with sequence-level reinforcement learning, using the Group Sequence Policy Optimization (GSPO) algorithm. It achieves state-of-the-art performance on reasoning benchmarks with significantly lower token consumption, and the trained agents adopt strategies such as 'strategic silence'.
By sharply reducing communication costs, the framework enables more economically viable and scalable multi-agent systems, which matters for applications such as swarm robotics, distributed AI, and complex simulations.
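To make the idea of sequence-level training for token economy concrete, here is a minimal sketch of a GSPO-style objective paired with a token-penalized reward. It is an illustration under stated assumptions, not the paper's implementation: the linear token penalty in `communication_aware_reward`, the `penalty_per_token` value, and the clipping range `eps` are hypothetical choices, and all function names are invented for this example. The sketch also assumes every sampled message contains at least one token.

```python
import math
from statistics import mean, pstdev


def communication_aware_reward(task_score, tokens_used, penalty_per_token=0.001):
    """Hypothetical reward: task score minus a linear cost per emitted token."""
    return task_score - penalty_per_token * tokens_used


def group_advantages(rewards):
    """Normalize rewards within one group of responses sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All responses scored the same, so none is preferred over another.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def sequence_ratio(new_logps, old_logps):
    """Length-normalized sequence-level importance ratio.

    Equals exp(mean over tokens of (new log-prob - old log-prob)),
    i.e. the geometric mean of the per-token probability ratios.
    """
    if not new_logps or len(new_logps) != len(old_logps):
        raise ValueError("expected two non-empty, equal-length log-prob lists")
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))


def clipped_objective(ratios, advantages, eps=0.2):
    """Clipped surrogate objective applied per sequence, averaged over the group.

    eps is an illustrative value, not one taken from the paper.
    """
    terms = []
    for s, a in zip(ratios, advantages):
        clipped = min(max(s, 1.0 - eps), 1.0 + eps)
        terms.append(min(s * a, clipped * a))
    return mean(terms)


# Usage: two sampled responses solve the task equally well, one with fewer tokens.
rewards = [
    communication_aware_reward(task_score=1.0, tokens_used=40),
    communication_aware_reward(task_score=1.0, tokens_used=400),
]
advantages = group_advantages(rewards)  # the shorter response gets the positive advantage
ratios = [
    sequence_ratio([-0.9, -1.1], [-1.0, -1.2]),
    sequence_ratio([-1.0, -1.0], [-1.0, -1.0]),
]
objective = clipped_objective(ratios, advantages)
```

Because the ratio and the clipping act on whole sequences rather than individual tokens, a reward that charges for tokens pushes the policy toward shorter messages as a unit, which is one plausible route to behavior like staying silent when a message would add little.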