Oryx: a Scalable Sequence Model for Many-Agent Coordination in Offline MARL

📄 Abstract

A key challenge in offline multi-agent reinforcement learning (MARL) is achieving effective many-agent multi-step coordination in complex environments. In this work, we propose Oryx, a novel algorithm for offline cooperative MARL to directly address this challenge. Oryx adapts the recently proposed retention-based architecture Sable and combines it with a sequential form of implicit constraint Q-learning (ICQ), to develop a novel offline autoregressive policy update scheme. This allows Oryx to solve complex coordination challenges while maintaining temporal coherence over long trajectories. We evaluate Oryx across a diverse set of benchmarks from prior works -- SMAC, RWARE, and Multi-Agent MuJoCo -- covering tasks of both discrete and continuous control, varying in scale and difficulty. Oryx achieves state-of-the-art performance on more than 80% of the 65 tested datasets, outperforming prior offline MARL methods and demonstrating robust generalisation across domains with many agents and long horizons. Finally, we introduce new datasets to push the limits of many-agent coordination in offline MARL, and demonstrate Oryx's superior ability to scale effectively in such settings.

Authors (13)

Claude Formanek

Omayma Mahjoub

Louay Ben Nessir

Sasha Abramowitz

Ruan de Kock

Wiem Khlifi

+7 more

Submitted

May 28, 2025

arXiv Category

cs.LG

Key Contributions

Proposes Oryx, a novel algorithm for offline cooperative MARL that combines a retention-based architecture with sequential implicit constraint Q-learning. This enables effective many-agent coordination and maintains temporal coherence over long trajectories.
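
As a rough illustration of what an autoregressive, softmax-weighted policy update can look like, the sketch below scores a joint action as a sum of per-agent log-probabilities and weights each logged sample by a softmax over a batch of scores, in the spirit of ICQ. It is not the Oryx implementation: the function names, the single weight per joint sample, the temperature parameter, and the assumption that agent i's logits were already computed conditioned on the actions of agents before it are all illustrative choices, not details taken from the paper.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one agent's action logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def joint_log_prob(per_agent_logits, actions):
    # Autoregressive factorisation: log pi(a) = sum_i log pi(a_i | ..., a_<i).
    # ASSUMPTION: per_agent_logits[i] was produced by a model that already
    # conditioned on the actions of agents 0..i-1.
    return sum(log_softmax(logits)[a]
               for logits, a in zip(per_agent_logits, actions))

def softmax_weights(scores, temperature=1.0):
    # Batch-normalised weights exp(score / temperature) / Z.
    # ASSUMPTION: one score (e.g. a Q-value or advantage) per joint sample.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_policy_loss(batch_logits, batch_actions, scores, temperature=1.0):
    # Weighted negative log-likelihood of the logged joint actions, so the
    # policy only ever imitates actions that appear in the dataset.
    weights = softmax_weights(scores, temperature)
    return -sum(w * joint_log_prob(logits, acts)
                for w, logits, acts in zip(weights, batch_logits, batch_actions))

# Two logged samples, three agents, four discrete actions each (toy values).
batch_logits = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
batch_actions = [[0, 1, 2], [3, 0, 1]]
loss = weighted_policy_loss(batch_logits, batch_actions, scores=[1.0, 0.5])
```

Because the loss only evaluates log-probabilities of actions present in the data, a scheme of this shape never queries the value of out-of-dataset actions, which is the usual motivation for implicit-constraint methods in offline RL.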

Business Value

Enables more sophisticated and coordinated behavior in multi-agent systems, crucial for applications like swarm robotics, autonomous logistics, and complex simulations.

Paper Metadata

Innovation Type

Algorithmic

Deployment Feasibility

Moderate. Offline MARL depends on high-quality offline data, which limits where it can be applied, but the algorithm is designed to coordinate many agents effectively once such data is available.

Limitations Addressed

Difficulty in achieving effective many-agent coordination in offline MARL and maintaining temporal coherence over long trajectories.

Performance Gains

Achieves state-of-the-art performance on over 80% of 65 tested datasets, outperforming prior offline MARL methods.

Technical Tags

offline MARL, many-agent coordination, sequential constraint Q-learning, autoregressive policy update, temporal coherence, discrete control, continuous control, SMAC, RWARE, Multi-Agent MuJoCo, Oryx

Research Topics

Multi-Agent Reinforcement Learning, Offline Reinforcement Learning, Coordination, Sequential Decision Making, Deep Learning Architectures

Methods & Architectures

Retention-based architecture (Sable), Implicit Constraint Q-learning (ICQ), Offline autoregressive policy update, Oryx, Sable architecture

Applications & Tasks

Robotics; Autonomous Systems; Game AI; Simulation; Achieving effective many-agent coordination in offline MARL; Maintaining temporal coherence over long trajectories; Solving complex coordination challenges; Cooperative multi-agent control; Robotic team coordination; Strategy games with multiple agents

Datasets & Benchmarks

Datasets

SMAC, RWARE, Multi-Agent MuJoCo

Related Fields

Reinforcement Learning, Multi-Agent Systems, Robotics, Artificial Intelligence, Game Theory

Keywords

offline MARL, multi-agent coordination, reinforcement learning, sequential decision making, temporal coherence, autoregressive, Q-learning, SMAC, RWARE, Oryx, many-agent

Academic Context

#Multi-Agent Reinforcement Learning, #Offline Reinforcement Learning, #Coordination, #Sequential Decision Making, #Deep Learning Architectures

Commercial Potential

Potential Products

Coordinated drone swarms for surveillance or delivery; Robotic systems for complex assembly tasks; AI opponents in strategy games

Target Industries

Robotics, Logistics, Gaming, Defense, Simulation

Use Case Examples

Training a team of robots to collaboratively move a large object; Developing AI agents that can coordinate complex strategies in a real-time strategy game; Optimizing traffic flow in a simulated city with many autonomous vehicles

Competitive Edge

Oryx combines the retention-based Sable architecture with a sequential form of implicit constraint Q-learning (ICQ), aiming for state-of-the-art coordination performance across diverse offline MARL benchmarks.

Resource Requirements

Compute Needs

Likely significant, especially for training on complex MARL benchmarks.

Data Requirements

Requires high-quality offline datasets from multi-agent interactions.

Deployment Constraints

Availability of sufficient and diverse offline data, complexity of multi-agent coordination.

Scalability

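Evaluated across benchmarks of varying scale and difficulty, and on new datasets introduced specifically to test many-agent coordination, where the authors report that Oryx scales effectively.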
