Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

📄 Abstract

We present Ring-1T, the first open-source, state-of-the-art thinking model at trillion-parameter scale. It has 1 trillion total parameters and activates approximately 50 billion per token. Training at this scale introduces unprecedented challenges, including train-inference misalignment, inefficiencies in rollout processing, and bottlenecks in the RL system. To address these, we pioneer three interconnected innovations: (1) IcePop stabilizes RL training via token-level discrepancy masking and clipping, resolving instability caused by training-inference mismatch; (2) C3PO++ improves resource utilization for long rollouts under a token budget by dynamically partitioning them, substantially improving time efficiency; and (3) ASystem, a high-performance RL framework designed to overcome the systemic bottlenecks that impede trillion-parameter model training. Ring-1T delivers breakthrough results across critical benchmarks: 93.4 on AIME-2025, 86.72 on HMMT-2025, 2088 on CodeForces, and 55.94 on ARC-AGI-v1. Notably, it attains a silver-medal-level result on IMO-2025, underscoring its exceptional reasoning capabilities. By releasing the complete 1T-parameter MoE model, we give the research community direct access to cutting-edge reasoning capabilities. This contribution marks a significant milestone in democratizing large-scale reasoning intelligence and establishes a new baseline for open-source model performance.
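
The core of IcePop is easy to illustrate. Below is a minimal, hypothetical PyTorch sketch of token-level discrepancy masking and clipping as the abstract describes it: per-token probability ratios between the training and inference engines are masked to zero when they drift outside a trust band and clipped otherwise. The function names, thresholds, and loss form are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed PyTorch; hypothetical names and thresholds) of
# token-level discrepancy masking and clipping in the spirit of IcePop.
import torch

def icepop_weights(train_logprobs: torch.Tensor,
                   infer_logprobs: torch.Tensor,
                   low: float = 0.5,
                   high: float = 2.0) -> torch.Tensor:
    """Per-token weights: 0 where the train/infer probability ratio leaves
    [low, high] (masking), the clipped ratio everywhere else (clipping)."""
    ratio = torch.exp(train_logprobs - infer_logprobs)   # p_train / p_infer per token
    inside = ((ratio >= low) & (ratio <= high)).float()  # 1 inside the trust band
    return torch.clamp(ratio, low, high) * inside

def masked_pg_loss(train_logprobs, infer_logprobs, advantages):
    """Policy-gradient loss in which badly mismatched tokens contribute nothing."""
    weights = icepop_weights(train_logprobs, infer_logprobs).detach()
    return -(weights * advantages * train_logprobs).mean()
```

Detaching the weights keeps the mismatch correction out of the gradient path, so only the training engine's log-probabilities carry gradient; mismatched tokens are simply silenced rather than allowed to destabilize the update.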
Authors (104)
Ling Team: Anqi Shen, Baihui Li, Bin Hu, Bin Jing, Cai Chen, and 98 more
Submitted: October 21, 2025
arXiv Category: cs.CL

Key Contributions

This paper introduces Ring-1T, the first open-source trillion-scale thinking model, and three key innovations: IcePop for stabilizing RL training via token-level discrepancy masking and clipping; C3PO++ for improving resource utilization in long rollouts through dynamic partitioning; and ASystem, a high-performance RL framework to overcome systemic bottlenecks in training trillion-parameter models.
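
To make the C3PO++ idea concrete, here is a hedged Python sketch of dynamic rollout partitioning under a per-iteration token budget: each step spends at most a fixed number of generated tokens, returns finished rollouts for training, and pauses unfinished ones so the next iteration resumes them instead of letting long generations stall the batch. All names (`Rollout`, `rollout_step`, `generate_one_token`) are hypothetical, not the paper's API.

```python
# Hedged sketch (all names hypothetical) of dynamic rollout partitioning
# under a per-iteration token budget, in the spirit of C3PO++.
from dataclasses import dataclass, field

@dataclass
class Rollout:
    prompt_id: int
    tokens: list = field(default_factory=list)
    done: bool = False

def rollout_step(active, token_budget, generate_one_token):
    """Spend at most token_budget new tokens across the active rollouts.
    Returns (finished, paused): finished rollouts go to training now;
    paused ones resume from their saved state in the next iteration."""
    finished, paused, spent = [], [], 0
    for r in active:
        while not r.done and spent < token_budget:
            r.done = generate_one_token(r)  # appends one token; True on EOS
            spent += 1
        (finished if r.done else paused).append(r)
    return finished, paused

# Toy usage: a stand-in decoder that "finishes" a rollout after 5 tokens.
def toy_decode(r):
    r.tokens.append(0)
    return len(r.tokens) >= 5

active = [Rollout(i) for i in range(4)]
while active:
    done, active = rollout_step(active, token_budget=7, generate_one_token=toy_decode)
    # `done` would feed the trainer here; `active` carries over.
```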

Business Value

Open release of a trillion-parameter reasoning model gives practitioners direct access to state-of-the-art mathematical, coding, and general reasoning capabilities without the cost of training such a model themselves, lowering the barrier to building applications that require complex multi-step problem-solving.
