
Scaling Latent Reasoning via Looped Language Models


📄 Abstract

Modern LLMs are trained to "think" primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens. The Ouro 1.4B and 2.6B models match the results of SOTA LLMs of up to 12B parameters across a wide range of benchmarks. Through controlled experiments, we show this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities. We also show that LoopLM yields reasoning traces more aligned with final outputs than explicit CoT. We hope our results show the potential of LoopLM as a novel scaling direction in the reasoning era. Our models are available at http://ouro-llm.github.io.
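To make "iterative computation in latent space" concrete, here is a minimal sketch of the looping idea: one block with a single set of weights is applied repeatedly to a hidden state, so depth comes from iteration, not from extra parameters. The names `shared_block` and `looped_forward`, the tanh residual update, and the toy weights are all hypothetical choices for illustration; the abstract does not specify Ouro's actual layer design.

```python
import math

def shared_block(h, w):
    """Hypothetical block: residual update h_i + tanh((W h)_i).

    The same weight matrix `w` is reused on every loop iteration.
    """
    return [
        hi + math.tanh(sum(wij * hj for wij, hj in zip(row, h)))
        for hi, row in zip(h, w)
    ]

def looped_forward(h, w, num_loops):
    """Apply the shared block `num_loops` times; return every intermediate state."""
    states = []
    for _ in range(num_loops):
        h = shared_block(h, w)  # no new parameters are introduced per iteration
        states.append(h)
    return states

# Toy usage: a 2-dimensional hidden state refined over 4 loop iterations.
weights = [[0.5, 0.0], [0.0, 0.5]]
states = looped_forward([0.1, -0.2], weights, num_loops=4)
```

Each entry of `states` is a latent vector, not generated text, which is the contrast with explicit CoT that the abstract draws.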

Authors (33)

Rui-Jie Zhu

Zixuan Wang

Kai Hua

Tianyu Zhang

Ziniu Li

Haoran Que

+27 more

Submitted

October 29, 2025

arXiv Category

cs.CL


Key Contributions

This paper introduces Ouro, a family of pre-trained Looped Language Models (LoopLM) that integrate reasoning into the pre-training phase via iterative latent-space computation and an entropy-regularized objective for learned depth allocation. The Ouro 1.4B and 2.6B models match SOTA LLMs of up to 12B parameters across a wide range of benchmarks. Controlled experiments attribute this gain to superior knowledge manipulation rather than increased knowledge capacity, and the resulting reasoning traces are more aligned with final outputs than those of explicit CoT.
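One plausible reading of an "entropy-regularized objective for depth allocation" is sketched below: the model produces a loss at every loop step plus a halting probability, the halting probabilities define a distribution over exit steps, and the training loss is the expected per-step loss minus an entropy bonus that keeps the distribution from collapsing onto a single depth. This form, the function names, and the coefficient `beta` are assumptions for illustration; the summary does not give the paper's exact formulation.

```python
import math

def exit_distribution(gate_probs):
    """Convert per-step halting probabilities into a distribution over exit steps.

    p(t) = gate_t * prod_{s<t} (1 - gate_s); the final step absorbs the leftover mass.
    """
    remaining = 1.0
    dist = []
    for g in gate_probs[:-1]:
        dist.append(remaining * g)
        remaining *= 1.0 - g
    dist.append(remaining)
    return dist

def entropy(dist):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def entropy_regularized_loss(step_losses, gate_probs, beta):
    """Expected loss under the exit distribution, minus beta times its entropy."""
    dist = exit_distribution(gate_probs)
    expected = sum(p * loss for p, loss in zip(dist, step_losses))
    return expected - beta * entropy(dist)

# Toy usage: three loop steps whose losses shrink as depth increases.
step_losses = [2.0, 1.0, 0.5]
gates = [0.5, 0.5, 0.5]
loss_plain = entropy_regularized_loss(step_losses, gates, beta=0.0)
loss_reg = entropy_regularized_loss(step_losses, gates, beta=0.1)
```

With `beta=0.0` the objective reduces to the plain expected loss; a positive `beta` rewards spreading probability across depths, which is the role an entropy term would play in this sketch.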

Business Value

Offers a more efficient and potentially more powerful way to build LLMs with strong reasoning abilities, which can lead to more capable AI assistants, better problem-solving tools, and more reliable information processing.

Paper Metadata

Innovation Type

Model Architecture/Training Method

Deployment Feasibility

Moderate, as it requires specialized training infrastructure, but smaller models show competitive performance.

Limitations Addressed

Reasoning being a post-training task (e.g., CoT); Inefficient use of pre-training data for reasoning; Limitations of explicit reasoning traces

Performance Gains

Ouro 1.4B/2.6B match up to 12B SOTA LLMs


Technical Tags

Looped Language Models (LoopLM); Latent Reasoning; Pre-training; Entropy Regularization; Depth Allocation; Chain-of-Thought (CoT); Knowledge Manipulation; Ouroboros; Ouro

Research Topics

Integrating reasoning into LLM pre-training; Alternative scaling directions for LLMs; Improving knowledge manipulation capabilities; Comparing latent vs. explicit reasoning

Methods & Architectures

Model architecture design (LoopLM); Pre-training methodology; Scaling experiments; Controlled experiments; Looped Language Models (LoopLM); Ouro (1.4B, 2.6B)

Applications & Tasks

Natural Language Processing; Artificial Intelligence Research; Reasoning deferred to post-training (e.g., CoT); Under-leveraging pre-training data for reasoning; Limitations of explicit text generation for reasoning; Improving LLM reasoning capabilities; Scaling LLMs with integrated reasoning; Enhancing knowledge manipulation

Datasets & Benchmarks

Benchmarks

Wide range of benchmarks

Related Fields

Large Language Models; Artificial Intelligence; Machine Learning; Natural Language Processing; Deep Learning

Keywords

LLM; Reasoning; LoopLM; Pre-training; Latent Space; Knowledge Manipulation; Chain-of-Thought; Scaling; Ouro; AI Architecture; Deep Learning

Academic Context

#Integrating reasoning into LLM pre-training; #Alternative scaling directions for LLMs; #Improving knowledge manipulation capabilities; #Comparing latent vs. explicit reasoning

Companies & Organizations

Companies Mentioned

OpenAI

Commercial Potential

Potential Products

More capable AI assistants; Advanced reasoning engines; Foundation models with inherent reasoning

Target Industries

Technology; AI Research; Software Development

Use Case Examples

Solving complex logic puzzles; Generating more coherent and reasoned text; Improving performance on complex reasoning benchmarks

Competitive Edge

Proposes a novel pre-training approach (LoopLM) that integrates reasoning directly, offering an alternative scaling direction to current methods that rely heavily on post-training techniques like CoT.

Market Opportunity

Rapidly growing market for advanced LLMs.

Revenue Models

N/A

Resource Requirements

Compute Needs

Very High (for pre-training 7.7T tokens)

Data Requirements

Massive text corpus (7.7T tokens)

Deployment Constraints

Computational resources for inference, especially for larger models.

Scalability

The LoopLM architecture is designed for scaling, with experiments up to 7.7T tokens.

Regulatory Considerations

N/A

Production Readiness

Maturity Level

Research

Time to Market

N/A

Licensing

Open source (code available)

Patent Potential

Moderate (for novel architecture/training method)
