📄 Abstract
Reinforcement learning (RL) can elicit strong reasoning in large language
models (LLMs), yet most open efforts focus on math and code. We propose
Reasoning Curriculum, a simple two-stage curriculum that first elicits
reasoning skills in pretraining-aligned domains such as math, then adapts and
refines these skills across other domains via joint RL. Stage 1 performs a
brief cold start and then math-only RL with verifiable rewards to develop
reasoning skills. Stage 2 runs joint RL on mixed-domain data to transfer and
consolidate these skills. The curriculum is minimal and backbone-agnostic,
requiring no specialized reward models beyond standard verifiability checks.
Evaluated on Qwen3-4B and Llama-3.1-8B over a multi-domain suite, Reasoning
Curriculum yields consistent gains. Ablations and a cognitive-skill analysis
indicate that both stages are necessary and that math-first elicitation
increases cognitive behaviors important for solving complex problems. Reasoning
Curriculum provides a compact, easy-to-adopt recipe for general reasoning.
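The two-stage recipe can be sketched as a simple training skeleton. This is an illustrative sketch only, assuming a generic per-batch policy update and a binary exact-match verifier; the function names (`verifiable_reward`, `run_stage`, `reasoning_curriculum`) and the update placeholder are hypothetical, not the paper's implementation.

```python
# Hedged sketch of the two-stage Reasoning Curriculum.
# Stage 1: math-only RL with a verifiable reward; Stage 2: joint RL on
# mixed-domain data. The RL update itself is left as a placeholder.

def verifiable_reward(answer: str, reference: str) -> float:
    """Binary verifiable reward: 1.0 on an exact match, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def run_stage(batches, policy):
    """One RL stage: roll out the policy, score with the verifiable
    reward, and (in a real system) apply a policy-gradient step."""
    rewards = []
    for prompt, reference in batches:
        answer = policy(prompt)                      # sample a rollout
        rewards.append(verifiable_reward(answer, reference))
        # placeholder: policy update (e.g. PPO/GRPO step) would go here
    return rewards

def reasoning_curriculum(math_data, mixed_data, policy):
    # Stage 1: brief cold start, then math-only RL to elicit reasoning.
    stage1 = run_stage(math_data, policy)
    # Stage 2: joint RL on mixed-domain data to transfer and consolidate.
    stage2 = run_stage(mixed_data, policy)
    return stage1, stage2
```

The point of the sketch is the ordering: the same reward machinery is reused in both stages, and only the data mixture changes between them.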
Authors (5)
Bo Pang
Deqian Kong
Silvio Savarese
Caiming Xiong
Yingbo Zhou
Submitted
October 30, 2025
Key Contributions
This paper proposes Reasoning Curriculum, a two-stage RL curriculum that first elicits reasoning skills in math using verifiable rewards, then transfers and refines these skills across other domains via joint RL. This minimal, backbone-agnostic approach yields consistent gains and increases cognitive behaviors crucial for complex problem solving.
Business Value
Enables the development of more capable and versatile LLMs that can tackle a wider range of complex reasoning tasks, improving AI applications in various fields.