
SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning


Abstract

How to design reinforcement learning (RL) tasks that effectively unleash the reasoning capability of large language models (LLMs) remains an open question. Existing RL tasks (e.g., math, programming, and constructing reasoning tasks) suffer from three key limitations: (1) Scalability. They rely heavily on human annotation or expensive LLM synthesis to generate sufficient training data. (2) Verifiability. LLMs' outputs are hard to verify automatically and reliably. (3) Controllable Difficulty. Most tasks lack fine-grained difficulty control, making it hard to train LLMs to develop reasoning ability from easy to hard. To address these limitations, we propose Saturn, a SAT-based RL framework that uses Boolean Satisfiability (SAT) problems to train and evaluate LLMs' reasoning. Saturn enables scalable task construction, rule-based verification, and precise difficulty control. Saturn designs a curriculum learning pipeline that continuously improves LLMs' reasoning capability by constructing SAT tasks of increasing difficulty and training LLMs from easy to hard. To ensure stable training, we design a principled mechanism to control difficulty transitions. We introduce Saturn-2.6k, a dataset of 2,660 SAT problems with varying difficulty. It supports the evaluation of how LLM reasoning changes with problem difficulty. We apply Saturn to DeepSeek-R1-Distill-Qwen and obtain Saturn-1.5B and Saturn-7B. We achieve several notable results: (1) On SAT problems, Saturn-1.5B and Saturn-7B achieve average pass@3 improvements of +14.0 and +28.1, respectively. (2) On math and programming tasks, Saturn-1.5B and Saturn-7B improve average scores by +4.9 and +1.8 on benchmarks (e.g., AIME, LiveCodeBench). (3) Compared to the state-of-the-art (SOTA) approach in constructing RL tasks, Saturn achieves further improvements of +8.8%. We release the source code, data, and models to support future research.
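To make "rule-based verification" concrete, here is a minimal sketch of how a proposed SAT assignment can be checked without a learned judge or human grader. It is an illustration, not the paper's released code: the DIMACS-style integer encoding of literals, the function names `satisfies` and `reward`, and the binary 0/1 reward are all assumptions introduced for this example.

```python
from typing import Dict, List

# Assumed encoding (DIMACS-style): literal 3 means x3, literal -3 means NOT x3.
Clause = List[int]


def satisfies(clauses: List[Clause], assignment: Dict[int, bool]) -> bool:
    """Return True if every clause contains at least one true literal.

    A variable missing from the assignment makes its literals count as
    not satisfied, so incomplete answers are rejected unless the assigned
    variables already satisfy every clause.
    """
    for clause in clauses:
        clause_ok = False
        for lit in clause:
            var = abs(lit)
            if var in assignment and assignment[var] == (lit > 0):
                clause_ok = True
                break
        if not clause_ok:
            return False
    return True


def reward(clauses: List[Clause], assignment: Dict[int, bool]) -> float:
    """Hypothetical binary RL reward: 1.0 for a satisfying assignment, else 0.0."""
    return 1.0 if satisfies(clauses, assignment) else 0.0


if __name__ == "__main__":
    # (x1 OR NOT x2) AND (x2 OR x3)
    formula = [[1, -2], [2, 3]]
    print(reward(formula, {1: True, 2: False, 3: True}))   # 1.0
    print(reward(formula, {1: False, 2: True, 3: False}))  # 0.0
```

The check runs in time linear in the size of the formula, which is what makes this kind of task cheap to verify at training scale even though finding a satisfying assignment is hard in general.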

Authors (6)

Huanyu Liu

Jia Li

Hao Zhu

Kechi Zhang

Yihong Dong

Ge Li

Submitted

May 22, 2025

arXiv Category

cs.LG

Key Contributions

Proposes SATURN, a SAT-based RL framework to address limitations in training LLMs for reasoning. SATURN enables scalable task construction, rule-based verification, and precise difficulty control, which are crucial for developing robust reasoning abilities in LLMs.

Business Value

Improves the efficiency and reliability of training LLMs for complex reasoning tasks, potentially leading to more capable AI assistants and tools for education and research.

Paper Metadata

Innovation Type

Novel Framework

Deployment Feasibility

High, as it focuses on improving the training methodology for existing LLMs rather than requiring new hardware.

Limitations Addressed

Scalability of training data generation, Verifiability of LLM outputs, Controllable difficulty in training tasks

Technical Tags

reinforcement learning, boolean satisfiability, language models, curriculum learning, reasoning, scalability, verifiability, difficulty control, LLM synthesis, human annotation

Research Topics

Language Model Reasoning, Reinforcement Learning for LLMs, Automated Task Generation, AI Education and Training, Scalable AI Evaluation

Methods & Architectures

SAT-based RL framework, Curriculum learning pipeline, Boolean Satisfiability (SAT) problems, Large Language Models (LLMs)

Applications & Tasks

AI Education, LLM Development, Unleashing LLM Reasoning, Scalable Data Generation, Verifiable LLM Outputs, Controllable Difficulty in Training, Reasoning capability enhancement, LLM training, LLM evaluation

Related Fields

Artificial Intelligence, Machine Learning, Natural Language Processing, Formal Methods, Computer Science Education

Keywords

SAT, Reinforcement Learning, Large Language Models, Reasoning, Curriculum Learning, Scalability, Verifiability, Difficulty Control, Boolean Satisfiability, LLM Training, AI Evaluation, Automated Reasoning

Academic Context

#Language Model Reasoning, #Reinforcement Learning for LLMs, #Automated Task Generation, #AI Education and Training, #Scalable AI Evaluation

Technology Stack

Frameworks & Libraries

SAT solver

Commercial Potential

Potential Products

AI tutors, Advanced reasoning engines, Automated problem solvers

Target Industries

Education, Technology, Research

Use Case Examples

Training LLMs to solve complex math problems, Developing LLMs that can perform logical deductions, Creating verifiable AI reasoning systems

Competitive Edge

Offers a more scalable, verifiable, and controllable approach to LLM reasoning training compared to existing methods that rely on human annotation or expensive LLM synthesis.

Market Opportunity

Growing market for LLM development tools and platforms.

Revenue Models

Licensing of training methodologies, specialized LLM development services.

Resource Requirements

Compute Needs

Moderate to High (for LLM training)

Data Requirements

Synthetically generated SAT problems

Scalability

Designed for scalability through automated task construction.
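As an illustration of what automated task construction can look like, the sketch below generates random SAT instances that are guaranteed satisfiable by planting a hidden solution and rejecting any clause that solution violates. This is a standard textbook construction, not necessarily the procedure used in the paper; the function name `generate_planted_sat` and the parameters `num_vars`, `num_clauses`, and `clause_len` are hypothetical, and using them as difficulty knobs is an assumption made for this example.

```python
import random
from typing import Dict, List, Tuple

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means NOT x3


def generate_planted_sat(
    num_vars: int, num_clauses: int, clause_len: int, seed: int
) -> Tuple[List[Clause], Dict[int, bool]]:
    """Generate a satisfiable CNF formula together with one satisfying assignment.

    Hypothetical helper: larger num_vars / num_clauses are assumed here to
    yield harder instances, which is one way a curriculum could be built.
    """
    if clause_len > num_vars:
        raise ValueError("clause_len must not exceed num_vars")
    rng = random.Random(seed)
    # Hidden ("planted") solution that every accepted clause must satisfy.
    solution = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
    clauses: List[Clause] = []
    while len(clauses) < num_clauses:
        # Pick distinct variables, then negate each with probability 1/2.
        variables = rng.sample(range(1, num_vars + 1), clause_len)
        clause = [v if rng.random() < 0.5 else -v for v in variables]
        # Keep the clause only if the planted solution satisfies it.
        if any(solution[abs(lit)] == (lit > 0) for lit in clause):
            clauses.append(clause)
    return clauses, solution


if __name__ == "__main__":
    easy, _ = generate_planted_sat(num_vars=4, num_clauses=6, clause_len=3, seed=0)
    hard, _ = generate_planted_sat(num_vars=8, num_clauses=30, clause_len=3, seed=0)
    print(len(easy), len(hard))  # 6 30
```

Because instances come from a seeded generator rather than from annotators or another LLM, the supply of training problems is limited only by compute, and each instance ships with a known witness that can be used for sanity checks.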

Production Readiness

Maturity Level

Research

Time to Market

1-3 years

Patent Potential

Low
