Abstract
Self-improving systems require environmental interaction for continuous
adaptation. We introduce SPICE (Self-Play In Corpus Environments), a
reinforcement learning framework where a single model acts in two roles: a
Challenger that mines documents from a large corpus to generate diverse
reasoning tasks, and a Reasoner that solves them. Through adversarial dynamics,
the Challenger creates an automatic curriculum at the frontier of the
Reasoner's capability, while corpus grounding provides the rich,
near-inexhaustible external signal necessary for sustained improvement. Unlike
existing ungrounded self-play methods, which offer only limited benefits, SPICE
achieves consistent gains across mathematical (+8.9%) and general reasoning
(+9.8%) benchmarks on multiple model families. Our analysis reveals how
document grounding is a key ingredient that allows SPICE to continuously
generate and achieve its own increasingly challenging goals, enabling sustained
self-improvement.
Authors (10)
Bo Liu
Chuanyang Jin
Seungone Kim
Weizhe Yuan
Wenting Zhao
Ilia Kulikov
+4 more
Submitted
October 28, 2025
Key Contributions
SPICE (Self-Play In Corpus Environments) is a novel reinforcement learning framework where an LLM acts as both a Challenger and a Reasoner. The Challenger mines a corpus to generate diverse reasoning tasks, creating an automatic curriculum at the frontier of the Reasoner's capabilities. Corpus grounding provides a rich external signal for sustained improvement, leading to significant gains in mathematical and general reasoning benchmarks.
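To make the loop concrete, below is a minimal sketch of how such corpus-grounded self-play could be wired up. All helper names (sample_document, generate_task, solve, grade, rl_update) and the frontier-seeking Challenger reward are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a SPICE-style self-play loop. Every helper here is
# a placeholder of ours, not the authors' API; the Challenger reward is only
# one plausible frontier-seeking choice.
import random

CORPUS = [
    "Passage about prime factorization ...",
    "Passage about supply and demand ...",
]

def sample_document(corpus):
    # Challenger, step 1: mine a passage from the external corpus.
    return random.choice(corpus)

def generate_task(model, document):
    # Challenger, step 2: turn the passage into a reasoning task with a
    # verifiable answer. Stubbed out here.
    return {"question": f"Question grounded in: {document[:30]}...", "answer": "42"}

def solve(model, question):
    # Reasoner: attempt the task. Stubbed out here.
    return random.choice(["42", "17"])

def grade(prediction, reference):
    # Verifiable reward: 1.0 if the answer matches the reference, else 0.0.
    return float(prediction == reference)

def rl_update(model, role, reward):
    # Placeholder for the policy-gradient update applied to the single
    # shared model, conditioned on which role produced the rollout.
    pass

shared_model = object()  # one model plays both roles

for step in range(3):
    doc = sample_document(CORPUS)
    task = generate_task(shared_model, doc)

    # The Reasoner attempts the task several times; the pass rate estimates
    # how close the task sits to its capability frontier.
    attempts = [solve(shared_model, task["question"]) for _ in range(8)]
    pass_rate = sum(grade(a, task["answer"]) for a in attempts) / len(attempts)

    # Assumed frontier-seeking reward: highest when the task is solved about
    # half the time, i.e. neither trivial nor impossible for the Reasoner.
    challenger_reward = 1.0 - abs(pass_rate - 0.5) * 2.0

    rl_update(shared_model, "challenger", challenger_reward)
    rl_update(shared_model, "reasoner", pass_rate)
```

The point the sketch illustrates is that the corpus supplies an external, near-inexhaustible source of grounded tasks, while both roles are updates of the same shared model, so improvements to the Reasoner push the Challenger to pose harder questions.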
Business Value
Enhancing LLM reasoning capabilities can lead to more sophisticated AI assistants, improved automated problem-solving tools, and advancements in scientific discovery, impacting various industries that rely on complex analytical tasks.