
SPICE: Self-Play In Corpus Environments Improves Reasoning

📄 Abstract

Self-improving systems require environmental interaction for continuous adaptation. We introduce SPICE (Self-Play In Corpus Environments), a reinforcement learning framework where a single model acts in two roles: a Challenger that mines documents from a large corpus to generate diverse reasoning tasks, and a Reasoner that solves them. Through adversarial dynamics, the Challenger creates an automatic curriculum at the frontier of the Reasoner's capability, while corpus grounding provides the rich, near-inexhaustible external signal necessary for sustained improvement. Unlike existing ungrounded self-play methods that offer more limited benefits, SPICE achieves consistent gains across mathematical (+8.9%) and general reasoning (+9.8%) benchmarks on multiple model families. Our analysis reveals how document grounding is a key ingredient in SPICE to continuously generate its own increasingly challenging goals and achieve them, enabling sustained self-improvement.
Authors (10)
Bo Liu
Chuanyang Jin
Seungone Kim
Weizhe Yuan
Wenting Zhao
Ilia Kulikov
+4 more
Submitted
October 28, 2025
arXiv Category
cs.CL

Key Contributions

SPICE (Self-Play In Corpus Environments) is a reinforcement learning framework in which a single LLM plays two roles: a Challenger and a Reasoner. The Challenger mines documents from a large corpus to generate diverse reasoning tasks, producing an automatic curriculum at the frontier of the Reasoner's capability. Corpus grounding supplies a rich external signal for sustained improvement, and the paper reports consistent gains on mathematical (+8.9%) and general reasoning (+9.8%) benchmarks across multiple model families.
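
The two-role loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the reward shape in `frontier_reward` (peaking when the Reasoner succeeds about half the time) is an assumption chosen to illustrate "tasks at the frontier of capability", and `make_task` and `solve` are hypothetical stand-ins for the Challenger and Reasoner calls to the same underlying model.

```python
import random


def frontier_reward(pass_rate: float) -> float:
    """Hypothetical Challenger reward (an assumption, not from the paper).

    It is 1.0 when the Reasoner solves the task half the time and falls
    linearly to 0.0 when the task is always or never solved.
    """
    return 1.0 - 2.0 * abs(pass_rate - 0.5)


def estimate_pass_rate(solve, question, answer, attempts=8):
    """Fraction of Reasoner attempts that match the reference answer."""
    correct = sum(solve(question) == answer for _ in range(attempts))
    return correct / attempts


def self_play_step(corpus, make_task, solve, rng, attempts=8):
    """One illustrative round: sample a document, pose a task, score both roles."""
    document = rng.choice(corpus)            # corpus grounding: the task comes from a document
    question, answer = make_task(document)   # Challenger role (hypothetical callable)
    pass_rate = estimate_pass_rate(solve, question, answer, attempts)  # Reasoner role
    return {
        "question": question,
        "pass_rate": pass_rate,
        "challenger_reward": frontier_reward(pass_rate),
        "reasoner_reward": pass_rate,        # mean correctness over the attempts
    }


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = ["Water boils at 100 degrees Celsius at sea level."]
    make_task = lambda doc: ("At what Celsius temperature does water boil at sea level?", "100")
    # A noisy stand-in Reasoner that is right only some of the time.
    solve = lambda q: "100" if rng.random() < 0.5 else "90"
    print(self_play_step(corpus, make_task, solve, rng))
```

In a real training setup the two rewards would feed a reinforcement learning update of the shared model; that step is omitted here because the summary does not describe it.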

Business Value

Stronger LLM reasoning could support more capable AI assistants, better automated problem-solving tools, and progress in scientific discovery, which matters for industries that depend on complex analytical tasks.