Abstract
Self-improving systems require environmental interaction for continuous
adaptation. We introduce SPICE (Self-Play In Corpus Environments), a
reinforcement learning framework where a single model acts in two roles: a
Challenger that mines documents from a large corpus to generate diverse
reasoning tasks, and a Reasoner that solves them. Through adversarial dynamics,
the Challenger creates an automatic curriculum at the frontier of the
Reasoner's capability, while corpus grounding provides the rich,
near-inexhaustible external signal necessary for sustained improvement. Unlike
existing ungrounded self-play methods, which offer only limited benefits, SPICE
achieves consistent gains across mathematical (+8.9%) and general reasoning
(+9.8%) benchmarks on multiple model families. Our analysis reveals how
document grounding is a key ingredient that allows SPICE to continuously
generate and achieve its own increasingly challenging goals, enabling sustained
self-improvement.
Authors (10)
Bo Liu
Chuanyang Jin
Seungone Kim
Weizhe Yuan
Wenting Zhao
Ilia Kulikov
+4 more
Submitted
October 28, 2025
Key Contributions
SPICE (Self-Play In Corpus Environments) is a novel reinforcement learning framework where an LLM acts as both a Challenger and a Reasoner. The Challenger mines a corpus to generate diverse reasoning tasks, creating an automatic curriculum at the frontier of the Reasoner's capabilities. Corpus grounding provides a rich external signal for sustained improvement, leading to significant gains in mathematical and general reasoning benchmarks.
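To make the loop concrete, below is a minimal sketch of how such corpus-grounded self-play could be wired up. All helper names (sample_document, generate_task, solve, grade, rl_update) and the frontier-seeking Challenger reward are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a SPICE-style self-play loop. Every helper here is
# a placeholder of ours, not the authors' API; the Challenger reward is only
# one plausible frontier-seeking choice.
import random

CORPUS = [
    "Passage about prime factorization ...",
    "Passage about supply and demand ...",
]

def sample_document(corpus):
    # Challenger, step 1: mine a passage from the external corpus.
    return random.choice(corpus)

def generate_task(model, document):
    # Challenger, step 2: turn the passage into a reasoning task with a
    # verifiable answer. Stubbed out here.
    return {"question": f"Question grounded in: {document[:30]}...", "answer": "42"}

def solve(model, question):
    # Reasoner: attempt the task. Stubbed out here.
    return random.choice(["42", "17"])

def grade(prediction, reference):
    # Verifiable reward: 1.0 if the answer matches the reference, else 0.0.
    return float(prediction == reference)

def rl_update(model, role, reward):
    # Placeholder for the policy-gradient update applied to the single
    # shared model, conditioned on which role produced the rollout.
    pass

shared_model = object()  # one model plays both roles

for step in range(3):
    doc = sample_document(CORPUS)
    task = generate_task(shared_model, doc)

    # The Reasoner attempts the task several times; the pass rate estimates
    # how close the task sits to its capability frontier.
    attempts = [solve(shared_model, task["question"]) for _ in range(8)]
    pass_rate = sum(grade(a, task["answer"]) for a in attempts) / len(attempts)

    # Assumed frontier-seeking reward: highest when the task is solved about
    # half the time, i.e. neither trivial nor impossible for the Reasoner.
    challenger_reward = 1.0 - abs(pass_rate - 0.5) * 2.0

    rl_update(shared_model, "challenger", challenger_reward)
    rl_update(shared_model, "reasoner", pass_rate)
```

The point the sketch illustrates is that the corpus supplies an external, near-inexhaustible source of grounded tasks, while both roles are updates of the same shared model, so improvements to the Reasoner push the Challenger to pose harder questions.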
Business Value
Enhancing LLM reasoning capabilities can lead to more sophisticated AI assistants, improved automated problem-solving tools, and advancements in scientific discovery, impacting various industries that rely on complex analytical tasks.