📄 Abstract
The reasoning capabilities of large reasoning models (LRMs), such as OpenAI's
o1 and DeepSeek-R1, have seen substantial advancements through deep thinking.
However, these enhancements come with significant resource demands,
underscoring the need for training effective small reasoning models. A critical
challenge is that small models possess different reasoning capacities and
cognitive trajectories compared with their larger counterparts. Hence, directly
distilling chain-of-thought (CoT) rationales from large LRMs to smaller ones
can sometimes be ineffective and often requires a substantial amount of
annotated data. In this paper, we first introduce a novel
Critique-Rethink-Verify (CRV) system, designed for training smaller yet
powerful LRMs. Our CRV system consists of multiple LLM agents, each
specializing in unique tasks: (i) critiquing the CoT rationales according to
the cognitive capabilities of smaller models, (ii) rethinking and refining
these CoTs based on the critiques, and (iii) verifying the correctness of the
refined results. Building on the CRV system, we further propose the Cognitive
Preference Optimization (CogPO) algorithm to continuously enhance the reasoning
abilities of smaller models by aligning their reasoning processes with their
cognitive capacities. Comprehensive evaluations on challenging reasoning
benchmarks demonstrate the efficacy of our CRV+CogPO framework, which
outperforms other methods by a large margin.
Authors (5)
Wenrui Cai
Chengyu Wang
Junbing Yan
Jun Huang
Xiangzhong Fang
Key Contributions
Introduces a novel Critique-Rethink-Verify (CRV) system to train smaller LLMs with enhanced reasoning abilities, overcoming limitations of direct CoT distillation. The CRV system uses specialized LLM agents for critiquing, rethinking, and verifying CoT rationales, tailored to the cognitive capacities of smaller models, reducing the need for extensive annotated data.
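The critique, rethink, and verify stages described above can be sketched as a small pipeline. This is a minimal illustration only, not the authors' implementation: the abstract does not specify agent interfaces, so the names `Sample`, `Critic`, `Rethinker`, `Verifier`, and `crv_refine` are hypothetical, each agent is modeled as a plain callable standing in for an LLM call, and discarding samples that fail verification is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical agent interfaces; each would wrap an LLM call in practice.
Critic = Callable[[str, str], str]          # (question, cot) -> critique
Rethinker = Callable[[str, str, str], str]  # (question, cot, critique) -> refined cot
Verifier = Callable[[str, str], bool]       # (question, refined cot) -> correct?


@dataclass
class Sample:
    question: str
    cot: str  # chain-of-thought rationale


def crv_refine(
    sample: Sample,
    critic: Critic,
    rethinker: Rethinker,
    verifier: Verifier,
) -> Optional[Sample]:
    """Run one critique -> rethink -> verify pass over a single CoT sample."""
    # (i) Critique the rationale with the smaller model's capacity in mind.
    critique = critic(sample.question, sample.cot)
    # (ii) Rethink: rewrite the rationale based on the critique.
    refined = rethinker(sample.question, sample.cot, critique)
    # (iii) Verify the refined rationale; dropping failures is an assumption.
    if verifier(sample.question, refined):
        return Sample(sample.question, refined)
    return None
```

With toy stand-ins for the three agents, a call looks like `crv_refine(Sample("What is 2 + 2?", "Add them."), critic, rethinker, verifier)`, which returns a refined `Sample` when the verifier accepts the rewrite and `None` otherwise.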
Business Value
Enables the creation of more accessible and cost-effective AI models with strong reasoning capabilities, democratizing advanced AI for a wider range of applications and devices.