Enhancing Reasoning Abilities of Small LLMs with Cognitive Alignment

Abstract

The reasoning capabilities of large reasoning models (LRMs), such as OpenAI's o1 and DeepSeek-R1, have seen substantial advancements through deep thinking. However, these enhancements come with significant resource demands, underscoring the need for training effective small reasoning models. A critical challenge is that small models possess different reasoning capacities and cognitive trajectories compared with their larger counterparts. Hence, directly distilling chain-of-thought (CoT) rationales from large LRMs to smaller ones can sometimes be ineffective and often requires a substantial amount of annotated data. In this paper, we first introduce a novel Critique-Rethink-Verify (CRV) system, designed for training smaller yet powerful LRMs. Our CRV system consists of multiple LLM agents, each specializing in unique tasks: (i) critiquing the CoT rationales according to the cognitive capabilities of smaller models, (ii) rethinking and refining these CoTs based on the critiques, and (iii) verifying the correctness of the refined results. Building on the CRV system, we further propose the Cognitive Preference Optimization (CogPO) algorithm to continuously enhance the reasoning abilities of smaller models by aligning their reasoning processes with their cognitive capacities. Comprehensive evaluations on challenging reasoning benchmarks demonstrate the efficacy of our CRV+CogPO framework, which outperforms other methods by a large margin.
Authors: Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, Xiangzhong Fang
Submitted: April 14, 2025
arXiv Category: cs.CL

Key Contributions

Introduces a novel Critique-Rethink-Verify (CRV) system to train smaller LLMs with enhanced reasoning abilities, overcoming limitations of direct CoT distillation. The CRV system uses specialized LLM agents for critiquing, rethinking, and verifying CoT rationales, tailored to the cognitive capacities of smaller models, reducing the need for extensive annotated data.
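
The three CRV stages described above can be pictured as a simple data-refinement loop. The following Python sketch is only an illustration of that flow, not the authors' implementation: the names `Sample`, `crv_refine`, and `build_training_set` are hypothetical, each agent is represented as a plain callable rather than a real LLM call, and the choice to discard samples that fail verification is an assumption that the summary does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical agent signatures; in practice each would wrap an LLM call.
Critic = Callable[[str, str], str]          # (question, cot) -> critique
Rethinker = Callable[[str, str, str], str]  # (question, cot, critique) -> refined cot
Verifier = Callable[[str, str], bool]       # (question, refined cot) -> accepted?


@dataclass
class Sample:
    question: str
    cot: str  # chain-of-thought rationale, originally from a large model


def crv_refine(sample: Sample, critic: Critic, rethinker: Rethinker,
               verifier: Verifier) -> Optional[Sample]:
    """Run one sample through critique, rethink, and verify."""
    # (i) Critique the rationale with the small model's capacity in mind.
    critique = critic(sample.question, sample.cot)
    # (ii) Rewrite the rationale based on that critique.
    refined = rethinker(sample.question, sample.cot, critique)
    # (iii) Keep the refined rationale only if it verifies as correct.
    # Assumption: rejected samples are dropped rather than retried.
    if verifier(sample.question, refined):
        return Sample(sample.question, refined)
    return None


def build_training_set(samples: List[Sample], critic: Critic,
                       rethinker: Rethinker, verifier: Verifier) -> List[Sample]:
    """Collect the refined rationales that pass verification."""
    kept = []
    for sample in samples:
        refined = crv_refine(sample, critic, rethinker, verifier)
        if refined is not None:
            kept.append(refined)
    return kept
```

Keeping the agents as injected callables means the same loop can be exercised with small stub functions before any model is attached, and the three roles can be served by different models.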

Business Value

Enables the creation of more accessible and cost-effective AI models with strong reasoning capabilities, democratizing advanced AI for a wider range of applications and devices.