arxiv_cl | Research Paper | AI Researchers, ML Engineers, Developers of smaller AI models

Enhancing Reasoning Abilities of Small LLMs with Cognitive Alignment


Abstract

The reasoning capabilities of large reasoning models (LRMs), such as OpenAI's o1 and DeepSeek-R1, have seen substantial advancements through deep thinking. However, these enhancements come with significant resource demands, underscoring the need for training effective small reasoning models. A critical challenge is that small models possess different reasoning capacities and cognitive trajectories compared with their larger counterparts. Hence, directly distilling chain-of-thought (CoT) rationales from large LRMs to smaller ones can sometimes be ineffective and often requires a substantial amount of annotated data. In this paper, we first introduce a novel Critique-Rethink-Verify (CRV) system, designed for training smaller yet powerful LRMs. Our CRV system consists of multiple LLM agents, each specializing in unique tasks: (i) critiquing the CoT rationales according to the cognitive capabilities of smaller models, (ii) rethinking and refining these CoTs based on the critiques, and (iii) verifying the correctness of the refined results. Building on the CRV system, we further propose the Cognitive Preference Optimization (CogPO) algorithm to continuously enhance the reasoning abilities of smaller models by aligning their reasoning processes with their cognitive capacities. Comprehensive evaluations on challenging reasoning benchmarks demonstrate the efficacy of our CRV+CogPO framework, which outperforms other methods by a large margin.

Authors (5)

Wenrui Cai

Chengyu Wang

Junbing Yan

Jun Huang

Xiangzhong Fang

Submitted

April 14, 2025

arXiv Category

cs.CL


Key Contributions

Introduces a novel Critique-Rethink-Verify (CRV) system to train smaller LLMs with enhanced reasoning abilities, overcoming limitations of direct CoT distillation. The CRV system uses specialized LLM agents for critiquing, rethinking, and verifying CoT rationales, tailored to the cognitive capacities of smaller models, reducing the need for extensive annotated data.
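The three-stage flow described above (critique, rethink, verify) can be sketched as a small pipeline. This is only an illustrative sketch, not the authors' implementation: the names crv_refine, CRVResult, and the toy_* agents are hypothetical, each agent would in practice wrap an LLM call, and dropping a rationale that fails verification is an assumption made here for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical agent signatures; real agents would wrap LLM calls.
Critic = Callable[[str, str], str]          # (question, cot) -> critique
Rethinker = Callable[[str, str, str], str]  # (question, cot, critique) -> refined cot
Verifier = Callable[[str, str], bool]       # (question, refined cot) -> accepted?


@dataclass
class CRVResult:
    question: str
    original_cot: str
    critique: str
    refined_cot: Optional[str]  # None when the verifier rejects the refinement


def crv_refine(question: str, cot: str, critic: Critic,
               rethinker: Rethinker, verifier: Verifier) -> CRVResult:
    """Run one critique -> rethink -> verify pass over a single rationale."""
    critique = critic(question, cot)
    refined = rethinker(question, cot, critique)
    accepted = verifier(question, refined)
    # Assumption: rejected refinements are simply discarded.
    return CRVResult(question, cot, critique, refined if accepted else None)


# Toy stand-ins so the sketch runs without any model.
def toy_critic(question: str, cot: str) -> str:
    return "too terse" if len(cot.split()) < 5 else "ok"


def toy_rethinker(question: str, cot: str, critique: str) -> str:
    return "Step by step: " + cot if critique == "too terse" else cot


def toy_verifier(question: str, cot: str) -> bool:
    return cot.strip().endswith("4")


result = crv_refine("What is 2 + 2?", "The answer is 4",
                    toy_critic, toy_rethinker, toy_verifier)
rejected = crv_refine("What is 2 + 2?", "The answer is 5",
                      toy_critic, toy_rethinker, toy_verifier)
```

Passing the agents in as plain callables keeps the pipeline independent of any particular model or API; the accepted refined rationales would then serve as training data for the smaller model.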

Business Value

Enables the creation of more accessible and cost-effective AI models with strong reasoning capabilities, democratizing advanced AI for a wider range of applications and devices.

Paper Metadata

Innovation Type

Novel Training System

Deployment Feasibility

High, as it focuses on creating smaller, more deployable models.

Limitations Addressed

High resource demands of large reasoning models, ineffectiveness of direct CoT distillation to smaller models, and the need for substantial annotated data for training smaller reasoning models.

Technical Tags

small LLMs, reasoning abilities, cognitive alignment, distillation, chain-of-thought, CRV system, LLM agents, critique-rethink-verify, annotated data, resource demands

Research Topics

Efficient LLM Training, Knowledge Distillation, Reasoning in Small Models, AI Agent Systems, Cognitive Modeling

Methods & Architectures

Critique-Rethink-Verify (CRV) system, Distillation, LLM Agent Collaboration, Small LLM, Large Reasoning Model (LRM)

Applications & Tasks

Natural Language Processing, Artificial Intelligence Research, Machine Learning Education, Enhancing Reasoning in Small LLMs, Reducing Resource Demands for LLMs, Improving Distillation Effectiveness, Reasoning, Chain-of-Thought Generation, Knowledge Distillation

Related Fields

Machine Learning, Deep Learning, Natural Language Processing, Artificial Intelligence

Keywords

small LLM, reasoning, cognitive alignment, distillation, chain-of-thought, CRV, LLM agents, knowledge transfer, efficiency, resource reduction, artificial intelligence, natural language processing, deep learning, model training, capabilities

Academic Context

#Efficient LLM Training, #Knowledge Distillation, #Reasoning in Small Models, #AI Agent Systems, #Cognitive Modeling

Companies & Organizations

Companies Mentioned

OpenAI, DeepSeek

Commercial Potential

Potential Products

Efficient reasoning engines for edge devices, Cost-effective AI assistants, Specialized AI models for specific reasoning tasks

Target Industries

Technology, Software, Mobile, Edge Computing

Use Case Examples

Developing AI assistants for mobile devices with strong reasoning. Creating cost-effective AI solutions for businesses with limited compute resources. Training smaller models for specific reasoning tasks like math or logic problems.

Competitive Edge

Offers a more effective alternative to direct CoT distillation for training small reasoning models, particularly when annotated data is scarce.

Resource Requirements

Scalability

Focuses on creating smaller, more scalable models.

Production Readiness

Maturity Level

Research
