
Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning


📄 Abstract

Natural language chain-of-thought (N-CoT) and program chain-of-thought (P-CoT) have emerged as the two primary paradigms for large language models (LLMs) to solve mathematical reasoning problems. Current research typically pursues unidirectional enhancement: P-CoT-enhanced N-CoT or N-CoT-enhanced P-CoT. In this paper, we seek to fully unleash the two paradigms' strengths for mutual enhancement and ultimately achieve simultaneous improvements. We conduct a detailed analysis of the error types across the two paradigms, based on which we propose Parrot, a novel training pipeline for mathematical problems: 1) Three target-designed subtasks integrate sequential P-CoT and N-CoT generation. 2) A subtask hybrid training strategy facilitates natural language semantic transferability. 3) A converted N-CoT auxiliary reward is designed to alleviate the sparse rewards in P-CoT optimization. Extensive experiments demonstrate that Parrot significantly enhances the performance of both N-CoT and P-CoT, especially N-CoT. With Parrot SFT, the N-CoT performance of LLaMA2 and CodeLLaMA achieves gains of +21.87 and +21.48 on MathQA over the resource-intensive RL baseline.

Authors (12)

Senjie Jin

Lu Chen

Zhiheng Xi

Yuhui Wang

Sirui Song

Yuhao Zhou

+6 more

Submitted

October 29, 2025

arXiv Category

cs.CL


Key Contributions

This paper proposes Parrot, a novel training pipeline designed to achieve mutual enhancement between Natural Language Chain-of-Thought (N-CoT) and Program Chain-of-Thought (P-CoT) for LLMs solving mathematical reasoning problems. It introduces three subtasks for sequential generation, a hybrid training strategy for semantic transferability, and an N-CoT auxiliary reward to address sparse rewards in P-CoT optimization, leading to significant simultaneous improvements in both paradigms.
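To make the sparse-reward point concrete, here is a minimal sketch of how an auxiliary N-CoT term could densify a P-CoT reward. This summary does not give the paper's actual reward formula, so the additive form, the function name `shaped_reward`, and the `aux_weight` and `tol` parameters are all assumptions made for illustration.

```python
from typing import Optional


def shaped_reward(program_answer: Optional[float],
                  ncot_answer: Optional[float],
                  gold_answer: float,
                  aux_weight: float = 0.5,
                  tol: float = 1e-6) -> float:
    """Toy reward: a sparse P-CoT term plus a weighted N-CoT auxiliary term.

    Hypothetical illustration only; not the formula used by Parrot.
    A value of None means no answer could be extracted
    (for example, the program crashed).
    """
    def is_correct(pred: Optional[float]) -> bool:
        return pred is not None and abs(pred - gold_answer) <= tol

    # Sparse term: 1 only when the program ran and matched the gold answer.
    pcot_term = 1.0 if is_correct(program_answer) else 0.0
    # Auxiliary term (assumed form): partial credit when the N-CoT
    # converted from the program reaches the gold answer.
    aux_term = aux_weight if is_correct(ncot_answer) else 0.0
    return pcot_term + aux_term
```

Under these assumptions, a rollout whose program fails but whose converted N-CoT is correct still receives a nonzero signal, which is the general idea behind using an auxiliary reward against sparsity.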

Business Value

Enhances the reliability and accuracy of AI systems in solving complex mathematical problems, which has applications in automated tutoring, scientific research, and financial modeling.

Paper Metadata

Innovation Type

Training Pipeline/Methodology

Deployment Feasibility

Feasible, as it focuses on improving the training methodology of existing LLMs, rather than requiring entirely new architectures.

Limitations Addressed

Unidirectional enhancement of P-CoT and N-CoT, Sparse rewards in P-CoT optimization, Difficulty in achieving simultaneous improvements from both paradigms

Performance Gains

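Significantly enhances both N-CoT and P-CoT performance, especially N-CoT. With Parrot SFT, N-CoT gains on MathQA over the RL baseline are +21.87 (LLaMA2) and +21.48 (CodeLLaMA).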

Technical Tags

chain-of-thought (CoT), program synthesis, mathematical reasoning, LLMs, training pipeline, mutual enhancement, auxiliary reward, error analysis

Research Topics

Reasoning in AI, Large Language Models, Machine Learning Training, Mathematical AI, AI Explainability

Methods & Architectures

Sequential P-CoT and N-CoT generation, Subtask hybrid training strategy, N-CoT auxiliary reward, Error analysis, Large Language Models (LLMs)

Applications & Tasks

Mathematical Problem Solving, Education Technology, Improving LLM reasoning, Enhancing mathematical problem-solving, Addressing limitations of unidirectional CoT enhancement, Mathematical Reasoning, Problem Solving

Related Fields

Artificial Intelligence, Cognitive Science, Symbolic AI

Keywords

chain-of-thought, CoT, program synthesis, mathematical reasoning, LLM, training, pipeline, reasoning, AI, machine learning, N-CoT, P-CoT, auxiliary reward

Academic Context

#Reasoning in AI, #Large Language Models, #Machine Learning Training, #Mathematical AI, #AI Explainability

Commercial Potential

Potential Products

AI-powered math tutors, Automated theorem provers, Scientific discovery tools

Target Industries

Education, Research and Development, Finance, Technology

Use Case Examples

Solving complex math word problems, Generating step-by-step solutions for programming challenges, Assisting in mathematical research

Competitive Edge

Improves upon existing CoT methods by enabling synergistic enhancement between program-based and natural language-based reasoning, leading to superior performance.

Market Opportunity

Significant market for AI in education and automated problem-solving.

Revenue Models

Licensing of improved models, Specialized AI solutions for educational platforms

Resource Requirements

Compute Needs

High, as training LLMs with complex reasoning tasks and multiple CoT paradigms requires significant computational resources.

Data Requirements

Requires datasets of mathematical problems with both natural language explanations and program-based solutions.
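As a rough illustration of what such a paired record might look like, the sketch below stores one problem with both a natural language rationale and a program, then checks that the program reproduces the labeled answer. The `MathExample` fields, the helper names, and the convention that the program stores its result in a variable called `answer` are assumptions for this example, not the dataset schema used by the paper.

```python
from dataclasses import dataclass


@dataclass
class MathExample:
    """Hypothetical record pairing both reasoning styles for one problem."""
    question: str
    n_cot: str     # natural language chain-of-thought
    p_cot: str     # program chain-of-thought (Python source)
    answer: float  # gold answer


def run_p_cot(program: str) -> float:
    """Execute a P-CoT program and return its `answer` variable.

    Assumes trusted input; real pipelines should sandbox execution.
    """
    scope: dict = {}
    exec(program, {}, scope)
    return float(scope["answer"])


def is_consistent(example: MathExample, tol: float = 1e-6) -> bool:
    """Check that the program output matches the gold answer."""
    return abs(run_p_cot(example.p_cot) - example.answer) <= tol


example = MathExample(
    question="A pen costs 3 dollars. How much do 4 pens cost?",
    n_cot="Each pen costs 3 dollars, so 4 pens cost 3 * 4 = 12 dollars.",
    p_cot="price = 3\ncount = 4\nanswer = price * count",
    answer=12.0,
)
```

Calling `is_consistent(example)` on a record like this is one simple way to filter out pairs whose program and label disagree before training.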

Deployment Constraints

The complexity of the training pipeline might make it challenging to implement and fine-tune for specific applications. Performance gains are most pronounced on mathematical reasoning tasks.

Scalability

Scalable in terms of applying the training methodology to different LLMs and mathematical domains, but the training process itself is computationally intensive.

Production Readiness

Maturity Level

Research

Time to Market

Medium to Long

Patent Potential

Medium
