📄 Abstract
Natural language chain-of-thought (N-CoT) and program chain-of-thought
(P-CoT) have emerged as two primary paradigms for large language models (LLMs)
to solve mathematical reasoning problems. Current research typically endeavors
to achieve unidirectional enhancement: P-CoT-enhanced N-CoT or N-CoT-enhanced
P-CoT. In this paper, we seek to fully unleash the two paradigms' strengths for
mutual enhancement and ultimately achieve simultaneous improvements. We conduct
a detailed analysis of the error types across two paradigms, based on which we
propose Parrot, a novel training pipeline for mathematical problems: 1) Three
target-designed subtasks integrate sequential P-CoT and N-CoT generation. 2) A
subtask hybrid training strategy facilitates natural language semantic
transferability. 3) A converted N-CoT auxiliary reward is designed to
alleviate the sparse rewards in P-CoT optimization. Extensive experiments
demonstrate that Parrot significantly enhances the performance of both N-CoT
and P-CoT, especially N-CoT. Using Parrot SFT, the N-CoT performance of
LLaMA2 and CodeLLaMA achieves gains of +21.87 and +21.48, respectively, on
MathQA over the resource-intensive RL baseline.
Authors (12)
Senjie Jin
Lu Chen
Zhiheng Xi
Yuhui Wang
Sirui Song
Yuhao Zhou
+6 more
Submitted
October 29, 2025
Key Contributions
This paper proposes Parrot, a novel training pipeline designed to achieve mutual enhancement between Natural Language Chain-of-Thought (N-CoT) and Program Chain-of-Thought (P-CoT) for LLMs solving mathematical reasoning problems. It introduces three subtasks for sequential generation, a hybrid training strategy for semantic transferability, and an N-CoT auxiliary reward to address sparse rewards in P-CoT optimization, leading to significant simultaneous improvements in both paradigms.
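To make the sparse-reward point concrete, here is a minimal sketch of how an auxiliary N-CoT signal could densify a binary P-CoT reward. The summary does not give the actual reward formula, so the additive form, the `aux_weight` value, and the function names `answer_matches` and `shaped_reward` are all assumptions chosen for illustration, not the authors' implementation.

```python
from typing import Optional


def answer_matches(predicted: Optional[float], gold: float, tol: float = 1e-6) -> float:
    """Return 1.0 if the predicted answer equals the gold answer within tol, else 0.0.

    A prediction of None stands for a failed program execution or an
    unparseable answer, which earns no reward.
    """
    if predicted is None:
        return 0.0
    return 1.0 if abs(predicted - gold) <= tol else 0.0


def shaped_reward(
    program_answer: Optional[float],
    ncot_answer: Optional[float],
    gold: float,
    aux_weight: float = 0.5,  # hypothetical weight, not taken from the paper
) -> float:
    """Combine the sparse P-CoT reward with an auxiliary N-CoT reward.

    Assumed form: base + aux_weight * aux. The base term is the usual
    all-or-nothing signal from executing the program; the auxiliary term
    rewards a correct answer in the N-CoT converted from that program.
    """
    base = answer_matches(program_answer, gold)
    aux = answer_matches(ncot_answer, gold)
    return base + aux_weight * aux
```

Under these assumptions, a rollout whose program fails to execute but whose converted N-CoT still reaches the correct answer receives a nonzero reward instead of zero. That extra signal is the kind a sparse reward alone would not provide.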
Business Value
Enhances the reliability and accuracy of AI systems in solving complex mathematical problems, which has applications in automated tutoring, scientific research, and financial modeling.