Abstract
The creation of high-quality datasets to improve Large Language Model (LLM)
reasoning remains a significant challenge, as current methods often suffer from
generating low-quality/incorrect answers and limited information richness from
available data sources. To address this, we propose AgenticMath, a novel
agentic pipeline for generating high-quality mathematical question-answer pairs
to enhance the supervised fine-tuning of LLMs. Our method operates through four
stages: (1) a Seed Question Filter that selects questions with high information
richness, complexity, and clarity; (2) an Agentic Question Rephrase step that
employs a multi-agent system to generate diverse, logically consistent
paraphrases; (3) an Answer Augment step that rewrites answers using
chain-of-thought reasoning to enhance numerical and logical correctness,
without reliance on human-provided labels; and (4) a final Question and Answer
Evaluation that retains only the highest-quality pairs. Extensive experiments
demonstrate that fine-tuning 3B-8B parameter LLMs on AgenticMath-generated
datasets (comprising only 30-60K math samples) achieves competitive or superior
performance on diverse in-domain and out-of-domain mathematical reasoning
benchmarks compared to baselines trained on much more data (e.g., 400K or 2.3M
samples). Our work demonstrates that targeted, high-quality data generation is
a more efficient path to improving mathematical reasoning in LLMs than
large-scale, low-quality alternatives.
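To make the four-stage pipeline concrete, below is a minimal Python sketch of how the stages could be chained, based only on the abstract. The function names, prompts, thresholds, and the llm_call interface are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the four-stage AgenticMath pipeline described in the abstract.
# All function and parameter names (llm_call, score_seed, score_pair, thresholds)
# are illustrative assumptions, not the paper's actual implementation.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    question: str
    answer: str


def agenticmath_pipeline(
    seed_questions: List[str],
    llm_call: Callable[[str], str],        # hypothetical LLM interface: prompt -> completion
    score_seed: Callable[[str], float],    # stage 1: richness / complexity / clarity score
    score_pair: Callable[[QAPair], float], # stage 4: question-answer quality score
    seed_threshold: float = 0.7,
    pair_threshold: float = 0.8,
    n_paraphrases: int = 3,
) -> List[QAPair]:
    dataset: List[QAPair] = []

    # Stage 1: Seed Question Filter - keep only informative, complex, clear questions.
    seeds = [q for q in seed_questions if score_seed(q) >= seed_threshold]

    for seed in seeds:
        # Stage 2: Agentic Question Rephrase - a multi-agent system producing diverse,
        # logically consistent paraphrases (sketched here as repeated prompting).
        paraphrases = [
            llm_call(f"Rephrase this math problem, preserving its logic:\n{seed}")
            for _ in range(n_paraphrases)
        ]

        for question in [seed] + paraphrases:
            # Stage 3: Answer Augment - rewrite the answer with chain-of-thought
            # reasoning; no human-provided labels are used.
            answer = llm_call(
                f"Solve step by step, then state the final answer:\n{question}"
            )
            pair = QAPair(question=question, answer=answer)

            # Stage 4: Question and Answer Evaluation - retain only high-quality pairs.
            if score_pair(pair) >= pair_threshold:
                dataset.append(pair)

    return dataset
```

The retained pairs would then serve as the supervised fine-tuning set for a 3B-8B parameter LLM; the scoring functions stand in for the paper's evaluation stages, whose exact criteria are not specified in the abstract.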
Authors (7)
Xianyang Liu
Yilin Liu
Shuai Wang
Hao Cheng
Andrew Estornell
Yuzhi Zhao
+1 more
Submitted
October 22, 2025
Key Contributions
Proposes AgenticMath, a novel agentic pipeline for generating high-quality mathematical question-answer pairs to enhance LLM supervised fine-tuning. The pipeline uses a multi-agent system for diverse question rephrasing and chain-of-thought reasoning for answer augmentation, ensuring superior quality and correctness without human labels.
Business Value
Enables the creation of better datasets for training LLMs, leading to improved performance in mathematical reasoning tasks, which has applications in education, scientific research, and complex problem-solving.