Abstract
The creation of high-quality datasets to improve Large Language Model (LLM)
reasoning remains a significant challenge, as current methods often suffer from
generating low-quality/incorrect answers and limited information richness from
available data sources. To address this, we propose AgenticMath, a novel
agentic pipeline for generating high-quality mathematical question-answer pairs
to enhance the supervised fine-tuning of LLMs. Our method operates through four
stages: (1) a Seed Question Filter that selects questions with high information
richness, complexity, and clarity; (2) an Agentic Question Rephrase step that
employs a multi-agent system to generate diverse, logically consistent
paraphrases; (3) an Answer Augment step that rewrites answers using
chain-of-thought reasoning to enhance numerical and logical correctness,
without reliance on human-provided labels; and (4) a final Question and Answer
Evaluation that retains only the highest-quality pairs. Extensive experiments
demonstrate that fine-tuning 3B-8B parameter LLMs on AgenticMath-generated
datasets (comprising only 30-60K math samples) achieves competitive or superior
performance on diverse in-domain and out-of-domain mathematical reasoning
benchmarks compared to baselines trained on much more data (e.g., 400K or 2.3M
samples). Our work demonstrates that targeted, high-quality data generation is
a more efficient path to improving mathematical reasoning in LLMs than
large-scale, low-quality alternatives.
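To make the four-stage pipeline concrete, below is a minimal Python sketch of how the stages could be chained, based only on the abstract. The function names, prompts, thresholds, and the llm_call interface are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the four-stage AgenticMath pipeline described in the abstract.
# All function and parameter names (llm_call, score_seed, score_pair, thresholds)
# are illustrative assumptions, not the paper's actual implementation.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    question: str
    answer: str


def agenticmath_pipeline(
    seed_questions: List[str],
    llm_call: Callable[[str], str],        # hypothetical LLM interface: prompt -> completion
    score_seed: Callable[[str], float],    # stage 1: richness / complexity / clarity score
    score_pair: Callable[[QAPair], float], # stage 4: question-answer quality score
    seed_threshold: float = 0.7,
    pair_threshold: float = 0.8,
    n_paraphrases: int = 3,
) -> List[QAPair]:
    dataset: List[QAPair] = []

    # Stage 1: Seed Question Filter - keep only informative, complex, clear questions.
    seeds = [q for q in seed_questions if score_seed(q) >= seed_threshold]

    for seed in seeds:
        # Stage 2: Agentic Question Rephrase - a multi-agent system producing diverse,
        # logically consistent paraphrases (sketched here as repeated prompting).
        paraphrases = [
            llm_call(f"Rephrase this math problem, preserving its logic:\n{seed}")
            for _ in range(n_paraphrases)
        ]

        for question in [seed] + paraphrases:
            # Stage 3: Answer Augment - rewrite the answer with chain-of-thought
            # reasoning; no human-provided labels are used.
            answer = llm_call(
                f"Solve step by step, then state the final answer:\n{question}"
            )
            pair = QAPair(question=question, answer=answer)

            # Stage 4: Question and Answer Evaluation - retain only high-quality pairs.
            if score_pair(pair) >= pair_threshold:
                dataset.append(pair)

    return dataset
```

The retained pairs would then serve as the supervised fine-tuning set for a 3B-8B parameter LLM; the scoring functions stand in for the paper's evaluation stages, whose exact criteria are not specified in the abstract.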
Authors (7)
Xianyang Liu
Yilin Liu
Shuai Wang
Hao Cheng
Andrew Estornell
Yuzhi Zhao
+1 more
Submitted
October 22, 2025
Key Contributions
Proposes AgenticMath, a novel agentic pipeline for generating high-quality mathematical question-answer pairs to enhance LLM supervised fine-tuning. The pipeline uses a multi-agent system for diverse question rephrasing and chain-of-thought reasoning for answer augmentation, ensuring superior quality and correctness without human labels.
Business Value
Enables the creation of better datasets for training LLMs, leading to improved performance in mathematical reasoning tasks, which has applications in education, scientific research, and complex problem-solving.