
AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis


Abstract

Training large language model agents on tasks at the frontier of their capabilities is key to unlocking advanced reasoning. We introduce a data synthesis approach inspired by the educational theory of the Zone of Proximal Development (ZPD), which defines this frontier as tasks an LLM cannot solve alone but can master with guidance. To operationalize this, we present the AgentFrontier Engine, an automated pipeline that synthesizes high-quality, multidisciplinary data situated precisely within the LLM's ZPD. This engine supports both continued pre-training with knowledge-intensive data and targeted post-training on complex reasoning tasks. From the same framework, we derive the ZPD Exam, a dynamic and automated benchmark designed to evaluate agent capabilities on these frontier tasks. We train the AgentFrontier-30B-A3B model on our synthesized data; it achieves state-of-the-art results on demanding benchmarks like Humanity's Last Exam, even surpassing some leading proprietary agents. Our work demonstrates that a ZPD-guided approach to data synthesis offers a scalable and effective path toward building more capable LLM agents.

Authors (10)

Xuanzhong Chen

Zile Qiao

Guoxin Chen

Liangcai Su

Zhen Zhang

Xinyu Wang

+4 more

Submitted

October 28, 2025

arXiv Category

cs.CL


Key Contributions

AgentFrontier introduces a data synthesis approach inspired by the Zone of Proximal Development (ZPD) to train LLM agents on tasks at the frontier of their capabilities. The AgentFrontier Engine automatically synthesizes high-quality, multidisciplinary data situated within the LLM's ZPD for both continued pre-training and targeted post-training. The same framework also yields the ZPD Exam, a dynamic, automated benchmark for evaluating agent capabilities on these frontier tasks.
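The ZPD criterion described above (a task the model cannot solve alone but can solve with guidance) can be illustrated with a small filter. This is a minimal sketch and not the AgentFrontier Engine itself, whose internals this summary does not describe. The names `Task`, `Solver`, `is_correct`, `in_zpd`, and `filter_zpd` are hypothetical, and exact-match grading is an assumption standing in for whatever grader a real pipeline would use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical type: a solver maps a question to an answer string.
Solver = Callable[[str], str]


@dataclass
class Task:
    question: str
    reference_answer: str


def is_correct(answer: str, reference: str) -> bool:
    # Assumption: normalized exact match stands in for a real grader.
    return answer.strip().lower() == reference.strip().lower()


def in_zpd(task: Task, unaided: Solver, guided: Solver) -> bool:
    """Return True if the task fails unaided but succeeds with guidance."""
    solved_alone = is_correct(unaided(task.question), task.reference_answer)
    solved_with_help = is_correct(guided(task.question), task.reference_answer)
    return (not solved_alone) and solved_with_help


def filter_zpd(tasks: Iterable[Task], unaided: Solver, guided: Solver) -> List[Task]:
    """Keep only the tasks that fall inside the ZPD as defined by in_zpd."""
    return [t for t in tasks if in_zpd(t, unaided, guided)]
```

In this sketch, `unaided` would wrap the base model answering on its own, and `guided` would wrap the same model with some form of help, such as tools or hints. Tasks solved by both are too easy, and tasks solved by neither are out of reach, so both groups are dropped.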

Business Value

By enabling LLM agents to tackle more complex reasoning tasks, this research can lead to more capable AI assistants, advanced automation tools, and breakthroughs in fields requiring sophisticated problem-solving, driving innovation across industries.

Paper Metadata

Innovation Type

Novel Data Synthesis and Evaluation Methodology

Deployment Feasibility

Moderate; requires infrastructure for data synthesis and LLM training.

Limitations Addressed

The challenge of training LLM agents on tasks at the frontier of their capabilities and the lack of dynamic benchmarks for evaluating such advanced reasoning.

Performance Gains

AgentFrontier-30B-A3B achieves state-of-the-art results on demanding benchmarks like Humanity's Last Exam, surpassing some leading proprietary agents.

Technical Tags

LLM Agents • Data Synthesis • Zone of Proximal Development (ZPD) • Frontier Tasks • Complex Reasoning • Continued Pre-training • Post-training • Automated Benchmark • LLM Capabilities • Knowledge-Intensive Data

Research Topics

Large Language Models • AI Agents • Machine Learning • Reasoning • Data Generation • Educational Theory

Methods & Architectures

Data Synthesis • Automated pipeline • ZPD-guided generation • Benchmark creation • LLM Agents

Applications & Tasks

AI Research • Advanced Reasoning Systems • AI Agent Development • Improving LLM agent capabilities • Training on tasks at the frontier of LLM abilities • Evaluating complex reasoning • Complex reasoning • Knowledge-intensive tasks • Agent performance evaluation

Datasets & Benchmarks

Benchmarks

Humanity's Last Exam • ZPD Exam (dynamic benchmark)

Performance on demanding benchmarks • Agent capabilities

Related Fields

Artificial Intelligence • Machine Learning • Natural Language Processing • AI Agents • Cognitive Science • Educational Psychology

Keywords

LLM agents • data synthesis • ZPD • reasoning • frontier tasks • training data • benchmark • AI capabilities • complex problem solving • knowledge intensive • post-training • pre-training

Academic Context

#Large Language Models • #AI Agents • #Machine Learning • #Reasoning • #Data Generation • #Educational Theory

Commercial Potential

Potential Products

Advanced AI agents for complex tasks • Platforms for synthesizing training data for frontier AI capabilities • Dynamic AI evaluation benchmarks

Target Industries

Technology • Research & Development • Education • Consulting

Use Case Examples

Developing AI agents that can perform complex scientific research or engineering design. Creating AI tutors capable of guiding students through challenging, multi-step problems. Building AI systems that can assist in strategic decision-making in business or government.

Competitive Edge

AgentFrontier's ZPD-guided data synthesis and dynamic benchmark offer a more targeted and effective approach to pushing the capability frontier of LLM agents compared to generic training data or static benchmarks.

Resource Requirements

Compute Needs

High compute for data synthesis and LLM training.

Data Requirements

Requires diverse knowledge sources for data synthesis.

Deployment Constraints

Complexity of the data synthesis pipeline and training infrastructure.

Scalability

The automated pipeline suggests good scalability for generating data and benchmarks.

Production Readiness

Maturity Level

Research
