Lost in the Maze: Overcoming Context Limitations in Long-Horizon Agentic Search

Abstract

Long-horizon agentic search requires iteratively exploring the web over long trajectories and synthesizing information across many sources, and is the foundation for enabling powerful applications like deep research systems. In this work, we show that popular agentic search frameworks struggle to scale to long trajectories primarily due to context limitations: they accumulate long, noisy content, hit context window and tool budgets, or stop early. Then, we introduce SLIM (Simple Lightweight Information Management), a simple framework that separates retrieval into distinct search and browse tools, and periodically summarizes the trajectory, keeping context concise while enabling longer, more focused searches. On long-horizon tasks, SLIM achieves comparable performance at substantially lower cost and with far fewer tool calls than strong open-source baselines across multiple base models. Specifically, with o3 as the base model, SLIM achieves 56% on BrowseComp and 31% on HLE, outperforming all open-source frameworks by 8 and 4 absolute points, respectively, while incurring 4-6x fewer tool calls. Finally, we release an automated fine-grained trajectory analysis pipeline and error taxonomy for characterizing long-horizon agentic search frameworks; SLIM exhibits fewer hallucinations than prior systems. We hope our analysis framework and simple tool design inform future long-horizon agents.

Authors (7)

Howard Yen

Ashwin Paranjape

Mengzhou Xia

Thejas Venkatesh

Jack Hessel

Danqi Chen

+1 more

Submitted

October 21, 2025

arXiv Category

cs.CL

Key Contributions

This paper introduces SLIM (Simple Lightweight Information Management), a framework designed to overcome context limitations in long-horizon agentic web search. By separating retrieval into distinct search and browse tools and periodically summarizing the trajectory, SLIM maintains concise context, enabling longer, more focused searches with significantly lower cost and fewer tool calls compared to baselines.
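The two ideas in the paragraph above, separate search and browse tools plus periodic trajectory summarization, can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the class name `SlimStyleAgent`, the stub tools, and the `summarize_every` interval are all hypothetical, and a real system would call an LLM and a web backend where the stubs are.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SlimStyleAgent:
    """Toy loop: separate search/browse tools plus periodic summarization.

    All names and the summarization interval are illustrative assumptions.
    """
    search: Callable[[str], List[str]]      # query -> short result snippets
    browse: Callable[[str], str]            # url -> page text
    summarize: Callable[[List[str]], str]   # context entries -> one summary
    summarize_every: int = 3                # assumed interval, not from the paper
    context: List[str] = field(default_factory=list)
    steps: int = 0

    def _record(self, entry: str) -> None:
        # Append the tool result, then compress the whole context
        # into a single summary entry every `summarize_every` steps.
        self.context.append(entry)
        self.steps += 1
        if self.steps % self.summarize_every == 0:
            self.context = [self.summarize(self.context)]

    def do_search(self, query: str) -> List[str]:
        snippets = self.search(query)
        self._record(f"search({query!r}) -> {snippets}")
        return snippets

    def do_browse(self, url: str) -> str:
        page = self.browse(url)
        self._record(f"browse({url!r}) -> {page}")
        return page


# Stub tools standing in for a search API, a page fetcher, and an LLM summarizer.
def fake_search(query: str) -> List[str]:
    return [f"https://example.com/{i}" for i in range(2)]


def fake_browse(url: str) -> str:
    return f"content of {url}"


def fake_summarize(entries: List[str]) -> str:
    return f"summary of {len(entries)} entries"


agent = SlimStyleAgent(fake_search, fake_browse, fake_summarize, summarize_every=3)
sizes = []
for step in range(7):
    if step % 2 == 0:
        agent.do_search(f"q{step}")
    else:
        agent.do_browse(f"https://example.com/{step}")
    sizes.append(len(agent.context))

print(sizes)  # context length after each tool call
```

In this sketch the context never holds more than a few entries regardless of how many tool calls are made, which is the property the summary attributes to SLIM; how the actual framework decides when and what to summarize is not specified here.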

Business Value

Enables the development of more powerful automated research assistants and information gathering tools that can tackle complex, multi-step queries efficiently. This can significantly boost productivity in knowledge work.

Paper Metadata

Innovation Type

Framework/Methodology

Deployment Feasibility

High. Leverages existing LLMs and web browsing capabilities. The SLIM framework is designed to be simple and integrate easily.

Limitations Addressed

Context window limitations in LLM agents, inefficiency and cost of long web search trajectories, noise and irrelevance in accumulated search content, premature stopping of agentic search processes

Performance Gains

Comparable performance to strong baselines at substantially lower cost, far fewer tool calls than baselines

Technical Tags

Agentic Search, Long-Horizon Tasks, Web Exploration, Information Synthesis, Context Management, Information Retrieval, SLIM Framework, Search Tools, Browse Tools, Trajectory Summarization, LLM Agents

Research Topics

LLM Agents, Information Retrieval, Web Search, Reasoning, Knowledge Synthesis

Methods & Architectures

SLIM (Simple Lightweight Information Management) Framework, Separation of Search and Browse Tools, Periodic Trajectory Summarization, Prompt Engineering, LLM Agents, Foundation Models (e.g., o3)

Applications & Tasks

Web Search; Research Systems; Information Gathering; Automated Data Collection

Context Limitations in Long-Horizon Agentic Search; Accumulation of Long, Noisy Content; Hitting Context Window and Tool Budgets; Early Termination of Search Trajectories

Enabling Long-Horizon Agentic Search; Efficient Web Exploration; Synthesizing Information Across Multiple Sources; Reducing Cost and Tool Calls in Agentic Search

Datasets & Benchmarks

Datasets

BrowseComp, HLE

Benchmarks

56% on BrowseComp • 31% on HLE (with o3 base model)
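Read together with the margins quoted in the abstract (8 and 4 absolute points over all open-source frameworks), these scores imply rough figures for the strongest open-source baseline. The short calculation below is only that inference; the abstract's numbers may be rounded, so the implied baselines are approximate and are not reported values.

```python
# Scores and margins as quoted in the abstract (o3 base model), in percent.
slim_scores = {"BrowseComp": 56, "HLE": 31}
margins = {"BrowseComp": 8, "HLE": 4}  # absolute points over open-source frameworks

# Implied score of the best open-source baseline: SLIM score minus the margin.
implied_baseline = {name: slim_scores[name] - margins[name] for name in slim_scores}

for name, score in implied_baseline.items():
    print(f"{name}: implied best open-source baseline of about {score}%")
```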

Task Success Rate, Cost, Tool Calls

Related Fields

Large Language Models, Artificial Intelligence Agents, Information Retrieval, Web Mining, Natural Language Processing

Keywords

agentic search, LLM agents, long-horizon tasks, web exploration, information synthesis, context management, information retrieval, SLIM, prompt engineering, research systems

Academic Context

#LLM Agents, #Information Retrieval, #Web Search, #Reasoning, #Knowledge Synthesis

Commercial Potential

Potential Products

Automated research platforms, advanced web scraping tools, intelligent information gathering agents

Target Industries

Technology, Finance, Legal, Market Research, Academia

Use Case Examples

An agent that can research a complex scientific topic by browsing multiple sources and synthesizing findings; automated market analysis by gathering and summarizing information from news and reports; building deep research systems that can answer intricate questions

Competitive Edge

Provides a more efficient and scalable solution for long-horizon agentic search compared to existing frameworks that suffer from context limitations and high costs.

Market Opportunity

Large and growing market for AI-powered information retrieval and automation tools.

Revenue Models

SaaS for automated research platforms, API access to agentic search capabilities.

Resource Requirements

Compute Needs

Moderate (inference cost depends on base LLM and search depth)

Data Requirements

N/A (operates on live web data)

Deployment Constraints

Reliant on LLM capabilities and web access. Potential for web scraping policy violations.

Scalability

The SLIM framework is designed for scalability by managing context efficiently. Cost scales sub-linearly with task horizon.

Regulatory Considerations

Web scraping policies, terms of service.

Production Readiness

Maturity Level

Research/Development

Time to Market

1-2 years

Patent Potential

Moderate (novel framework for agentic search context management)
