Research Paper (arXiv cs.CL). Audience: AI researchers, ML engineers, developers working with long documents, data scientists

ToM: Leveraging Tree-oriented MapReduce for Long-Context Reasoning in Large Language Models


Abstract

Large Language Models (LLMs), constrained by limited context windows, often face significant performance degradation when reasoning over long contexts. To address this, Retrieval-Augmented Generation (RAG) retrieves and reasons over chunks but frequently sacrifices logical coherence due to its reliance on similarity-based rankings. Similarly, divide-and-conquer frameworks (DCF) split documents into small chunks for independent reasoning and aggregation. While effective for local reasoning, DCF struggles to capture long-range dependencies and risks inducing conflicts by processing chunks in isolation. To overcome these limitations, we propose ToM, a novel Tree-oriented MapReduce framework for long-context reasoning. ToM leverages the inherent hierarchical structure of long documents (e.g., main headings and subheadings) by constructing a DocTree through hierarchical semantic parsing and performing bottom-up aggregation. Using a Tree MapReduce approach, ToM enables recursive reasoning: in the Map step, rationales are generated at child nodes; in the Reduce step, these rationales are aggregated across sibling nodes to resolve conflicts or reach consensus at parent nodes. Experimental results on 70B+ LLMs show that ToM significantly outperforms existing divide-and-conquer frameworks and retrieval-augmented generation methods, achieving better logical coherence and long-context reasoning. Our code is available at https://github.com/gjn12-31/ToM.
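The recursive Map and Reduce steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `Node`, `tree_map_reduce`, `toy_map`, and `toy_reduce` are hypothetical, and the two toy functions are string-formatting stand-ins for what would be LLM calls in practice. The sketch also assumes that a node's own text is mapped alongside its children and that a single rationale is passed up unchanged, neither of which the abstract specifies.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One node of a hypothetical document tree (section title, own text, subsections)."""
    title: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def tree_map_reduce(
    node: Node,
    question: str,
    map_fn: Callable[[str, str, str], str],
    reduce_fn: Callable[[str, str, List[str]], str],
) -> str:
    """Bottom-up recursion: map text to rationales, reduce sibling rationales at the parent."""
    rationales: List[str] = []
    # Map step: produce a rationale from this node's own text, if it has any.
    if node.text.strip():
        rationales.append(map_fn(question, node.title, node.text))
    # Recurse first, so every child's rationale exists before the parent reduces.
    for child in node.children:
        rationales.append(tree_map_reduce(child, question, map_fn, reduce_fn))
    if not rationales:
        return ""
    if len(rationales) == 1:
        # Assumption: nothing to reconcile, so pass the rationale up unchanged.
        return rationales[0]
    # Reduce step: aggregate sibling rationales at the parent node.
    return reduce_fn(question, node.title, rationales)


def toy_map(question: str, title: str, text: str) -> str:
    # Stand-in for an LLM call that reasons over one chunk.
    return f"{title}: {text}"


def toy_reduce(question: str, title: str, rationales: List[str]) -> str:
    # Stand-in for an LLM call that resolves conflicts among sibling rationales.
    return f"{title}({'; '.join(rationales)})"


doc = Node("Report", children=[
    Node("Methods", text="uses trees"),
    Node("Results", children=[
        Node("Table 1", text="A wins"),
        Node("Table 2", text="B wins"),
    ]),
])
answer = tree_map_reduce(doc, "Who wins?", toy_map, toy_reduce)
print(answer)
```

In this toy run the two conflicting "Table" rationales meet at their shared parent "Results" before anything reaches the root, which is the point of aggregating along the document hierarchy rather than over a flat list of chunks.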

Authors (8)

Jiani Guo

Zuchao Li

Jie Wu

Qianren Wang

Yun Li

Lefei Zhang

+2 more

Submitted

November 1, 2025

arXiv Category

cs.CL


Key Contributions

Introduces ToM, a novel Tree-oriented MapReduce framework for long-context reasoning in LLMs. By leveraging hierarchical document structure through semantic parsing and bottom-up aggregation, ToM overcomes limitations of RAG and standard divide-and-conquer methods, enabling recursive processing for better long-range dependency capture and logical coherence.
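To make the idea of a heading-based document tree concrete, the sketch below builds a nested structure from Markdown-style headings. It deliberately swaps in a plain regular-expression parser for the paper's hierarchical semantic parsing, whose details are not given in this summary. `Section`, `build_doc_tree`, and the `#`-prefixed heading format are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """A hypothetical tree node: heading level, title, body text, subsections."""
    level: int
    title: str
    text: str = ""
    children: List["Section"] = field(default_factory=list)


# Assumption: headings look like "# Title", "## Subtitle", up to six levels.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def build_doc_tree(markdown: str) -> Section:
    """Nest sections by heading depth using a stack of currently open ancestors."""
    root = Section(level=0, title="ROOT")
    stack = [root]
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            node = Section(level=level, title=match.group(2).strip())
            # Close sections at the same or deeper level; root (level 0) always stays.
            while stack[-1].level >= level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        else:
            # Body text belongs to the most recently opened section.
            stack[-1].text += line + "\n"
    return root


sample = "# A\nintro\n## A.1\nx\n## A.2\ny\n# B\nz\n"
tree = build_doc_tree(sample)
print([child.title for child in tree.children])
```

A tree of this shape is the kind of input a bottom-up aggregation pass would walk, with leaf sections processed first and their parents afterwards.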

Business Value

Enables LLMs to process and reason over much larger documents (e.g., legal contracts, research papers, books) more effectively, unlocking new applications in legal tech, research analysis, and enterprise knowledge management.

Paper Metadata

Innovation Type

Algorithmic Framework

Deployment Feasibility

Moderate: requires implementing the ToM framework and integrating it with LLMs. May incur higher computational overhead than standard RAG.

Limitations Addressed

Limited context windows in LLMs, sacrificed logical coherence in RAG, inability of DCF to capture long-range dependencies, risk of induced conflicts in isolated chunk processing

Performance Gains

Improved reasoning over long contexts, enhanced logical coherence

Technical Tags

long-context reasoning, Retrieval-Augmented Generation, divide-and-conquer, Tree-oriented MapReduce, hierarchical structure, semantic parsing, bottom-up aggregation, LLM context window

Research Topics

LLM Reasoning, Long-Context Processing, Document Understanding, Hierarchical Data Processing

Methods & Architectures

Tree-oriented MapReduce, Hierarchical semantic parsing, Bottom-up aggregation, Recursive processing, Large Language Models (LLMs)

Applications & Tasks

Document Analysis, Knowledge Extraction, Complex Question Answering

Limited LLM context windows, Loss of logical coherence in RAG, Difficulty capturing long-range dependencies, Conflicts from isolated chunk processing

Reasoning over long documents, Maintaining logical coherence, Extracting information from hierarchical texts

Related Fields

Natural Language Processing, Information Retrieval, Distributed Computing, Document Analysis

Keywords

long-context reasoning, LLM, Retrieval-Augmented Generation, divide-and-conquer, Tree-oriented MapReduce, hierarchical structure, semantic parsing, aggregation, document analysis, context window, NLP, reasoning

Academic Context

#LLM Reasoning, #Long-Context Processing, #Document Understanding, #Hierarchical Data Processing

Technology Stack

Frameworks & Libraries

MapReduce

Commercial Potential

Potential Products

LLM-powered document analysis tools, Automated legal contract review systems, Research summarization platforms

Target Industries

Legal, Finance, Publishing, Research & Development, Enterprise Software

Use Case Examples

Summarizing lengthy legal documents, Answering complex questions based on entire books, Analyzing financial reports with extensive appendices

Competitive Edge

Provides a structured, hierarchical approach to long-context reasoning that aims to overcome the limitations of flat RAG and standard DCF methods.

Market Opportunity

Increasing need for LLMs to handle large volumes of text data.

Revenue Models

Licensing of the ToM framework, SaaS solutions for document analysis

Resource Requirements

Compute Needs

Potentially high, due to recursive processing and semantic parsing.

Data Requirements

Long documents, potentially with hierarchical structure

Deployment Constraints

Complexity of the ToM framework implementation, computational cost

Scalability

The MapReduce paradigm suggests scalability, but the recursive nature and semantic parsing might introduce bottlenecks.

Production Readiness

Maturity Level

Research/Prototype

Time to Market

2-3 years

Patent Potential

Moderate (for the ToM framework)
