📄 Abstract
Despite their many successes, transformer-based large language models (LLMs)
continue to struggle with tasks that require complex reasoning over large parts
of their input. We argue that these failures arise due to capacity limits on
the accurate flow of information within LLMs. To formalize this issue, we
introduce the bounded attention prefix oracle (BAPO) model, a new computational
framework that models bandwidth constraints on attention heads, the mechanism
for internal communication in LLMs. We show that several important reasoning
problems like graph reachability require high communication bandwidth for BAPOs
to solve; we call these problems BAPO-hard. Our experiments corroborate our
theoretical predictions: GPT-4o, Claude, and Gemini succeed on BAPO-easy tasks
and fail even on relatively small BAPO-hard tasks. BAPOs also reveal another
benefit of chain of thought (CoT): we prove that breaking down a task using CoT
can turn any BAPO-hard problem into a BAPO-easy one. Our results offer
principled explanations for key LLM failures and suggest directions for
architectures and inference methods that mitigate bandwidth limits.
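The abstract names graph reachability as a representative BAPO-hard problem. As a rough illustration of why it stresses bandwidth, the sketch below (hypothetical, not code from the paper) builds a random directed graph as an edge list, the kind of prompt one could hand an LLM, and computes ground-truth reachability with BFS; a correct answer can hinge on edges scattered anywhere in the list, so no small summary of a prefix is guaranteed to suffice.

```python
import random
from collections import deque

def random_edge_list(n_nodes: int, n_edges: int, seed: int = 0):
    """Generate a random directed graph as a sorted edge list."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        u, v = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if u != v:
            edges.add((u, v))
    return sorted(edges)

def reachable(edges, source: int, target: int) -> bool:
    """Ground-truth reachability via breadth-first search."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# A prompt-style instance: the relevant edges may appear anywhere,
# so answering requires integrating information across the whole input.
edges = random_edge_list(n_nodes=20, n_edges=30)
prompt = ("Edges: " + ", ".join(f"{u}->{v}" for u, v in edges)
          + ". Question: is node 19 reachable from node 0?")
print(reachable(edges, source=0, target=19))
```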
Authors (4)
Tobias Schnabel
Kiran Tomlinson
Adith Swaminathan
Jennifer Neville
Key Contributions
This paper argues that LLM failures on complex reasoning stem from capacity limits on the flow of information within transformers. It introduces the bounded attention prefix oracle (BAPO) model to formalize bandwidth constraints on attention heads, identifies problems such as graph reachability as BAPO-hard, and shows experimentally that GPT-4o, Claude, and Gemini fail even on relatively small BAPO-hard instances. The work also proves that chain-of-thought decomposition can turn any BAPO-hard problem into a BAPO-easy one.
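To convey the intuition behind the CoT result, here is a minimal sketch, again for the reachability example and not the paper's formal construction: instead of answering in one shot, each step writes the current reachable set into the transcript, so the next step only needs that explicit set plus the edge list, a local, low-bandwidth update. All names below are illustrative.

```python
def reachability_with_cot(edges, source: int, target: int, max_steps: int):
    """Simulate chain of thought for reachability: each 'step' emits the
    current reachable set, so later steps read a small explicit state
    rather than recomputing one global answer from the raw input."""
    reached = {source}
    transcript = [f"step 0: reached = {sorted(reached)}"]
    for step in range(1, max_steps + 1):
        # One-hop frontier expansion: a cheap, local update per step.
        new = {v for (u, v) in edges if u in reached} - reached
        if not new:
            break
        reached |= new
        transcript.append(f"step {step}: reached = {sorted(reached)}")
    transcript.append(f"answer: {target in reached}")
    return transcript

for line in reachability_with_cot([(0, 1), (1, 2), (2, 3)], 0, 3, max_steps=5):
    print(line)
```

Each emitted step only depends on bounded, explicitly written state, which is the sense in which decomposition can replace one high-bandwidth computation with many low-bandwidth ones.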
Business Value
Helps developers anticipate and mitigate LLM limitations on tasks that require reasoning over long inputs, leading to more reliable AI applications.