📄 Abstract
Despite their many successes, transformer-based large language models (LLMs)
continue to struggle with tasks that require complex reasoning over large parts
of their input. We argue that these failures arise due to capacity limits on
the accurate flow of information within LLMs. To formalize this issue, we
introduce the bounded attention prefix oracle (BAPO) model, a new computational
framework that models bandwidth constraints on attention heads, the mechanism
for internal communication in LLMs. We show that several important reasoning
problems like graph reachability require high communication bandwidth for BAPOs
to solve; we call these problems BAPO-hard. Our experiments corroborate our
theoretical predictions: GPT-4o, Claude, and Gemini succeed on BAPO-easy tasks
and fail even on relatively small BAPO-hard tasks. BAPOs also reveal another
benefit of chain of thought (CoT): we prove that breaking down a task using CoT
can turn any BAPO-hard problem into a BAPO-easy one. Our results offer
principled explanations for key LLM failures and suggest directions for
architectures and inference methods that mitigate bandwidth limits.
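The abstract names graph reachability as a representative BAPO-hard problem. As a rough illustration of why it stresses bandwidth, the sketch below (hypothetical, not code from the paper) builds a random directed graph as an edge list, the kind of prompt one could hand an LLM, and computes ground-truth reachability with BFS; a correct answer can hinge on edges scattered anywhere in the list, so no small summary of a prefix is guaranteed to suffice.

```python
import random
from collections import deque

def random_edge_list(n_nodes: int, n_edges: int, seed: int = 0):
    """Generate a random directed graph as a sorted edge list."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        u, v = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if u != v:
            edges.add((u, v))
    return sorted(edges)

def reachable(edges, source: int, target: int) -> bool:
    """Ground-truth reachability via breadth-first search."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# A prompt-style instance: the relevant edges may appear anywhere,
# so answering requires integrating information across the whole input.
edges = random_edge_list(n_nodes=20, n_edges=30)
prompt = ("Edges: " + ", ".join(f"{u}->{v}" for u, v in edges)
          + ". Question: is node 19 reachable from node 0?")
print(reachable(edges, source=0, target=19))
```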
Authors (4)
Tobias Schnabel
Kiran Tomlinson
Adith Swaminathan
Jennifer Neville
Key Contributions
This paper argues that LLM failures on complex reasoning stem from capacity limits on the flow of information within transformers. It introduces the bounded attention prefix oracle (BAPO) model to formalize bandwidth constraints on attention heads, identifies problems such as graph reachability as BAPO-hard, and shows experimentally that GPT-4o, Claude, and Gemini fail even on relatively small BAPO-hard instances. The work also proves that chain-of-thought decomposition can turn any BAPO-hard problem into a BAPO-easy one.
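To convey the intuition behind the CoT result, here is a minimal sketch, again for the reachability example and not the paper's formal construction: instead of answering in one shot, each step writes the current reachable set into the transcript, so the next step only needs that explicit set plus the edge list, a local, low-bandwidth update. All names below are illustrative.

```python
def reachability_with_cot(edges, source: int, target: int, max_steps: int):
    """Simulate chain of thought for reachability: each 'step' emits the
    current reachable set, so later steps read a small explicit state
    rather than recomputing one global answer from the raw input."""
    reached = {source}
    transcript = [f"step 0: reached = {sorted(reached)}"]
    for step in range(1, max_steps + 1):
        # One-hop frontier expansion: a cheap, local update per step.
        new = {v for (u, v) in edges if u in reached} - reached
        if not new:
            break
        reached |= new
        transcript.append(f"step {step}: reached = {sorted(reached)}")
    transcript.append(f"answer: {target in reached}")
    return transcript

for line in reachability_with_cot([(0, 1), (1, 2), (2, 3)], 0, 3, max_steps=5):
    print(line)
```

Each emitted step only depends on bounded, explicitly written state, which is the sense in which decomposition can replace one high-bandwidth computation with many low-bandwidth ones.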
Business Value
Helps developers anticipate and mitigate LLM limitations on tasks that require reasoning over long inputs, leading to more reliable AI applications.