
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

📄 Abstract

Large Language Models (LLMs) have demonstrated remarkable performance in many applications, including challenging reasoning problems, via chain-of-thought (CoT) techniques that generate "thinking tokens" before answering the question. While existing theoretical work demonstrates that CoTs with discrete tokens boost the capability of LLMs, recent work on continuous CoTs lacks a theoretical understanding of why they outperform their discrete counterparts on reasoning tasks such as directed graph reachability, a fundamental graph reasoning problem that includes many practical domain applications as special cases. In this paper, we prove that a two-layer transformer with $D$ steps of continuous CoTs can solve the directed graph reachability problem, where $D$ is the diameter of the graph, while the best known result for constant-depth transformers with discrete CoTs requires $O(n^2)$ decoding steps, where $n$ is the number of vertices ($D < n$).
Authors (6)
Hanlin Zhu
Shibo Hao
Zhiting Hu
Jiantao Jiao
Stuart Russell
Yuandong Tian
Submitted
May 18, 2025
arXiv Category
cs.LG

Key Contributions

Provides a theoretical explanation of why continuous Chain-of-Thought (CoT) outperforms discrete CoT on reasoning tasks: it proves that a two-layer transformer with $D$ steps of continuous CoTs can solve directed graph reachability, where $D$ is the graph diameter, whereas the best known constant-depth transformer constructions with discrete CoTs require $O(n^2)$ decoding steps ($n$ is the number of vertices).
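To build intuition for why $D$ steps can suffice, the sketch below models each continuous thought as a "superposition" over vertices, i.e., a vector whose support is the set of vertices reached so far, and updates it once per step like a parallel breadth-first search. This is only an illustrative simulation of the intuition, not the paper's transformer construction; the function name, the 0/1 frontier encoding, and the update rule are assumptions introduced here for illustration.

```python
import numpy as np

def reachable_within_diameter(adj: np.ndarray, source: int, target: int) -> bool:
    """Decide directed reachability by expanding a 'superposition' over vertices.

    adj[i, j] = 1 means there is a directed edge i -> j.
    Each loop iteration plays the role of one continuous-thought step:
    the frontier vector spreads along all outgoing edges in parallel,
    so after D iterations (D = graph diameter) every reachable vertex
    has nonzero weight. We loop n - 1 times since D <= n - 1.
    """
    n = adj.shape[0]
    frontier = np.zeros(n)
    frontier[source] = 1.0                      # start concentrated on the source vertex
    for _ in range(n - 1):
        # Parallel BFS step: add mass on all out-neighbors of currently
        # reached vertices, keeping previously reached vertices.
        frontier = np.clip(frontier + adj.T @ frontier, 0.0, 1.0)
    return bool(frontier[target] > 0.0)

# Tiny usage example: edges 0 -> 1 -> 2; vertex 3 is unreachable from 0.
A = np.zeros((4, 4))
A[0, 1] = A[1, 2] = 1
assert reachable_within_diameter(A, 0, 2)
assert not reachable_within_diameter(A, 0, 3)
```

Because the frontier vector tracks all partially explored paths at once rather than committing to a single path per decoding step, the number of steps scales with the diameter $D$ rather than with the number of vertex-edge expansions, which mirrors the gap the paper proves between continuous and discrete CoTs.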

Business Value

Deepens the understanding of LLM reasoning capabilities, paving the way for more efficient and powerful LLMs for complex problem-solving in various domains.