
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

📄 Abstract

Large Language Models (LLMs) have demonstrated remarkable performance in many applications, including challenging reasoning problems, via chain-of-thought (CoT) techniques that generate "thinking tokens" before answering the question. While existing theoretical work demonstrates that CoTs with discrete tokens boost the capability of LLMs, recent work on continuous CoTs lacks a theoretical understanding of why they outperform their discrete counterparts on reasoning tasks such as directed graph reachability, a fundamental graph reasoning problem that includes many practical domain applications as special cases. In this paper, we prove that a two-layer transformer with $D$ steps of continuous CoTs can solve the directed graph reachability problem, where $D$ is the diameter of the graph, while the best known result for constant-depth transformers with discrete CoTs requires $O(n^2)$ decoding steps, where $n$ is the number of vertices ($D < n$).
Authors (6)
Hanlin Zhu
Shibo Hao
Zhiting Hu
Jiantao Jiao
Stuart Russell
Yuandong Tian
Submitted
May 18, 2025
arXiv Category
cs.LG

Key Contributions

Provides a theoretical explanation of why continuous Chain-of-Thought (CoT) outperforms discrete CoT on reasoning tasks: it proves that a two-layer transformer with $D$ steps of continuous CoTs can solve directed graph reachability, where $D$ is the graph diameter, whereas the best known constant-depth transformer constructions with discrete CoTs require $O(n^2)$ decoding steps ($n$ is the number of vertices).
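To build intuition for why $D$ steps can suffice, the sketch below models each continuous thought as a "superposition" over vertices, i.e., a vector whose support is the set of vertices reached so far, and updates it once per step like a parallel breadth-first search. This is only an illustrative simulation of the intuition, not the paper's transformer construction; the function name, the 0/1 frontier encoding, and the update rule are assumptions introduced here for illustration.

```python
import numpy as np

def reachable_within_diameter(adj: np.ndarray, source: int, target: int) -> bool:
    """Decide directed reachability by expanding a 'superposition' over vertices.

    adj[i, j] = 1 means there is a directed edge i -> j.
    Each loop iteration plays the role of one continuous-thought step:
    the frontier vector spreads along all outgoing edges in parallel,
    so after D iterations (D = graph diameter) every reachable vertex
    has nonzero weight. We loop n - 1 times since D <= n - 1.
    """
    n = adj.shape[0]
    frontier = np.zeros(n)
    frontier[source] = 1.0                      # start concentrated on the source vertex
    for _ in range(n - 1):
        # Parallel BFS step: add mass on all out-neighbors of currently
        # reached vertices, keeping previously reached vertices.
        frontier = np.clip(frontier + adj.T @ frontier, 0.0, 1.0)
    return bool(frontier[target] > 0.0)

# Tiny usage example: edges 0 -> 1 -> 2; vertex 3 is unreachable from 0.
A = np.zeros((4, 4))
A[0, 1] = A[1, 2] = 1
assert reachable_within_diameter(A, 0, 2)
assert not reachable_within_diameter(A, 0, 3)
```

Because the frontier vector tracks all partially explored paths at once rather than committing to a single path per decoding step, the number of steps scales with the diameter $D$ rather than with the number of vertex-edge expansions, which mirrors the gap the paper proves between continuous and discrete CoTs.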

Business Value

Deepens the understanding of LLM reasoning capabilities, paving the way for more efficient and powerful LLMs for complex problem-solving in various domains.