This paper provides a theoretical account of why continuous chain-of-thought (CoT) reasoning can outperform discrete CoT on reasoning tasks. It proves that a two-layer transformer using continuous CoTs can solve directed graph reachability in $D$ decoding steps, where $D$ is the graph's diameter, whereas the best known discrete-CoT constructions require $O(n^2)$ steps for graphs with $n$ vertices.
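The intuition behind the $D$-step bound can be illustrated outside the transformer setting: a breadth-first frontier expansion decides reachability in at most $D$ parallel steps, because each step advances the entire frontier at once rather than tracing one path edge by edge. The sketch below is a toy analogy, not the paper's transformer construction; the function name and graph are illustrative.

```python
from collections import deque  # not needed for sets, kept out; sets suffice here

def reachable_in_steps(adj, source, target):
    """BFS-style reachability. Returns (reachable, steps), where `steps`
    counts frontier expansions. Each expansion processes the whole frontier
    simultaneously, loosely mirroring how a continuous thought can encode a
    superposition of all frontier vertices at once."""
    frontier = {source}
    visited = {source}
    steps = 0
    while frontier:
        if target in frontier:
            return True, steps
        nxt = set()
        for u in frontier:
            for v in adj.get(u, []):
                if v not in visited:
                    visited.add(v)
                    nxt.add(v)
        frontier = nxt
        steps += 1
    return False, steps

# Diamond graph with diameter 2: reachability of 3 from 0 is settled
# after 2 expansions, even though two distinct paths exist.
adj = {0: [1, 2], 1: [3], 2: [3]}
print(reachable_in_steps(adj, 0, 3))  # (True, 2)
```

A discrete CoT, by contrast, emits one vertex per step and must commit to a single path, which is why sequential constructions pay a cost that grows with the number of vertices rather than the diameter.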
This result deepens the theoretical understanding of LLM reasoning and suggests continuous latent reasoning as a route to more efficient models for complex multi-step problem-solving across domains.