Abstract
Chain-of-Thought (CoT) prompting has been widely recognized for its ability
to enhance reasoning capabilities in large language models (LLMs). However, our
study reveals a surprising contradiction to this prevailing perspective within
the fundamental domain of pattern-based in-context learning (ICL). Through
extensive experiments involving 16 state-of-the-art LLMs and nine diverse
pattern-based ICL datasets, we demonstrate that CoT and its reasoning variants
consistently underperform direct answering across varying model scales and
benchmark complexities. To systematically investigate this unexpected
phenomenon, we designed extensive experiments to validate several hypothetical
explanations. Our analysis uncovers a fundamental hybrid mechanism of
explicit-implicit reasoning driving CoT's performance in pattern-based ICL:
while explicit reasoning falters due to LLMs' struggles to infer underlying
patterns from demonstrations, implicit reasoning, disrupted by the increased
contextual distance of CoT rationales, often compensates, delivering correct
answers despite flawed rationales. This hybrid mechanism explains CoT's
relative underperformance, as noise from weak explicit inference undermines the
process, even as implicit mechanisms partially salvage outcomes. Notably, even
long-CoT reasoning models, which excel in abstract and symbolic reasoning, fail
to fully overcome these limitations despite higher computational costs. Our
findings challenge existing assumptions regarding the universal efficacy of
CoT, yielding novel insights into its limitations and guiding future research
toward more nuanced and effective reasoning methodologies for LLMs.
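To make the experimental contrast concrete, the sketch below shows how the two prompting styles compared in the abstract might be constructed for a pattern-based ICL task. This is a hypothetical illustration, not the paper's code; the toy pattern (reversing a letter sequence), the demonstration format, and all function names are assumptions for exposition only.

```python
# Hypothetical sketch of the two prompting styles contrasted in the study.
# The toy task (reverse a letter sequence) stands in for a pattern-based
# ICL benchmark; the actual datasets and prompt templates differ.

DEMOS = [("a b c", "c b a"), ("x y z", "z y x")]
QUERY = "p q r"

def direct_prompt(demos, query):
    """Direct answering: demonstrations map inputs straight to outputs,
    keeping each answer adjacent to its input."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

def cot_prompt(demos, query):
    """CoT prompting: a rationale is elicited before each answer, which
    (per the paper's analysis) increases the contextual distance between
    a demonstration's input and its output."""
    lines = [
        f"Input: {x}\nLet's think step by step.\nAnswer: {y}"
        for x, y in demos
    ]
    lines.append(f"Input: {query}\nLet's think step by step.")
    return "\n\n".join(lines)

print(direct_prompt(DEMOS, QUERY))
print()
print(cot_prompt(DEMOS, QUERY))
```

The structural difference the paper highlights is visible here: in the direct prompt each output immediately follows its input, whereas the CoT prompt interposes rationale text, which the authors identify as a source of disruption for implicit pattern matching.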
Authors (10)
Tianshi Zheng
Yixiang Chen
Chengxi Li
Chunyang Li
Qing Zong
Haochen Shi
+4 more
Key Contributions
Demonstrates that Chain-of-Thought (CoT) prompting consistently underperforms direct answering in pattern-based in-context learning (ICL) across 16 LLMs and nine datasets, challenging the prevailing view of CoT's benefits. The study attributes this to a hybrid mechanism of explicit-implicit reasoning: explicit reasoning falters because models struggle to infer the underlying pattern from demonstrations, while implicit reasoning is disrupted by the added contextual distance that CoT rationales introduce.
Business Value
Provides crucial insights for optimizing prompt engineering strategies, potentially leading to more efficient and effective LLM applications by avoiding suboptimal prompting techniques.