Abstract
Chain-of-Thought (CoT) prompting has been widely recognized for its ability
to enhance reasoning capabilities in large language models (LLMs). However, our
study reveals a surprising contradiction to this prevailing perspective within
the fundamental domain of pattern-based in-context learning (ICL). Through
extensive experiments involving 16 state-of-the-art LLMs and nine diverse
pattern-based ICL datasets, we demonstrate that CoT and its reasoning variants
consistently underperform direct answering across varying model scales and
benchmark complexities. To systematically investigate this unexpected
phenomenon, we designed extensive experiments to validate several hypothetical
explanations. Our analysis uncovers a fundamental hybrid mechanism of
explicit-implicit reasoning driving CoT's performance in pattern-based ICL:
while explicit reasoning falters due to LLMs' struggles to infer underlying
patterns from demonstrations, implicit reasoning, disrupted by the increased
contextual distance of CoT rationales, often compensates, delivering correct
answers despite flawed rationales. This hybrid mechanism explains CoT's
relative underperformance, as noise from weak explicit inference undermines the
process, even as implicit mechanisms partially salvage outcomes. Notably, even
long-CoT reasoning models, which excel in abstract and symbolic reasoning, fail
to fully overcome these limitations despite higher computational costs. Our
findings challenge existing assumptions regarding the universal efficacy of
CoT, yielding novel insights into its limitations and guiding future research
toward more nuanced and effective reasoning methodologies for LLMs.
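To make the experimental contrast concrete, the sketch below shows how the two prompting styles compared in the abstract might be constructed for a pattern-based ICL task. This is a hypothetical illustration, not the paper's code; the toy pattern (reversing a letter sequence), the demonstration format, and all function names are assumptions for exposition only.

```python
# Hypothetical sketch of the two prompting styles contrasted in the study.
# The toy task (reverse a letter sequence) stands in for a pattern-based
# ICL benchmark; the actual datasets and prompt templates differ.

DEMOS = [("a b c", "c b a"), ("x y z", "z y x")]
QUERY = "p q r"

def direct_prompt(demos, query):
    """Direct answering: demonstrations map inputs straight to outputs,
    keeping each answer adjacent to its input."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

def cot_prompt(demos, query):
    """CoT prompting: a rationale is elicited before each answer, which
    (per the paper's analysis) increases the contextual distance between
    a demonstration's input and its output."""
    lines = [
        f"Input: {x}\nLet's think step by step.\nAnswer: {y}"
        for x, y in demos
    ]
    lines.append(f"Input: {query}\nLet's think step by step.")
    return "\n\n".join(lines)

print(direct_prompt(DEMOS, QUERY))
print()
print(cot_prompt(DEMOS, QUERY))
```

The structural difference the paper highlights is visible here: in the direct prompt each output immediately follows its input, whereas the CoT prompt interposes rationale text, which the authors identify as a source of disruption for implicit pattern matching.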
Authors (10)
Tianshi Zheng
Yixiang Chen
Chengxi Li
Chunyang Li
Qing Zong
Haochen Shi
+4 more
Key Contributions
Demonstrates that Chain-of-Thought (CoT) prompting consistently underperforms direct answering in pattern-based in-context learning (ICL) across 16 LLMs and nine datasets, challenging the prevailing view of CoT's benefits. The study attributes this to a hybrid mechanism of explicit-implicit reasoning: explicit reasoning falters because models struggle to infer the underlying pattern from demonstrations, while implicit reasoning is disrupted by the added contextual distance that CoT rationales introduce.
Business Value
Provides crucial insights for optimizing prompt engineering strategies, potentially leading to more efficient and effective LLM applications by avoiding suboptimal prompting techniques.