
DTS: Enhancing Large Reasoning Models via Decoding Tree Sketching

📄 Abstract

Large Reasoning Models (LRMs) demonstrate strong performance on complex reasoning tasks, yet they often suffer from overthinking, producing excessively long chain-of-thought (CoT) traces that increase inference cost and may degrade accuracy. Our analysis reveals a clear anti-correlation between reasoning length and accuracy: across multiple stochastic decodes, the shortest reasoning paths consistently achieve the highest correctness, while longer ones accumulate errors and repetitions. Such short, optimal reasoning paths could ideally be found by fully enumerating the reasoning space. However, the tree-structured reasoning space grows exponentially with sequence length, rendering exhaustive exploration infeasible. To address this, we propose DTS, a model-agnostic decoding framework that sketches the reasoning space by selectively branching at high-entropy tokens and applies early stopping to select the shortest completed reasoning path. This approach approximates the optimal solution, enhancing both efficiency and accuracy without requiring additional training or supervision. Experiments on the AIME2024 and AIME2025 datasets with DeepSeek-R1-Distill-Qwen-7B and 1.5B show that DTS improves accuracy by up to 8%, reduces average reasoning length by 23%, and decreases repetition frequency by 12%, demonstrating that DTS enables scalable and efficient LRM reasoning.
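The branching signal described above is token-level entropy. As a minimal sketch (not the authors' code), the Shannon entropy of a next-token probability distribution can be computed as:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A flat distribution (model is uncertain) has high entropy and marks a
# natural branch point; a peaked one (model is confident) has low entropy.
flat = [0.25, 0.25, 0.25, 0.25]    # entropy = ln(4), about 1.386 nats
peaked = [0.97, 0.01, 0.01, 0.01]  # entropy is much lower, about 0.168 nats
```

Under this view, "selectively branching at high-entropy tokens" means forking the decoding tree only where the model is genuinely uncertain about the next token.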
Authors (7)
Zicheng Xu
Guanchu Wang
Yu-Neng Chuang
Guangyao Zheng
Alexander S. Szalay
Zirui Liu
+1 more
Submitted
November 1, 2025
arXiv Category
cs.AI
arXiv PDF

Key Contributions

DTS is a model-agnostic decoding framework that enhances Large Reasoning Models (LRMs) by curbing 'overthinking' and its inference cost. It sketches the exponentially large reasoning space by selectively branching at high-entropy tokens and applies early stopping to return the shortest completed reasoning path, exploiting the observed anti-correlation between reasoning length and accuracy.
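The mechanism can be illustrated with a hedged sketch: a shortest-first search over the decoding tree that follows the greedy token at low-entropy steps, branches into the top-k tokens at high-entropy steps, and stops at the first (hence shortest) completed path. The `step_fn` interface, the threshold `tau`, and the toy model below are assumptions for illustration, not the paper's implementation:

```python
import heapq
import math

def entropy(dist):
    """Shannon entropy (nats) of a token distribution {token: prob}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def dts_decode(step_fn, eos="<eos>", tau=0.6, k=2, max_len=32):
    """Entropy-guided tree decoding with early stopping (illustrative sketch).

    step_fn(prefix) -> {token: prob} is a hypothetical model interface.
    Paths are expanded shortest-first, so the first path to emit `eos`
    is the shortest completed reasoning path, and the search stops there.
    """
    counter = 0  # tie-breaker so the heap never compares token lists
    frontier = [(0, counter, [])]
    while frontier:
        length, _, prefix = heapq.heappop(frontier)
        if prefix and prefix[-1] == eos:
            return prefix  # early stop: shortest completed path
        if length >= max_len:
            continue
        dist = step_fn(prefix)
        if entropy(dist) > tau:
            # High-entropy step: branch into the top-k candidate tokens.
            branches = sorted(dist, key=dist.get, reverse=True)[:k]
        else:
            # Low-entropy step: follow the single most likely token.
            branches = [max(dist, key=dist.get)]
        for tok in branches:
            counter += 1
            heapq.heappush(frontier, (length + 1, counter, prefix + [tok]))
    return None

def toy_model(prefix):
    # Hypothetical stand-in for an LRM's next-token distribution.
    if len(prefix) < 2:
        return {"think": 0.5, "step": 0.5}  # high entropy -> branch
    return {"<eos>": 0.9, "more": 0.1}      # low entropy -> greedy

path = dts_decode(toy_model)  # a 3-token path ending in "<eos>"
```

A real integration would replace `toy_model` with a call into the LRM's logits at each decoding step; the shortest-first frontier is what approximates full enumeration at a fraction of its cost.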

Business Value

Reduces the computational cost and latency of complex reasoning tasks performed by LLMs, making advanced AI reasoning capabilities more accessible and practical for real-time applications and resource-constrained environments.