
DTS: Enhancing Large Reasoning Models via Decoding Tree Sketching

📄 Abstract

Large Reasoning Models (LRMs) demonstrate strong performance on complex reasoning tasks, yet they often suffer from overthinking, producing excessively long chain-of-thought (CoT) traces that increase inference cost and may degrade accuracy. Our analysis reveals a clear anti-correlation between reasoning length and accuracy: across multiple stochastic decodes, the shortest reasoning paths consistently achieve the highest correctness, while longer ones accumulate errors and repetitions. Such short, optimal reasoning paths could ideally be found by fully enumerating the reasoning space. However, the tree-structured reasoning space grows exponentially with sequence length, rendering exhaustive exploration infeasible. To address this, we propose DTS, a model-agnostic decoding framework that sketches the reasoning space by selectively branching at high-entropy tokens and applies early stopping to select the shortest completed reasoning path. This approach approximates the optimal solution, enhancing both efficiency and accuracy without requiring additional training or supervision. Experiments on the AIME2024 and AIME2025 datasets with DeepSeek-R1-Distill-Qwen-7B and 1.5B show that DTS improves accuracy by up to 8%, reduces average reasoning length by 23%, and decreases repetition frequency by 12%, demonstrating that DTS enables scalable and efficient LRM reasoning.
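The branching signal described above is token-level entropy. As a minimal sketch (not the authors' code), the Shannon entropy of a next-token probability distribution can be computed as:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A flat distribution (model is uncertain) has high entropy and marks a
# natural branch point; a peaked one (model is confident) has low entropy.
flat = [0.25, 0.25, 0.25, 0.25]    # entropy = ln(4), about 1.386 nats
peaked = [0.97, 0.01, 0.01, 0.01]  # entropy is much lower, about 0.168 nats
```

Under this view, "selectively branching at high-entropy tokens" means forking the decoding tree only where the model is genuinely uncertain about the next token.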
Authors (7)
Zicheng Xu
Guanchu Wang
Yu-Neng Chuang
Guangyao Zheng
Alexander S. Szalay
Zirui Liu
+1 more
Submitted
November 1, 2025
arXiv Category
cs.AI
arXiv PDF

Key Contributions

DTS is a model-agnostic decoding framework that enhances Large Reasoning Models (LRMs) by curbing 'overthinking' and its inference cost. It sketches the exponentially large reasoning space by selectively branching at high-entropy tokens and applies early stopping to return the shortest completed reasoning path, exploiting the observed anti-correlation between reasoning length and accuracy.
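The mechanism can be illustrated with a hedged sketch: a shortest-first search over the decoding tree that follows the greedy token at low-entropy steps, branches into the top-k tokens at high-entropy steps, and stops at the first (hence shortest) completed path. The `step_fn` interface, the threshold `tau`, and the toy model below are assumptions for illustration, not the paper's implementation:

```python
import heapq
import math

def entropy(dist):
    """Shannon entropy (nats) of a token distribution {token: prob}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def dts_decode(step_fn, eos="<eos>", tau=0.6, k=2, max_len=32):
    """Entropy-guided tree decoding with early stopping (illustrative sketch).

    step_fn(prefix) -> {token: prob} is a hypothetical model interface.
    Paths are expanded shortest-first, so the first path to emit `eos`
    is the shortest completed reasoning path, and the search stops there.
    """
    counter = 0  # tie-breaker so the heap never compares token lists
    frontier = [(0, counter, [])]
    while frontier:
        length, _, prefix = heapq.heappop(frontier)
        if prefix and prefix[-1] == eos:
            return prefix  # early stop: shortest completed path
        if length >= max_len:
            continue
        dist = step_fn(prefix)
        if entropy(dist) > tau:
            # High-entropy step: branch into the top-k candidate tokens.
            branches = sorted(dist, key=dist.get, reverse=True)[:k]
        else:
            # Low-entropy step: follow the single most likely token.
            branches = [max(dist, key=dist.get)]
        for tok in branches:
            counter += 1
            heapq.heappush(frontier, (length + 1, counter, prefix + [tok]))
    return None

def toy_model(prefix):
    # Hypothetical stand-in for an LRM's next-token distribution.
    if len(prefix) < 2:
        return {"think": 0.5, "step": 0.5}  # high entropy -> branch
    return {"<eos>": 0.9, "more": 0.1}      # low entropy -> greedy

path = dts_decode(toy_model)  # a 3-token path ending in "<eos>"
```

A real integration would replace `toy_model` with a call into the LRM's logits at each decoding step; the shortest-first frontier is what approximates full enumeration at a fraction of its cost.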

Business Value

Reduces the computational cost and latency of complex reasoning tasks performed by LLMs, making advanced AI reasoning capabilities more accessible and practical for real-time applications and resource-constrained environments.