
Controlling Thinking Speed in Reasoning Models

Abstract

Human cognition is theorized to operate in two modes: fast, intuitive System 1 thinking and slow, deliberate System 2 thinking. While current Large Reasoning Models (LRMs) excel at System 2 thinking, their inability to perform fast thinking leads to high computational overhead and latency. In this work, we enable LRMs to approximate human intelligence through dynamic thinking-speed adjustment, optimizing accuracy-efficiency trade-offs. Our approach addresses two key questions: (1) how to control thinking speed in LRMs, and (2) when to adjust it for optimal performance. For the first question, we identify the steering vector that governs slow-fast thinking transitions in LRMs' representation space. Using this vector, we achieve the first representation editing-based test-time scaling effect, outperforming existing prompt-based scaling methods. For the second question, we apply real-time difficulty estimation to signal reasoning segments of varying complexity. Combining these techniques, we propose the first reasoning strategy that enables fast processing of easy steps and deeper analysis of complex reasoning. Without any training or additional cost, our plug-in module delivers an average +1.3% accuracy with -8.6% token usage across leading LRMs and advanced reasoning benchmarks. All of our algorithms are implemented on top of vLLM and are expected to support broader applications and inspire future research.
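The core mechanism described above, shifting a layer's hidden states along a precomputed slow-fast "thinking" direction, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `apply_steering`, the sign convention for `alpha`, and the assumption that the steering vector comes from something like a mean difference of slow- vs. fast-thinking activations are all assumptions for exposition.

```python
import numpy as np

def apply_steering(hidden: np.ndarray, steer_vec: np.ndarray, alpha: float) -> np.ndarray:
    """Shift hidden states along a slow<->fast thinking direction.

    hidden:    (seq_len, d) array of layer activations
    steer_vec: (d,) steering vector (assumed precomputed offline,
               e.g. as a mean difference of slow vs. fast traces)
    alpha:     signed strength; here alpha > 0 is taken to push toward
               slower/deeper thinking and alpha < 0 toward faster
               thinking (sign convention assumed, not from the paper)
    """
    v = steer_vec / np.linalg.norm(steer_vec)  # unit-normalize the direction
    return hidden + alpha * v                  # broadcast over the sequence

# Example: steering a toy 3-token, 2-dim activation matrix
hidden = np.zeros((3, 2))
vec = np.array([2.0, 0.0])
steered = apply_steering(hidden, vec, alpha=1.5)
```

Because the edit is a single vector addition per layer, it adds essentially no compute at inference time, which is consistent with the paper's "without any training or additional cost" claim.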
Authors (9)
Zhengkai Lin
Zhihang Fu
Ze Chen
Chao Chen
Liang Xie
Wenxiao Wang
+3 more
Submitted
July 4, 2025
arXiv Category
cs.CL

Key Contributions

This work enables Large Reasoning Models (LRMs) to approximate human intelligence by dynamically adjusting their thinking speed, optimizing the accuracy-efficiency trade-off. It identifies a steering vector in the models' representation space and uses it for representation editing-based test-time scaling, outperforming prompt-based scaling methods. It further applies real-time difficulty estimation to decide when to adjust speed, yielding the first controllable fast-thinking capability in LRMs.
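The "when to adjust" half of the method can be sketched as a difficulty-gated choice of steering strength. The paper's actual difficulty estimator is not specified here, so this sketch substitutes next-token entropy as a hypothetical per-step difficulty proxy; the thresholds (`low`, `high`) and the `fast`/`slow` strengths are illustrative placeholders, not values from the paper.

```python
import math

def token_entropy(probs):
    """Shannon entropy of a next-token distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose_alpha(probs, low=0.5, high=1.5, fast=-2.0, slow=1.0):
    """Map a per-step difficulty proxy to a steering strength.

    A confident (low-entropy) step is treated as easy and steered
    toward fast thinking; an uncertain (high-entropy) step is treated
    as hard and steered toward slow thinking. All thresholds and
    strengths are assumed placeholders for illustration.
    """
    h = token_entropy(probs)
    if h < low:
        return fast   # easy segment: speed up
    if h > high:
        return slow   # hard segment: slow down
    return 0.0        # ambiguous: leave activations untouched

# A peaked distribution (model is confident) selects the fast setting:
easy_alpha = choose_alpha([0.97, 0.01, 0.01, 0.01])
```

In a vLLM-style serving loop, a gate like this would run per decoding step and feed its output into the representation edit, so easy spans are skimmed quickly while genuinely hard spans receive deeper deliberation.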

Business Value

Reducing latency and computational overhead in reasoning models can significantly lower operational costs and improve user experience for AI-powered applications requiring complex reasoning, such as advanced chatbots, decision support systems, and automated analysis tools.