Abstract
Human cognition is theorized to operate in two modes: fast, intuitive System
1 thinking and slow, deliberate System 2 thinking. While current Large
Reasoning Models (LRMs) excel at System 2 thinking, their inability to perform
fast thinking leads to high computational overhead and latency. In this work,
we enable LRMs to approximate human intelligence through dynamic thinking speed
adjustment, optimizing accuracy-efficiency trade-offs. Our approach addresses
two key questions: (1) how to control thinking speed in LRMs, and (2) when to
adjust it for optimal performance. For the first question, we identify the
steering vector that governs slow-fast thinking transitions in LRMs'
representation space. Using this vector, we achieve the first representation
editing-based test-time scaling effect, outperforming existing prompt-based
scaling methods. For the second question, we apply real-time difficulty
estimation to signal reasoning segments of varying complexity. Combining these
techniques, we propose the first reasoning strategy that enables fast
processing of easy steps and deeper analysis for complex reasoning. Without any
training or additional cost, our plug-in module delivers an average +1.3%
accuracy gain with 8.6% fewer tokens across leading LRMs and advanced reasoning
benchmarks. All of our algorithms are implemented on vLLM and are expected to
support broader applications and inspire future research.
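The steering-vector idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the difference-of-means construction, the `alpha` coefficient, and the toy data are all assumptions, and the actual extraction procedure and vLLM integration are not shown.

```python
import numpy as np

def steering_vector(slow_states, fast_states):
    # Hypothetical difference-of-means construction: the direction in
    # representation space separating slow-thinking from fast-thinking states.
    v = slow_states.mean(axis=0) - fast_states.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(hidden, v, alpha):
    # Edit a hidden state along the steering direction at inference time.
    # alpha > 0 pushes toward slow (deliberate) thinking,
    # alpha < 0 toward fast (intuitive) thinking.
    return hidden + alpha * v

# Toy example: two clusters of hidden states in a 4-d representation space.
rng = np.random.default_rng(0)
slow = rng.normal(1.0, 0.1, size=(8, 4))   # states from slow-thinking traces
fast = rng.normal(-1.0, 0.1, size=(8, 4))  # states from fast-thinking traces
v = steering_vector(slow, fast)

h = np.zeros(4)                    # some hidden state during decoding
h_slow = steer(h, v, alpha=2.0)    # deeper analysis for a hard step
h_fast = steer(h, v, alpha=-2.0)   # quick processing for an easy step
```

Because the edit is a single vector addition per hidden state, it adds no training and negligible inference cost, which is consistent with the "plug-in, no additional cost" claim.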
Authors (9)
Zhengkai Lin
Zhihang Fu
Ze Chen
Chao Chen
Liang Xie
Wenxiao Wang
+3 more
Key Contributions
This work enables Large Reasoning Models (LRMs) to approximate human intelligence by dynamically adjusting their thinking speed, optimizing the accuracy-efficiency trade-off. It identifies a steering vector in the representation space that governs slow-fast thinking transitions and uses it for representation editing-based test-time scaling, which outperforms prompt-based methods. Real-time difficulty estimation then signals when to adjust speed, yielding the first controllable fast-thinking capability in LRMs.
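The "when to adjust" half of the contribution can be sketched as a simple gate. Everything here is a hypothetical stand-in: the mean negative log-probability proxy for segment difficulty, the threshold, and the coefficient values are assumptions, not the paper's estimator.

```python
import math

def segment_difficulty(token_logprobs):
    # Proxy difficulty signal: mean negative log-probability of the tokens
    # in a reasoning segment (higher = the model is less certain).
    return -sum(token_logprobs) / len(token_logprobs)

def steering_strength(difficulty, threshold=1.0, alpha_slow=2.0, alpha_fast=-2.0):
    # Map estimated difficulty to a steering coefficient: hard segments get
    # slow, deliberate thinking; easy segments get fast processing.
    return alpha_slow if difficulty > threshold else alpha_fast

easy = [math.log(0.9)] * 5  # confident tokens -> low difficulty
hard = [math.log(0.3)] * 5  # uncertain tokens -> high difficulty
```

Combined with a steering edit per decoding step, a gate like this realizes the abstract's strategy of fast processing for easy steps and deeper analysis for complex ones.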
Business Value
Reducing latency and computational overhead in reasoning models can significantly lower operational costs and improve user experience for AI-powered applications requiring complex reasoning, such as advanced chatbots, decision support systems, and automated analysis tools.