Abstract
Human cognition is theorized to operate in two modes: fast, intuitive System
1 thinking and slow, deliberate System 2 thinking. While current Large
Reasoning Models (LRMs) excel at System 2 thinking, their inability to perform
fast thinking leads to high computational overhead and latency. In this work,
we enable LRMs to approximate human intelligence through dynamic thinking speed
adjustment, optimizing accuracy-efficiency trade-offs. Our approach addresses
two key questions: (1) how to control thinking speed in LRMs, and (2) when to
adjust it for optimal performance. For the first question, we identify the
steering vector that governs slow-fast thinking transitions in LRMs'
representation space. Using this vector, we achieve the first representation
editing-based test-time scaling effect, outperforming existing prompt-based
scaling methods. For the second question, we apply real-time difficulty
estimation to signal reasoning segments of varying complexity. Combining these
techniques, we propose the first reasoning strategy that enables fast
processing of easy steps and deeper analysis for complex reasoning. Without any
training or additional cost, our plug-in module delivers an average +1.3%
accuracy gain with 8.6% fewer tokens across leading LRMs and advanced reasoning
benchmarks. All of our algorithms are implemented on vLLM and are expected to
support broader applications and inspire future research.
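The steering-vector idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the difference-of-means construction, the `alpha` coefficient, and the toy data are all assumptions, and the actual extraction procedure and vLLM integration are not shown.

```python
import numpy as np

def steering_vector(slow_states, fast_states):
    # Hypothetical difference-of-means construction: the direction in
    # representation space separating slow-thinking from fast-thinking states.
    v = slow_states.mean(axis=0) - fast_states.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(hidden, v, alpha):
    # Edit a hidden state along the steering direction at inference time.
    # alpha > 0 pushes toward slow (deliberate) thinking,
    # alpha < 0 toward fast (intuitive) thinking.
    return hidden + alpha * v

# Toy example: two clusters of hidden states in a 4-d representation space.
rng = np.random.default_rng(0)
slow = rng.normal(1.0, 0.1, size=(8, 4))   # states from slow-thinking traces
fast = rng.normal(-1.0, 0.1, size=(8, 4))  # states from fast-thinking traces
v = steering_vector(slow, fast)

h = np.zeros(4)                    # some hidden state during decoding
h_slow = steer(h, v, alpha=2.0)    # deeper analysis for a hard step
h_fast = steer(h, v, alpha=-2.0)   # quick processing for an easy step
```

Because the edit is a single vector addition per hidden state, it adds no training and negligible inference cost, which is consistent with the "plug-in, no additional cost" claim.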
Authors (9)
Zhengkai Lin
Zhihang Fu
Ze Chen
Chao Chen
Liang Xie
Wenxiao Wang
+3 more
Key Contributions
This work enables Large Reasoning Models (LRMs) to approximate human intelligence by dynamically adjusting their thinking speed, optimizing the accuracy-efficiency trade-off. It identifies a steering vector in the representation space that governs slow-fast thinking transitions and uses it for representation editing-based test-time scaling, which outperforms prompt-based methods. Real-time difficulty estimation then signals when to adjust speed, yielding the first controllable fast-thinking capability in LRMs.
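The "when to adjust" half of the contribution can be sketched as a simple gate. Everything here is a hypothetical stand-in: the mean negative log-probability proxy for segment difficulty, the threshold, and the coefficient values are assumptions, not the paper's estimator.

```python
import math

def segment_difficulty(token_logprobs):
    # Proxy difficulty signal: mean negative log-probability of the tokens
    # in a reasoning segment (higher = the model is less certain).
    return -sum(token_logprobs) / len(token_logprobs)

def steering_strength(difficulty, threshold=1.0, alpha_slow=2.0, alpha_fast=-2.0):
    # Map estimated difficulty to a steering coefficient: hard segments get
    # slow, deliberate thinking; easy segments get fast processing.
    return alpha_slow if difficulty > threshold else alpha_fast

easy = [math.log(0.9)] * 5  # confident tokens -> low difficulty
hard = [math.log(0.3)] * 5  # uncertain tokens -> high difficulty
```

Combined with a steering edit per decoding step, a gate like this realizes the abstract's strategy of fast processing for easy steps and deeper analysis for complex ones.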
Business Value
Reducing latency and computational overhead in reasoning models can significantly lower operational costs and improve user experience for AI-powered applications requiring complex reasoning, such as advanced chatbots, decision support systems, and automated analysis tools.