
Understanding Reasoning in Thinking Language Models via Steering Vectors

Abstract

Recent advances in large language models (LLMs) have led to the development of thinking language models that generate extensive internal reasoning chains before producing responses. While these models achieve improved performance, controlling their reasoning processes remains challenging. This work presents a steering approach for thinking LLMs by analyzing and manipulating specific reasoning behaviors in DeepSeek-R1-Distill models. Through a systematic experiment on 500 tasks across 10 diverse categories, we identify several reasoning behaviors exhibited by thinking models, including expressing uncertainty, generating examples for hypothesis validation, and backtracking in reasoning chains. We demonstrate that these behaviors are mediated by linear directions in the model's activation space and can be controlled using steering vectors. By extracting and applying these vectors, we provide a method to modulate specific aspects of the model's reasoning process, such as its tendency to backtrack or express uncertainty. Our approach offers practical tools for steering reasoning processes in thinking models in a controlled and interpretable manner. We validate our steering method on three DeepSeek-R1-Distill models, demonstrating consistent control across different model architectures.
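The core idea of extracting a steering vector as a linear direction in activation space can be sketched as follows. This is a minimal illustration with synthetic activations, not the paper's implementation: the difference-of-means construction, the function names, and the dimensions are our own assumptions.

```python
import numpy as np

def compute_steering_vector(pos_acts, neg_acts):
    # Difference-of-means direction between activations from contexts where
    # the target behavior (e.g. backtracking) occurs vs. does not occur.
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return v / np.linalg.norm(v)  # unit-normalize the direction

# Synthetic data: plant a known "behavior" direction in the positive set.
rng = np.random.default_rng(0)
direction = rng.normal(size=8)
direction /= np.linalg.norm(direction)
pos = rng.normal(size=(32, 8)) + 2.0 * direction  # behavior present
neg = rng.normal(size=(32, 8))                    # behavior absent

v = compute_steering_vector(pos, neg)
recovery = float(v @ direction)  # cosine similarity to the planted direction
```

On this toy data the extracted vector closely recovers the planted direction, which is the property the steering method relies on.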
Authors (5)
Constantin Venhoff
Iván Arcuschin
Philip Torr
Arthur Conmy
Neel Nanda
Submitted
June 22, 2025
arXiv Category
cs.LG
arXiv PDF

Key Contributions

Presents a steering approach for 'thinking' LLMs using steering vectors derived from activation space analysis. Identifies and demonstrates control over specific reasoning behaviors (e.g., expressing uncertainty, backtracking) in DeepSeek-R1-Distill models, offering a method to modulate their reasoning processes.
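Applying a steering vector at inference time amounts to adding a scaled direction to a layer's residual-stream activations. A minimal sketch using a PyTorch forward hook on a stand-in layer is shown below; the tiny module, the hook, and the scaling parameter `alpha` are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    # Stand-in for one transformer layer whose output we steer.
    def __init__(self, d):
        super().__init__()
        self.proj = nn.Linear(d, d)

    def forward(self, x):
        return self.proj(x)

d = 8
block = TinyBlock(d)
steer = torch.randn(d)
steer = steer / steer.norm()  # unit-norm steering direction
alpha = 4.0                   # steering strength (assumed hyperparameter)

def steering_hook(module, inputs, output):
    # Add the scaled steering vector to the layer's output activations.
    return output + alpha * steer

handle = block.register_forward_hook(steering_hook)
x = torch.zeros(1, d)
out_steered = block(x)
handle.remove()               # detach the hook to restore normal behavior
out_plain = block(x)
```

The steered and unsteered outputs differ exactly by `alpha * steer`, which is how activation addition modulates a behavior without retraining the model.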

Business Value

Increases the reliability and predictability of LLM outputs, crucial for applications requiring trustworthy reasoning, such as legal analysis or complex problem-solving.