
Latent Reasoning in LLMs as a Vocabulary-Space Superposition

Abstract

Large language models (LLMs) demonstrate strong reasoning abilities with chain-of-thought prompting, but explicit reasoning introduces substantial computational overhead. Recent work on latent reasoning reduces this cost by reasoning in latent space without explicit supervision, but performance drops significantly. Our preliminary experiments suggest that this degradation stems from the unstructured latent space, which makes fitting latent tokens difficult. To address this, we restrict the latent space to the column space of the LLM vocabulary, treating latent reasoning as a superposition over vocabulary probabilities. Once latent reasoning concludes, it collapses into an eigenstate of explicit reasoning to yield the final answer. Based on this idea, we propose Latent-SFT, a two-stage learning framework. In the first stage, we design two specialized attention masks to guide the Latent Token Encoder in generating latent tokens, allowing the LLM to produce the correct answer conditioned on them. In the second stage, the Latent Token Encoder is discarded, and the LLM is directly trained to generate these latent tokens autonomously for latent reasoning, optimized with KL and CE losses. Latent-SFT sets a new state of the art on GSM8k, matching explicit SFT performance while cutting reasoning chains by up to 4 times and outperforming prior latent methods. On Math500 and AIME24, lexical probability-based latent reasoning also clearly surpasses hidden-state-based approaches. Our metrics of effective compression rate and effective global parallelism further show that latent reasoning is both the compression of a single path and the superposition of multiple paths.
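The abstract says the second stage is optimized with KL and CE losses but does not give the objective. As a rough sketch only, one plausible reading is a KL term that pulls the LLM's predicted vocabulary distribution at each latent position toward a target distribution, plus a cross-entropy term on the explicit answer tokens. The function names, the `kl_weight` parameter, and the way the two terms are combined are all assumptions made for illustration, not the paper's formulation.

```python
import math

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions over the vocabulary."""
    return sum(
        t * math.log((t + eps) / (q + eps))
        for t, q in zip(target, predicted)
        if t > 0  # terms with zero target probability contribute nothing
    )

def cross_entropy(predicted, gold_id, eps=1e-12):
    """Negative log-probability assigned to the gold token."""
    return -math.log(predicted[gold_id] + eps)

def stage_two_loss(latent_targets, latent_preds, answer_preds, answer_ids, kl_weight=1.0):
    """Hypothetical combined objective: weighted KL on latent steps plus CE on answer tokens."""
    kl = sum(kl_divergence(t, q) for t, q in zip(latent_targets, latent_preds))
    ce = sum(cross_entropy(q, y) for q, y in zip(answer_preds, answer_ids))
    return kl_weight * kl + ce

# Toy example: one latent step over a 3-token vocabulary, one answer token.
loss = stage_two_loss(
    latent_targets=[[0.7, 0.2, 0.1]],
    latent_preds=[[0.6, 0.3, 0.1]],
    answer_preds=[[0.1, 0.8, 0.1]],
    answer_ids=[1],
)
```

In this sketch the KL term vanishes when the predicted latent distribution matches its target exactly, leaving only the answer cross-entropy.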
Authors (9)
Jingcheng Deng
Liang Pang
Zihao Wei
Shichen Xu
Zenghao Duan
Kun Xu
+3 more
Submitted
October 17, 2025
arXiv Category
cs.CL

Key Contributions

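This paper treats latent reasoning in LLMs as a superposition over vocabulary probabilities, restricting latent tokens to the column space of the LLM vocabulary. It introduces Latent-SFT, a two-stage framework: specialized attention masks first guide a Latent Token Encoder in generating latent tokens, and the LLM is then trained with KL and CE losses to generate those tokens on its own. The aim is to reduce the computational overhead of explicit chain-of-thought reasoning while maintaining reasoning performance.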
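To make the superposition idea concrete, here is a minimal pure-Python sketch, assuming a latent token is formed as a probability-weighted mix of token embeddings, so that it always lies in the span of the vocabulary embeddings. The toy embedding table, the helper names (`softmax`, `latent_token`, `collapse`), and the use of argmax for the collapse step are illustrative assumptions, not details taken from the paper.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution over the vocabulary."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def latent_token(probs, embedding):
    """Mix token embeddings by probability: z = sum_i probs[i] * embedding[i]."""
    dim = len(embedding[0])
    return [sum(p * row[d] for p, row in zip(probs, embedding)) for d in range(dim)]

def collapse(probs):
    """Pick the single most likely token id (assumed here to be a plain argmax)."""
    return max(range(len(probs)), key=probs.__getitem__)

# Toy vocabulary of 3 tokens with 2-dimensional embeddings (made-up values).
EMBEDDING = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

probs = softmax([2.0, 1.0, 0.0])   # a soft distribution: several tokens at once
z = latent_token(probs, EMBEDDING) # the latent token fed back instead of a word
token_id = collapse(probs)         # the explicit token once reasoning ends
```

A one-hot distribution reproduces an ordinary token embedding exactly, which is why explicit reasoning can be seen as a special case of this representation.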

Business Value

Enables more computationally efficient LLM reasoning. The abstract reports matching explicit SFT performance on GSM8k while cutting reasoning chains by up to 4 times, which could lead to faster and cheaper AI solutions for problem-solving and decision support.