Abstract
Large language models (LLMs) demonstrate strong reasoning abilities with
chain-of-thought prompting, but explicit reasoning introduces substantial
computational overhead. Recent work on latent reasoning reduces this cost by
reasoning in latent space without explicit supervision, but performance drops
significantly. Our preliminary experiments suggest that this degradation stems
from the unstructured latent space, which makes fitting latent tokens
difficult. To address this, we restrict the latent space to the column space of
the LLM vocabulary, treating latent reasoning as a superposition over
vocabulary probabilities. Once latent reasoning concludes, it collapses into an
eigenstate of explicit reasoning to yield the final answer. Based on this idea,
we propose Latent-SFT, a two-stage learning framework. In the first stage, we
design two specialized attention masks to guide the Latent Token Encoder in
generating latent tokens, allowing the LLM to produce the correct answer
conditioned on them. In the second stage, the Latent Token Encoder is
discarded, and the LLM is directly trained to generate these latent tokens
autonomously for latent reasoning, optimized with KL and CE losses. Latent-SFT
sets a new state of the art on GSM8k, matching explicit SFT performance while
shortening reasoning chains by up to a factor of 4 and outperforming prior
latent methods. On Math500 and AIME24, lexical probability-based latent
reasoning also clearly surpasses hidden-state-based approaches. Our metrics of
effective compression rate and effective global parallelism further show that
latent reasoning acts both as compression of a single reasoning path and as a
superposition of multiple paths.
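The core construction, a latent token formed as a probability-weighted superposition of vocabulary embeddings rather than an unconstrained hidden state, can be illustrated with a short sketch. This is an interpretation of the abstract, not the authors' released code; the function name, tensor shapes, and the softmax temperature are assumptions.

```python
# Minimal sketch (assumed, not the authors' implementation): a latent token is
# the expectation of the vocabulary embeddings under the model's own
# next-token distribution, so it stays in the column space spanned by the
# vocabulary embedding matrix.
import torch
import torch.nn.functional as F

def latent_token_from_hidden(hidden, embed_matrix, lm_head_weight, temperature=1.0):
    """
    hidden:         (d_model,)            last-layer hidden state at the latent position
    embed_matrix:   (vocab_size, d_model) input embedding table
    lm_head_weight: (vocab_size, d_model) output (unembedding) projection
    """
    # Vocabulary distribution induced by the hidden state.
    logits = lm_head_weight @ hidden                 # (vocab_size,)
    probs = F.softmax(logits / temperature, dim=-1)  # superposition weights

    # Probability-weighted mixture of token embeddings: a point in the
    # subspace spanned by the vocabulary embeddings.
    latent = probs @ embed_matrix                    # (d_model,)
    return latent, probs

# "Collapsing into an eigenstate of explicit reasoning" would then amount to
# committing to a single token, e.g. taking probs.argmax() once the latent
# phase ends and decoding explicitly from there.
```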
Authors (9)
Jingcheng Deng
Liang Pang
Zihao Wei
Shichen Xu
Zenghao Duan
Kun Xu
+3 more
Submitted
October 17, 2025
Key Contributions
This paper treats latent reasoning in LLMs as a superposition over vocabulary probabilities, restricting the latent space to the column space of the LLM vocabulary. It introduces the two-stage Latent-SFT framework, in which specialized attention masks guide latent token generation, reducing computational overhead while maintaining reasoning performance. A rough sketch of the second-stage objective follows below.
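As an illustration of that second stage (training the LLM itself to emit the latent tokens, supervised by KL and CE losses), the sketch below combines a KL term on the latent-position vocabulary distributions with a CE term on the explicit answer tokens. Which positions each loss covers and the weighting between them are assumptions here, not details from the paper.

```python
# Hedged sketch of a stage-2 objective: KL on latent-token distributions
# (teacher = stage-1 Latent Token Encoder outputs) plus cross-entropy on the
# explicit answer tokens. The position split and `kl_weight` are illustrative.
import torch
import torch.nn.functional as F

def latent_sft_stage2_loss(student_logits, teacher_probs, answer_logits,
                           answer_labels, kl_weight=1.0):
    """
    student_logits: (num_latent, vocab) logits at latent positions (LLM being trained)
    teacher_probs:  (num_latent, vocab) target distributions from the stage-1 encoder
    answer_logits:  (num_answer, vocab) logits at explicit answer positions
    answer_labels:  (num_answer,)       gold answer token ids
    """
    # KL divergence between teacher distributions and the student's
    # predicted distributions at the latent positions.
    kl = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  teacher_probs, reduction="batchmean")

    # Standard cross-entropy on the explicit answer tokens.
    ce = F.cross_entropy(answer_logits, answer_labels)

    return kl_weight * kl + ce
```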
Business Value
Enables more computationally efficient LLMs for complex reasoning tasks, potentially leading to faster and cheaper AI solutions for problem-solving and decision support.