
No Compute Left Behind: Rethinking Reasoning and Sampling with Masked Diffusion Models


Abstract

Masked diffusion language models (MDLMs) are trained to in-fill positions in randomly masked sequences, in contrast to next-token prediction models. Discussions around MDLMs focus on two benefits: (1) any-order decoding and (2) multi-token decoding. However, we observe that for math and coding tasks, any-order algorithms often underperform or behave similarly to left-to-right sampling, and standard multi-token decoding significantly degrades performance. At inference time, MDLMs compute the conditional distribution of all masked positions. A natural question is: How can we justify this additional compute when left-to-right one-token-at-a-time decoding is on par with any-order decoding algorithms? First, we propose reasoning-as-infilling. By using MDLMs to infill a reasoning template, we can structure outputs and distinguish between reasoning and answer tokens. In turn, this enables measuring answer uncertainty during reasoning, and early exits when the model converges on an answer. Next, given an answer, reasoning-as-infilling enables sampling from the MDLM posterior over reasoning traces conditioned on the answer, providing a new source of high-quality data for post-training. On GSM8k, we observe that fine-tuning LLaDA-8B Base on its posterior reasoning traces provides a performance boost on par with fine-tuning on human-written reasoning traces. Additionally, given an answer, reasoning-as-infilling provides a method for scoring the correctness of the reasoning process at intermediate steps. Second, we propose multi-token entropy decoding (MED), a simple adaptive sampler that minimizes the error incurred by decoding positions in parallel based on the conditional entropies of those positions. MED preserves performance across benchmarks and leads to 2.7x fewer steps. Our work demonstrates that the training and compute used by MDLMs unlock many new inference and post-training methods.
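The abstract describes MED only at a high level: it decides which masked positions to decode in parallel from their conditional entropies. The sketch below illustrates one plausible selection rule under stated assumptions, and it is not the authors' exact algorithm. The function names, the fixed `threshold` cutoff, and the toy distributions are all hypothetical; a real MDLM would supply the per-position distributions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_positions(masked_dists, threshold):
    """Pick the masked positions to decode in the current step.

    masked_dists: dict mapping position -> probability list over the vocabulary.
    threshold: hypothetical entropy cutoff in nats (an assumption of this sketch).
    """
    entropies = {pos: entropy(p) for pos, p in masked_dists.items()}
    # Decode in parallel every position the model is already confident about.
    chosen = [pos for pos, h in entropies.items() if h <= threshold]
    if not chosen:
        # Fall back to the single lowest-entropy position so decoding progresses.
        chosen = [min(entropies, key=entropies.get)]
    return sorted(chosen)


# Toy conditional distributions for three masked positions (made-up numbers).
dists = {
    3: [0.98, 0.01, 0.01],
    4: [0.40, 0.35, 0.25],
    5: [0.97, 0.02, 0.01],
}
print(select_positions(dists, threshold=0.2))   # confident positions only
print(select_positions(dists, threshold=0.05))  # falls back to one position
```

With a loose cutoff the sampler decodes several low-entropy positions at once, and with a strict cutoff it degrades gracefully to one token per step, which is the adaptive behaviour the abstract attributes to MED.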

Authors (7)

Zachary Horvitz

Raghav Singhal

Hao Zou

Carles Domingo-Enrich

Zhou Yu

Rajesh Ranganath

+1 more

Submitted

October 22, 2025

arXiv Category

cs.LG


Key Contributions

Proposes 'reasoning-as-infilling' for Masked Diffusion Language Models (MDLMs), enabling structured outputs and distinguishing reasoning from answers. This allows for measuring answer uncertainty and implementing early exits, justifying the compute cost of MDLMs beyond simple left-to-right decoding.
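To make the template idea concrete, here is a minimal sketch of a reasoning-then-answer template and an entropy-based early-exit check. It rests on assumptions: the mask string, the `Reasoning:` and `Answer:` delimiters, the whitespace tokenization, and the `tolerance` value are all hypothetical and are not taken from the paper. The answer distributions would come from an MDLM's conditional predictions at the answer slots.

```python
import math

MASK = "<mask>"  # hypothetical mask token string


def build_template(question, n_reasoning, n_answer):
    """Return (tokens, answer_positions) for a reasoning-then-answer template."""
    tokens = question.split() + ["Reasoning:"] + [MASK] * n_reasoning
    tokens += ["Answer:"]
    start = len(tokens)
    tokens += [MASK] * n_answer
    return tokens, list(range(start, start + n_answer))


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_exit_early(answer_dists, tolerance):
    """True when the summed entropy over the answer slots is below tolerance."""
    return sum(entropy(p) for p in answer_dists) < tolerance


tokens, answer_pos = build_template("What is 2 + 3 ?", n_reasoning=4, n_answer=1)
print(tokens)
print(answer_pos)

# Made-up answer-slot distributions: one confident, one undecided.
print(should_exit_early([[0.99, 0.01]], tolerance=0.1))
print(should_exit_early([[0.5, 0.5]], tolerance=0.1))
```

Because the answer slots sit at known positions, their uncertainty can be read off while the reasoning slots are still being filled, which is what makes an early exit possible.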

Business Value

Enhances the capabilities of LLMs for complex reasoning tasks like solving math problems or generating code, leading to more powerful AI assistants and tools.

Paper Metadata

Innovation Type

Methodological

Deployment Feasibility

Moderate; requires MDLM architectures and matching inference strategies.

Limitations Addressed

On math and coding tasks, any-order decoding often underperforms or merely matches left-to-right sampling, and standard multi-token decoding significantly degrades performance, which leaves the extra inference compute of MDLMs without a clear justification.

Performance Gains

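Enables structured reasoning, answer-uncertainty measurement, and early exits. MED preserves benchmark performance with 2.7x fewer decoding steps, and on GSM8k, fine-tuning LLaDA-8B Base on its own posterior reasoning traces gives a boost on par with fine-tuning on human-written traces.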

Technical Tags

Masked Diffusion Language Models (MDLMs), Infilling, Any-order Decoding, Multi-token Decoding, Reasoning, Sampling, Conditional Distribution, Language Models

Research Topics

Language Model Architectures, Reasoning in LLMs, Decoding Strategies, Conditional Generation, Computational Efficiency

Methods & Architectures

Reasoning-as-infilling, Structured output generation, Answer uncertainty measurement, Early exit mechanisms, Masked Diffusion Language Models (MDLMs)

Applications & Tasks

Natural Language Processing, AI Reasoning, Code Generation, Mathematical Problem Solving, Reasoning, Conditional Generation, Infilling

Improving reasoning and sampling in MDLMs, particularly for math and coding tasks.

Related Fields

Natural Language Processing, Deep Learning, Artificial Intelligence, Computational Linguistics

Keywords

Masked Diffusion Language Models, MDLM, Infilling, Reasoning, LLM, Decoding, Conditional Generation, Uncertainty Quantification, Early Exit, Math Problems, Code Generation

Academic Context

#Language Model Architectures, #Reasoning in LLMs, #Decoding Strategies, #Conditional Generation, #Computational Efficiency

Commercial Potential

Potential Products

Advanced AI reasoning engines, Code generation tools, Mathematical problem solvers

Target Industries

Technology, Software Development, Education, Research

Use Case Examples

AI assistants that can solve complex math problems step by step; automated code generation and debugging tools

Competitive Edge

Offers a novel approach to leverage MDLMs for reasoning tasks, potentially outperforming standard autoregressive models in specific domains.

Market Opportunity

Large and growing market for advanced LLM capabilities.

Revenue Models

API access, specialized AI services.

Resource Requirements

Compute Needs

High, as MDLMs are computationally intensive.

Data Requirements

Large text and code datasets, potentially structured reasoning datasets.

Deployment Constraints

High computational cost, specialized model architecture.

Scalability

Scalability is a challenge due to the computational demands of MDLMs.

Production Readiness

Maturity Level

Research

Time to Market

3-5 years for practical, widespread deployment.
