Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking

📄 Abstract

Masked diffusion models (MDM) are powerful generative models for discrete data that generate samples by progressively unmasking tokens in a sequence. Each token can take one of two states: masked or unmasked. We observe that token sequences often remain unchanged between consecutive sampling steps; consequently, the model repeatedly processes identical inputs, leading to redundant computation. To address this inefficiency, we propose the Partial masking scheme (Prime), which augments MDM by allowing tokens to take intermediate states interpolated between the masked and unmasked states. This design enables the model to make predictions based on partially observed token information, and facilitates a fine-grained denoising process. We derive a variational training objective and introduce a simple architectural design to accommodate intermediate-state inputs. Our method demonstrates superior performance across a diverse set of generative modeling tasks. On text data, it achieves a perplexity of 15.36 on OpenWebText, outperforming previous MDM (21.52), autoregressive models (17.54), and their hybrid variants (17.58), without relying on an autoregressive formulation. On image data, it attains competitive FID scores of 3.26 on CIFAR-10 and 6.98 on ImageNet-32, comparable to leading continuous generative models.
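
To make the intermediate-state idea concrete, the sketch below (a hypothetical illustration, not the authors' code) assumes each token id is expanded into a fixed number of base-B sub-tokens via an invertible mapping; the forward process then masks sub-tokens independently, so a token whose sub-tokens are only partly masked sits in an intermediate state between fully masked and unmasked.

```python
# Minimal sketch of partial masking with sub-token states.
# Assumptions (not from the paper's code): tokens are expanded into L base-B
# digits, and a single reserved symbol MASK marks a masked sub-token.
import random

B, L = 4, 3        # sub-token base and sub-tokens per token (illustrative choices)
MASK = B           # reserved mask symbol for a sub-token

def to_subtokens(token_id: int) -> list[int]:
    """Invertible base-B expansion of a token id into L sub-tokens."""
    digits = []
    for _ in range(L):
        digits.append(token_id % B)
        token_id //= B
    return digits[::-1]

def partial_mask(subtokens: list[int], keep_prob: float) -> list[int]:
    """Forward noising: mask each sub-token independently with prob 1 - keep_prob."""
    return [s if random.random() < keep_prob else MASK for s in subtokens]

random.seed(0)
seq = [5, 12, 63]  # token ids, each < B**L = 64
noisy = [partial_mask(to_subtokens(t), keep_prob=0.5) for t in seq]
print(noisy)       # e.g. [[0, 4, 1], ...] -- tokens with some sub-tokens surviving
```

Under a scheme like this, the denoiser can condition on the surviving sub-tokens of a partially masked token rather than on a single opaque mask symbol, which is the "partially observed token information" the abstract refers to.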
Authors (5)
Chen-Hao Chao
Wei-Fang Sun
Hanwen Liang
Chun-Yi Lee
Rahul G. Krishnan
Submitted
May 24, 2025
arXiv Category
cs.LG

Key Contributions

This paper proposes the Partial Masking scheme (Prime) for discrete diffusion models, which extends Masked Diffusion Models (MDM) by allowing tokens to take intermediate states between masked and unmasked. This enables a finer-grained denoising process and reduces redundant computation, since the model no longer repeatedly processes identical inputs across consecutive sampling steps. The method demonstrates superior performance across a diverse set of generative modeling tasks, particularly on text data; a toy illustration of the redundancy argument follows below.
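
The redundancy argument can be seen in a toy simulation (an illustrative assumption, not an experiment from the paper): under a uniform unmasking schedule, the chance that a whole reverse step reveals nothing, so the model reprocesses an identical input, shrinks as the number of independently maskable units grows, which is what sub-token states provide.

```python
# Toy simulation (hypothetical schedule and unit counts, not from the paper):
# count reverse steps in which no masked unit is revealed, i.e. steps where
# the denoiser's input is identical to the previous step's.
import random

def count_idle_steps(num_units: int, steps: int = 256, seed: int = 0) -> int:
    random.seed(seed)
    masked = set(range(num_units))
    idle = 0
    for t in range(steps, 0, -1):
        # Each still-masked unit is revealed with probability 1/t,
        # so everything is revealed by t == 1 (a uniform schedule).
        revealed = {u for u in masked if random.random() < 1.0 / t}
        if not revealed:
            idle += 1      # nothing changed: the model would redo identical work
        masked -= revealed
    return idle

seq_len, subtokens_per_token = 32, 3
print("token-level idle steps:    ", count_idle_steps(seq_len))
print("sub-token-level idle steps:", count_idle_steps(seq_len * subtokens_per_token))
```

The unit counts and schedule here are made up for illustration; the paper's reported gains come from its variational objective and architectural design, not from this toy model.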

Business Value

More efficient and effective generation of discrete data, such as text, can lead to improved AI writing assistants, code generation tools, and more sophisticated content creation platforms.