📄 Abstract
We study discrete diffusion for language and other categorical data and focus
on a common limitation of masked denoisers: reverse transitions typically
factorize across positions, which can weaken joint structure and degrade
quality in few-step generation. We propose \emph{Latent Discrete Diffusion
Models} (LDDMs), which couple a masked discrete diffusion over tokens with a
continuous diffusion over latent embeddings. The latent channel provides a
softer signal and carries cross-token dependencies that help resolve
ambiguities. We present two instantiations: (i) FUJI-LDDMs, which perform fully
joint denoising of tokens and latents, and (ii) SEQ-LDDMs, which sequentially
resolve the latent and then the discrete chain conditionally on it. For both
variants we derive ELBO-style objectives and discuss design choices that keep the
learned latents informative yet amenable to diffusion modeling. In experiments, LDDMs
yield improvements on unconditional generation metrics as compared to
state-of-the-art masked discrete diffusion baselines, and are effective at
lower sampling budgets, where unmasking many tokens per step is desirable.
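To make the two-chain construction concrete, here is a minimal sketch of a SEQ-LDDM-style sampling loop, assuming PyTorch and placeholder denoiser modules; the module architectures, noise schedules, and per-step unmasking heuristic are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch, not the authors' code: one plausible SEQ-LDDM-style sampler in which
# the continuous latent chain is resolved first and the masked token chain is then
# denoised conditionally on it. All names, schedules, and the unmasking heuristic
# are illustrative assumptions.
import torch
import torch.nn as nn

class LatentDenoiser(nn.Module):
    """Placeholder for the continuous diffusion model over latent embeddings."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Linear(dim, dim)
    def forward(self, z, t):
        return self.net(z)  # stand-in for one reverse-diffusion update of the latents

class TokenDenoiser(nn.Module):
    """Placeholder for the masked discrete denoiser, conditioned on the latents."""
    def __init__(self, dim, vocab):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)
    def forward(self, x, z, t):
        return self.head(self.emb(x) + z)  # token logits given current tokens + latents

@torch.no_grad()
def seq_lddm_sample(latent_denoiser, token_denoiser, seq_len, dim,
                    latent_steps=50, token_steps=8, mask_id=0):
    # 1) Resolve the continuous latent chain, starting from Gaussian noise.
    z = torch.randn(1, seq_len, dim)
    for t in reversed(range(1, latent_steps + 1)):
        z = latent_denoiser(z, t)
    # 2) Denoise the masked token chain conditionally on the resolved latents,
    #    unmasking a fraction of the remaining masked positions at each step.
    x = torch.full((1, seq_len), mask_id, dtype=torch.long)
    for t in reversed(range(1, token_steps + 1)):
        probs = token_denoiser(x, z, t).softmax(-1)             # (1, seq_len, vocab)
        proposal = torch.multinomial(probs[0], 1).squeeze(-1)   # sampled token per position
        masked = (x[0] == mask_id).nonzero().squeeze(-1)
        chosen = masked[torch.randperm(masked.numel())[:max(1, masked.numel() // t)]]
        x[0, chosen] = proposal[chosen]
    return x

# Example usage with toy dimensions.
tokens = seq_lddm_sample(LatentDenoiser(64), TokenDenoiser(64, vocab=1000),
                         seq_len=32, dim=64)
```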
Authors (3)
Dario Shariatian
Alain Durmus
Stefano Peluchetti
Submitted
October 20, 2025
Key Contributions
Latent Discrete Diffusion Models (LDDMs) address a common limitation of discrete diffusion by coupling a masked discrete diffusion over tokens with a continuous diffusion over latent embeddings. The latent channel carries cross-token dependencies, improving joint structure and few-step generation quality for categorical data such as language, and outperforms existing masked discrete diffusion baselines.
Business Value
Enables the generation of more coherent and higher-quality text and other discrete data, potentially leading to better AI writing assistants, creative tools, and more robust data augmentation techniques.