Abstract
We present an information-theoretic framework for discrete diffusion models
that yields principled estimators of log-likelihood using score-matching
losses. Inspired by the I-MMSE identity for the Gaussian setting, we derive
analogous results for the discrete setting. Specifically, we introduce the
Information-Minimum Denoising Score Entropy (I-MDSE) relation, which links
mutual information between data and its diffused version to the minimum
denoising score entropy (DSE) loss. We extend this theory to masked diffusion
and establish the Information-Minimum Denoising Cross-Entropy (I-MDCE)
relation, connecting cross-entropy losses to mutual information in discrete
masked processes. These results provide a time-integral decomposition of the
log-likelihood of the data in terms of optimal score-based losses, showing that
commonly used losses such as DSE and DCE are not merely variational bounds but
tight and principled estimators of log-likelihood. The I-MDCE decomposition
further enables practical extensions, including a time-free formula, conditional
likelihood estimation in prompt-response tasks, and coupled Monte Carlo
estimation of likelihood ratios. Experiments on synthetic and real-world data
confirm the accuracy, variance stability, and utility of our estimators. The
code is publicly available at https://github.com/Dongjae0324/infodis.
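For background, the Gaussian I-MMSE identity (Guo, Shamai, and Verdú, 2005) referenced in the abstract relates the derivative of mutual information to the minimum mean-square error; the paper's I-MDSE and I-MDCE relations play the analogous role in the discrete setting, with optimal DSE/DCE losses in place of the MMSE. A standard statement of the Gaussian identity, for reference:

```latex
% Classical I-MMSE identity: for Y = \sqrt{snr}\, X + N,
% with standard Gaussian noise N independent of X,
\frac{\mathrm{d}}{\mathrm{d}\,\mathrm{snr}}\,
  I\big(X;\, \sqrt{\mathrm{snr}}\, X + N\big)
  = \tfrac{1}{2}\,\mathrm{mmse}(\mathrm{snr}),
\qquad
\mathrm{mmse}(\mathrm{snr})
  = \mathbb{E}\!\left[\big(X - \mathbb{E}[X \mid Y]\big)^{2}\right].
```

Integrating both sides over snr recovers the mutual information from the MMSE curve; the time-integral decomposition of the log-likelihood described above is the discrete analogue of this integration.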
Authors (4)
Moongyu Jeon
Sangwoo Shin
Dongjae Jeon
Albert No
Submitted
October 28, 2025
Key Contributions
This paper introduces an information-theoretic framework for discrete diffusion models, providing principled estimators of log-likelihood using score-matching losses. It derives the Information-Minimum Denoising Score Entropy (I-MDSE) and Information-Minimum Denoising Cross-Entropy (I-MDCE) relations, showing that common losses like DSE and DCE are tight estimators of log-likelihood, not just variational bounds.
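As a concrete illustration of how a time-integral decomposition of this kind can be turned into a likelihood estimator, here is a minimal Monte Carlo sketch for a masked diffusion model. The denoiser interface (`denoiser_logits`), the masking schedule `alpha(t)`, and the time weighting `weight(t)` are illustrative assumptions, not the paper's exact I-MDCE formulation; see the linked repository for the authors' implementation.

```python
# Hypothetical sketch: Monte Carlo estimate of -log p(x) for a masked
# diffusion model by averaging weighted denoising cross-entropy (DCE)
# losses over random diffusion times. Schedule, weighting, and function
# names are assumptions, not the paper's exact estimator.
import torch
import torch.nn.functional as F

MASK_ID = 0  # assumed id of the [MASK] token

def alpha(t: torch.Tensor) -> torch.Tensor:
    """Assumed linear masking schedule: fraction of tokens kept at time t."""
    return 1.0 - t

def weight(t: torch.Tensor) -> torch.Tensor:
    """Assumed time weighting; the correct w(t) depends on the schedule."""
    return 1.0 / (1.0 - alpha(t)).clamp(min=1e-6)

@torch.no_grad()
def mc_nll(denoiser_logits, x: torch.Tensor, n_samples: int = 128) -> float:
    """Estimate -log p(x) for token ids x of shape (seq_len,) by averaging
    weighted DCE losses over uniformly sampled diffusion times."""
    total = 0.0
    for _ in range(n_samples):
        t = torch.rand(())                                  # t ~ Uniform(0, 1)
        keep = torch.rand_like(x, dtype=torch.float) < alpha(t)
        x_t = torch.where(keep, x, torch.full_like(x, MASK_ID))
        logits = denoiser_logits(x_t, t)                    # (seq_len, vocab)
        ce = F.cross_entropy(logits, x, reduction="none")   # per-token loss
        total += (weight(t) * ce[~keep].sum()).item()       # masked positions
    return total / n_samples
```

The design point this sketch makes explicit is that the estimator's variance depends on the time weighting and on how t is sampled; the time-free formula mentioned in the abstract suggests a variant that avoids sampling t altogether.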
Business Value
Enables more reliable and interpretable generative models for discrete data, potentially improving applications in areas like natural language generation and molecular design.