Abstract
We present an information-theoretic framework for discrete diffusion models
that yields principled estimators of log-likelihood using score-matching
losses. Inspired by the I-MMSE identity for the Gaussian setting, we derive
analogous results for the discrete setting. Specifically, we introduce the
Information-Minimum Denoising Score Entropy (I-MDSE) relation, which links
mutual information between data and its diffused version to the minimum
denoising score entropy (DSE) loss. We extend this theory to masked diffusion
and establish the Information-Minimum Denoising Cross-Entropy (I-MDCE)
relation, connecting cross-entropy losses to mutual information in discrete
masked processes. These results provide a time-integral decomposition of the
log-likelihood of the data in terms of optimal score-based losses, showing that
commonly used losses such as DSE and DCE are not merely variational bounds but
tight and principled estimators of log-likelihood. The I-MDCE decomposition
further enables practical extensions, including a time-free formula, conditional
likelihood estimation in prompt-response tasks, and coupled Monte Carlo
estimation of likelihood ratios. Experiments on synthetic and real-world data
confirm the accuracy, variance stability, and utility of our estimators. The
code is publicly available at https://github.com/Dongjae0324/infodis.
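For background, the Gaussian I-MMSE identity (Guo, Shamai, and Verdú, 2005) referenced in the abstract relates the derivative of mutual information to the minimum mean-square error; the paper's I-MDSE and I-MDCE relations play the analogous role in the discrete setting, with optimal DSE/DCE losses in place of the MMSE. A standard statement of the Gaussian identity, for reference:

```latex
% Classical I-MMSE identity: for Y = \sqrt{snr}\, X + N,
% with standard Gaussian noise N independent of X,
\frac{\mathrm{d}}{\mathrm{d}\,\mathrm{snr}}\,
  I\big(X;\, \sqrt{\mathrm{snr}}\, X + N\big)
  = \tfrac{1}{2}\,\mathrm{mmse}(\mathrm{snr}),
\qquad
\mathrm{mmse}(\mathrm{snr})
  = \mathbb{E}\!\left[\big(X - \mathbb{E}[X \mid Y]\big)^{2}\right].
```

Integrating both sides over snr recovers the mutual information from the MMSE curve; the time-integral decomposition of the log-likelihood described above is the discrete analogue of this integration.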
Authors (4)
Moongyu Jeon
Sangwoo Shin
Dongjae Jeon
Albert No
Submitted
October 28, 2025
Key Contributions
This paper introduces an information-theoretic framework for discrete diffusion models, providing principled estimators of log-likelihood using score-matching losses. It derives the Information-Minimum Denoising Score Entropy (I-MDSE) and Information-Minimum Denoising Cross-Entropy (I-MDCE) relations, showing that common losses like DSE and DCE are tight estimators of log-likelihood, not just variational bounds.
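As a concrete illustration of how a time-integral decomposition of this kind can be turned into a likelihood estimator, here is a minimal Monte Carlo sketch for a masked diffusion model. The denoiser interface (`denoiser_logits`), the masking schedule `alpha(t)`, and the time weighting `weight(t)` are illustrative assumptions, not the paper's exact I-MDCE formulation; see the linked repository for the authors' implementation.

```python
# Hypothetical sketch: Monte Carlo estimate of -log p(x) for a masked
# diffusion model by averaging weighted denoising cross-entropy (DCE)
# losses over random diffusion times. Schedule, weighting, and function
# names are assumptions, not the paper's exact estimator.
import torch
import torch.nn.functional as F

MASK_ID = 0  # assumed id of the [MASK] token

def alpha(t: torch.Tensor) -> torch.Tensor:
    """Assumed linear masking schedule: fraction of tokens kept at time t."""
    return 1.0 - t

def weight(t: torch.Tensor) -> torch.Tensor:
    """Assumed time weighting; the correct w(t) depends on the schedule."""
    return 1.0 / (1.0 - alpha(t)).clamp(min=1e-6)

@torch.no_grad()
def mc_nll(denoiser_logits, x: torch.Tensor, n_samples: int = 128) -> float:
    """Estimate -log p(x) for token ids x of shape (seq_len,) by averaging
    weighted DCE losses over uniformly sampled diffusion times."""
    total = 0.0
    for _ in range(n_samples):
        t = torch.rand(())                                  # t ~ Uniform(0, 1)
        keep = torch.rand_like(x, dtype=torch.float) < alpha(t)
        x_t = torch.where(keep, x, torch.full_like(x, MASK_ID))
        logits = denoiser_logits(x_t, t)                    # (seq_len, vocab)
        ce = F.cross_entropy(logits, x, reduction="none")   # per-token loss
        total += (weight(t) * ce[~keep].sum()).item()       # masked positions
    return total / n_samples
```

The design point this sketch makes explicit is that the estimator's variance depends on the time weighting and on how t is sampled; the time-free formula mentioned in the abstract suggests a variant that avoids sampling t altogether.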
Business Value
Enables more reliable and interpretable generative models for discrete data, potentially improving applications in areas like natural language generation and molecular design.