Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training

Abstract

Diffusion models have achieved remarkable success across a wide range of generative tasks. A key challenge is understanding the mechanisms that prevent memorization of training data and allow generalization. In this work, we investigate the role of training dynamics in the transition from generalization to memorization. Through extensive experiments and theoretical analysis, we identify two distinct timescales: an early time $\tau_\mathrm{gen}$ at which models begin to generate high-quality samples, and a later time $\tau_\mathrm{mem}$ beyond which memorization emerges. Crucially, we find that $\tau_\mathrm{mem}$ increases linearly with the training set size $n$, while $\tau_\mathrm{gen}$ remains constant. This creates a window of training times, growing with $n$, in which models generalize effectively, even though they memorize strongly if training continues beyond it. Only when $n$ exceeds a model-dependent threshold does overfitting disappear at infinite training times. These findings reveal a form of implicit dynamical regularization in the training dynamics that allows memorization to be avoided even in highly overparameterized settings. Our results are supported by numerical experiments with standard U-Net architectures on realistic and synthetic datasets, and by a theoretical analysis of a tractable random features model studied in the high-dimensional limit.
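
The reported scaling suggests a simple early-stopping heuristic. Below is a minimal sketch of the resulting generalization window for a given training set size $n$; the constants (`TAU_GEN`, `TAU_MEM_PER_SAMPLE`) and the helper name are illustrative placeholders, not values or code from the paper.

```python
# Sketch of the growing generalization window implied by the paper's scaling:
# tau_gen is roughly constant in n, while tau_mem grows linearly with n.
# The constants below are hypothetical placeholders, not measured values.

TAU_GEN = 5_000           # steps until high-quality samples appear (assumed)
TAU_MEM_PER_SAMPLE = 2.0  # memorization-onset slope per training example (assumed)

def generalization_window(n: int) -> tuple[float, float]:
    """Return (tau_gen, tau_mem) for a training set of size n.

    tau_mem = slope * n, so the safe window [tau_gen, tau_mem]
    widens linearly as the dataset grows.
    """
    tau_mem = TAU_MEM_PER_SAMPLE * n
    return TAU_GEN, tau_mem

for n in (10_000, 100_000, 1_000_000):
    t_gen, t_mem = generalization_window(n)
    print(f"n={n:>9,}: stop training between {t_gen:,.0f} and {t_mem:,.0f} steps")
```

Under these assumptions, the window is already wide at moderate $n$ and, past the model-dependent threshold, the memorization phase is never reached in practice.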
Authors (4)
Tony Bonnaire
RaphaΓ«l Urfin
Giulio Biroli
Marc MΓ©zard
Submitted
May 23, 2025
arXiv Category
cs.LG

Key Contributions

This paper provides a theoretical and empirical explanation for why diffusion models can generalize rather than memorize their training data. It identifies two distinct training timescales, a generalization time $\tau_\mathrm{gen}$ and a memorization time $\tau_\mathrm{mem}$, and shows that memorization emerges only beyond $\tau_\mathrm{mem}$, which scales linearly with dataset size while $\tau_\mathrm{gen}$ stays constant, creating a growing window of training times in which generalization is robust.
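
A common way to operationalize the memorization onset is to track, across training checkpoints, the distance from each generated sample to its nearest training example: $\tau_\mathrm{mem}$ shows up as the point where this distance collapses toward zero. The sketch below implements that standard diagnostic; it is an assumption-level illustration, not necessarily the paper's exact measurement protocol.

```python
import numpy as np

def nearest_train_distance(generated: np.ndarray, train: np.ndarray) -> np.ndarray:
    """For each generated sample, the L2 distance to its nearest training example.

    generated: (m, d) array of flattened generated samples
    train:     (n, d) array of flattened training samples
    """
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    g2 = np.sum(generated**2, axis=1, keepdims=True)   # (m, 1)
    t2 = np.sum(train**2, axis=1)                      # (n,)
    d2 = g2 - 2.0 * generated @ train.T + t2           # (m, n)
    return np.sqrt(np.maximum(d2.min(axis=1), 0.0))    # clamp tiny negatives

# Usage sketch with random stand-in data: a collapsing median distance
# across checkpoints would signal the onset of memorization.
rng = np.random.default_rng(0)
train_data = rng.normal(size=(1000, 64))
samples = rng.normal(size=(100, 64))
print("median NN distance:", np.median(nearest_train_distance(samples, train_data)))
```

In practice this metric is computed at each checkpoint; it stays flat during the generalization window and drops sharply once training passes $\tau_\mathrm{mem}$.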

Business Value

Deepens the understanding of diffusion model training, enabling more reliable development and deployment of generative AI applications with predictable generalization behavior, which matters for creative industries, content generation, and synthetic data creation.