Abstract
Diffusion models have achieved remarkable success across a wide range of
generative tasks. A key challenge is understanding the mechanisms that prevent
their memorization of training data and allow generalization. In this work, we
investigate the role of the training dynamics in the transition from
generalization to memorization. Through extensive experiments and theoretical
analysis, we identify two distinct timescales: an early time
$\tau_\mathrm{gen}$ at which models begin to generate high-quality samples, and
a later time $\tau_\mathrm{mem}$ beyond which memorization emerges. Crucially,
we find that $\tau_\mathrm{mem}$ increases linearly with the training set size
$n$, while $\tau_\mathrm{gen}$ remains constant. This creates a window of
training times, growing with $n$, in which models generalize effectively,
despite exhibiting strong memorization if training continues beyond it. It is only when $n$
becomes larger than a model-dependent threshold that overfitting disappears at
infinite training times. These findings reveal a form of implicit dynamical
regularization in the training dynamics, which allows models to avoid memorization even
in highly overparameterized settings. Our results are supported by numerical
experiments with standard U-Net architectures on realistic and synthetic
datasets, and by a theoretical analysis using a tractable random features model
studied in the high-dimensional limit.
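As a concrete illustration of how such a transition could be probed empirically, below is a minimal sketch (not from the paper; the function name, threshold, and toy data are all illustrative assumptions) of the kind of diagnostic that separates the regimes before and after $\tau_\mathrm{mem}$: measuring what fraction of generated samples collapse onto individual training points.

```python
# Hypothetical memorization diagnostic, assuming access to generated samples
# and the training set as arrays. All names and thresholds are illustrative.
import numpy as np

def memorization_fraction(generated, train, eps=1e-3):
    """Fraction of generated samples lying within eps of a training point.

    generated : (m, d) array of samples drawn from the model
    train     : (n, d) array of training data
    """
    # Pairwise squared distances between generated and training samples.
    d2 = ((generated[:, None, :] - train[None, :, :]) ** 2).sum(-1)
    nearest = np.sqrt(d2.min(axis=1))     # distance to closest training point
    return float((nearest < eps).mean())  # counted as memorized if essentially on top of it

# Toy illustration of the two regimes: near tau_gen the model produces novel
# samples far from any single training point; well past tau_mem it reproduces
# training points almost exactly.
rng = np.random.default_rng(0)
n, d = 200, 16
train = rng.standard_normal((n, d))

fresh = rng.standard_normal((100, d))       # stand-in for samples at t ~ tau_gen
copies = train[rng.integers(0, n, 100)]     # stand-in for samples at t >> tau_mem
print(memorization_fraction(fresh, train))  # ~0.0
print(memorization_fraction(copies, train)) # ~1.0
```

Tracking such a fraction as a function of training time, for several training set sizes $n$, is one way the linear growth of $\tau_\mathrm{mem}$ with $n$ could be exhibited.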
Authors (4)
Tony Bonnaire
Raphaël Urfin
Giulio Biroli
Marc Mézard
Key Contributions
This paper provides a theoretical and empirical explanation for why diffusion models can generalize well and resist memorization. It identifies two distinct training timescales, $\tau_\mathrm{gen}$ at which high-quality generation begins and $\tau_\mathrm{mem}$ beyond which memorization emerges, and demonstrates that $\tau_\mathrm{mem}$ grows linearly with the training set size, creating a widening window of training times in which models generalize.
Business Value
Deepens the understanding of diffusion models, enabling more reliable development and deployment of generative AI applications with predictable generalization behavior, which is crucial for creative industries, content generation, and synthetic data creation.