📄 Abstract
This paper presents a novel theoretical framework for understanding how
diffusion models can learn disentangled representations. Within this framework,
we establish identifiability conditions for general disentangled latent
variable models, analyze training dynamics, and derive sample complexity bounds
for disentangled latent subspace models. To validate our theory, we conduct
disentanglement experiments across diverse tasks and modalities, including
subspace recovery in latent subspace Gaussian mixture models, image
colorization, image denoising, and voice conversion for speech classification.
Additionally, our experiments show that training strategies inspired by this
theory, such as style guidance regularization, consistently improve
disentanglement performance.