📄 Abstract
In spite of the remarkable potential of Latent Diffusion Models (LDMs) in
image generation, the desired properties and optimal design of the autoencoders
have been underexplored. In this work, we analyze the role of autoencoders in
LDMs and identify three key properties: latent smoothness, perceptual
compression quality, and reconstruction quality. We demonstrate that existing
autoencoders fail to simultaneously satisfy all three properties, and propose
Variational Masked AutoEncoders (VMAEs), taking advantage of the hierarchical
features maintained by Masked AutoEncoders. We integrate VMAEs into the LDM
framework, introducing Latent Diffusion Models with Masked AutoEncoders
(LDMAEs). Our code is available at https://github.com/isno0907/ldmae.
Authors
Junho Lee
Jeongwoo Shin
Hyungwook Choi
Joonseok Lee
Key Contributions
This paper analyzes the crucial role of autoencoders in Latent Diffusion Models (LDMs) and identifies key properties (latent smoothness, perceptual compression, reconstruction quality). It proposes Variational Masked Autoencoders (VMAEs), which leverage hierarchical features from Masked Autoencoders, to address the limitations of existing autoencoders and improve LDM performance.
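To make the two ingredients concrete, the sketch below illustrates (with toy NumPy code, not the paper's actual architecture) the two mechanisms a VMAE combines: MAE-style random patch masking, which encourages hierarchical perceptual features, and the VAE reparameterization trick, which encourages latent smoothness. All function names, shapes, and the mask ratio here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(patches, mask_ratio=0.75, rng=rng):
    """Randomly keep a fraction of patches, as in a Masked AutoEncoder."""
    n = patches.shape[0]
    keep = max(1, int(round(n * (1 - mask_ratio))))
    idx = np.sort(rng.permutation(n)[:keep])
    return patches[idx], idx

def reparameterize(mu, logvar, rng=rng):
    """VAE reparameterization trick: z = mu + sigma * eps (smooth latent)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Toy "image": 16 patches, each an 8-dim feature vector.
patches = rng.standard_normal((16, 8))
visible, idx = mask_patches(patches, mask_ratio=0.75)  # MAE masking step

# Stand-in encoder heads: pool visible patches into (mu, logvar).
mu, logvar = visible.mean(axis=0), np.zeros(8)
z = reparameterize(mu, logvar)                         # VAE sampling step
print(visible.shape, z.shape)                          # (4, 8) (8,)
```

The point of the combination is that masking forces the encoder to learn structure-aware features, while the stochastic latent keeps nearby codes decodable to similar images, which is the smoothness property the diffusion model relies on.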
Business Value
Leads to more efficient and higher-quality image generation systems, benefiting applications in digital art, content creation, and synthetic data generation. Improved latent space representations can also aid in downstream tasks.