Abstract
In this paper we tackle a fundamental question: "Can we train latent diffusion
models together with the variational auto-encoder (VAE) tokenizer in an
end-to-end manner?" Traditional deep-learning wisdom dictates that end-to-end
training is often preferable when possible. However, for latent diffusion
transformers, end-to-end training of both the VAE and the diffusion model using
the standard diffusion loss is observed to be ineffective, even degrading final
performance. We show that while the diffusion loss is ineffective, end-to-end
training can be unlocked through the representation-alignment (REPA) loss,
allowing both the VAE and the diffusion model to be jointly tuned during
training. Despite its simplicity, the proposed training recipe (REPA-E) shows
remarkable performance, speeding up diffusion-model training by over 17x and
45x relative to the REPA and vanilla training recipes, respectively.
Interestingly, we observe that end-to-end tuning with REPA-E also improves the
VAE itself, leading to improved latent-space structure and downstream
generation performance. In terms of final performance, our approach sets a new
state-of-the-art, achieving FID scores of 1.12 and 1.69 with and without
classifier-free guidance on ImageNet 256×256. Code is available at
https://end2end-diffusion.github.io.
Authors (6)
Xingjian Leng
Jaskirat Singh
Yunzhong Hou
Zhenchang Xing
Saining Xie
Liang Zheng
Key Contributions
This paper introduces REPA-E, a training recipe that enables end-to-end training of latent diffusion models (LDMs) together with their VAE tokenizer. Where the standard diffusion loss alone fails to support joint tuning, the representation-alignment (REPA) loss unlocks it, speeding up diffusion-model training by up to 45x and improving the VAE itself.
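To make the idea concrete, here is a minimal, hypothetical sketch of what a combined training objective could look like: a standard diffusion (noise-prediction) MSE term plus a representation-alignment term that encourages the diffusion model's internal features to match features from a pretrained visual encoder. The function names, the λ weighting, and the use of plain cosine similarity are illustrative assumptions on my part, not the paper's actual implementation, which should be consulted at the linked code release.

```python
import math


def cosine_sim(u, v):
    """Cosine similarity between two feature vectors (plain lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def repa_e_loss(pred_noise, true_noise, diff_feats, enc_feats, lam=0.5):
    """Toy combined objective (illustrative only, not the paper's code).

    pred_noise / true_noise: diffusion model's noise prediction vs. target.
    diff_feats: intermediate features from the diffusion transformer.
    enc_feats: features from a frozen pretrained encoder (e.g. a vision model).
    lam: hypothetical weight balancing the two terms.
    """
    # Standard diffusion loss: MSE between predicted and true noise.
    l_diff = sum((p - t) ** 2 for p, t in zip(pred_noise, true_noise)) / len(true_noise)
    # REPA-style alignment loss: maximize cosine similarity, so minimize its negative.
    l_repa = -cosine_sim(diff_feats, enc_feats)
    return l_diff + lam * l_repa
```

In the actual method, per the abstract, it is the REPA term (not the diffusion term) whose gradients make end-to-end tuning of the VAE effective; how gradients are routed between the two terms is a detail the full paper specifies.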
Business Value
Drastically reduces the time and computational resources needed to train powerful generative models like LDMs, making them more accessible and faster to iterate on for various applications.