Abstract
High-resolution image synthesis remains a core challenge in generative
modeling, particularly in balancing computational efficiency with the
preservation of fine-grained visual detail. We present Latent Wavelet Diffusion
(LWD), a lightweight training framework that significantly improves detail and
texture fidelity in ultra-high-resolution (2K-4K) image synthesis. LWD
introduces a novel, frequency-aware masking strategy derived from wavelet
energy maps, which dynamically focuses the training process on detail-rich
regions of the latent space. This is complemented by a scale-consistent VAE
objective to ensure high spectral fidelity. The primary advantage of our
approach is its efficiency: LWD requires no architectural modifications and
adds zero additional cost during inference, making it a practical solution for
scaling existing models. Across multiple strong baselines, LWD consistently
improves perceptual quality and FID scores, demonstrating the power of
signal-driven supervision as a principled and efficient path toward
high-resolution generative modeling.
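As a concrete illustration of the frequency-aware masking idea described above, the sketch below derives a per-region weight map from the high-frequency energy of a single-level 2-D wavelet transform of a latent channel. This is a minimal reconstruction under stated assumptions, not the authors' implementation: the Haar wavelet, the quantile threshold, the floor weight, and the name `wavelet_energy_mask` are all illustrative choices.

```python
import numpy as np
import pywt  # PyWavelets


def wavelet_energy_mask(latent, wavelet="haar", quantile=0.75, floor=0.25):
    """Per-pixel loss weights that emphasize detail-rich regions.

    latent : 2-D array (one latent channel). The mask is 1.0 where the
    local high-frequency wavelet energy falls in the top (1 - quantile)
    fraction of the map, and `floor` elsewhere.
    """
    # Single-level 2-D DWT: approximation cA plus three detail subbands.
    cA, (cH, cV, cD) = pywt.dwt2(latent, wavelet)
    # Local high-frequency energy, summed across the detail subbands.
    energy = cH ** 2 + cV ** 2 + cD ** 2
    # The subbands are half resolution; upsample back to the latent grid.
    up = np.kron(energy, np.ones((2, 2)))
    up = up[: latent.shape[0], : latent.shape[1]]
    # Soft binary mask: full weight on high-energy regions, floor elsewhere.
    threshold = np.quantile(up, quantile)
    return np.where(up >= threshold, 1.0, floor)


# Example: weight map for a 64x64 latent channel.
mask = wavelet_energy_mask(np.random.randn(64, 64).astype(np.float32))
```

In a diffusion training loop, such a map could weight the denoising loss elementwise, e.g. `loss = (mask * (eps_pred - eps) ** 2).mean()`, concentrating gradient signal on detail-rich latent regions. Consistent with the abstract's claim, this touches only the training objective: the architecture and the inference path are unchanged.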