📄 Abstract
Diffusion models have achieved impressive success in high-fidelity image
generation but suffer from slow sampling due to their inherently iterative
denoising process. While recent one-step methods accelerate inference by
learning direct noise-to-image mappings, they sacrifice the interpretability
and fine-grained control intrinsic to diffusion dynamics, key advantages that
enable applications like editable generation. To resolve this dichotomy, we
introduce Hierarchical Koopman Diffusion, a novel framework that
achieves both one-step sampling and interpretable generative trajectories.
Grounded in Koopman operator theory, our method lifts the nonlinear diffusion
dynamics into a latent space where evolution is governed by globally linear
operators, enabling closed-form trajectory solutions. This formulation not only
eliminates iterative sampling but also provides full access to intermediate
states, allowing manual intervention during generation. To model the
multi-scale nature of images, we design a hierarchical architecture that
disentangles generative dynamics across spatial resolutions via scale-specific
Koopman subspaces, capturing coarse-to-fine details systematically. We
empirically show that Hierarchical Koopman Diffusion not only achieves
competitive one-step generation performance but also provides a principled
mechanism for interpreting and manipulating the generative process through
spectral analysis. Our framework bridges the gap between fast sampling and
interpretability in diffusion models, paving the way for explainable image
synthesis in generative modeling.
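
The central idea, linear latent dynamics with a closed-form solution, can be illustrated with a small sketch. This is not the authors' implementation. It assumes the lifted state z obeys dz/dt = A z for a single fixed matrix A, and it uses a random stable matrix as a hypothetical stand-in for a learned Koopman operator. The encoder, decoder, and hierarchical scale-specific subspaces are omitted, and the names make_stable_operator, latent_state, and intervene are invented for illustration.

```python
import numpy as np


def make_stable_operator(dim, seed=0):
    """Hypothetical stand-in for a learned operator: a random matrix
    shifted so that every eigenvalue has a negative real part."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((dim, dim))
    shift = np.max(np.linalg.eigvals(a).real) + 0.5
    return a - shift * np.eye(dim)


def latent_state(z0, generator, t):
    """Closed-form solution of dz/dt = A z: z(t) = V exp(Lambda t) V^{-1} z0.

    Assumes A is diagonalizable, which holds for almost every random matrix.
    """
    eigvals, eigvecs = np.linalg.eig(generator)
    # Coordinates of z0 in the eigenvector basis.
    coeffs = np.linalg.solve(eigvecs, z0.astype(complex))
    # Each spectral mode evolves independently as exp(lambda * t).
    return (eigvecs @ (np.exp(eigvals * t) * coeffs)).real


def intervene(z0, generator, t_edit, t_end, delta):
    """Jump to an intermediate state, edit it, then jump to the end time."""
    z_mid = latent_state(z0, generator, t_edit)
    return latent_state(z_mid + delta, generator, t_end - t_edit)


A = make_stable_operator(4)
z0 = np.array([1.0, -0.5, 0.25, 2.0])

# One evaluation reaches the final state; no iterative solver is needed.
z_final = latent_state(z0, A, 1.0)

# Any intermediate state is available at the same cost, and can be edited.
z_edited = intervene(z0, A, t_edit=0.5, t_end=1.0, delta=np.array([1.0, 0, 0, 0]))
```

Because the dynamics are linear, the effect of an edit at time 0.5 on the final state is simply the edit itself propagated by the same operator. In this toy setting, that is what makes manual intervention predictable, and the eigenvalues of A show how quickly each mode grows, decays, or oscillates.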
Authors (3)
Hanru Bai
Weiyang Ding
Difan Zou
Submitted
October 14, 2025
Key Contributions
Introduces Hierarchical Koopman Diffusion (HKD), a framework that achieves both one-step sampling and interpretable generative trajectories for diffusion models. Grounded in Koopman operator theory, HKD lifts the nonlinear diffusion dynamics into a latent space governed by globally linear operators, which yields closed-form trajectory solutions and full access to intermediate states for manual intervention.
Business Value
Enables faster and more controllable image generation, which can be valuable for creative industries, content creation, and personalized design applications.