📄 Abstract
Diffusion Transformer models can generate images with remarkable fidelity and
detail, yet training them at ultra-high resolutions remains extremely costly
due to the self-attention mechanism's quadratic scaling with the number of
image tokens. In this paper, we introduce Dynamic Position Extrapolation
(DyPE), a novel, training-free method that enables pre-trained diffusion
transformers to synthesize images at resolutions far beyond their training
data, with no additional sampling cost. DyPE takes advantage of the spectral
progression inherent to the diffusion process, where low-frequency structures
converge early, while high frequencies take more steps to resolve.
Specifically, DyPE dynamically adjusts the model's positional encoding at each
diffusion step, matching its frequency spectrum with the current stage of the
generative process. This approach allows us to generate images at resolutions
that exceed the training resolution dramatically, e.g., 16 million pixels using
FLUX. On multiple benchmarks, DyPE consistently improves performance and
achieves state-of-the-art fidelity in ultra-high-resolution image generation,
with gains becoming even more pronounced at higher resolutions. The project page is
available at https://noamissachar.github.io/DyPE/.
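
To make the idea of a step-dependent positional encoding concrete, here is a minimal sketch in Python. It is not the formulation from the paper: it assumes a rotary-style encoding, a simple position-interpolation rescaling, and a hypothetical power-law schedule. The names `dynamic_scale`, `rope_angles`, and the `exponent` parameter are illustrative assumptions. The sketch only shows the general pattern the abstract describes, where positions are rescaled most strongly at early, noisy steps (when low-frequency structure forms) and the rescaling relaxes toward the original encoding as high-frequency detail resolves.

```python
import numpy as np

def rope_frequencies(head_dim, base=10000.0):
    """Standard rotary inverse frequencies for one spatial axis."""
    return base ** (-np.arange(0, head_dim, 2) / head_dim)

def dynamic_scale(t, max_scale, exponent=2.0):
    """Hypothetical schedule (an assumption, not the paper's formula).

    t = 1.0 is pure noise (first step), t = 0.0 is the final clean image.
    Returns max_scale at t = 1 and decays to 1.0 (no rescaling) at t = 0.
    """
    return 1.0 + (max_scale - 1.0) * t ** exponent

def rope_angles(positions, t, train_len, target_len, head_dim):
    """Rotary angles with a time-dependent position-interpolation factor."""
    max_scale = target_len / train_len      # how far we extrapolate
    s = dynamic_scale(t, max_scale)         # current rescaling factor
    freqs = rope_frequencies(head_dim)
    # Compress positions by s so early steps stay inside the trained range.
    return np.outer(positions / s, freqs)

# Example: a model trained on 64 tokens per axis, sampled at 256.
positions = np.arange(256)
early = rope_angles(positions, t=1.0, train_len=64, target_len=256, head_dim=8)
late = rope_angles(positions, t=0.0, train_len=64, target_len=256, head_dim=8)
```

In this toy setup the largest angle at the first step matches what the model saw during training (positions compressed by a factor of 4), while at the last step the positions are left unscaled. The actual schedule and the encoding variants DyPE builds on are described in the paper and on the project page.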
Authors (6)
Noam Issachar
Guy Yariv
Sagie Benaim
Yossi Adi
Dani Lischinski
Raanan Fattal
Submitted
October 23, 2025
Key Contributions
Introduces Dynamic Position Extrapolation (DyPE), a novel, training-free method that allows pre-trained diffusion transformers to generate images at ultra-high resolutions far beyond their training data, without additional sampling cost. DyPE dynamically adjusts positional encodings based on the diffusion process stage.
Business Value
Democratizes the creation of ultra-high resolution imagery, enabling applications in fields requiring extreme detail, such as high-fidelity art, detailed scientific visualizations, and immersive virtual environments.