NFIG: Autoregressive Image Generation with Next-Frequency Prediction

Abstract

Autoregressive models have achieved significant success in image generation. However, standard autoregressive methods typically generate pixels sequentially in a fixed spatial order, which ignores the inherent hierarchical structure of image information in the spectral domain. To better leverage this spectral hierarchy, we introduce Next-Frequency Image Generation (NFIG), a framework that decomposes the image generation process into multiple frequency-guided stages. NFIG aligns generation with the natural structure of images by first generating low-frequency components, which capture global structure with significantly fewer tokens, and then progressively adding higher-frequency details. This frequency-aware paradigm not only improves the quality of generated images but also reduces inference cost, because global structure is established early. Extensive experiments on the ImageNet-256 benchmark validate NFIG's effectiveness, demonstrating superior performance (FID: 2.81) and a 1.25x speedup compared to the strong baseline VAR-d20. The source code is available at https://github.com/Pride-Huang/NFIG.
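To make the "low frequencies first, then finer detail" idea concrete, the sketch below splits an image into radial frequency bands with NumPy's FFT and sums them from coarse to fine. It only illustrates the spectral hierarchy the abstract refers to. It is not the NFIG tokenizer or model, and the function name `frequency_bands`, the equal-width radial bands, and the random test image are all assumptions made for this example.

```python
import numpy as np

def frequency_bands(image, num_bands):
    """Split a 2-D array into radial frequency bands, lowest band first."""
    h, w = image.shape
    # Centered spectrum: the zero-frequency (DC) bin sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    fy = np.arange(h) - h // 2
    fx = np.arange(w) - w // 2
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    # Equal-width radial bands (an illustrative choice, not taken from the paper).
    edges = np.linspace(0.0, radius.max(), num_bands + 1)
    bands = []
    for k in range(num_bands):
        lo, hi = edges[k], edges[k + 1]
        if k < num_bands - 1:
            mask = (radius >= lo) & (radius < hi)
        else:
            # The last band is closed on the right so the highest bin is kept.
            mask = (radius >= lo) & (radius <= hi)
        bands.append(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real)
    return bands

# Hypothetical stand-in for an image: random values on a 32 x 32 grid.
rng = np.random.default_rng(0)
image = rng.random((32, 32))

bands = frequency_bands(image, num_bands=4)
# partial[k] is the image rebuilt from bands 0..k (coarse to fine).
partial = np.cumsum(bands, axis=0)
errors = [float(np.linalg.norm(p - image)) for p in partial]
```

Each entry of `errors` is the distance between a partial reconstruction and the original image. Because the bands partition the spectrum, that distance cannot grow as bands are added, and the sum of all bands recovers the image up to floating-point error.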
Authors (8)
Zhihao Huang
Xi Qiu
Yukuo Ma
Yifu Zhou
Junjie Chen
Hongyuan Zhang
+2 more
Submitted
March 10, 2025
arXiv Category
cs.CV

Key Contributions

NFIG is an autoregressive image generation framework that exploits the spectral hierarchy of images by decomposing generation into frequency-guided stages. It first generates low-frequency components, which establish global structure with relatively few tokens, and then adds higher-frequency details. On ImageNet-256 the authors report an FID of 2.81 and a 1.25x inference speedup over the VAR-d20 baseline.
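One way to see why low-frequency content needs fewer values is that a band-limited image can be stored on a smaller grid without losing anything inside the band. The sketch below demonstrates this with plain spectral cropping in NumPy. It is a signal-processing illustration only: grid samples here are not NFIG tokens, and the helper names `lowpass_to_small_grid` and `zero_pad_to_full`, the odd crop size of 9, and the random test image are assumptions chosen for this example.

```python
import numpy as np

def lowpass_to_small_grid(image, size):
    """Keep the central size x size block of the centered spectrum
    and return it as a smaller image (use an odd `size`)."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    top, left = h // 2 - size // 2, w // 2 - size // 2
    cropped = spectrum[top:top + size, left:left + size]
    # Rescale so the mean intensity is unchanged on the smaller grid.
    small = np.fft.ifft2(np.fft.ifftshift(cropped)) * (size * size) / (h * w)
    return small.real

def zero_pad_to_full(small, shape):
    """Embed the small image's spectrum in a zero spectrum of the full size."""
    h, w = shape
    size = small.shape[0]
    spectrum = np.fft.fftshift(np.fft.fft2(small)) * (h * w) / (size * size)
    full = np.zeros((h, w), dtype=complex)
    top, left = h // 2 - size // 2, w // 2 - size // 2
    full[top:top + size, left:left + size] = spectrum
    return np.fft.ifft2(np.fft.ifftshift(full)).real

# Hypothetical stand-in for an image: random values on a 32 x 32 grid.
rng = np.random.default_rng(0)
image = rng.random((32, 32))

small = lowpass_to_small_grid(image, size=9)      # 81 values instead of 1024
restored = zero_pad_to_full(small, image.shape)   # low-pass image at full size
```

Here `restored` is the low-pass version of `image` at full resolution, yet it is fully determined by the 9 x 9 array `small`. Cropping `restored` again returns the same `small`, so the coarse stage carries all of the low-frequency information in under a tenth of the samples.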

Business Value

NFIG could make it faster and cheaper to generate high-quality synthetic images for applications such as game development, virtual reality, and creative content production, potentially lowering production costs.