NFIG: Autoregressive Image Generation with Next-Frequency Prediction

Abstract

Autoregressive models have achieved significant success in image generation. However, unlike the inherent hierarchical structure of image information in the spectral domain, standard autoregressive methods typically generate pixels sequentially in a fixed spatial order. To better leverage this spectral hierarchy, we introduce Next-Frequency Image Generation (NFIG). NFIG is a novel framework that decomposes the image generation process into multiple frequency-guided stages. NFIG aligns the generation process with the natural image structure. It does this by first generating low-frequency components, which efficiently capture global structure with significantly fewer tokens, and then progressively adding higher-frequency details. This frequency-aware paradigm offers substantial advantages: it not only improves the quality of generated images but crucially reduces inference cost by efficiently establishing global structure early on. Extensive experiments on the ImageNet-256 benchmark validate NFIG's effectiveness, demonstrating superior performance (FID: 2.81) and a notable 1.25x speedup compared to the strong baseline VAR-d20. The source code is available at https://github.com/Pride-Huang/NFIG.
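To make the low-to-high frequency ordering concrete, here is a minimal sketch of a coarse-to-fine spectral decomposition. It is an illustration under stated assumptions, not the NFIG implementation: it uses a plain 2-D FFT with radial band masks on a random grayscale array, and the function names (`radial_frequency`, `split_into_bands`, `progressive_reconstructions`) and cutoff values are hypothetical. The paper's actual decomposition may operate on learned tokens rather than raw pixels.

```python
import numpy as np

def radial_frequency(h, w):
    """Radial frequency (cycles per pixel) of every FFT bin of an h x w image."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)

def split_into_bands(image, cutoffs):
    """Split a 2-D image into len(cutoffs) + 1 bands, lowest frequency first.

    The cutoffs are illustrative assumptions, not values from the paper.
    """
    spectrum = np.fft.fft2(image)
    radius = radial_frequency(*image.shape)
    edges = [0.0, *cutoffs, np.inf]
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius >= lo) & (radius < hi)  # keep one ring of frequencies
        bands.append(np.fft.ifft2(spectrum * mask).real)
    return bands

def progressive_reconstructions(bands):
    """Running sums of the bands: stage k holds bands 0..k added together."""
    return list(np.cumsum(np.stack(bands), axis=0))

rng = np.random.default_rng(0)
image = rng.random((32, 32))  # stand-in for a real grayscale image
bands = split_into_bands(image, cutoffs=[0.05, 0.15, 0.30])
stages = progressive_reconstructions(bands)

# L2 reconstruction error after each stage; it shrinks as bands are added.
errors = [float(np.linalg.norm(image - stage)) for stage in stages]
print(errors)
```

Because the band masks partition the spectrum, the final stage reproduces the input, and each earlier stage is a progressively blurrier approximation that already carries the global layout.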

Authors (8)

Zhihao Huang

Xi Qiu

Yukuo Ma

Yifu Zhou

Junjie Chen

Hongyuan Zhang

+2 more

Submitted

March 10, 2025

arXiv Category

cs.CV

Key Contributions

NFIG introduces a novel autoregressive image generation framework that leverages the spectral hierarchy of images by decomposing generation into frequency-guided stages. By first generating low-frequency components for global structure and then adding high-frequency details, NFIG improves image quality and significantly reduces inference cost compared to standard spatial-order autoregressive methods.
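The claim that low-frequency stages need far fewer tokens can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that each frequency stage is represented on a square token grid whose side doubles from stage to stage. The `stage_sides` schedule is hypothetical and is not taken from the paper.

```python
# Hypothetical token-grid side length per frequency stage (lowest first).
# These values are illustrative assumptions, not the schedule used by NFIG.
stage_sides = [1, 2, 4, 8, 16]

# A side x side grid holds side**2 tokens.
tokens_per_stage = [side * side for side in stage_sides]

# Running total of tokens generated up to and including each stage.
cumulative_tokens = []
running = 0
for count in tokens_per_stage:
    running += count
    cumulative_tokens.append(running)

total_tokens = cumulative_tokens[-1]
for stage, (count, cum) in enumerate(
    zip(tokens_per_stage, cumulative_tokens), start=1
):
    share = cum / total_tokens
    print(f"stage {stage}: {count:3d} tokens, {share:6.1%} of total so far")
```

Under this assumed schedule, the first three stages account for 21 of 341 tokens (about 6%), which shows how a coarse global layout could be fixed early at low cost before the token-heavy high-frequency stages run.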

Business Value

Enables faster and more efficient generation of high-quality synthetic images for applications like game development, virtual reality, and creative content creation, potentially lowering production costs.

Paper Metadata

Innovation Type

Algorithmic

Deployment Feasibility

Moderate, requires GPU resources for generation.

Limitations Addressed

Standard autoregressive models generate pixels sequentially in a fixed spatial order, Failure to leverage the hierarchical structure of image information in the spectral domain, High inference cost and slow generation, Inefficient establishment of global structure

Performance Gains

Reaches an FID of 2.81 on ImageNet-256 and a 1.25x inference speedup over the VAR-d20 baseline.

Technical Tags

Autoregressive Image Generation, Next-Frequency Prediction, Spectral Hierarchy, Frequency-guided Stages, Low-frequency Components, High-frequency Details, Inference Cost Reduction, Global Structure, ImageNet-256, Generative Models

Research Topics

Computer Vision, Generative Models, Deep Learning, Image Synthesis, Machine Learning

Methods & Architectures

Frequency Decomposition, Multi-stage Generation, Autoregressive Modeling, Next-Frequency Prediction, Autoregressive Models, Frequency-guided Network

Applications & Tasks

Applications: Image Synthesis, Computer Graphics, Creative AI, Data Augmentation
Problems addressed: Fixed Spatial Order Generation, Inefficient Capture of Spectral Hierarchy, High Inference Cost, Difficulty in Establishing Global Structure Early
Tasks: Image Generation, Synthesizing High-Quality Images, Efficient Image Synthesis

Datasets & Benchmarks

Datasets

ImageNet-256

Benchmarks

ImageNet-256 benchmark

Related Fields

Computer Vision, Generative AI, Deep Learning, Image Processing, Signal Processing

Keywords

Image Generation, Autoregressive Models, Frequency Domain, Spectral Analysis, Generative AI, Deep Learning, NFIG, Inference Speed, Global Structure, Image Synthesis, Computer Vision, Low-frequency

Academic Context

#Computer Vision, #Generative Models, #Deep Learning, #Image Synthesis, #Machine Learning

Commercial Potential

Potential Products

High-fidelity image generation tools, AI-powered asset creation platforms, Tools for generating synthetic datasets

Target Industries

Gaming, Media & Entertainment, Advertising, E-commerce, Virtual Reality

Use Case Examples

Generating realistic textures for 3D models, Creating diverse datasets for training other AI models, Producing unique artwork and visual content

Competitive Edge

Offers a more efficient approach to autoregressive image generation by exploiting spectral properties, leading to faster inference and potentially higher quality compared to purely spatial methods.

Market Opportunity

Growing market for generative AI tools and synthetic media.

Revenue Models

Licensing of the NFIG model/API, integration into creative software.

Resource Requirements

Compute Needs

High (for training and generation)

Data Requirements

Large image datasets (e.g., ImageNet).

Deployment Constraints

Requires significant computational resources.

Scalability

Frequency-staged generation may scale to higher image resolution and complexity, though the abstract reports results on ImageNet-256 only.

Production Readiness

Maturity Level

Research

Time to Market

2-4 years

Patent Potential

Moderate
