📄 Abstract
While inference-time scaling through search has revolutionized Large Language
Models, translating these gains to image generation has proven difficult.
Recent attempts to apply search strategies to continuous diffusion models show
limited benefits, with simple random sampling often performing best. We
demonstrate that the discrete, sequential nature of visual autoregressive
models enables effective search for image generation. We show that beam search
substantially improves text-to-image generation, enabling a 2B parameter
autoregressive model to outperform a 12B parameter diffusion model across
benchmarks. Systematic ablations show that this advantage comes from the
discrete token space, which allows early pruning and computational reuse, and
our verifier analysis highlights trade-offs between speed and reasoning
capability. These findings suggest that model architecture, not just scale, is
critical for inference-time optimization in visual generation.
Authors (3)
Erik Riise
Mehmet Onurcan Kaya
Dim P. Papadopoulos
Submitted
October 19, 2025
Key Contributions
Demonstrates that visual autoregressive models, due to their discrete token space, are more amenable to inference-time scaling via search strategies like beam search compared to diffusion models. This allows smaller autoregressive models to outperform larger diffusion models in text-to-image generation.
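The mechanism described above can be illustrated with a minimal sketch of beam search over discrete token sequences. This is not the paper's implementation: the `step` and `score` callables are hypothetical stand-ins for the autoregressive model's next-token distribution and the verifier, respectively, and only show how a discrete token space permits pruning weak partial sequences early while reusing the computation spent on surviving beams.

```python
import heapq
from typing import Callable, List, Tuple

def beam_search(
    step: Callable[[List[int]], List[Tuple[int, float]]],
    score: Callable[[List[int]], float],
    beam_width: int,
    max_len: int,
) -> List[int]:
    """Generic beam search over discrete token sequences.

    `step(seq)` returns candidate (token, log_prob) continuations
    (a stand-in for the autoregressive model); `score(seq)` is a
    stand-in verifier that ranks partial sequences so weak beams
    can be pruned before generation completes. Prefix computation
    for surviving beams is reused at every step, which is the
    advantage the paper attributes to discrete token spaces.
    """
    beams: List[Tuple[float, List[int]]] = [(0.0, [])]
    for _ in range(max_len):
        candidates: List[Tuple[float, List[int]]] = []
        for logp, seq in beams:
            # Expand each surviving beam with candidate next tokens.
            for tok, tok_logp in step(seq):
                candidates.append((logp + tok_logp, seq + [tok]))
        # Early pruning: keep only the top-`beam_width` partial
        # sequences according to the verifier.
        beams = heapq.nlargest(
            beam_width, candidates, key=lambda c: score(c[1])
        )
    # Return the highest-scoring completed sequence.
    return max(beams, key=lambda c: score(c[1]))[1]
```

With a toy binary vocabulary and a verifier that simply favors sequences containing more 1-tokens, `beam_search(lambda s: [(0, -0.1), (1, -0.2)], lambda s: float(sum(s)), beam_width=2, max_len=3)` returns `[1, 1, 1]`, since every step the all-ones prefix survives pruning. A diffusion model has no analogous discrete prefix structure to prune over, which is the contrast the paper draws.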
Business Value
Significantly reduces the computational cost and time required for generating high-quality images, making advanced image generation more accessible for various applications.