Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling

📄 Abstract

While inference-time scaling through search has revolutionized Large Language Models, translating these gains to image generation has proven difficult. Recent attempts to apply search strategies to continuous diffusion models show limited benefits, with simple random sampling often performing best. We demonstrate that the discrete, sequential nature of visual autoregressive models enables effective search for image generation. We show that beam search substantially improves text-to-image generation, enabling a 2B parameter autoregressive model to outperform a 12B parameter diffusion model across benchmarks. Systematic ablations show that this advantage comes from the discrete token space, which allows early pruning and computational reuse, and our verifier analysis highlights trade-offs between speed and reasoning capability. These findings suggest that model architecture, not just scale, is critical for inference-time optimization in visual generation.
Authors: Erik Riise, Mehmet Onurcan Kaya, Dim P. Papadopoulos
Submitted: October 19, 2025
arXiv Category: cs.CV

Key Contributions

Demonstrates that visual autoregressive models, because of their discrete token space, are more amenable than diffusion models to inference-time scaling via search strategies such as beam search. With beam search, a 2B-parameter autoregressive model outperforms a 12B-parameter diffusion model across text-to-image benchmarks.
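
To make the idea of search over a discrete token space concrete, here is a minimal sketch of verifier-guided beam search in Python. It is not the paper's implementation: the function names (`beam_search`, `toy_logprobs`, `toy_verifier`), the four-token vocabulary, and the target-matching verifier are all hypothetical stand-ins chosen for illustration. The sketch only shows the two properties the abstract attributes to discrete tokens: weak partial sequences can be pruned early, and every surviving prefix is extended rather than regenerated from scratch.

```python
import math
from typing import Callable, List, Sequence, Tuple

Prefix = Tuple[int, ...]  # a partial sequence of discrete token ids


def beam_search(
    next_token_logprobs: Callable[[Prefix], Sequence[float]],
    verifier: Callable[[Prefix], float],
    seq_len: int,
    beam_width: int,
    expand_k: int,
) -> Prefix:
    """Return the best full sequence found by verifier-guided beam search."""
    beams: List[Prefix] = [()]
    for _ in range(seq_len):
        candidates: List[Prefix] = []
        for prefix in beams:
            logprobs = next_token_logprobs(prefix)
            # Extend the surviving prefix with its expand_k most likely tokens;
            # the prefix itself is reused, not regenerated.
            top = sorted(
                range(len(logprobs)), key=lambda t: logprobs[t], reverse=True
            )[:expand_k]
            candidates.extend(prefix + (t,) for t in top)
        # Early pruning: score partial sequences and keep only the best few.
        candidates.sort(key=verifier, reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


# Hypothetical toy setup: a uniform "model" over 4 tokens and a verifier
# that counts how many positions agree with a hidden target sequence.
TARGET: Prefix = (2, 0, 3, 1)


def toy_logprobs(prefix: Prefix) -> Sequence[float]:
    return [math.log(0.25)] * 4


def toy_verifier(prefix: Prefix) -> float:
    return float(sum(1 for a, b in zip(prefix, TARGET) if a == b))


best = beam_search(toy_logprobs, toy_verifier, seq_len=4, beam_width=2, expand_k=4)
print(best)  # (2, 0, 3, 1)
```

In this toy run the search scores 16 partial candidates in total (4 at the first step, then 8 at each of the remaining three), whereas the full space holds 4^4 = 256 sequences. A real verifier would be a learned scorer of image quality or prompt alignment rather than a comparison against a known target.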

Business Value

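Shows that a smaller autoregressive model, given additional search compute at inference time, can outperform a much larger diffusion model. This could lower the model size needed for high-quality image generation and make advanced image generation more accessible for a range of applications.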