
Scalable Autoregressive Image Generation with Mamba


📄 Abstract

We introduce AiM, an autoregressive (AR) image generative model based on Mamba architecture. AiM employs Mamba, a novel state-space model characterized by its exceptional performance for long-sequence modeling with linear time complexity, to supplant the commonly utilized Transformers in AR image generation models, aiming to achieve both superior generation quality and enhanced inference speed. Unlike existing methods that adapt Mamba to handle two-dimensional signals via multi-directional scan, AiM directly utilizes the next-token prediction paradigm for autoregressive image generation. This approach circumvents the need for extensive modifications to enable Mamba to learn 2D spatial representations. By implementing straightforward yet strategically targeted modifications for visual generative tasks, we preserve Mamba's core structure, fully exploiting its efficient long-sequence modeling capabilities and scalability. We provide AiM models in various scales, with parameter counts ranging from 148M to 1.3B. On the ImageNet1K 256×256 benchmark, our best AiM model achieves an FID of 2.21, surpassing all existing AR models of comparable parameter counts and demonstrating significant competitiveness against diffusion models, with 2 to 10 times faster inference speed. Code is available at https://github.com/hp-l33/AiM

Authors (7)

Haopeng Li

Jinyue Yang

Kexin Wang

Xuerui Qiu

Yuhong Chou

Xin Li

+1 more

Submitted

August 22, 2024

arXiv Category

cs.CV


Key Contributions

AiM is a Mamba-based autoregressive model for image generation. It replaces the Transformer backbone with Mamba, whose linear time complexity on long sequences is intended to deliver both better generation quality and faster inference. Rather than adapting Mamba to 2D signals through multi-directional scans, it applies next-token prediction directly, with only small, targeted modifications to Mamba's core structure.
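As a rough illustration of what next-token prediction over an image looks like, the sketch below flattens a 2D grid of token ids into a single sequence and builds shifted input/target pairs. The grid values, the row-by-row (raster) ordering, the start token, and the helper names are all assumptions made for this example; the abstract does not specify them.

```python
from typing import List, Tuple


def raster_flatten(grid: List[List[int]]) -> List[int]:
    """Flatten a 2D grid of token ids row by row (raster order)."""
    return [tok for row in grid for tok in row]


def next_token_pairs(seq: List[int], bos: int) -> Tuple[List[int], List[int]]:
    """Build (inputs, targets) for next-token prediction.

    inputs is the sequence shifted right by one with a start token in front,
    so position t of the model sees only tokens before t and predicts targets[t].
    """
    inputs = [bos] + seq[:-1]
    targets = list(seq)
    return inputs, targets


# Hypothetical 2x3 grid of discrete image-token ids (illustrative values only).
grid = [[5, 1, 7],
        [2, 9, 4]]

seq = raster_flatten(grid)
inputs, targets = next_token_pairs(seq, bos=0)  # bos=0 is an assumed start token
print(seq)
print(inputs)
print(targets)
```

A single one-directional ordering like this is what lets a sequence model be used largely unchanged, in contrast to multi-directional scan schemes that traverse the grid along several paths.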

Business Value

Enables faster and more scalable generation of high-quality images, beneficial for applications requiring rapid content creation or large-scale synthetic data generation.

Paper Metadata

Innovation Type

Novel Architecture Application

Deployment Feasibility

High, particularly for inference, due to Mamba's linear complexity, making it suitable for resource-constrained environments or real-time applications.

Limitations Addressed

The quadratic complexity of Transformers in sequence length limits scalability and inference speed in autoregressive image generation. Existing adaptations of Mamba to 2D signals rely on multi-directional scans, which require extensive modifications.
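To make the quadratic-versus-linear contrast concrete, the sketch below counts the token interactions a causal self-attention layer scores against the state updates a recurrent scan performs. The grid sizes and function names are hypothetical, and the counts ignore hidden dimensions and constant factors, so they show growth rates only, not measured speed.

```python
def causal_attention_pairs(seq_len: int) -> int:
    """Number of (query, key) pairs scored by causal self-attention.

    Token t attends to tokens 1..t, giving L * (L + 1) / 2 pairs in total.
    """
    return seq_len * (seq_len + 1) // 2


def recurrent_scan_steps(seq_len: int) -> int:
    """Number of state updates in a recurrent scan: one per token."""
    return seq_len


for side in (16, 32):  # assumed token-grid side lengths, for illustration
    length = side * side
    print(length, causal_attention_pairs(length), recurrent_scan_steps(length))
```

Under these assumptions, going from 256 to 1024 tokens multiplies the attention pair count by roughly 16, while the number of scan steps grows by exactly 4.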

Performance Gains

Enhanced inference speed (claimed 2 to 10 times faster than diffusion models), improved generation quality (claimed FID of 2.21 on ImageNet1K 256×256)

Technical Tags

autoregressive generation, image generation, Mamba architecture, state-space models, long-sequence modeling, linear time complexity, scalability, inference speed, 2D signal processing, next-token prediction

Research Topics

Generative AI, Image Generation, Deep Learning Architectures, Sequence Modeling, Efficient AI

Methods & Architectures

AiM (Mamba-based autoregressive model), Direct next-token prediction for 2D signals, Modifications for visual generative tasks, Mamba, State-Space Models (SSMs)

Applications & Tasks

Image Synthesis, Creative AI, Data Augmentation, High-quality Image Generation, Efficient Autoregressive Modeling, Generating images autoregressively using Mamba for improved quality and speed

Related Fields

Deep Learning, Generative Models, Computer Vision, Sequence Modeling

Keywords

image generation, autoregressive, Mamba, state-space models, efficient AI, scalability, inference speed, deep learning, generative models, transformers

Academic Context

#Generative AI #Image Generation #Deep Learning Architectures #Sequence Modeling #Efficient AI

Commercial Potential

Potential Products

High-performance image generation APIs, Tools for creating synthetic datasets, Creative AI applications

Target Industries

Media & Entertainment, Gaming, Advertising, E-commerce, AI Research

Use Case Examples

Generating diverse product images for online catalogs, Creating unique visual assets for marketing campaigns, Synthesizing training data for other computer vision models

Competitive Edge

Offers a potentially more efficient and scalable alternative to Transformer-based autoregressive models for image generation.

Market Opportunity

Rapidly growing market for generative AI and image synthesis tools.

Revenue Models

API access fees, software licensing, cloud-based generation services.

Resource Requirements

Compute Needs

Potentially lower than Transformers for training and inference due to linear complexity, but still requires significant resources for large models.

Data Requirements

Requires large-scale image datasets for training.

Deployment Constraints

The effectiveness of Mamba for 2D spatial representations needs thorough validation across various image generation tasks.

Scalability

Mamba's linear time complexity is a key advantage for scalability in long-sequence generation.
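The linear-time property comes from processing the sequence as a recurrence with a fixed-size state. The sketch below is a deliberately simplified scalar state-space recurrence, h_t = a*h_{t-1} + b*x_t and y_t = c*h_t, with fixed parameters chosen for illustration. Mamba itself uses input-dependent (selective) parameters and a vector-valued state, so this shows only the general shape of the computation, not Mamba's actual layer.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Run a scalar linear state-space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    Each step touches only the current input and a single running state,
    so total work is linear in len(xs) and memory for the state is constant.
    """
    h = 0.0  # running state; its size does not depend on sequence length
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


# Illustrative inputs and parameters (assumed values, not taken from the paper).
ys = ssm_scan([1.0, 0.0, 0.0, 2.0], a=0.5, b=1.0, c=1.0)
print(ys)
```

During generation, each new token needs only the previous state rather than the full history of earlier tokens, which is the source of the per-step cost advantage over attention.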

Production Readiness

Maturity Level

Research

Time to Market

1-3 years for integration into generative AI platforms.

Patent Potential

Moderate, for the specific application and modifications of Mamba for image generation.
