
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching


Abstract

Autoregressive (AR) models have achieved state-of-the-art performance in text and image generation but suffer from slow generation due to the token-by-token process. We ask an ambitious question: can a pre-trained AR model be adapted to generate outputs in just one or two steps? If successful, this would significantly advance the development and deployment of AR models. We notice that existing works that try to speed up AR generation by generating multiple tokens at once fundamentally cannot capture the output distribution due to the conditional dependencies between tokens, limiting their effectiveness for few-step generation. To address this, we propose Distilled Decoding (DD), which uses flow matching to create a deterministic mapping from a Gaussian distribution to the output distribution of the pre-trained AR model. We then train a network to distill this mapping, enabling few-step generation. DD doesn't need the training data of the original AR model, making it more practical. We evaluate DD on state-of-the-art image AR models and present promising results on ImageNet-256. For VAR, which requires 10-step generation, DD enables one-step generation (6.3× speed-up), with an acceptable increase in FID from 4.19 to 9.96. For LlamaGen, DD reduces generation from 256 steps to 1, achieving a 217.8× speed-up with a comparable FID increase from 4.11 to 11.35. In both cases, baseline methods completely fail with FID > 100. DD also excels on text-to-image generation, reducing the generation from 256 steps to 2 for LlamaGen with minimal FID increase from 25.70 to 28.95. As the first work to demonstrate the possibility of one-step generation for image AR models, DD challenges the prevailing notion that AR models are inherently slow, and opens up new opportunities for efficient AR generation. The project website is at https://imagination-research.github.io/distilled-decoding.
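
The abstract's point about conditional dependencies can be illustrated with a toy example. The sketch below is not from the paper: it assumes, as a simplification, that a multi-token sampler draws each token independently from its own marginal, and it uses a made-up two-token distribution in which the second token always copies the first. The names `joint`, `independent_approximation`, and `total_variation` are hypothetical.

```python
from itertools import product

# Toy joint distribution over two binary tokens (illustrative values only):
# the second token always equals the first.
joint = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0}

def marginal(dist, position):
    """Marginal distribution of the token at `position`."""
    probs = {}
    for tokens, p in dist.items():
        probs[tokens[position]] = probs.get(tokens[position], 0.0) + p
    return probs

def independent_approximation(dist):
    """Distribution obtained by sampling each token independently from its marginal."""
    m0, m1 = marginal(dist, 0), marginal(dist, 1)
    return {(a, b): m0[a] * m1[b] for a, b in product(m0, m1)}

def total_variation(p, q):
    """Total variation distance between two distributions over the same outcomes."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

approx = independent_approximation(joint)
tv = total_variation(joint, approx)
print(approx)
print("total variation distance:", tv)
```

In this toy case the independent sampler puts probability 0.25 on each of the four pairs, including the two pairs the true distribution never produces, so half of the probability mass is misplaced no matter how accurate the individual marginals are.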

Authors (4)

Enshu Liu

Xuefei Ning

Yu Wang

Zinan Lin

Submitted

December 22, 2024

arXiv Category

cs.CV


Key Contributions

Proposes Distilled Decoding (DD), a method using flow matching to create a deterministic mapping from Gaussian noise to the output distribution of a pre-trained autoregressive model, enabling few-step generation. This approach significantly speeds up generation without needing the original training data.
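
One way to picture a deterministic mapping from Gaussian noise to a discrete output distribution is the toy sketch below. It is not the paper's implementation: it handles a single token with a made-up one-dimensional codebook, assumes a linear interpolation path between noise and codebook entry, and integrates the resulting flow-matching ODE with plain Euler steps. `CODEBOOK`, `PROBS`, `velocity`, `noise_to_token`, and the step count are all hypothetical.

```python
import math
import random

# Hypothetical 1-D "codebook" and token probabilities (illustrative values only).
CODEBOOK = [-1.0, 1.0]
PROBS = [0.25, 0.75]

def velocity(x, t):
    """Marginal flow-matching velocity for the path x_t = (1 - t) * noise + t * code."""
    s = 1.0 - t
    # Log posterior weight of each codebook entry given the current point x,
    # using x_t | code ~ Normal(t * code, (1 - t)^2).
    logits = [math.log(p) - (x - t * c) ** 2 / (2.0 * s * s)
              for c, p in zip(CODEBOOK, PROBS)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    # Posterior-weighted average of the per-entry velocities (code - x) / (1 - t).
    return sum(w / total * (c - x) / s for w, c in zip(weights, CODEBOOK))

def noise_to_token(noise, steps=200):
    """Integrate the ODE with Euler steps, then snap to the nearest codebook entry."""
    x = noise
    for n in range(steps):
        x += velocity(x, n / steps) / steps
    return min(range(len(CODEBOOK)), key=lambda k: abs(x - CODEBOOK[k]))

# Push many Gaussian noise samples through the mapping and count the tokens.
rng = random.Random(0)
tokens = [noise_to_token(rng.gauss(0.0, 1.0)) for _ in range(2000)]
frequency_of_token_1 = sum(tokens) / len(tokens)
print("share of noise samples mapped to token 1:", frequency_of_token_1)
```

Feeding the same noise value twice always yields the same token, and over many Gaussian draws the share mapped to token 1 should come out close to its target probability of 0.75, up to sampling and discretization error. A distilled network would then be trained to reproduce such noise-to-output pairs in one or two forward passes instead of running the step-by-step procedure.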

Business Value

Dramatically reduces generation time for high-quality images, making AR models more practical for real-time applications and reducing computational costs.

Paper Metadata

Innovation Type

Algorithmic

Deployment Feasibility

High, as it directly addresses a major deployment bottleneck (speed).

Limitations Addressed

Addresses the slow, token-by-token generation process of autoregressive models, which limits their development and deployment.

Performance Gains

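Enables one-step generation on ImageNet-256: a 6.3× speed-up for VAR (10 steps to 1, FID 4.19 to 9.96) and a 217.8× speed-up for LlamaGen (256 steps to 1, FID 4.11 to 11.35). For text-to-image generation, LlamaGen drops from 256 steps to 2 with FID rising from 25.70 to 28.95.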
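
The figures reported in the abstract can be tabulated to make the speed and quality trade-off easier to compare. The sketch below only restates those numbers; the `Result` class and the derived quantity `step_efficiency` (reported speed-up divided by the step-count reduction) are hypothetical helpers, not metrics from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Result:
    model: str
    steps_before: int
    steps_after: int
    fid_before: float
    fid_after: float
    speedup: Optional[float]  # None where the abstract reports no speed-up figure

# Numbers copied from the abstract.
RESULTS = [
    Result("VAR (ImageNet-256)", 10, 1, 4.19, 9.96, 6.3),
    Result("LlamaGen (ImageNet-256)", 256, 1, 4.11, 11.35, 217.8),
    Result("LlamaGen (text-to-image)", 256, 2, 25.70, 28.95, None),
]

def fid_increase(r):
    """Absolute rise in FID after distillation (lower FID is better)."""
    return r.fid_after - r.fid_before

def step_efficiency(r):
    """Reported speed-up as a fraction of the step-count reduction (derived, not from the paper)."""
    if r.speedup is None:
        return None
    return r.speedup / (r.steps_before / r.steps_after)

for r in RESULTS:
    eff = step_efficiency(r)
    eff_text = "n/a" if eff is None else f"{eff:.2f}"
    print(f"{r.model}: {r.steps_before} -> {r.steps_after} steps, "
          f"FID +{fid_increase(r):.2f}, step efficiency {eff_text}")
```

A step efficiency below 1 simply reflects that the reported wall-clock speed-up is smaller than the raw reduction in step count.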

Technical Tags

autoregressive models, image generation, flow matching, distilled decoding, one-step sampling, few-step generation, deterministic mapping, deep learning

Research Topics

Generative Models, Image Generation, Efficient Sampling, Deep Learning

Methods & Architectures

Distilled Decoding (DD), Flow Matching, Neural Network Distillation, Autoregressive Models, Flow Models

Applications & Tasks

Image Generation, Computer Vision, Generative AI, Slow Generation, Sampling Efficiency, Few-step Sampling

Related Fields

Generative Models, Deep Learning, Computer Vision, Machine Learning Theory

Keywords

autoregressive models, image generation, flow matching, distilled decoding, sampling, generation speed, deep learning, few-step, deterministic, one-step, generative AI

Academic Context

#Generative Models, #Image Generation, #Efficient Sampling, #Deep Learning

Commercial Potential

Potential Products

Real-time image generation tools, accelerated generative AI platforms

Target Industries

Media, Entertainment, Advertising, Design

Use Case Examples

Generating diverse image variations rapidly; enabling interactive image creation tools; accelerating content creation pipelines

Competitive Edge

Offers a novel approach to accelerate autoregressive generation, overcoming limitations of existing multi-token generation methods.

Market Opportunity

Large and growing market for generative AI and image synthesis.

Revenue Models

Licensing of the DD technique; integration into generative AI services.

Resource Requirements

Compute Needs

Training the distillation network adds a one-time compute cost; inference is then 6.3× (VAR) to 217.8× (LlamaGen) faster in the reported ImageNet-256 experiments.

Data Requirements

Does not require the original AR model's training data. The distillation network is trained to reproduce the noise-to-output mapping constructed from the pre-trained AR model.

Scalability

Cutting generation to one or two steps lowers per-image inference cost, which suggests good scalability for deployment.

Production Readiness

Maturity Level

Research

Time to Market

Medium
