Abstract
Distribution Matching Distillation (DMD) distills score-based generative
models into efficient one-step generators, without requiring a one-to-one
correspondence with the sampling trajectories of their teachers. However,
limited model capacity causes one-step distilled models to underperform on complex
generative tasks, e.g., synthesizing intricate object motions in text-to-video
generation. Directly extending DMD to multi-step distillation increases memory
usage and computational depth, leading to instability and reduced efficiency.
While prior works propose stochastic gradient truncation as a potential
solution, we observe that it substantially reduces the generation diversity of
multi-step distilled models, bringing it down to the level of their one-step
counterparts. To address these limitations, we propose Phased DMD, a multi-step
distillation framework that bridges the idea of phase-wise distillation with
Mixture-of-Experts (MoE), reducing learning difficulty while enhancing model
capacity. Phased DMD is built upon two key ideas: progressive distribution
matching and score matching within subintervals. First, we divide the SNR
range into subintervals and progressively refine the model toward higher SNR
levels, better capturing complex distributions. Second, we rigorously derive
the training objective within each subinterval to ensure its accuracy. We
validate Phased DMD by distilling state-of-the-art
image and video generation models, including Qwen-Image (20B parameters) and
Wan2.2 (28B parameters). Experimental results demonstrate that Phased DMD
preserves output diversity better than DMD while retaining key generative
capabilities. We will release our code and models.
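To make the phase-wise idea concrete, here is a minimal sketch of splitting an SNR range into subintervals and sampling training noise levels phase by phase. This is an illustrative assumption, not the authors' released code: the function names, the log-spaced partition, and the per-phase sampling are hypothetical stand-ins for details the abstract does not specify.

```python
# Hypothetical sketch of phase-wise SNR partitioning (not the paper's code).
import numpy as np

def make_snr_subintervals(snr_min: float, snr_max: float, num_phases: int):
    """Split [snr_min, snr_max] into contiguous subintervals.
    Log-spacing is an assumption; the paper's actual partition is unspecified."""
    edges = np.geomspace(snr_min, snr_max, num_phases + 1)
    return list(zip(edges[:-1], edges[1:]))

def sample_snr(interval, batch_size, rng):
    """Sample SNR values log-uniformly within one subinterval (an assumption)."""
    lo, hi = interval
    return np.exp(rng.uniform(np.log(lo), np.log(hi), size=batch_size))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phases = make_snr_subintervals(snr_min=1e-2, snr_max=1e2, num_phases=4)
    # Progressive schedule: one phase at a time, moving toward higher SNR.
    for k, interval in enumerate(phases):
        snrs = sample_snr(interval, batch_size=8, rng=rng)
        print(f"phase {k}: SNR in [{interval[0]:.3g}, {interval[1]:.3g}]")
        # train_phase(model, snrs)  # distribution + score matching per phase
```

Each phase would run the paper's distribution-matching and score-matching objectives restricted to its own subinterval; the `train_phase` call above is only a placeholder for that step.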
Authors (9)
Xiangyu Fan
Zesong Qiu
Zhuguanyu Wu
Fanzhou Wang
Zhiqian Lin
Tianxiang Ren
+3 more
Submitted
October 31, 2025
Key Contributions
Phased DMD is a multi-step distillation framework that combines phase-wise distillation with Mixture-of-Experts (MoE) to improve the performance of distilled generative models. It addresses the limitations of one-step distillation (limited capacity) and multi-step distillation (instability, reduced diversity) by reducing learning difficulty and enhancing generation quality.
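One way to read the phase-wise/MoE combination is that each distilled phase acts as an expert for one SNR subinterval, with a deterministic router selecting the expert by noise level. The sketch below is a hedged illustration under that reading; `route`, `edges`, and `experts` are hypothetical names, and the paper may combine phases differently.

```python
# Hypothetical routing of a noise level to a per-phase expert (illustration only).
import bisect

def route(snr: float, edges: list) -> int:
    """Return the index of the subinterval [edges[i], edges[i+1]) containing snr."""
    i = bisect.bisect_right(edges, snr) - 1
    return max(0, min(i, len(edges) - 2))

edges = [1e-2, 1e-1, 1.0, 1e1, 1e2]          # four subintervals (assumed bounds)
experts = [f"expert_{i}" for i in range(4)]  # stand-ins for per-phase generators

for snr in (0.05, 0.5, 5.0, 50.0):
    print(snr, "->", experts[route(snr, edges)])
```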
Business Value
Enables the creation of more capable and efficient generative models for tasks like video synthesis, potentially lowering the barrier to entry for high-quality content creation tools.