Abstract
Flow Matching (FM) underpins many state-of-the-art generative models, yet
recent results indicate that Transition Matching (TM) can achieve higher
quality with fewer sampling steps. This work answers the question of when and
why TM outperforms FM. First, when the target is a unimodal Gaussian
distribution, we prove that TM attains strictly lower KL divergence than FM for
any finite number of steps. The improvement arises from TM's stochastic
difference-latent updates, which preserve the target covariance that
deterministic FM underestimates. We then characterize convergence rates, showing that TM
achieves faster convergence than FM under a fixed compute budget, establishing
its advantage in the unimodal Gaussian setting. Second, we extend the analysis
to Gaussian mixtures and identify local-unimodality regimes in which the
sampling dynamics approximate the unimodal case, so TM can again outperform FM.
The approximation error decreases as the minimal distance between component
means increases, highlighting that TM is favored when the modes are well
separated. However, when the target variance approaches zero, each TM update
converges to the FM update, and the performance advantage of TM diminishes. In
summary, we show that TM outperforms FM when the target distribution has
well-separated modes and non-negligible variances. We validate our theoretical
results with controlled experiments on Gaussian distributions, and extend the
comparison to real-world applications in image and video generation.
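To make the covariance argument concrete, below is a minimal, self-contained 1-D sketch, not the paper's construction: the source is N(0, 1), the target is N(0, sigma1^2), and both samplers take the same small number of steps along the linear interpolation path. The deterministic sampler runs Euler steps on the exact marginal velocity field, while the stochastic sampler, standing in for a TM-style update, draws each next latent from the exact Gaussian transition kernel of the path. All constants (sigma1 = 2, four steps) are arbitrary choices for illustration.

```python
# Illustrative 1-D comparison (assumed setup, not the paper's exact method):
# deterministic few-step FM-style sampling vs. a stochastic transition-style
# sampler on a unimodal Gaussian target.
import numpy as np

rng = np.random.default_rng(0)
sigma0, sigma1 = 1.0, 2.0          # source and target standard deviations
n_steps, n_samples = 4, 200_000    # few sampling steps, many samples
h = 1.0 / n_steps

def var_t(t):
    # Marginal variance of x_t = (1 - t) x0 + t x1 for independent
    # zero-mean Gaussians x0 ~ N(0, sigma0^2), x1 ~ N(0, sigma1^2).
    return (1 - t) ** 2 * sigma0 ** 2 + t ** 2 * sigma1 ** 2

def velocity(x, t):
    # Exact marginal velocity u_t(x) = E[x1 - x0 | x_t = x]; linear in x
    # for this Gaussian-to-Gaussian interpolation path.
    a = (t * sigma1 ** 2 - (1 - t) * sigma0 ** 2) / var_t(t)
    return a * x

# Deterministic FM-style sampler: Euler integration of the probability-flow ODE.
x_fm = rng.normal(0.0, sigma0, n_samples)
for k in range(n_steps):
    x_fm = x_fm + h * velocity(x_fm, k * h)

# Stochastic TM-style sampler (idealized): at each step, draw the next latent
# from the exact Gaussian transition kernel x_s | x_t of the path.
x_tm = rng.normal(0.0, sigma0, n_samples)
for k in range(n_steps):
    t, s = k * h, (k + 1) * h
    # (x_t, x_s) are jointly Gaussian through the shared (x0, x1), so
    # x_s | x_t is Gaussian with mean c * x_t and variance
    # var_t(s) - c^2 * var_t(t), where c = cov(x_s, x_t) / var_t(t).
    cov = (1 - s) * (1 - t) * sigma0 ** 2 + s * t * sigma1 ** 2
    c = cov / var_t(t)
    resid = var_t(s) - c ** 2 * var_t(t)
    x_tm = c * x_tm + np.sqrt(max(resid, 0.0)) * rng.normal(size=n_samples)

def kl_to_target(samples):
    # KL(N(m, v) || N(0, sigma1^2)) with (m, v) fit to the samples.
    m, v = samples.mean(), samples.var()
    return 0.5 * (v / sigma1**2 + m**2 / sigma1**2 - 1 + np.log(sigma1**2 / v))

print(f"target var {sigma1**2:.3f} | FM var {x_fm.var():.3f} | TM var {x_tm.var():.3f}")
print(f"KL to target: FM {kl_to_target(x_fm):.5f} | TM {kl_to_target(x_tm):.5f}")
```

With these settings the deterministic Euler sampler's output variance falls well short of the target, while the stochastic transition sampler matches it up to Monte Carlo error, mirroring the abstract's claim. Both effects are pure discretization phenomena here, since the exact velocity field and transition kernels are used rather than learned ones.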
Authors (4)
Jaihoon Kim
Rajarshi Saha
Minhyuk Sung
Youngsuk Park
Submitted
October 20, 2025
Key Contributions
Provides a theoretical analysis explaining when and why Transition Matching (TM) outperforms Flow Matching (FM). It proves TM achieves lower KL divergence and faster convergence for unimodal Gaussians by preserving target covariance, and identifies regimes where TM excels for Gaussian mixtures.
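For reference, the KL penalty for this kind of covariance underestimation has a standard closed form in one dimension (a textbook Gaussian identity, not a result quoted from the paper):

```latex
% KL divergence from a zero-mean Gaussian with variance v to the target
% N(0, sigma^2); it vanishes only at v = sigma^2 and grows as a few-step
% deterministic sampler's output variance v shrinks below sigma^2.
\mathrm{KL}\!\left(\mathcal{N}(0, v)\,\middle\|\,\mathcal{N}(0, \sigma^2)\right)
  = \frac{1}{2}\left(\frac{v}{\sigma^2} - 1 - \log\frac{v}{\sigma^2}\right)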
Business Value
Leads to more efficient and higher-quality generative models, enabling faster and better data synthesis for applications like synthetic data generation, creative content creation, and simulation.