Abstract
Flow Matching models have recently pushed the boundaries of high-fidelity
data generation across a wide range of domains. They typically employ a single
large network to learn the entire generative trajectory from noise to data.
Despite their effectiveness, this design struggles to capture distinct signal
characteristics across timesteps simultaneously and incurs substantial
inference costs due to the iterative evaluation of the entire model. To address
these limitations, we propose Blockwise Flow Matching (BFM), a novel framework
that partitions the generative trajectory into multiple temporal segments, each
modeled by smaller but specialized velocity blocks. This blockwise design
enables each block to specialize effectively in its designated interval,
improving inference efficiency and sample quality. To further enhance
generation fidelity, we introduce a Semantic Feature Guidance module that
explicitly conditions velocity blocks on semantically rich features aligned
with pretrained representations. Additionally, we propose a lightweight Feature
Residual Approximation strategy that preserves semantic quality while
significantly reducing inference cost. Extensive experiments on ImageNet
256x256 demonstrate that BFM establishes a substantially improved Pareto
frontier over existing Flow Matching methods, achieving 2.1x to 4.9x
speedups in inference compute at comparable generation quality.
Code is available at https://github.com/mlvlab/BFM.
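To make the blockwise design concrete, the sketch below shows segment-dispatched Euler sampling: the time interval [0, 1] is split into equal segments, and each integration step is routed to the block owning its segment. This is a minimal illustration under assumptions, not the paper's implementation; VelocityBlock, blockwise_euler_sample, the tiny MLPs, and the uniform segmentation are all hypothetical stand-ins for BFM's actual architecture and training objective.

```python
import torch
import torch.nn as nn

class VelocityBlock(nn.Module):
    """A small velocity network specialized for one temporal segment (illustrative)."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Condition on time by concatenating t to the state.
        return self.net(torch.cat([x, t], dim=-1))

@torch.no_grad()
def blockwise_euler_sample(blocks, x0: torch.Tensor, steps: int = 50) -> torch.Tensor:
    """Integrate dx/dt = v_k(x, t) from t=0 (noise) to t=1 (data),
    dispatching each step to the block that owns its time segment."""
    n, dt, x = len(blocks), 1.0 / steps, x0
    for i in range(steps):
        t_val = i * dt
        k = min(int(t_val * n), n - 1)          # segment index for this step
        t = torch.full((x.shape[0], 1), t_val)  # per-sample time column
        x = x + dt * blocks[k](x, t)            # Euler step with the specialized block
    return x

# Usage: four segments over [0, 1], each with its own small block.
dim = 8
blocks = [VelocityBlock(dim) for _ in range(4)]
samples = blockwise_euler_sample(blocks, torch.randn(16, dim))
print(samples.shape)  # torch.Size([16, 8])
```

Since only one small block is evaluated per step, each function evaluation is cheaper than running one monolithic network, which is the source of the inference savings the abstract describes.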
Authors (4)
Dogyun Park
Taehoon Lee
Minseok Joo
Hyunwoo J. Kim
Submitted
October 24, 2025
Key Contributions
Blockwise Flow Matching (BFM) partitions the generative trajectory into temporal segments modeled by specialized velocity blocks, improving inference efficiency and sample quality. A Semantic Feature Guidance module further enhances generation fidelity by conditioning blocks on semantically rich features.
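The Semantic Feature Guidance idea, conditioning each velocity block on semantically rich features, might look roughly like the sketch below. GuidedVelocityBlock, the projection layer, and the concatenation scheme are assumptions for illustration only; the paper's module, its alignment with pretrained representations, and the Feature Residual Approximation that cheapens feature computation at inference are defined in the full text.

```python
import torch
import torch.nn as nn

class GuidedVelocityBlock(nn.Module):
    """Velocity block conditioned on semantic features (hypothetical sketch)."""
    def __init__(self, dim: int, feat_dim: int, hidden: int = 256):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden)  # project semantic features
        self.net = nn.Sequential(
            nn.Linear(dim + 1 + hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor,
                feats: torch.Tensor) -> torch.Tensor:
        # Concatenate state, time, and projected semantic features.
        return self.net(torch.cat([x, t, self.proj(feats)], dim=-1))

# In the paper's setup, feats would come from features aligned with a
# pretrained encoder; random tensors stand in here just to show the shapes.
block = GuidedVelocityBlock(dim=8, feat_dim=768)
x, t = torch.randn(16, 8), torch.full((16, 1), 0.3)
feats = torch.randn(16, 768)
print(block(x, t, feats).shape)  # torch.Size([16, 8])
```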
Business Value
Enables more efficient, higher-quality generation of complex data, with applications such as synthetic data generation for training other models and realistic media content creation.