Abstract
Diffusion models excel in noise-to-data generation tasks, providing a mapping
from a Gaussian distribution to a more complex data distribution. However, they
struggle to model translations between complex distributions, limiting their
effectiveness in data-to-data tasks. While Bridge Matching (BM) models address
this by finding the translation between data distributions, their application
to time-correlated data sequences remains unexplored. This is a critical
limitation for video generation and manipulation tasks, where maintaining
temporal coherence is particularly important. To address this gap, we propose
Time-Correlated Video Bridge Matching (TCVBM), a framework that extends BM to
time-correlated data sequences in the video domain. TCVBM explicitly models
inter-sequence dependencies within the diffusion bridge, directly incorporating
temporal correlations into the sampling process. We compare our approach to
classical methods based on bridge matching and diffusion models for three
video-related tasks: frame interpolation, image-to-video generation, and video
super-resolution. TCVBM achieves superior performance across multiple
quantitative metrics, demonstrating enhanced generation quality and
reconstruction fidelity.
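As a rough illustration of the plain bridge-matching setup that the abstract builds on, the following sketch samples an intermediate point on a Brownian bridge between a paired source x0 and target x1. It then computes the drift regression target (x1 - x_t) / (1 - t). It assumes a Brownian-bridge reference process with noise scale eps, which is a common choice in bridge matching. The function names and the eps parameter are hypothetical, and this is not the TCVBM method. Applied to a video, it would treat every frame independently. That independence is the limitation TCVBM addresses by modeling dependencies across the sequence.

```python
import math
import random


def sample_bridge_point(x0, x1, t, eps, rng):
    """Sample x_t from a Brownian bridge pinned at x0 (t = 0) and x1 (t = 1).

    Per coordinate: mean (1 - t) * x0 + t * x1, variance eps * t * (1 - t).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    std = math.sqrt(eps * t * (1.0 - t))  # noise vanishes at both endpoints
    return [(1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0) for a, b in zip(x0, x1)]


def bridge_drift_target(x_t, x1, t):
    """Regression target for a drift network: (x1 - x_t) / (1 - t), valid for t < 1."""
    if t >= 1.0:
        raise ValueError("target is undefined at t = 1")
    return [(b - x) / (1.0 - t) for x, b in zip(x_t, x1)]
```

Usage: for two frames stored as flat lists of pixel values, `sample_bridge_point(x0, x1, 0.3, 0.1, random.Random(0))` draws one noisy intermediate frame. A network would then be trained to predict `bridge_drift_target(x_t, x1, 0.3)` from `(x_t, t)`.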
Authors (5)
Viacheslav Vasilev
Arseny Ivanov
Nikita Gushchin
Maria Kovaleva
Alexander Korotin
Submitted
October 14, 2025
Key Contributions
Introduces Time-Correlated Video Bridge Matching (TCVBM), a framework that extends Bridge Matching (BM) to time-correlated video data. TCVBM explicitly models inter-sequence dependencies within the diffusion bridge, incorporating temporal correlations directly into the sampling process. This addresses two limitations: diffusion models struggle with data-to-data tasks, and existing BM models have not been applied to time-correlated sequences.
Business Value
Enables the creation of more realistic and temporally consistent synthetic videos, useful for content creation, special effects, and data augmentation in video-based AI systems.