Research Paper (arxiv_cv). Audience: Computer Vision Researchers, AI Researchers, Robotics Engineers, Video Analysis Specialists

Generative Point Tracking with Flow Matching

Abstract: Tracking a point through a video can be a challenging task due to uncertainty arising from visual obfuscations, such as appearance changes and occlusions. Although current state-of-the-art discriminative models excel in regressing long-term point trajectory estimates -- even through occlusions -- they are limited to regressing to a mean (or mode) in the presence of uncertainty, and fail to capture multi-modality. To overcome this limitation, we introduce Generative Point Tracker (GenPT), a generative framework for modelling multi-modal trajectories. GenPT is trained with a novel flow matching formulation that combines the iterative refinement of discriminative trackers, a window-dependent prior for cross-window consistency, and a variance schedule tuned specifically for point coordinates. We show how our model's generative capabilities can be leveraged to improve point trajectory estimates by utilizing a best-first search strategy on generated samples during inference, guided by the model's own confidence of its predictions. Empirically, we evaluate GenPT against the current state of the art on the standard PointOdyssey, Dynamic Replica, and TAP-Vid benchmarks. Further, we introduce a TAP-Vid variant with additional occlusions to assess occluded point tracking performance and highlight our model's ability to capture multi-modality. GenPT is capable of capturing the multi-modality in point trajectories, which translates to state-of-the-art tracking accuracy on occluded points, while maintaining competitive tracking accuracy on visible points compared to extant discriminative point trackers.
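The abstract's core argument can be shown with a toy example. A mean-regressing tracker collapses several plausible trajectories into one, while a generative model can sample each of them. This is a minimal sketch, not GenPT, and it makes several assumptions:

- It uses the standard linear flow matching path x_t = (1 - t) x0 + t x1 with a Gaussian prior x0.
- The target is a hypothetical two-mode distribution: a point that reappears either left or right of an occluder.
- An exact analytic velocity field stands in for a learned network.
- GenPT's window-dependent prior and coordinate-specific variance schedule are not modelled.

The names MODES, WEIGHTS, marginal_velocity, and sample are illustrative.

```python
import math
import random

# Hypothetical toy target: behind an occluder, a tracked point re-emerges at
# one of two equally likely 2-D positions.
MODES = [(-1.0, 0.0), (1.0, 0.0)]
WEIGHTS = [0.5, 0.5]


def marginal_velocity(x, t):
    """Exact flow matching velocity for the linear path x_t = (1 - t) x0 + t x1,
    with x0 ~ N(0, I) and x1 drawn from the discrete mixture MODES.
    Stands in for a learned network v_theta(x, t) (assumption, not GenPT)."""
    s = 1.0 - t  # std of x_t given x1
    # Posterior weight of each mode given x_t, computed in log space for stability.
    logits = []
    for (mx, my), w in zip(MODES, WEIGHTS):
        d2 = (x[0] - t * mx) ** 2 + (x[1] - t * my) ** 2
        logits.append(math.log(w) - d2 / (2.0 * s * s))
    top = max(logits)
    probs = [math.exp(l - top) for l in logits]
    z = sum(probs)
    probs = [p / z for p in probs]
    # The conditional velocity toward mode i is (m_i - x_t) / (1 - t).
    vx = sum(p * (mx - x[0]) for p, (mx, _) in zip(probs, MODES)) / s
    vy = sum(p * (my - x[1]) for p, (_, my) in zip(probs, MODES)) / s
    return (vx, vy)


def sample(rng, steps=100):
    """Draw one sample by Euler-integrating dx/dt = v(x, t) from t = 0 to t = 1."""
    x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # stays below 1, so s > 0 inside marginal_velocity
        vx, vy = marginal_velocity(x, t)
        x = (x[0] + dt * vx, x[1] + dt * vy)
    return x


rng = random.Random(0)
samples = [sample(rng) for _ in range(200)]
left = sum(1 for sx, _ in samples if sx < 0)
mean_x = sum(sx for sx, _ in samples) / len(samples)
print(f"{left} samples near (-1, 0), {len(samples) - left} near (1, 0)")
print(f"sample mean x = {mean_x:.2f}")
```

Integrating from Gaussian noise places each sample on one of the two modes. Their mean, by contrast, sits near the midpoint. A regressor trained with a squared-error loss would output that midpoint for this target, even though the point never occupies it.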
Authors (5)
Mattie Tesfaldet
Adam W. Harley
Konstantinos G. Derpanis
Derek Nowrouzezahrai
Christopher Pal
Submitted
October 23, 2025
arXiv Category
cs.CV

Key Contributions

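This paper introduces GenPT, a generative framework for point tracking that uses flow matching to model multi-modal trajectories. It addresses a limitation of discriminative trackers, which regress to a single mean (or mode) when a point's location is uncertain. GenPT's flow matching formulation combines three components: the iterative refinement of discriminative trackers, a window-dependent prior for cross-window consistency, and a variance schedule tuned for point coordinates. At inference, a best-first search over generated samples, guided by the model's confidence in its own predictions, improves the final trajectory estimates.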
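The source only states that inference uses a best-first search over generated samples, guided by the model's own confidence, so everything below is an assumption. The sketch treats a trajectory as a sequence of windows. A hypothetical `propose` callback returns sampled continuations for the next window, each with a confidence in (0, 1]. The search always expands the partial trajectory with the highest product of confidences so far, which is equivalent to the lowest sum of -log confidences. Segment labels, the proposer, and the confidence values are placeholders, not GenPT's actual sampler or confidence estimate.

```python
import heapq
import math
from typing import Callable, Sequence, Tuple

Segment = str  # hypothetical stand-in for one window's sampled trajectory
Path = Tuple[Segment, ...]
Proposer = Callable[[Path], Sequence[Tuple[Segment, float]]]


def best_first_track(propose: Proposer, num_windows: int) -> Tuple[Path, float]:
    """Best-first search over per-window samples (illustrative sketch).

    A partial trajectory costs the sum of -log(confidence) of its segments.
    Costs never decrease as a path grows, so the first complete trajectory
    popped has the highest product of confidences.
    """
    counter = 0  # tie-breaker so heapq never has to compare paths
    frontier = [(0.0, counter, ())]
    while frontier:
        cost, _, path = heapq.heappop(frontier)
        if len(path) == num_windows:
            return path, math.exp(-cost)
        for segment, conf in propose(path):
            if not 0.0 < conf <= 1.0:
                raise ValueError(f"confidence must be in (0, 1], got {conf}")
            counter += 1
            heapq.heappush(frontier, (cost - math.log(conf), counter, path + (segment,)))
    raise ValueError("no complete trajectory found")


# Hypothetical samples and confidences for each window, keyed by the partial path.
TOY_SAMPLES = {
    (): [("A", 0.9), ("B", 0.6)],
    ("A",): [("A1", 0.1), ("A2", 0.2)],
    ("B",): [("B1", 0.9)],
}


def toy_propose(path: Path) -> Sequence[Tuple[Segment, float]]:
    return TOY_SAMPLES.get(path, [])


path, conf = best_first_track(toy_propose, num_windows=2)
print(path, round(conf, 2))  # prints ('B', 'B1') 0.54
```

In this toy, a greedy search would commit to A (confidence 0.9) and finish with a product of 0.18. Best-first search instead backtracks and returns B followed by B1, with a product of 0.54.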

Business Value

More accurate tracking of occluded points could make video analysis systems more robust, with potential applications in autonomous driving, surveillance, robotics, and augmented reality.