This paper proposes a rigorous evaluation protocol for benchmarking text-to-video (T2V) and image-to-video (I2V) models as implicit simulators of pedestrian dynamics. It introduces a method for reconstructing 2D bird's-eye-view trajectories from pixel space and analyzes the plausibility of multi-agent behavior in generated videos, revealing that leading models have learned effective priors for pedestrian motion but still exhibit characteristic failure modes.
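The summary does not spell out the reconstruction pipeline, but a minimal sketch of one standard approach is shown below: mapping per-frame pixel detections of pedestrians to a bird's-eye-view ground plane via a planar homography, then running a simple plausibility check on the resulting trajectories. The function `pixels_to_bev`, the matrix `H`, and the inter-agent distance check are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pixels_to_bev(points_px: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Map Nx2 pixel coordinates to Nx2 BEV ground-plane coordinates
    via a planar homography H (assumed pixel -> metric ground plane)."""
    pts_h = np.hstack([points_px, np.ones((len(points_px), 1))])  # homogeneous coords
    mapped = (H @ pts_h.T).T
    return mapped[:, :2] / mapped[:, 2:3]  # perspective divide

# Toy homography (would normally be estimated from scene reference points)
# and two pedestrians' foot points in one frame, in pixel coordinates.
H = np.array([[0.02,  0.00, -5.0],
              [0.00, -0.03, 10.0],
              [0.00,  0.00,  1.0]])
frame_px = np.array([[320.0, 400.0],
                     [350.0, 410.0]])

frame_bev = pixels_to_bev(frame_px, H)

# One simple plausibility signal on reconstructed BEV positions:
# flag frames where two agents come closer than a comfort radius.
COMFORT_RADIUS_M = 0.5  # assumed threshold, in meters
dist = np.linalg.norm(frame_bev[0] - frame_bev[1])
print(frame_bev, "inter-agent distance (m):", dist,
      "violation:" , dist < COMFORT_RADIUS_M)
```

Applied per frame over a tracked video, this yields 2D trajectories whose statistics (speeds, spacing, collision rates) can be compared against real pedestrian dynamics.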
This line of work enables the development of more realistic and useful video generation models that can serve as simulators for training autonomous systems, testing urban planning scenarios, or creating immersive virtual environments.