Abstract: Recent advances in probabilistic generative models have extended capabilities
from static image synthesis to text-driven video generation. However, the
inherent randomness of their generation process can lead to unpredictable
artifacts, such as impossible physics and temporal inconsistency. Progress in
addressing these challenges requires systematic benchmarks, yet existing
datasets focus primarily on generated images because of the unique spatio-temporal
complexities of videos. To bridge this gap, we introduce GeneVA, a large-scale
artifact dataset with rich human annotations that focuses on spatio-temporal
artifacts in videos generated from natural text prompts. We hope GeneVA can
enable and assist critical applications, such as benchmarking model performance
and improving generative video quality.