This paper proposes using text-to-video diffusion models to generate annotated data for training action understanding models, addressing data scarcity. It introduces an 'information enhancement strategy' and 'uncertainty-based label smoothing' to improve the quality and utility of the generated data, and reports that adding generated data can significantly boost model performance.
This approach enables the development of more capable video understanding systems even when real-world data is limited, accelerating AI deployment in areas such as autonomous driving, robotics, and content analysis.
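The summary above does not spell out how 'uncertainty-based label smoothing' is computed, so the following is only a minimal sketch of the general idea, not the paper's formulation. It assumes each generated video comes with an uncertainty score in [0, 1] and that the smoothing strength grows linearly with that score. The function name `smooth_label` and the parameter `max_eps` are hypothetical and chosen for illustration.

```python
from typing import List


def smooth_label(class_index: int, num_classes: int,
                 uncertainty: float, max_eps: float = 0.2) -> List[float]:
    """Build a soft target for one generated sample.

    Assumption (not from the paper): the smoothing strength is
    eps = max_eps * uncertainty, so a fully trusted sample keeps a
    one-hot label and an uncertain sample spreads eps uniformly over
    the other classes.
    """
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index out of range")

    # Clamp the uncertainty score into [0, 1] before scaling.
    u = min(max(uncertainty, 0.0), 1.0)
    eps = max_eps * u

    # Spread eps over the non-target classes; keep the rest on the target.
    off_value = eps / (num_classes - 1)
    target = [off_value] * num_classes
    target[class_index] = 1.0 - eps
    return target


if __name__ == "__main__":
    print(smooth_label(1, 4, uncertainty=0.0))  # confident sample
    print(smooth_label(1, 4, uncertainty=1.0))  # highly uncertain sample
```

Under these assumptions, generated samples judged less reliable contribute softer targets to the loss, which limits how strongly a mislabeled or low-quality generated video can pull the model.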