📄 Abstract
Recent advances in image and video generation raise hopes that these models
possess world modeling capabilities: the ability to generate realistic,
physically plausible videos. This could revolutionize applications in robotics,
autonomous driving, and scientific simulation. However, before treating these
models as world models, we must ask: Do they adhere to physical conservation
laws? To answer this, we introduce Morpheus, a benchmark for evaluating video
generation models on physical reasoning. It features 80 real-world videos
capturing physical phenomena, guided by conservation laws. Since artificial
generations lack ground truth, we assess physical plausibility using
physics-informed metrics evaluated with respect to infallible conservation laws
known per physical setting, leveraging advances in physics-informed neural
networks and vision-language foundation models. Our findings reveal that even
with advanced prompting and video conditioning, current models struggle to
encode physical principles despite generating aesthetically pleasing videos.
All data, leaderboard, and code are open-sourced at our project page.
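
To make the idea of a conservation-law metric concrete, below is a minimal sketch, not the paper's actual Morpheus metric, assuming object positions have already been extracted from a generated clip (e.g., by a tracker). The function `energy_drift` and its parameters (`positions`, `fps`, `mass`, `g`) are hypothetical names chosen for illustration: it scores how much total mechanical energy drifts over a projectile-style trajectory, with larger values indicating stronger violation of energy conservation.

```python
# Hedged sketch of a conservation-law plausibility score (illustrative only).
import numpy as np


def energy_drift(positions: np.ndarray, fps: float,
                 mass: float = 1.0, g: float = 9.81) -> float:
    """Relative drift of total mechanical energy over a tracked trajectory.

    positions: (T, 2) array of (x, y) coordinates in metres, y pointing up.
    Returns ~0.0 for an energy-conserving trajectory; larger values indicate
    stronger violation of the conservation law.
    """
    dt = 1.0 / fps
    velocities = np.gradient(positions, dt, axis=0)        # finite-difference velocity
    kinetic = 0.5 * mass * np.sum(velocities ** 2, axis=1)  # 1/2 m v^2 per frame
    potential = mass * g * positions[:, 1]                   # m g h per frame
    total = kinetic + potential
    return float(np.ptp(total) / (np.abs(total).mean() + 1e-8))


if __name__ == "__main__":
    # Synthetic ideal projectile: energy is conserved, so drift is near zero.
    fps = 30.0
    t = np.arange(0, 1, 1 / fps)
    ideal = np.stack([2.0 * t, 5.0 * t - 0.5 * 9.81 * t ** 2], axis=1)
    print("ideal drift:", energy_drift(ideal, fps))

    # A physically implausible trajectory where the object accelerates mid-air.
    bad = ideal.copy()
    bad[15:, 0] += 0.05 * np.arange(len(bad) - 15) ** 1.5
    print("implausible drift:", energy_drift(bad, fps))
```

The benchmark's own metrics are richer (physics-informed neural networks and vision-language models per the abstract), but the design choice is the same: score deviation from a known conservation law rather than comparing against a ground-truth video, which generated footage does not have.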
Authors (10)
Chenyu Zhang
Daniil Cherniavskii
Antonios Tragoudaras
Antonios Vozikis
Thijmen Nijdam
Derck W. E. Prinzhorn
+4 more
Key Contributions
Introduces Morpheus, a novel benchmark for evaluating the physical reasoning capabilities of video generative models using real-world physics experiments governed by conservation laws. It employs physics-informed metrics to assess physical plausibility, revealing that current models struggle to adhere to fundamental physical principles.
Business Value
Crucial for developing trustworthy generative models for safety-critical applications like robotics and autonomous driving, ensuring generated scenarios are physically realistic.