📄 Abstract
We present Stable Video Materials 3D (SViM3D), a framework that predicts multi-view consistent physically based rendering (PBR) materials from a single image. Recently, video diffusion models have been used to efficiently reconstruct 3D objects from a single image. However, reflectance is still represented by simple material models or must be estimated in additional steps to enable relighting and controlled appearance edits. We extend a latent video diffusion model to output spatially varying PBR parameters and surface normals jointly with each generated view under explicit camera control. This unique setup allows for relighting and for generating a 3D asset using our model as a neural prior. We introduce several mechanisms into this pipeline that improve quality in this ill-posed setting. We show state-of-the-art relighting and novel view synthesis performance on multiple object-centric datasets. Our method generalizes to diverse inputs, enabling the generation of relightable 3D assets useful in AR/VR, movies, games, and other visual media.
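
To make the relighting use case concrete, the sketch below shades per-pixel PBR maps of the kind SViM3D predicts (albedo, roughness, metallic, and surface normals) under a new directional light with a standard Cook-Torrance GGX BRDF. This is a minimal illustrative example in NumPy, not the paper's implementation; the function name, array shapes, and single-light setup are assumptions made for the sketch.

```python
# Minimal sketch (not the paper's code): relight one view from per-pixel PBR maps
# using a Cook-Torrance GGX BRDF under a single directional light.
import numpy as np

def cook_torrance_relight(albedo, roughness, metallic, normals,
                          light_dir, view_dir, light_color=(1.0, 1.0, 1.0)):
    """albedo: (H,W,3), roughness/metallic: (H,W), normals: (H,W,3) unit vectors."""
    l = np.asarray(light_dir, np.float32); l /= np.linalg.norm(l)
    v = np.asarray(view_dir, np.float32);  v /= np.linalg.norm(v)
    h = (l + v) / np.linalg.norm(l + v)                      # half vector

    n_dot_l = np.clip(normals @ l, 0.0, 1.0)
    n_dot_v = np.clip(normals @ v, 1e-4, 1.0)
    n_dot_h = np.clip(normals @ h, 0.0, 1.0)
    v_dot_h = np.clip(v @ h, 0.0, 1.0)

    a = np.clip(roughness, 0.04, 1.0) ** 2                   # GGX alpha
    # Normal distribution term (GGX / Trowbridge-Reitz)
    d = a**2 / (np.pi * ((n_dot_h**2) * (a**2 - 1.0) + 1.0) ** 2 + 1e-7)
    # Geometry term (Smith-Schlick approximation)
    k = (roughness + 1.0) ** 2 / 8.0
    g = (n_dot_l / (n_dot_l * (1 - k) + k + 1e-7)) * \
        (n_dot_v / (n_dot_v * (1 - k) + k + 1e-7))
    # Fresnel term (Schlick), base reflectance blended toward albedo by metallic
    f0 = 0.04 * (1.0 - metallic[..., None]) + albedo * metallic[..., None]
    f = f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5

    specular = d[..., None] * g[..., None] * f / \
               (4.0 * n_dot_l[..., None] * n_dot_v[..., None] + 1e-7)
    diffuse = (1.0 - f) * (1.0 - metallic[..., None]) * albedo / np.pi
    radiance = (diffuse + specular) * np.asarray(light_color) * n_dot_l[..., None]
    return np.clip(radiance, 0.0, 1.0)
```

Because the BRDF is evaluated independently per pixel from the predicted maps, the same view can be re-shaded under arbitrary lighting without re-running the diffusion model, which is what makes jointly predicted PBR parameters and normals useful for controlled appearance edits.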
Key Contributions
Presents SViM3D, a framework extending latent video diffusion models to predict multi-view consistent Physically Based Rendering (PBR) materials from a single image. It enables relighting and controlled appearance edits by jointly generating PBR parameters and surface normals, acting as a neural prior for 3D asset generation.
Business Value
Significantly streamlines the creation of realistic 3D assets with controllable materials, accelerating workflows in game development, VFX, and AR/VR content creation.