Abstract
This paper aims to tackle the problem of photorealistic view synthesis from
vehicle sensor data. Recent advancements in neural scene representation have
achieved notable success in rendering high-quality autonomous driving scenes,
but the performance significantly degrades as the viewpoint deviates from the
training trajectory. To mitigate this problem, we introduce StreetCrafter, a
novel controllable video diffusion model that utilizes LiDAR point cloud
renderings as pixel-level conditions, which fully exploits the generative prior
for novel view synthesis, while preserving precise camera control. Moreover,
the utilization of pixel-level LiDAR conditions allows us to make accurate
pixel-level edits to target scenes. In addition, the generative prior of
StreetCrafter can be effectively incorporated into dynamic scene
representations to achieve real-time rendering. Experiments on the Waymo Open
Dataset and PandaSet demonstrate that our model enables flexible control over
viewpoint changes and enlarges the view synthesis regions that can be rendered
satisfactorily, outperforming existing methods.
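To illustrate the kind of pixel-level LiDAR conditioning the abstract describes, the following is a minimal, hypothetical sketch (not the authors' code): colored LiDAR points are projected into a target camera view and splatted into an image that could serve as a pixel-aligned condition for the video diffusion model. All names, signatures, and the simple z-buffer scheme are assumptions for illustration only.

```python
# Hypothetical sketch: render a LiDAR point cloud into a target camera view
# to form a pixel-aligned conditioning image. Not the StreetCrafter implementation.
import numpy as np

def render_lidar_condition(points_xyz, points_rgb, K, w2c, height, width):
    """Project colored LiDAR points into a camera and splat them as an image;
    nearer points overwrite farther ones via a simple painter's z-ordering."""
    # Transform points from world to camera coordinates (homogeneous form).
    pts_h = np.concatenate([points_xyz, np.ones((len(points_xyz), 1))], axis=1)
    pts_cam = (w2c @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    valid = pts_cam[:, 2] > 1e-3
    pts_cam, colors = pts_cam[valid], points_rgb[valid]

    # Perspective projection with intrinsics K.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)

    # Discard projections that fall outside the image bounds.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, colors = u[inside], v[inside], pts_cam[inside, 2], colors[inside]

    # Draw points from far to near so closer points win (crude z-buffer).
    order = np.argsort(-z)
    image = np.zeros((height, width, 3), dtype=np.float32)
    image[v[order], u[order]] = colors[order]
    return image  # pixel-level condition image for a target viewpoint
```

Because the condition is rendered per target viewpoint, shifting the camera away from the training trajectory only requires re-projecting the same point cloud, which is consistent with the precise camera control and pixel-level editing claimed in the abstract.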