Abstract
Novel view synthesis from images, for example with 3D Gaussian splatting,
has made great progress. Rendering fidelity and speed are now sufficient even for
demanding virtual reality applications. However, the problem of assisting
humans in collecting the input images for these rendering algorithms has
received much less attention. High-quality view synthesis requires uniform and
dense view sampling. Unfortunately, these requirements are not easily met
by human camera operators, who may be rushed, impatient, or unfamiliar with
the scene structure and the photographic process. Existing approaches to
guide humans during image acquisition concentrate on single objects or neglect
view-dependent material characteristics. We propose a novel situated
visualization technique for scanning at multiple scales. During the scanning of
a scene, our method identifies important objects that need extended image
coverage to properly represent view-dependent appearance. To this end, we
leverage semantic segmentation and category identification, ranked by a
vision-language model. Spherical proxies are generated around highly ranked
objects to guide the user during scanning. Our results show superior
performance in real scenes compared to conventional view sampling strategies.
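To make the notion of a spherical capture proxy concrete, the sketch below is an illustrative assumption rather than the paper's actual pipeline: the object labels, importance scores, threshold, and the `fibonacci_sphere_views` helper are hypothetical. It samples roughly uniform camera positions on a sphere around each highly ranked object, which is one simple way such a proxy could suggest capture viewpoints for view-dependent appearance.

```python
# Illustrative sketch only: object labels, scores, radii, and the threshold
# are hypothetical placeholders, not the paper's method or API.
import numpy as np

def fibonacci_sphere_views(center, radius, n_views=32):
    """Sample roughly uniform camera positions on a sphere around an object."""
    i = np.arange(n_views)
    phi = np.pi * (3.0 - np.sqrt(5.0))           # golden angle increment
    z = 1.0 - 2.0 * (i + 0.5) / n_views          # even spacing along the z-axis
    r = np.sqrt(1.0 - z * z)
    dirs = np.stack([r * np.cos(phi * i), r * np.sin(phi * i), z], axis=1)
    return center + radius * dirs                # candidate camera positions (world space)

# Hypothetical importance scores, e.g. from a vision-language model ranking
# segmented object categories by how view-dependent their appearance is.
objects = [
    {"label": "glass vase",   "center": np.array([0.2, 0.0, 1.1]), "radius": 0.6, "score": 0.92},
    {"label": "wooden chair", "center": np.array([1.5, 0.0, 0.4]), "radius": 0.9, "score": 0.35},
]

for obj in objects:
    if obj["score"] > 0.5:                       # only highly ranked objects get a proxy
        views = fibonacci_sphere_views(obj["center"], obj["radius"])
        print(obj["label"], views.shape)          # (32, 3) suggested capture positions
```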