Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: We present PercHead, a method for single-image 3D head reconstruction and
semantic 3D editing - two tasks that are inherently challenging due to severe
view occlusions, weak perceptual supervision, and the ambiguity of editing in
3D space. We develop a unified base model for reconstructing view-consistent 3D
heads from a single input image. The model employs a dual-branch encoder
followed by a ViT-based decoder that lifts 2D features into 3D space through
iterative cross-attention. Rendering is performed using Gaussian Splatting. At
the heart of our approach is a novel perceptual supervision strategy based on
DINOv2 and SAM2.1, which provides rich, generalized signals for both geometric
and appearance fidelity. Our model achieves state-of-the-art performance in
novel-view synthesis and, furthermore, exhibits exceptional robustness to
extreme viewing angles compared to established baselines. Furthermore, this
base model can be seamlessly extended for semantic 3D editing by swapping the
encoder and finetuning the network. In this variant, we disentangle geometry
and style through two distinct input modalities: a segmentation map to control
geometry and either a text prompt or a reference image to specify appearance.
We highlight the intuitive and powerful 3D editing capabilities of our model
through a lightweight, interactive GUI, where users can effortlessly sculpt
geometry by drawing segmentation maps and stylize appearance via natural
language or image prompts.
Project Page: https://antoniooroz.github.io/PercHead Video:
https://www.youtube.com/watch?v=4hFybgTk4kE
Key Contributions
Presents PercHead, a unified method for single-image 3D head reconstruction and semantic editing, overcoming challenges of view occlusion and weak perceptual supervision. It employs a novel dual-branch encoder, ViT-based decoder, Gaussian Splatting rendering, and a unique perceptual supervision strategy using DINOv2 and SAM2.1 for high fidelity.
Business Value
Enables creation of realistic 3D digital avatars and assets from single photos, impacting virtual reality, gaming, and personalized digital experiences.