📄 Abstract
We propose DenseMarks, a new learned representation for human heads that enables high-quality dense correspondences between human head images. Given a 2D image of a human head, a Vision Transformer network predicts a 3D embedding for each pixel, which corresponds to a location in a 3D canonical unit cube. To train the network, we collect a dataset of pairwise point matches, estimated by a state-of-the-art point tracker over a collection of diverse in-the-wild talking-head videos. We then guide the mapping with a contrastive loss that encourages matched points to have close embeddings. We further employ multi-task learning with face landmark and segmentation constraints, and impose spatial continuity of embeddings through latent cube features. Together, these yield an interpretable and queryable canonical space. The representation can be used for finding common semantic parts, face/head tracking, and stereo reconstruction. Owing to the strong supervision, our method is robust to pose variations and covers the entire head, including hair. Additionally, the canonical space bottleneck ensures that the obtained representations are consistent across diverse poses and individuals. We demonstrate state-of-the-art results in geometry-aware point matching and monocular head tracking with 3D Morphable Models. The code and the model checkpoint will be made publicly available.
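Because every pixel is mapped to a point in the same canonical cube, a correspondence between two head images can be looked up by searching for the pixel whose embedding is nearest. The abstract does not spell out the matching procedure, so the following is only a minimal brute-force sketch. It assumes per-pixel embeddings have already been predicted and are stored as nested lists of 3D points. `nearest_pixel` and `find_correspondence` are hypothetical helpers, not the authors' API. The same lookup also shows what "queryable" can mean in practice: you can query the canonical space directly with an arbitrary cube coordinate.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: emb[y][x] is the predicted 3D embedding of pixel (x, y),
# assumed to lie in the canonical unit cube [0, 1]^3.
Point3 = Sequence[float]
EmbeddingMap = List[List[Point3]]


def nearest_pixel(emb: EmbeddingMap, point: Point3) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return the (x, y) pixel whose embedding is closest (Euclidean) to `point`,
    together with that distance. Brute-force scan over all pixels."""
    best_xy: Optional[Tuple[int, int]] = None
    best_dist = math.inf
    for y, row in enumerate(emb):
        for x, e in enumerate(row):
            d = math.dist(point, e)
            if d < best_dist:
                best_xy, best_dist = (x, y), d
    return best_xy, best_dist


def find_correspondence(emb_a: EmbeddingMap, emb_b: EmbeddingMap,
                        query: Tuple[int, int]) -> Tuple[Optional[Tuple[int, int]], float]:
    """Map pixel `query` = (x, y) of image A to its best match in image B
    by looking up A's embedding in B's embedding map."""
    qx, qy = query
    return nearest_pixel(emb_b, emb_a[qy][qx])


# Toy 2x2 embedding maps: image B is a horizontal mirror of image A.
emb_a = [[(0.1, 0.1, 0.5), (0.9, 0.1, 0.5)],
         [(0.1, 0.9, 0.5), (0.9, 0.9, 0.5)]]
emb_b = [[(0.9, 0.1, 0.5), (0.1, 0.1, 0.5)],
         [(0.9, 0.9, 0.5), (0.1, 0.9, 0.5)]]

print(find_correspondence(emb_a, emb_b, (0, 0)))   # ((1, 0), 0.0)
print(nearest_pixel(emb_b, (0.85, 0.85, 0.5))[0])  # (0, 1)
```

A brute-force scan costs O(HW) per query. For full-resolution images, a spatial index or a batched distance computation on the GPU would be the more practical choice.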
Key Contributions
This paper introduces DenseMarks, a new learned representation for human heads that enables high-quality dense correspondences. Using a Vision Transformer, it predicts a 3D embedding for each pixel that maps to a location in a canonical 3D cube. The network is trained with a contrastive loss on point matches obtained from point tracks, augmented with face landmark and segmentation constraints.
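The summary states only that a contrastive loss on point-track matches pulls matched points' embeddings together; it does not give the exact formulation. The following is a minimal sketch under several assumptions, none of which the source confirms:

- an InfoNCE-style loss with in-batch negatives;
- negative squared Euclidean distance as the similarity;
- a temperature `tau`;
- a sigmoid that keeps raw network outputs inside the unit cube.

`to_unit_cube`, `info_nce`, and `tau` are hypothetical names, not the authors' code.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def _sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def to_unit_cube(raw: Vec3) -> List[float]:
    """Squash an unconstrained 3D network output into [0, 1]^3 (assumed: sigmoid)."""
    return [_sigmoid(v) for v in raw]


def info_nce(emb_a: Sequence[Vec3], emb_b: Sequence[Vec3], tau: float = 0.1) -> float:
    """InfoNCE-style loss over N matched pairs (emb_a[i], emb_b[i]).

    For anchor i the positive is emb_b[i]; every other emb_b[j] in the batch
    acts as a negative. Similarity = -(squared Euclidean distance) / tau.
    """
    n = len(emb_a)
    total = 0.0
    for i in range(n):
        logits = [-sum((p - q) ** 2 for p, q in zip(emb_a[i], emb_b[j])) / tau
                  for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_z - logits[i]  # equals -log softmax(logits)[i]
    return total / n


# Toy batch: three tracked points, embeddings of the same point in two frames.
frame_a = [to_unit_cube(v) for v in [(-2.0, -2.0, 0.0), (2.0, -2.0, 0.0), (0.0, 2.0, 1.0)]]
frame_b = [[c + 0.01 for c in e] for e in frame_a]   # matches, slightly perturbed
shuffled = frame_b[1:] + frame_b[:1]                 # deliberately wrong matches

print(info_nce(frame_a, frame_b) < info_nce(frame_a, shuffled))  # True
```

With in-batch negatives, every other tracked point in the batch serves as a negative for a given anchor. Correctly matched pairs therefore get a much lower loss than shuffled ones, which is what the final line checks.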
Business Value
Could facilitate more realistic and interactive virtual and augmented reality experiences, improved avatar creation, and advanced facial analysis in applications such as gaming, social media, and virtual try-on.