📄 Abstract
Deterministic models for 3D hand pose reconstruction, whether single-stage
or cascaded, struggle with pose ambiguities caused by self-occlusions and
complex hand articulations. Existing cascaded approaches refine predictions in
a coarse-to-fine manner but remain deterministic and cannot capture pose
uncertainties. Recent probabilistic methods model pose distributions yet are
restricted to single-stage estimation, which often fails to produce accurate 3D
reconstructions without refinement. To address these limitations, we propose a
coarse-to-fine cascaded diffusion framework that combines probabilistic
modeling with cascaded refinement. The first stage is a joint diffusion model
that samples diverse 3D joint hypotheses, and the second stage is a Mesh Latent
Diffusion Model (Mesh LDM) that reconstructs a 3D hand mesh conditioned on a
joint sample. By training Mesh LDM with diverse joint hypotheses in a learned
latent space, our framework learns distribution-aware joint-mesh relationships
and robust hand priors. Furthermore, the cascaded design mitigates the
difficulty of directly mapping 2D images to dense 3D poses, enhancing accuracy
through sequential refinement. Experiments on FreiHAND and HO3Dv2 demonstrate
that our method achieves state-of-the-art performance while effectively
modeling pose distributions.
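
The two-stage flow described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the learned denoisers are replaced by a placeholder update that pulls a noisy sample toward its conditioning signal, and the names `reverse_diffusion`, `sample_joints`, `sample_mesh`, and `reconstruct`, the latent size, the step count, and the 21-joint / 778-vertex layout are all assumptions made for illustration. It only shows the data flow: several joint hypotheses are sampled from image features, and each hypothesis conditions a latent-space sampler whose output is decoded into a mesh.

```python
import random

NUM_JOINTS = 21   # assumption: a common 21-keypoint hand skeleton
NUM_VERTS = 778   # assumption: a MANO-style mesh resolution
LATENT_DIM = 8    # hypothetical latent size for the mesh stage
STEPS = 10        # hypothetical number of reverse steps


def reverse_diffusion(cond, rng, steps=STEPS, noise_scale=0.1):
    """Toy reverse process: start from Gaussian noise and move toward `cond`.

    A real model would call a learned denoising network here; this placeholder
    just halves the distance to the conditioning vector at every step and
    re-injects a little noise on all but the last step.
    """
    x = [rng.gauss(0.0, 1.0) for _ in cond]
    for t in reversed(range(steps)):
        x = [xi + 0.5 * (ci - xi) for xi, ci in zip(x, cond)]
        if t > 0:
            x = [xi + noise_scale * rng.gauss(0.0, 1.0) for xi in x]
    return x


def sample_joints(image_feat, rng):
    """Stage 1: sample one 3D joint hypothesis conditioned on image features."""
    if len(image_feat) != NUM_JOINTS * 3:
        raise ValueError("toy encoder output must have 63 values")
    flat = reverse_diffusion(image_feat, rng)
    return [flat[3 * j:3 * j + 3] for j in range(NUM_JOINTS)]


def joints_to_latent_cond(joints):
    """Hypothetical conditioning: average chunks of the flattened joints."""
    flat = [c for joint in joints for c in joint]
    chunk = len(flat) // LATENT_DIM
    return [sum(flat[i * chunk:(i + 1) * chunk]) / chunk
            for i in range(LATENT_DIM)]


def decode_mesh(latent):
    """Hypothetical decoder: map a latent vector to NUM_VERTS 3D vertices."""
    return [[latent[(v + a) % LATENT_DIM] for a in range(3)]
            for v in range(NUM_VERTS)]


def sample_mesh(joints, rng):
    """Stage 2: sample a mesh latent conditioned on a joint sample, then decode."""
    latent = reverse_diffusion(joints_to_latent_cond(joints), rng)
    return decode_mesh(latent)


def reconstruct(image_feat, num_hypotheses=4, seed=0):
    """Run the cascade: each joint hypothesis yields its own mesh."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_hypotheses):
        joints = sample_joints(image_feat, rng)
        results.append((joints, sample_mesh(joints, rng)))
    return results
```

Calling `reconstruct(features, num_hypotheses=4)` with a 63-value feature list returns four `(joints, mesh)` pairs. Because each pass draws fresh noise, the hypotheses differ from one another, which is the property that lets a probabilistic cascade represent ambiguity instead of committing to a single pose.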
Key Contributions
Proposes a coarse-to-fine cascaded diffusion framework for probabilistic 3D hand pose estimation. A joint diffusion model first samples diverse 3D joint hypotheses, and a Mesh LDM then reconstructs a 3D hand mesh conditioned on a joint sample, which addresses pose ambiguities and captures pose uncertainty.
Business Value
Could enable more realistic and robust hand interaction in virtual and augmented reality, improve hand tracking for robotic manipulation, and support clinical analysis of hand movements.