📄 Abstract
Animation retargeting applies a sparse motion description (e.g., keypoint
sequences) to a character mesh to produce a semantically plausible and
temporally coherent full-body mesh sequence. Existing approaches come with
restrictions -- they require access to template-based shape priors or
artist-designed deformation rigs, suffer from limited generalization to unseen
motion and/or shapes, or exhibit motion jitter. We propose Self-supervised
Motion Fields (SMF), a self-supervised framework that is trained with only
sparse motion representations, without requiring dataset-specific annotations,
templates, or rigs. At the heart of our method are Kinetic Codes, a novel
autoencoder-based sparse motion encoding that exposes a semantically rich
latent space, simplifying large-scale training. Our architecture comprises
dedicated spatial and temporal gradient predictors, which are jointly trained
in an end-to-end fashion. The combined network, regularized by the Kinetic
Codes' latent space, generalizes well to both unseen shapes and new
motions. We evaluate our method on unseen motion sampled from AMASS, D4D,
Mixamo, and raw monocular video for animation transfer on various characters
with varying shapes and topology. We report a new SoTA on the AMASS dataset in
the context of generalization to unseen motion. Code, weights, and
supplementary material are available on the project webpage at
https://motionfields.github.io/
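To make the Kinetic Codes idea concrete, here is a minimal PyTorch sketch of an autoencoder over windows of sparse keypoints, trained purely by self-supervised reconstruction. All names, dimensions, and layer choices (e.g., `KineticCodeAE`, `latent_dim=128`) are illustrative assumptions; the abstract does not specify the paper's actual architecture.

```python
# Hypothetical sketch of a Kinetic Codes-style autoencoder for sparse motion.
# All class names and dimensions are illustrative, not the authors' code.
import torch
import torch.nn as nn

class KineticCodeAE(nn.Module):
    """Encode a window of sparse keypoints into a compact motion latent."""

    def __init__(self, num_keypoints: int = 24, window: int = 16, latent_dim: int = 128):
        super().__init__()
        in_dim = num_keypoints * 3 * window  # flattened (x, y, z) keypoint window
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, in_dim),
        )

    def forward(self, keypoints: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # keypoints: (batch, window, num_keypoints, 3)
        flat = keypoints.flatten(start_dim=1)
        z = self.encoder(flat)                      # the "Kinetic Code" latent
        recon = self.decoder(z).view_as(keypoints)  # self-supervised reconstruction
        return z, recon

# Self-supervised training signal: reconstruct the sparse motion itself,
# so no dataset-specific annotations, templates, or rigs are required.
model = KineticCodeAE()
kp = torch.randn(8, 16, 24, 3)  # dummy batch of keypoint windows
z, recon = model(kp)
loss = nn.functional.mse_loss(recon, kp)
loss.backward()
```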
Key Contributions
Self-supervised Motion Fields (SMF) is a template-free and rig-free framework for animation retargeting that uses Kinetic Codes, a novel autoencoder-based sparse motion encoding, to create a semantically rich latent space. This allows for large-scale training without dataset-specific annotations, enabling the generation of plausible and temporally coherent full-body mesh sequences from sparse motion descriptions while avoiding motion jitter.
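The two-branch design named in the abstract (dedicated spatial and temporal gradient predictors, conditioned on the Kinetic Codes latent) could look roughly like the sketch below. For simplicity it predicts per-vertex displacements and velocities directly rather than true spatial/temporal gradients, since the abstract does not describe the integration scheme; `MotionFieldPredictor` and all shapes are hypothetical.

```python
# Illustrative two-headed predictor conditioned on a Kinetic Code latent.
# Interfaces, shapes, and the displacement/velocity simplification are
# assumptions made for this sketch, not the authors' implementation.
import torch
import torch.nn as nn

class MotionFieldPredictor(nn.Module):
    def __init__(self, latent_dim: int = 128, hidden: int = 256):
        super().__init__()
        # Spatial branch: per-vertex offset, conditioned on vertex position.
        self.spatial = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )
        # Temporal branch: per-vertex velocity, encouraging temporal coherence.
        self.temporal = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, verts: torch.Tensor, z: torch.Tensor):
        # verts: (V, 3) vertices of any input mesh (template- and rig-free);
        # z: (latent_dim,) Kinetic Code for the current motion window.
        cond = torch.cat([verts, z.expand(verts.shape[0], -1)], dim=-1)
        return self.spatial(cond), self.temporal(cond)

# Joint end-to-end use: pose the mesh with the spatial output, then step
# forward in time with the temporal output.
pred = MotionFieldPredictor()
verts = torch.randn(1000, 3)       # arbitrary character mesh vertices
z = torch.randn(128)               # Kinetic Code latent
disp, vel = pred(verts, z)
frame_t = verts + disp             # posed frame
frame_t1 = frame_t + (1 / 30) * vel  # advance one ~30 fps timestep
```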
Business Value
Significantly speeds up and democratizes the animation process for games, films, and virtual experiences by reducing the need for manual rigging and complex motion capture cleanup.