Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
π Abstract
Abstract: Markerless multiview motion capture is often constrained by the need for
precise camera calibration, limiting accessibility for non-experts and
in-the-wild captures. Existing calibration-free approaches mitigate this
requirement but suffer from high computational cost and reduced reconstruction
accuracy.
We present Kineo, a fully automatic, calibration-free pipeline for markerless
motion capture from videos captured by unsynchronized, uncalibrated,
consumer-grade RGB cameras. Kineo leverages 2D keypoints from off-the-shelf
detectors to simultaneously calibrate cameras, including Brown-Conrady
distortion coefficients, and reconstruct 3D keypoints and dense scene point
maps at metric scale. A confidence-driven spatio-temporal keypoint sampling
strategy, combined with graph-based global optimization, ensures robust
calibration at a fixed computational cost independent of sequence length. We
further introduce a pairwise reprojection consensus score to quantify 3D
reconstruction reliability for downstream tasks.
Evaluations on EgoHumans and Human3.6M demonstrate substantial improvements
over prior calibration-free methods. Compared to previous state-of-the-art
approaches, Kineo reduces camera translation error by approximately 83-85%,
camera angular error by 86-92%, and world mean-per-joint error (W-MPJPE) by
83-91%.
Kineo is also efficient in real-world scenarios, processing multi-view
sequences faster than their duration in specific configuration (e.g., 36min to
process 1h20min of footage). The full pipeline and evaluation code are openly
released to promote reproducibility and practical adoption at
https://liris-xr.github.io/kineo/.
Authors (3)
Charles Javerliat
Pierre Raimbaud
Guillaume LavouΓ©
Submitted
October 28, 2025
Key Contributions
Presents Kineo, a fully automatic, calibration-free pipeline for markerless motion capture using sparse RGB cameras. Kineo simultaneously calibrates cameras (including distortion) and reconstructs 3D keypoints and dense scene point maps at metric scale, overcoming the need for precise calibration and reducing computational cost.
Business Value
Democratizes motion capture technology by enabling accurate 3D reconstruction from readily available consumer-grade cameras without complex calibration, opening up applications for smaller studios, researchers, and in-the-wild scenarios.