OnlineSplatter: Pose-Free Online 3D Reconstruction for Free-Moving Objects

Abstract

Free-moving object reconstruction from monocular video remains challenging, particularly without reliable pose or depth cues and under arbitrary object motion. We introduce OnlineSplatter, a novel online feed-forward framework generating high-quality, object-centric 3D Gaussians directly from RGB frames without requiring camera pose, depth priors, or bundle optimization. Our approach anchors reconstruction using the first frame and progressively refines the object representation through a dense Gaussian primitive field, maintaining constant computational cost regardless of video sequence length. Our core contribution is a dual-key memory module combining latent appearance-geometry keys with explicit directional keys, robustly fusing current frame features with temporally aggregated object states. This design enables effective handling of free-moving objects via spatial-guided memory readout and an efficient sparsification mechanism, ensuring comprehensive yet compact object coverage. Evaluations on real-world datasets demonstrate that OnlineSplatter significantly outperforms state-of-the-art pose-free reconstruction baselines, consistently improving with more observations while maintaining constant memory and runtime.
Authors (5)
Mark He Huang
Lin Geng Foo
Christian Theobalt
Ying Sun
De Wen Soh
Submitted: October 23, 2025
arXiv Category: cs.CV

Key Contributions

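OnlineSplatter is an online feed-forward framework that reconstructs free-moving objects as object-centric 3D Gaussians from monocular RGB video, without camera poses, depth priors, or bundle optimization. Its dual-key memory module pairs latent appearance-geometry keys with explicit directional keys to fuse current-frame features with temporally aggregated object states, and a sparsification mechanism keeps memory and runtime constant as the video grows longer.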
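
To make the dual-key idea concrete, here is a minimal Python sketch of a fixed-capacity memory whose entries carry both a latent key and a unit direction key. It is a toy illustration under stated assumptions, not the authors' implementation: the class name `DualKeyMemory`, the additive scoring of the two keys, the softmax readout, and the "merge the two most similar directions" sparsification rule are all hypothetical choices made for this example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    # Scale to unit length; a zero vector is returned unchanged.
    n = math.sqrt(dot(v, v))
    return list(v) if n == 0 else [x / n for x in v]


class DualKeyMemory:
    """Toy fixed-capacity memory. All names and rules here are hypothetical."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []  # each entry: (latent_key, direction_key, value)

    def write(self, latent_key, direction, value):
        self.entries.append((list(latent_key), normalize(direction), list(value)))
        if len(self.entries) > self.capacity:
            self._sparsify()

    def _sparsify(self):
        # Assumed rule: find the pair with the most similar direction keys
        # and merge it into one entry, so the memory never exceeds capacity.
        best = None
        for i in range(len(self.entries)):
            for j in range(i + 1, len(self.entries)):
                s = dot(self.entries[i][1], self.entries[j][1])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        a, b = self.entries[i], self.entries[j]
        merged = (
            [(x + y) / 2 for x, y in zip(a[0], b[0])],        # mean latent key
            normalize([x + y for x, y in zip(a[1], b[1])]),   # mean direction
            [(x + y) / 2 for x, y in zip(a[2], b[2])],        # mean value
        )
        self.entries = [e for k, e in enumerate(self.entries) if k not in (i, j)]
        self.entries.append(merged)

    def read(self, latent_query, direction_query):
        # Softmax attention over the sum of latent and directional similarity.
        if not self.entries:
            return None
        dq = normalize(direction_query)
        scores = [dot(latent_query, lk) + dot(dq, dk) for lk, dk, _ in self.entries]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(self.entries[0][2])
        return [
            sum(w * v[d] for w, (_, _, v) in zip(weights, self.entries)) / total
            for d in range(dim)
        ]


mem = DualKeyMemory(capacity=2)
mem.write([1.0, 0.0], [1.0, 0.0, 0.0], [1.0])
mem.write([0.0, 1.0], [0.0, 1.0, 0.0], [3.0])
mem.write([0.0, 1.0], [0.0, 0.9, 0.1], [5.0])  # triggers one merge
print(len(mem.entries))
print(mem.read([0.0, 10.0], [0.0, 1.0, 0.0]))
```

In this sketch the memory size stays at `capacity` no matter how many frames are written, which is the property the abstract describes as constant memory and runtime. The real system operates on learned features and Gaussian primitives rather than hand-written vectors.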

Business Value

Could enable online 3D scanning and reconstruction of moving objects with an ordinary monocular camera and no pose or depth sensors, with potential applications in AR/VR, robotics, and content creation.