Research Paper | Audience: Computer vision researchers, Robotics engineers, AR/VR developers, 3D artists

OnlineSplatter: Pose-Free Online 3D Reconstruction for Free-Moving Objects


Abstract

Free-moving object reconstruction from monocular video remains challenging, particularly without reliable pose or depth cues and under arbitrary object motion. We introduce OnlineSplatter, a novel online feed-forward framework generating high-quality, object-centric 3D Gaussians directly from RGB frames without requiring camera pose, depth priors, or bundle optimization. Our approach anchors reconstruction using the first frame and progressively refines the object representation through a dense Gaussian primitive field, maintaining constant computational cost regardless of video sequence length. Our core contribution is a dual-key memory module combining latent appearance-geometry keys with explicit directional keys, robustly fusing current frame features with temporally aggregated object states. This design enables effective handling of free-moving objects via spatial-guided memory readout and an efficient sparsification mechanism, ensuring comprehensive yet compact object coverage. Evaluations on real-world datasets demonstrate that OnlineSplatter significantly outperforms state-of-the-art pose-free reconstruction baselines, consistently improving with more observations while maintaining constant memory and runtime.

Authors (5)

Mark He Huang

Lin Geng Foo

Christian Theobalt

Ying Sun

De Wen Soh

Submitted

October 23, 2025

arXiv Category

cs.CV


Key Contributions

Introduces OnlineSplatter, an online feed-forward framework for pose-free 3D reconstruction of free-moving objects from monocular video using 3D Gaussians. Its dual-key memory module pairs latent appearance-geometry keys with explicit directional keys to robustly fuse current-frame features with the temporally aggregated object state.
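The abstract does not give the memory module's equations, so the sketch below only illustrates the general idea of scoring memory slots with two kinds of keys at once. It assumes each slot stores a latent key, a unit direction key, and a value vector, and that the readout score is a scaled latent dot product plus a weighted direction cosine. The function `dual_key_readout`, the parameter `dir_weight`, and this scoring rule are all hypothetical, not the authors' formulation.

```python
import math
from typing import List, Sequence

# Hypothetical slot layout and scoring rule; NOT the paper's actual module.


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dual_key_readout(
    q_latent: Sequence[float],
    q_dir: Sequence[float],
    k_latent: Sequence[Sequence[float]],
    k_dir: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    dir_weight: float = 1.0,
) -> List[float]:
    """Attention-style readout that combines a latent key and a direction key.

    score_i = <q_latent, k_latent_i> / sqrt(d) + dir_weight * <q_dir, k_dir_i>
    Directions are assumed to be unit vectors, so their dot product is a cosine.
    """
    d = len(q_latent)
    scores = [
        dot(q_latent, kl) / math.sqrt(d) + dir_weight * dot(q_dir, kd)
        for kl, kd in zip(k_latent, k_dir)
    ]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of slot values, one output component at a time.
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


# Two slots with identical latent keys but opposite viewing directions:
# the direction key alone decides which slot dominates the readout.
out = dual_key_readout(
    q_latent=[1.0, 0.0],
    q_dir=[0.0, 0.0, 1.0],
    k_latent=[[1.0, 0.0], [1.0, 0.0]],
    k_dir=[[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
    dir_weight=2.0,
)
print(out)  # → approximately [0.982, 0.018]
```

In this toy setting the latent keys cannot tell the slots apart, and the explicit direction term is what separates them. That is one plausible reading of why the abstract pairs the two key types, stated here as an assumption rather than a claim about the paper.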

Business Value

Could enable online 3D scanning and reconstruction of free-moving objects with ordinary monocular cameras, opening up applications in AR/VR, robotics, and content creation.

Paper Metadata

Innovation Type

Algorithmic and Architectural

Deployment Feasibility

High: it processes frames online with constant memory and runtime and needs no external camera pose, depth priors, or bundle optimization, which suits it to real-time applications.

Limitations Addressed

Reconstructing free-moving objects from monocular video without reliable pose or depth cues; the reliance of existing methods on camera pose, depth priors, or bundle optimization, along with their high computational cost and unsuitability for real-time use.

Performance Gains

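Generates high-quality, object-centric 3D Gaussians. On real-world datasets, it significantly outperforms state-of-the-art pose-free reconstruction baselines and keeps improving with more observations, while memory and runtime stay constant regardless of video length.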

Technical Tags

3D Reconstruction, Monocular Video, Neural Radiance Fields (NeRF), 3D Gaussians, Online Processing, Pose-Free, Object-Centric, Feed-Forward, Memory Module, Appearance-Geometry Keys

Research Topics

3D Computer Vision, Novel View Synthesis, Real-time Reconstruction, Generative Models, Scene Representation

Methods & Architectures

Online Feed-Forward Framework, 3D Gaussian Splatting, Dual-Key Memory Module, Latent Appearance-Geometry Keys, Directional Keys, Spatial-Guided Memory Readout, Sparsification Mechanism, 3D Gaussian Representation, Feed-Forward Neural Network

Applications & Tasks

Robotics, Augmented Reality (AR), Virtual Reality (VR), 3D Content Creation, Autonomous Driving, Free-moving object reconstruction, Pose estimation ambiguity, Computational cost of reconstruction, Real-time processing, Online 3D reconstruction from monocular video, Generating object-centric 3D representations

Related Fields

Computer Vision, 3D Graphics, Robotics, Augmented Reality, Machine Learning

Keywords

3D Reconstruction, Monocular Video, Online Processing, Pose-Free, 3D Gaussians, Neural Radiance Fields, Object-Centric, Feed-Forward, Memory Module, AR, VR, Robotics

Academic Context

#3D Computer Vision #Novel View Synthesis #Real-time Reconstruction #Generative Models #Scene Representation

Commercial Potential

Potential Products

Real-time 3D scanning apps for mobile devices, AR/VR content creation tools, Robotic perception systems

Target Industries

Gaming, Entertainment, E-commerce (3D product visualization), Robotics, Architecture

Use Case Examples

Scanning objects in real time for AR placement; Creating 3D models of products from phone videos; Enabling robots to build 3D models of objects in their environment on the fly

Competitive Edge

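Reconstructs free-moving objects online, without camera pose, depth priors, or bundle optimization, at constant memory and runtime per frame. According to the abstract, it significantly outperforms state-of-the-art pose-free reconstruction baselines on real-world datasets.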

Resource Requirements

Compute Needs

Requires a GPU for real-time inference.

Data Requirements

Monocular video streams of free-moving objects.

Deployment Constraints

Performance depends on video quality and object motion complexity; requires sufficient GPU power for real-time operation.

Scalability

Per-frame memory and runtime stay constant regardless of sequence length, so total processing cost grows only linearly with the number of frames.
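The constant-cost property follows from keeping the object state at a fixed size. The sketch below shows one generic way to achieve that: merge each frame's new candidates into the state, then sparsify back to a fixed budget. It assumes a single scalar importance score per entry and a keep-the-top-K pruning rule. `FixedBudgetMemory`, `Entry`, `capacity`, and `simulate` are hypothetical names, and the paper's actual sparsification mechanism is not described in this summary.

```python
import heapq
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    frame: int    # frame index the entry came from
    score: float  # hypothetical importance score (e.g. opacity or coverage)


class FixedBudgetMemory:
    """Holds at most `capacity` entries, so per-frame work stays bounded."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[Entry] = []

    def update(self, new_entries: List[Entry]) -> None:
        # Merge this frame's candidates with the stored state, then sparsify
        # back to the budget by keeping the highest-scoring entries.
        # This top-K rule is an assumed stand-in for the paper's mechanism.
        merged = self.entries + new_entries
        self.entries = heapq.nlargest(self.capacity, merged, key=lambda e: e.score)


def simulate(num_frames: int, per_frame: int, capacity: int, seed: int = 0) -> int:
    """Feed `num_frames` frames of random candidates and return the final size."""
    rng = random.Random(seed)
    memory = FixedBudgetMemory(capacity)
    for t in range(num_frames):
        memory.update([Entry(t, rng.random()) for _ in range(per_frame)])
    return len(memory.entries)


print(simulate(num_frames=1000, per_frame=64, capacity=256))  # → 256
```

Each update touches at most `capacity + per_frame` entries, so the cost per frame does not depend on how many frames came before. Over a sequence, that yields the linear total cost described above.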

Production Readiness

Maturity Level

Research
