📄 Abstract
Current 3D scene understanding methods are limited by offline-collected
multi-view data or pre-constructed 3D geometry. In this paper, we present
ExtractAnything3D (EA3D), a unified online framework for open-world 3D object
extraction that enables simultaneous geometric reconstruction and holistic
scene understanding. Given a streaming video, EA3D dynamically interprets each
frame using vision-language and 2D vision foundation encoders to extract
object-level knowledge. This knowledge is integrated and embedded into a
Gaussian feature map via a feed-forward online update strategy. We then
iteratively estimate visual odometry from historical frames and incrementally
update online Gaussian features with new observations. A recurrent joint
optimization module directs the model's attention to regions of interest,
simultaneously enhancing both geometric reconstruction and semantic
understanding. Extensive experiments across diverse benchmarks and tasks,
including photo-realistic rendering, semantic and instance segmentation, 3D
bounding box and semantic occupancy estimation, and 3D mesh generation,
demonstrate the effectiveness of EA3D. Our method establishes a unified and
efficient framework for joint online 3D reconstruction and holistic scene
understanding, enabling a broad range of downstream tasks.
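
To make the idea of incrementally updating per-Gaussian features from a stream more concrete, here is a minimal sketch. It is not EA3D's method: the abstract describes a learned feed-forward update, whereas this toy version substitutes a plain weighted running average. The names `GaussianFeatureMap`, `update`, and `run_stream`, and the `(gaussian_id, feature, weight)` observation format, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GaussianFeatureMap:
    """Toy stand-in for a Gaussian feature map: one feature vector per Gaussian id.

    Assumption: features are fused with a weighted running mean, which is a
    generic substitute for the learned update described in the abstract.
    """
    dim: int
    features: dict = field(default_factory=dict)  # gaussian id -> list of floats
    weights: dict = field(default_factory=dict)   # gaussian id -> accumulated weight

    def update(self, observations):
        """Fuse one frame of (gaussian_id, feature, weight) observations."""
        for gid, feat, w in observations:
            if len(feat) != self.dim:
                raise ValueError("feature dimension mismatch")
            if w <= 0:
                continue  # skip observations that carry no support
            old_w = self.weights.get(gid, 0.0)
            old_f = self.features.get(gid, [0.0] * self.dim)
            total = old_w + w
            # Weighted incremental mean of the old and new feature vectors.
            self.features[gid] = [
                (old_w * o + w * n) / total for o, n in zip(old_f, feat)
            ]
            self.weights[gid] = total


def run_stream(frames, dim):
    """Process frames one at a time, never revisiting earlier frames."""
    fmap = GaussianFeatureMap(dim=dim)
    for observations in frames:
        fmap.update(observations)
    return fmap


if __name__ == "__main__":
    frames = [
        [(0, [1.0, 0.0], 1.0)],
        [(0, [0.0, 1.0], 1.0), (1, [2.0, 2.0], 0.5)],
    ]
    fmap = run_stream(frames, dim=2)
    print(fmap.features)
```

The point of the sketch is only the online property: each frame is consumed once and the map is updated in place, so memory does not grow with the number of frames already seen.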
Authors (6)
Xiaoyu Zhou
Jingqi Wang
Yuang Jia
Yongtao Wang
Deqing Sun
Ming-Hsuan Yang
Submitted
October 29, 2025
Key Contributions
EA3D is a unified online framework for open-world 3D object extraction from streaming video that performs geometric reconstruction and scene understanding simultaneously. It interprets each frame with vision-language and 2D vision foundation encoders, embeds the resulting object-level knowledge into a Gaussian feature map via a feed-forward online update, and refines geometry and semantics through a recurrent joint optimization module. This removes the dependence on offline-collected multi-view data or pre-constructed 3D geometry.
Business Value
Supports online 3D mapping and scene understanding from streaming video, which is relevant to autonomous systems, robotics, and immersive experiences.