Abstract
Articulated objects are prevalent in daily life. Interactable digital twins
of such objects have numerous applications in embodied AI and robotics.
Unfortunately, current methods to digitize articulated real-world objects
require carefully captured data, preventing practical, scalable, and
generalizable acquisition. We focus on motion analysis and part-level
segmentation of an articulated object from a casually captured RGBD video shot
with a hand-held camera. A casually captured video of an interaction with an
articulated object is easy to obtain at scale using smartphones. However, this
setting is challenging due to simultaneous object and camera motion and
significant occlusions as the person interacts with the object. To tackle these
challenges, we introduce iTACO: a coarse-to-fine framework that infers joint
parameters and segments movable parts of the object from a dynamic RGBD video.
To evaluate our method under this new setting, we build a dataset of 784 videos
containing 284 objects across 11 categories, which is 20$\times$ larger than
the datasets available in prior work. We then compare our approach with existing methods
that also take video as input. Our experiments show that iTACO outperforms
existing articulated object digital twin methods on both synthetic and real
casually captured RGBD videos.