
S²-Diffusion: Generalizing from Instance-level to Category-level Skills in Robot Manipulation

Abstract: Recent advances in skill learning have propelled robot manipulation to new heights by enabling robots to learn complex manipulation tasks from a practical number of demonstrations. However, these skills are often limited to the particular action, object, and environment instances shown in the training data, and they have trouble transferring to other instances of the same category. In this work we present an open-vocabulary Spatial-Semantic Diffusion policy (S²-Diffusion), which generalizes from instance-level training data to the category level, so that skills can be transferred between instances of the same category. We show that the functional aspects of skills can be captured by a promptable semantic module combined with a spatial representation. We further propose leveraging depth estimation networks so that only a single RGB camera is needed. We evaluate and compare our approach on a diverse set of robot manipulation tasks, both in simulation and in the real world. Our results show that S²-Diffusion is invariant to changes in category-irrelevant factors and achieves satisfactory performance on other instances of the same category, even when it was not trained on that specific instance. Project website: https://s2-diffusion.github.io.
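A diffusion policy produces actions by starting from random noise and iteratively denoising it, conditioned on the current observation. As a generic illustration only, the sketch below implements a minimal DDPM-style reverse sampler in plain Python. It assumes a linear β schedule with 50 steps, and `predict_noise` is a hypothetical placeholder for a learned, observation-conditioned network. None of the function names or hyperparameters come from the paper.

```python
import math
import random


def noise_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule plus the derived alpha_t and cumulative alpha_bar_t."""
    betas = [beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
             for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def ddpm_sample_actions(predict_noise, obs, action_dim, num_steps=50, seed=0):
    """Sample one action vector by reversing the DDPM noising process.

    predict_noise(x_t, t, obs) -> list of floats is a hypothetical stand-in
    for a learned, observation-conditioned noise-prediction network.
    """
    rng = random.Random(seed)
    betas, alphas, alpha_bars = noise_schedule(num_steps)
    # Start from pure Gaussian noise x_T ~ N(0, I).
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t, obs)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t).
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Add noise with the posterior variance beta_tilde_t.
            var = betas[t] * (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t])
            x = [m + math.sqrt(var) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # The final step is deterministic.
    return x


def oracle_noise_predictor(target, num_steps=50):
    """Exact noise predictor when every training action equals `target` (toy example)."""
    _, _, alpha_bars = noise_schedule(num_steps)

    def predict(x_t, t, obs):
        ab = alpha_bars[t]
        return [(xi - math.sqrt(ab) * ai) / math.sqrt(1.0 - ab)
                for xi, ai in zip(x_t, target)]

    return predict


target = [0.3, -0.7, 1.2]
action = ddpm_sample_actions(oracle_noise_predictor(target), obs=None, action_dim=3)
print([round(a, 6) for a in action])  # prints [0.3, -0.7, 1.2]
```

The analytic `oracle_noise_predictor` is the exact noise predictor for a toy dataset in which every action equals `target`, so the last denoising step recovers `target` up to floating-point error. In a real policy it would be replaced by a trained network, and diffusion policies commonly denoise a short sequence of future actions at once rather than a single vector.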
Authors: Quantao Yang, Michael C. Welle, Danica Kragic, Olov Andersson
Submitted: February 13, 2025
arXiv category: cs.RO

Key Contributions

S²-Diffusion uses an open-vocabulary spatial-semantic diffusion policy to let robot manipulation skills learned on specific instances generalize to other instances of the same category. It combines a promptable semantic module with a spatial representation and uses depth estimation so that a single RGB camera suffices, allowing skills to transfer between different objects and environments within the same category.
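As an illustration only, the sketch below shows one plausible way to combine a semantic mask with estimated depth into a spatial-semantic input. Pixels selected by a category mask are back-projected into a 3D point set using pinhole camera intrinsics. The paper's actual representation may differ. `PinholeIntrinsics`, `masked_point_cloud`, and the upstream segmentation and depth models are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PinholeIntrinsics:
    """Camera intrinsics in pixels (focal lengths and principal point)."""
    fx: float
    fy: float
    cx: float
    cy: float


def masked_point_cloud(mask, depth, K):
    """Back-project the pixels selected by a semantic mask into 3D camera coordinates.

    mask:  H x W nested list of bools, e.g. the output of an open-vocabulary
           segmenter prompted with a category name (hypothetical upstream model).
    depth: H x W nested list of metric depths, e.g. predicted by a monocular
           depth network from the same RGB frame (hypothetical upstream model).
    Only the mask and the geometry survive; color and texture are discarded.
    """
    points = []
    for v, (mask_row, depth_row) in enumerate(zip(mask, depth)):
        for u, (selected, z) in enumerate(zip(mask_row, depth_row)):
            if selected and z > 0.0:  # skip unmasked pixels and invalid depth
                x = (u - K.cx) * z / K.fx
                y = (v - K.cy) * z / K.fy
                points.append((x, y, z))
    return points


mask = [[False, True],
        [False, True]]
depth = [[1.0, 2.0],
         [1.5, 0.0]]
K = PinholeIntrinsics(fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(masked_point_cloud(mask, depth, K))  # prints [(2.0, 0.0, 2.0)]
```

The output depends only on which pixels the mask selects and on their depth. Recoloring or retexturing the object therefore leaves it unchanged, which illustrates the kind of invariance to category-irrelevant factors that the abstract describes.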

Business Value

Makes robot skills more adaptable and reusable, reducing the need for extensive retraining on new objects or slightly different environments and thus helping robots be deployed faster in diverse settings.