📄 Abstract
Agents that understand objects and their interactions can learn policies that
are more robust and transferable. However, most object-centric RL methods
factor state by individual objects while leaving interactions implicit. We
introduce the Factored Interactive Object-Centric World Model (FIOC-WM), a
unified framework that learns structured representations of both objects and
their interactions within a world model. FIOC-WM captures environment dynamics
with disentangled and modular representations of object interactions, improving
sample efficiency and generalization for policy learning. Concretely, FIOC-WM
first learns object-centric latents and an interaction structure directly from
pixels, leveraging pre-trained vision encoders. The learned world model then
decomposes tasks into composable interaction primitives, and a hierarchical
policy is trained on top: a high level selects the type and order of
interactions, while a low level executes them. On simulated robotic and
embodied-AI benchmarks, FIOC-WM improves policy-learning sample efficiency and
generalization over world-model baselines, indicating that explicit, modular
interaction learning is crucial for robust control.
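Here is a minimal Python sketch of a factored, interaction-aware transition step. It is not the FIOC-WM implementation. The per-object latents, the hand-written `self_dynamics` and `pairwise_effect` functions, and the fixed binary interaction `mask` are all hypothetical stand-ins for components the paper learns from pixels. The sketch illustrates modularity: each object's next latent depends only on its own state, its action, and the objects it is marked as interacting with.

```python
from typing import List, Sequence

Vector = List[float]


def self_dynamics(z: Vector, action: Vector) -> Vector:
    """Hypothetical per-object dynamics: the object is displaced by its action."""
    return [zi + ai for zi, ai in zip(z, action)]


def pairwise_effect(z_i: Vector, z_j: Vector, strength: float = 0.5) -> Vector:
    """Hypothetical interaction term: object j pulls object i toward itself."""
    return [strength * (b - a) for a, b in zip(z_i, z_j)]


def factored_step(latents: Sequence[Vector],
                  mask: Sequence[Sequence[int]],
                  actions: Sequence[Vector]) -> List[Vector]:
    """Predict next latents; object i reads object j only if mask[i][j] == 1."""
    next_latents = []
    for i, z_i in enumerate(latents):
        z_next = self_dynamics(z_i, actions[i])
        for j, z_j in enumerate(latents):
            if i != j and mask[i][j]:
                effect = pairwise_effect(z_i, z_j)
                z_next = [a + b for a, b in zip(z_next, effect)]
        next_latents.append(z_next)
    return next_latents


# Toy scene (hypothetical): a gripper, a block it touches, and a distant distractor.
latents = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
actions = [[0.5, 0.0], [0.0, 0.0], [0.0, 0.0]]  # only the gripper is actuated
mask = [[0, 1, 0],   # gripper <- block
        [1, 0, 0],   # block <- gripper
        [0, 0, 0]]   # distractor interacts with nothing
print(factored_step(latents, mask, actions))  # [[1.0, 0.0], [0.5, 0.0], [5.0, 5.0]]
```

The distractor (object 2) interacts with nothing, so perturbing its latent leaves the predictions for the gripper and block unchanged. In a learned model, this kind of sparsity is what would allow interaction modules to be reused across scenes.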
Key Contributions
Introduces FIOC-WM, a unified framework that learns structured representations of both objects and their interactions directly from pixels, enabling more sample-efficient and generalizable policies. The world model decomposes tasks into composable interaction primitives, and a hierarchical policy is trained on top: a high level selects the type and order of interactions, and a low level executes them.
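The hierarchical split between choosing interactions and executing them can be sketched in a toy 1-D pushing task. This sketch rests on heavily simplified assumptions. The primitive names `reach` and `push`, their hand-coded controllers, and the exhaustive search in `high_level_plan` are all hypothetical. The primitives also double as the simulator that the high level queries, whereas FIOC-WM would roll out its learned world model instead. In the sketch, the high level chooses the type and order of primitives, and the low level executes each one step by step.

```python
from itertools import permutations
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]
Primitive = Callable[[State, int], Tuple[State, bool]]


def reach(state: State, target: int) -> Tuple[State, bool]:
    """Hypothetical primitive: move the gripper until it sits just left of the block.

    `target` is unused; it keeps all primitives on one signature.
    """
    s = dict(state)  # copy so planning never mutates the caller's state
    goal = s["block"] - 1
    while s["gripper"] != goal:
        s["gripper"] += 1 if goal > s["gripper"] else -1
    return s, True


def push(state: State, target: int) -> Tuple[State, bool]:
    """Hypothetical primitive: push the block right to `target`; needs left contact."""
    s = dict(state)
    if s["gripper"] != s["block"] - 1 or target < s["block"]:
        return s, False  # precondition not met, so the primitive cannot run
    while s["block"] != target:
        s["block"] += 1
        s["gripper"] += 1
    return s, True


PRIMITIVES: Dict[str, Primitive] = {"reach": reach, "push": push}


def high_level_plan(state: State, target: int, max_len: int = 2) -> Optional[List[str]]:
    """High level: pick type and order of primitives (shortest plan first, no repeats)."""
    for length in range(1, max_len + 1):
        for plan in permutations(PRIMITIVES, length):
            s, ok = state, True
            for name in plan:  # simulate the outcome before acting
                s, ok = PRIMITIVES[name](s, target)
                if not ok:
                    break
            if ok and s["block"] == target:
                return list(plan)
    return None


def execute(state: State, target: int, plan: List[str]) -> State:
    """Low level: run each chosen primitive in order."""
    for name in plan:
        state, ok = PRIMITIVES[name](state, target)
        if not ok:
            raise RuntimeError(f"primitive {name!r} failed")
    return state


start = {"gripper": 0, "block": 3}
plan = high_level_plan(start, target=6)
print(plan)                     # ['reach', 'push']
print(execute(start, 6, plan))  # {'gripper': 5, 'block': 6}
```

Searching over orderings, shortest first, is only one simple stand-in for the high level. A trained high-level policy would replace it, and the hand-coded controllers would be replaced by learned low-level skills.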
Business Value
Could enable robots to learn complex tasks more sample-efficiently and to generalize better to new situations by modeling object interactions explicitly. This could lead to more capable and adaptable robotic systems in manufacturing, logistics, and service industries.