arxiv_ml | 90% Match | Research Paper | Robotics researchers, AI researchers, RL practitioners, Embodied AI developers | 20 hours ago

Learning Interactive World Model for Object-Centric Reinforcement Learning


Abstract

Agents that understand objects and their interactions can learn policies that are more robust and transferable. However, most object-centric RL methods factor state by individual objects while leaving interactions implicit. We introduce the Factored Interactive Object-Centric World Model (FIOC-WM), a unified framework that learns structured representations of both objects and their interactions within a world model. FIOC-WM captures environment dynamics with disentangled and modular representations of object interactions, improving sample efficiency and generalization for policy learning. Concretely, FIOC-WM first learns object-centric latents and an interaction structure directly from pixels, leveraging pre-trained vision encoders. The learned world model then decomposes tasks into composable interaction primitives, and a hierarchical policy is trained on top: a high level selects the type and order of interactions, while a low level executes them. On simulated robotic and embodied-AI benchmarks, FIOC-WM improves policy-learning sample efficiency and generalization over world-model baselines, indicating that explicit, modular interaction learning is crucial for robust control.
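To make the idea of a factored transition with an explicit interaction structure concrete, here is a minimal pure-Python sketch. It is not the paper's implementation: the function names (`factored_step`, `drift`, `pull`), the boolean interaction matrix, and the additive update rule are all assumptions chosen for illustration, and the hand-written functions stand in for what would be learned networks.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def factored_step(
    latents: Sequence[Vector],
    interacts: Sequence[Sequence[bool]],
    self_dynamics: Callable[[Vector], Vector],
    pair_effect: Callable[[Vector, Vector], Vector],
) -> List[Vector]:
    """Predict next per-object latents from a factored transition.

    Object i is updated by its own dynamics plus the summed effect of
    every object j that the interaction structure marks as acting on it
    (interacts[i][j] is True).
    """
    next_latents = []
    for i, z_i in enumerate(latents):
        z_next = self_dynamics(z_i)
        for j, z_j in enumerate(latents):
            if i != j and interacts[i][j]:
                effect = pair_effect(z_i, z_j)
                z_next = [a + b for a, b in zip(z_next, effect)]
        next_latents.append(z_next)
    return next_latents


# Hypothetical stand-ins for learned networks (illustration only).
def drift(z: Vector) -> Vector:
    """Per-object dynamics: every coordinate moves by +1."""
    return [x + 1.0 for x in z]


def pull(z_i: Vector, z_j: Vector) -> Vector:
    """Pairwise effect: object j pulls object i halfway toward itself."""
    return [0.5 * (b - a) for a, b in zip(z_i, z_j)]


latents = [[0.0, 0.0], [2.0, 4.0], [10.0, 10.0]]
interacts = [
    [False, True, False],   # object 0 is affected by object 1
    [False, False, False],  # object 1 evolves on its own
    [False, False, False],  # object 2 evolves on its own
]
predicted = factored_step(latents, interacts, drift, pull)
```

Only object 0 receives a pairwise term here, so changing one entry of `interacts` changes which objects influence each other without touching the per-object dynamics. That separation is the kind of modularity the abstract attributes to explicit interaction modeling.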

Key Contributions

Introduces FIOC-WM, a unified framework that learns structured representations of both objects and their interactions from pixels, enabling more sample-efficient and generalizable policies. The model decomposes tasks into composable interaction primitives, facilitating hierarchical policy learning for complex robotic tasks.
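The two-level control scheme described above can be sketched as follows. This is a toy, rule-based illustration under stated assumptions, not the authors' method: the `Interaction` type, the one-dimensional object positions, and the functions `high_level_plan`, `low_level_execute`, and `run` are hypothetical, and both levels would be learned policies in the actual framework.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Interaction:
    """A composable primitive: what kind of interaction, with which object."""
    kind: str
    target: str


def high_level_plan(goal: Dict[str, int], state: Dict[str, int]) -> List[Interaction]:
    """High level: choose the type and order of interactions.

    Toy rule: schedule one "push" per misplaced object, in name order.
    """
    return [Interaction("push", name) for name in sorted(goal) if state[name] != goal[name]]


def low_level_execute(
    step: Interaction, goal: Dict[str, int], state: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Low level: emit unit moves until the targeted object reaches its goal."""
    actions = []
    while state[step.target] != goal[step.target]:
        delta = 1 if goal[step.target] > state[step.target] else -1
        state[step.target] += delta
        actions.append((step.target, delta))
    return actions


def run(goal: Dict[str, int], state: Dict[str, int]):
    """Execute the high-level plan one primitive at a time."""
    state = dict(state)  # work on a copy so the caller's state is untouched
    trace: List[Tuple[str, int]] = []
    for step in high_level_plan(goal, state):
        trace.extend(low_level_execute(step, goal, state))
    return state, trace


start = {"cube": 0, "ball": 3}
goal = {"cube": 2, "ball": 3}
final_state, trace = run(goal, start)
```

In this example the high level schedules a single interaction with `cube` (the `ball` is already in place), and the low level turns that primitive into two unit moves.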

Business Value

Enables robots to learn complex tasks more efficiently and generalize better to new situations by understanding object interactions, leading to more capable and adaptable robotic systems in manufacturing, logistics, and service industries.

Paper Metadata

Innovation Type

Algorithmic and Architectural Innovation

Deployment Feasibility

Moderate. Requires significant computational resources for training and integration with robotic hardware.

Limitations Addressed

Existing object-centric RL methods model interactions only implicitly; FIOC-WM models them explicitly to improve sample efficiency and generalization.

Technical Tags

object-centric RL, world model, interaction learning, disentangled representations, modular representations, pixel-based learning, vision encoders, hierarchical policy, composable primitives

Research Topics

Reinforcement Learning, World Models, Object-Centric Representation, Robotics, Generalization

Methods & Architectures

Object-centric Representation Learning, World Model Learning, Hierarchical Reinforcement Learning, Pre-trained Vision Encoders, Factored Interactive Object-Centric World Model (FIOC-WM), Hierarchical Policy Network

Applications & Tasks

Robotics, Embodied AI, Simulation Environments, Learning robust and transferable policies, Modeling object interactions, Sample-efficient RL, Policy Learning, World Modeling, Interaction Prediction

Related Fields

Computer Vision, Robotics, Reinforcement Learning, Representation Learning

Keywords

object-centric RL, world model, interaction learning, reinforcement learning, robotics, representation learning, hierarchical RL, sample efficiency, generalization, embodied AI

Academic Context

#Reinforcement Learning #World Models #Object-Centric Representation #Robotics #Generalization

Technology Stack

Frameworks & Libraries

PyTorch

Commercial Potential

Potential Products

Robotic control software, Simulation platforms for robotics training

Target Industries

Manufacturing, Logistics, Automotive, Consumer Electronics

Use Case Examples

Robots learning to assemble complex objects, Autonomous navigation in dynamic environments, Human-robot collaboration

Competitive Edge

Advances object-centric RL by explicitly modeling interactions within a unified world model, aiming for better sample efficiency and generalization compared to methods that treat interactions implicitly.

Market Opportunity

Growing market for intelligent automation and robotics.

Revenue Models

Licensing of AI models; development of specialized robotic systems.

Resource Requirements

Compute Needs

High (for training the world model and RL agent on top of pre-trained vision encoders)

Data Requirements

Pixel data from simulated or real-world environments.

Deployment Constraints

Real-time inference speed, integration with robotic hardware.

Scalability

Scalability depends on the complexity of object interactions and the environment.

Production Readiness

Maturity Level

Research

Time to Market

3-5 years

Patent Potential

Moderate
