PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning

Abstract

Robotic manipulation systems benefit from complementary sensing modalities, where each provides unique environmental information. Point clouds capture detailed geometric structure, while RGB images provide rich semantic context. Current point cloud methods struggle to capture fine-grained detail, especially for complex tasks, while RGB methods lack geometric awareness, which hinders their precision and generalization. We introduce PointMapPolicy, a novel approach that conditions diffusion policies on structured grids of points without downsampling. The resulting data type makes it easier to extract shape and spatial relationships from observations, and can be transformed between reference frames. Because the points are arranged in a regular grid, established computer vision techniques can be applied directly to 3D data. Using xLSTM as a backbone, our model efficiently fuses the point maps with RGB data for enhanced multi-modal perception. Through extensive experiments on the RoboCasa and CALVIN benchmarks and real robot evaluations, we demonstrate that our method achieves state-of-the-art performance across diverse manipulation tasks. The overview and demos are available on our project page: https://point-map.github.io/Point-Map/
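
To make the idea of a structured grid of points concrete, here is a minimal sketch of how a point map could be built from a depth image and moved between reference frames. It is an illustration under stated assumptions, not the authors' implementation: the pinhole camera model, the function names depth_to_point_map and transform_point_map, and the intrinsics fx, fy, cx, cy are all hypothetical choices made for this example.

```python
import numpy as np

def depth_to_point_map(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth image into an (H, W, 3) point map.

    Assumes a pinhole camera (an assumption of this sketch). Every pixel
    keeps its grid position, so no points are dropped or downsampled.
    """
    h, w = depth.shape
    # u indexes columns, v indexes rows; both arrays have shape (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def transform_point_map(point_map, T):
    """Apply a 4x4 homogeneous transform T to every point in the map.

    The (H, W) layout is untouched; only the XYZ values change.
    """
    R, t = T[:3, :3], T[:3, 3]
    return point_map @ R.T + t

# Usage: a flat surface 2 m from the camera, shifted 1 m along x.
depth = np.full((4, 6), 2.0)
pm_cam = depth_to_point_map(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
T = np.eye(4)
T[0, 3] = 1.0
pm_world = transform_point_map(pm_cam, T)
print(pm_cam.shape, pm_world[1, 2])
```

The result has the same height and width as the input image, which is what allows image-style operations to be reused on 3D coordinates.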
Authors (15)
Xiaogang Jia
Qian Wang
Anrui Wang
Han A. Wang
Balázs Gyenes
Emiliyan Gospodinov
+9 more
Submitted
October 23, 2025
arXiv Category
cs.RO

Key Contributions

PointMapPolicy is a novel approach for robotic manipulation that conditions diffusion policies on structured grids of points, avoiding downsampling and enabling direct application of computer vision techniques to 3D data. It efficiently fuses point cloud and RGB data using an xLSTM backbone, improving fine-grained detail capture and spatial relationship understanding for complex tasks.
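
As a rough illustration of what applying computer vision techniques directly to 3D data can look like, the sketch below splits an RGB image and a point map into patch tokens with the same routine and then joins them. The patchify helper, the patch size, and fusion by concatenation are assumptions made for this example; the summary above does not specify how the xLSTM backbone tokenizes or fuses its inputs.

```python
import numpy as np

def patchify(grid, patch):
    """Split an (H, W, C) grid into non-overlapping square patches.

    Returns an array of shape (num_patches, patch * patch * C), with
    patches ordered row by row, as in common image tokenizers.
    """
    h, w, c = grid.shape
    if h % patch or w % patch:
        raise ValueError("grid size must be divisible by patch size")
    g = grid.reshape(h // patch, patch, w // patch, patch, c)
    g = g.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return g.reshape(-1, patch * patch * c)

# Hypothetical inputs: an RGB image and a point map on the same pixel grid.
rng = np.random.default_rng(0)
rgb = rng.random((8, 8, 3))
point_map = rng.random((8, 8, 3))

rgb_tokens = patchify(rgb, 4)        # shape (4, 48)
xyz_tokens = patchify(point_map, 4)  # shape (4, 48)

# Concatenation is one simple fusion choice, used here only for illustration.
fused = np.concatenate([rgb_tokens, xyz_tokens], axis=-1)
print(fused.shape)
```

Because both inputs share the pixel grid, token i of the RGB stream and token i of the point-map stream cover the same image region, which keeps the two modalities aligned without any nearest-neighbor search.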

Business Value

Enables more capable and precise robotic systems for tasks like assembly, pick-and-place, and inspection, leading to increased automation in manufacturing and logistics.