Abstract: While spatial reasoning has made progress in object localization and
spatial relationships, it often overlooks object orientation, a key factor in
fine-grained 6-DoF manipulation. Traditional pose representations rely on
predefined reference frames or templates, which limits their generalization and
semantic grounding. In this paper, we introduce the concept of semantic
orientation, which defines object orientations through natural language in a
reference-frame-free manner (e.g., the "plug-in" direction of a USB or the
"handle" direction of a cup). To support this, we construct OrienText300K, a
large-scale dataset of 3D objects annotated with semantic orientations, and
develop PointSO, a general model for zero-shot semantic orientation prediction.
By integrating semantic orientation into VLM agents, our SoFar framework enables
6-DoF spatial reasoning and robotic action generation. Extensive experiments
demonstrate the effectiveness and generalization of SoFar, e.g., zero-shot
success rates of 48.7% on Open6DOR and 74.9% on SIMPLER-Env.
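To make the idea of a semantic orientation more concrete, here is a minimal sketch. It assumes, beyond what the abstract states, that a semantic orientation is stored as a unit direction vector in the object's own frame and keyed by its language description (e.g. "handle"). Under that assumption, the rotation that turns the object so the described direction points where a task needs it has a closed form via Rodrigues' rotation formula. The names `CUP`, `rotation_aligning`, and the hard-coded vectors are hypothetical illustrations. In SoFar the directions would come from PointSO, whose actual interface is not described here, so this is not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def normalize(v: Vec3) -> Vec3:
    """Scale v to unit length; an orientation is a direction, not a magnitude."""
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if n == 0.0:
        raise ValueError("a zero vector has no direction")
    return (v[0] / n, v[1] / n, v[2] / n)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def apply(r: Mat3, v: Vec3) -> Vec3:
    """Multiply the 3x3 matrix r by the column vector v."""
    return (sum(r[0][j] * v[j] for j in range(3)),
            sum(r[1][j] * v[j] for j in range(3)),
            sum(r[2][j] * v[j] for j in range(3)))


def rotation_aligning(src: Vec3, dst: Vec3) -> Mat3:
    """Hypothetical helper: smallest rotation R with R @ src parallel to dst."""
    a, b = normalize(src), normalize(dst)
    c = dot(a, b)  # cosine of the angle between the two directions
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if c > 1.0 - 1e-9:  # directions already coincide
        return eye
    if c < -1.0 + 1e-9:
        # Opposite directions: rotate 180 degrees about any unit axis k
        # perpendicular to a, for which R = 2 k k^T - I.
        helper = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
        k = normalize(cross(a, helper))
        return [[2.0 * k[i] * k[j] - eye[i][j] for j in range(3)] for i in range(3)]
    v = cross(a, b)  # rotation axis scaled by sin(theta)
    kx = [[0.0, -v[2], v[1]],
          [v[2], 0.0, -v[0]],
          [-v[1], v[0], 0.0]]  # skew-symmetric cross-product matrix [v]_x
    kx2 = [[sum(kx[i][m] * kx[m][j] for m in range(3)) for j in range(3)]
           for i in range(3)]
    scale = 1.0 / (1.0 + c)  # equals (1 - cos) / sin^2 for unnormalized v
    return [[eye[i][j] + kx[i][j] + scale * kx2[i][j] for j in range(3)]
            for i in range(3)]


# Hypothetical semantic orientations of a cup, expressed in the cup's object frame.
# Here they are hard-coded; a learned predictor would supply them in practice.
CUP: Dict[str, Vec3] = {
    "handle": (1.0, 0.0, 0.0),
}

if __name__ == "__main__":
    # Hypothetical goal: turn the cup so its handle points along +y.
    r = rotation_aligning(CUP["handle"], (0.0, 1.0, 0.0))
    print([round(x, 6) for x in apply(r, CUP["handle"])])
```

The rotated "handle" direction should match the target (0, 1, 0) up to floating-point error. A closed-form rotation is used here only because it is easy to check. Note that aligning a single direction leaves the roll about that direction unconstrained, so a full 6-DoF goal would need further constraints.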