Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: Recent advances in vision-language models (VLMs) have enabled
instruction-conditioned robotic systems with improved generalization. However,
most existing work focuses on reactive System 1 policies, underutilizing VLMs'
strengths in semantic reasoning and long-horizon planning. These System 2
capabilities-characterized by deliberative, goal-directed thinking-remain under
explored due to the limited temporal scale and structural complexity of current
benchmarks. To address this gap, we introduce RoboCerebra, a benchmark for
evaluating high-level reasoning in long-horizon robotic manipulation.
RoboCerebra includes: (1) a large-scale simulation dataset with extended task
horizons and diverse subtask sequences in household environments; (2) a
hierarchical framework combining a high-level VLM planner with a low-level
vision-language-action (VLA) controller; and (3) an evaluation protocol
targeting planning, reflection, and memory through structured System 1-System 2
interaction. The dataset is constructed via a top-down pipeline, where GPT
generates task instructions and decomposes them into subtask sequences. Human
operators execute the subtasks in simulation, yielding high-quality
trajectories with dynamic object variations. Compared to prior benchmarks,
RoboCerebra features significantly longer action sequences and denser
annotations. We further benchmark state-of-the-art VLMs as System 2 modules and
analyze their performance across key cognitive dimensions, advancing the
development of more capable and generalizable robotic planners.
Authors (7)
Songhao Han
Boxiang Qiu
Yue Liao
Siyuan Huang
Chen Gao
Shuicheng Yan
+1 more
Key Contributions
RoboCerebra introduces a large-scale benchmark for evaluating long-horizon robotic manipulation, focusing on high-level reasoning capabilities beyond reactive policies. It provides a simulation dataset, a hierarchical VLM-based framework, and an evaluation protocol to assess planning, reflection, and System 1-System 2 interaction.
Business Value
Accelerates the development of more intelligent and capable robots for complex tasks in homes, factories, and other environments, leading to increased automation and efficiency.