Research Paper. Relevant to: Robotics Researchers, AI Researchers, ML Engineers, VLM Researchers

RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation

Abstract

Recent advances in vision-language models (VLMs) have enabled instruction-conditioned robotic systems with improved generalization. However, most existing work focuses on reactive System 1 policies, underutilizing VLMs' strengths in semantic reasoning and long-horizon planning. These System 2 capabilities, characterized by deliberative, goal-directed thinking, remain underexplored due to the limited temporal scale and structural complexity of current benchmarks. To address this gap, we introduce RoboCerebra, a benchmark for evaluating high-level reasoning in long-horizon robotic manipulation. RoboCerebra includes: (1) a large-scale simulation dataset with extended task horizons and diverse subtask sequences in household environments; (2) a hierarchical framework combining a high-level VLM planner with a low-level vision-language-action (VLA) controller; and (3) an evaluation protocol targeting planning, reflection, and memory through structured System 1–System 2 interaction. The dataset is constructed via a top-down pipeline, in which GPT generates task instructions and decomposes them into subtask sequences. Human operators execute the subtasks in simulation, yielding high-quality trajectories with dynamic object variations. Compared to prior benchmarks, RoboCerebra features significantly longer action sequences and denser annotations. We further benchmark state-of-the-art VLMs as System 2 modules and analyze their performance across key cognitive dimensions, advancing the development of more capable and generalizable robotic planners.
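The System 1/System 2 split described above can be pictured as a plan, execute, reflect loop. The sketch below is a minimal illustration under our own assumptions, not RoboCerebra's actual interface: `run_episode`, `Memory`, `toy_planner`, `toy_controller`, and the replanning budget `max_replans` are hypothetical names, and the VLM planner and VLA controller are replaced by plain Python stubs. The planner receives the memory so it can skip finished subtasks (memory) and react to failures (reflection).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Memory:
    """Hypothetical episodic memory shared with the high-level planner."""
    completed: List[str] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)


def run_episode(
    instruction: str,
    plan: Callable[[str, Memory], List[str]],  # System 2 stand-in (a VLM planner in the paper)
    execute: Callable[[str], bool],            # System 1 stand-in (a VLA controller in the paper)
    max_replans: int = 3,                      # hypothetical replanning budget
) -> Memory:
    memory = Memory()
    subtasks = plan(instruction, memory)       # initial decomposition into subtasks
    replans = 0
    while subtasks:
        subtask = subtasks.pop(0)
        if execute(subtask):
            memory.completed.append(subtask)   # remember progress for later replanning
            continue
        # Reflection: log the failure, then ask the planner for a fresh plan
        memory.failures.append(subtask)
        if replans >= max_replans:
            break                              # give up once the budget is spent
        replans += 1
        subtasks = plan(instruction, memory)
    return memory


# Toy stubs (hypothetical): a fixed decomposition and a flaky grasp.
STEPS = ["open the fridge", "pick up the milk",
         "place the milk on the table", "close the fridge"]


def toy_planner(instruction: str, memory: Memory) -> List[str]:
    # Replan by dropping subtasks that memory says are already done
    return [s for s in STEPS if s not in memory.completed]


attempts: dict = {}


def toy_controller(subtask: str) -> bool:
    # The first grasp attempt fails; every other attempt succeeds
    attempts[subtask] = attempts.get(subtask, 0) + 1
    return not (subtask == "pick up the milk" and attempts[subtask] == 1)


mem = run_episode("put the milk on the table", toy_planner, toy_controller)
print(mem.completed)  # all four steps, in order
print(mem.failures)   # ['pick up the milk']
```

In this toy run the failed grasp triggers one replan, after which the remaining three subtasks complete. A real System 2 module would also inspect observations rather than only the success flag returned by the controller.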
Authors (7)
Songhao Han
Boxiang Qiu
Yue Liao
Siyuan Huang
Chen Gao
Shuicheng Yan
+1 more
Submitted
June 7, 2025
arXiv Category
cs.RO

Key Contributions

RoboCerebra introduces a large-scale benchmark for evaluating long-horizon robotic manipulation, focusing on high-level reasoning capabilities beyond reactive policies. It provides a simulation dataset, a hierarchical framework that pairs a high-level VLM planner with a low-level VLA controller, and an evaluation protocol that assesses planning, reflection, and memory through structured System 1–System 2 interaction.

Business Value

By offering a standard way to measure long-horizon planning, reflection, and memory, the benchmark could accelerate the development of robots that handle complex, multi-step tasks in homes, factories, and other environments, supporting greater automation and efficiency.