📄 Abstract
Training Vision-Language-Action (VLA) models for generalist robots typically
requires large-scale real-world robot data, which is expensive and
time-consuming to collect. The inefficiency of physical data collection
severely limits the scalability and generalization capacity of current VLA
systems. To address this challenge, we introduce GigaBrain-0, a novel VLA
foundation model empowered by world model-generated data (e.g., video
generation, real2real transfer, human transfer, view transfer, sim2real
transfer data). By leveraging world models to generate diverse data at scale,
GigaBrain-0 significantly reduces reliance on real robot data while improving
cross-task generalization. Our approach further improves policy robustness
through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision,
enabling the model to reason about spatial geometry, object states, and
long-horizon dependencies during task execution. This leads to substantial
gains in real-world performance on dexterous, long-horizon, and mobile
manipulation tasks. Extensive experiments demonstrate that GigaBrain-0 achieves
superior generalization across variations in appearances (e.g., textures,
colors), object placements, and camera viewpoints. Additionally, we present
GigaBrain-0-Small, an optimized lightweight variant designed to run efficiently
on devices such as the NVIDIA Jetson AGX Orin.
Authors (27)
GigaBrain Team
Angen Ye
Boyuan Wang
Chaojun Ni
Guan Huang
Guosheng Zhao
+21 more
Submitted
October 22, 2025
Key Contributions
GigaBrain-0 is a novel VLA foundation model that significantly reduces reliance on expensive real-world robot data by training on diverse data generated by world models. It improves cross-task generalization, strengthens policy robustness through RGBD input modeling, and adds reasoning capabilities via embodied Chain-of-Thought supervision, enabling more capable and data-efficient generalist robots.
Business Value
Could accelerate the development and deployment of more capable, versatile robots by easing the data collection bottleneck, which may in turn support wider adoption across industries.