📄 Abstract
The generalization capabilities of vision-language-action (VLA) models to
unseen tasks are crucial to achieving general-purpose robotic manipulation in
open-world settings. However, the cross-task generalization capabilities of
existing VLA models remain significantly underexplored. To address this gap, we
introduce AGNOSTOS, a novel simulation benchmark designed to rigorously
evaluate cross-task zero-shot generalization in manipulation. AGNOSTOS
comprises 23 unseen manipulation tasks for testing, distinct from common
training task distributions, and incorporates two levels of generalization
difficulty to assess robustness. Our systematic evaluation reveals that current
VLA models, despite being trained on diverse datasets, struggle to generalize
effectively to these unseen tasks. To overcome this limitation, we propose
Cross-Task In-Context Manipulation (X-ICM), a method that conditions large
language models (LLMs) on in-context demonstrations from seen tasks to predict
action sequences for unseen tasks. Additionally, we introduce a dynamics-guided
sample selection strategy that identifies relevant demonstrations by capturing
cross-task dynamics. On AGNOSTOS, X-ICM significantly improves cross-task
zero-shot generalization performance over leading VLAs. We believe AGNOSTOS and
X-ICM will serve as valuable tools for advancing general-purpose robotic
manipulation.
Authors (9)
Jiaming Zhou
Ke Ye
Jiayi Liu
Teli Ma
Zifan Wang
Ronghe Qiu
+3 more
Key Contributions
Introduces AGNOSTOS, a novel simulation benchmark for rigorously evaluating cross-task zero-shot generalization in robotic manipulation.
Proposes X-ICM, a method that leverages LLMs and in-context demonstrations from seen tasks to improve generalization to unseen manipulation tasks.
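The abstract describes X-ICM only at a high level: pick relevant seen-task demonstrations, then condition an LLM on them to predict actions for an unseen task. The sketch below is one possible reading of that pipeline, not the authors' implementation. It assumes each demonstration already has a fixed-length "dynamics" feature vector, ranks demonstrations by cosine similarity to the unseen task's feature, and formats the top matches into a text prompt. The function names, the feature vectors, the similarity measure, and the prompt layout are all hypothetical; the abstract does not specify how dynamics features are computed or how prompts are structured.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_demos(query_feat, demos, k=2):
    """Return the k seen-task demos whose (assumed) dynamics feature is closest to the query."""
    ranked = sorted(demos, key=lambda d: cosine(query_feat, d["feat"]), reverse=True)
    return ranked[:k]


def build_prompt(instruction, selected):
    """Format selected demos as in-context examples, ending with the unseen task to complete."""
    lines = []
    for demo in selected:
        lines.append(f"Task: {demo['instruction']}")
        lines.append(f"Actions: {demo['actions']}")
    lines.append(f"Task: {instruction}")
    lines.append("Actions:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder data: features and action lists are made up for illustration only.
    seen_demos = [
        {"instruction": "close the drawer", "feat": [1.0, 0.0], "actions": [[0.1, 0.2, 0.3]]},
        {"instruction": "push the button", "feat": [0.0, 1.0], "actions": [[0.4, 0.4, 0.1]]},
        {"instruction": "open the drawer", "feat": [0.9, 0.1], "actions": [[0.1, 0.2, 0.5]]},
    ]
    chosen = select_demos([1.0, 0.05], seen_demos, k=2)
    print(build_prompt("close the microwave door", chosen))
```

The resulting prompt string would then be passed to an LLM, whose completion is parsed back into an action sequence; that step is omitted here because the abstract gives no detail about the model or the action format.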
Business Value
Accelerates the development of more versatile robots capable of performing a wider range of tasks without explicit retraining, leading to more adaptable automation solutions.