Abstract
Establishing a reliable and iteratively refined robotic system is essential for real-world deployment. While Vision-Language-Action (VLA) models are widely recognized as foundation models for such robotic deployment, their reliance on offline expert demonstrations critically limits their capacity for post-deployment refinement. To mitigate this limitation, we introduce Action Preference Optimization (APO), a method designed to refine VLA models through human-assisted preference alignment gathered via interaction with environments. The method begins with a human-robot collaboration framework for reliable failure correction and interaction trajectory collection through human intervention. However, directly leveraging these interaction trajectories for preference optimization is non-trivial due to the challenges of irreversible robotic actions and token distribution mismatch. To address this, APO proposes an adaptive reweighting algorithm with binary desirability signals derived from interaction, enabling VLA models to effectively suppress failure-prone actions while enhancing adaptation to corrective actions. Ultimately, APO equips VLA models with the crucial capability to learn from failure, paving the way for their iterative refinement and reliable deployment in dynamic environments. Experiments conducted in simulated and real-world scenarios demonstrate the superior generalization and robustness of our human-assisted framework across a variety of manipulation tasks. We believe this work can offer insights into efficient and stable optimization of VLA models through human-robot collaboration. The code and dataset are released at
https://github.com/GeWu-Lab/Action-Preference-Optimization
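
To make the abstract's core idea concrete, the sketch below illustrates what a desirability-weighted action-token objective of this kind could look like. It is a minimal, hypothetical example, not the authors' implementation: the function name `adaptive_preference_loss`, the tensor shapes, and the choice of using the policy's own (detached) probability as the adaptive weight are all assumptions made for illustration.

```python
# Hypothetical sketch: an action-token loss with binary desirability signals
# and an adaptive weight on the suppression term. Not the authors' exact method.
import torch
import torch.nn.functional as F


def adaptive_preference_loss(logits, action_tokens, desirable, beta=1.0):
    """
    logits:        (B, T, V) action-token logits from the VLA policy
    action_tokens: (B, T)    action tokens from the interaction trajectories
    desirable:     (B,)      binary signal: 1 = corrective/human action, 0 = failure-prone
    beta:          scaling factor for the suppression term (assumed hyperparameter)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Per-token log-probability of the observed actions, averaged over the sequence.
    token_logp = log_probs.gather(-1, action_tokens.unsqueeze(-1)).squeeze(-1)  # (B, T)
    seq_logp = token_logp.mean(dim=-1)                                          # (B,)

    # Desirable trajectories: standard likelihood maximization (imitation term).
    pos_loss = -(desirable * seq_logp)

    # Failure-prone trajectories: push their probability down, with a weight that
    # shrinks as the policy already assigns them low probability (one possible way
    # to avoid over-suppression and large distribution shift).
    with torch.no_grad():
        adaptive_w = seq_logp.exp()
    neg_loss = (1.0 - desirable) * adaptive_w * seq_logp * beta

    return (pos_loss + neg_loss).mean()
```

Under these assumptions, minimizing the loss increases the likelihood of human-corrected actions while gradually suppressing actions marked as failure-prone; the detached adaptive weight damps the suppression gradient once those actions are already unlikely.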
Authors (6)
Wenke Xia
Yichu Yang
Hongtao Wu
Xiao Ma
Tao Kong
Di Hu