
Source: arXiv · Research Paper · Audience: Robotics Researchers, AI Engineers, Developers of Autonomous Systems · 20 hours ago

Maestro: Orchestrating Robotics Modules with Vision-Language Models for Zero-Shot Generalist Robots

Abstract

Today's best-explored routes toward generalist robots center on collecting ever larger "observations-in, actions-out" robotics datasets to train large end-to-end models, copying a recipe that has worked for vision-language models (VLMs). We pursue a road less traveled: building generalist policies directly around VLMs by augmenting their general capabilities with specific robot capabilities encapsulated in a carefully curated set of perception, planning, and control modules. In Maestro, a VLM coding agent dynamically composes these modules into a programmatic policy for the current task and scenario. Maestro's architecture benefits from a streamlined closed-loop interface with few manually imposed structural constraints, and from a comprehensive and diverse tool repertoire. As a result, it largely surpasses today's vision-language-action (VLA) models in zero-shot performance on challenging manipulation skills. Further, Maestro is easily extensible to new modules, easily editable to suit new embodiments such as a quadruped-mounted arm, and can even adapt from minimal real-world experience through local code edits.
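The abstract describes the core loop only at a high level: a VLM coding agent writes a program over a curated set of perception, planning, and control modules, runs it, and reacts to what happens. The following is a toy sketch under heavy assumptions, not Maestro's implementation. The module names (`detect`, `plan_grasp`, `move_to`, `close_gripper`), the `World` simulation, and `fake_vlm_agent` are all hypothetical, and the VLM is replaced by a hard-coded stub that first guesses the wrong object name so that the closed-loop retry is visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical tool repertoire: every perception, planning, or control function
# that generated programs may call, keyed by name. All names are illustrative.
MODULES: Dict[str, Callable] = {}


def module(fn: Callable) -> Callable:
    """Register a function so generated programs can call it by name."""
    MODULES[fn.__name__] = fn
    return fn


@dataclass
class World:
    """Toy scene standing in for a real robot and its sensors (an assumption)."""
    objects: Dict[str, tuple] = field(default_factory=lambda: {"cup": (0.4, 0.1, 0.0)})
    gripper_at: tuple = (0.0, 0.0, 0.3)
    holding: str = ""


WORLD = World()


@module
def detect(name: str) -> tuple:
    # Perception stub: return the object's 3-D position, or raise if it is absent.
    if name not in WORLD.objects:
        raise LookupError(f"object '{name}' not found")
    return WORLD.objects[name]


@module
def plan_grasp(position: tuple) -> tuple:
    # Planning stub: target a pose 5 cm above the object.
    x, y, z = position
    return (x, y, z + 0.05)


@module
def move_to(pose: tuple) -> None:
    # Control stub: move the gripper instantly to the target pose.
    WORLD.gripper_at = pose


@module
def close_gripper() -> None:
    # Control stub: grasp the first object within 10 cm (L1 distance) of the gripper.
    gx, gy, gz = WORLD.gripper_at
    for name, (x, y, z) in WORLD.objects.items():
        if abs(gx - x) + abs(gy - y) + abs(gz - z) < 0.1:
            WORLD.holding = name
            return


def fake_vlm_agent(task: str, tools: List[str], feedback: str) -> str:
    """Stand-in for the VLM coding agent. A real agent would prompt a VLM with the
    task, the tool list, and any feedback; this stub ignores task and tools and
    guesses the wrong object name until it receives feedback."""
    target = "cup" if feedback else "mug"
    return (
        f"pos = detect('{target}')\n"
        "move_to(plan_grasp(pos))\n"
        "close_gripper()\n"
    )


def run_policy(task: str, max_attempts: int = 3) -> List[str]:
    """Closed loop: generate a program, execute it, and feed errors back to the agent."""
    log: List[str] = []
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        program = fake_vlm_agent(task, sorted(MODULES), feedback)
        try:
            # Only the registered modules (and no builtins) are exposed to the program.
            exec(program, {"__builtins__": {}}, dict(MODULES))
        except Exception as err:
            feedback = f"{type(err).__name__}: {err}"
            log.append(f"attempt {attempt} failed: {feedback}")
            continue
        log.append(f"attempt {attempt} succeeded, holding={WORLD.holding!r}")
        break
    return log


if __name__ == "__main__":
    for line in run_policy("pick up the cup"):
        print(line)
    # attempt 1 failed: LookupError: object 'mug' not found
    # attempt 2 succeeded, holding='cup'
```

Exposing only the registered functions to `exec` keeps generated programs within the tool repertoire, and in this sketch supporting a new module amounts to decorating one more function. Stripping `__builtins__` is not a security sandbox; running model-generated code on a real robot would need proper isolation.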

Key Contributions

Maestro uses a VLM to orchestrate robotics modules, aiming at zero-shot generalist robots. A VLM coding agent dynamically composes perception, planning, and control modules into programmatic policies, surpassing current vision-language-action (VLA) models in zero-shot performance on challenging manipulation tasks.

Business Value

Could accelerate the development of versatile robots that perform a wide range of tasks without task-specific training, potentially leading to more adaptable automation in manufacturing, logistics, and service industries.