
Navigation with VLM framework: Towards Going to Any Language

📄 Abstract

Navigating towards fully open language goals and exploring open scenes intelligently have long posed significant challenges. Recently, Vision Language Models (VLMs) have demonstrated remarkable capabilities in reasoning over both language and visual data. Although many works have leveraged VLMs for navigation in open scenes, they often incur high computational cost, rely on object-centric approaches, or depend on environmental priors embedded in detailed human instructions. We introduce Navigation with VLM (NavVLM), a training-free framework that harnesses open-source VLMs to enable robots to navigate effectively towards human-friendly language goals such as abstract places, actions, or specific objects in open scenes. NavVLM uses the VLM as its cognitive core to perceive environmental information and continuously provide exploration guidance, achieving intelligent navigation from only a concise target description rather than a detailed instruction with environmental priors. We evaluated and validated NavVLM in both simulation and real-world experiments. In simulation, our framework achieves state-of-the-art performance in Success weighted by Path Length (SPL) on object-specific tasks in richly detailed environments from Matterport 3D (MP3D), Habitat Matterport 3D (HM3D), and Gibson. Reported navigation episodes further demonstrate its ability to navigate towards arbitrary open-set language goals. In real-world validation, we confirmed the framework's effectiveness on a physical robot in an indoor scene.
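
For readers unfamiliar with the metric, Success weighted by Path Length (SPL) rewards successful episodes in proportion to how close the agent's path is to the shortest path. The sketch below is a generic implementation of that standard definition, not code from the paper; the episode field names are illustrative.

```python
# Success weighted by Path Length (SPL), the standard embodied-navigation metric
# cited in the abstract. Generic sketch; episode field names are illustrative.

def spl(episodes):
    """episodes: list of dicts with keys:
       'success'       - 1.0 if the agent reached the goal, else 0.0
       'shortest_path' - geodesic distance from start to goal (meters)
       'agent_path'    - length of the path the agent actually took (meters)
    """
    total = 0.0
    for ep in episodes:
        l = ep['shortest_path']
        p = ep['agent_path']
        total += ep['success'] * l / max(p, l)
    return total / len(episodes)

# Example: one near-optimal success and one failure.
print(spl([
    {'success': 1.0, 'shortest_path': 5.0, 'agent_path': 6.0},   # term ~0.833
    {'success': 0.0, 'shortest_path': 4.0, 'agent_path': 10.0},  # failed episode, term 0
]))  # -> ~0.417
```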
Authors (4)
Zecheng Yin
Chonghao Cheng
Yao Guo
Zhen Li
Submitted: September 18, 2024
arXiv Category: cs.CV

Key Contributions

NavVLM introduces a training-free framework that utilizes open-source VLMs to enable robots to navigate effectively in open scenes towards abstract language goals. It leverages the VLM as a cognitive core for perception and exploration guidance, eliminating the need for environmental priors or extensive training.
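
To make the "VLM as cognitive core" idea concrete, here is a minimal sketch of what such a training-free perceive-query-act loop could look like. It is an illustration under stated assumptions, not NavVLM's actual implementation: the environment interface (env), the VLM query helper, and the guidance schema are all hypothetical placeholders.

```python
from dataclasses import dataclass
import random

# Hedged sketch of a training-free, VLM-guided exploration loop in the spirit of
# the description above. All names and the response schema are placeholders,
# not NavVLM's actual interface.

@dataclass
class VLMGuidance:
    goal_visible: bool          # does the VLM see the goal in the current view?
    suggested_direction: str    # e.g. "left", "right", "forward"

def query_vlm(image, goal_text: str) -> VLMGuidance:
    """Placeholder: a real system would prompt an open-source VLM with the current
    observation and the language goal, then parse its reply. Here the image is
    ignored and a random exploration hint is returned."""
    return VLMGuidance(goal_visible=False,
                       suggested_direction=random.choice(["left", "right", "forward"]))

def navigate(goal_text: str, env, max_steps: int = 500) -> bool:
    """Perceive -> ask the VLM for guidance -> act, until the goal is reached
    or the step budget runs out. `env` is an assumed robot/simulator interface."""
    for _ in range(max_steps):
        image = env.capture_rgb()                 # egocentric observation
        guidance = query_vlm(image, goal_text)
        if guidance.goal_visible and env.close_enough_to_goal():
            return True                           # episode succeeds
        env.move(guidance.suggested_direction)    # low-level controller executes the hint
    return False                                  # exploration budget exhausted
```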

Business Value

Facilitates the development of more versatile and intuitive robots capable of understanding and acting upon natural language commands in complex environments, applicable in logistics, domestic assistance, and exploration.