📄 Abstract
Abstract: Navigating towards fully open language goals and exploring open scenes
intelligently remain significant challenges. Recently, Vision
Language Models (VLMs) have demonstrated remarkable capabilities to reason with
both language and visual data. Although many works have focused on leveraging
VLMs for navigation in open scenes, they often incur high computational cost,
rely on object-centric approaches, or depend on environmental priors supplied
through detailed human instructions. We introduce Navigation with VLM (NavVLM), a
training-free framework that harnesses open-source VLMs to enable robots to
navigate effectively, even for human-friendly language goals such as abstract
places, actions, or specific objects in open scenes. NavVLM uses the VLM
as its cognitive core to perceive environmental information and continually
provide exploration guidance, achieving intelligent navigation from only a concise
target rather than a detailed instruction containing environmental priors. We evaluated
and validated NavVLM in both simulation and real-world experiments. In
simulation, our framework achieves state-of-the-art performance in Success
weighted by Path Length (SPL) on object-specific tasks in richly detailed
environments from Matterport 3D (MP3D), Habitat Matterport 3D (HM3D) and
Gibson. The reported navigation episodes show that NavVLM can
navigate towards arbitrary open-set language goals. In real-world validation, we
confirmed our framework's effectiveness on a real robot in an indoor scene.
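
The abstract reports results in SPL without defining it. As the metric is commonly defined in the embodied-navigation literature, each episode contributes its success indicator (0 or 1) multiplied by the ratio of the shortest-path length to the larger of the shortest-path length and the path the agent actually took, and SPL is the average over episodes. The sketch below illustrates that standard definition only. The function name `spl` and the sample numbers are assumptions made for illustration, and this is not the authors' evaluation code.

```python
from typing import Sequence


def spl(
    successes: Sequence[int],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    successes[i]        -- 1 if episode i reached the goal, else 0
    shortest_lengths[i] -- shortest-path length from start to goal (must be > 0)
    path_lengths[i]     -- length of the path the agent actually travelled
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same number of episodes")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")

    total = 0.0
    for s, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        # A failed episode contributes 0; a successful one is discounted
        # by how much longer the taken path was than the shortest path.
        total += s * shortest / max(taken, shortest)
    return total / len(successes)


# Illustrative (made-up) episodes: one optimal success, one success with a
# path twice the shortest length, and one failure.
example = spl([1, 1, 0], [5.0, 4.0, 6.0], [5.0, 8.0, 10.0])
print(example)
```

Because the ratio is capped at 1, an agent cannot score above its raw success rate, and long detours lower the score even when the goal is eventually reached.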
Authors (4)
Zecheng Yin
Chonghao Cheng
Yao Guo
Zhen Li
Submitted
September 18, 2024
Key Contributions
NavVLM introduces a training-free framework that utilizes open-source VLMs to enable robots to navigate effectively in open scenes towards abstract language goals. It leverages the VLM as a cognitive core for perception and exploration guidance, eliminating the need for environmental priors or extensive training.
Business Value
Facilitates the development of more versatile and intuitive robots capable of understanding and acting upon natural language commands in complex environments, applicable in logistics, domestic assistance, and exploration.