General-Purpose Robotic Navigation via LVLM-Orchestrated Perception, Reasoning, and Acting

Abstract

Developing general-purpose navigation policies for unknown environments remains a core challenge in robotics. Most existing systems rely on task-specific neural networks and fixed information flows, limiting their generalizability. Large Vision-Language Models (LVLMs) offer a promising alternative by embedding human-like knowledge for reasoning and planning, but prior LVLM-robot integrations have largely depended on pre-mapped spaces, hard-coded representations, and rigid control logic. We introduce the Agentic Robotic Navigation Architecture (ARNA), a general-purpose framework that equips an LVLM-based agent with a library of perception, reasoning, and navigation tools drawn from modern robotic stacks. At runtime, the agent autonomously defines and executes task-specific workflows that iteratively query modules, reason over multimodal inputs, and select navigation actions. This agentic formulation enables robust navigation and reasoning in previously unmapped environments, offering a new perspective on robotic stack design. Evaluated in Habitat Lab on the HM-EQA benchmark, ARNA outperforms state-of-the-art EQA-specific approaches. Qualitative results on RxR and custom tasks further demonstrate its ability to generalize across a broad range of navigation challenges.
Authors (6)
Bernard Lange
Anil Yildiz
Mansur Arief
Shehryar Khattak
Mykel Kochenderfer
Georgios Georgakis
Submitted: June 20, 2025
arXiv Category: cs.RO

Key Contributions

Introduces ARNA, a general-purpose framework that equips an LVLM-based agent with a library of perception, reasoning, and navigation tools drawn from modern robotic stacks. At runtime the agent defines and executes its own task-specific workflows, which enables robust navigation and reasoning in previously unmapped environments without the hard-coded representations or rigid control logic that prior LVLM-robot integrations depended on.
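
The agentic pattern described above (an agent that repeatedly picks a tool from a library, reads the result, and decides what to do next) can be illustrated with a minimal Python sketch. This is not ARNA's implementation: every name here (ToolLibrary, detect_objects, go_to_next_room, stub_policy, run_agent) is hypothetical, the "environment" is a toy dictionary of rooms, and a hand-written rule stands in for the LVLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool signature: a tool reads/updates shared state and
# returns a text observation for the agent to reason over.
Tool = Callable[[dict], str]
History = List[Tuple[str, str]]  # (tool name, observation) pairs


@dataclass
class ToolLibrary:
    """A registry of callable tools the agent may invoke by name."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, name: str, fn: Tool) -> None:
        self.tools[name] = fn

    def call(self, name: str, state: dict) -> str:
        if name not in self.tools:
            return f"error: unknown tool '{name}'"
        return self.tools[name](state)


def detect_objects(state: dict) -> str:
    # Stand-in for a perception module: lists objects in the current room.
    room = state["room"]
    return f"{room}: {', '.join(state['world'][room])}"


def go_to_next_room(state: dict) -> str:
    # Stand-in for a navigation module: cycles through rooms in order.
    rooms = list(state["world"])
    state["room"] = rooms[(rooms.index(state["room"]) + 1) % len(rooms)]
    return f"moved to {state['room']}"


def stub_policy(target: str, history: History) -> Tuple[str, str]:
    """Rule-based stand-in for the LVLM. Returns ("call", tool) or ("answer", text)."""
    if not history or history[-1][0] == "go_to_next_room":
        return ("call", "detect_objects")      # look around after every move
    _, obs = history[-1]
    if target in obs:
        return ("answer", obs.split(":")[0])   # answer with the room name
    return ("call", "go_to_next_room")         # target not seen, keep exploring


def run_agent(target: str, library: ToolLibrary, state: dict,
              policy: Callable[[str, History], Tuple[str, str]],
              max_steps: int = 10) -> Tuple[Optional[str], History]:
    """Iterate: ask the policy for a decision, run the tool, record the observation."""
    history: History = []
    for _ in range(max_steps):
        kind, arg = policy(target, history)
        if kind == "answer":
            return arg, history
        history.append((arg, library.call(arg, state)))
    return None, history                        # step budget exhausted


def build_demo() -> Tuple[ToolLibrary, dict]:
    library = ToolLibrary()
    library.register("detect_objects", detect_objects)
    library.register("go_to_next_room", go_to_next_room)
    state = {
        "room": "hallway",
        "world": {"hallway": ["shoe rack"],
                  "kitchen": ["kettle", "fridge"],
                  "bedroom": ["lamp"]},
    }
    return library, state


if __name__ == "__main__":
    library, state = build_demo()
    answer, history = run_agent("kettle", library, state, stub_policy)
    print("answer:", answer)
    for tool, obs in history:
        print(f"  {tool} -> {obs}")
```

The point of the sketch is that the control flow lives in the policy rather than in a fixed pipeline: swapping stub_policy for a model-backed function, or registering more tools, changes the workflow without touching run_agent. In a real system the policy would also have to handle tool errors and multimodal observations, which this toy omits.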

Business Value

Enables the development of more versatile and adaptable robots capable of operating in diverse and unknown environments, reducing the need for extensive pre-programming and mapping for each new task or location.