📄 Abstract
Developing general-purpose navigation policies for unknown environments
remains a core challenge in robotics. Most existing systems rely on
task-specific neural networks and fixed information flows, limiting their
generalizability. Large Vision-Language Models (LVLMs) offer a promising
alternative by embedding human-like knowledge for reasoning and planning, but
prior LVLM-robot integrations have largely depended on pre-mapped spaces,
hard-coded representations, and rigid control logic. We introduce the Agentic
Robotic Navigation Architecture (ARNA), a general-purpose framework that equips
an LVLM-based agent with a library of perception, reasoning, and navigation
tools drawn from modern robotic stacks. At runtime, the agent autonomously
defines and executes task-specific workflows that iteratively query modules,
reason over multimodal inputs, and select navigation actions. This agentic
formulation enables robust navigation and reasoning in previously unmapped
environments, offering a new perspective on robotic stack design. Evaluated in
Habitat Lab on the HM-EQA benchmark, ARNA outperforms state-of-the-art
EQA-specific approaches. Qualitative results on RxR and custom tasks further
demonstrate its ability to generalize across a broad range of navigation
challenges.
Authors (6)
Bernard Lange
Anil Yildiz
Mansur Arief
Shehryar Khattak
Mykel Kochenderfer
Georgios Georgakis
Key Contributions
Introduces ARNA, a general-purpose framework that equips an LVLM-based agent with a library of perception, reasoning, and navigation tools. At runtime the agent defines and executes its own task-specific workflows, which enables robust navigation and reasoning in previously unmapped environments without pre-built maps or rigid control logic.
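To make the agentic loop concrete, here is a minimal Python sketch of an agent that repeatedly picks a tool from a library, observes the result, and stops when it can answer. It is an illustration only, not ARNA's implementation: `ToolLibrary`, `detect_objects`, `move_forward`, `stub_policy`, and `run_agent` are hypothetical names, the environment is a toy list of cells, and a hand-written rule stands in for the LVLM that would choose the next step in the real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical tool signature: reads/updates a shared state dict, returns a text observation.
Tool = Callable[[dict], str]
History = List[Tuple[str, str]]  # (tool name, observation) pairs


@dataclass
class ToolLibrary:
    """Registry of callable tools the agent may query by name."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, name: str, tool: Tool) -> None:
        self.tools[name] = tool

    def call(self, name: str, state: dict) -> str:
        if name not in self.tools:
            return f"error: unknown tool {name!r}"
        return self.tools[name](state)


def detect_objects(state: dict) -> str:
    """Toy perception tool: lists the objects visible from the current cell."""
    return "objects: " + ", ".join(state["visible"][state["pose"]])


def move_forward(state: dict) -> str:
    """Toy navigation tool: advances one cell, stopping at the last cell."""
    state["pose"] = min(state["pose"] + 1, len(state["visible"]) - 1)
    return f"moved to cell {state['pose']}"


def stub_policy(target: str, history: History) -> str:
    """Stand-in for the LVLM (assumption): look, then answer or move on."""
    if not history or history[-1][0] == "move_forward":
        return "detect_objects"
    if target in history[-1][1]:
        return "answer:yes"
    return "move_forward"


def run_agent(library: ToolLibrary, policy, target: str, state: dict,
              max_steps: int = 10) -> Tuple[str, History]:
    """Iteratively query tools until the policy answers or the step budget runs out."""
    history: History = []
    for _ in range(max_steps):
        decision = policy(target, history)
        if decision.startswith("answer:"):
            return decision.split(":", 1)[1], history
        history.append((decision, library.call(decision, state)))
    return "unknown", history


library = ToolLibrary()
library.register("detect_objects", detect_objects)
library.register("move_forward", move_forward)

# Toy unmapped environment: three cells, each with its own visible objects.
state = {"pose": 0, "visible": [["sofa"], ["table"], ["sink", "stove"]]}
answer, history = run_agent(library, stub_policy, "sink", state)
print(answer)
for step in history:
    print(step)
```

The workflow is not hard-coded in `run_agent`: the sequence of tool calls comes entirely from the policy, so swapping in a different policy or registering additional tools changes the behavior without touching the loop. That separation is the part of the sketch meant to mirror the agentic formulation described above.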
Business Value
Enables the development of more versatile and adaptable robots capable of operating in diverse and unknown environments, reducing the need for extensive pre-programming and mapping for each new task or location.