
From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics

Abstract

Video Understanding, Scene Interpretation, and Commonsense Reasoning are highly challenging tasks that enable the interpretation of visual information, allowing agents to perceive, interact with, and make rational decisions in their environment. Large Language Models (LLMs) and Visual Language Models (VLMs) have shown remarkable advancements in these areas in recent years, enabling domain-specific applications as well as zero-shot open-vocabulary tasks that combine multiple domains. However, their computational complexity poses challenges for deployment on edge devices and in the context of Mobile Robotics, especially considering the trade-off between accuracy and inference time. In this paper, we investigate the capabilities of state-of-the-art VLMs for the tasks of Scene Interpretation and Action Recognition, with special regard to small VLMs that can be deployed to edge devices in the context of Mobile Robotics. The proposed pipeline is evaluated on a diverse dataset consisting of various real-world cityscape, on-campus, and indoor scenarios. The experimental evaluation discusses the potential of these small models on edge devices, with particular emphasis on challenges, weaknesses, inherent model biases, and the application of the gained information. Supplementary material is provided via the following repository: https://datahub.rz.rptu.de/hstr-csrl-public/publications/scene-interpretation-on-edge-devices/
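
To make the idea of zero-shot scene interpretation on an edge device concrete, the sketch below runs a small off-the-shelf VLM over a single camera frame and reports the inference time. The model choice, the captioning-style setup, and the file name are illustrative assumptions, not the specific pipeline evaluated in the paper.

```python
# Minimal sketch of zero-shot scene interpretation with a small VLM on an
# edge device; model and inputs are assumptions, not the paper's pipeline.
import time

from PIL import Image
from transformers import pipeline

# A small image-to-text model stands in for an edge-deployable VLM here.
captioner = pipeline(
    "image-to-text",
    model="Salesforce/blip-image-captioning-base",  # assumed example model
    device=-1,  # CPU; set to 0 for an onboard GPU (e.g. a Jetson board)
)


def interpret_scene(image_path: str) -> tuple[str, float]:
    """Return a zero-shot scene description and the inference time in seconds."""
    image = Image.open(image_path).convert("RGB")
    start = time.perf_counter()
    result = captioner(image)
    elapsed = time.perf_counter() - start
    return result[0]["generated_text"], elapsed


if __name__ == "__main__":
    description, latency = interpret_scene("campus_scene.jpg")  # hypothetical frame
    print(f"Scene: {description} ({latency:.2f}s)")
```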

Key Contributions

The paper investigates the capabilities of state-of-the-art VLMs for scene interpretation and action recognition on edge devices for mobile robotics. It focuses on evaluating smaller VLMs that balance accuracy and inference time for real-world robotic applications, addressing the computational challenges of deploying large models; a rough measurement sketch follows below.
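
One simple way to probe the accuracy/inference-time trade-off is to time several candidate small VLMs on the same frames; the model names and the two-image test set below are placeholders, not the paper's evaluation protocol.

```python
# Hypothetical latency comparison across candidate small VLMs; models and
# images are placeholders, and accuracy scoring is omitted for brevity.
import time

from PIL import Image
from transformers import pipeline

CANDIDATE_MODELS = [
    "Salesforce/blip-image-captioning-base",   # assumed small VLM
    "Salesforce/blip-image-captioning-large",  # assumed larger baseline
]


def mean_latency(model_name: str, image_paths: list[str]) -> float:
    """Return the mean per-image inference time for one candidate model."""
    captioner = pipeline("image-to-text", model=model_name, device=-1)
    images = [Image.open(p).convert("RGB") for p in image_paths]
    start = time.perf_counter()
    for img in images:
        captioner(img)
    return (time.perf_counter() - start) / len(images)


for name in CANDIDATE_MODELS:
    latency = mean_latency(name, ["scene_001.jpg", "scene_002.jpg"])  # placeholder frames
    print(f"{name}: {latency:.2f}s/image")
```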

Business Value

Enables more intelligent and adaptable mobile robots capable of understanding their environment and actions in real-time, even with limited onboard processing power, leading to safer and more versatile robotic applications.