Learning Affordances at Inference-Time for Vision-Language-Action Models

Abstract

Solving complex real-world control tasks often takes multiple tries: if we fail at first, we reflect on what went wrong, and change our strategy accordingly to avoid making the same mistake. In robotics, Vision-Language-Action models (VLAs) offer a promising path towards solving complex control tasks, but lack the ability to contextually and dynamically readjust behavior when they fail to accomplish a task. In this work, we introduce Learning from Inference-Time Execution (LITEN), which connects a VLA low-level policy to a high-level VLM that conditions on past experiences by including them in-context, allowing it to learn the affordances and capabilities of the low-level VLA. Our approach iterates between a reasoning phase that generates and executes plans for the low-level VLA, and an assessment phase that reflects on the resulting execution and draws useful conclusions to be included in future reasoning contexts. Unlike similar approaches to self-refinement in non-robotics domains, LITEN must reflect on unstructured real-world robot trajectories (e.g., raw videos), which requires structured guiderails during assessment. Our experimental results demonstrate LITEN is able to effectively learn from past experience to generate plans that use high-affordance instructions to accomplish long-horizon tasks.
Authors (6)
Ameesh Shah
William Chen
Adwait Godbole
Federico Mora
Sanjit A. Seshia
Sergey Levine
Submitted
October 22, 2025
arXiv Category
cs.RO

Key Contributions

This paper introduces LITEN (Learning from Inference-Time Execution), which pairs a low-level Vision-Language-Action (VLA) policy with a high-level VLM that learns the VLA's affordances and capabilities from past experience, letting the system readjust its behavior after failures. The method iterates between a reasoning phase, which generates and executes plans for the VLA, and an assessment phase, which reflects on the resulting executions and adds the conclusions to future reasoning contexts, as sketched below.
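
The control flow this describes can be sketched in a few lines. The Python below is an illustrative reconstruction under stated assumptions, not the authors' implementation: `vlm`, `vla`, `env`, and every method on them are hypothetical stand-ins for the high-level VLM, the low-level VLA policy, and the robot environment.

```python
# Minimal sketch of the LITEN loop summarized above. All names here
# (vlm, vla, env and their methods) are hypothetical stand-ins for
# illustration -- they are not the authors' released API.

from dataclasses import dataclass, field


@dataclass
class ExperienceMemory:
    """Conclusions drawn from past attempts, included in-context."""
    conclusions: list[str] = field(default_factory=list)

    def as_context(self) -> str:
        return "\n".join(f"- {c}" for c in self.conclusions)


def liten_loop(task: str, vlm, vla, env, max_attempts: int = 5) -> bool:
    """Alternate reasoning (plan + execute) and assessment (reflect)."""
    memory = ExperienceMemory()
    for _ in range(max_attempts):
        # Reasoning phase: the high-level VLM conditions on conclusions
        # from earlier attempts and emits a plan of low-level
        # instructions for the VLA policy.
        plan = vlm.generate_plan(task=task, context=memory.as_context())

        # Execute each instruction with the low-level VLA and keep the
        # resulting trajectories (e.g., raw video) for later reflection.
        trajectories = [env.rollout(vla, step) for step in plan]

        if env.task_succeeded():
            return True

        # Assessment phase: structured guiderails (e.g., targeted
        # questions per instruction) help the VLM reflect on the
        # unstructured trajectories and draw a reusable conclusion
        # about the VLA's affordances and capabilities.
        memory.conclusions.append(vlm.assess(plan, trajectories))
    return False
```

The key design point the abstract emphasizes is that the memory lives in-context rather than in the model weights, so the affordance knowledge accrues at inference time without any fine-tuning of the VLA or the VLM.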

Business Value

Enables robots to become more robust, adaptable, and efficient on complex real-world tasks by learning from their own failures at inference time, reducing the need for costly retraining and manual intervention.