
EmbodiedBrain: Expanding Performance Boundaries of Task Planning for Embodied Intelligence

Abstract

The realization of Artificial General Intelligence (AGI) necessitates Embodied AI agents capable of robust spatial perception, effective task planning, and adaptive execution in physical environments. However, current large language models (LLMs) and multimodal LLMs (MLLMs) for embodied tasks suffer from key limitations, including a significant gap between model design and agent requirements, an unavoidable trade-off between real-time latency and performance, and the use of unauthentic, offline evaluation metrics. To address these challenges, we propose EmbodiedBrain, a novel vision-language foundation model available in both 7B and 32B parameter sizes. Our framework features an agent-aligned data structure and employs a powerful training methodology that integrates large-scale Supervised Fine-Tuning (SFT) with Step-Augmented Group Relative Policy Optimization (Step-GRPO), which boosts long-horizon task success by integrating preceding steps as Guided Precursors. Furthermore, we incorporate a comprehensive reward system, including a Generative Reward Model (GRM) accelerated at the infrastructure level, to improve training efficiency. To enable thorough validation, we establish a three-part evaluation system encompassing General, Planning, and End-to-End Simulation Benchmarks, highlighted by the proposal and open-sourcing of a novel, challenging simulation environment. Experimental results demonstrate that EmbodiedBrain achieves superior performance across all metrics, establishing a new state-of-the-art for embodied foundation models. To pave the way for the next generation of generalist embodied agents, we open-source all of our data, model weights, and evaluation methods, which are available at https://zterobot.github.io/EmbodiedBrain.github.io.
Authors (20)
Ding Zou
Feifan Wang
Mengyu Ge
Siyuan Fan
Zongbing Zhang
Wei Chen
+14 more
Submitted
October 23, 2025
arXiv Category
cs.CV

Key Contributions

EmbodiedBrain is a vision-language foundation model for task planning in embodied intelligence, released in 7B and 32B parameter sizes. It features an agent-aligned data structure and a training methodology that combines SFT with Step-GRPO, which improves long-horizon task success by using preceding steps as Guided Precursors. Together these target three limitations of current LLMs and MLLMs on embodied tasks: a gap between model design and agent requirements, a trade-off between real-time latency and performance, and reliance on offline evaluation metrics.
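
The two ideas behind Step-GRPO named above can be illustrated with a minimal sketch. The first function is the standard group-relative advantage used in GRPO, where each sampled response's reward is normalized against the mean and standard deviation of its own group. The second is a hypothetical helper that shows one plausible reading of "preceding steps as Guided Precursors": the first few reference steps of a plan are prepended to the prompt as hints. The function names, prompt layout, and parameter `num_precursors` are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each response's reward against its own sampling group.

    This is the usual GRPO-style advantage: (reward - group mean) / group std.
    `eps` guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def build_prompt_with_precursors(task, reference_steps, num_precursors):
    """Hypothetical helper (an assumption, not the paper's format).

    Prepends the first `num_precursors` reference steps to the task prompt
    so the model only has to plan the remaining steps.
    """
    hints = reference_steps[:num_precursors]
    lines = [f"Task: {task}"]
    if hints:
        lines.append("Completed steps:")
        lines.extend(f"{i}. {step}" for i, step in enumerate(hints, start=1))
    lines.append("Next steps:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Four sampled plans for one prompt, scored by some reward function.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    print(build_prompt_with_precursors(
        "Make tea", ["Fill kettle", "Boil water", "Pour water"], 2))
```

Under this reading, the precursor steps shorten the remaining planning horizon for each rollout, while the group-relative advantage ranks the sampled completions against each other without a separate value model.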

Business Value

EmbodiedBrain supports the development of more capable and adaptable AI agents for physical tasks, with potential applications in robotics, automation, and human-robot interaction. Such capability is crucial for realizing more general-purpose AI agents.