📄 Abstract
Self-evolution, the ability of agents to autonomously improve their reasoning and behavior, is essential for the embodied domain with its long-horizon, real-world tasks. Although reinforcement fine-tuning (RFT) has shown strong performance in enhancing reasoning in LLMs, its potential to enable self-evolving embodied intelligence with multi-modal interactions remains largely unexplored. Specifically, RFT faces two fundamental obstacles in embodied settings: (i) the lack of accessible intermediate rewards in multi-step reasoning tasks limits effective learning signals, and (ii) reliance on hand-crafted reward functions restricts generalization to novel tasks and environments. To address these challenges, we present Self-Evolving Embodied Agents-R1 (SEEA-R1), the first RFT framework designed to enable the self-evolving capabilities of embodied agents. To convert sparse, delayed rewards into denser intermediate signals that improve multi-step reasoning, we propose Tree-based Group Relative Policy Optimization (Tree-GRPO), which integrates Monte Carlo Tree Search into GRPO. To generalize reward estimation across tasks and scenes, supporting autonomous adaptation and reward-driven self-evolution, we further introduce a Multi-modal Generative Reward Model (MGRM). To holistically evaluate SEEA-R1, we test it on the ALFWorld benchmark, where it surpasses state-of-the-art methods, including GPT-4o, with scores of 85.07% (textual) and 46.27% (multi-modal). Without ground-truth rewards, SEEA-R1 still achieves 80.3% (textual) and 44.03% (multi-modal), surpassing all open-source baselines and highlighting its scalability as a self-evolving embodied agent. Additional experiments and qualitative analysis further support the potential of SEEA-R1 for future research in scalable embodied intelligence.
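As a rough illustration of the Tree-GRPO idea described in the abstract, the sketch below computes GRPO-style group-relative advantages over sibling actions in an MCTS tree, so that each node's rollout-value estimate acts as a dense intermediate reward. The `TreeNode` structure and `tree_grpo_advantages` function are illustrative assumptions, not the paper's implementation.

```python
"""Minimal Tree-GRPO-style sketch (assumed interface, not the paper's code)."""
from dataclasses import dataclass, field
from typing import List
import math

@dataclass
class TreeNode:
    visits: int = 0
    value_sum: float = 0.0  # accumulated return from MCTS rollouts through this node
    children: List["TreeNode"] = field(default_factory=list)

    @property
    def q(self) -> float:
        # Mean rollout return: the dense intermediate signal MCTS provides
        return self.value_sum / max(self.visits, 1)

def tree_grpo_advantages(node: TreeNode, eps: float = 1e-6) -> List[float]:
    """GRPO-style group-relative advantages over the children of one node.

    In vanilla GRPO the group is a batch of sampled responses scored by a
    reward; here, sibling actions under one state form the group and each
    child's MCTS value estimate plays the role of the reward.
    """
    if not node.children:
        return []
    qs = [c.q for c in node.children]
    mean = sum(qs) / len(qs)
    std = math.sqrt(sum((q - mean) ** 2 for q in qs) / len(qs))
    return [(q - mean) / (std + eps) for q in qs]
```

Normalizing within sibling groups rather than across the whole trajectory is what turns a single sparse terminal reward into a per-step learning signal.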
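Similarly, a generative reward model such as MGRM can be sketched as a model call that judges task progress from the trajectory context and is parsed into a scalar reward. The prompt format, the three-way verdict, and the `score_step` helper below are assumptions for illustration; the paper's actual MGRM also consumes image observations, elided here.

```python
"""Hedged sketch of a generative reward model scoring one step (assumed API)."""
from typing import Callable

def score_step(generate: Callable[[str], str], task: str,
               observation: str, action: str) -> float:
    """Ask a generative model to judge progress and map its verdict to a reward.

    `generate` is any text-in/text-out model call, so no specific
    model library or API is assumed.
    """
    prompt = (
        f"Task: {task}\nObservation: {observation}\nAction: {action}\n"
        "Judge this step as exactly one of: success / continue / failure."
    )
    verdict = generate(prompt).lower()
    if "success" in verdict:
        return 1.0
    if "failure" in verdict:
        return -1.0
    return 0.0  # 'continue' -> neutral intermediate reward

# Usage with a stub model, for illustration only:
print(score_step(lambda p: "continue", "heat an egg", "egg on counter", "take egg"))
```

Because the reward comes from a learned model rather than a hand-crafted function, the same scorer can be reused across new tasks and scenes, which is the generalization property the abstract attributes to MGRM.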
Authors (15)
Wanxin Tian
Shijie Zhang
Kevin Zhang
Xiaowei Chi
Chunkai Fan
Junyu Lu
+9 more
Key Contributions
SEEA-R1 is the first RFT framework designed for self-evolving embodied agents. It tackles the lack of accessible intermediate rewards by converting sparse, delayed rewards into denser intermediate signals (Tree-GRPO), and it replaces hand-crafted reward functions with a learned Multi-modal Generative Reward Model (MGRM), enabling more effective learning and generalization.
Business Value
Enables more autonomous and adaptable robots and AI agents capable of learning and improving in complex, real-world environments, reducing the need for extensive human supervision and manual programming.