
BLM$_1$: A Boundless Large Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning

Abstract

Multimodal large language models (MLLMs) have advanced vision-language reasoning and are increasingly deployed in embodied agents. However, significant limitations remain: MLLMs generalize poorly across digital and physical spaces and across embodiments; vision-language-action models (VLAs) produce low-level actions yet lack robust high-level embodied reasoning; and most embodied large language models (ELLMs) are confined to the digital space and generalize poorly to the physical world. Unified models that operate seamlessly across digital and physical spaces while generalizing across embodiments and tasks are therefore still missing. We introduce the Boundless Large Model (BLM$_1$), a multimodal spatial foundation model that preserves instruction following and reasoning, incorporates embodied knowledge, and supports robust cross-embodiment control. BLM$_1$ integrates three key capabilities -- cross-space transfer, cross-task learning, and cross-embodiment generalization -- via a two-stage training paradigm. Stage I injects embodied knowledge into the MLLM through curated digital corpora while maintaining language competence. Stage II trains a policy module through an intent-bridging interface that extracts high-level semantics from the MLLM to guide control, without fine-tuning the MLLM backbone. This process is supported by a self-collected cross-embodiment demonstration suite spanning four robot embodiments and six progressively challenging tasks. Evaluations across digital and physical benchmarks show that a single BLM$_1$ instance outperforms four model families -- MLLMs, ELLMs, VLAs, and GMLMs -- achieving ~6% gains in digital tasks and ~3% in physical tasks.
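
The Stage II recipe described in the abstract -- a frozen MLLM backbone whose high-level intent features condition a separately trained low-level policy -- can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the class and function names (FrozenMLLM behavior, IntentBridge, PolicyHead, train_step) and the mean-pooling and MSE choices are hypothetical and are not the paper's actual interfaces or losses.

```python
# Minimal sketch of the Stage-II idea from the abstract: a frozen MLLM supplies
# high-level intent features through a small bridging module that conditions a
# low-level policy head. All names and design choices here are assumptions.
import torch
import torch.nn as nn

class IntentBridge(nn.Module):
    """Maps MLLM hidden states to a compact intent embedding for control."""
    def __init__(self, mllm_dim: int, intent_dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(mllm_dim, intent_dim), nn.GELU(),
                                  nn.Linear(intent_dim, intent_dim))

    def forward(self, mllm_hidden: torch.Tensor) -> torch.Tensor:
        # Pool over the token dimension, then project into the intent space.
        return self.proj(mllm_hidden.mean(dim=1))

class PolicyHead(nn.Module):
    """Predicts low-level actions from proprioception conditioned on intent."""
    def __init__(self, intent_dim: int, proprio_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(intent_dim + proprio_dim, 256),
                                 nn.ReLU(), nn.Linear(256, action_dim))

    def forward(self, intent: torch.Tensor, proprio: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([intent, proprio], dim=-1))

def train_step(mllm, bridge, policy, optimizer, batch):
    """One Stage-II update: gradients reach only the bridge and policy."""
    with torch.no_grad():                      # backbone stays frozen,
        hidden = mllm(batch["images"], batch["instruction"])  # matching the abstract
    intent = bridge(hidden)
    pred_action = policy(intent, batch["proprio"])
    loss = nn.functional.mse_loss(pred_action, batch["action"])
    optimizer.zero_grad()
    loss.backward()                            # updates bridge + policy only
    optimizer.step()
    return loss.item()
```

The optimizer would be built over `bridge` and `policy` parameters only, so "without fine-tuning the MLLM backbone" holds by construction; the actual action representation and objective used by BLM$_1$ are not specified in this card.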
Authors (18)
Wentao Tan
Bowen Wang
Heng Zhi
Chenyu Liu
Zhe Li
Jian Liu
+12 more
Submitted
October 28, 2025
arXiv Category
cs.AI

Key Contributions

Introduces the Boundless Large Model (BLM$_1$), a multimodal spatial foundation model that unifies capabilities across digital and physical spaces, tasks, and embodiments. It preserves instruction following and reasoning, incorporates embodied knowledge, and supports robust cross-embodiment control, addressing key limitations of current MLLMs and VLAs. A speculative inference-side sketch of how one model instance could serve both spaces follows below.
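
The claim that a single BLM$_1$ instance handles digital benchmarks and physical control suggests a dispatch along these lines. This is a speculative usage sketch: `blm`, `bridge`, `policy`, the `generate()` method, and the embodiment identifier are placeholders, not a published API.

```python
# Speculative inference-side sketch of "a single BLM_1 instance" serving both
# digital and physical tasks. All object and method names are assumptions.
import torch

def answer_digital(blm, image, question: str) -> str:
    """Digital-space task: the MLLM backbone answers directly in language."""
    # Assumption: the backbone exposes a generate() method for text output.
    return blm.generate(image, question)

def act_physical(blm, bridge, policy, embodiment_id: int,
                 image, instruction: str, proprio: torch.Tensor) -> torch.Tensor:
    """Physical-space task: the same backbone's intent conditions a policy module."""
    with torch.no_grad():
        hidden = blm(image, instruction)   # shared backbone, kept frozen
        intent = bridge(hidden)            # high-level semantics for control
    # How BLM_1 tells the policy which robot it is driving is not described in
    # this card; an explicit embodiment identifier is one plausible choice.
    return policy(intent, proprio, embodiment_id)
```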

Business Value

Enables the development of more versatile and adaptable robots and AI agents that can operate seamlessly in both virtual and real-world environments, accelerating progress in robotics and human-AI collaboration.