📄 Abstract
Semantic occupancy has emerged as a powerful representation in world models
for its ability to capture rich spatial semantics. However, most existing
occupancy world models rely on static and fixed embeddings or grids, which
inherently limit the flexibility of perception. Moreover, their "in-place
classification" over grids exhibits a potential misalignment with the dynamic
and continuous nature of real scenarios. In this paper, we propose SparseWorld,
a novel 4D occupancy world model that is flexible, adaptive, and efficient,
powered by sparse and dynamic queries. We propose a Range-Adaptive Perception
module, in which learnable queries are modulated by the ego vehicle states and
enriched with temporal-spatial associations to enable extended-range
perception. To effectively capture the dynamics of the scene, we design a
State-Conditioned Forecasting module, which replaces classification-based
forecasting with a regression-guided formulation, precisely aligning the dynamic
queries with the continuity of the 4D environment. In addition, we specifically
devise a Temporal-Aware Self-Scheduling training strategy to enable smooth and
efficient training. Extensive experiments demonstrate that SparseWorld achieves
state-of-the-art performance across perception, forecasting, and planning
tasks. Comprehensive visualizations and ablation studies further validate the
advantages of SparseWorld in terms of flexibility, adaptability, and
efficiency. The code is available at https://github.com/MSunDYY/SparseWorld.
Authors (9)
Chenxu Dang
Haiyan Liu
Guangjun Bao
Pei An
Xinyue Tang
An Pan
+3 more
Submitted
October 20, 2025
Key Contributions
Introduces SparseWorld, a novel 4D occupancy world model using sparse and dynamic queries for flexibility and efficiency. It features a Range-Adaptive Perception module for extended-range sensing and a State-Conditioned Forecasting module that uses regression for accurate dynamic scene prediction, overcoming limitations of grid-based methods.
Business Value
Enhances the perception capabilities of autonomous systems, leading to safer and more reliable navigation and interaction in complex, dynamic environments.