Source: arxiv_ml · Research Paper · Audience: Earth Scientists, Climate Researchers, AI Researchers in Geoscience, Remote Sensing Specialists, Data Scientists

OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data

📄 Abstract

Existing benchmarks for multimodal learning in Earth science offer limited, siloed coverage of Earth's spheres and their cross-sphere interactions, typically restricting evaluation to the human-activity sphere or the atmosphere and to at most 16 tasks. These benchmarks share three limitations: narrow-source heterogeneity (single or few data sources), constrained scientific granularity, and limited-sphere extensibility. Therefore, we introduce OmniEarth-Bench, the first multimodal benchmark that systematically spans all six spheres (atmosphere, lithosphere, oceanosphere, cryosphere, biosphere, and human-activity sphere) as well as cross-sphere interactions. Built with a scalable, modular-topology data inference framework, native multi-observation sources, and expert-in-the-loop curation, OmniEarth-Bench produces 29,855 standardized, expert-curated annotations. All annotations are organized into a four-level hierarchy (Sphere, Scenario, Ability, Task) encompassing 109 expert-curated evaluation tasks. Experiments on 9 state-of-the-art MLLMs show that even the most advanced models struggle with the benchmark: none reaches 35% accuracy, revealing systematic gaps in Earth-system cognitive ability. The dataset and evaluation code are released at OmniEarth-Bench (https://anonymous.4open.science/r/OmniEarth-Bench-B1BD).

Key Contributions

OmniEarth-Bench is introduced as the first multimodal benchmark to systematically span all six of Earth's spheres (atmosphere, lithosphere, oceanosphere, cryosphere, biosphere, and human-activity sphere) and their cross-sphere interactions. It uses a scalable data inference framework and expert-in-the-loop curation to provide 109 evaluation tasks, organized in a four-level hierarchy (Sphere, Scenario, Ability, Task).
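
To make the four-level hierarchy concrete, here is a minimal Python sketch of how annotations tagged with (Sphere, Scenario, Ability, Task) might be scored at any level of that hierarchy. It is an illustration only, not the benchmark's released evaluation code: the `Annotation` class, its field names, the `accuracy_by_level` helper, and the placeholder labels such as `scenario-A` are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hierarchy levels named in the abstract, from coarsest to finest.
LEVELS = ("sphere", "scenario", "ability", "task")


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record layout; the real dataset schema may differ."""
    sphere: str
    scenario: str
    ability: str
    task: str
    answer: str  # reference answer, e.g. a multiple-choice letter


def accuracy_by_level(annotations, predictions, level):
    """Return {hierarchy path: accuracy}, grouping down to `level`."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}")
    if len(annotations) != len(predictions):
        raise ValueError("annotations and predictions differ in length")
    depth = LEVELS.index(level) + 1
    correct = defaultdict(int)
    total = defaultdict(int)
    for ann, pred in zip(annotations, predictions):
        # The key is the path from the sphere down to the requested level.
        key = tuple(getattr(ann, name) for name in LEVELS[:depth])
        total[key] += 1
        correct[key] += int(pred == ann.answer)
    return {key: correct[key] / total[key] for key in total}


# Placeholder data: only the sphere names come from the paper.
annotations = [
    Annotation("atmosphere", "scenario-A", "ability-X", "task-1", "B"),
    Annotation("atmosphere", "scenario-A", "ability-X", "task-2", "C"),
    Annotation("cryosphere", "scenario-B", "ability-Y", "task-3", "A"),
]
predictions = ["B", "D", "A"]
per_sphere = accuracy_by_level(annotations, predictions, "sphere")
```

Keying each group by its full path rather than by the label alone keeps identically named scenarios or tasks in different spheres from being merged.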

Business Value

OmniEarth-Bench provides a standardized, comprehensive platform for developing and evaluating AI models for Earth science applications, which could accelerate progress in areas such as climate change monitoring, disaster prediction, and resource management.