arxiv_ml | 85% Match | Research Paper | Earth Scientists, Climate Researchers, AI Researchers in Geoscience, Remote Sensing Specialists, Data Scientists | 20 hours ago

OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data


📄 Abstract

Existing benchmarks for multimodal learning in Earth science offer limited, siloed coverage of Earth's spheres and their cross-sphere interactions, typically restricting evaluation to the human-activity sphere or the atmosphere and to at most 16 tasks. These limitations include narrow-source heterogeneity (single or few data sources), constrained scientific granularity, and limited-sphere extensibility. Therefore, we introduce OmniEarth-Bench, the first multimodal benchmark that systematically spans all six spheres (atmosphere, lithosphere, oceanosphere, cryosphere, biosphere, and human-activity sphere) as well as cross-sphere interactions. Built with a scalable, modular-topology data inference framework, native multi-observation sources, and expert-in-the-loop curation, OmniEarth-Bench produces 29,855 standardized, expert-curated annotations. All annotations are organized into a four-level hierarchy (Sphere, Scenario, Ability, Task), encompassing 109 expert-curated evaluation tasks. Experiments on 9 state-of-the-art MLLMs show that even the most advanced models struggle with our benchmark: none of them reaches 35% accuracy, revealing systematic gaps in Earth-system cognitive ability. The dataset and evaluation code are available at OmniEarth-Bench (https://anonymous.4open.science/r/OmniEarth-Bench-B1BD).

Key Contributions

OmniEarth-Bench is introduced as the first multimodal benchmark to systematically span all six spheres of Earth (atmosphere, lithosphere, oceanosphere, cryosphere, biosphere, human-activity) and their interactions. It uses a scalable data inference framework and expert curation to provide 29,855 annotations across 109 evaluation tasks, organized in a four-level hierarchy (Sphere, Scenario, Ability, Task).
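
As an illustration of how the four-level hierarchy (Sphere, Scenario, Ability, Task) named in the abstract could be used when scoring a model, the following is a minimal Python sketch. It is not the benchmark's released evaluation code: the `Annotation` class, the `accuracy_by_level` helper, the multiple-choice answer format, and every field value below are assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record: one item tagged with the four hierarchy levels."""
    sphere: str
    scenario: str
    ability: str
    task: str
    answer: str  # gold label; a multiple-choice format is assumed here


def accuracy_by_level(annotations, predictions, level):
    """Return {group name: accuracy} for one hierarchy level, e.g. "sphere"."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ann, pred in zip(annotations, predictions):
        key = getattr(ann, level)
        total[key] += 1
        correct[key] += int(pred == ann.answer)
    return {key: correct[key] / total[key] for key in total}


# Toy records; scenario, ability, and task names are invented for this sketch.
items = [
    Annotation("atmosphere", "storms", "perception", "cyclone detection", "A"),
    Annotation("atmosphere", "storms", "reasoning", "track forecasting", "C"),
    Annotation("cryosphere", "sea ice", "perception", "ice extent estimation", "B"),
    Annotation("cryosphere", "sea ice", "perception", "ice extent estimation", "D"),
]
preds = ["A", "B", "B", "D"]  # hypothetical model outputs

per_sphere = accuracy_by_level(items, preds, "sphere")
per_ability = accuracy_by_level(items, preds, "ability")
print(per_sphere)
print(per_ability)
```

Grouping by any of the four levels uses the same function, so a per-task or per-scenario breakdown only requires changing the `level` argument.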

Business Value

Provides a standardized and comprehensive platform for developing and evaluating AI models for Earth science applications, accelerating progress in areas like climate change monitoring, disaster prediction, and resource management.

Paper Metadata

Innovation Type

Benchmark / Dataset Creation

Deployment Feasibility

High, as it is a benchmark/dataset, enabling research and development.

Limitations Addressed

Existing benchmarks in Earth science multimodal learning suffer from limited scope (few spheres, narrow tasks), narrow data sources, and lack of extensibility.

Performance Gains

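Not a modeling method, so no direct performance gain is reported. The benchmark enables comprehensive, holistic evaluation of multimodal models across Earth's systems; none of the 9 evaluated MLLMs reaches 35% accuracy.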

Technical Tags

Multimodal Learning, Earth Science, Earth's Spheres, Cross-Sphere Interactions, OmniEarth-Bench, Data Inference Framework, Expert-in-the-Loop, Annotation Hierarchy, Holistic Evaluation, Remote Sensing, Climate Science

Research Topics

Earth Science, Multimodal Learning, Machine Learning Benchmarking, Climate Modeling, Environmental Science

Methods & Architectures

Multimodal Data Integration, Scalable Data Inference Framework, Expert Curation, Hierarchical Annotation

Applications & Tasks

Earth Science, Climate Science, Environmental Monitoring, Remote Sensing, Geoscience

Limited Coverage of Earth's Spheres in Benchmarks, Siloed Evaluation, Narrow-Source Heterogeneity, Constrained Scientific Granularity, Limited Sphere Extensibility

Holistic Evaluation of Earth Systems, Assessing Cross-Sphere Interactions, Multimodal Earth Observation Analysis

Datasets & Benchmarks

Benchmarks

OmniEarth-Bench (109 expert-curated evaluation tasks)

Related Fields

Geospatial Analysis, Environmental Science, Climate Change Research, Data Science, Machine Learning

Keywords

Earth Science, Multimodal Learning, Benchmark, OmniEarth-Bench, Earth's Spheres, Cross-Sphere Interactions, Remote Sensing, Climate Science, Environmental Monitoring, Data Annotation, Machine Learning Evaluation, Holistic, Geoscience

Academic Context

#Earth Science, #Multimodal Learning, #Machine Learning Benchmarking, #Climate Modeling, #Environmental Science

Commercial Potential

Potential Products

AI-powered Earth observation platforms, Climate modeling and prediction tools, Environmental monitoring systems

Target Industries

Environmental Services, Government (Climate Agencies), Agriculture, Energy, Insurance (Disaster Risk)

Use Case Examples

Predicting extreme weather events by analyzing interactions between atmosphere and ocean; monitoring deforestation and land use changes across biosphere and lithosphere; assessing the impact of climate change on polar ice caps (cryosphere)

Competitive Edge

Establishes a new standard for multimodal Earth science evaluation by providing unprecedented breadth and depth across all Earth spheres.

Market Opportunity

Significant investment in climate tech and Earth observation AI.

Revenue Models

N/A (Benchmark)

Resource Requirements

Compute Needs

High for training multimodal models on the benchmark, but the benchmark itself requires standard data storage and processing.

Data Requirements

Requires diverse multimodal observational Earth data covering all six spheres.

Deployment Constraints

The complexity of Earth systems means models trained on this benchmark will still require careful validation for specific real-world applications.

Scalability

The benchmark is designed with a scalable, modular topology to accommodate future data and tasks.

Regulatory Considerations

Data usage policies and potential restrictions on Earth observation data.

Production Readiness

Maturity Level

Research (Benchmark)

Time to Market

N/A (Benchmark)

Patent Potential

Low, as it is a benchmark dataset.
