Abstract
Today, Earth Observation (EO) satellites generate massive volumes of data. To fully exploit this data, it is essential to pretrain EO Foundation Models (FMs) on large unlabeled datasets, enabling efficient fine-tuning for downstream tasks with minimal labeled data. In this paper, we study scaling up FMs: we train our models on the 23 TB MajorTOM pretraining dataset, which covers all regions of the Earth, and on average the performance is competitive with models pretrained on substantially smaller, more specialized datasets that cover only land. The additional ocean and ice data do not degrade performance on land-focused downstream tasks. These results indicate that large FMs trained on global datasets for a wider variety of tasks remain useful for downstream applications that require only a subset of the information included in their training. Our second contribution is an exploration of U-Net Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and Mamba State-Space Models (SSMs) as FMs. U-Net captures local correlations among pixels, while ViT and Mamba capture both local and long-range correlations. We develop models with these architectures at different parameter counts, and we evaluate the floating-point operations (FLOPs) each model requires. We fine-tune on the PhilEO Bench for different downstream tasks: roads, buildings, and land cover. For most n-shot settings on the roads and buildings tasks, U-Net 200M-2T outperforms the other models. Using Mamba, we achieve comparable results on the downstream tasks at lower computational cost. We also compare with the recent FM TerraMind, which we evaluate on the PhilEO Bench.
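
As a minimal illustration of the kind of parameter and FLOP accounting the abstract refers to (not the authors' evaluation pipeline), the sketch below counts parameters and per-patch FLOPs for a hypothetical stand-in convolutional encoder, assuming PyTorch and fvcore are available; the encoder architecture, band count, and patch size are illustrative assumptions only.

```python
# Minimal sketch: counting parameters and FLOPs for a candidate backbone.
# This is NOT the paper's evaluation code; it only illustrates the FLOP
# accounting mentioned in the abstract, assuming PyTorch and fvcore.
import torch
import torch.nn as nn
from fvcore.nn import FlopCountAnalysis, parameter_count


class TinyEncoder(nn.Module):
    """Hypothetical stand-in convolutional encoder, for illustration only."""

    def __init__(self, in_ch: int = 10, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(width, 2 * width, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


model = TinyEncoder()
# Assumed input: one 10-band, 128x128 patch (Sentinel-2-like, hypothetical).
x = torch.randn(1, 10, 128, 128)

# fvcore counts fused multiply-adds; multiply by 2 for raw FLOPs if needed.
flops = FlopCountAnalysis(model, x)
params = parameter_count(model)[""]
print(f"params: {params / 1e6:.2f} M, FLOPs per patch: {flops.total() / 1e9:.2f} G")
```

The same counting can be applied to any backbone (e.g., a U-Net, ViT, or Mamba implementation) by swapping in the corresponding model and a representative input patch.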