EarthShift: a benchmark for measuring robustness to real-world distribution shifts in Earth observation
Quick Take
EarthShift is the first public benchmark for assessing how robust Earth observation models are to real-world distribution shifts. Experiments on 8 geospatial foundation models across 11 tasks and 5 shift types show that the models perform 15-20% worse out-of-distribution on average, which points to a need for research on distributional robustness and not just performance. The code and datasets are publicly available.
Key Points
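- EarthShift benchmarks robustness across 5 types of distribution shift in remote sensing, such as new time periods, geographies, scales, and sensors.
- The 8 geospatial foundation models tested perform 15-20% worse out-of-distribution on average, regardless of architecture, size, pre-training, or fine-tuning strategy.
- The robustness of geospatial foundation models is similar to that of generic vision foundation models and even fully-supervised models.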
- The benchmark aims to guide future research towards reliable real-world applications.
- Code and datasets are available at https://earthshift.github.io.
Article Content
From source RSS / original summary. arXiv:2605.29330v1. Abstract: Current Earth observation benchmarks focus on measuring performance on diverse tasks and applications, typically measuring generalization in-distribution. But when models are deployed, they must generalize to myriad out-of-distribution scenarios, such as new time periods, geographies, scales, and sensors. We introduce EarthShift: the first public testbed for benchmarking robustness across multiple realistic distribution shifts encountered in remote sensing.
EarthShift enables users to measure distributional robustness by comparing performance in- and out-of-distribution using paired datasets from different sources, temporal windows, geographic locations, and sensors. Our experiments on 8 geospatial foundation models (GFMs) and 11 tasks covering 5 shift types show that GFMs consistently perform 15-20% worse out-of-distribution on average, regardless of model architecture, size, pre-training, or fine-tuning strategy.
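The in- versus out-of-distribution comparison described above can be illustrated with a minimal sketch. It is not the EarthShift code or API: the function name `robustness_gap`, the task names, and the scores are all hypothetical, and the abstract does not say whether the reported 15-20% is a relative or an absolute drop, so the sketch computes both.

```python
from statistics import mean


def robustness_gap(id_score: float, ood_score: float) -> dict:
    """Return the absolute and relative drop from ID to OOD performance.

    Hypothetical helper for illustration; not part of EarthShift.
    """
    if id_score <= 0:
        raise ValueError("id_score must be positive")
    absolute = id_score - ood_score
    return {"absolute": absolute, "relative": absolute / id_score}


# Made-up (task, in-distribution score, out-of-distribution score) triples.
# These are NOT results from the paper.
scores = [
    ("task_a", 0.80, 0.68),
    ("task_b", 0.90, 0.72),
]

# Per-task gaps, then the average relative drop across tasks.
gaps = {name: robustness_gap(i, o) for name, i, o in scores}
mean_relative_drop = mean(g["relative"] for g in gaps.values())
print(f"mean relative drop: {mean_relative_drop:.1%}")
```

Reporting the gap alongside the raw in-distribution score, rather than instead of it, keeps a model that is uniformly weak from looking robust simply because it has little performance to lose.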
We show that GFM robustness is similar to that of generic vision foundation models, and even fully-supervised models. This highlights a need for future research to strive for improvements in distributional robustness, not just performance, which can be benchmarked using EarthShift. We release our code and datasets to provide a testbed to guide future work to create foundation models that are robust and reliable in real-world applications. Code and data for EarthShift are available at: https://earthshift.github.io