OlmoEarth v1.1: A more efficient family of models

5/19/2026


Quick Answer

OlmoEarth v1.1, released by Ai2 (the Allen Institute for AI) and published on Hugging Face, cuts compute costs by up to 3x while maintaining OlmoEarth v1's performance on a mix of research benchmarks and partner tasks.

Quick Take

The new model family makes remote sensing workloads more efficient, so partners and developers can process satellite imagery faster and at lower cost.

Key Points

OlmoEarth v1.1 reduces compute costs by up to 3x compared to v1.
The model family supports efficient processing of satellite imagery across large areas.
A redesigned token merges the per-resolution Sentinel-2 tokens into one, cutting sequence length by 3x; changes to pre-training preserve performance.
Developers can achieve similar results with one-third of the compute resources.
v1.1 is trained on the same dataset as v1, so performance differences isolate the effect of methodological changes.


Kyle Wiggers


🧠 Models: https://huggingface.co/collections/allenai/olmoearth | 📄 Tech Report: https://allenai.org/papers/olmoearth_v1_1 | 💻 Code: https://github.com/allenai/olmoearth_pretrain

We released OlmoEarth (v1) in November 2025. Since then, partners have applied it across a wide range of tasks, from tracking mangrove change to classifying drivers of forest loss to producing country-scale crop-type maps in days, scaling deployments to national, continental, and global areas. Every release moves us closer to our mission: bringing state-of-the-art AI to organizations and communities working to protect people and our planet.

When OlmoEarth processes satellite imagery to make predictions across tens to hundreds of thousands of square kilometers, efficiency shapes what’s possible. Over the full lifecycle of running OlmoEarth – data export, preprocessing, inference, and post-processing – compute is by far the highest cost. A more efficient model means we can support more partners on the OlmoEarth Platform, and that anyone running OlmoEarth on their own can leverage this technology faster and at lower expense.

That’s why we built OlmoEarth v1.1: a new family of models that cuts compute costs by up to 3x while maintaining OlmoEarth v1's performance on a mix of research benchmarks and tasks we’ve constructed with partners.

Increasing efficiency by decreasing sequence lengths

The OlmoEarth models are built on the transformer, one of the dominant architectures in machine learning today. To process remote sensing data, we first convert it into a sequence of tokens the model can ingest.

Two important levers control efficiency in transformer-based models: model size (this is why we release a family of models, so users can pick the size that fits their compute budget) and token sequence length. The cost of self-attention scales quadratically with the token sequence length, and the cost of the remaining layers scales linearly, so even small reductions can meaningfully cut the cost of running the model.
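To make the effect of sequence length concrete, here is a rough sketch based on the standard back-of-the-envelope MAC count for a single transformer encoder layer. The function names, the 4x MLP expansion, and the example sizes are illustrative assumptions rather than OlmoEarth's actual configuration, and the estimate ignores embeddings, normalization, and other overheads.

```python
def encoder_layer_macs(seq_len: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate MACs for one transformer encoder layer (textbook estimate)."""
    projections = 4 * seq_len * dim * dim        # Q, K, V and output projections
    attention = 2 * seq_len * seq_len * dim      # QK^T scores plus weighted sum of V
    mlp = 2 * seq_len * dim * (mlp_ratio * dim)  # two linear layers in the MLP
    return projections + attention + mlp


def savings_from_shorter_sequence(seq_len: int, dim: int, factor: int = 3) -> float:
    """How many times cheaper a layer gets when the sequence shrinks by `factor`."""
    return encoder_layer_macs(seq_len, dim) / encoder_layer_macs(seq_len // factor, dim)


if __name__ == "__main__":
    # Hypothetical sequence lengths and width, chosen only to show the trend.
    for n in (768, 3072, 12288):
        print(n, round(savings_from_shorter_sequence(n, dim=768), 2))
```

Under this estimate, shortening the sequence by 3x saves a little over 3x when the linear terms dominate and approaches 9x as attention dominates. It is a property of the simplified formula, not a prediction of the measured numbers in the technical report.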

Figure: average rank versus MACs, with each point labeled by model family and size. MACs, or multiply-accumulate operations, estimate the computation needed for one model forward pass; lower MACs generally mean cheaper, faster inference. The y-axis is inverted because lower average rank is better.

Designing the token

This raises an important question for transformer-based remote sensing models: what should a token represent?

Take Sentinel-2 imagery, a common modality we process. A Sentinel-2 input will be some tensor with a height and width (H, W representing the latitudinal and longitudinal pixels), a temporal dimension T, and 12 Sentinel-2 channels ([H, W, T, D=12]).

Currently, we split the data into resolution-based patches. Concretely, we pick a spatial patch size p and split the overall Sentinel-2 image into patches of size p x p.

For each patch, we create a token per timestep per resolution. So a Sentinel-2 input with 2 timesteps yields 6 tokens per patch (2 timesteps x 3 resolutions, 10m, 20m, and 60m).

In total, a [H, W, T, D=12] Sentinel-2 input will yield H/p x W/p x T x 3 tokens.
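The token count above can be written as a small helper. This is a minimal sketch: the function name and the example input size are hypothetical, and it assumes H and W are exact multiples of p.

```python
def sentinel2_token_count(height: int, width: int, timesteps: int,
                          patch_size: int, resolutions: int = 3) -> int:
    """Tokens produced by an [H, W, T, 12] input: H/p x W/p x T x resolutions."""
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of patch_size")
    patches = (height // patch_size) * (width // patch_size)
    return patches * timesteps * resolutions


if __name__ == "__main__":
    # One patch with 2 timesteps and 3 resolutions gives 6 tokens.
    print(sentinel2_token_count(8, 8, timesteps=2, patch_size=8))    # 6
    # A hypothetical 64 x 64 pixel input with the same settings.
    print(sentinel2_token_count(64, 64, timesteps=2, patch_size=8))  # 384
```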

Using a separate token per resolution is a common technique when processing Sentinel-2 data: Galileo and SatMAE both take this approach, and SatMAE reports significantly better results with it. However, it is not universal: CROMA uses a single token for all bands, regardless of resolution. Because token counts compound multiplicatively, collapsing the three resolutions into a single token cuts the token count to one third, with material savings across pretraining, fine-tuning, and inference.
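The two tokenization schemes can be contrasted with a small NumPy sketch. It is an illustration only, not the OlmoEarth implementation: the band ordering and grouping (4 bands at 10m, 6 at 20m, 2 at 60m), the function names, and the random projection matrices standing in for learned patch embeddings are all assumptions.

```python
import numpy as np

# Assumed channel layout: indices of the 12 bands grouped by native resolution.
BAND_GROUPS = {
    "10m": [0, 1, 2, 3],
    "20m": [4, 5, 6, 7, 8, 9],
    "60m": [10, 11],
}


def patchify(x: np.ndarray, p: int) -> np.ndarray:
    """Reshape [H, W, T, D] into flattened patches of shape [H/p, W/p, T, p*p*D]."""
    H, W, T, D = x.shape
    assert H % p == 0 and W % p == 0, "H and W must be multiples of p"
    x = x.reshape(H // p, p, W // p, p, T, D)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # [H/p, W/p, T, p, p, D]
    return x.reshape(H // p, W // p, T, p * p * D)


def tokens_per_resolution(x, p, dim, rng):
    """One token per patch, timestep, and resolution: [H/p, W/p, T, 3, dim]."""
    tokens = []
    for bands in BAND_GROUPS.values():
        patches = patchify(x[..., bands], p)
        proj = rng.standard_normal((patches.shape[-1], dim))  # stand-in for learned weights
        tokens.append(patches @ proj)
    return np.stack(tokens, axis=3)


def tokens_merged(x, p, dim, rng):
    """One token per patch and timestep covering all 12 bands: [H/p, W/p, T, dim]."""
    patches = patchify(x, p)
    proj = rng.standard_normal((patches.shape[-1], dim))  # stand-in for learned weights
    return patches @ proj


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 16, 2, 12))  # hypothetical [H, W, T, D=12] input
    split = tokens_per_resolution(x, p=4, dim=8, rng=rng)
    merged = tokens_merged(x, p=4, dim=8, rng=rng)
    print(split.shape[:-1], merged.shape[:-1])
```

Counting the leading dimensions of each result shows the 3x difference in sequence length: the per-resolution scheme carries an extra axis of size 3 that the merged scheme does not.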

Naively combining the tokens in this way leads to significant performance drops, including a 10 percentage point drop on m-eurosat kNN (a common benchmark task for remote sensing models). We hypothesize that separating Sentinel-2 bands into different tokens makes it easier for OlmoEarth to model important cross-band relationships.
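For readers unfamiliar with kNN evaluation, the sketch below shows the general idea: embeddings from a frozen model are classified by a vote among their nearest training neighbors, so the score reflects the quality of the representation alone. The function name, the cosine similarity, and the default k are assumptions for illustration; the exact m-eurosat protocol may differ.

```python
import numpy as np


def knn_accuracy(train_emb, train_labels, test_emb, test_labels, k=5):
    """Top-1 accuracy of a cosine-similarity kNN classifier on frozen embeddings."""
    # L2-normalize so that the dot product equals cosine similarity.
    train = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    test = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    sims = test @ train.T                       # [n_test, n_train]
    nearest = np.argsort(-sims, axis=1)[:, :k]  # indices of the k most similar
    votes = train_labels[nearest]               # [n_test, k] integer labels
    n_classes = int(train_labels.max()) + 1
    preds = np.array([np.bincount(v, minlength=n_classes).argmax() for v in votes])
    return float((preds == test_labels).mean())
```

A percentage point drop is then the difference between two such accuracies, for example between a model with per-resolution tokens and one with naively merged tokens, multiplied by 100.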

Merging tokens without impacting performance required us to modify our pre-training regimen. We describe those changes in detail in our paper.

For developers

The result is a model family that does more with less. At every size, OlmoEarth v1.1 runs up to three times cheaper than OlmoEarth v1, making frequent, planet-scale map refreshes more affordable for every team running OlmoEarth. If you're using a model from the original OlmoEarth family, try OlmoEarth v1.1. It provides similar performance to OlmoEarth v1 while requiring as little as one third of the compute, though we have seen some regressions (see our technical report for more details). If it works for your task, you should see a significant speedup during fine-tuning and inference.

For researchers

Pretrained remote sensing models have many degrees of freedom, which makes them hard to study. When performance shifts, is it the architecture, the dataset, or the pre-training algorithm?

We train OlmoEarth v1.1 on the same dataset as OlmoEarth v1, so any differences between the two isolate the effect of methodological changes. We hope this advances understanding of scientific principles when pretraining models for remote sensing.

Get started

Check out the OlmoEarth v1.1 weights and training code, including the weights for our Base, Tiny, and Nano models.

— Originally published at huggingface.co
