https://doi.org/10.5194/egusphere-2026-1401
19 Mar 2026
Status: this preprint is open for discussion.

GEM-Forest: A Global satellite EMbedding–based map of forests and tree crops for 2020

Daniel Paluba, Valerio Marsocci, Katarína Onačillová, Yarin T. Puerta Quintana, and Adam Hastie

Abstract. The advent of big data in Earth Observation (EO), coupled with recent advances in Artificial Intelligence, has led to the development of geospatial embeddings: compact, information-rich feature vectors designed to be used directly in machine learning (ML) applications for a wide range of downstream tasks, including forest monitoring. Motivated by the limitations of existing global forest products and by policy requirements such as the EU Deforestation Regulation (EUDR), we assess whether lightweight classifiers applied to satellite embeddings from Google DeepMind's AlphaEarth Foundations (AEF) model can accurately map global forest and tree crop extents. In this study, we introduce GEM-Forest, a global satellite embedding–based forest dataset at 10 m spatial resolution for 2020, and its associated products: GEM-FnF2020, a forest/non-forest (F/nF) classification, and GEM-TC2020, which further distinguishes non-forest areas containing tree crops. Using ∼47,000 globally distributed training samples covering all major biomes, collected through an automated approach that combines multiple forest-related, land cover and tree crop datasets, we compared ML approaches ranging from linear models to neural networks. Accuracy assessment on a global F/nF dataset of ∼21,000 samples showed similar performance across classifiers, with overall accuracies of 90–92 % and macro F1-scores of 0.89–0.90, and linear models often outperformed more complex approaches. Validation of the tree crop subclass against 10 datasets showed larger differences among ML models, with the highest accuracies achieved mostly by linear models. This consistency indicates that the embeddings encode highly informative, linearly separable structure for global F/nF discrimination, including the separation of tree crops.
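The core idea, a lightweight linear classifier trained directly on per-pixel embedding vectors, can be sketched in a few lines. This is not the authors' implementation: it assumes 64-dimensional, unit-length embedding vectors, uses synthetic toy vectors in place of real AEF embeddings and reference samples, and swaps in the Pegasos stochastic sub-gradient method as a dependency-free linear SVM trainer. In practice a library implementation such as scikit-learn's `LinearSVC` would be the usual choice. The helper names (`train_linear_svm`, `make_toy_embedding`, etc.) are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (the embeddings are assumed to be unit vectors)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def augment(x):
    """Append a constant 1.0 so the last weight acts as a (regularised) bias term."""
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Train a linear SVM with the Pegasos stochastic sub-gradient method.

    X: list of embedding vectors; y: labels, +1 = forest, -1 = non-forest.
    Minimises lam/2 * ||w||^2 + mean hinge loss and returns the weight vector.
    """
    rng = random.Random(seed)
    Xa = [augment(x) for x in X]
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing Pegasos step size
            violated = y[i] * dot(w, Xa[i]) < 1.0  # is the hinge loss active?
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the L2 term
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def predict(w, x):
    """Label a pixel +1 (forest) or -1 (non-forest) by the sign of w . [x, 1]."""
    return 1 if dot(w, augment(x)) >= 0.0 else -1


# Hypothetical toy data standing in for real embeddings: the class signal sits on
# one axis, every axis gets Gaussian noise, and each vector is then normalised.
def make_toy_embedding(rng, forest, dim=64):
    v = [rng.gauss(0.0, 0.3) for _ in range(dim)]
    v[0] += 1.0 if forest else -1.0
    return normalize(v)


rng = random.Random(42)
labels = [1 if i % 2 == 0 else -1 for i in range(300)]
X = [make_toy_embedding(rng, lab == 1) for lab in labels]
w = train_linear_svm(X[:200], labels[:200])  # train on the first 200 samples
test_acc = sum(predict(w, x) == lab for x, lab in zip(X[200:], labels[200:])) / 100
print(f"held-out accuracy on toy data: {test_acc:.2f}")
```

Because the trained model reduces to one dot product per pixel, applying it to every pixel of a global 10 m grid is cheap, which is one plausible reason linear models are attractive here when the embedding classes are close to linearly separable.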
A linear Support Vector Machine was therefore used to generate GEM-FnF2020, which achieves 91 % overall accuracy and a macro F1-score of 0.90, with balanced omission and commission error rates for forests (15 % and 13 %, respectively). These results match or exceed those of existing global products, with most errors occurring in open forests and forest–shrubland transition zones. Residual misclassifications of tree crops as forests in GEM-FnF2020 ranged from 0.5 % to 14.8 %, which demonstrates the importance of the tree crop subclass: classifying tree crops separately, as in GEM-TC2020, significantly improves these commission error rates in the main GEM-FnF2020 product. GEM-TC2020 distinguishes agricultural tree crops with an overall accuracy above 85 % for most tree crop types, while European tree crops remain the most challenging to classify. Our approach also shows strong potential for temporal transferability across the 2017–2025 period covered by AEF embeddings, which would allow multi-year applications and change detection based on models trained for a single year; this is a key next step in our research. Overall, the findings demonstrate that AEF embeddings combined with simple ML approaches support accurate, transferable, and computationally efficient global forest mapping, with remaining limitations related to temporal resolution and feature interpretability. These results and the presented approach can support policy and regulatory decisions, including the EUDR, while the open-access release of the GEM-Forest datasets and trained models facilitates global use, further testing, and methodological development by the EO and forest monitoring communities.
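The accuracy figures quoted in the abstract (overall accuracy, macro F1-score, and forest omission and commission errors) are standard functions of a binary confusion matrix. The sketch below shows how they can be computed for a forest (positive) / non-forest (negative) validation set; the function name and the counts in the example are hypothetical and purely illustrative, not taken from the GEM-Forest validation data.

```python
def binary_accuracy_metrics(tp, fn, fp, tn):
    """Accuracy metrics for a forest (positive) / non-forest (negative) map.

    tp: reference forest mapped as forest      fn: reference forest mapped as non-forest
    fp: reference non-forest mapped as forest  tn: reference non-forest mapped as non-forest
    """
    n = tp + fn + fp + tn
    overall_accuracy = (tp + tn) / n
    # Omission error: share of reference forest the map missed (1 - producer's accuracy)
    forest_omission = fn / (tp + fn)
    # Commission error: share of mapped forest that is not forest (1 - user's accuracy)
    forest_commission = fp / (tp + fp)
    # Per-class F1 = 2TP / (2TP + FP + FN); for non-forest the roles of fp and fn swap
    f1_forest = 2 * tp / (2 * tp + fp + fn)
    f1_nonforest = 2 * tn / (2 * tn + fn + fp)
    return {
        "overall_accuracy": overall_accuracy,
        "forest_omission_error": forest_omission,
        "forest_commission_error": forest_commission,
        "macro_f1": (f1_forest + f1_nonforest) / 2,  # unweighted mean over both classes
    }


metrics = binary_accuracy_metrics(tp=90, fn=10, fp=5, tn=95)  # hypothetical counts
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# overall_accuracy: 0.925
# forest_omission_error: 0.100
# forest_commission_error: 0.053
# macro_f1: 0.925
```

Equal omission and commission rates imply fn = fp (from fn / (tp + fn) = fp / (tp + fp)), so a map with balanced error rates neither systematically over- nor under-estimates forest extent relative to the reference sample.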

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Model code and software

Supporting codes for the GEM-Forest: A Global satellite EMbedding–based map of forests and tree crops for 2020. Daniel Paluba and Valerio Marsocci. https://github.com/palubad/GEM-Forest

Latest update: 22 Mar 2026
Short summary
We created a new global map of forests and tree crops for 2020 at 10 meter resolution using satellite embeddings. After testing several machine learning methods, we found that simple linear models performed as well as or better than more complex ones. The map identifies forests with 91 % accuracy and separates tree crops from forests with little confusion. This shows that satellite embeddings can support reliable and efficient global forest monitoring and inform international and national policies.