https://doi.org/10.5194/egusphere-2026-1401
19 Mar 2026 | Subsequently updated
Status: this preprint is open for discussion.

GEM-Forest: A Global satellite EMbedding–based map of forests and tree crops for 2020

Daniel Paluba, Valerio Marsocci, Katarína Onačillová, Yarin T. Puerta Quintana, and Adam Hastie

Abstract. The advent of big data in Earth observation, coupled with recent advances in Artificial Intelligence, has led to the development of geospatial embeddings. These compact, information-rich feature vectors are designed for direct use in machine learning (ML) applications across a wide range of downstream tasks, such as forest monitoring. Motivated by the limitations of existing global forest products and by policy requirements such as the EU Deforestation Regulation (EUDR), we (1) evaluate whether lightweight classifiers applied to satellite embeddings from the Google DeepMind Alpha Earth Foundation (AEF) can accurately map global forest and tree crop extents, and (2) introduce the resulting GEM-Forest dataset. GEM-Forest is a global satellite embedding–based dataset at 10 m spatial resolution for 2020 that provides a consistent classification across three classes: forest, non-forest, and tree crops. Our comparison of multiple ML approaches, ranging from linear models to neural networks, showed similar performance across classifiers, with linear models often outperforming more complex ones. This consistency indicates that the embeddings encode highly informative, linearly separable structure for global forest discrimination, including the separation of tree crops. Based on these findings, a linear Support Vector Machine was used to generate the final GEM-Forest dataset, which outperformed eight existing global forest, tree cover, or land cover maps on two global validation datasets and ranked second on the JRC's global forest validation dataset. Across all three datasets, the forest class achieved omission errors of 12–18% and commission errors of 16–21%, with overall accuracies from 88% to 92%. Misclassification of tree crops as forest ranged from 0.5% to 14.8%, with producer's accuracies above 85% for most tree crop datasets, although European tree crops remained the most difficult to classify.
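The accuracy figures quoted above (omission and commission errors, producer's and overall accuracy) all derive from a confusion matrix of mapped versus reference classes. The following is a minimal sketch, assuming rows hold the reference class and columns the mapped class. The counts are hypothetical and are not the paper's validation data.

```python
# Accuracy metrics from a confusion matrix, as commonly used in map validation.
# Rows = reference (validation) class, columns = mapped class.
# The counts below are HYPOTHETICAL and for illustration only.

CLASSES = ["forest", "non-forest", "tree crops"]

# confusion[i][j] = number of validation samples of reference class i
# that the map assigned to class j
confusion = [
    [850, 100, 50],   # reference: forest
    [120, 1800, 30],  # reference: non-forest
    [40, 20, 490],    # reference: tree crops
]

def accuracy_metrics(matrix, classes):
    """Overall accuracy plus per-class producer's/user's accuracy,
    omission error (1 - producer's) and commission error (1 - user's)."""
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    metrics = {"overall_accuracy": correct / total}
    for i, name in enumerate(classes):
        reference_total = sum(matrix[i])                    # row sum
        mapped_total = sum(matrix[r][i] for r in range(n))  # column sum
        producers = matrix[i][i] / reference_total
        users = matrix[i][i] / mapped_total
        metrics[name] = {
            "producers_accuracy": producers,
            "users_accuracy": users,
            "omission_error": 1 - producers,
            "commission_error": 1 - users,
        }
    return metrics

m = accuracy_metrics(confusion, CLASSES)
print(f"OA = {m['overall_accuracy']:.3f}")                    # → OA = 0.897
print(f"forest omission = {m['forest']['omission_error']:.3f}")  # → forest omission = 0.150
```

In this layout, the share of reference tree crops mapped as forest is `confusion[2][0] / sum(confusion[2])`. The sketch works on plain sample counts. A stratified validation design would also need area weighting, which is not shown here.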
Globally, GEM-Forest maps 3,919 million hectares (Mha) of forest for 2020, a 5.9% underestimation relative to FAO reports. This difference is partly attributed to the exclusion of unstocked forest areas from our forest definition, to discrepancies among country-based forest definitions, and to misclassification errors that occurred primarily within open forests and forest–shrubland transition zones. Overall, these results demonstrate that satellite embeddings combined with simple ML approaches support highly accurate, computationally efficient global forest and tree crop mapping. The open-access release of the GEM-Forest dataset and its ML model weights (both available in Paluba et al., 2026; DOI: https://doi.org/10.5281/zenodo.18921586) can support international policy decisions and makes the approach directly transferable to other years.
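The reported forest area can be related to the map's 10 m resolution with simple arithmetic. This is a minimal sketch under two assumptions, since this summary gives neither the pixel-area treatment nor the exact FAO figure. First, pixels are idealised 10 m squares. Second, the quoted 5.9% is expressed relative to the FAO total. One pixel covers 100 m², or 0.01 ha, so 3,919 Mha corresponds to roughly 3.9 × 10^11 forest pixels.

```python
# Area bookkeeping for a 10 m map. Assumes every pixel is an exact
# 10 m x 10 m square, an idealisation (projected pixels vary in area).

PIXEL_SIZE_M = 10.0
M2_PER_HA = 10_000.0

def pixels_to_mha(n_pixels, pixel_size_m=PIXEL_SIZE_M):
    """Million hectares (Mha) covered by n_pixels square pixels."""
    hectares = n_pixels * pixel_size_m ** 2 / M2_PER_HA
    return hectares / 1e6

def mha_to_pixels(area_mha, pixel_size_m=PIXEL_SIZE_M):
    """Inverse of pixels_to_mha: pixel count for an area in Mha."""
    return area_mha * 1e6 * M2_PER_HA / pixel_size_m ** 2

def relative_difference(mapped, reference):
    """Signed relative difference; negative means the map is lower."""
    return (mapped - reference) / reference

def implied_reference(mapped, underestimation):
    """Reference total implied by a stated relative underestimation,
    ASSUMING underestimation = (reference - mapped) / reference."""
    return mapped / (1.0 - underestimation)

gem_forest_mha = 3919.0  # GEM-Forest 2020 forest area from the abstract
print(f"{mha_to_pixels(gem_forest_mha):.3e} pixels")
fao_implied = implied_reference(gem_forest_mha, 0.059)
print(f"implied reference: {fao_implied:.0f} Mha")
print(f"check: {relative_difference(gem_forest_mha, fao_implied):+.3f}")
```

Under these assumptions, the implied FAO total is roughly 4,165 Mha. Because 5.9% is a rounded figure, this estimate is only approximate.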

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • Version 2 | 19 Jun 2026

  • Version 1 | 19 Mar 2026

    CC1: 'Comment on egusphere-2026-1401', Vanna Teck, 08 May 2026

Data sets

GEM-Forest: A Global satellite EMbedding–based map of forests and tree crops for 2020. Daniel Paluba, Valerio Marsocci, Katarína Onačillová, Yarin T. Puerta Quintana, and Adam Hastie. https://doi.org/10.5281/zenodo.18921586


Viewed

Total article views: 3,046 (including HTML, PDF, and XML)
  • HTML: 2,117
  • PDF: 866
  • XML: 63
  • Total: 3,046
  • Supplement: 0
  • BibTeX: 41
  • EndNote: 60
Views and downloads, including cumulative totals (calculated since 19 Mar 2026)

Viewed (geographical distribution)

Total article views: 3,035 (including HTML, PDF, and XML), of which 3,035 have a defined geographic origin and 0 are of unknown origin.
Latest update: 02 Jul 2026
Former versions
  • V1, 19 Mar 2026
Short summary
We created a new global map of forests and tree crops for 2020 at 10 m resolution using satellite embeddings. After testing many machine learning methods, we found that simple linear models performed as well as or better than more complex ones. The forest/non-forest map achieves up to 92% overall accuracy and separates tree crops from forests with little confusion. This shows that satellite embeddings can support reliable and efficient global forest monitoring and inform international and national policies.
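The central claim here is that simple linear models are enough once pixels are represented as embeddings. The sketch below illustrates what a linear SVM classifier does in that setting, but it is not the authors' pipeline. It trains one-vs-rest linear SVMs with a Pegasos-style stochastic sub-gradient method. This training technique was swapped in so the example needs only the Python standard library. The training data are hypothetical toy vectors, not real AEF embeddings.

```python
import random

# Toy, HYPOTHETICAL stand-ins for per-pixel embedding vectors: three
# well-separated Gaussian clusters in 4-D (real embeddings are
# higher-dimensional and are not simulated here).
CLASSES = ["forest", "non-forest", "tree crops"]

def make_toy_data(n_per_class, seed):
    rng = random.Random(seed)
    X, y = [], []
    for label in range(len(CLASSES)):
        centre = [0.0, 0.0, 0.0, 0.0]
        centre[label] = 3.0
        for _ in range(n_per_class):
            X.append([c + rng.gauss(0.0, 0.5) for c in centre])
            y.append(label)
    return X, y

def with_bias(x):
    # Append a constant 1 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_binary_svm(X, targets, lam=0.01, epochs=30, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the
    L2-regularised hinge loss; targets are +1 / -1."""
    rng = random.Random(seed)
    Xb = [with_bias(x) for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violates = targets[i] * dot(w, Xb[i]) < 1.0
            # Regularisation step: shrink all weights towards zero.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss step: only for samples inside the margin.
            if violates:
                w = [wi + eta * targets[i] * xi for wi, xi in zip(w, Xb[i])]
    return w

def train_one_vs_rest(X, y):
    """One linear SVM per class (that class = +1, all others = -1)."""
    return [train_binary_svm(X, [1 if lab == c else -1 for lab in y], seed=c)
            for c in range(len(CLASSES))]

def predict(weights, x):
    """Assign the class whose linear score w . [x, 1] is largest."""
    xb = with_bias(x)
    scores = [dot(w, xb) for w in weights]
    return scores.index(max(scores))

X_train, y_train = make_toy_data(50, seed=1)
X_test, y_test = make_toy_data(50, seed=2)
weights = train_one_vs_rest(X_train, y_train)
correct = sum(predict(weights, x) == lab for x, lab in zip(X_test, y_test))
print(f"toy test accuracy: {correct / len(y_test):.2f}")
```

The trained model is just one weight vector per class. This makes it cheap to apply and easy to share, and the same weights can be applied to embeddings from another year, provided those embeddings are comparable across years (an assumption this sketch does not test).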