GEM-Forest: A Global satellite EMbedding–based map of forests and tree crops for 2020

Paluba, Daniel; Marsocci, Valerio; Onačillová, Katarína; Puerta Quintana, Yarin T.; Hastie, Adam

doi:10.5194/egusphere-2026-1401

https://doi.org/10.5194/egusphere-2026-1401

19 Mar 2026

Status: this preprint is open for discussion.

Abstract. The advent of big data in Earth Observation (EO), coupled with recent advances in Artificial Intelligence, has led to the development of geospatial embeddings: compact, information-rich feature vectors designed to be used directly in machine learning (ML) applications for a wide range of downstream tasks, including forest monitoring. Motivated by the limitations of existing global forest products and by policy requirements such as the EU Deforestation Regulation (EUDR), we assess whether lightweight classifiers applied to satellite embeddings from Google DeepMind's AlphaEarth Foundations (AEF) can accurately map the global extent of forests and tree crops. In this study, we introduce GEM-Forest, a global satellite embedding–based forest dataset at 10 m spatial resolution for 2020, and its associated products: GEM-FnF2020, a forest/non-forest (F/nF) classification, and GEM-TC2020, which additionally identifies non-forest areas containing tree crops. Using ∼47,000 globally distributed training samples covering all major biomes, collected through an automated approach that combines multiple forest-related, land cover, and tree crop datasets, we compared ML approaches ranging from linear models to neural networks. Accuracy assessment on a global F/nF dataset of ∼21,000 samples showed similar performance across classifiers, with overall accuracies of 90–92 % and macro F1-scores of 0.89–0.90, and linear models often outperformed more complex approaches. Validation of the tree crop subclass against 10 datasets revealed larger differences among ML models, with the highest accuracies achieved mostly by linear models. This consistency indicates that the embeddings encode highly informative, linearly separable structure for global F/nF discrimination, including the separation of tree crops.
A linear Support Vector Machine (SVM) was therefore used to generate GEM-FnF2020, which achieves an overall accuracy of 91 % and a macro F1-score of 0.90, with balanced omission and commission error rates for forests (15 % and 13 %, respectively). These results match or exceed those of existing global products, with most errors occurring in open forests and forest–shrubland transition zones. Residual misclassifications of tree crops as forests in GEM-FnF2020 ranged from 0.5 % to 14.8 %, which demonstrates the importance of including the tree crop subclass in the GEM-TC2020 map. GEM-TC2020 distinguishes agricultural tree crops with overall accuracies above 85 % for most tree crop types, while European tree crops remain the most challenging to classify. The tree crop class thus significantly reduces commission errors in the main GEM-FnF2020 product (0.5–14.8 %). Our approach also demonstrates strong potential for temporal transferability across the 2017–2025 period covered by AEF embeddings. This capability would allow multi-year applications and change detection based on models trained for a single year, and it represents a key next step in our research. Overall, the findings demonstrate that AEF embeddings combined with simple ML approaches support accurate, transferable, and computationally efficient global forest mapping, with remaining limitations related to temporal resolution and feature interpretability. These results and the presented approach can support policy and regulatory decisions, including those under the EUDR, while the open-access release of the GEM-Forest datasets and trained models facilitates global use, further testing, and methodological development by the EO and forest monitoring communities.
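The abstract reports overall accuracy, macro F1-score, forest omission and commission errors, and the share of tree crops mapped as forest. As a minimal sketch, assuming the standard unweighted definitions computed from sample counts in a binary F/nF confusion matrix (the preprint may use area-weighted estimators, which this does not implement), these metrics can be computed as shown below. The function names and all counts are hypothetical and do not reproduce the paper's validation results.

```python
from typing import Dict


def forest_accuracy_report(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard accuracy metrics for a forest / non-forest (F/nF) confusion matrix.

    tp: reference forest mapped as forest
    fn: reference forest mapped as non-forest
    fp: reference non-forest mapped as forest
    tn: reference non-forest mapped as non-forest
    Counts are treated as unweighted samples (no area weighting).
    """
    total = tp + fn + fp + tn
    # Forest class: omission = 1 - producer's accuracy, commission = 1 - user's accuracy.
    omission_forest = fn / (tp + fn)
    commission_forest = fp / (tp + fp)
    # Per-class F1 = 2TP / (2TP + FP + FN); for non-forest, tn plays the role of TP.
    f1_forest = 2 * tp / (2 * tp + fp + fn)
    f1_non_forest = 2 * tn / (2 * tn + fn + fp)
    return {
        "overall_accuracy": (tp + tn) / total,
        "omission_forest": omission_forest,
        "commission_forest": commission_forest,
        "f1_forest": f1_forest,
        "f1_non_forest": f1_non_forest,
        # Macro F1 is the unweighted mean of the per-class F1-scores.
        "macro_f1": (f1_forest + f1_non_forest) / 2,
    }


def tree_crop_as_forest_rate(n_tree_crop_mapped_forest: int, n_tree_crop_reference: int) -> float:
    """Share of reference tree crop samples that an F/nF map labels as forest."""
    return n_tree_crop_mapped_forest / n_tree_crop_reference


# Hypothetical counts for illustration only (not the preprint's validation data).
report = forest_accuracy_report(tp=80, fn=20, fp=8, tn=92)
print(round(report["overall_accuracy"], 3))  # 0.86
print(round(report["macro_f1"], 3))  # 0.859
print(tree_crop_as_forest_rate(3, 120))  # 0.025
```

Macro F1 weights both classes equally regardless of how many samples each has, which is why it is often reported alongside overall accuracy when class frequencies differ.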

Received: 13 Mar 2026 – Discussion started: 19 Mar 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Data sets

GEM-Forest: A Global satellite EMbedding–based map of forests and tree crops for 2020 Daniel Paluba et al. https://zenodo.org/records/18921586?preview=1&token=eyJhbGciOiJIUzUxMiJ9.eyJpZCI6ImYxNGM0ZWI4LWY2OGYtNDlmOC1hMTUwLThkODRhNWQ4OTk0YSIsImRhdGEiOnt9LCJyYW5kb20iOiI3ZTY2MTQxNTZmNmJiMjEyMGFkNDVhOTI4MjljM2NlMSJ9.199SnzdIKByVuE8v1_908aJzYB9DqLjGCwzv0taD0nvHrpKBeP5X3xlOSscHP_m0WFE1hO5fIieKRE2kJFYEiQ

Model code and software

Supporting codes for the GEM-Forest: A Global satellite EMbedding–based map of forests and tree crops for 2020 Daniel Paluba and Valerio Marsocci https://github.com/palubad/GEM-Forest

Viewed

Total article views: 564 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
395	159	10	564	6	8

Views and downloads (calculated since 19 Mar 2026)

Month	HTML	PDF	XML	Total
Mar 2026	347	132	10	489
Apr 2026	48	27	0	75

Latest update: 12 Apr 2026

Short summary

We created a new global map of forests and tree crops for 2020 at 10 m resolution using satellite embeddings. After testing several machine learning methods, we found that simple linear models performed as well as or better than more complex ones. The map identifies forests with 91 % accuracy and separates tree crops from forests with little confusion. This shows that satellite embeddings can support reliable and efficient global forest monitoring and inform international and national policies.
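The finding that simple linear models perform as well as more complex ones suggests that forest and non-forest embeddings are largely separable by a hyperplane. As a minimal sketch of what a linear SVM learns in that setting, the stdlib-only code below trains one with Pegasos-style stochastic sub-gradient descent, a solver swapped in here for illustration; the preprint does not say which solver or library was used. The names `train_linear_svm` and `predict`, the regularization strength `lam`, and the toy 2-D data are all hypothetical. Real AEF embeddings are higher-dimensional, but the code does not depend on the dimension.

```python
import random
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 20,
    seed: int = 0,
) -> List[float]:
    """Linear SVM trained with Pegasos-style stochastic sub-gradient descent.

    Minimizes the L2-regularized hinge loss. X holds one embedding vector per
    sample and y holds labels in {-1, +1} (e.g. +1 = forest, -1 = non-forest).
    A constant 1 is appended to every vector so the last weight acts as a bias;
    note that this bias is regularized too, unlike in a textbook SVM.
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing Pegasos step size
            violated = y[i] * _dot(w, Xa[i]) < 1.0  # margin with current weights
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer step
            if violated:
                # Hinge loss is active: move w towards classifying sample i correctly.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def predict(w: Sequence[float], X: Sequence[Sequence[float]]) -> List[int]:
    """Sign of the linear decision function for each vector (bias term appended)."""
    return [1 if _dot(w, list(x) + [1.0]) >= 0.0 else -1 for x in X]


# Toy 2-D stand-ins for embedding vectors (hypothetical, not AEF data).
X_train = [(2.0, 0.5), (1.5, -0.5), (2.5, 0.0), (-2.0, 0.5), (-1.5, -0.5), (-2.5, 0.0)]
y_train = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X_train, y_train)
print(predict(w, [(3.0, 0.0), (-3.0, 0.0)]))  # [1, -1]
```

Because the decision function is a single dot product per pixel, inference is cheap. In this sketch, reusing a model trained on one year would simply mean calling `predict` on another year's embeddings, which is one way to read the temporal transferability mentioned in the abstract.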
