GEM-Forest: A Global satellite EMbedding–based map of forests and tree crops for 2020
Abstract. The advent of big data in Earth observation, coupled with recent advances in Artificial Intelligence, has led to the development of geospatial embeddings. These compact, information-rich feature vectors are designed for direct use in machine learning (ML) applications across a wide range of downstream tasks, such as forest monitoring. Motivated by the limitations of existing global forest products and by policy requirements such as the EU Deforestation Regulation (EUDR), we (1) evaluate whether lightweight classifiers applied to satellite embeddings from Google DeepMind's AlphaEarth Foundations (AEF) can accurately map global forest and tree crop extents, and (2) introduce the resulting GEM-Forest dataset. GEM-Forest is a global satellite embedding–based dataset at 10 m spatial resolution for 2020 that provides a consistent classification across three classes: forest, non-forest, and tree crops. Our comparison of multiple ML approaches, ranging from linear models to neural networks, showed similar performance across classifiers, with linear models often outperforming more complex ones. This consistency indicates that the embeddings encode highly informative and linearly separable structures for global forest discrimination, including the separation of tree crops. Based on these findings, a linear Support Vector Machine was used to generate the final GEM-Forest dataset, which outperformed eight existing global forest, tree cover, or land cover maps on two global validation datasets and placed second on the JRC’s global forest validation dataset. Across all three datasets, the forest class achieved omission errors of 12–18% and commission errors of 16–21%, with overall accuracies from 88% to 92%. Misclassifications of tree crops as forests varied between 0.5% and 14.8%, with a producer’s accuracy above 85% for most tree crop datasets; the classification of European tree crops remains the most challenging.
Globally, GEM-Forest maps 3,919 million hectares (Mha) of forest for 2020, representing a 5.9% underestimation relative to FAO reports. This difference is partly attributed to the exclusion of unstocked forest areas from our forest definition, discrepancies in country-based forest definitions, and misclassification errors that occurred primarily within open forests and forest–shrubland transition zones. Overall, these results demonstrate that satellite embeddings combined with simple ML approaches support highly accurate, computationally efficient global forest and tree crop mapping. The open-access release of the GEM-Forest dataset and its ML model weights (both available in Paluba et al., 2026; DOI: https://doi.org/10.5281/zenodo.18921586) can support international policy decisions and allows the model to be transferred directly to other years.
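For readers less familiar with the accuracy terms quoted in the abstract, the following minimal sketch shows how omission error, commission error, producer's accuracy, and overall accuracy are conventionally derived from a confusion matrix. The counts, the `CONFUSION` matrix, and the `accuracy_metrics` helper are hypothetical and invented purely for illustration; they are not results or code from the paper. The sketch also uses raw sample counts, whereas a validation based on stratified sampling would normally weight the counts by stratum area.

```python
# Hypothetical confusion matrix: rows = reference class, columns = mapped class.
# All counts are invented for illustration and are NOT taken from the paper.
CLASSES = ["forest", "non-forest", "tree crops"]
CONFUSION = [
    [850, 120, 30],    # reference: forest
    [140, 1800, 60],   # reference: non-forest
    [20, 40, 240],     # reference: tree crops
]


def accuracy_metrics(matrix, classes):
    """Return overall accuracy and per-class accuracy/error rates."""
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))

    per_class = {}
    for i, name in enumerate(classes):
        reference_total = sum(matrix[i])               # row sum
        mapped_total = sum(row[i] for row in matrix)   # column sum
        producers = matrix[i][i] / reference_total
        users = matrix[i][i] / mapped_total
        per_class[name] = {
            "producers_accuracy": producers,
            "users_accuracy": users,
            # Omission: reference samples of this class that the map missed.
            "omission_error": 1.0 - producers,
            # Commission: samples mapped as this class that belong elsewhere.
            "commission_error": 1.0 - users,
        }
    return correct / total, per_class


overall, per_class = accuracy_metrics(CONFUSION, CLASSES)

# Share of reference tree crop samples that the map labelled as forest.
tree_crops_as_forest = CONFUSION[2][0] / sum(CONFUSION[2])

print(f"overall accuracy: {overall:.3f}")
for name, m in per_class.items():
    print(f"{name}: omission={m['omission_error']:.3f}, "
          f"commission={m['commission_error']:.3f}")
print(f"tree crops mapped as forest: {tree_crops_as_forest:.3f}")
```

Under these definitions, a low omission error for forest means few reference forest samples were missed, while a low commission error means few non-forest or tree crop samples were wrongly mapped as forest.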
Dear Authors,
Thank you for sharing this impressive work on global satellite embedding-based forest and tree crop mapping. I found the dataset and methodology very valuable, especially for large-scale applications and regional analysis.
I would like to provide a suggestion regarding the Cambodia region. Based on visual inspection, some areas appear to show potential overestimation of tree crops, particularly where deciduous forests may have been classified as tree crops. This issue seems noticeable in several locations, including:
In Cambodia, deciduous forests and some plantation systems can have similar seasonal spectral characteristics, which may contribute to confusion between natural forests and tree crop classes.
Thank you again for this excellent contribution. I look forward to seeing the future development of this dataset.