2001–2022 global gross primary productivity dataset using an ensemble model based on random forest
Abstract. The continuous advancement of remote sensing technology has been instrumental in improving models for estimating terrestrial gross primary productivity (GPP). However, existing GPP datasets disagree in their spatial distributions and interannual variations, which impedes a comprehensive understanding of the terrestrial carbon cycle. In contrast to previous models that rely directly on remote sensing and environmental variables, we developed an ensemble model based on random forest, named GPPERF. This model uses the GPP outputs of established remote sensing-based models (EC-LUE, GPP-kNDVI, GPP-NIRv, Revised-EC-LUE) as inputs for GPP estimation. GPPERF performed well, explaining 83.7 % of the monthly variation in GPP across 171 sites. This performance surpassed that of the selected remote sensing models (72.4 %–77.1 %) and of a standalone random forest model driven by remote sensing and environmental variables (77.7 %). Over the period from 2001 to 2022, the average global GPP estimated by the ensemble model was 131.2 PgC yr⁻¹, with a trend of 0.45 PgC yr⁻². Furthermore, an evaluation against flux sites from ChinaFlux indicated that the dataset generalizes well. In summary, the machine learning-based ensemble method helps to reduce the uncertainty of estimates from any single remote sensing model and provides a more reliable estimate of global GPP.
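The core idea of the abstract is a stacking ensemble. A random forest takes the GPP estimates of the four base models as its only predictors and is trained against flux-tower GPP. The sketch below illustrates that idea and is not the authors' implementation. It makes several assumptions. The data are synthetic, and the base-model "outputs" are simply the true signal with made-up biases and noise. The names `true_gpp`, `biases`, `noise_sd` and `linear_trend` are hypothetical. The split is a random train/test split rather than the site-based evaluation a real study would use. "Variation explained" is read as the coefficient of determination (R²). The long-term trend is assumed to be an ordinary least-squares slope, which the abstract does not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "flux-tower" monthly GPP (hypothetical values, gC m-2 d-1).
n = 2000
true_gpp = rng.gamma(shape=2.0, scale=2.5, size=n)

# Stand-ins for the four base models: each sees the true signal with its own
# multiplicative bias and independent noise (illustrative numbers only).
base_names = ["EC-LUE", "GPP-kNDVI", "GPP-NIRv", "Revised-EC-LUE"]
biases = [0.9, 1.1, 1.0, 0.95]
noise_sd = [2.0, 1.8, 2.2, 1.9]
X = np.column_stack(
    [b * true_gpp + rng.normal(0.0, s, n) for b, s in zip(biases, noise_sd)]
)

# Random split for brevity; a real evaluation would hold out whole sites.
X_train, X_test, y_train, y_test = train_test_split(
    X, true_gpp, test_size=0.3, random_state=0
)

# Ensemble: a random forest whose only predictors are the base-model outputs.
ensemble = RandomForestRegressor(n_estimators=300, min_samples_leaf=5, random_state=0)
ensemble.fit(X_train, y_train)

# Share of variance explained (R^2) on held-out samples.
r2_ensemble = r2_score(y_test, ensemble.predict(X_test))
r2_single = {name: r2_score(y_test, X_test[:, i]) for i, name in enumerate(base_names)}


def linear_trend(years, annual_gpp):
    """Least-squares slope of annual global GPP (PgC yr-1) vs. year, in PgC yr-2."""
    slope, _intercept = np.polyfit(years, annual_gpp, deg=1)
    return slope


print(f"ensemble R^2: {r2_ensemble:.3f}")
for name, r2 in r2_single.items():
    print(f"{name:>15} R^2: {r2:.3f}")
```

In this toy setting, each base model carries its own independent error. The forest can therefore combine them into an estimate with a higher R² than any single input, which mirrors the qualitative argument of the abstract. `linear_trend(np.arange(2001, 2023), annual_values)` returns the slope of a yearly global GPP series in PgC yr⁻².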