https://doi.org/10.5194/egusphere-2026-3281
24 Jun 2026
Status: this preprint is open for discussion and under review for Biogeosciences (BG).

Machine-learning projections of boreal forest gross primary productivity under future climate change

Sandip Varkey George, Cameron F. T. Pope, Stella Jes Varghese, and Ron Sunny

Abstract. Modelling forest productivity under climate variability and change remains a key challenge in terrestrial ecology. The increasing availability of long-term eddy covariance data enables the relationships between environmental drivers and productivity to be explored using data-driven machine-learning approaches. In this study, we model the dependence of gross primary productivity (GPP) on four key environmental variables, namely photosynthetically active radiation (PAR), atmospheric CO₂ concentration, air temperature, and relative humidity, using linear regression, eXtreme Gradient Boosting (XGBoost), and deep neural networks. To capture variability across temporal scales, each input variable is decomposed into trend, seasonal, and residual components representing long-term, seasonal, and inter-daily fluctuations, respectively. The models are trained on seventeen years (2003–2019) of daily observations from the boreal forest monitoring station at Hyytiälä, Finland, and evaluated on an independent five-year period (2020–2025). The trained models are then used to project GPP under four Shared Socioeconomic Pathway (SSP) scenarios from the Intergovernmental Panel on Climate Change, using climate forcings derived from Coupled Model Intercomparison Project Phase 6 (CMIP6) simulations.
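The abstract does not state which decomposition method is used, so the following is only a minimal sketch of the general idea, assuming a classical additive decomposition: a centered moving average as the trend, the mean detrended value at each phase of the annual cycle as the seasonal component, and the remainder as the residual. The function names (`moving_average`, `decompose`) and the synthetic series are hypothetical and are not taken from the study.

```python
import math

def moving_average(values, window):
    """Centered moving average; the window shrinks near the series edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def decompose(values, period):
    """Additive split so that values[i] = trend[i] + seasonal[i] + residual[i].

    Assumes the series covers at least one full period.
    """
    trend = moving_average(values, period)
    detrended = [v - t for v, t in zip(values, trend)]
    # Seasonal component: mean detrended value at each phase of the cycle.
    phase_mean = []
    for p in range(period):
        samples = detrended[p::period]
        phase_mean.append(sum(samples) / len(samples))
    seasonal = [phase_mean[i % period] for i in range(len(values))]
    # Residual: whatever the trend and seasonal parts do not explain.
    residual = [v - t - s for v, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, residual

# Synthetic daily series (hypothetical values): slow drift plus an annual cycle.
period = 365
n_days = 4 * period
series = [
    10.0 + 0.001 * d + 5.0 * math.sin(2 * math.pi * d / period)
    for d in range(n_days)
]
trend, seasonal, residual = decompose(series, period)
```

In a setup like the one described, each of the four input variables would yield three such components, giving twelve candidate features per day. Dedicated routines such as STL are a common alternative to this simple moving-average split.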

All models show strong predictive skill (R² = 0.79–0.90; RMSE = 1.03–1.49), with the deep neural network performing best overall. SHAP (SHapley Additive exPlanations) analysis identifies the residual component of PAR as the most influential predictor of GPP across all models. More broadly, residual components of several inputs show high predictive importance, suggesting that short-term variability in environmental conditions may play an important role in explaining modelled GPP fluctuations. Projections driven by an ensemble of four CMIP6 climate models suggest relatively stable GPP during the mid-century period across SSP scenarios, followed by an overall increase toward the end of the century, particularly under higher-emission pathways. Overall, this study demonstrates the potential of combining long-term ecosystem observations, engineered environmental features, and climate-model projections to generate localized forecasts of forest productivity. However, these results should be interpreted with caution, because the models are purely data-driven and are being extrapolated beyond historical climate conditions.
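For reference, the two skill metrics quoted above can be sketched as follows, assuming R² denotes the coefficient of determination (the abstract does not say whether it is this or a squared correlation) and RMSE is expressed in the units of GPP. The function names and sample values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical daily GPP values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
perfect = list(observed)          # a model that reproduces the observations
mean_only = [2.5, 2.5, 2.5, 2.5]  # a model that always predicts the mean

perfect_scores = (r_squared(observed, perfect), rmse(observed, perfect))
baseline_scores = (r_squared(observed, mean_only), rmse(observed, mean_only))
```

Under this definition a model that always predicts the observed mean scores R² = 0, so the reported range of 0.79–0.90 would correspond to the models explaining roughly 79–90 % of the variance in GPP over the evaluation period.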

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 05 Aug 2026)

Latest update: 24 Jun 2026
Short summary
Forests play an important role in absorbing carbon dioxide, but their response to climate change is difficult to predict. We used long-term observations from a forest in Finland to train machine-learning models of productivity and applied future climate projections to explore changes under different scenarios. We find that the models capture seasonal influences well and provide useful estimates of future productivity, offering a framework for local predictions using large-scale climate data.