Machine-learning projections of boreal forest gross primary productivity under future climate change
Abstract. Modelling forest productivity under climate variability and change remains a key challenge in terrestrial ecology. The increasing availability of long-term eddy covariance data enables the relationships between environmental drivers and productivity to be explored using data-driven machine-learning approaches. In this study, we model gross primary productivity (GPP) as a function of four key environmental variables: photosynthetically active radiation (PAR), atmospheric CO₂ concentration, air temperature, and relative humidity. We use three model types: linear regression, eXtreme Gradient Boosting (XGBoost), and deep neural networks (DNNs). To capture variability across temporal scales, the input variables are decomposed into trend, seasonal, and residual components representing long-term, seasonal, and day-to-day fluctuations, respectively. The models are trained on seventeen years (2003–2019) of daily observations from the boreal forest monitoring station at Hyytiälä, Finland, and evaluated on an independent five-year period (2020–2025). The trained models are then used to project GPP under four Shared Socioeconomic Pathway (SSP) scenarios from the Intergovernmental Panel on Climate Change, using climate forcings derived from Coupled Model Intercomparison Project Phase 6 (CMIP6) simulations.
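The decomposition of a daily input series into trend, seasonal, and residual components can be illustrated with a minimal sketch. The abstract does not state which decomposition method is used, so the additive moving-average scheme below is an assumption chosen for simplicity, and the function name `decompose`, the 365-day period, and the synthetic PAR-like series are all hypothetical.

```python
import numpy as np

def decompose(series, period=365):
    """Additive split of a daily series into trend, seasonal, and residual.

    Illustrative only: assumes at least one full period of data and is not
    necessarily the decomposition used in the study.
    """
    x = np.asarray(series, dtype=float)
    n = x.size

    # Trend: centred moving average over roughly one period.
    # Edges are padded by reflection so the output has the same length as x.
    half = period // 2
    window = 2 * half + 1
    padded = np.pad(x, half, mode="reflect")
    trend = np.convolve(padded, np.ones(window) / window, mode="valid")

    # Seasonal: mean of the detrended series at each position in the cycle,
    # centred so that one full cycle averages to zero.
    detrended = x - trend
    phase = np.arange(n) % period
    cycle = np.array([detrended[phase == p].mean() for p in range(period)])
    cycle -= cycle.mean()
    seasonal = cycle[phase]

    # Residual: whatever is left, i.e. the day-to-day fluctuations.
    residual = x - trend - seasonal
    return trend, seasonal, residual

# Synthetic three-year series: slow drift + annual cycle + daily noise.
rng = np.random.default_rng(0)
days = np.arange(3 * 365)
par = 0.01 * days + 10.0 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 1, days.size)

trend, seasonal, residual = decompose(par)
```

By construction the three components sum back to the original series, so each one can be passed to a model as a separate feature without losing information.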
All models show strong predictive skill (R² = 0.79–0.90; RMSE = 1.03–1.49), with the deep neural network performing best overall. SHAP (SHapley Additive exPlanations) analysis identifies the residual component of PAR as the most influential predictor of GPP across all models. More broadly, the residual components of several inputs show high predictive importance, suggesting that short-term variability in environmental conditions may play an important role in explaining modelled GPP fluctuations. Projections driven by an ensemble of four CMIP6 climate models suggest relatively stable GPP during the mid-century period across SSP scenarios, followed by an overall increase toward the end of the century, particularly under higher-emission pathways. Overall, this study demonstrates the potential of combining long-term ecosystem observations, engineered environmental features, and climate-model projections to generate localized forecasts of forest productivity. However, these results should be interpreted with caution, because the data-driven models are extrapolated beyond the range of historical climate conditions.
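For reference, the two skill metrics quoted above can be computed as in the following sketch. It assumes that R² denotes the coefficient of determination (rather than a squared correlation coefficient) and that RMSE is expressed in the units of GPP. The function names and the toy values are hypothetical and are not taken from the study.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy example with made-up values (not study data).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]

score_r2 = r2(observed, predicted)
score_rmse = rmse(observed, predicted)
```

For these toy values the residual sum of squares is 0.5 and the total sum of squares is 5, giving R² = 0.9 and RMSE = √0.125 ≈ 0.354.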