the Creative Commons Attribution 4.0 License.
Unrecognised water limitation is a main source of uncertainty for models of terrestrial photosynthesis
Abstract. Quantification of environmental controls on ecosystem photosynthesis is essential to understand the impacts of climate change and extreme events on the carbon cycle and the provisioning of ecosystem services. Machine learning models have become popular for simulating terrestrial ecosystem photosynthesis because of their predictive skill, but often do not consider temporal dependencies in the data, even though process understanding suggests that these should exist. Here, we investigate how models that account for temporal structure impact the prediction of ecosystem photosynthesis. Using time-series measurements of ecosystem fluxes paired with measurements of meteorological variables from a network of globally distributed sites (N = 109) and remotely sensed vegetation indices, we train three different models to predict ecosystem gross primary production (GPP): a mechanistic, theory-based photosynthesis model, a memoryless multilayer perceptron (MLP) and a recurrent neural network (Long Short-Term Memory, LSTM). Through comparisons of patterns in model error, we assess the ability of these models to predict GPP across a wide diversity of ecosystems and climates, and to account for temporal dependencies, with a focus on the effects of low rooting-zone moisture and freezing air temperatures. We find that both deep learning models outperform the mechanistic model, and that the LSTM performs best with an R² of 0.74 for spatial out-of-sample predictions. In particular, model skill is consistently good across moist sites with strong seasonality. Model error tends to increase with increasing potential cumulative water deficits, in particular in ecosystems with evergreen vegetation. Generalisation patterns reveal that the LSTM tends to be more successful than the MLP in simulating GPP in dry environments, suggesting an advantage of recurrent models in those conditions. However, a large variability in model skill across relatively dry sites remains.
Insufficient information on the exposure and response to water stress, and on the related effects on GPP, appears to be a dominant source of error for modelling ecosystem fluxes across the globe. With the increasing frequency of hydroclimatic extreme events, effects of water limitation are expected to become more prevalent, which calls for models that better represent its impact on ecosystem function.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-1617', Anonymous Referee #1, 02 Jun 2025
- AC1: 'Reply on RC1', Samantha Biegel, 04 Jul 2025
- RC2: 'Comment on egusphere-2025-1617', Anonymous Referee #2, 09 Jun 2025
Biegel et al. (2025) compare the performance of three modeling approaches for predicting gross primary productivity (GPP): an optimality-based photosynthesis model (the P-model), a memoryless multilayer perceptron (MLP), and a recurrent neural network (RNN) with long short-term memory (LSTM) that incorporates memory of past environmental conditions. The models are evaluated in two settings: (1) site-level training and evaluation to assess temporal dynamics, and (2) cross-site evaluation to assess spatial generalization. The authors report that both MLP and LSTM outperform the P-model, with LSTM showing the greatest advantage under dry conditions due to its ability to account for temporal memory. However, they also note that all models struggle to accurately simulate GPP at certain dry sites, likely due to limited representation of water stress and its impact on carbon fluxes.
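The cross-site setting described above can be illustrated with a minimal sketch of a leave-sites-out split, in which every site is held out exactly once and skill is scored with R². This is an illustration under stated assumptions, not the authors' pipeline: the helper names `site_folds` and `r_squared`, the number of folds, the toy site labels, and the random "predictions" are all hypothetical.

```python
import numpy as np

def site_folds(site_ids, n_folds, seed=0):
    """Yield (train_mask, test_mask) pairs that keep each site in one fold only.

    site_ids: 1-D array with one site label per sample (hypothetical layout).
    """
    sites = np.unique(site_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(sites)  # randomise which sites share a fold
    for held_out in np.array_split(sites, n_folds):
        test_mask = np.isin(site_ids, held_out)
        yield ~test_mask, test_mask

def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy usage: six made-up sites with 50 samples each.
rng = np.random.default_rng(1)
site_ids = np.repeat(np.array(["S1", "S2", "S3", "S4", "S5", "S6"]), 50)
gpp_obs = rng.gamma(shape=2.0, scale=3.0, size=site_ids.size)
gpp_pred = gpp_obs + rng.normal(scale=1.0, size=site_ids.size)  # stand-in for model output

for train_mask, test_mask in site_folds(site_ids, n_folds=3):
    # A real model would be fitted on the training sites only at this point.
    held_out_sites = sorted(set(site_ids[test_mask]))
    print(held_out_sites, round(r_squared(gpp_obs[test_mask], gpp_pred[test_mask]), 3))
```

Splitting by site rather than by sample keeps observations from the same tower out of both the training and the test set, which is what makes the score a measure of spatial generalization.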
While the modeling framework and experimental setup are generally sound and the results compelling, the manuscript revisits a well-studied topic in the eco-hydrological machine learning literature. Numerous recent studies have explored the use of RNNs and LSTMs to simulate ecological processes and quantify memory effects (e.g., Montero et al., 2024; Agarwal et al., 2023; Cattry et al., 2025; Kraft et al., 2019, 2021; Wesselkamp et al., 2025; Zhao et al., 2025). Many of these works directly compare memory-based RNNs to memoryless architectures in the context of simulating vegetation states or fluxes, often using methodologies closely aligned with the current study. This overlap does not diminish the potential contribution of Biegel et al., but it raises the bar for novelty and interpretability. Unfortunately, that potential remains underdeveloped in the current version of the manuscript.
For instance, in Figure 4, the authors compare absolute percentage errors of GPP predictions by aggregating results across PCWD bins. While this illustrates that LSTM and MLP models outperform the P-model under drier conditions, it does not clearly demonstrate that the LSTM's advantage stems specifically from its capacity to leverage temporal dependencies—such as drought memory or cold acclimation—rather than simply from its increased architectural complexity. To strengthen this argument, the authors could analyze model performance during known extreme events (e.g., multi-week droughts or cold spells) and assess whether LSTM models exhibit better generalization or resilience. Additionally, applying permutation-based methods (as in Kraft et al., 2019) or interpretable machine learning tools (e.g., Integrated Gradients, as in Zhao et al., 2025) could help identify which temporal features or variables most strongly drive model predictions under different environmental conditions.
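A minimal sketch of what such a permutation-based check could look like: the time order of the antecedent steps in each input window is shuffled while the current step is left in place, and the resulting change in error is compared with the unpermuted baseline. This is written in the spirit of the approach mentioned above and is not a reproduction of the procedure in Kraft et al. (2019). The function names, the window layout `(n_samples, n_steps, n_features)`, and the two toy models are assumptions made for illustration only.

```python
import numpy as np

def permute_antecedent(windows, rng):
    """Shuffle the time order of all steps except the last one in each window.

    windows: array of shape (n_samples, n_steps, n_features).
    The final step (current conditions) stays in place, so only the
    temporal structure of the antecedent conditions is destroyed.
    """
    permuted = windows.copy()
    n_steps = windows.shape[1]
    for i in range(windows.shape[0]):
        order = rng.permutation(n_steps - 1)
        permuted[i, :-1] = windows[i, order]
    return permuted

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def memory_effect(predict, windows, target, n_repeats=20, seed=0):
    """Return (baseline RMSE, mean RMSE after permuting antecedent steps)."""
    rng = np.random.default_rng(seed)
    baseline = rmse(predict(windows), target)
    permuted = [
        rmse(predict(permute_antecedent(windows, rng)), target)
        for _ in range(n_repeats)
    ]
    return baseline, float(np.mean(permuted))

# Hypothetical stand-ins for trained models.
def toy_memory_model(windows):
    # Weighted sum over time; weights decay into the past, last step has weight 1.
    n_steps = windows.shape[1]
    weights = 0.5 ** np.arange(n_steps - 1, -1, -1)
    return (windows[:, :, 0] * weights).sum(axis=1)

def toy_memoryless_model(windows):
    # Uses the current step only, like a model without memory.
    return windows[:, -1, 0]

rng = np.random.default_rng(42)
windows = rng.normal(size=(200, 10, 1))

base_mem, perm_mem = memory_effect(toy_memory_model, windows, toy_memory_model(windows))
base_flat, perm_flat = memory_effect(toy_memoryless_model, windows, toy_memoryless_model(windows))
print("memory model:    ", base_mem, perm_mem)
print("memoryless model:", base_flat, perm_flat)
```

If a trained model's error barely changes under this permutation, its skill is unlikely to come from the temporal ordering of past conditions. Applying the check separately to dry and moist periods would indicate where, if anywhere, memory is actually being used.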
The manuscript also includes a site-level versus global model comparison, which is a valuable angle. However, the practical implications of the observed performance gaps remain unclear. Such gaps could arise from various sources, including uneven spatial representation of ecosystems, differences in training data length, observational uncertainties, or intrinsic ecological variability. While Figure 8 suggests moisture index differences might explain some of the discrepancies, it is not clear whether these reflect genuine ecological signals or merely data sampling artifacts. These challenges—especially related to data imbalance and representativeness—are long-standing limitations in upscaling efforts like FLUXCOM and FLUXCOM-X. The authors could add considerable value by disentangling these effects and explicitly attributing performance gaps to either ecological complexity or sampling limitations.
Moreover, the discussion around the mechanistic model (P-model) and its comparison to data-driven approaches could be expanded. For instance, while the manuscript notes that the P-model encodes “rigid functional dependencies,” it was originally developed to reduce dependency on calibration by incorporating plant optimality principles. If the central conclusion of this study is that machine learning models (especially LSTMs) consistently outperform mechanistic models, then the manuscript should provide a deeper reflection on how data-driven insights might inform or improve process-based models. Could the identified memory effects or variable sensitivities be translated into new empirical or semi-mechanistic formulations? What implications do the observed model deficiencies under water stress have for future land surface model development?
In its current form, the manuscript presents an interesting comparison of modeling strategies but falls short of offering novel insights beyond existing literature. To warrant publication, the authors should:
- More rigorously establish the connection between memory mechanisms and improved performance under specific environmental stressors.
- Employ interpretable ML tools or event-based analyses to strengthen claims regarding temporal information use.
- Clarify the ecological or data-driven reasons behind site-global model performance differences.
- Reflect more deeply on how ML-based findings can inform mechanistic modeling efforts.
Addressing these points will substantially enhance the originality and relevance of the manuscript.
References:
Cattry, M., Zhao, W., Nathaniel, J., Qiu, J., Zhang, Y., & Gentine, P. (2025). EcoPro-LSTM v0: A Memory-based Machine Learning Approach to Predicting Ecosystem Dynamics across Time Scales in Mediterranean Environments. EGUsphere, 2025, 1-37.
Kraft, B., Jung, M., Körner, M., Requena Mesa, C., Cortés, J., & Reichstein, M. (2019). Identifying dynamic memory effects on vegetation state using recurrent neural networks. Frontiers in Big Data, 2, 31.
Kraft, B., Besnard, S., & Koirala, S. (2021). Emulating ecological memory with recurrent neural networks. Deep Learning for the Earth Sciences: A Comprehensive Approach to Remote Sensing, Climate Science, and Geosciences, 269-281.
Wesselkamp, M., Chantry, M., Pinnington, E., Choulga, M., Boussetta, S., Kalweit, M., ... & Balsamo, G. (2025). Advances in land surface forecasting: a comparison of LSTM, gradient boosting, and feed-forward neural networks as prognostic state emulators in a case study with ecLand. Geoscientific Model Development, 18(4), 921-937.
Zhao, W., Winkler, A., Reichstein, M., Orth, R., & Gentine, P. (2025). Learning evaporative fraction with memory.
Citation: https://doi.org/10.5194/egusphere-2025-1617-RC2
- AC2: 'Reply on RC2', Samantha Biegel, 04 Jul 2025
Model code and software
Experiment code repository Samantha Biegel https://doi.org/10.5281/zenodo.15236497
Viewed
- HTML: 484
- PDF: 79
- XML: 18
- Total: 581
- BibTeX: 23
- EndNote: 43
General comments:
To predict ecosystem gross primary productivity (GPP), this manuscript utilizes ecosystem flux data, meteorological measurements from 109 globally distributed sites, and remotely sensed vegetation indices to train three models: a mechanistic, theory-based photosynthesis model, a memoryless multilayer perceptron (MLP) and a recurrent neural network (Long Short-Term Memory, LSTM). The authors found that both deep learning models outperform the P-model, and the LSTM performs best. Particularly, model skill is consistently good across moist sites with strong seasonality. Model error tends to increase with increasing potential cumulative water deficits. The LSTM adapts better to arid environments affected by water stress, yet there is still a large variability in model skill across relatively arid sites.
This is an interesting analysis and the topic is important. Overall, I find the paper compelling and fit for publication after revision. I include my comments below, which I hope will help the authors to further strengthen the paper.
Minor Suggestions: