Unrecognised water limitation is a main source of uncertainty for models of terrestrial photosynthesis

Biegel, Samantha; Schindler, Konrad; Stocker, Benjamin D.

doi:10.5194/egusphere-2025-1617

Preprints

https://doi.org/10.5194/egusphere-2025-1617

25 Apr 2025

Abstract. Quantification of environmental controls on ecosystem photosynthesis is essential to understand the impacts of climate change and extreme events on the carbon cycle and the provisioning of ecosystem services. Machine learning models have become popular for simulating terrestrial ecosystem photosynthesis because of their predictive skill, but often do not consider temporal dependencies in the data, even though process understanding suggests that these should exist. Here, we investigate how models that account for temporal structure impact the prediction of ecosystem photosynthesis. Using time-series measurements of ecosystem fluxes paired with measurements of meteorological variables from a network of globally distributed sites (N = 109) and remotely sensed vegetation indices, we train three different models to predict ecosystem gross primary production (GPP): a mechanistic, theory-based photosynthesis model, a memoryless multilayer perceptron (MLP) and a recurrent neural network (Long Short-Term Memory, LSTM). Through comparisons of patterns in model error, we assess the ability of these models to predict GPP across a wide diversity of ecosystems and climates, and to account for temporal dependencies, with a focus on the effects of low rooting-zone moisture and freezing air temperatures. We find that both deep learning models outperform the mechanistic model, and that the LSTM performs best with an R² of 0.74 for spatial out-of-sample predictions. In particular, model skill is consistently good across moist sites with strong seasonality. Model error tends to increase with increasing potential cumulative water deficits, in particular in ecosystems with evergreen vegetation. Generalisation patterns reveal that the LSTM tends to be more successful than the MLP in simulating GPP in dry environments, suggesting an advantage of recurrent models in those conditions. However, a large variability in model skill across relatively dry sites remains. Insufficient information on the exposure and response to water stress and related effects on GPP appears to be a dominant source of error for modelling ecosystem fluxes across the globe. With the increasing frequency of hydroclimatic extreme events, effects of water limitation are expected to become more prevalent, which calls for models that better represent its impact on ecosystem function.
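
A minimal sketch of what a spatial out-of-sample evaluation can look like: whole sites are withheld from training, so test predictions are made only for locations the model has not seen. The function names, the fold count, and the use of 1 - SS_res/SS_tot as the R² definition are assumptions for illustration; the abstract does not specify the authors' exact procedure.

```python
import numpy as np


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def site_folds(site_ids, n_folds, seed=0):
    """Yield (train_mask, test_mask) pairs in which whole sites are held out.

    site_ids holds one site label per sample. Every sample of a site falls
    in the same fold, so no site contributes to both training and testing.
    """
    site_ids = np.asarray(site_ids)
    sites = np.unique(site_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(sites)  # randomise which sites share a fold
    for held_out in np.array_split(sites, n_folds):
        test_mask = np.isin(site_ids, held_out)
        yield ~test_mask, test_mask
```

A model would be fitted on the samples selected by `train_mask` and scored with `r_squared` on those selected by `test_mask`. Splitting by site rather than by sample is what makes the score a measure of spatial generalisation.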

Received: 05 Apr 2025 – Discussion started: 25 Apr 2025

Competing interests: At least one of the (co-)authors is a member of the editorial board of Biogeosciences. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Journal article(s) based on this preprint

01 Dec 2025

Unrecognised water limitation is a main source of uncertainty for models of terrestrial photosynthesis

Samantha Biegel, Konrad Schindler, and Benjamin D. Stocker

Biogeosciences, 22, 7455–7481, https://doi.org/10.5194/bg-22-7455-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1617', Anonymous Referee #1, 02 Jun 2025
General comments:
To predict ecosystem gross primary productivity (GPP), this manuscript utilizes ecosystem flux data, meteorological measurements from 109 globally distributed sites, and remotely sensed vegetation indices to train three models: a mechanistic, theory-based photosynthesis model, a memoryless multilayer perceptron (MLP) and a recurrent neural network (Long Short-Term Memory, LSTM). The authors found that both deep learning models outperform the P-model, and the LSTM performs best. Particularly, model skill is consistently good across moist sites with strong seasonality. Model error tends to increase with increasing potential cumulative water deficits. The LSTM adapts better to arid environments affected by water stress, yet there is still a large variability in model skill across relatively arid sites.
This is an interesting analysis and the topic is pretty important. Overall, I find the paper compelling and fit for publication after revision. I include my comments below, which I hope help the authors to further strengthen the paper.
Some important details are missing from the paper. First, the methods and dataset used to determine the optimal hyperparameters of the proposed models are not provided. Additionally, the rationale behind selecting these three specific models, especially the combination of machine learning models (LSTM and MLP) with the process-based P-model, should be better justified. It is also recommended to explore and compare other state-of-the-art models, such as gated recurrent units (GRU), convolutional neural networks (CNN), and other sequence modeling approaches, to provide a more comprehensive evaluation.

To demonstrate the superiority of the proposed models, the paper carries out several forecasting experiments. The three models simulate GPP dynamics across a range of environmental conditions and vegetation types; please give the number of samples for each class. Furthermore, figure descriptions must be more precise. For example, in Section 3.3 (Lines 240-255), the discussion of Figure 6 should clearly indicate that it comprises two sub-figures and describe each accordingly.

Please make sure all figures are clear. The parameters and other details of the proposed models and methods should be organized in tables.

Minor Suggestions:
Please provide a description of the machine learning model construction, including the procedures for training and test dataset selection, normalization or preprocessing methods, and any other relevant implementation details necessary for reproducibility.

In Figure 1, it is recommended to include the number of observation sites corresponding to each aridity type.

The Methods section needs supporting references; please check.

The explanation of part “3.3 Spatial patterns in model performance” (corresponding to Figure 6) is somewhat unclear. “Across sites with moisture index P/PET ≥ 0.75 the R² is 0.76, whereas it was only 0.57 for more arid sites (MI <0.75). The (normalised) RMSE follows a similar pattern, with a value of 0.88 for sites with MI <0.75, compared to 0.57 for moist sites.” However, these values are not directly visible in the two subplots on the left panel of Figure 6.
Citation: https://doi.org/10.5194/egusphere-2025-1617-RC1
- AC1: 'Reply on RC1', Samantha Biegel, 04 Jul 2025
  
  Thank you for your review. Please find our response in the attached pdf.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1617-AC1
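
Referee #1's last point concerns R² and normalised RMSE reported separately for moist and arid sites. A minimal sketch of how such group-wise metrics could be tabulated, together with the per-group sample counts the referee asks for, is given here. The threshold of 0.75 comes from the quoted passage. Pooling all samples within a group and normalising RMSE by the mean of the observations are assumptions, since the page does not state how the authors aggregate or normalise.

```python
import numpy as np


def skill_by_moisture(obs, pred, moisture_index, threshold=0.75):
    """Return sample count, R2 and normalised RMSE for moist and arid groups.

    Assumptions (not taken from the manuscript): samples are pooled within
    each group, and RMSE is normalised by the mean observed value.
    """
    obs, pred, mi = (
        np.asarray(a, dtype=float) for a in (obs, pred, moisture_index)
    )
    groups = (("moist", mi >= threshold), ("arid", mi < threshold))
    out = {}
    for label, mask in groups:
        o, p = obs[mask], pred[mask]
        rmse = np.sqrt(np.mean((o - p) ** 2))
        r2 = 1.0 - np.sum((o - p) ** 2) / np.sum((o - o.mean()) ** 2)
        out[label] = {"n": int(mask.sum()), "r2": r2, "nrmse": rmse / o.mean()}
    return out
```

Reporting `n` next to each metric makes it easier to judge whether a difference between groups rests on a comparable number of samples.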
RC2:
'Comment on egusphere-2025-1617', Anonymous Referee #2, 09 Jun 2025
Biegel et al. (2025) compare the performance of three modeling approaches for predicting gross primary productivity (GPP): an optimality-based photosynthesis model (the P-model), a memoryless multilayer perceptron (MLP), and a recurrent neural network (RNN) with long short-term memory (LSTM) that incorporates memory of past environmental conditions. The models are evaluated in two settings: (1) site-level training and evaluation to assess temporal dynamics, and (2) cross-site evaluation to assess spatial generalization. The authors report that both MLP and LSTM outperform the P-model, with LSTM showing the greatest advantage under dry conditions due to its ability to account for temporal memory. However, they also note that all models struggle to accurately simulate GPP at certain dry sites, likely due to limited representation of water stress and its impact on carbon fluxes.
While the modeling framework and experimental setup are generally sound and the results compelling, the manuscript revisits a well-studied topic in the eco-hydrological machine learning literature. Numerous recent studies have explored the use of RNNs and LSTMs to simulate ecological processes and quantify memory effects (e.g., Montero et al., 2024; Agarwal et al., 2023; Cattry et al., 2025; Kraft et al., 2019, 2021; Wesselkamp et al., 2025; Zhao et al., 2025). Many of these works directly compare memory-based RNNs to memoryless architectures in the context of simulating vegetation states or fluxes, often using methodologies closely aligned with the current study. This overlap does not diminish the potential contribution of Biegel et al., but it raises the bar for novelty and interpretability. Unfortunately, that potential remains underdeveloped in the current version of the manuscript.
For instance, in Figure 4, the authors compare absolute percentage errors of GPP predictions by aggregating results across PCWD bins. While this illustrates that LSTM and MLP models outperform the P-model under drier conditions, it does not clearly demonstrate that the LSTM's advantage stems specifically from its capacity to leverage temporal dependencies—such as drought memory or cold acclimation—rather than simply from its increased architectural complexity. To strengthen this argument, the authors could analyze model performance during known extreme events (e.g., multi-week droughts or cold spells) and assess whether LSTM models exhibit better generalization or resilience. Additionally, applying permutation-based methods (as in Kraft et al., 2019) or interpretable machine learning tools (e.g., Integrated Gradients, as in Zhao et al., 2025) could help identify which temporal features or variables most strongly drive model predictions under different environmental conditions.
The manuscript also includes a site-level versus global model comparison, which is a valuable angle. However, the practical implications of the observed performance gaps remain unclear. Such gaps could arise from various sources, including uneven spatial representation of ecosystems, differences in training data length, observational uncertainties, or intrinsic ecological variability. While Figure 8 suggests moisture index differences might explain some of the discrepancies, it is not clear whether these reflect genuine ecological signals or merely data sampling artifacts. These challenges—especially related to data imbalance and representativeness—are long-standing limitations in upscaling efforts like FLUXCOM and FLUXCOM-X. The authors could add considerable value by disentangling these effects and explicitly attributing performance gaps to either ecological complexity or sampling limitations.
Moreover, the discussion around the mechanistic model (P-model) and its comparison to data-driven approaches could be expanded. For instance, while the manuscript notes that the P-model encodes “rigid functional dependencies,” it was originally developed to reduce dependency on calibration by incorporating plant optimality principles. If the central conclusion of this study is that machine learning models (especially LSTMs) consistently outperform mechanistic models, then the manuscript should provide a deeper reflection on how data-driven insights might inform or improve process-based models. Could the identified memory effects or variable sensitivities be translated into new empirical or semi-mechanistic formulations? What implications do the observed model deficiencies under water stress have for future land surface model development?
In its current form, the manuscript presents an interesting comparison of modeling strategies but falls short of offering novel insights beyond existing literature. To warrant publication, the authors should:
- More rigorously establish the connection between memory mechanisms and improved performance under specific environmental stressors.
- Employ interpretable ML tools or event-based analyses to strengthen claims regarding temporal information use.
- Clarify the ecological or data-driven reasons behind site-global model performance differences.
- Reflect more deeply on how ML-based findings can inform mechanistic modeling efforts.

Addressing these points will substantially enhance the originality and relevance of the manuscript.

References:
Cattry, M., Zhao, W., Nathaniel, J., Qiu, J., Zhang, Y., & Gentine, P. (2025). EcoPro-LSTM v0: A Memory-based Machine Learning Approach to Predicting Ecosystem Dynamics across Time Scales in Mediterranean Environments. EGUsphere, 2025, 1-37.
Kraft, B., Jung, M., Körner, M., Requena Mesa, C., Cortés, J., & Reichstein, M. (2019). Identifying dynamic memory effects on vegetation state using recurrent neural networks. Frontiers in Big Data, 2, 31.
Kraft, B., Besnard, S., & Koirala, S. (2021). Emulating ecological memory with recurrent neural networks. Deep Learning for the Earth Sciences: A Comprehensive Approach to Remote Sensing, Climate Science, and Geosciences, 269-281.
Wesselkamp, M., Chantry, M., Pinnington, E., Choulga, M., Boussetta, S., Kalweit, M., ... & Balsamo, G. (2025). Advances in land surface forecasting: a comparison of LSTM, gradient boosting, and feed-forward neural networks as prognostic state emulators in a case study with ecLand. Geoscientific Model Development, 18(4), 921-937.
Zhao, W., Winkler, A., Reichstein, M., Orth, R., & Gentine, P. (2025). Learning evaporative fraction with memory.
Citation: https://doi.org/10.5194/egusphere-2025-1617-RC2
- AC2: 'Reply on RC2', Samantha Biegel, 04 Jul 2025
  
  Thank you for your review. Please find our response in the attached pdf.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1617-AC2
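
Referee #2's suggestion of a permutation-based check on temporal information use can be illustrated with a generic sketch. The idea shown here is to shuffle the order of the antecedent time steps in each input window, leave the most recent step in place, and measure how much the prediction error grows. This is a simplified illustration and not the specific procedure of Kraft et al. (2019) or of the authors. The function names, the window layout `(n_samples, n_steps, n_features)` and the `predict` callable are all hypothetical.

```python
import numpy as np


def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    return np.sqrt(np.mean((np.asarray(obs) - np.asarray(pred)) ** 2))


def memory_permutation_gap(predict, windows, targets, n_repeats=10, seed=0):
    """Mean increase in RMSE when the order of antecedent steps is shuffled.

    predict: callable mapping windows (n_samples, n_steps, n_features)
             to predictions (n_samples,). Hypothetical model interface.
    The last time step of each window is kept in place, so a model that
    only uses current conditions yields a gap of zero.
    """
    rng = np.random.default_rng(seed)
    base_error = rmse(targets, predict(windows))
    permuted_errors = []
    for _ in range(n_repeats):
        shuffled = windows.copy()
        order = rng.permutation(windows.shape[1] - 1)  # antecedent steps only
        shuffled[:, :-1, :] = windows[:, order, :]
        permuted_errors.append(rmse(targets, predict(shuffled)))
    return float(np.mean(permuted_errors) - base_error)
```

A gap near zero for the MLP and a clearly positive gap for the LSTM would support the claim that the LSTM's advantage comes from the sequence of past conditions. Because shuffling preserves order-invariant statistics such as the window mean, this check detects only sensitivity to temporal ordering.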

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Reconsider after major revisions (07 Jul 2025) by Daniel S. Goll

AR by Samantha Biegel on behalf of the Authors (02 Oct 2025)

ED: Referee Nomination & Report Request started (09 Oct 2025) by Daniel S. Goll

RR by Anonymous Referee #1 (18 Oct 2025)

RR by Anonymous Referee #2 (27 Oct 2025)

ED: Publish as is (05 Nov 2025) by Daniel S. Goll

AR by Samantha Biegel on behalf of the Authors (13 Nov 2025)

Model code and software

Experiment code repository Samantha Biegel https://doi.org/10.5281/zenodo.15236497

Viewed

Total article views: 878 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
744	108	26	878	32	53

Views and downloads (calculated since 25 Apr 2025)

Month	HTML	PDF	XML	Total
Apr 2025	87	12	3	102
May 2025	76	21	3	100
Jun 2025	69	22	6	97
Jul 2025	53	12	5	70
Aug 2025	90	6	1	97
Sep 2025	330	7	1	338
Oct 2025	19	11	3	33
Nov 2025	20	17	4	41

Viewed (geographical distribution)

Total article views: 868 (including HTML, PDF, and XML) Thereof 868 with geography defined and 0 with unknown origin.

Latest update: 01 Dec 2025

Short summary

Our work addresses the predictability of carbon absorption by ecosystems across the globe, particularly in dry regions. We compare three different models, including a deep learning model that can learn from past environmental conditions, and show that this helps improve predictions. Still, challenges remain in dry areas due to varying vulnerabilities to drought. As drought conditions intensify globally, it's crucial to understand the varying impacts on ecosystem function.
