the Creative Commons Attribution 4.0 License.
EcoPro-LSTM𝑣0: A Memory-based Machine Learning Approach to Predicting Ecosystem Dynamics across Time Scales in Mediterranean Environments
Abstract. Climate change is anticipated to alter the global water and carbon cycles, but the spatiotemporal effects of these climate-induced shifts remain poorly understood. Of particular relevance are variations in rainfall intensity and frequency, which affect the carbon and water cycles from daily to interannual time scales. Yet current models fail to reproduce these processes, because capturing the complex interactions and dependencies across time scales (daily to seasonal) requires the simultaneous estimation of multiple interconnected ecological processes. To address this challenge, we introduce an initial version of our ecosystem process model based on the Long Short-Term Memory approach (EcoPro-LSTM𝑣0), a temporal multitask deep learning model designed to predict ecosystem responses, focusing on critical terrestrial variables: ecosystem respiration (RECO), gross primary productivity (GPP), evapotranspiration (ET), and surface soil water content (SWC). Our approach leverages the capabilities of LSTM networks to capture the interdependencies of those processes across time scales. LSTMs are well suited to time-series prediction because they can learn long-term relationships and patterns in data. We trained and tested our model using long-term data from FLUXNET2015 Mediterranean sites (at hourly and daily time steps), mainly in the USA and Europe, known for their ecological diversity and significance. We show that our model outperforms state-of-the-art data products and test the robustness of our model and findings through k-fold cross-validation. We also showcase the model's interpretability in revealing how short- and long-term atmospheric drivers, such as precipitation, influence GPP in Mediterranean climates. This model and the accompanying insights can help to better understand and manage ecosystems under climate change, especially in response to changing extreme events.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2024-3726', Anonymous Referee #1, 09 Apr 2025
- CC1: 'Data policy violation', Dario Papale, 11 Apr 2025
Dear authors,
We make an incredible effort to collect, standardize, process, harmonize and openly share these data, and we ask only for proper attribution (the "BY" in the "CC-BY" license). I don't think it is a lot of work, and for us it is crucial to maintaining the network and the data availability.
This is completely ignored in this paper, and I'm sorry to say that this is really annoying and a bad practice. It is definitely not sufficient to write "We thank FLUXNET2015 and Copernicus for their open-source datasets.", and I think it is not so difficult to read what is accepted when you access the data (the license) and follow what is requested. What is required when you use FLUXNET2015 is clearly reported here: https://fluxnet.org/data/data-policy/
Dario Papale, on behalf of all the PIs and Regional Network coordinators
NB: nothing personal against you. From now on we will send these notes and requests for amendment and paper corrections in all the cases that we discover. It is something we need to improve as a community.
Citation: https://doi.org/10.5194/egusphere-2024-3726-CC1
- AC1: 'Reply on CC1', Mitra Cattry, 11 Apr 2025
Dear Dr. Papale,
Thank you for your message and for your tremendous efforts in building and maintaining the FLUXNET2015 network, as well as for your continued advocacy for responsible and transparent data use within our community.
I would like to sincerely apologise for the omission of the proper attribution in our manuscript. We fully acknowledge the importance of crediting the data providers as outlined by the CC-BY license. It was certainly our intention to include the appropriate citations and acknowledgements during the revision stage. Unfortunately, this was unintentionally missed in the initial submission, despite several rounds of internal checks.
We understand how crucial these attributions are to sustaining the network and the open sharing of high-quality data. While such oversights can occasionally occur in the early stages of submission, especially in multi-author manuscripts, we are grateful for your reminder and are committed to addressing this in our revised version.
If there is specific wording or a preferred citation format that you and the network recommend, we would be happy to follow it precisely.
Lastly, as we had proposed reviewers from the FLUXNET community in part due to their scientific expertise with the dataset, we would have also appreciated feedback on the scientific aspects of the work. Nonetheless, we value your note and your broader efforts to strengthen standards across our field.
Thank you again, and please accept our apologies for the oversight.
Warm regards,
Mitra Cattry
on behalf of all co-authors
Citation: https://doi.org/10.5194/egusphere-2024-3726-AC1
- CC2: 'Reply on AC1', Dario Papale, 12 Apr 2025
Dear Mitra,
Errors can happen, but even in early submissions, in particular for journals like those of Copernicus where the preprint is online, the policies must be respected carefully, even with many co-authors. I think that the request for attribution is clear in the data policy at the link I added in the first comment (https://fluxnet.org/data/data-policy/), so I invite you to read it and apply it. Should something be unclear, we will need to clarify it there (although it has been correctly applied in different cases).
Best regards and good luck with the paper
Dario Papale
Citation: https://doi.org/10.5194/egusphere-2024-3726-CC2
- RC2: 'Comment on egusphere-2024-3726', Anonymous Referee #2, 28 Jun 2025
Review EcoProLSTM
Dear Authors, Dear Editor,
find below my review. Please excuse the delay.
Summary:
This work introduces an LSTM network for simultaneously modeling four ecosystem variables measured at FLUXNET sites: RECO, GPP, ET and SWC. A dataset of 17 sites in Mediterranean climates is assembled from the FLUXNET2015 dataset. The LSTM model is then trained in 5-fold cross-validation using temporal splitting (i.e. ensuring that the test data come from a different time period than the training data). The resulting model performs well at capturing seasonal dynamics. Finally, the model's learned sensitivities to input variables are investigated by computing Shapley values and integrated gradients.
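For illustration, the temporal splitting summarised above could look roughly like the following. This is a minimal sketch and not the authors' code: the function name `temporal_kfold`, the site labels and the year range are hypothetical, and it assumes that each site's years are cut into k contiguous blocks.

```python
from collections import defaultdict


def temporal_kfold(records, k=5):
    """Yield (train, test) lists of (site, year) pairs.

    Each site's years are cut into k contiguous blocks, so every site
    contributes to every training set; only the years differ.
    """
    years_by_site = defaultdict(set)
    for site, year in records:
        years_by_site[site].add(year)

    folds = [[] for _ in range(k)]
    for site, years in years_by_site.items():
        ordered = sorted(years)
        for i, year in enumerate(ordered):
            # Map the i-th year of this site to a block index in 0..k-1.
            folds[i * k // len(ordered)].append((site, year))

    for test_idx in range(k):
        test = folds[test_idx]
        train = [r for j, fold in enumerate(folds) if j != test_idx for r in fold]
        yield train, test


# Hypothetical example: two sites with ten years of data each.
records = [(site, year) for site in ("SITE-A", "SITE-B") for year in range(2004, 2014)]
for train, test in temporal_kfold(records, k=5):
    train_sites = {site for site, _ in train}
    test_years = sorted({year for _, year in test})
    print(sorted(train_sites), test_years)
```

In every fold both sites remain in the training set, which is the property that the major comments below take issue with.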
Strengths:
I appreciate the effort to improve inter-annual variability of data-driven carbon flux estimates with deep neural networks. In addition, I enjoyed reading about the limitations of the two different interpretability methods, and in particular the analysis of correlated drivers and how due to them the applicability of SHAP may be hampered. A further strength of this paper is the inclusion of estimates of epistemic uncertainty through MC-Dropout, which highlights a large variability beyond the mean prediction.
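As a reminder of how MC-Dropout yields an uncertainty estimate, here is a minimal sketch on a toy one-hidden-layer network rather than an LSTM. The weights, the dropout rate and the names `forward` and `mc_dropout_predict` are all hypothetical; the only point is that dropout stays active at prediction time and the spread over repeated passes is read as epistemic uncertainty.

```python
import random
import statistics

# Hypothetical fixed weights of a tiny ReLU network: (weight, bias) per hidden unit.
WEIGHTS = {
    "hidden": [(1.0, 0.0), (-1.0, 2.0), (0.5, 0.5)],
    "out": [1.0, 2.0, -1.0],
}


def forward(x, weights, drop_prob, rng):
    """One stochastic forward pass with dropout applied to the hidden layer."""
    hidden = []
    for w, b in weights["hidden"]:
        h = max(0.0, w * x + b)
        if rng.random() < drop_prob:
            h = 0.0  # unit dropped for this pass
        else:
            h /= 1.0 - drop_prob  # inverted-dropout rescaling
        hidden.append(h)
    return sum(v * h for v, h in zip(weights["out"], hidden))


def mc_dropout_predict(x, weights, drop_prob=0.2, n_samples=200, seed=0):
    """Return the mean and standard deviation over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [forward(x, weights, drop_prob, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)


mean, std = mc_dropout_predict(1.0, WEIGHTS)
print(f"prediction {mean:.3f} +/- {std:.3f}")
```

With `drop_prob=0.0` the passes are identical and the spread collapses to zero, so any non-zero spread comes from the dropout masks alone.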
Major:
1. I am unsure what benefit the presented EcoPro-LSTM brings. Since it is trained on temporal splits, it appears to me to be suitable only for the 17 sites on which it has been trained in this study. The model would not be suitable for generalizing to other locations (within Mediterranean ecosystems), and thus in particular would not be suitable for generating a global map. See also Meyer & Pebesma for a related discussion: https://www.nature.com/articles/s41467-022-29838-9
2. It remains unclear how this model compares against the state of the art. The comparisons against FluxCom (v1 and X-Base) are unfair, as the cross-validation sets are not identical. For example, in Figure 9 the EcoPro-LSTM has been trained on all sites (just during different years), whereas the FluxCom and X-Base models have not seen any data from a particular site during training and instead had to extrapolate from other sites.
3. Hence the main claim of "improved interannual variability" presented in Section 5.1 is not supported, as this improvement is derived not from generic understanding but from learning site-specific patterns.
4. No remote sensing predictors have been used. However, many works have found that adding, for instance, remotely sensed vegetation indices and LST can greatly enhance predictive performance. In fact, Kraft et al. https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2896/ argue that sequential deep learning models barely add any value beyond using remote sensing predictors.
5. Many modeling choices appear arbitrary and would thus have to be ablated (i.e. shown to reduce the validation loss). These include, for instance, weighting the RMSE loss by target flux magnitude, modeling both temporal resolutions in a two-stage approach, and the particular length of the temporal context windows (why not a full year, for example?).
6. Similarly, it would be necessary to show through a proper ablation that multi-task learning is in fact better than training four separate LSTM models, one per task.
7. Since you train on only 17 sites and include the site class as a predictor, I would expect the model to be able to differentiate the sites perfectly. A meaningful baseline is therefore a per-site model, i.e. one that is trained on only one site. I would be curious to see whether your EcoPro-LSTM is able to outperform such a per-site model.
8. How did you assess what is "the closest match to SHAP" in Fig. 5? It would be good to have a quantitative basis for this decision.
9. L. 305f: To me it seems problematic to study integrated gradients for training set periods, and also to average over different models (especially if they use correlated drivers). This could result in multiple artifacts, such as spurious correlations caused solely by over-fitting, or the canceling of contradicting explanations and thereby a misattribution of a given variable's contribution. In other words: please plot explanations only for individual models, and only for test set data.
10. Please cite the FLUXNET data appropriately.
11. I could not run the code, because the data was not shared along with the code (maybe for licensing reasons?) - and there was no script or clear description provided on how to download the data. Please add this information, such that the work becomes reproducible.
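The concern in points 1 to 3 can be made concrete with a site-wise split, in which held-out sites are never seen during training. The following is a minimal sketch under stated assumptions: the function name `site_kfold`, the site labels and the round-robin assignment of sites to folds are hypothetical, and it is not the protocol used by FluxCom or X-Base.

```python
def site_kfold(records, k=5):
    """Yield (train, test) lists of (site, year) pairs with disjoint sites.

    Sites are assigned to folds round-robin, so every record of a
    held-out site lands in the test set and none in the training set.
    """
    sites = sorted({site for site, _ in records})
    for fold in range(k):
        held_out = set(sites[fold::k])
        train = [r for r in records if r[0] not in held_out]
        test = [r for r in records if r[0] in held_out]
        yield train, test


# Hypothetical example: 17 sites with three years of data each.
records = [(f"S{i:02d}", year) for i in range(17) for year in (2010, 2011, 2012)]
for train, test in site_kfold(records, k=5):
    train_sites = {site for site, _ in train}
    test_sites = {site for site, _ in test}
    print(len(train_sites), "training sites,", len(test_sites), "held-out sites")
```

A score obtained this way measures extrapolation to unseen sites, whereas a split by time period measures extrapolation to unseen years at known sites; the two are not directly comparable.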
Minor:
12. L. 64 "Snow depth data was retrieved hourly from the publicly available Copernicus platform" → please accurately say which product from which platform you are using.
13. Please cite Gal et al. when you mention the use of MC-Dropout for uncertainty quantification (L. 108-110): https://arxiv.org/abs/1506.02142
14. Your code is very hard to read, partly because variable names like "combined_data_x_dict_DD" are not descriptive of their content or purpose. I recommend giving the code a refactor to improve its legibility.
15. Figure 1 does not appear to be very useful in the current state. While I can see that the sites used in this study are all based in very few locations on Earth (California, Mid-West, Italy, South-Australia), it is hard to make out how many there are, exactly where they are located, which PFT they belong to and how much productivity they have on average. Please revise.
16. Figure 2 needs a major rework: the fonts do not match, the boxes are not aligned, no legend is provided, and overall it remains unclear how the data flows.
17. L. 104ff. the description of the K-Fold cross-validation scheme used is incomplete. Please describe more precisely how the split is performed, and then ideally add a figure showing the Folds for all sites and visualizing which time periods belong to which fold.
18. L. 152 "we further use interpretability" makes no sense. Perhaps you mean "we further use methods to gain interpretability of the modeled functional relationship", or something similar?
19. Similarly, in L. 154, instead of "can be combined" I would say it is more accurate to state "can be applied to".
20. Can you elaborate on what you mean by "suitable baseline" in L. 157?
21. L. 159, "the GradientExplainer" is mentioned without having been properly introduced.
22. In Section 3.2 you mention how important the choice of baseline is, but not how to actually pick one. Please add this information to make the method section complete.
23. L. 189f. - do you actually show somewhere how the inclusion of hourly data improves performance at daily resolution?
24. L. 196f. - your evaluation methodology should be part of the method section.
25. Fig. 3 (but also all other figures) please fix font sizes for better readability and consistency across figures. Probably Copernicus provides guidelines.
26. Section 4.2 is extremely short and reads more like methodology. I recommend moving to the methods part, but then adding a longer discussion of Fig. 5 in the results part.
27. The flow would be improved if your evaluation in 4.1 and benchmarking in 5.1 were unified in a single section.
28. How did you tune the hyperparameters of your model?
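To make points 20 and 22 concrete, the sketch below shows how integrated-gradients attributions depend on the baseline. It is a toy example under stated assumptions: `model` is a hypothetical three-input function standing in for a trained network, its gradient is written analytically, and the two baselines (all zeros and all ones) are arbitrary illustrations, not the ones used in the manuscript.

```python
def model(x):
    """Toy differentiable model standing in for a trained network."""
    return x[0] * x[1] + x[2] ** 2


def gradient(x):
    """Analytic gradient of `model` with respect to its inputs."""
    return [x[1], x[0], 2.0 * x[2]]


def integrated_gradients(x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum.

    Attribution i is (x_i - baseline_i) times the average gradient along
    the straight path from the baseline to x.
    """
    avg_grad = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            avg_grad[i] += g / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


x = [2.0, 3.0, 1.0]
ig_zero = integrated_gradients(x, [0.0, 0.0, 0.0])
ig_ones = integrated_gradients(x, [1.0, 1.0, 1.0])

# Attributions sum to model(x) - model(baseline) for either baseline,
# but how that total is shared among the inputs changes with the baseline.
print(ig_zero, sum(ig_zero))
print(ig_ones, sum(ig_ones))
```

Here the third input receives a non-zero attribution against the zero baseline and none against the all-ones baseline, because it equals the baseline value there. This is why the manuscript should state which baseline is used and why.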
Following these remarks, I suggest a major revision of this work, to alleviate the major flaws related to the scientific content, but also to improve the overall presentation of the study.
However, I am also happy to be proven wrong should any of the points I raised solely be due to a misunderstanding from my side.
Kindly,
the reviewer
Citation: https://doi.org/10.5194/egusphere-2024-3726-RC2
EcoPro-LSTM v0: A Memory-based Machine Learning Approach to Predicting Ecosystem Dynamics across Time Scales in Mediterranean Environments
Mitra Cattry, Wenli Zhao, Juan Nathaniel, Jinghao Qiu, Yao Zhang, and Pierre Gentine
The authors introduce an initial version of an ecosystem process model using the Long Short-Term Memory approach (EcoPro-LSTM v0). This model employs a temporal multitask deep learning model to predict ecosystem respiration (RECO), gross primary productivity (GPP), evapotranspiration (ET), and surface soil water content (SWC), capturing the interdependencies of these variables across different time scales. They trained and tested the model using data from several Mediterranean sites from the FLUXNET2015 database.
My expertise lies more in general modeling and physical processes rather than deep learning. That said, I find this topic highly relevant and promising, particularly due to the challenges in predicting ecosystem responses—especially respiration and processes in dry ecosystems. My primary concerns about this manuscript are its lack of clarity, excessive text and figures, and insufficient key details. In my opinion, these issues are obscuring the full potential and effort behind this work.
I highly recommend restructuring the manuscript to improve its organization and clarity. This should involve consolidating and significantly shortening the text, ensuring all information appears in its proper section, eliminating redundant content, and reducing the number of figures—currently 12, which is excessive. Additionally, it's crucial to explicitly state all key modeling assumptions. Please see below for more details.
Major comments:
Minor comments:
Abstract
Introduction
Section 2
Section 3
Section 4
Section 5
Section 6
Figures
Fig. 1.
Fig. 2.
Fig. 3.
Fig. 4.
Fig. 5.
Figs. 6, 7, and 8.
Fig. 9.
Fig. 10.
Fig. 11.
Fig. 12.