the Creative Commons Attribution 4.0 License.
EcoPro-LSTM𝑣0: A Memory-based Machine Learning Approach to Predicting Ecosystem Dynamics across Time Scales in Mediterranean Environments
Abstract. Climate change is anticipated to alter the global water and carbon cycles, but the spatiotemporal effects of these climate-induced shifts remain poorly understood. Of particular relevance are variations in rainfall intensity and frequency, which affect the carbon and water cycles from daily to interannual time scales. Yet current models fail to reproduce these processes, because capturing the complex interactions and interdependencies across timescales (daily to seasonal) requires the simultaneous estimation of multiple interconnected ecological processes. To address this challenge, we introduce the initial version of our ecosystem process model based on the Long Short-Term Memory approach (EcoPro-LSTM𝑣0), a temporal multitask deep learning model designed to predict ecosystem responses, focusing on critical terrestrial variables, including ecosystem respiration (RECO), gross primary productivity (GPP), evapotranspiration (ET), and surface soil water content (SWC). Our approach leverages the capability of LSTM networks to capture the interdependencies of these processes across time scales; LSTMs excel at time-series prediction because they can learn long-term relationships and patterns in data. We trained and tested our model using long-term data from FLUXNET2015 Mediterranean sites (at hourly and daily time steps), mainly in the USA and Europe, which are known for their ecological diversity and significance. We demonstrate that our model outperforms state-of-the-art data products, and we test the robustness of the model and our findings through k-fold cross-validation. We also showcase the model's interpretability by revealing how short- and long-term atmospheric drivers, such as precipitation, influence GPP in Mediterranean climates. This model and the accompanying insights can help to better understand and manage ecosystems under climate change, especially in response to changing extreme events.
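To make the "temporal multitask" idea concrete, here is a minimal NumPy sketch of a single LSTM layer whose shared hidden state feeds one linear output head per target (RECO, GPP, ET, SWC). It is an illustrative assumption, not the authors' EcoPro-LSTM code: the number of inputs, the hidden size, the random initialisation and the single-layer design are all hypothetical, and no training is shown.

```python
import numpy as np

# Hypothetical sizes: the real input set and hidden size are not given here.
N_INPUTS, N_HIDDEN = 8, 16
TARGETS = ("RECO", "GPP", "ET", "SWC")  # the four predicted variables

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(rng, n_in=N_INPUTS, n_hid=N_HIDDEN, n_out=len(TARGETS)):
    """Random weights for one LSTM layer plus one linear head per target."""
    scale = 1.0 / np.sqrt(n_hid)
    return {
        # Input, forget, candidate and output gates stacked row-wise.
        "W": rng.normal(0, scale, (4 * n_hid, n_in)),
        "U": rng.normal(0, scale, (4 * n_hid, n_hid)),
        "b": np.zeros(4 * n_hid),
        # Shared hidden state -> one scalar output per task (multitask heads).
        "W_out": rng.normal(0, scale, (n_out, n_hid)),
        "b_out": np.zeros(n_out),
    }

def lstm_multitask_forward(x_seq, p):
    """Run the LSTM over a (T, n_in) sequence; return (T, n_targets) outputs."""
    n_hid = p["U"].shape[1]
    h = np.zeros(n_hid)  # hidden state
    c = np.zeros(n_hid)  # cell state (the long-term "memory")
    outputs = []
    for x_t in x_seq:
        z = p["W"] @ x_t + p["U"] @ h + p["b"]
        i = sigmoid(z[:n_hid])               # input gate
        f = sigmoid(z[n_hid:2 * n_hid])      # forget gate
        g = np.tanh(z[2 * n_hid:3 * n_hid])  # candidate cell update
        o = sigmoid(z[3 * n_hid:])           # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(p["W_out"] @ h + p["b_out"])
    return np.stack(outputs)

rng = np.random.default_rng(0)
params = init_params(rng)
x = rng.normal(size=(120, N_INPUTS))  # e.g. 120 daily time steps of drivers
y_hat = lstm_multitask_forward(x, params)
print(y_hat.shape)  # (120, 4): one column per target
```

Because all four heads read the same hidden state, the tasks share one learned representation of the driver history; whether that actually beats four separate models is exactly what RC2's point 6 asks the authors to test.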
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-3726', Anonymous Referee #1, 09 Apr 2025
EcoPro-LSTM v0 : A Memory-based Machine Learning Approach to Predicting Ecosystem Dynamics across Time Scales in Mediterranean Environments
Mitra Cattry, Wenli Zhao, Juan Nathaniel, Jinghao Qiu, Yao Zhang, and Pierre Gentine
The authors introduce an initial version of an ecosystem process model using the Long Short-Term Memory approach (EcoPro-LSTM v0). It employs a temporal multitask deep learning model to predict ecosystem respiration (RECO), gross primary productivity (GPP), evapotranspiration (ET), and surface soil water content (SWC), capturing the interdependencies of these variables across different time scales. They trained and tested the model using data from several Mediterranean sites in the FLUXNET2015 database.
My expertise lies more in general modeling and physical processes rather than deep learning. That said, I find this topic highly relevant and promising, particularly due to the challenges in predicting ecosystem responses—especially respiration and processes in dry ecosystems. My primary concerns about this manuscript are its lack of clarity, excessive text and figures, and insufficient key details. In my opinion, these issues are obscuring the full potential and effort behind this work.
I highly recommend restructuring the manuscript to improve its organization and clarity. This should involve consolidating and significantly shortening the text, ensuring all information appears in its proper section, eliminating redundant content, and reducing the number of figures—currently 12, which is excessive. Additionally, it's crucial to explicitly state all key modeling assumptions. Please see below for more details.
Major comments:
- The manuscript currently presents methodological details incrementally and intersperses them with results. For instance, key information about FLUXCOM and X-BASE is omitted from the Methods section and instead introduced in discussion. To improve readability, I recommend consolidating all methodological descriptions in the appropriate sections upfront, rather than scattering them throughout the text. Additionally, the number and nature of proposed models should be explicitly stated early in the paper (e.g., in the Introduction or Methods). After reading the entire manuscript, this remains unclear.
- The manuscript suffers from a significant gap in physical and physiological explanations. While results are presented, they lack meaningful connections to underlying ecological processes or climate drivers. For example, when describing site conditions, the authors fail to contextualize how these environmental factors relate to the observed outcomes. Strengthening these mechanistic links would greatly enhance the scientific rigor and interpretability of the findings.
- The manuscript lacks a comprehensive synthesis of model performance across sites. While site-specific conclusions are presented, they appear overgeneralized without sufficient justification. Most critically, the analysis misses an overarching assessment of when and where the model performs best—a key piece of information readers expect in the conclusions. A clear, data-supported summary of model strengths and limitations across all studied sites would significantly strengthen the paper's impact.
Minor comments:
Abstract
- The abstract should present information more directly and specifically. Rather than using vague comparative statements like "we demonstrate our model’s outperforming against state-of-the-art data," please name the specific database used, provide quantitative performance metrics, and state concrete findings.
- Please indicate how many sites from FLUXNET you used.
Introduction
- Please use “e.g.” and “see” for citations only when they are necessary (throughout the manuscript).
- L22-23. Please indicate why carbon uptake in semi-arid regions depends on winter and early growing season precipitation.
- L25. Please explain why physics-based models struggled to represent legacy and lag effects. Besides, what about statistical models and other models using AI or machine learning?
- L31-32. This sentence is not clear to me. Consider rewriting it more clearly.
- L53. What do the authors mean by “Short-Term Memory approach networks”?
- L55. Consider removing “like” as they are the only variables predicted by this model.
Section 2
- L66. LE is not mentioned in the incorporated variables.
- L66. Please indicate how evapotranspiration is calculated.
- L72. Does the classification of those sites change from past to future?
- L74. Please specify the criteria used to remove/consider sites.
- L81. Which “established relationships”? Please be more specific.
- L82. Why zero and not NaN?
- L84. How was the threshold for “negligible” rainfall events defined? Explain how it was derived from the histogram.
- L85. Where does the value 0.028 come from? What does it mean?
- L86. Why can the SWC resulting from precipitation below this value not be measured? Is this related to the specific equipment?
- L93. Please describe how you leveraged both time scales.
- L94-95. This part is not necessary, especially the description of the unit symbols.
- L96. Please be coherent with the name of SWC, you mention both soil water content and soil moisture.
- L96. SWC has units (volume/volume), here probably m$^{3}$ m$^{-3}$.
- L98. How were the values of 4 months and 3 days defined?
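Regarding the question on L98, and assuming (this page does not confirm it) that "4 months" and "3 days" are the lookback lengths of the daily and hourly input sequences, the sketch below shows how such windows are typically built and why their length directly sets how far back the model can "remember". The helper name `make_windows` and the 120-day and 72-hour values are hypothetical illustrations.

```python
import numpy as np

def make_windows(series, lookback):
    """Stack overlapping lookback windows from a (T, n_features) array.

    Window k covers rows k .. k+lookback-1 and is paired with the target
    time step k+lookback-1 (its last row), so the model sees only the past.
    """
    series = np.asarray(series)
    n_windows = series.shape[0] - lookback + 1
    if n_windows <= 0:
        raise ValueError("series is shorter than the lookback window")
    return np.stack([series[k:k + lookback] for k in range(n_windows)])

# Hypothetical lookbacks: ~4 months of daily data and 3 days of hourly data.
DAILY_LOOKBACK = 120      # days (assumed 4 x 30)
HOURLY_LOOKBACK = 3 * 24  # hours

daily = np.random.default_rng(1).normal(size=(365, 5))
hourly = np.random.default_rng(2).normal(size=(24 * 10, 5))
print(make_windows(daily, DAILY_LOOKBACK).shape)    # (246, 120, 5)
print(make_windows(hourly, HOURLY_LOOKBACK).shape)  # (169, 72, 5)
```

Each extra step of lookback costs one usable sample at the start of every record, so a full-year window (as RC2 also suggests) trades fewer training windows for longer memory; a sensitivity test over several lookback values would answer the question empirically.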
Section 3
- L113-115. This information is repetitive.
- L116. Consider putting “see Figure 2” in parentheses.
- L118. How were the values of 110 hidden units and 0.2-0.5 defined? What is ReLU?
- L125-126. How were those values defined?
- Eq. 1. Very long name for a variable. Furthermore, this equation is widely known. Consider removing it.
- L137. Which scaling techniques? Why did they not yield satisfactory results?
- L141. What do the authors mean by fast time scales?
- L148. How are hyperparameters identified?
- L149-150. What is being simulated in that time?
- L152-153. Any reference?
- L152-L181. Consider shortening this part. Keep only the information relevant to understanding what you are doing.
Section 4
- L186-190. This information is repetitive.
- L183. Please use the abbreviations.
- L197-198. This part is not necessary. The meaning of R2 is widely known.
- L202. Which newer metrics do the authors mean?
- L196-. Please avoid mixing methods and results.
- L216-L221. This should be in the figure caption not in the main text.
- Table 1. It's not clear to me why, if you said you would try NEE instead of RECO, the RECO comparison appears here.
- L222-L224. This information is given without any context.
- L222-. Please consider describing only the key results and include information about the site when it's truly relevant. For example, when it makes the results easier to interpret.
- L268. Please avoid these very general sentences.
- L276-277. Is this capability stated in the result for this specific site or is it general? If it's for the site, please specify; if it's general, please justify it.
- L281. All labels, right? Also, consider always naming this set of variables the same way to avoid confusion.
- L284. Add space between the text and the parenthesis.
- L285-286. How do the authors know that? Which specific factors?
- L288. Please indicate what is considered high data quality.
- L290. Strong correlation among them? How is that related to the challenge to predict GPP and RECO?
- L291. What is stable soil moisture? Performs best at describing which variable(s)?
- L300. What do the authors mean by “matter of convenience” in this context?
- L300-301. Please fix the references.
- L301-302. Please explain what it means.
- L304. Here is the first time you use IA. Consider linking this to the introduction and methods.
- L307-314. Here you are mixing methods, legend captions, and results.
- L312. How is long-term defined here? Consider defining it at the beginning, arguing the consideration.
- L318. How was that date chosen?
- L318. Which specific environmental forcing?
- L323. In all sites? How much impact?
- L323. PAR is not always inversely proportional to cloudiness. By the way, what relevance does this have in this sentence? Furthermore, parentheses indicating this relationship appear at least three times in this subsection.
- L325. Please consider explaining their meaning instead of writing down values. Also, which panels are you referring to?
- L328-330. What do the authors mean by “more consequential” in this context? Please explain further the relationship you are making here as it is not clear what happened in those rain events.
- L331. Why is VPD in quotes?
- L332. The term "air aridity" isn't widely used. Consider explaining it or simply describing directly what you mean.
- L331-339. What do the authors mean by words like “favourable levels” and “beneficial”? Consider replacing them with increases/reduces.
- L331-339. Please consider replacing the writing of these equations with their meaning.
- L337. Please specify what is the meaning of “beneficial influence”.
- L339. From where can this conclusion be drawn?
- L340. How can VPD and TA be confounded?
- L347-350. Please be specific. What does it mean that “PAR usually exhibits reduced sunlight negatively impacting GPP”? Deviations where?
- L359. How does that relate to the model?
- L362. 8 and 6 what?
- L362. Which environmental conditions?
- L363. I do not understand how that indicates stable water use and unstressed conditions.
- L364. Please specify which comparative analysis.
Section 5
- L368. FLUXCOM and X-BASE are not mentioned until here in any section. Again, you are mixing methods, results, and now, discussion.
- L370-373. You are comparing different metrics. Maximum R2 and then “often falls below”. Please compare the same metric.
- L375. The maximum is greater than the limit range.
- L376. More balanced than what?
- L380. Performs better than what? Please specify which several sites.
- L381. FLUXCOM-X is the benchmark, you used the data of X-BASE.
- L379-381. Again this comparison does not seem fair. Scores up to for several sites, then ranging, then more negative. Please compare the same metric among the different products.
- L383. Which model? Also, don't FLUXCOM and X-BASE capture them?
- L388. Please indicate better results than what. Furthermore, does that occur always? In all sites?
- L389-390. With which conclusions?
- L391. What is advanced FLUXCOM?
- L393. Why KGE and R2, if Figure 9 shows only NSE?
- L394. What about the R2 and NSE of FLUXCOM and X-BASE?
- L395-397. Only in your model or also in FLUXCOM and X-BASE?
- L402 - . Too much information. Please consider only writing the key points.
- L408. Please explain what is considered “acceptable performance”.
- L408-409. Please explain why you believe that.
- L414. Please directly mention the key environmental variables.
- L423. Why indicate precipitation infiltration? Soil moisture is much more complex than that.
- L427. Please justify that statement.
- L441. What do the authors mean by “adverse consequences”?
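Several of the comments on Section 5 (L370-373, L379-381, L393-394) ask for the products to be compared on the same metric. The sketch below uses common textbook definitions (an assumption: the manuscript's exact formulas are not shown on this page, and R2 is taken here as the squared Pearson correlation) to illustrate why mixing metrics misleads: a perfectly correlated but biased prediction scores R2 = 1, NSE = -1 and KGE of about 0.33.

```python
import numpy as np

def r2(obs, sim):
    """Squared Pearson correlation (one common definition of R2)."""
    return np.corrcoef(obs, sim)[0, 1] ** 2

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares of obs."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta efficiency in the Gupta et al. (2009) form."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()   # variability ratio
    beta = sim.mean() / obs.mean()  # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# A prediction that is perfectly correlated with the observations but biased by +2:
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sim = obs + 2.0
print(round(r2(obs, sim), 3), round(nse(obs, sim), 3), round(kge(obs, sim), 3))
```

R2 ignores bias entirely, while NSE and KGE penalise it differently, so a "maximum R2" for one product and an "often falls below" NSE for another cannot be ranked against each other.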
Section 6
- L466-468. I do not understand this conclusion. Nor do I consider it a conclusion of this work.
Figures
Fig. 1.
- This map should be cropped, focusing only on the regions of interest and allowing for a better view of the sites. The size of the circles makes it impossible to see the other areas. For example, in the US, it's impossible to see how many sites there are.
- What does ‘diversity’ mean here?
- Consider changing the color table. Continuous color tables are for continuous values and those are discrete.
- The last sentence is not relevant in a figure caption.
- With “productivity” do you mean “GPP”? Consider using the abbreviation you defined at the beginning to avoid confusion.
- Units of productivity should be written as gC/(m$^{2} \cdot$ y) or gC m$^{-2}$ y$^{-1}$. As written, years appear to be in the numerator.
Fig. 2.
- VPD is vapour pressure deficit, not air aridity.
- SWC is soil water content.
- What do you mean by “site-name inputs”?
- Why “including”?
- Define X and Y.
- The last sentence is not relevant in a figure caption.
Fig. 3.
- Add the units (not in the caption).
- This figure is only briefly mentioned in the main text.
- Consider changing 'panel' to ‘row’. A panel is each subplot. Besides, add the variable abbreviations in the caption.
- What is ‘set’?
Fig. 4.
- Please put titles in y-axes.
- Consider differentiating which parts correspond to training, testing, and validation.
- The last sentence is not relevant in a figure caption.
- What are the colors?
Fig. 5.
- Please put the units in the figure.
- Consider removing the space at the beginning of each time series to enhance visibility.
- Please put the label in the color bar.
Figs. 6, 7, and 8.
- Please put the units in the figure.
- Please use variable abbreviation instead of productivity.
- The penultimate sentence is not relevant in a figure caption.
- Please correct the units of the bottom figures.
Fig. 9.
- Describe the figure in the caption.
- What is the meaning of the colors?
- The last sentence is not relevant in a figure caption.
Fig. 10.
- It is very difficult to see the lines of FLUXCOM. Please consider modifying the colors.
Fig. 11.
- Please put the units in the figure.
- Please put the label in the color bar.
- The last sentence is not relevant in a figure caption.
Fig. 12.
- Please explain in the caption the figure. What are the radial figures? What do the numbers on the right mean? What are the units? The colors?
- The last sentence is not relevant in a figure caption.
Citation: https://doi.org/10.5194/egusphere-2024-3726-RC1
CC1: 'Data policy violation', Dario Papale, 11 Apr 2025
Dear authors
we make an incredible effort to collect, standardize, process, harmonize and openly share these data, and we only ask for proper attribution (the "BY" in the "CC-BY" license). I don't think it is a lot of work, and for us it is crucial to maintaining the network and the data availability.
This is completely ignored in this paper and, I'm sorry to say, this is really annoying and a bad practice. It is definitely not sufficient to write "We thank FLUXNET2015 and Copernicus for their open-source datasets.", and I think it is not so difficult to read what is accepted when you access the data (the license) and to follow what is requested... For FLUXNET2015 it is clearly reported here: https://fluxnet.org/data/data-policy/
Dario Papale in name of all the PIs and Regional Networks coordinators
NB: nothing personal against you; from now on we will send these notes and requests for amendments and paper corrections in all the cases we discover. It is something we need to improve as a community...
Citation: https://doi.org/10.5194/egusphere-2024-3726-CC1
AC1: 'Reply on CC1', Mitra Cattry, 11 Apr 2025
Dear Dr. Papale,
Thank you for your message and for your tremendous efforts in building and maintaining the FLUXNET2015 network, as well as for your continued advocacy for responsible and transparent data use within our community.
I would like to sincerely apologise for the omission of the proper attribution in our manuscript. We fully acknowledge the importance of crediting the data providers as outlined by the CC-BY license. It was certainly our intention to include the appropriate citations and acknowledgements during the revision stage. Unfortunately, this was unintentionally missed in the initial submission, despite several rounds of internal checks.
We understand how crucial these attributions are to sustaining the network and the open sharing of high-quality data. While such oversights can occasionally occur in the early stages of submission, especially in multi-author manuscripts, we are grateful for your reminder and are committed to addressing this in our revised version.
If there is specific wording or a preferred citation format that you and the network recommend, we would be happy to follow it precisely.
Lastly, as we had proposed reviewers from the FLUXNET community in part due to their scientific expertise with the dataset, we would have also appreciated feedback on the scientific aspects of the work. Nonetheless, we value your note and your broader efforts to strengthen standards across our field.
Thank you again, and please accept our apologies for the oversight.
Warm regards,
Mitra Cattry
on behalf of all co-authors
Citation: https://doi.org/10.5194/egusphere-2024-3726-AC1
CC2: 'Reply on AC1', Dario Papale, 12 Apr 2025
Dear Mitra
errors can happen, but even in early submissions, in particular for journals like those of Copernicus where the preprint is online, the policies must be respected carefully, even with many co-authors. I think that the request for attribution is clear in the data policy at the link I added in my first comment (https://fluxnet.org/data/data-policy/), so I invite you to read it and apply it, because if something is not clear we will need to clarify it better there (although it has been correctly applied in various cases).
Best regards and good luck with the paper
Dario Papale
Citation: https://doi.org/10.5194/egusphere-2024-3726-CC2
RC2: 'Comment on egusphere-2024-3726', Anonymous Referee #2, 28 Jun 2025
Review EcoProLSTM
Dear Authors, Dear Editor,
please find my review below, and please excuse the delay.
Summary:
This work introduces an LSTM network for simultaneously modeling four ecosystem variables measured at FluxNet sites: RECO, GPP, ET and SWC. A dataset of 17 FLUXNET sites in Mediterranean climates is assembled from the FLUXNET2015 dataset. The LSTM model is then trained with 5-fold cross-validation using temporal splitting (i.e. ensuring that the test data come from a different time period than the training data). The resulting model performs well at capturing seasonal dynamics. Finally, the model's learned sensitivities to input variables are investigated by computing Shapley values and integrated gradients.
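For readers unfamiliar with temporal splitting, here is a minimal sketch of a blocked k-fold split applied to one site's time series, assuming contiguous, equal-length blocks. The exact split used in the manuscript is not described on this page (point 17 below asks for it), so the block layout and the function name `temporal_kfold` are illustrative only.

```python
import numpy as np

def temporal_kfold(n_steps, k=5):
    """Split time indices 0..n_steps-1 into k contiguous, non-overlapping blocks.

    Yields (train_idx, test_idx); each block is held out exactly once, and no
    random shuffling is applied, so a test period is a single unbroken span.
    """
    blocks = np.array_split(np.arange(n_steps), k)
    for i, test_idx in enumerate(blocks):
        train_idx = np.concatenate([b for j, b in enumerate(blocks) if j != i])
        yield train_idx, test_idx

# Hypothetical site record: 10 years of daily data.
for fold, (train, test) in enumerate(temporal_kfold(3650, k=5)):
    print(fold, test[0], test[-1], len(train))
```

Because LSTM inputs reach back in time, a test window near a block edge can still overlap training days; adding a buffer between blocks is a common safeguard that this sketch omits.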
Strengths:
I appreciate the effort to improve the inter-annual variability of data-driven carbon flux estimates with deep neural networks. In addition, I enjoyed reading about the limitations of the two interpretability methods, in particular the analysis of correlated drivers and how they may hamper the applicability of SHAP. A further strength of this paper is the inclusion of estimates of epistemic uncertainty through MC-Dropout, which highlight a large variability beyond the mean prediction.
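As background to the MC-Dropout point, here is a minimal NumPy sketch of the idea: dropout is left active at prediction time, and the spread of many stochastic forward passes is read as an estimate of epistemic uncertainty (Gal and Ghahramani, 2016). The two-layer network, its random stand-in "trained" weights and the dropout rate of 0.2 are hypothetical placeholders, not EcoPro-LSTM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained network: 6 inputs -> 32 ReLU units -> 1 output.
W1, b1 = rng.normal(0, 0.3, (32, 6)), np.zeros(32)
W2, b2 = rng.normal(0, 0.3, (1, 32)), np.zeros(1)

def forward(x, drop_rate, rng):
    """One stochastic forward pass with (inverted) dropout kept ON at inference."""
    h = np.maximum(0.0, W1 @ x + b1)          # ReLU hidden layer
    mask = rng.random(h.shape) >= drop_rate   # keep each unit with prob. 1 - p
    h = h * mask / (1.0 - drop_rate)          # rescale the kept units
    return (W2 @ h + b2)[0]

def mc_dropout_predict(x, n_samples=200, drop_rate=0.2, seed=1):
    """Mean and standard deviation over repeated dropout passes."""
    sample_rng = np.random.default_rng(seed)
    samples = np.array([forward(x, drop_rate, sample_rng) for _ in range(n_samples)])
    return samples.mean(), samples.std()

x = rng.normal(size=6)
mean, std = mc_dropout_predict(x)
print(f"prediction {mean:.3f} +/- {std:.3f}")
```

With the dropout rate set to zero every pass is identical and the spread collapses to zero, which is a quick sanity check that the reported uncertainty comes from the dropout sampling itself.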
Major:
1. I am unsure what benefit the presented EcoPro-LSTM brings. Since it is trained on temporal splits, it appears to me to be suitable only for the 17 sites on which it has been trained in this study. The model would not be suitable for generalizing to other locations (within Mediterranean ecosystems), and thus in particular not suitable for generating a global map. See also Meyer & Pebesma for a related discussion https://www.nature.com/articles/s41467-022-29838-9
2. It remains unclear how this model compares against the state of the art. The comparisons against FluxCom (v1 and X-Base) are unfair, as the cross-validation sets are not chosen identically. For example, in Figure 9 EcoPro-LSTM has been trained on all sites (just during different years), whereas the FluxCom and X-Base models have not seen any data from a particular site during training and instead needed to extrapolate from other sites.
3. Hence the main claim of "improved interannual variability" presented in Section 5.1 is not substantiated, as this improvement does not derive from generic understanding but rather from learning site-specific patterns.
4. No remote sensing predictors have been used. However, many works have identified that for instance adding remotely sensed vegetation indices and LST can greatly enhance predictive performance. In fact, Kraft et al. https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2896/ argue that sequential deep learning models barely add any value beyond using remote sensing predictors.
5. Many modeling choices appear arbitrary and thus would have to be ablated (i.e. show that they reduce the validation loss). These include, for instance, weighting the RMSE loss by target flux magnitude, modeling both temporal resolutions in a two-stage approach, and the particular length of the temporal context windows (e.g. why not a full year?).
6. Similarly, it would be necessary to properly ablate that multi-task-learning is in fact better than training 4 separate LSTM models, one per task.
7. Since you train on only 17 sites and include the site class as a predictor, I would expect the model to be able to differentiate perfectly between the sites. Thus a meaningful baseline becomes a per-site model, i.e. one that is trained for only one site. I'd be curious to see whether your EcoPro-LSTM is able to outperform such a per-site model.
8. How did you assess what is "the closest match to SHAP" in Fig. 5? It would be good to have a quantitative means to base this decision on.
9. L. 305f - To me it seems problematic to study integrated gradients for training set periods, and also to average over different models (especially if they use correlated drivers). This could result in multiple artifacts, such as spurious correlations solely due to over-fitting or canceling of contradicting explanations and thereby a misattribution of a given variable's contribution. In other words: please only plot explanations for individual models, and for test set data only.
10. Please cite FluxNet data appropriately
11. I could not run the code, because the data was not shared along with the code (maybe for licensing reasons?) - and there was no script or clear description provided on how to download the data. Please add this information, such that the work becomes reproducible.
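Points 1, 2 and 7 hinge on the difference between temporal splits (every site appears in training) and spatial splits (held-out sites are never seen). A minimal sketch of a leave-sites-out k-fold, assuming a pooled table with one site label per row, shows the setup that would put EcoPro-LSTM on the same extrapolation footing as the FluxCom-style products. The site labels, the 17 x 100 layout and the function name are hypothetical.

```python
import numpy as np

def leave_site_out_folds(site_ids, k=5, seed=0):
    """Assign whole sites to k folds so that no test site is seen in training."""
    sites = np.unique(site_ids)
    np.random.default_rng(seed).shuffle(sites)
    for test_sites in np.array_split(sites, k):
        test_mask = np.isin(site_ids, test_sites)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical pooled dataset: 17 sites x 100 time steps each.
site_ids = np.repeat([f"site_{i:02d}" for i in range(17)], 100)
for train_idx, test_idx in leave_site_out_folds(site_ids):
    overlap = set(site_ids[train_idx]) & set(site_ids[test_idx])
    print(len(set(site_ids[test_idx])), "test sites, overlap:", overlap)
```

scikit-learn's `GroupKFold` offers a comparable grouped split; with only 17 sites, each fold holds out 3 or 4 sites, which also makes plain how little spatial variety is available to learn from.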
Minor:
12. L. 64 "Snow depth data was retrieved hourly from the publicly available Copernicus platform" → please accurately say which product from which platform you are using.
13. Please cite Gal et al. when you mention the use of MC-Dropout for uncertainty quantification L.108-110 https://arxiv.org/abs/1506.02142
14. Your code is very hard to read, partly because variable names like "combined_data_x_dict_DD" are not descriptive of their content or purpose. I recommend giving the code a refactor to improve its legibility.
15. Figure 1 does not appear to be very useful in the current state. While I can see that the sites used in this study are all based in very few locations on Earth (California, Mid-West, Italy, South-Australia), it is hard to make out how many there are, exactly where they are located, which PFT they belong to and how much productivity they have on average. Please revise.
16. Figure 2 needs a major rework: the fonts do not match, the boxes are not aligned, no legend is provided, and overall it remains unclear how data flows.
17. L. 104ff. the description of the K-Fold cross-validation scheme used is incomplete. Please describe more precisely how the split is performed, and then ideally add a figure showing the Folds for all sites and visualizing which time periods belong to which fold.
18. L. 152 "we further use interpretability" makes no sense, perhaps you mean "we further use methods to gain interpretability of the modeled functional relationship", or something alike?
19. Similarly, L. 154 "can be combined": I would say it is more accurate to state "can be applied to".
20. Can you elaborate what you mean by "suitable baseline" in L. 157
21. L. 159, "the GradientExplainer" is mentioned, without being properly introduced before.
22. Section 3.2 you mention how important the choice of baseline is, but not, how to actually pick a baseline. Please add this information, to make the method section complete.
23. L. 189f. - do you actually show somewhere how the inclusion of hourly data improves performance at daily resolution?
24. L. 196f. - your evaluation methodology should be part of the method section
25. Fig. 3 (but also all other figures) please fix font sizes for better readability and consistency across figures. Probably Copernicus provides guidelines.
26. Section 4.2 is extremely short and reads more like methodology. I recommend moving to the methods part, but then adding a longer discussion of Fig. 5 in the results part.
27. The flow would be improved, if your evaluation in 4.1 and benchmarking in 5.1 would in fact be unified in a single section.
28. How did you tune the hyperparameters of your model?
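Point 22 asks how the integrated-gradients baseline is chosen. A toy sketch with an analytic gradient (an assumption: a simple quadratic stands in for the LSTM, whose gradients would come from automatic differentiation) shows that the attributions satisfy completeness, summing to f(x) - f(baseline) (Sundararajan et al., 2017), and therefore change when the baseline changes.

```python
import numpy as np

# Hypothetical differentiable model with an analytic gradient:
# f(x) = sum(w * x**2), so df/dx = 2 * w * x.
w = np.array([0.5, -1.0, 2.0])

def f(x):
    return np.sum(w * x ** 2)

def grad_f(x):
    return 2.0 * w * x

def integrated_gradients(x, baseline, n_steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients
    along the straight-line path from the baseline to x."""
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    grads = np.array([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, 2.0, 3.0])
for name, baseline in [("zeros", np.zeros(3)), ("ones", np.ones(3))]:
    attr = integrated_gradients(x, baseline)
    # Completeness: attributions sum to f(x) - f(baseline).
    print(name, attr.round(3), round(attr.sum(), 3), round(f(x) - f(baseline), 3))
```

The same input receives attributions of (0.5, -4, 18) against an all-zeros baseline but (0, -3, 16) against an all-ones baseline, so the baseline (for example zero forcing, a climatology, or a dry-season mean) needs to be stated and justified in the Methods.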
Following these remarks, I suggest a major revision of this work, to alleviate the major flaws related to the scientific content, but also to improve the overall presentation of the study.
However, I am also happy to be proven wrong should any of the points I raised solely be due to a misunderstanding from my side.
Kindly,
the reviewer
Citation: https://doi.org/10.5194/egusphere-2024-3726-RC2
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
591 | 93 | 18 | 702 | 16 | 28