the Creative Commons Attribution 4.0 License.
EcoPro-LSTM𝑣0: A Memory-based Machine Learning Approach to Predicting Ecosystem Dynamics across Time Scales in Mediterranean Environments
Abstract. Climate change is anticipated to alter the global water and carbon cycles, but the spatiotemporal effects of these climate-induced shifts remain poorly understood. Of particular relevance are the variations in rainfall intensity and frequency that affect the carbon and water cycles from daily to interannual time scales. Yet current models fail to reproduce these processes, because capturing the complex interactions and interrelated dependencies at different time scales (daily to seasonal) requires the simultaneous estimation of multiple interconnected ecological processes. To address this challenge, we introduce the initial version of our ecosystem process modelling approach using Long Short-Term Memory (EcoPro-LSTM𝑣0), a temporal multitask deep learning model designed to predict ecosystem responses, focusing on critical terrestrial variables: ecosystem respiration (RECO), gross primary productivity (GPP), evapotranspiration (ET), and surface soil water content (SWC). Our approach leverages the capabilities of LSTM networks to capture the interdependencies of those processes across time scales. LSTMs excel at time-series prediction because they can learn long-term relationships and patterns in data. We trained and tested our model using long-term data from FLUXNET2015 Mediterranean sites (at hourly and daily time steps), mainly in the USA and Europe, known for their ecological diversity and significance. We demonstrate our model's outperforming against state-of-the-art data products and test the robustness of our model and findings through k-fold cross-validation. We also showcase the model's interpretability in revealing how short- and long-term atmospheric drivers, like precipitation, influence GPP in Mediterranean climates. This model and accompanying insights can help better understand and manage ecosystems under climate change, especially in response to changing extreme events.
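For readers less familiar with the architecture named in the abstract, the standard LSTM cell update is sketched below, followed by one plausible multitask output layer. The cell equations are the textbook formulation. The linear output head and the summed squared-error loss over the four targets (RECO, GPP, ET, SWC) are assumptions made purely for illustration and are not taken from the preprint.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)} \\[4pt]
% Assumed multitask head: one linear layer producing all four targets at once
\hat{y}_t &= W_y h_t + b_y, \qquad \hat{y}_t \in \mathbb{R}^{4} \\
% Assumed loss: unweighted sum of per-target mean squared errors
\mathcal{L} &= \sum_{k \in \{\mathrm{RECO},\, \mathrm{GPP},\, \mathrm{ET},\, \mathrm{SWC}\}} \frac{1}{T} \sum_{t=1}^{T} \left(\hat{y}_{t,k} - y_{t,k}\right)^2
\end{aligned}
```

Here $x_t$ is the vector of meteorological inputs at time step $t$, and the cell state $c_t$ is the "memory" that lets earlier inputs, such as past rainfall, influence later predictions. Sharing $h_t$ across the four outputs is what makes such a model multitask.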
Status: open (until 24 May 2025)
RC1: 'Comment on egusphere-2024-3726', Anonymous Referee #1, 09 Apr 2025
EcoPro-LSTM v0: A Memory-based Machine Learning Approach to Predicting Ecosystem Dynamics across Time Scales in Mediterranean Environments
Mitra Cattry, Wenli Zhao, Juan Nathaniel, Jinghao Qiu, Yao Zhang, and Pierre Gentine
The authors introduce an initial version of an ecosystem process model using the Long Short-Term Memory approach (EcoPro-LSTM v0). This model employs a temporal multitask deep learning model to predict ecosystem respiration (RECO), gross primary productivity (GPP), evapotranspiration (ET), and surface soil water content (SWC), capturing the interdependencies of these variables across different time scales. They trained and tested the model using data from several Mediterranean sites from the FLUXNET2015 database.
My expertise lies more in general modeling and physical processes rather than deep learning. That said, I find this topic highly relevant and promising, particularly due to the challenges in predicting ecosystem responses—especially respiration and processes in dry ecosystems. My primary concerns about this manuscript are its lack of clarity, excessive text and figures, and insufficient key details. In my opinion, these issues are obscuring the full potential and effort behind this work.
I highly recommend restructuring the manuscript to improve its organization and clarity. This should involve consolidating and significantly shortening the text, ensuring all information appears in its proper section, eliminating redundant content, and reducing the number of figures—currently 12, which is excessive. Additionally, it's crucial to explicitly state all key modeling assumptions. Please see below for more details.
Major comments:
- The manuscript currently presents methodological details incrementally and intersperses them with results. For instance, key information about FLUXCOM and X-BASE is omitted from the Methods section and instead introduced in discussion. To improve readability, I recommend consolidating all methodological descriptions in the appropriate sections upfront, rather than scattering them throughout the text. Additionally, the number and nature of proposed models should be explicitly stated early in the paper (e.g., in the Introduction or Methods). After reading the entire manuscript, this remains unclear.
- The manuscript suffers from a significant gap in physical and physiological explanations. While results are presented, they lack meaningful connections to underlying ecological processes or climate drivers. For example, when describing site conditions, the authors fail to contextualize how these environmental factors relate to the observed outcomes. Strengthening these mechanistic links would greatly enhance the scientific rigor and interpretability of the findings.
- The manuscript lacks a comprehensive synthesis of model performance across sites. While site-specific conclusions are presented, they appear overgeneralized without sufficient justification. Most critically, the analysis misses an overarching assessment of when and where the model performs best—a key piece of information readers expect in the conclusions. A clear, data-supported summary of model strengths and limitations across all studied sites would significantly strengthen the paper's impact.
Minor comments:
Abstract
- The abstract should present information more directly and specifically. Rather than using vague comparative statements like "we demonstrate our model’s outperforming against state-of-the-art data," please: name the specific database used, provide quantitative performance metrics and state concrete findings.
- Please indicate how many sites from FLUXNET you used.
Introduction
- Please use “e.g.” and “see” for citations only when they are necessary (throughout the manuscript).
- L22-23. Please indicate why carbon uptake in semi-arid regions depends on winter and early growing season precipitation.
- L25. Please explain why physics-based models struggle to represent legacy and lag effects. Besides, what about statistical models and other models using AI or machine learning?
- L31-32. This sentence is not clear to me. Consider rewriting it more clearly.
- L53. What do the authors mean by “Short-Term Memory approach networks”?
- L55. Consider removing “like” as they are the only variables predicted by this model.
Section 2
- L66. LE is not mentioned in the incorporated variables.
- L66. Please indicate how evapotranspiration is calculated.
- L72. Does the classification of those sites change from past to future?
- L74. Please specify the criteria used to remove/consider sites.
- L81. Which “established relationships”? Please be more specific.
- L82. Why zero and not nan?
- L84. How was the threshold of “negligible” rainfall events defined? Explain how it was based on the histogram.
- L85. Where does the value 0.028 come from? What does it mean?
- L86. Why can the SWC resulting from precipitation below this value not be measured? With specific equipment?
- L93. Please describe how you leveraged both time scales.
- L94-95. This part is not necessary, especially the description of the unit symbols.
- L96. Please be consistent with the name of SWC; you mention both soil water content and soil moisture.
- L96. SWC has units (volume/volume), here probably m3 m-3.
- L98. How were the 4 months and 3 days defined?
Section 3
- L113-115. This information is repetitive.
- L116. Consider putting “see Figure 2” in parentheses.
- L118. How were the values of 110 hidden units and 0.2-0.5 defined? What is ReLU?
- L125-126. How were those values defined?
- Eq. 1. Very long name for a variable. Furthermore, this equation is widely known. Consider removing it.
- L137. Which scaling techniques? Why did they not yield satisfactory results?
- L141. What do the authors mean by fast time scales?
- L148. How are hyperparameters identified?
- L149-150. That time is for simulating what?
- L152-153. Any reference?
- L152-L181. Consider shortening this part. Leave only the information needed to understand what you are doing.
Section 4
- L186-190. This information is repetitive.
- L183. Please, use the abbreviations.
- L197-198. This part is not necessary. The meaning of R2 is widely known.
- L202. Which newer metrics do the authors mean?
- L196-. Please avoid mixing methods and results.
- L216-L221. This should be in the figure caption not in the main text.
- Table 1. It's not clear to me why, if you said you would try NEE instead of RECO, the RECO comparison appears here.
- L222-L224. This information is given without any context.
- L222-. Please consider describing only the key results and include information about the site when it's truly relevant. For example, when it makes the results easier to interpret.
- L268. Please avoid these very general sentences.
- L276-277. Is this capability stated in the result for this specific site or is it general? If it's for the site, please specify; if it's general, please justify it.
- L281. All labels, right? Also, consider always naming this set of variables the same way to avoid confusion.
- L284. Add space between the text and the parenthesis.
- L285-286. How do the authors know that? Which specific factors?
- L288. Please indicate what is considered high data quality.
- L290. Strong correlation among them? How is that related to the challenge to predict GPP and RECO?
- L291. What is stable soil moisture? Perform the best describing which variable/s?
- L300. What do the authors mean by “matter of convenience” in this context?
- L300-301. Please fix the references.
- L301-302. Please explain what it means.
- L304. Here is the first time you use IA. Consider linking this to the introduction and methods.
- L307-314. Here you are mixing methods, legend captions, and results.
- L312. How is long-term defined here? Consider defining it at the beginning, arguing the consideration.
- L318. How was that date chosen?
- L318. Which specific environmental forcing?
- L323. In all sites? How much impact?
- L323. PAR is not always inversely proportional to cloudiness. By the way, what relevance does this have in this sentence? Furthermore, parentheses indicating this relationship appear at least three times in this subsection.
- L325. Please consider explaining their meaning instead of writing down values. Also, which panels are you referring to?
- L328-330. What do the authors mean by “more consequential” in this context? Please explain further the relationship you are making here as it is not clear what happened in those rain events.
- L331. Why is VPD in quotes?
- L332. The term "air aridity" isn't widely used. Consider explaining it or simply describing directly what you mean.
- L331-339. What do the authors mean by words like “favourable levels”, and “beneficial”? Consider replacing them with increases/reduces.
- L331-339. Please consider replacing the writing of these equations with their meaning.
- L337. Please specify what is the meaning of “beneficial influence”.
- L339. Where can this conclusion be drawn?
- L340. How can VPD and TA be confounded?
- L347-350. Please be specific. What does it mean that “PAR usually exhibits reduced sunlight negatively impacting GPP”? Deviations where?
- L359. What is the relation of that with the model?
- L362. 8 and 6 what?
- L362. Which environmental conditions?
- L363. I do not understand how that indicates stable water use and unstressed conditions.
- L364. Please specify which comparative analysis.
Section 5
- L368. FLUXCOM and X-BASE are not mentioned until here in any section. Again, you are mixing methods, results, and now, discussion.
- L370-373. You are comparing different metrics. Maximum R2 and then “often falls below”. Please compare the same metric.
- L375. The maximum is greater than the limit range.
- L376. More balanced than what?
- L380. Performs better than what? Please specify which several sites.
- L381. FLUXCOM-X is the benchmark, you used the data of X-BASE.
- L379-381. Again this comparison does not seem fair. Scores up to for several sites, then ranging, then more negative. Please compare the same metric among the different products.
- L383. Which model? Also, do FLUXCOM and X-BASE not capture them?
- L388. Please indicate better results than what. Furthermore, does that occur always? In all sites?
- L389-390. With which conclusions?
- L391. What is advanced FLUXCOM?
- L393. Why KGE and R2 if Figure 9 shows only NSE?
- L394. What about the R2 and NSE of FLUXCOM and X-BASE?
- L395-397. Only in your model or also in FLUXCOM and X-BASE?
- L402-. Too much information. Please consider only writing the key points.
- L408. Please explain what is considered “acceptable performance”.
- L408-409. Please explain why you believe that.
- L414. Please directly mention the key environmental variables.
- L423. Why indicate precipitation infiltration? Soil moisture is much more complex than that.
- L427. Please justify that statement.
- L441. What do the authors mean by “adverse consequences”?
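The Section 5 comments above ask that the same metric be compared across products. A minimal sketch of such a like-for-like comparison follows, using the standard definitions of R2 (squared Pearson correlation), NSE, and KGE (the 2009 formulation with correlation, variability ratio, and bias ratio). The function names and the toy series are hypothetical and are not taken from the manuscript or from FLUXCOM or X-BASE.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def pearson_r(obs, sim):
    """Pearson correlation between observed and simulated series."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def r_squared(obs, sim):
    """R2 taken here as the squared Pearson correlation."""
    return pearson_r(obs, sim) ** 2


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    mo = _mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability and bias ratios."""
    mo, ms = _mean(obs), _mean(sim)
    sd_o = math.sqrt(_mean([(o - mo) ** 2 for o in obs]))
    sd_s = math.sqrt(_mean([(s - ms) ** 2 for s in sim]))
    r = pearson_r(obs, sim)
    alpha = sd_s / sd_o  # variability ratio
    beta = ms / mo       # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def score_products(obs, products):
    """Compute the same three metrics for every product against one reference."""
    return {
        name: {"R2": r_squared(obs, sim), "NSE": nse(obs, sim), "KGE": kge(obs, sim)}
        for name, sim in products.items()
    }


if __name__ == "__main__":
    # Toy data (hypothetical): product_b tracks the observations but is biased by +1.
    observed = [1.0, 2.0, 3.0, 4.0]
    table = score_products(
        observed,
        {"product_a": [1.0, 2.0, 3.0, 4.0], "product_b": [2.0, 3.0, 4.0, 5.0]},
    )
    for name, scores in table.items():
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

In the toy case the biased product keeps R2 = 1 while its NSE drops to 0.2 and its KGE to 0.6, which is why quoting a maximum R2 for one product against an NSE or KGE range for another can mislead.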
Section 6
- L466-468. I do not understand this conclusion. Nor do I consider it a conclusion of this work.
Figures
Fig. 1.
- This map should be cropped, focusing only on the regions of interest and allowing for a better view of the sites. The size of the circles makes it impossible to see the other areas. For example, in the US, it's impossible to see how many sites there are.
- What does ‘diversity’ mean here?
- Consider changing the color table. Continuous color tables are for continuous values and those are discrete.
- The last sentence is not relevant in a figure caption.
- With “productivity” do you mean “GPP”? Consider using the abbreviation you defined at the beginning to avoid confusion.
- Units of productivity should be written as gC/(m$^{2} \cdot$ y) or gC (m$^{-2} \cdot$ y$^{-1}$). As it is, it seems years is in the numerator.
Fig. 2.
- VPD is vapour pressure deficit, not air aridity.
- SWC is soil water content.
- What do you mean by “site-name inputs”?
- Why “including”?
- Define X and Y.
- The last sentence is not relevant in a figure caption.
Fig. 3.
- Add the units (not in the caption).
- This figure is only briefly mentioned in the main text.
- Consider changing 'panel' to ‘row’. A panel is each subplot. Besides, add the variable abbreviations in the caption.
- What is ‘set’?
Fig. 4.
- Please put titles in y-axes.
- Consider differentiating which parts correspond to training, testing, and validation.
- The last sentence is not relevant in a figure caption.
- What are the colors?
Fig. 5.
- Please put the units in the figure.
- Consider removing the space at the beginning of each time series to enhance visibility.
- Please put the label in the color bar.
Figs. 6, 7, and 8.
- Please put the units in the figure.
- Please use variable abbreviation instead of productivity.
- The penultimate sentence is not relevant in a figure caption.
- Please correct the units of the bottom figures.
Fig. 9.
- Describe the figure in the caption.
- What is the meaning of the colors?
- The last sentence is not relevant in a figure caption.
Fig. 10.
- It is very difficult to see the lines of FLUXCOM. Please consider modifying the colors.
Fig. 11.
- Please put the units in the figure.
- Please put the label in the color bar.
- The last sentence is not relevant in a figure caption.
Fig. 12.
- Please explain the figure in the caption. What are the radial figures? What do the numbers on the right mean? What are the units? The colors?
- The last sentence is not relevant in a figure caption.
Citation: https://doi.org/10.5194/egusphere-2024-3726-RC1
CC1: 'Data policy violation', Dario Papale, 11 Apr 2025
Dear authors
We make an incredible effort to collect, standardize, process, harmonize and openly share these data, and we only ask for proper attribution (the "BY" in the "CC-BY" license). I don't think it is a lot of work, and for us it is crucial to maintain the network and the data availability.
This is completely ignored in this paper and I'm sorry to say that this is really annoying and a bad practice. It is definitely not sufficient to write "We thank FLUXNET2015 and Copernicus for their open-source datasets." and I think it is not so difficult to read what is accepted when you access the data (the license) and follow what is requested... It is clearly reported here when you use FLUXNET2015: https://fluxnet.org/data/data-policy/
Dario Papale, on behalf of all the PIs and Regional Networks coordinators
NB: nothing personal against you; from now on we will send these notes and requests for amendments and paper corrections in all the cases that we discover. It is something we need to improve as a community...
Citation: https://doi.org/10.5194/egusphere-2024-3726-CC1
AC1: 'Reply on CC1', Mitra Cattry, 11 Apr 2025
Dear Dr. Papale,
Thank you for your message and for your tremendous efforts in building and maintaining the FLUXNET2015 network, as well as for your continued advocacy for responsible and transparent data use within our community.
I would like to sincerely apologise for the omission of the proper attribution in our manuscript. We fully acknowledge the importance of crediting the data providers as outlined by the CC-BY license. It was certainly our intention to include the appropriate citations and acknowledgements during the revision stage. Unfortunately, this was unintentionally missed in the initial submission, despite several rounds of internal checks.
We understand how crucial these attributions are to sustaining the network and the open sharing of high-quality data. While such oversights can occasionally occur in the early stages of submission, especially in multi-author manuscripts, we are grateful for your reminder and are committed to addressing this in our revised version.
If there is specific wording or a preferred citation format that you and the network recommend, we would be happy to follow it precisely.
Lastly, as we had proposed reviewers from the FLUXNET community in part due to their scientific expertise with the dataset, we would have also appreciated feedback on the scientific aspects of the work. Nonetheless, we value your note and your broader efforts to strengthen standards across our field.
Thank you again, and please accept our apologies for the oversight.
Warm regards,
Mitra Cattry
on behalf of all co-authors
Citation: https://doi.org/10.5194/egusphere-2024-3726-AC1
CC2: 'Reply on AC1', Dario Papale, 12 Apr 2025
Dear Mitra
Errors can happen, but even in early submissions, in particular for journals like those of Copernicus where the preprint is online, the policies must be respected carefully, even with many co-authors. I think that the request for attribution is clear in the data policy at the link I added in the first comment (https://fluxnet.org/data/data-policy/), so I invite you to read it and apply it; should something not be clear, we will need to clarify it there (although it has been correctly applied in other cases).
Best regards and good luck with the paper
Dario Papale
Citation: https://doi.org/10.5194/egusphere-2024-3726-CC2
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
272 | 38 | 7 | 317 | 6 | 3 |
Viewed (geographical distribution)
Country | # | Views | % |
---|---|---|---|
United States of America | 1 | 188 | 56 |
China | 2 | 30 | 9 |
Germany | 3 | 14 | 4 |
France | 4 | 14 | 4 |
India | 5 | 8 | 2 |