the Creative Commons Attribution 4.0 License.
An effective machine learning approach for predicting ecosystem CO2 assimilation across space and time
Abstract. Accurate predictions of environmental controls on ecosystem photosynthesis are essential for understanding the impacts of climate change and extreme events on the carbon cycle and the provisioning of ecosystem services. Using time-series measurements of ecosystem fluxes paired with measurements of meteorological variables from a network of globally distributed sites and remotely sensed vegetation indices, we train a recurrent deep neural network (Long Short-Term Memory, LSTM), a simple deep neural network (DNN), and a mechanistic, theory-based photosynthesis model with the aim of predicting ecosystem gross primary production (GPP). We test these models' ability to generalise spatially and temporally across a wide range of environmental conditions. Both neural network models outperform the theory-based model under leave-site-out cross-validation (LSOCV). The LSTM model performs best, achieving a mean R2 of 0.78 across sites in the LSOCV and an average R2 of 0.82 across relatively moist temperate and boreal sites. This suggests that recurrent deep neural networks provide a basis for robust data-driven ecosystem photosynthesis modelling in these biomes. However, limits to global model upscaling are identified using cross-validation by vegetation type and by continent. In particular, model performance is weakest at relatively arid sites, where unknown vegetation exposure to water limitation limits model reliability.
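The leave-site-out cross-validation (LSOCV) described in the abstract holds out all data from one flux-tower site at a time, training on the remaining sites. A minimal sketch of that split logic, assuming the data are indexed records tagged with site codes (the site names and function below are a hypothetical illustration, not the authors' implementation):

```python
def leave_site_out_splits(site_labels):
    """Yield (held_out_site, train_indices, test_indices), holding out
    every record from one site at a time (LSOCV)."""
    sites = sorted(set(site_labels))
    for held_out in sites:
        train = [i for i, s in enumerate(site_labels) if s != held_out]
        test = [i for i, s in enumerate(site_labels) if s == held_out]
        yield held_out, train, test

# Toy example: six records from three hypothetical flux-tower sites.
labels = ["AT-Neu", "AT-Neu", "DE-Hai", "DE-Hai", "US-Ton", "US-Ton"]
for site, train_idx, test_idx in leave_site_out_splits(labels):
    print(site, train_idx, test_idx)
```

Because entire sites are withheld, the resulting skill scores measure spatial generalisation rather than within-site interpolation.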
Withdrawal notice
This preprint has been withdrawn.
Preprint (800 KB)
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-1826', Anonymous Referee #1, 06 Nov 2023
The manuscript presents different models for predicting GPP at EC towers. The authors show that the neural network with memory, the LSTM, performs best, contradicting previous research. This shows the need for incorporating memory effects for data-based ecosystem flux predictions, potentially improving the models that are now often used as benchmarks. Additionally, the authors find some limitations to the generalisability of the models, stressing an important caveat for current benchmarks.
Overall, the paper is very well written, clear, and important. Although I think the discussion should be elaborated on some points, I only have some minor comments.
General comments:
L1: 'Accurate predictions of environmental controls on ecosystem photosynthesis are essential [...]'. Although I agree with the statement, the manuscript does not present a prediction of the environmental controls on GPP, but more a method of estimating GPP, and an analysis of its generalisability. I'm unsure if it's possible, within your framework, to analyse the information stored in the recurrent units, but such an analysis would contribute greatly to our understanding of ecosystem dynamics as well as the added value of the LSTM.
L12: '... arid sites where unknown vegetation exposure to water limitation ...'. I do not understand why the vegetation exposure to water at such sites is more unknown than at wetter sites. Therefore, I would propose to rephrase, or explain this.
L131: I would suggest to explain why the weights in the MLP are shared between each timestep.
L185: LSTMs are known for being data-hungry, and I wonder if the predictive skill of the LSTM is similar with less data. If so, that means the LSTM is not limited by data availability, but rather by model architecture or data representativeness. On the other hand, if the LSTM performs worse with less data, that means that longer time series of flux measurements would improve the fit. Therefore, I would like to see an analysis of the skill at some selected sites with 25%, 50%, 75% and 100% of all available data.
L185: Throughout the results section, mostly the R2-score is discussed. However, the RMSE would be informative as well, as mentioned in the methods section. Therefore, I would recommend to add RMSE as a metric to the boxplots in Figure 5.
L246: I wonder what would happen if one added the precipitation of the last week (or month, or year) to the input of the MLP, and if the authors have this or a similar analysis, I would be interested to see it reflected in the discussion.
L248: I would urge the authors to add a discussion of my point at line 185 here.
L262: A recent paper (1) found that recurrent NNs could very well reproduce interannual variability (IAV), and I would urge the authors to discuss the differences between this manuscript and (1).
L287: I would urge the authors to compare their findings to those of other studies that used machine learning to predict GPP at EC sites, e.g. (2,3).
L302: The limited generalisability of the models at drier sites is very interesting, and definitely should be researched in more depth. For example, (2) have also predicted latent heat flux (LE), which could help to analyse where the models go wrong. Therefore, I wonder if the authors have thought about including LE. If so, I'd recommend discussing the findings. If not, I would urge the authors to include this limitation in the discussion.
L343: The information present in fAPAR implied here is very interesting, but leaves one to wonder what happens to the model's predictive quality if fAPAR were left out as a predictor. If the authors did such an analysis, it would be interesting to include here.
L350: This interesting statement raises questions about previously published literature that uses EC data to upscale GPP to a global scale e.g. (2,3). I urge the authors to elaborate on this point, and include a discussion about the interpretability of other studies.
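Several of the comments above concern the skill metrics reported in the manuscript (R2 at L185, RMSE at L231). As a point of reference, a minimal sketch of how these two metrics are computed for one site's predictions; the function and toy data below are illustrative, not the authors' code:

```python
import math

def r2_and_rmse(observed, predicted):
    """Coefficient of determination (R2) and root-mean-square error (RMSE)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot   # fraction of observed variance explained
    rmse = math.sqrt(ss_res / n)  # same units as the flux (e.g. gC m-2 d-1)
    return r2, rmse

# Toy GPP values for a single held-out site.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
r2, rmse = r2_and_rmse(obs, pred)
```

Reporting both metrics per site, as the reviewer suggests, is informative because R2 is scale-free while RMSE retains the flux units.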
Detailed comments:
Figure 1a: I suggest to make an inset for Europe, as the points now overlap.
L153: 'We trained .. loss function' has been noted before in L135 and can be removed
L156: A dropout probability of 0 to me means either never dropout, or always dropout (depending on the definition), and I therefore assume this is a typo.
L231: I would recommend adding standard deviations to the mean RMSEs mentioned.
REFERENCES:
(1) https://doi.org/10.1016/j.agrformet.2023.109691
(2) https://essd.copernicus.org/articles/10/1327/2018/
(3) https://bg.copernicus.org/articles/13/4291/2016/
Citation: https://doi.org/10.5194/egusphere-2023-1826-RC1
EC1: 'Comment on egusphere-2023-1826 - Peer review halted', Jens-Arne Subke, 14 Dec 2023
The peer review for this manuscript has been halted due to problems with 'data leakage' during model training, which affect the results and conclusions. The authors will work on a corrected version and resubmit a revised manuscript.
Jens-Arne Subke
Citation: https://doi.org/10.5194/egusphere-2023-1826-EC1
Viewed
- HTML: 537
- PDF: 221
- XML: 43
- Total: 801
- BibTeX: 27
- EndNote: 25