the Creative Commons Attribution 4.0 License.
Explainable machine learning for modelling of net ecosystem exchange in boreal forest
Abstract. There is growing interest in applying machine learning methods to predict net ecosystem exchange (NEE) from site information and climatic parameters. If such models perform well, they could offer an excellent opportunity for gapfilling or upscaling, i.e., extrapolating results to times and sites for which direct measurements are unavailable. An extensive body of research already covers different seasons, time scales, numbers of sites, input parameters (features), and models. We apply four machine learning models to predict NEE of boreal forest ecosystems based on climatic and site parameters. We use data sets from two stations in the Finnish boreal forest and model NEE during the peak growing season and the whole year. Using Explainable Artificial Intelligence methods, we compare the most important input parameters chosen by the models. In addition, we analyze the dependencies of NEE on input parameters against the existing theoretical understanding of NEE drivers. We show that even though the statistical scores of some models can be very good, the results should be treated with caution, especially when applied to upscaling. In a model setup with several interdependent parameters, which are ubiquitous in atmospheric measurements, some models display strong opposite dependencies on these parameters. This behavior might have adverse consequences if the models are applied to data sets under future climate conditions. Our results highlight the importance of Explainable Artificial Intelligence methods for interpreting outcomes from machine learning models, in particular when a large set of interdependent variables is used as model input.
Status: closed
RC1: 'Comment on egusphere-2023-2559', Anonymous Referee #1, 29 Jan 2024
The authors model net ecosystem exchange (NEE) at two sites in Finland, one boreal and one subarctic, and at two different time points for the boreal site (pre- and post-thinning). They used three machine learning models and linear regression alongside a suite of explanatory variables common to NEE studies. The authors aimed to evaluate the efficacy of the four model types for accurately describing the NEE at the sites using whole-year and peak-season data sets. The main novelty of the manuscript, which the authors readily point out, is the use of explainable artificial intelligence methods to determine the most important explanatory variables across the models and data sets used, which they can then compare to current theory to help determine the best model for predicting NEE under different conditions.
Overall, the manuscript does a good job of accomplishing the aims it sets. However, the unbalanced datasets between the sites and the single temporal resolution used for the models can cause some issues with interpretation of the results. When analyzing all of the sites together, the authors point out that the results mostly mirror the most represented site. Running the models with standardized data sets to reduce the influence of the site with the most observations when comparing all of the sites together could make the interpretation of these results more straightforward. Similarly, the authors mention that the temporal resolution of the data can influence whether it is better suited for gap-filling or upscaling studies, but proceed to use a single resolution best suited to gap-filling and conclude that the models have a harder time with upscaling. If another temporal resolution is available from the same towers, it would be necessary to compare the two before determining how well the models can handle gap-filling vs upscaling. Despite these two potential flaws, which may not be feasible to address, the manuscript provides a good comparison between the model types, and the use of explainable artificial intelligence methods helps to provide clear outcomes from the models.
Detailed comments:
Introduction: The introduction provides sufficient background information to understand the aim of the paper; however, it often comes across as overly explanatory, and some sections could be synthesized and written more concisely to cut down on the word count. Consider revising this section to reduce unnecessary information and improve the flow of the background towards the objectives of the study.
Line 19-23: Awkward wording. The sentence could flow better, and currently does not come to a satisfying conclusion.
Line 26: The word “variable” in “variable values” can be removed; the sentence flows better without it, and the meaning is already implied by how the sentence is written.
Line 49-50: This sentence is jarring and does not give any obvious rationale for why you are now mentioning measurement of different temporal scales. I would consider revising to improve the flow of the paragraph and maybe add a rationale as to why this matters for the paper. Maybe start by discussing the difference between gapfilling and upscaling studies/questions and then discuss how they are typically measured at different temporal scales, so if a researcher would want to look at both they would need multiple scales of data.
Line 74: Referring to subhourly time resolution - Does this relate to the findings of the paper, since you state earlier that upscaling studies typically use longer timescales and upscaling is the part you had the hardest time modeling? Why did you not try using data from multiple temporal resolutions to model both the gapfilling and upscaling if you knew that different resolutions were better for modeling different kinds of data?
Line 103-105: It is recommended to be consistent with the ordering of the sites throughout the methods. If you start with SMEAR I (Värriö) then SMEAR II (Hyytiälä), it would be best to always refer to them in that order to avoid confusion.
Line 142: Referring to training vs test data - You should mention what the test sets were as well and how they were selected. It seems like it was 4% of observations used for testing, except for post-thinning Hyytiälä which used 3%, why?
Line 143: Referencing the phrase “individual sites” - Does this separate pre- and post-thinning Hyytiälä, so there are three sets of all season data, and three sets of peak season data, then one set of all data combined, correct? It may be good to be more explicit about what constitutes individual sites since pre- and post-thinning Hyytiälä are from the same site.
Line 210: Another small comment, start with the ALE plot paragraph since the method is mentioned first, or mention Permutation Feature Importance first in the preceding paragraph. It is helpful to be consistent with the order things are discussed.
Line 217: Estimates
Line 273: Refer to either R2, R-squared, or R-scores throughout the document, do not switch between them.
Line 283: remove “:”, instead use “;”. Also “accounted here”, should be “accounted for here”.
Line 365: I would change the formatting used for this section.
Line 418-421: I think this distinction is unnecessary, you could just state that you distinguished between the sites by coding them to three dummy variables.
Line 427: Have you tried running the models after standardizing the number of observations included from each site? Ideally twice to compare the same time periods for Värriö and pre-thinning Hyytiälä, and Värriö and post-thinning Hyytiälä. This could help prevent the scores from following a single site just because it was sampled more.
Line 466-468: Generally, this should either be in brackets inside the other sentence, or have the brackets removed.
Figure captions: I believe figures should stand on their own without requiring the reader to have read either the main text or other figure captions. Most of your figure captions are a single line and not very descriptive. Even the pointer to an earlier caption, which you give in Figure 2, is missing from the other figure captions. It would be best if you wrote full captions for all figures.
Citation: https://doi.org/10.5194/egusphere-2023-2559-RC1
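The suggestion under Line 427 (equalizing the number of observations per site before pooling) could be sketched roughly as below. This is only an illustration under stated assumptions: the record layout, the `site` key, the site labels, the observation counts, and the helper name `balance_by_site` are all hypothetical, and plain random downsampling is used here instead of the matching of time periods that the comment describes as ideal.

```python
import random
from collections import Counter, defaultdict


def balance_by_site(records, site_key="site", seed=0):
    """Randomly downsample every site to the size of the smallest site.

    `records` is a list of dicts; `site_key` names the field holding the
    site label. Both are hypothetical conventions for this sketch.
    """
    rng = random.Random(seed)
    by_site = defaultdict(list)
    for rec in records:
        by_site[rec[site_key]].append(rec)

    # The least-sampled site sets the common sample size.
    n_min = min(len(rows) for rows in by_site.values())

    balanced = []
    for site in sorted(by_site):
        # Sample without replacement so no observation is duplicated.
        balanced.extend(rng.sample(by_site[site], n_min))
    return balanced


# Toy records: labels and counts are made up for illustration only.
records = (
    [{"site": "Varrio", "nee": -0.1 * i} for i in range(50)]
    + [{"site": "Hyytiala_pre", "nee": -0.2 * i} for i in range(200)]
    + [{"site": "Hyytiala_post", "nee": -0.3 * i} for i in range(80)]
)

balanced = balance_by_site(records)
counts = Counter(rec["site"] for rec in balanced)
print(dict(counts))
```

With equal counts per site, pooled scores can no longer be dominated by one site simply because it contributes more rows; the cost is that data from the better-sampled sites are discarded.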
AC1: 'Reply on RC1', Topi Laanti, 04 Apr 2024
We are grateful to the Referee for reviewing the manuscript and for providing valuable insights, which helped to improve the clarity of the manuscript. We improved the manuscript based on these suggestions. Please find attached a PDF file with a detailed response to each of the referee's comments.
BR,
Topi Laanti
RC2: 'Comment on egusphere-2023-2559', Anonymous Referee #2, 09 Jun 2024
Ezhova et al. present an interesting study that uses several basic machine learning algorithms to quantify net ecosystem exchange at a boreal forest site. Beyond comparing machine learning performance, this study also compares the feature importance of each machine learning algorithm to understand the driving factors of NEE at this site. Overall, the manuscript is well written. However, there are a few flaws in the current manuscript. Please find my comments below.
- The abstract needs to be improved by adding more details. The current version is a little vague. For example, it only mentions four machine learning models. It would be better to state explicitly which four machine learning algorithms were used in the manuscript. Furthermore, model performance and statistics should also be presented in the abstract. In addition, the manuscript highlights the use of explainable machine learning to quantify NEE drivers. However, the abstract gives no details on which drivers are most important for NEE predictions.
- The splitting into training and testing data sets was conducted only once. As we know, there are always uncertainties in the data splitting. It would be better to conduct the data splitting multiple times and also present the uncertainties of R2 and RMSE in Figures 1 and 2.
- From Figures 1 and 2, it seems that some machine learning models are overfitting, which means that training performance is much better than testing performance. It would be better to tune the hyperparameters of the machine learning models to avoid overfitting.
- Figures 3 and 6. There are too many points in the scatter plots. It is better to use density scatter plots to illustrate the results.
- Figures 4 and 5. It is better to add units to the y-axis.
- Figures 7 and 11. Better to add uncertainty bars, once you have done different data splitting.
- Figure A1. It is better to add the significance levels for the correlation analysis.
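The repeated-splitting suggestion in the comments above could be sketched along the following lines. This is a minimal illustration, not the authors' setup: the data are synthetic, a one-variable least-squares line stands in for the machine learning models, and the function names, the 20 repeats, and the 20% test fraction are all assumptions chosen for the example.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error."""
    my = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y_true))


def repeated_split_scores(x, y, n_repeats=20, test_fraction=0.2, seed=0):
    """Score the model on several independent random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    n_test = max(2, round(test_fraction * len(x)))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(idx)  # a fresh random partition on every repeat
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        pred = [slope * x[i] + intercept for i in test]
        scores.append(r2_and_rmse([y[i] for i in test], pred))
    return scores


# Synthetic stand-in data (not measurements): one driver, linear response plus noise.
data_rng = random.Random(42)
x = [data_rng.uniform(0.0, 1500.0) for _ in range(300)]
y = [-0.01 * xi + data_rng.gauss(0.0, 1.0) for xi in x]

scores = repeated_split_scores(x, y)
r2_values = [s[0] for s in scores]
rmse_values = [s[1] for s in scores]
print(f"R2:   {statistics.fmean(r2_values):.3f} +/- {statistics.stdev(r2_values):.3f}")
print(f"RMSE: {statistics.fmean(rmse_values):.3f} +/- {statistics.stdev(rmse_values):.3f}")
```

The standard deviation across repeats is one simple way to obtain the uncertainty bars requested for the score and feature-importance figures; the same loop could wrap any model in place of the fitted line.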
Citation: https://doi.org/10.5194/egusphere-2023-2559-RC2
AC2: 'Reply on RC2', Topi Laanti, 30 Jun 2024
We are grateful to the Referee for reviewing the manuscript and for providing valuable insights, which helped to improve the clarity of the manuscript. We improved the manuscript based on these suggestions. Please find attached a PDF file with a detailed response to each of the referee's comments.
BR,
Topi Laanti
Data sets
Hyytiälä, Värriö Hari et al, Kulmala et al https://smear.avaa.csc.fi/
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
576 | 217 | 39 | 832 | 54 | 34 |