the Creative Commons Attribution 4.0 License.
Advances in Land Surface Model-based Forecasting: A Comparison of LSTM, Gradient Boosting, and Feedforward Neural Networks as Prognostic State Emulators in a Case Study with ECLand
Abstract. Most useful weather prediction for the public is near the surface. The processes that are most relevant for near-surface weather prediction are also those that are most interactive and exhibit positive feedback or have a key role in energy partitioning. Land surface models (LSMs) consider these processes together with surface heterogeneity and forecast water, carbon and energy fluxes, and, coupled with an atmospheric model, provide boundary and initial conditions. Because this numerical parametrization of atmospheric boundaries is computationally expensive, statistical surrogate models are increasingly used to accelerate progress in experimental research. We evaluated the efficiency of three surrogate models in speeding up experimental research by simulating land surface processes, which are integral to forecasting water, carbon, and energy fluxes in coupled atmospheric models. Specifically, we compared the performance of a Long Short-Term Memory (LSTM) encoder-decoder network, extreme gradient boosting (XGB), and a feed-forward neural network (MLP) within a physics-informed multi-objective framework. This framework emulates key states of the ECMWF's Integrated Forecasting System (IFS) land surface scheme, ECLand, across continental and global scales. Our findings indicate that while all models on average demonstrate high accuracy over the forecast period, the LSTM network excels in continental long-range predictions when carefully tuned, the XGB scores consistently high across tasks and the MLP provides an excellent implementation-time-accuracy trade-off. The runtime reduction achieved by the emulators in comparison to the full numerical models is significant, offering a faster, yet reliable alternative for conducting numerical experiments on land surfaces.
Status: final response (author comments only)
CEC1: 'Comment on egusphere-2024-2081', Astrid Kerkweg, 06 Sep 2024
Dear authors,
in my role as Executive editor of GMD, I would like to bring to your attention our Editorial version 1.2:
https://www.geosci-model-dev.net/12/2215/2019/
This highlights some requirements of papers published in GMD, which is also available on the GMD website in the ‘Manuscript Types’ section:
http://www.geoscientific-model-development.net/submission/manuscript_types.html
In particular, please note that for your paper, the following requirements have not been met in the Discussions paper:
- "Code must be published on a persistent public archive with a unique identifier for the exact model version described in the paper or uploaded to the supplement, unless this is impossible for reasons beyond the control of authors. All papers must include a section, at the end of the paper, entitled "Code availability". Here, either instructions for obtaining the code, or the reasons why the code is not available should be clearly stated. It is preferred for the code to be uploaded as a supplement or to be made available at a data repository with an associated DOI (digital object identifier) for the exact model version described in the paper. Alternatively, for established models, there may be an existing means of accessing the code through a particular system. In this case, there must exist a means of permanently accessing the precise model version described in the paper. In some cases, authors may prefer to put models on their own website, or to act as a point of contact for obtaining the code. Given the impermanence of websites and email addresses, this is not encouraged, and authors should consider improving the availability with a more permanent arrangement. Making code available through personal websites or via email contact to the authors is not sufficient. After the paper is accepted the model archive should be updated to include a link to the GMD paper."
Therefore, please provide the source code that exactly corresponds to the version used for this publication in a permanent archive (DOI). Additionally, you should provide the data (training data + output data). If the amount of data is too large, please state so in the data availability section and provide information on which data have been used (similar to the GitHub repository) within the data availability section.
Yours,
Astrid Kerkweg (GMD executive Editor)
Citation: https://doi.org/10.5194/egusphere-2024-2081-CEC1
AC1: 'Reply on CEC1', Marieke Wesselkamp, 06 Oct 2024
Dear Astrid Kerkweg,
Many thanks for clarifying the guidelines of code and data storage requirements for our development submission. To complement the GitHub repository for full reproducibility, code, models and details on the experimental configuration for reproducing results of the three machine learning emulators are now stored in a permanent and public OSF repository (DOI: 10.17605/OSF.IO/8567D). The DOI will be added to the data availability statement. We have further requested DOIs for all training and testing data sets that will be published on the ECMWF server and ready for download on request. As soon as we have received the DOIs, we will also add them to the data availability statement.
We hope this addresses your concerns and we will gladly take additional steps otherwise.
Sincerely
Marieke Wesselkamp (for all authors)
Citation: https://doi.org/10.5194/egusphere-2024-2081-AC1
RC1: 'Comment on egusphere-2024-2081', Simon O'Meara, 12 Sep 2024
Referee Review of ‘Advances in Land Surface Model-based Forecasting: A Comparison of LSTM, Gradient Boosting, and Feedforward Neural Networks as Prognostic State Emulators in a Case Study with ECLand’
The authors have identified a component of numerical land surface and weather forecasting that has not previously been tested against current methods of surrogate model development. It is the role of this paper to develop (and provide links to the code of) surrogate models and verify them against a benchmark numerical model.
There is a lot to like about the paper. Of course, the points below focus on weaknesses, but I would like to thank the authors for an enjoyable read and some very good research.
Overall I think the points below constitute minor revisions (or appropriate rebuttals from the authors), but I cannot recommend the paper for publication as is.
Related work and appropriate references are included for machine learning methods. However, there are very few references to examples of the numerical experiments on land surfaces that the surrogate models could support.
In terms of scientific quality and significance, the paper expertly develops relevant machine learning methods for predicting variables of land surface models, which is an important and challenging step toward a complete evaluation of the surrogate models.
The full significance of the paper is currently understated because the authors do not provide examples of how the surrogate models could be applied to numerical experiments on land surfaces, and, importantly, how the inaccuracies quantified through comparison with ECLand could impact such experiments. I recommend the authors revise the paper so that such examples and related discussion are included – this could be in the discussion section. In addition, I have made further recommendations below.
Lines 110-113
It is unclear whether the surrogate models developed here can predict all of the variables that the original ECLand model predicts (and could therefore potentially fully replace the ECLand model).
I do not see in the main paper information on the runtime of surrogate models (for experiments representative of numerical experiments on land surfaces) alongside the runtime for ECLand for comparison. This information does need to be included as it is the driving force behind the work.
As mentioned by the other reviewer, although a link to GitHub is provided, a persistent public archive source is not provided.
Citations in the main text are very messy – a mixture of citation styles, making it unacceptable for publication in its current form.
There are multiple spelling and punctuation errors that need resolving before publication.
The abstract describes the emulators as reliable alternatives, however, the discussion stresses that the definition of reliability depends on the application (thereby placing the determination of reliability on the reader). As such, I recommend the abstract be changed to accurately represent this important discussion point.
Where necessary ‘-3’ to denote per unit cubed needs to be superscript
In figure 2 and elsewhere, the type of fraction that snow cover fraction represents needs to be stated, e.g. (%) or (0-1)
Section 3 and throughout – RMSEs and MAEs should be given in units of the variable they are assessing model accuracy for, e.g. K for soil temperature.
Because RMSE and MAE have units of the variable they are assessing model accuracy for, I do not think that RMSE and MAE results of different variables can be combined, as I think they are in Figure 2a and Table 2 and in other parts of results (e.g. Fig. 4a). The main text should be changed accordingly.
Because ACC is a relative value I can see how ACC results of the assessed variables can be combined into one score per model. If this combination is what is shown in Figure 2 (and perhaps elsewhere) then it needs to be stated clearly. Additionally, it should be explained in the method how ACC results of the different variables were combined, e.g., is an arithmetic mean calculated?
The caption in figure 3 needs to explain what the top row of sub-plots is showing, i.e. average snow cover in these regions – but what kind of average and from what source is the data, is it ECLand?
There needs to be greater emphasis in the abstract and elsewhere that when accuracy is discussed, the authors mean in terms of verification against synthetic data, not evaluation against observations. I think the authors should state very clearly somewhere that further work is needed for evaluation against observations before recommendation of any of the surrogate models for numerical experiments is possible.
Simon O’Meara
Citation: https://doi.org/10.5194/egusphere-2024-2081-RC1
AC2: 'Reply on RC1', Marieke Wesselkamp, 14 Oct 2024
Dear Simon O’Meara,
Many thanks for assessing our work and providing many valuable suggestions for its improvement. The general comment gives a very helpful perspective, and we apologize for not having pointed out the significance of surrogate models to the full extent. We will address the comment together with the last specific comment and the comment of referee 2.
In our revised manuscript, we will place the development of our emulators more clearly in the context of coupled earth system models: In the IFS, the land surface is coupled to the atmosphere via skin temperature, the predictability of which is known to be influenced by soil moisture and soil temperature. This is the numerical interface where a surrogate model could act in application, and it motivates the experiment from a broader perspective, within which we also mention their application as adjoint models. Currently, however, only a subset of ECLand variables is represented by the emulators, so they do not replace the full capabilities of the numerical model.
As such, we will continue to point out that the emulators are useful as standalone models for the aforementioned experiments on the land surface. The computation of forecast horizons is an example in this context, as we can see it as a step toward a seasonal predictability analysis of land surface components. A full predictability analysis requires ensemble simulations, and the emulators can serve here again as a quick surrogate for the numerical model (will be added as example). We will also mention sensitivity analysis in an uncoupled version in this context (will be added as example).
Alongside this, however, we will address the last specific comment and therefore stress that before we can use the emulators for any such experiments as a reliable alternative, an evaluation on observations is necessary to avoid misleading statements. We will underline this point by referring to two specific sources of error in a basic emulation procedure: the structural uncertainty introduced by the statistical approximation of the numerical model, and the training and inference in the currently synthetic data domain.
We hope this will address some of your concerns.
Kind regards
Marieke Wesselkamp (for all authors)
Lines 110-113
It is unclear whether the surrogate models developed here can predict all of the variables that the original ECLand model predicts (and could therefore potentially fully replace the ECLand model).
See answer to general comment.
I do not see in the main paper information on the runtime of surrogate models (for experiments representative of numerical experiments on land surfaces) alongside the runtime for ECLand for comparison. This information does need to be included as it is the driving force behind the work.
Will be included in the manuscript.
As mentioned by the other reviewer, although a link to GitHub is provided, a persistent public archive source is not provided.
See answer to editorial comment.
Citations in the main text are very messy – a mixture of citation styles, making it unacceptable for publication in its current form.
There are multiple spelling and punctuation errors that need resolving before publication.
We will of course clean the citations and the spelling errors.
The abstract describes the emulators as reliable alternatives, however, the discussion stresses that the definition of reliability depends on the application (thereby placing the determination of reliability on the reader). As such, I recommend the abstract be changed to accurately represent this important discussion point.
See answer to general comment. We will adjust our abstract to better match the revised content.
Where necessary ‘-3’ to denote per unit cubed needs to be superscript
Will be adjusted.
In figure 2 and elsewhere, the type of fraction that snow cover fraction represents needs to be stated, e.g. (%) or (0-1)
Will be adjusted.
Section 3 and throughout – RMSEs and MAEs should be given in units of the variable they are assessing model accuracy for, e.g. K for soil temperature.
Will be adjusted.
Because RMSE and MAE have units of the variable they are assessing model accuracy for, I do not think that RMSE and MAE results of different variables can be combined, as I think they are in Figure 2a and Table 2 and in other parts of results (e.g. Fig. 4a). The main text should be changed accordingly.
We thank the referee for making this point, and we agree that the aggregated RMSE and MAE scores are not meaningful for inference. However, as we conduct a multi-objective and unweighted optimization towards the global average during model training with the MSE, the aggregated results we report also indicate the global test scores. We state in the discussion that the results on single variables may even differ with a variable-targeted optimization. As such, we prefer to keep reporting the global aggregated scores but will point out their lack of interpretation in the discussion.
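The distinction drawn in this reply, between per-variable errors that carry units and a single score pooled across variables, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names, the numbers, and the equal-weight pooling are hypothetical, and this discussion does not say whether the authors normalise variables before aggregating.

```python
import math

def rmse(pred, target):
    """Root-mean-square error of one variable, in that variable's units."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def per_variable_rmse(pred, target):
    """One RMSE per variable; each value keeps the units of its variable."""
    return {name: rmse(pred[name], target[name]) for name in target}

def aggregated_rmse(pred, target):
    """Unweighted RMSE pooled over all variables (assumed aggregation).

    Every squared error counts equally, so variables with large numeric
    ranges dominate and the result has no single physical unit.
    """
    squared = [
        (p - t) ** 2
        for name in target
        for p, t in zip(pred[name], target[name])
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical values: soil temperature in K, soil moisture in m3 m-3.
target = {"soil_temperature_K": [280.0, 282.0], "soil_moisture_m3m3": [0.30, 0.20]}
pred = {"soil_temperature_K": [281.0, 281.0], "soil_moisture_m3m3": [0.40, 0.10]}

scores = per_variable_rmse(pred, target)
pooled = aggregated_rmse(pred, target)
```

In this toy case the pooled score is driven almost entirely by the temperature errors, which is the interpretability concern the referee raises.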
Because ACC is a relative value I can see how ACC results of the assessed variables can be combined into one score per model. If this combination is what is shown in Figure 2 (and perhaps elsewhere) then it needs to be stated clearly. Additionally, it should be explained in the method how ACC results of the different variables were combined, e.g., is an arithmetic mean calculated?
We thank the referee for this observation. The ACC is calculated as the spatial arithmetic mean over grid cells for the forecast horizon, and as the spatio-temporal mean for the total scores we report. We will add the description of aggregation formally in the methods section.
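The two aggregation levels described in this reply can be sketched as follows. The sketch assumes that ACC values have already been computed and stored as `acc[t][c]` for lead time `t` and grid cell `c`. That layout, the function names, and the numbers are hypothetical and not taken from the manuscript.

```python
from statistics import fmean

def spatial_mean_per_lead_time(acc):
    """Arithmetic mean over grid cells, giving one score per lead time."""
    return [fmean(row) for row in acc]

def spatio_temporal_mean(acc):
    """Arithmetic mean over all lead times and grid cells, giving one total score."""
    return fmean(value for row in acc for value in row)

# Hypothetical ACC values: 2 lead times x 3 grid cells.
acc = [
    [0.9, 0.8, 1.0],
    [0.7, 0.6, 0.8],
]

per_lead = spatial_mean_per_lead_time(acc)  # curve over the forecast horizon
total = spatio_temporal_mean(acc)           # single reported score
```

With an equal number of grid cells at every lead time, the total score equals the mean of the per-lead-time scores. With missing cells at some lead times the two would differ.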
The caption in figure 3 needs to explain what the top row of sub-plots is showing, i.e. average snow cover in these regions – but what kind of average and from what source is the data, is it ECLand?
Will be adjusted.
There needs to be greater emphasis in the abstract and elsewhere that when accuracy is discussed, the authors mean in terms of verification against synthetic data, not evaluation against observations. I think the authors should state very clearly somewhere that further work is needed for evaluation against observations before recommendation of any of the surrogate models for numerical experiments is possible.
We thank the reviewer again for this helpful assessment. For the answer to this, see answer to general comment.
Citation: https://doi.org/10.5194/egusphere-2024-2081-AC2
RC2: 'Comment on egusphere-2024-2081', Anonymous Referee #2, 28 Sep 2024
General comments
This paper describes a comparative analysis of emulators as surrogate models for land surface modeling. All three tested emulators achieved high predictive scores. Different effectiveness and the unique advantages of each emulator are analyzed and discussed. This presented work shows the great potential of emulators in land surface modeling, especially regarding computational effectiveness. The authors did a great job in describing the models and in explaining the training and testing procedures. The logic of this paper is quite clear, and it is very well written. I only have a few very minor points for the authors to consider.
(very) Minor
I know the emulators are tested as offline surrogate models, but some discussions on the potential use of the emulators within the fully coupled model could guide the usage of the emulators in future research and development.
Technical corrections
L50 and elsewhere: please update the reference format.
Table 1: how do you feed ‘low’ and ‘high’ into the emulators?
I think Section 2.3.2 is a nice and concise description and summary of LSTM.
Citation: https://doi.org/10.5194/egusphere-2024-2081-RC2
AC3: 'Reply on RC2', Marieke Wesselkamp, 14 Oct 2024
I know the emulators are tested as offline surrogate models, but some discussions on the potential use of the emulators within the fully coupled model could guide the usage of the emulators in future research and development.
We thank the referee for the generous assessment and this comment on our work. We acknowledge it and will address this as already described in the answer to general comment of referee 1.
Technical corrections
L50 and elsewhere: please update the reference format.
Will be adjusted.
Table 1: how do you feed ‘low’ and ‘high’ into the emulators?
Grid cells are divided into multiple fractions of the different coverage types, of which high and low vegetation without snow are one each. So, they are given to the emulators as fractional values between 0 and 1.
I think Section 2.3.2 is a nice and concise description and summary of LSTM.
We thank the referee for this assessment.
Citation: https://doi.org/10.5194/egusphere-2024-2081-AC3