This work is distributed under the Creative Commons Attribution 4.0 License.
Advances in Land Surface Model-based Forecasting: A Comparison of LSTM, Gradient Boosting, and Feedforward Neural Networks as Prognostic State Emulators in a Case Study with ECLand
Abstract. The weather predictions most useful to the public are those near the surface. The processes most relevant for near-surface weather prediction are also the most interactive: they exhibit positive feedbacks or play a key role in energy partitioning. Land surface models (LSMs) represent these processes together with surface heterogeneity, forecast water, carbon and energy fluxes, and, when coupled with an atmospheric model, provide boundary and initial conditions. Because this numerical parametrization of the atmospheric boundary is computationally expensive, statistical surrogate models are increasingly used to accelerate experimental research. We evaluated how efficiently three surrogate models can speed up experimental research by simulating land surface processes, which are integral to forecasting water, carbon, and energy fluxes in coupled atmospheric models. Specifically, we compared the performance of a Long Short-Term Memory (LSTM) encoder-decoder network, extreme gradient boosting (XGB), and a feed-forward neural network (multilayer perceptron, MLP) within a physics-informed multi-objective framework. This framework emulates key states of ECLand, the land surface scheme of ECMWF's Integrated Forecasting System (IFS), across continental and global scales. Our findings indicate that, while all models on average demonstrate high accuracy over the forecast period, the LSTM network excels in continental long-range predictions when carefully tuned, XGB scores consistently high across tasks, and the MLP provides an excellent trade-off between implementation time and accuracy. The runtime reduction achieved by the emulators compared with the full numerical models is significant, offering a faster yet reliable alternative for conducting numerical experiments on land surfaces.
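To make the term "prognostic state emulator" concrete: such a surrogate learns a one-step map from the current land-surface state and the meteorological forcing to the next state, and is then iterated over the forecast horizon. The Python sketch below is a generic illustration under that assumption, not the authors' implementation; `rollout` and `toy_step` are hypothetical names, `toy_step` stands in for a trained MLP or XGB model, and an LSTM encoder-decoder would additionally carry a hidden state between steps.

```python
from typing import Callable, List, Sequence

State = List[float]    # hypothetical: prognostic states (e.g. soil temperature, snow cover)
Forcing = List[float]  # hypothetical: meteorological forcing at one time step


def rollout(step: Callable[[State, Forcing], State],
            x0: State,
            forcings: Sequence[Forcing]) -> List[State]:
    """Iterate a one-step emulator autoregressively over the forecast horizon.

    Each predicted state is fed back as the input for the next step,
    so emulator errors can accumulate with lead time.
    """
    states = [list(x0)]
    for f in forcings:
        states.append(step(states[-1], f))
    return states


def toy_step(x: State, f: Forcing) -> State:
    # Hypothetical stand-in for a trained model: relax each state halfway toward the forcing.
    return [xi + 0.5 * (fi - xi) for xi, fi in zip(x, f)]


traj = rollout(toy_step, [0.0, 10.0], [[1.0, 10.0]] * 3)
```

Because the emulator consumes its own output, its accuracy has to be assessed over the whole rollout rather than for a single step.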
Status: open (until 07 Oct 2024)
CEC1: 'Comment on egusphere-2024-2081', Astrid Kerkweg, 06 Sep 2024
Dear authors,
in my role as Executive Editor of GMD, I would like to bring our Editorial version 1.2 to your attention:
https://www.geosci-model-dev.net/12/2215/2019/
This editorial highlights some requirements for papers published in GMD, which are also listed on the GMD website in the ‘Manuscript Types’ section:
http://www.geoscientific-model-development.net/submission/manuscript_types.html
In particular, please note that for your paper, the following requirements have not been met in the Discussions paper:
- "Code must be published on a persistent public archive with a unique identifier for the exact model version described in the paper or uploaded to the supplement, unless this is impossible for reasons beyond the control of authors. All papers must include a section, at the end of the paper, entitled "Code availability". Here, either instructions for obtaining the code, or the reasons why the code is not available should be clearly stated. It is preferred for the code to be uploaded as a supplement or to be made available at a data repository with an associated DOI (digital object identifier) for the exact model version described in the paper. Alternatively, for established models, there may be an existing means of accessing the code through a particular system. In this case, there must exist a means of permanently accessing the precise model version described in the paper. In some cases, authors may prefer to put models on their own website, or to act as a point of contact for obtaining the code. Given the impermanence of websites and email addresses, this is not encouraged, and authors should consider improving the availability with a more permanent arrangement. Making code available through personal websites or via email contact to the authors is not sufficient. After the paper is accepted the model archive should be updated to include a link to the GMD paper."
Therefore, please provide the source code that exactly corresponds to the version used for this publication in a permanent archive (with a DOI). Additionally, you should provide the data (training data and output data). If the volume of data is too large, please state so in the data availability section and, within that section, describe which data have been used (similar to the description in the GitHub repository).
Yours,
Astrid Kerkweg (GMD executive Editor)
Citation: https://doi.org/10.5194/egusphere-2024-2081-CEC1
RC1: 'Comment on egusphere-2024-2081', Simon O'Meara, 12 Sep 2024
Referee Review of ‘Advances in Land Surface Model-based Forecasting: A Comparison of LSTM, Gradient Boosting, and Feedforward Neural Networks as Prognostic State Emulators in a Case Study with ECLand’
The authors have identified a component of numerical land surface and weather forecasting that has not previously been tested against current methods of surrogate model development. It is the role of this paper to develop (and provide links to the code of) surrogate models and verify them against a benchmark numerical model.
There is a lot to like about the paper; the points below naturally focus on weaknesses, but I would like to thank the authors for an enjoyable read and some very good research.
Overall I think the points below constitute minor revisions (or appropriate rebuttals from the authors), but I cannot recommend the paper for publication as is.
Related work and appropriate references are included for the machine learning methods. However, there are very few references to examples of numerical experiments on the land surface that the surrogate models could support.
In terms of scientific quality and significance, the paper expertly develops relevant machine learning methods for predicting variables of land surface models, which is an important and challenging step toward a complete evaluation of the surrogate models.
The full significance of the paper is currently understated because the authors do not provide examples of how the surrogate models could be applied to numerical experiments on land surfaces, and, importantly, how the inaccuracies quantified through comparison with ECLand could impact such experiments. I recommend the authors revise the paper so that such examples and related discussion are included – this could be in the discussion section. In addition, I have made further recommendations below.
Lines 110-113
It is unclear whether the surrogate models developed here can predict all of the variables that the original ECLand model predicts (and could therefore potentially fully replace the ECLand model).
I cannot find in the main paper any information on the runtime of the surrogate models (for experiments representative of numerical experiments on land surfaces) alongside the runtime of ECLand for comparison. This information needs to be included, as it is the driving force behind the work.
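A runtime comparison of the kind requested here could be reported as a speed-up ratio of median wall-clock times. The sketch below is a minimal harness under that assumption; `reference_run` and `surrogate_run` are hypothetical stand-in workloads, not ECLand or the emulators, and a fair comparison would run both over the same domain, forecast period and hardware.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock time in seconds of fn() over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(reference: Callable[[], object],
            surrogate: Callable[[], object],
            repeats: int = 5) -> float:
    """Runtime ratio reference / surrogate; values > 1 mean the surrogate is faster."""
    return median_runtime(reference, repeats) / median_runtime(surrogate, repeats)


# Hypothetical stand-in workloads; in practice, a full ECLand run and an
# emulator rollout for the same experiment.
def reference_run() -> int:
    return sum(i * i for i in range(200_000))


def surrogate_run() -> int:
    return sum(range(1_000))


ratio = speedup(reference_run, surrogate_run)
```

Using the median rather than the mean reduces the influence of one-off delays such as caching or process start-up.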
As mentioned by the executive editor, although a link to GitHub is provided, no persistent public archive is provided.
Citations in the main text are very messy, mixing several citation styles; this makes the manuscript unacceptable for publication in its current form.
There are multiple spelling and punctuation errors that need resolving before publication.
The abstract describes the emulators as reliable alternatives; however, the discussion stresses that the definition of reliability depends on the application (thereby leaving the determination of reliability to the reader). I therefore recommend changing the abstract to accurately represent this important discussion point.
Where necessary, the ‘-3’ denoting per unit cubed (e.g. m⁻³) needs to be superscripted.
In Figure 2 and elsewhere, the units of snow cover fraction need to be stated, e.g. (%) or (0-1).
Section 3 and throughout – RMSEs and MAEs should be given in units of the variable they are assessing model accuracy for, e.g. K for soil temperature.
Because RMSE and MAE have the units of the variable whose model accuracy they assess, I do not think that RMSE and MAE results of different variables can be combined, as I believe they are in Figure 2a, Table 2 and other parts of the results (e.g. Fig. 4a). The main text should be changed accordingly.
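The point about units can be illustrated with a short sketch: RMSE and MAE are reported per variable in that variable's own units, and if a combined score is wanted, each RMSE is first made dimensionless. Dividing by the standard deviation of the reference series is one common choice, assumed here; it is not the authors' method, and the variable names and values are hypothetical.

```python
import math
import statistics
from typing import Sequence


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the variable."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def mae(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error, in the units of the variable."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


# Hypothetical values for two variables with different units.
pred = {"soil_temperature": [281.0, 283.0], "snow_cover_fraction": [0.5, 0.7]}
ref = {"soil_temperature": [280.0, 284.0], "snow_cover_fraction": [0.4, 0.8]}
units = {"soil_temperature": "K", "snow_cover_fraction": "0-1"}

# Report each score with its own unit; averaging K with a (0-1) fraction is not meaningful.
for v in pred:
    print(f"{v}: RMSE = {rmse(pred[v], ref[v]):.3f} {units[v]}, "
          f"MAE = {mae(pred[v], ref[v]):.3f} {units[v]}")

# Assumed normalisation: RMSE divided by the reference standard deviation (dimensionless).
nrmse = {v: rmse(pred[v], ref[v]) / statistics.pstdev(ref[v]) for v in pred}
```

In this example both variables reach a normalised RMSE of 0.5, even though their raw RMSEs (1 K and 0.1) are not comparable.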
Because ACC is a relative value, I can see how ACC results of the assessed variables can be combined into one score per model. If this combination is what is shown in Figure 2 (and perhaps elsewhere), then it needs to be stated clearly. Additionally, the methods should explain how ACC results of the different variables were combined, e.g. whether an arithmetic mean was calculated.
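The kind of detail the methods section would need to state can be sketched as follows, assuming an uncentred anomaly correlation coefficient (anomalies taken with respect to climatology) and an unweighted arithmetic mean across variables. Neither choice is confirmed by the manuscript: a centred ACC would additionally subtract the mean anomaly, and a weighted mean would need its weights specified. All names and values are hypothetical.

```python
import math
from typing import Dict, Sequence


def acc(forecast: Sequence[float],
        reference: Sequence[float],
        climatology: Sequence[float]) -> float:
    """Uncentred anomaly correlation coefficient of forecast vs. reference."""
    fa = [f - c for f, c in zip(forecast, climatology)]
    ra = [r - c for r, c in zip(reference, climatology)]
    den = math.sqrt(sum(a * a for a in fa) * sum(b * b for b in ra))
    if den == 0.0:
        raise ValueError("ACC is undefined when all anomalies are zero")
    return sum(a * b for a, b in zip(fa, ra)) / den


def mean_acc(per_variable: Dict[str, float]) -> float:
    """Unweighted arithmetic mean over variables: one possible single score per model."""
    return sum(per_variable.values()) / len(per_variable)


# Hypothetical emulator forecast, ECLand reference and climatology per variable.
per_var = {
    "soil_temperature": acc([281.0, 283.0], [280.0, 284.0], [282.0, 282.0]),
    "snow_cover_fraction": acc([0.7, 0.7], [0.4, 0.8], [0.6, 0.6]),
}
score = mean_acc(per_var)
```

Here a perfect anomaly pattern for one variable and an uncorrelated one for the other average to about 0.5, which shows how a single combined score can hide large per-variable differences.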
The caption of Figure 3 needs to explain what the top row of subplots shows, i.e. average snow cover in these regions. What kind of average is it, and what is the data source (is it ECLand)?
There needs to be greater emphasis in the abstract and elsewhere that when accuracy is discussed, the authors mean in terms of verification against synthetic data, not evaluation against observations. I think the authors should state very clearly somewhere that further work is needed for evaluation against observations before recommendation of any of the surrogate models for numerical experiments is possible.
Simon O’Meara
Citation: https://doi.org/10.5194/egusphere-2024-2081-RC1
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote
---|---|---|---|---|---|---
157 | 42 | 34 | 233 | 33 | 5 | 4