This work is distributed under the Creative Commons Attribution 4.0 License.
Machine Learning for numerical weather and climate modelling: a review
Catherine Odelia de Burgh-Day
Tennessee Leeuwenburg
Abstract. Machine learning (ML) is increasing in popularity in the field of weather and climate modelling. Applications range from improved solvers and preconditioners, to parametrisation scheme emulation and replacement, and recently even to full ML-based weather and climate prediction models. While ML has been used in this space for more than 25 years, it is only in the last 10 or so years that progress has accelerated to the point that ML applications are becoming competitive with numerical knowledge-based alternatives. In this review, we provide a roughly chronological summary of the application of ML to aspects of weather and climate modelling from early publications through to the latest progress at the time of writing. We also provide an overview of key ML concepts and terms. Our aim is to provide a primer for researchers and model developers to rapidly familiarize and update themselves with the world of ML in the context of weather and climate models.
Status: open (until 12 Jun 2023)
RC1: 'Comment on egusphere-2023-350', Anonymous Referee #1, 17 May 2023
General comments:
The authors review the application of Machine Learning (ML) techniques to weather and climate modelling with an emphasis on historical and current developments. A glossary of commonly used terms and basic introductions to some concepts are provided. An in-depth exchange of knowledge between the ML and geoscientific modelling communities could be immensely beneficial and reviews like this can be an important step to facilitate this exchange.
As far as I am able to judge, the authors do a great job at covering a wide range of relevant publications and explaining the questions tackled in many of these applications. In terms of presentation, the language is concise and the paper is enjoyable to read. Including tabular or schematic representations of ML concepts and/or the discussed applications could further improve the visual appeal of the paper. A stronger narrative thread linking the different subsections and applications would preempt the impression of reading through a long list of papers - although this may be unavoidable given the scope of the reviewed works.
My primary concern about the paper in its current form is its utility to aid researchers in the development of better geoscientific models. Due to the wide range of works that are being discussed, many concepts and models are only touched upon in brief, without further elaboration of the underlying principles and connections between different applications. References to methodological works that could support future model development are only sparsely included in the main text or the glossary. In my opinion, incorporating suitable methodological references into the glossary and introductory sections could greatly strengthen the paper!
Specific comments:
L20 - Isn’t there an ongoing research effort to extend numerical models to utilise GPU hardware?
L24 - What about improvements in subgrid parameterizations due to better process understanding?
L65/66 - Maybe include a reference? (e.g. McGovern et al 2019 [1])
L104 - Very debatable if this is a necessary requirement for e.g. a weather prediction model?
L116ff - A narrative thread linking these subsections would be much appreciated!
L128 - Could it be more useful to briefly discuss the utility of the individual references rather than providing a large list?
L136 - Debatable, as recent trends in ML point strongly in the opposite direction (i.e. larger homogenised models).
L145 - Debatable as emphasis shifts towards self-supervised learning and better training regimes rather than architectural developments!
L156 - It could be important to emphasise that NN are known to interpolate within the training envelope and may not generalise well outside it (in contrast to physical laws).
L163 - Sigmoid is a highly uncommon and suboptimal choice of activation function compared to ReLU! (ReLU is also missing from the glossary despite its ubiquity in modern models).
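For readers unfamiliar with the two activations being contrasted here, a minimal NumPy sketch (function names are illustrative):

```python
import numpy as np

def sigmoid(x):
    # Saturates for large |x|, so gradients vanish in deep stacks
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Piecewise linear; gradient is 1 for all positive inputs
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 3.0])
relu(x)     # -> array([0., 0., 3.])
sigmoid(x)  # values squashed into (0, 1)
```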
L179 - Why are Token-sequence and Transformer models listed separately? I don’t see the justification for this classification introduced as is.
L521 - ConvLSTM were introduced in 2015 also in the context of nowcasting, including a reference would be appropriate [2].
L537 - Why is Sonderby et al discussed if nowcasting is supposedly omitted (L515)? Why not Espeholt et al 2021? Why is this not discussed in the context of probabilistic models?
L560 - A background reference to GNN either here or in the glossary could be beneficial! (e.g. Battaglia et al 2018 [3])
L581 - I would strongly object to treating the models in this section (excluding Clare et al) as probabilistic in contrast to the ones in the previous section! These models are fundamentally deterministic, in contrast to e.g. generative models such as Ravuri et al or true probabilistic models like Sonderby et al. Discussing the different types of ensembling used in these models could be valuable on its own (also referring to Scher et al [4]).
L661 - A large part of the affordable training is the use of much lower resolution and not due to the architecture! (1.4deg vs 0.25)
L663 - Missing several extensions of WeatherBench (e.g. WeatherBenchProbability, RainBench) and ClimateBench.
L981 - The activation function is applied elementwise to the result of a matrix multiplication and does not incorporate multiplication or bias addition by definition.
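To make the distinction concrete, a toy sketch (shapes illustrative) that separates the affine step from the elementwise activation:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))  # weight matrix
b = rng.standard_normal(3)       # bias vector
x = rng.standard_normal(4)       # input

z = W @ x + b           # affine step: matrix multiplication plus bias addition
a = np.maximum(0.0, z)  # activation (here ReLU), applied elementwise to z only
```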
L995 - Calling it a “complex” mechanism is not necessary. Typical Attention simply computes a dot product between vectors.
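A minimal sketch of scaled dot-product attention, showing that the core operation is just dot products followed by a softmax (array sizes are illustrative):

```python
import numpy as np

def dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # dot products between queries and keys
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)  # softmax over the keys
    return w @ V                        # weighted sum of value vectors

Q = np.eye(2)                 # two queries, d_k = 2
K = np.eye(2)                 # two keys
V = np.array([[1.0, 2.0],
              [3.0, 4.0]])
out = dot_product_attention(Q, K, V)  # shape (2, 2)
```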
L1007 - Normalisation plays an essential role in modern NN and probably deserves its own glossary term. It does not need to be performed over the batch (e.g. LayerNorm normalises over features instead).
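As a concrete illustration, a minimal LayerNorm sketch that normalises each sample over its feature axis independently, using no batch statistics (simplified: the learnable gain and bias are omitted):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Statistics are computed per sample over the feature axis,
    # so the result is independent of the rest of the batch
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])
y = layer_norm(x)  # each row now has ~zero mean and ~unit variance
```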
L1028 - “Convolutional … sliding window” seems redundant.
L1037 - Not true in general! If the network is too thin it becomes highly unstable.
Technical corrections:
DNN and NN are introduced as separate abbreviations, but the distinction is neither applied consistently nor does it appear beneficial.
References:
[1] McGovern, A., Lagerquist, R., Gagne, D. J., Jergensen, G. E., Elmore, K. L., Homeyer, C. R., and Smith, T. (2019). Making the Black Box More Transparent: Understanding the Physical Implications of Machine Learning. Bull. Amer. Meteor. Soc., 100, 2175–2199. https://doi.org/10.1175/BAMS-D-18-0195.1
[2] Shi, X., et al. (2015). Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. Advances in Neural Information Processing Systems, 28.
[3] Battaglia, P. W., et al. (2018). Relational Inductive Biases, Deep Learning, and Graph Networks. arXiv preprint arXiv:1806.01261.
[4] Scher, S., and Messori, G. (2021). Ensemble Methods for Neural Network-Based Weather Forecasts. Journal of Advances in Modeling Earth Systems, 13(2).
Citation: https://doi.org/10.5194/egusphere-2023-350-RC1
Viewed
- HTML: 473
- PDF: 283
- XML: 10
- Total: 766
- BibTeX: 4
- EndNote: 5