the Creative Commons Attribution 4.0 License.
Brief Communication: Training of AI-based nowcasting models for rainfall early warning should take into account user requirements
Abstract. In the field of precipitation nowcasting, deep learning (DL) has emerged as an alternative to conventional tracking and extrapolation techniques. However, DL struggles to adequately predict heavy precipitation, which is essential in early warning. By taking into account specific user requirements, though, we can simplify the training task and boost predictive skill. As an example, we predict the cumulative precipitation of the next hour (instead of five-minute increments), and the exceedance of thresholds (instead of numerical values). A dialogue between developers and users should identify the requirements for a nowcast, and how to consider these in model training.
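The two simplifications named in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' preprocessing: the 2 x 2 grid, the rainfall values, the 5 mm threshold, and the function names hourly_accumulation and exceedance_mask are all hypothetical. It sums twelve five-minute fields into one hourly field and then reduces that field to a binary exceedance mask, which is the kind of target a threshold-based model would be trained on.

```python
# Illustrative sketch only: grid values, threshold and function names are made up.

def hourly_accumulation(increments):
    """Sum a sequence of 2-D rainfall fields (mm per 5 min) cell by cell."""
    rows, cols = len(increments[0]), len(increments[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    for field in increments:
        for i in range(rows):
            for j in range(cols):
                total[i][j] += field[i][j]
    return total


def exceedance_mask(accumulation, threshold_mm):
    """Return 1 where the accumulation exceeds the threshold, else 0."""
    return [[1 if value > threshold_mm else 0 for value in row]
            for row in accumulation]


# Twelve 5-minute fields on a 2 x 2 grid: steady rain in two cells ...
fields = [[[1.0, 0.0], [0.25, 0.0]] for _ in range(12)]
# ... plus a short, hypothetical burst in the fourth 5-minute step.
fields[3][0][1] = 6.0

hourly = hourly_accumulation(fields)                # mm per hour
mask = exceedance_mask(hourly, threshold_mm=5.0)    # hypothetical threshold
print(hourly)
print(mask)
```

The mask discards how far above the threshold a cell lies, which is the information a warning that only depends on exceedance would not need.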
Status: closed
- RC1: 'Comment on egusphere-2024-1945', Anonymous Referee #1, 20 Aug 2024
Review of 'Brief Communication: Training of AI-based nowcasting models for rainfall early warning should take into account user requirements' by G. Ayzel and M. Heistermann egusphere-2024-1945
General comments: this paper compares several versions of a neural network precipitation nowcasting system. The precipitation and verification data are taken from a radar analysis product. The prediction systems are upgrades of a machine learning system previously published by the authors. The main conclusion is that, on selected high precipitation events, a system that is trained for particular forecast ranges and intensities is better than a more generic rain nowcasting system.
The paper is well structured and easy to read. As recognized by the authors, the scientific significance of the result is rather limited, because one would obviously expect the scores of a statistical prediction method to benefit from training with forecast ranges, intensities and weather regimes consistent with those used as verification. Nevertheless, the paper has some educational value for pointing out to the machine learning community that statistical weather prediction models optimized with respect to generic metrics cannot, in general, be expected to perform competitively in terms of more specific metrics.
General issues:
- the paper has 11 pages, which seems long for a brief communication according to the NHESS website (it indicates a limit of 4 pages).
- not enough information is provided to understand key technical aspects of the study: in particular, there should be a more detailed description of the original RainNet architecture and of the motivation for its changes: what is EfficientNetB4? What is the relevance of LogCosh loss to heavy precipitation? How do you define the threshold on the output to obtain 'a segmentation task'?
- false alarms are a key problem when issuing warnings. The dataset used was taken from the CatRaRE catalog, which is designed to contain observed extreme events. It means that the training and testing datasets are biased. Thus, the verification will likely underestimate the false alarm frequency, because the dataset excludes cases where the models generate a heavy precipitation forecast and none was observed. A solution would be to compute scores over the entire 2019-2020 testing period. Alternatively, the paper should present some proof that the data sampling does not affect false alarm counts. This is the main scientific issue of the paper.
- significance testing is missing from the results, which is problematic for a paper about statistical prediction. At least, figures 2 and 3 should display some confidence intervals.
- the PySteps system is getting old. It would be more convincing to display the scores from a more recent nowcasting system such as DGMR, as a performance baseline.
Specific issues:
- the mention of 'user requirements' in the title sounds a bit excessive, because generating rainfall warnings involves considerations other than the choice of accumulation period and threshold. It may be more appropriate to state that the paper demonstrates the sensitivity of nowcast performance to the choice of objective function.
- please clarify how the Jaccard loss can be differentiated, since the Jaccard metric is a ratio of integer success counts.
- the CSI score should be complemented by some information about the hit rate and false alarm rates (false negatives and false positives), as both are very important for the credibility of warnings.
- typo on line 190 'prediciton'
- Figure 3 is hard to read in terms of comparison between the systems. Since there is not much information in the dependency on scale, it may be better to present curves of FSS(range) at a fixed scale (say, 20km), instead. It would also facilitate the display of confidence intervals or statistical significance of the FSS differences.
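Two of the points above, the differentiability of a Jaccard loss and the request to report hit and false alarm information next to CSI, can be made concrete with a small sketch. It is an illustration under stated assumptions and not the authors' implementation: the sample values, the 0.5 probability cut-off and the function names are hypothetical, and "far" here denotes the false alarm ratio (false alarms divided by all warnings issued). On binary masks the CSI is the same ratio as the Jaccard index; a common differentiable surrogate, often called soft Jaccard, replaces the integer counts by sums over predicted probabilities, so that gradients exist with respect to each probability.

```python
import math


def contingency_scores(forecast, observed):
    """CSI, probability of detection and false alarm ratio for 0/1 sequences."""
    hits = sum(1 for f, o in zip(forecast, observed) if f and o)
    misses = sum(1 for f, o in zip(forecast, observed) if not f and o)
    false_alarms = sum(1 for f, o in zip(forecast, observed) if f and not o)

    def ratio(numerator, denominator):
        # Undefined scores (no events, no warnings) are reported as NaN.
        return numerator / denominator if denominator else math.nan

    return {
        "csi": ratio(hits, hits + misses + false_alarms),
        "pod": ratio(hits, hits + misses),
        "far": ratio(false_alarms, hits + false_alarms),
    }


def soft_jaccard_loss(probabilities, targets, eps=1e-7):
    """1 - soft Jaccard index; counts are replaced by sums of probabilities."""
    intersection = sum(p * t for p, t in zip(probabilities, targets))
    union = sum(probabilities) + sum(targets) - intersection
    return 1.0 - (intersection + eps) / (union + eps)


# Hypothetical example: six grid cells, observed exceedance and model output.
observed = [1, 1, 0, 0, 1, 0]
probabilities = [0.9, 0.4, 0.2, 0.1, 0.8, 0.6]
forecast = [1 if p > 0.5 else 0 for p in probabilities]  # assumed cut-off

scores = contingency_scores(forecast, observed)
print(scores)
print(soft_jaccard_loss(probabilities, observed))
# With hard 0/1 inputs the soft loss reduces to 1 - CSI (up to eps).
print(soft_jaccard_loss(forecast, observed), 1.0 - scores["csi"])
```

Reporting the probability of detection and the false alarm ratio next to CSI shows whether a given CSI comes from missed events or from unwarranted warnings, which a single CSI value cannot distinguish.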
Citation: https://doi.org/10.5194/egusphere-2024-1945-RC1
- AC1: 'Reply on RC1', Maik Heistermann, 06 Sep 2024
- RC2: 'Comment on egusphere-2024-1945', Remko Uijlenhoet, 27 Aug 2024
General remarks
This brief communication makes the point that "training of AI-based nowcasting models for rainfall early warning should take into account user requirements". To illustrate this point, the paper presents an intercomparison of one-hour cumulative rainfall nowcasts in terms of threshold exceedances produced by several different (gauge-adjusted) radar-based rainfall nowcasting schemes, including the newly developed RainNet2024-S, which was specifically developed and trained to perform such threshold-based nowcasts, and the widely used pysteps model. Overall, the paper makes a valid point, is well-written and provides some clear and convincing illustrations. As such, it provides a timely and relevant perspective on the state of the art of rainfall nowcasting for the readers of NHESS.
Specific remarks
- As an example of a possible user requirement, the authors consider "the exceedance of thresholds (instead of numerical values)" (l. 4-5). They claim that "this has been rarely attempted so far" (l. 31-32). Note, however, that the use of rainfall thresholds is actually common practice in the area of "flash flood guidance" (e.g. Georgakakos et al., 2022; https://journals.ametsoc.org/view/journals/bams/103/3/BAMS-D-20-0241.1.xml).
- "As an example, we predict the cumulative precipitation of the next hour" (l. 4). Why this particular duration? What about longer lead times (probably necessitating the connection to numerical weather prediction models), which may also be of interest for certain applications?
- Concerning point 1 (l. 36-40): What can be said for the required temporal resolution from the perspective of user requirements also holds for the spatial scale of application, e.g. an urban area, a river catchment, etc. However, whereas the temporal aspect is considered here in quite some detail, the spatial scale of application (referring to the total size of the domain over which the forecast is produced and evaluated) appears to be neglected here.
- "on the same domain" (l.52-53): See previous remark concerning the spatial scale (i.e. domain) of application. In addition, for catchment hydrological applications, relevant spatial and temporal scales are related to each other: see e.g. Berne et al. (2004; https://www.sciencedirect.com/science/article/pii/S0022169404003634).
- "heavy rainfall objects" (l.66): How are heavy rainfall "objects" defined exactly?
- "a spatial domain of 256 x 256 km" (l.80): This seems a rather arbitrary spatial scale (domain), except for the fact that it is 2^8 km x 2^8 km. How does this spatial domain "match" with the hourly time step that was chosen from a (rainfall-runoff) process perspective?
- "at a duration of six hours or less" (l.111): Why this particular range of durations? Was this choice the result of an interaction with stakeholders?
- "the kind of events [...] early warning context" (l.115-116): Have the authors interacted with stakeholders to define their (subjective choices of) lead time, rainfall thresholds, spatial resolution, scale of application (domain) and durations?
- Fig.2: It would be interesting and relevant to see the corresponding curves for 2-hour and 3-hour rainfall accumulations (why not up to 6 hours, the maximum duration selected) and different domain sizes, going from 256 km x 256 km down to 2 km x 2 km (following Lin et al., 2024; https://journals.ametsoc.org/view/journals/hydr/25/5/JHM-D-23-0194.1.xml).
- Fig.3, "spatial scale": Is this "scale" or is this "resolution", keeping the same 256 km x 256 km spatial domain? Ultimately, the domain size over which the statistics are calculated (related to the area of application of the nowcast) is also relevant from a practical (end-user / stakeholder) perspective.
- "the RainNet2024-S models clearly outperform all competitors across all precipitation thresholds" (l.152): This is certainly a nice performance of the presented model for the selected heavy rainfall events, but a question one may ask is how robust these results are in terms of false alarms?
- "or fields from numerical weather prediction models" (l.190): Including NWP forecasts could hopefully further increase the skillful lead time. What is the authors' perspective on merging ML-based radar rainfall nowcasts with NWP forecasts for seamless prediction up to longer lead times than just a few hours?
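The distinction between "scale", "resolution" and "domain" raised in the remarks on Fig. 3 can be illustrated with a neighbourhood-based score. The sketch below assumes that the "spatial scale" in question is the neighbourhood size of a fractions skill score (FSS), as the first referee's comment on Fig. 3 suggests; that reading is an assumption, and the 5 x 5 grid, the one-cell displacement and the function names are hypothetical. The grid resolution and the domain stay fixed, and only the window over which exceedance fractions are computed changes.

```python
def neighbourhood_fractions(mask, window):
    """Fraction of exceeding cells in a window x window box around each cell.

    Cells outside the domain count as non-exceeding (zero padding).
    """
    rows, cols = len(mask), len(mask[0])
    half = window // 2
    fractions = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        total += mask[ii][jj]
            fractions[i][j] = total / (window * window)
    return fractions


def fss(forecast_mask, observed_mask, window):
    """Fractions skill score for an odd window size given in grid cells."""
    pf = neighbourhood_fractions(forecast_mask, window)
    po = neighbourhood_fractions(observed_mask, window)
    numerator = sum((f - o) ** 2
                    for row_f, row_o in zip(pf, po)
                    for f, o in zip(row_f, row_o))
    denominator = (sum(f ** 2 for row in pf for f in row)
                   + sum(o ** 2 for row in po for o in row))
    return 1.0 - numerator / denominator if denominator > 0 else float("nan")


# Hypothetical 5 x 5 domain: one observed exceedance, forecast one cell off.
observed = [[0] * 5 for _ in range(5)]
forecast = [[0] * 5 for _ in range(5)]
observed[2][2] = 1
forecast[2][3] = 1

for window in (1, 3, 5):
    print(window, fss(forecast, observed, window))
```

In this toy case the score rises as the window grows, because a displacement of one cell matters less at coarser neighbourhoods, while the domain over which the statistics are pooled never changes. Evaluating different domain sizes, as asked for above, would be a separate experiment.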
Editorial remarks
- l.58: "2018a,b" instead of "2018b,a"?
- l.80: "256 km x 256 km" rather than "256 x 256 km".
- Fig.1: "YW" presumably refers to "the RADKLIM_YW_2017.002 dataset (Winterrath et al., 2018b,a)" (l.58). However, this should be more clearly indicated (2x). Also, the authors may want to expand the caption of this figure into one that is slightly more informative.
- "kilometre-scale resolution" (l.165): Here, the term "resolution" is used, where previously (including in the caption of Fig. 3) the authors use "scale". It would be appropriate to clearly distinguish one from the other.
- "spatial scale" (l.186): See previous remark.
- l.253: "Brendel" instead of "Brend".
Remko Uijlenhoet
Citation: https://doi.org/10.5194/egusphere-2024-1945-RC2
- AC2: 'Reply on RC2', Maik Heistermann, 06 Sep 2024
Model code and software
The RainNet2024 family of deep neural networks for precipitation nowcasting, Georgy Ayzel and Maik Heistermann, https://doi.org/10.5281/zenodo.12547127