This work is distributed under the Creative Commons Attribution 4.0 License.
Using synthetic case studies to explore the spread and calibration of ensemble atmospheric dispersion forecasts
Abstract. Ensemble predictions of atmospheric dispersion that account for the meteorological uncertainties in a weather forecast are constructed by propagating the individual members of an ensemble numerical weather prediction forecast through an atmospheric dispersion model. Two event scenarios involving hypothetical atmospheric releases are considered: a near-surface radiological release from a nuclear power plant accident, and a large eruption of an Icelandic volcano releasing volcanic ash into the upper air. Simulations were run twice daily in real time over a four-month period to create a large data set of cases for this study. Performance of the ensemble predictions is measured against retrospective simulations using a sequence of meteorological fields analysed against observations. The focus of this paper is on comparing the spread of the ensemble members against forecast errors and on the calibration of probabilistic forecasts derived from the ensemble distribution.
Results show good overall performance by the dispersion ensembles in both studies, but with simulations for the upper air ash release generally performing better than those for the near-surface release of radiological material. The near-surface results demonstrate a sensitivity to the release location, with good performance in areas dominated by the synoptic-scale meteorology and generally poorer performance at some other sites where, we speculate, the global-scale meteorological ensemble used in this study has difficulty in adequately capturing the uncertainty from local and regional scale influences on the boundary layer. The ensemble tends to be under-spread, or over-confident, for the radiological case in general, especially at earlier forecast steps. The limited ensemble size of 18 members may also affect its ability to fully resolve peak values or adequately sample outlier regions. Probability forecasts of threshold exceedances show a reasonable degree of calibration, though the over-confident nature of the ensemble means that it tends to be too keen on using the extreme forecast probabilities.
Ensemble forecasts for the volcanic ash study demonstrate an appropriate degree of spread and are generally well calibrated, particularly for ash concentration forecasts in the troposphere. The ensemble is slightly over-spread, or under-confident, within the troposphere at the first output time step (T+6); this is thought to be attributable to a known deficiency in the ensemble perturbation scheme in use at the time of this study, but it improves with lead time, with probability forecasts becoming well calibrated there by the end of the forecast period. Conversely, an increasing tendency towards over-confident forecasts is seen in the stratosphere, which again mirrors an expectation for ensemble spread to fall away at higher altitudes in the meteorological ensemble. Results in the volcanic ash case are also broadly similar between the three different eruption scenarios considered in the study, suggesting that good ensemble performance might apply to a wide range of eruptions with different heights and mass eruption rates.
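To make the verification idea concrete, below is a minimal sketch of how threshold-exceedance probabilities can be derived from an ensemble of dispersion fields and checked for calibration against an analysis-driven run. The array shapes, synthetic fields, and threshold are illustrative assumptions and are not values taken from the paper.

```python
# Minimal sketch: derive threshold-exceedance probabilities from an ensemble of
# dispersion fields and tally calibration counts against an "analysis" run.
# Synthetic lognormal fields and an arbitrary threshold stand in for real output.
import numpy as np

n_members, ny, nx = 18, 100, 120          # 18-member ensemble on an illustrative grid
rng = np.random.default_rng(0)
ensemble = rng.lognormal(mean=0.0, sigma=1.0, size=(n_members, ny, nx))  # e.g. air concentration
analysis = rng.lognormal(mean=0.0, sigma=1.0, size=(ny, nx))             # "truth" from analysed met

threshold = 1.0                            # illustrative exceedance threshold

# Forecast probability of exceedance = fraction of members above the threshold.
prob = (ensemble > threshold).mean(axis=0)

# Observed outcome from the analysis-driven run.
occurred = analysis > threshold

# Reliability-style tally: for each forecast-probability bin, compare the mean
# forecast probability with the observed relative frequency.
bins = np.linspace(0.0, 1.0, 11)
bin_idx = np.clip(np.digitize(prob, bins) - 1, 0, len(bins) - 2)
for b in range(len(bins) - 1):
    mask = bin_idx == b
    if mask.any():
        print(f"p_fcst~{prob[mask].mean():.2f}  obs_freq={occurred[mask].mean():.2f}  n={mask.sum()}")
```

For a well-calibrated ensemble the observed frequency in each bin should lie close to the mean forecast probability of that bin.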
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-628', Slawomir Potempski, 08 May 2023
General comments
The paper deals with the problem of propagating meteorological forecast uncertainty through an atmospheric dispersion model. An ensemble prediction system with 18 forecast members from MOGREPS-G has been used to perform atmospheric dispersion simulations with the NAME model, so the final output takes the form of an ensemble of atmospheric dispersion predictions. The investigation of the spread and calibration of this ensemble is one of the main purposes of this work. Two main hypothetical scenarios have been investigated: a low-elevation radiological release at 12 selected sites in Europe and three volcanic ash releases at high or even very high elevations. Very extensive simulations, with two releases daily for both scenarios over a period of five months, have been performed. Finally, a huge data set has been produced, giving sound ground for statistical analysis. The setup of such an experiment is highly appreciated and can be recommended for an in-depth analysis of the behaviour of any atmospheric dispersion ensemble system, in particular those used in operational mode. The final aim should be the estimation of the uncertainty of atmospheric dispersion modelling under various meteorological conditions. In this respect a comparison with other models and with real measurements will also be necessary at some stage, but proper calibration of the ensemble is one of the key factors to address first, and this is why the authors concentrate on the analysis of spread and calibration in this paper. However, it would probably be worth putting the work into a somewhat broader context, so that the reader could better understand the whole process of uncertainty analysis and the complexity of this problem, the more so as a number of works analysing various types of ensembles, from both theoretical and practical points of view, have already been published. It should also be added that the added value of such extensive calculations producing large data sets is that various further analyses can be performed, for example by comparing the results for different places or under different meteorological conditions.
Specific comments
1. One of the basic questions related to the presented methodology is whether 18 members are enough to produce sufficient statistics to cover the range of possible results of interest. It seems that there are situations where this is not the case, and the authors are aware that either more ensemble members would be needed or other models could be applied. ECMWF produces a large forecast ensemble that can be used to drive atmospheric dispersion calculations; however, this would be very time consuming. The other possibility is to produce a multi-model ensemble, which usually has a bigger spread than an ensemble based on one dispersion model. In fact, many articles dealing with these issues have already been published.
2. Table 1 contains the thresholds used for both scenarios. Obviously, in the case of an operational system, it would be best if these thresholds reflected criteria used operationally. For the radiological scenario, doses are mostly applied in the various criteria; however, in some countries, such as Austria, time-integrated concentration and deposition are also used. For example, some agricultural countermeasures can be implemented if the time-integrated concentration of Cs-137 exceeds 350 Bq s/m3 or the deposition is higher than 650 Bq/m2 (for iodine I-131 the values are 170 Bq s/m3 and 700 Bq/m2, respectively). The thresholds shown in Table 1 are much higher, but this is obviously an arbitrary choice by the modellers.
3. The authors use quite simple indicators (rank histogram, attribute diagram, spread-error relation), but it seems they are mostly sufficient. On the other hand, it would be convenient to see the values in the form of a table (ensemble spread versus the error in the ensemble mean) to see how the results change with time (a minimal sketch of such a tabulation is given after this review). Some additional indicators could also be considered, such as a factor of 2 on the spread-error diagram.
4. The presentation of the rank maps with two colour sections is appreciated. However, the reader should be warned against too simple an interpretation of these maps. The fact that the ensemble system predicts the plume in areas where the "real plume" (i.e. from the analysis) is not present does not mean that the ensemble gave a bad prognosis. If the ensemble shows a low probability for such areas it is fine; otherwise one can say that the prognosis was not very accurate. The role of the ensemble is to predict areas where the plume can, but does not necessarily, appear.
Technical corrections
The main comment is a request to include mathematical formulas for the quantities used in the article, firstly to avoid any ambiguity, and secondly simply for the reader's convenience. This also concerns the way in which the figures have been constructed.
Citation: https://doi.org/10.5194/egusphere-2023-628-RC1
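As an illustration of the tabulation suggested in specific comment 3 above, below is a minimal sketch of a spread-error table by forecast lead time. Synthetic data and a hypothetical set of output steps stand in for the 18-member dispersion ensemble and the analysis-driven run; for a statistically consistent ensemble the spread should roughly match the RMSE of the ensemble mean.

```python
# Minimal sketch of a spread-error table by forecast lead time: RMSE of the
# ensemble mean versus the mean ensemble standard deviation. Synthetic data
# stand in for the 18-member dispersion ensemble and the analysis-driven run.
import numpy as np

rng = np.random.default_rng(1)
n_members, n_points = 18, 5000
lead_times = [6, 12, 24, 48]                      # hours; illustrative output steps

print(f"{'T+ (h)':>7} {'RMSE(mean)':>11} {'spread':>8} {'ratio':>6}")
for t in lead_times:
    sigma = 0.2 * np.sqrt(t)                      # uncertainty grows with lead time
    centre = rng.normal(0.0, 1.0, n_points)       # predictable part of the field
    truth = centre + rng.normal(0.0, sigma, n_points)                  # analysis-driven run
    ensemble = centre + rng.normal(0.0, sigma, (n_members, n_points))  # perturbed members

    rmse = np.sqrt(np.mean((ensemble.mean(axis=0) - truth) ** 2))
    spread = np.mean(ensemble.std(axis=0, ddof=1))

    # For a statistically consistent ensemble, spread/RMSE should be close to 1.
    print(f"{t:>7d} {rmse:>11.3f} {spread:>8.3f} {spread / rmse:>6.2f}")
```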
RC2: 'Comment on egusphere-2023-628', Anonymous Referee #2, 09 Jun 2023
Review of "Using synthetic case studies to explore the spread and calibration of ensemble atmospheric dispersion forecasts" by Jones et al. (2023)
Synopsis:
In this study, the performance of an ensemble of dispersion model forecasts based on an ensemble NWP system (MOGREPS) is evaluated. The ensemble forecasts are generated from hypothetical (radiological and volcanic) emissions at different locations within northwest Europe over a period of several months. The forecasted air concentrations and deposited masses from the ensemble system are evaluated against corresponding quantities obtained by running the dispersion model with a sequence of NWP analyses obtained from a high-resolution NWP model. The results indicate that the MOGREPS ensemble is generally under-spread near the surface and in the stratosphere. However, in the troposphere, the ensemble spread better matches the forecast error and the forecast probabilities appear to be well calibrated, matching observed frequencies quite well at lead times greater than about 6 h.
General comments:
I have no major criticisms about this study. The methodology appears to be sound, and the results are generally in line with expectations given the characteristics of the MOGREPS ensemble. The use of concentrations obtained from "analysed" NWP fields (essentially the fields obtained from NWP data assimilation) as "truth" is a good idea that averts the problem of finding high-quality observations of atmospheric pollutants such as volcanic ash in sufficient quantity, which can be a very difficult problem in practice. Having said that, verification against observed ash, even with limited data, would strengthen this paper. Another suggestion for the authors is to show a comparison of the ensemble mean RMSE and the control member RMSE scores, as well as corresponding RPS/CRPS values. This would better highlight the value of the ensemble approach over the deterministic approach and would enable the reader to judge whether the deficiencies of the ensemble near the surface or at high altitude are severe enough to make the additional computational cost of the ensemble unjustifiable for particular applications.
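As an illustration of this suggestion, below is a minimal sketch comparing the ensemble-mean and control-member RMSE and computing an ensemble CRPS against the analysis-driven run. Synthetic fields and a hypothetical unperturbed control run are assumed; none of the values come from the paper.

```python
# Minimal sketch: RMSE of the ensemble mean vs the control member, plus the
# ensemble CRPS, all verified against the analysis-driven run. Synthetic values
# stand in for the real dispersion fields.
import numpy as np

rng = np.random.default_rng(2)
n_members, n_points = 18, 5000
centre = rng.normal(0.0, 1.0, n_points)                          # predictable signal
truth = centre + rng.normal(0.0, 0.5, n_points)                  # "analysis" run
ensemble = centre + rng.normal(0.0, 0.5, (n_members, n_points))  # 18 perturbed members
control = centre + rng.normal(0.0, 0.5, n_points)                # hypothetical unperturbed control

rmse_mean = np.sqrt(np.mean((ensemble.mean(axis=0) - truth) ** 2))
rmse_ctrl = np.sqrt(np.mean((control - truth) ** 2))

# Ensemble CRPS estimator: E|X - y| - 0.5 * E|X - X'|, averaged over grid points.
abs_err = np.mean(np.abs(ensemble - truth), axis=0)
pair_diff = np.mean(np.abs(ensemble[:, None, :] - ensemble[None, :, :]), axis=(0, 1))
crps = np.mean(abs_err - 0.5 * pair_diff)

print(f"RMSE(ensemble mean) = {rmse_mean:.3f}")
print(f"RMSE(control)       = {rmse_ctrl:.3f}")
print(f"CRPS(ensemble)      = {crps:.3f}")
```

In this synthetic setup the ensemble-mean RMSE comes out noticeably lower than the control RMSE, which is the kind of contrast the suggested comparison would make visible for the real forecasts.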
Specific comments
Line 6: "Performance of the ensemble predictions is measured against retrospective simulations using analysed meteorological fields". I think something like this would make the methodology clearer for readers not familiar with NWP jargon: "Performance of the ensemble predictions is measured against retrospective simulations using a sequence of meteorological fields analysed against observations".
Line 61: (related to comment above) This is an opportunity to clarify the meaning of "analysed" meteorological fields.
Figure 5(b): Clarify what "#points in bin" means and fix the number layout if possible.
Figure 5(c): Clarify that colour scheme matches labels in 5(d).
Line 688: "met" -> "meteorological".
Citation: https://doi.org/10.5194/egusphere-2023-628-RC2
- AC1: 'Author Comment on egusphere-2023-628', Andrew Jones, 18 Aug 2023
Data sets
Hypothetical ensemble dispersion model runs with statistical verification, S. Leadbetter and A. Jones, https://doi.org/10.5281/zenodo.4770066
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 229 | 68 | 15 | 312 | 38 | 8 | 9 |
Andrew Richard Jones
Susan J. Leadbetter
Matthew C. Hort