Using synthetic case studies to explore the spread and calibration of ensemble atmospheric dispersion forecasts

Jones, Andrew Richard; Leadbetter, Susan J.; Hort, Matthew C.

doi:https://doi.org/10.5194/egusphere-2023-628

17 Apr 2023

Abstract. Ensemble predictions of atmospheric dispersion that account for the meteorological uncertainties in a weather forecast are constructed by propagating the individual members of an ensemble numerical weather prediction forecast through an atmospheric dispersion model. Two event scenarios involving hypothetical atmospheric releases are considered: a near-surface radiological release from a nuclear power plant accident, and a large eruption of an Icelandic volcano releasing volcanic ash into the upper air. Simulations were run twice daily in real time over a four-month period to create a large data set of cases for this study. Performance of the ensemble predictions is measured against retrospective simulations using analysed meteorological fields. The focus of this paper is on comparing the spread of the ensemble members against forecast errors and on the calibration of probabilistic forecasts derived from the ensemble distribution.
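The comparison of ensemble spread against forecast error can be illustrated with a minimal sketch. It assumes forecasts stored as an array of shape (members, points) and a verifying "analysis" array of shape (points,). The function name `spread_and_error` and the synthetic Gaussian data are hypothetical and are not taken from the paper, whose exact definitions may differ.

```python
import numpy as np

def spread_and_error(ens, truth):
    """Return (RMS ensemble spread, RMSE of the ensemble mean).

    ens   : array of shape (n_members, n_points), one row per member
    truth : array of shape (n_points,), the verifying analysis values
    """
    ens = np.asarray(ens, dtype=float)
    truth = np.asarray(truth, dtype=float)
    ens_mean = ens.mean(axis=0)
    # Unbiased member variance at each point, averaged before the square root
    spread = np.sqrt(ens.var(axis=0, ddof=1).mean())
    rmse = np.sqrt(np.mean((ens_mean - truth) ** 2))
    return spread, rmse

# Synthetic illustration only (assumption: Gaussian errors, 18 members)
rng = np.random.default_rng(0)
n_members, n_points = 18, 20000
centre = rng.normal(size=n_points)
truth = centre + rng.normal(size=n_points)
# Members drawn from the same distribution as the truth (consistent ensemble)
ens_ok = centre + rng.normal(size=(n_members, n_points))
# Members with half the required standard deviation (under-spread ensemble)
ens_under = centre + 0.5 * rng.normal(size=(n_members, n_points))

for label, ens in [("consistent", ens_ok), ("under-spread", ens_under)]:
    spread, rmse = spread_and_error(ens, truth)
    print(f"{label:>12}: spread/RMSE = {spread / rmse:.2f}")
```

For a statistically consistent ensemble of M members, the ratio is expected to lie close to sqrt(M/(M+1)); values well below that indicate an under-spread (over-confident) ensemble.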

Results show good overall performance by the dispersion ensembles in both studies, but with simulations for the upper air ash release generally performing better than those for the near-surface release of radiological material. The near-surface results demonstrate a sensitivity to the release location, with good performance in areas dominated by the synoptic-scale meteorology and generally poorer performance at some other sites where, we speculate, the global-scale meteorological ensemble used in this study has difficulty in adequately capturing the uncertainty from local and regional scale influences on the boundary layer. The ensemble tends to be under-spread, or over-confident, for the radiological case in general, especially at earlier forecast steps. The limited ensemble size of 18 members may also affect its ability to fully resolve peak values or adequately sample outlier regions. Probability forecasts of threshold exceedances show a reasonable degree of calibration, though the over-confident nature of the ensemble means that it tends to be too keen on using the extreme forecast probabilities.
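As a rough illustration of how probability forecasts of threshold exceedance can be derived from an ensemble, the sketch below takes the fraction of members above a threshold at each point. The function name, the array layout and the toy values are assumptions made for illustration; the thresholds used in the paper are those of its Table 1 and are not reproduced here.

```python
import numpy as np

def exceedance_probability(ens, threshold):
    """Fraction of ensemble members exceeding `threshold` at each point.

    ens : array of shape (n_members, n_points)
    """
    ens = np.asarray(ens, dtype=float)
    return (ens > threshold).mean(axis=0)

# Toy data (hypothetical values): 18 members at 3 grid points
ens = np.zeros((18, 3))
ens[:, 0] = 2.0    # every member exceeds the threshold at point 0
ens[:9, 1] = 2.0   # half of the members exceed it at point 1
prob = exceedance_probability(ens, threshold=1.0)
print(prob)
```

With 18 members the forecast probability can only take the values 0, 1/18, ..., 1, which is one way in which a limited ensemble size restricts how finely outlier regions can be sampled.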

Ensemble forecasts for the volcanic ash study demonstrate an appropriate degree of spread and are generally well-calibrated, particularly for ash concentration forecasts in the troposphere. The ensemble is slightly over-spread, or under-confident, within the troposphere at the first output time step T+6, thought to be attributable to a known deficiency in the ensemble perturbation scheme in use at the time of this study, but improves with probability forecasts becoming well-calibrated here by the end of the period. Conversely, an increasing tendency towards over-confident forecasts is seen in the stratosphere, which again mirrors an expectation for ensemble spread to fall away at higher altitudes in the met ensemble. Results in the volcanic ash case are also broadly similar between the three different eruption scenarios considered in the study, suggesting that good ensemble performance might apply to a wide range of eruptions with different heights and mass eruption rates.
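Calibration of probability forecasts is commonly checked by comparing each forecast probability with the frequency at which the event was actually observed. The sketch below builds such a reliability table under the assumption that probabilities come from an 18-member ensemble, so they fall on multiples of 1/18. The function name and the synthetic data are hypothetical, and this is not necessarily how the attribute diagrams in the paper were constructed.

```python
import numpy as np

def reliability_table(prob, occurred, n_members=18):
    """Observed event frequency for each distinct forecast probability.

    prob     : forecast probabilities, assumed to be multiples of 1/n_members
    occurred : boolean array, True where the event occurred in the analysis
    Returns (forecast_prob, obs_freq, counts); obs_freq is NaN for empty bins.
    """
    prob = np.asarray(prob, dtype=float)
    occurred = np.asarray(occurred, dtype=float)
    # Number of members that forecast the event, recovered from the probability
    k = np.rint(prob * n_members).astype(int)
    counts = np.bincount(k, minlength=n_members + 1)
    hits = np.bincount(k, weights=occurred, minlength=n_members + 1)
    obs_freq = np.full(n_members + 1, np.nan)
    nonempty = counts > 0
    obs_freq[nonempty] = hits[nonempty] / counts[nonempty]
    forecast_prob = np.arange(n_members + 1) / n_members
    return forecast_prob, obs_freq, counts

# Synthetic, perfectly calibrated forecasts (illustration only)
rng = np.random.default_rng(1)
n = 200_000
prob = rng.integers(0, 19, size=n) / 18
occurred = rng.random(n) < prob
forecast_prob, obs_freq, counts = reliability_table(prob, occurred)
```

For well-calibrated forecasts, `obs_freq` follows `forecast_prob`. An over-confident ensemble would instead show observed frequencies above the forecast value at low probabilities and below it at high probabilities.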

Received: 31 Mar 2023 – Discussion started: 17 Apr 2023

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

09 Oct 2023

Using synthetic case studies to explore the spread and calibration of ensemble atmospheric dispersion forecasts

Andrew R. Jones, Susan J. Leadbetter, and Matthew C. Hort

Atmos. Chem. Phys., 23, 12477–12503, https://doi.org/10.5194/acp-23-12477-2023, 2023

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2023-628', Slawomir Potempski, 08 May 2023

General comments
The paper deals with the propagation of meteorological forecast uncertainty through an atmospheric dispersion model. An ensemble prediction system with 18 forecast members from MOGREPS-G has been used to drive atmospheric dispersion simulations with the NAME model, so the final output is an ensemble of atmospheric dispersion predictions. Investigating the spread and calibration of this ensemble is one of the main purposes of this work. Two main hypothetical scenarios have been investigated: a low-elevation radiological release for 12 selected sites in Europe, and 3 volcanic ash releases at high or even very high elevation. Very extensive simulations covering a period of 5 months, with two releases daily for both scenarios, have been performed. As a result, a huge data set has been produced, giving sound ground for any statistical analysis. The setup of such an experiment is highly appreciated and can be recommended for in-depth analysis of the behaviour of any atmospheric dispersion ensemble system, in particular those used in operational mode. The final aim should be to estimate the uncertainty of atmospheric dispersion modelling for various meteorological conditions. In this respect, a comparison with other models and real measurements will also be necessary at some stage, but proper calibration of the ensemble is one of the key factors, which is why the authors concentrate on the analysis of spread and calibration. However, it would probably be worth putting the work into a somewhat broader context, so that the reader can better understand the whole process of uncertainty analysis and the complexity of this problem, all the more so because a number of works analysing various types of ensembles, from both theoretical and practical points of view, have already been published.
It should also be added that the value of such extensive calculations, which produce a large data set, is that various analyses can be performed, for example by comparing the results for different places or under different meteorological conditions.

Specific comments
1. One of the basic questions about the presented methodology is whether 18 members are enough to produce sufficient statistics to cover the range of possible results of interest. It seems that there are situations when this is not the case, and the authors are aware that either more ensemble members would be needed or other models could be applied. ECMWF produces a large forecast ensemble that could be used to drive atmospheric dispersion calculations, although this would be very time-consuming. The other possibility is to produce a multi-model ensemble, which usually has a bigger spread than an ensemble based on one dispersion model. In fact, many articles dealing with these issues have already been published.
2. Table 1 contains the thresholds used for both scenarios. Obviously, for an operational system, it would be best if these thresholds reflected criteria used operationally. For the radiological scenario, most criteria are based on doses; however, in some countries, such as Austria, time-integrated concentration and deposition are also used. For example, some agricultural countermeasures can be implemented if the time-integrated concentration of Cs-137 exceeds 350 Bq*s/m3 or deposition is higher than 650 Bq/m2 (for iodine I-131 the values are 170 Bq*s/m3 and 700 Bq/m2, respectively). The thresholds shown in Table 1 are much higher, but this is obviously an arbitrary choice of the modellers.
3. The authors use quite simple indicators (rank histogram, attribute diagram, spread-error relation), but these seem to be mostly sufficient. On the other hand, it would be convenient to see the values in a table (ensemble spread vs. error in the ensemble mean) to show how the results change over time. Some additional indicators could also be considered, such as a factor of 2 for the spread-error diagram.
4. The presentation of rank maps with two colour sections is appreciated. However, the reader should be warned against too simple an interpretation of these maps. The fact that the ensemble system predicts areas where the "real plume" (i.e. from the analysis) is not present does not mean that the ensemble gave a bad prognosis. If the ensemble shows a low probability for such areas, that is fine; otherwise, one can say that the prognosis was not very accurate. The role of the ensemble is to predict areas where the plume can, but need not necessarily, appear.
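The rank histogram mentioned among the indicators above can be sketched as follows. This is a minimal illustration under the assumption of continuous values with no ties; dispersion fields contain many zeros, so a real implementation would need a tie-breaking rule. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def rank_histogram(ens, truth):
    """Count how often the truth falls at each rank among the members.

    ens   : array of shape (n_members, n_points)
    truth : array of shape (n_points,)
    Rank 0 means the truth lies below all members; rank n_members means
    it lies above all of them. Ties are not handled (assumption).
    """
    ens = np.asarray(ens, dtype=float)
    truth = np.asarray(truth, dtype=float)
    n_members = ens.shape[0]
    ranks = (ens < truth).sum(axis=0)
    return np.bincount(ranks, minlength=n_members + 1)

# Synthetic illustration only
rng = np.random.default_rng(2)
n_members, n_points = 18, 19000
centre = rng.normal(size=n_points)
truth = centre + rng.normal(size=n_points)
# Members exchangeable with the truth: approximately flat histogram
hist_flat = rank_histogram(centre + rng.normal(size=(n_members, n_points)), truth)
# Members with too little variability: U-shaped histogram
hist_u = rank_histogram(centre + 0.5 * rng.normal(size=(n_members, n_points)), truth)
```

A flat histogram is consistent with a well-spread ensemble, while heavily populated outer bins indicate that the truth often falls outside the ensemble range, as expected for an under-spread ensemble.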

Technical corrections
The main comment is a request to include mathematical formulas for the quantities used in the article, firstly to avoid any ambiguity and secondly simply for the reader's convenience. This also concerns how the figures have been constructed.
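For orientation, commonly used textbook definitions of the quantities discussed in this review are sketched below. They are assumptions for illustration and not necessarily the definitions adopted by the authors. Here $x_{m,i}$ denotes the value forecast by member $m = 1, \dots, M$ at verification point $i = 1, \dots, N$, $y_i$ is the corresponding value from the analysis simulation, and $c$ is a threshold.

```latex
\begin{align}
  \bar{x}_i &= \frac{1}{M} \sum_{m=1}^{M} x_{m,i}
    && \text{(ensemble mean)} \\
  S &= \sqrt{\frac{1}{N} \sum_{i=1}^{N} \frac{1}{M-1}
       \sum_{m=1}^{M} \left( x_{m,i} - \bar{x}_i \right)^2}
    && \text{(ensemble spread)} \\
  \mathrm{RMSE} &= \sqrt{\frac{1}{N} \sum_{i=1}^{N}
       \left( \bar{x}_i - y_i \right)^2}
    && \text{(error in the ensemble mean)} \\
  p_i(c) &= \frac{1}{M} \sum_{m=1}^{M} \mathbf{1}\!\left[ x_{m,i} > c \right]
    && \text{(exceedance probability)} \\
  r_i &= \sum_{m=1}^{M} \mathbf{1}\!\left[ x_{m,i} < y_i \right]
       \in \{0, \dots, M\}
    && \text{(rank of the analysis value)}
\end{align}
```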

Citation: https://doi.org/10.5194/egusphere-2023-628-RC1
RC2: 'Comment on egusphere-2023-628', Anonymous Referee #2, 09 Jun 2023

Review of "Using synthetic case studies to explore the spread and calibration of ensemble atmospheric dispersion forecasts" by Jones et al. (2023)

Synopsis:
In this study, the performance of an ensemble of dispersion model forecasts based on an ensemble NWP system (MOGREPS) is evaluated. The ensemble forecasts are generated from hypothetical (radiological and volcanic) emissions at different locations within northwest Europe over a period of several months. The forecasted air concentrations and deposited masses from the ensemble system are evaluated against corresponding quantities obtained by running the dispersion model with a sequence of NWP analyses obtained from a high-resolution NWP model. The results indicate that the MOGREPS ensemble is generally under-spread near the surface and in the stratosphere. However, in the troposphere, the ensemble spread better matches the forecast error and the forecast probabilities appear to be well calibrated, matching observed frequencies quite well at lead times greater than about 6 h.

General comments:
I have no major criticisms about this study. The methodology appears to be sound, and the results are generally in line with expectations given the characteristics of the MOGREPS ensemble. The use of concentrations obtained from "analysed" NWP fields (essentially the fields obtained from NWP data assimilation) as "truth" is a good idea that averts the problem of finding high-quality observations of atmospheric pollutants such as volcanic ash in sufficient quantity, which can be a very difficult problem in practice. Having said that, verification against observed ash, even with limited data, would strengthen this paper. Another suggestion for the authors is to show a comparison of the ensemble mean RMSE and the control member RMSE scores, as well as the corresponding RPS/CRPS values. This would better highlight the value of the ensemble approach over the deterministic approach and would enable the reader to judge whether the deficiencies of the ensemble near the surface or at high altitude are severe enough to make the additional computational cost of the ensemble unjustifiable for particular applications.
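The suggested comparison between the ensemble and a single control forecast could be set up along the following lines. The sketch uses the standard sample estimator of the CRPS for an ensemble, CRPS = E|X - y| - 0.5 E|X - X'|, and treats the first member as a stand-in for the control run. That choice, the function name and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def crps_ensemble(ens, truth):
    """Sample CRPS at each point: E|X - y| - 0.5 * E|X - X'|.

    ens   : array of shape (n_members, n_points)
    truth : array of shape (n_points,)
    """
    ens = np.asarray(ens, dtype=float)
    truth = np.asarray(truth, dtype=float)
    # Mean absolute difference between each member and the truth
    term1 = np.abs(ens - truth).mean(axis=0)
    # Mean absolute difference over all member pairs (including i == j)
    term2 = np.abs(ens[:, None, :] - ens[None, :, :]).mean(axis=(0, 1))
    return term1 - 0.5 * term2

# Synthetic illustration only (assumption: Gaussian errors, 18 members)
rng = np.random.default_rng(3)
n_members, n_points = 18, 5000
centre = rng.normal(size=n_points)
truth = centre + rng.normal(size=n_points)
ens = centre + rng.normal(size=(n_members, n_points))
control = ens[0]  # stand-in for a deterministic control forecast

crps = crps_ensemble(ens, truth)
mae_control = np.abs(control - truth).mean()
rmse_mean = np.sqrt(np.mean((ens.mean(axis=0) - truth) ** 2))
rmse_control = np.sqrt(np.mean((control - truth) ** 2))
print(f"mean CRPS = {crps.mean():.3f}, control MAE = {mae_control:.3f}")
print(f"RMSE ensemble mean = {rmse_mean:.3f}, RMSE control = {rmse_control:.3f}")
```

For a single deterministic forecast the CRPS reduces to the absolute error, so the mean CRPS of the ensemble can be compared directly with the mean absolute error of the control.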

Specific comments
Line 6: "Performance of the ensemble predictions is measured against retrospective simulations using analysed meteorological fields". I think something like this would make the methodology clearer for readers not familiar with NWP jargon: "Performance of the ensemble predictions is measured against retrospective simulations using a sequence of meteorological fields analysed against observations".
Line 61: (related to comment above) This is an opportunity to clarify the meaning of "analysed" meteorological fields.
Figure 5(b): Clarify what "#points in bin" means and fix the number layout if possible.
Figure 5(c): Clarify that colour scheme matches labels in 5(d).
Line 688: "met" -> "meteorological".

Citation: https://doi.org/10.5194/egusphere-2023-628-RC2
AC1: 'Author Comment on egusphere-2023-628', Andrew Jones, 18 Aug 2023

Attaching Author Comment in response to Referee Comments.

Citation: https://doi.org/10.5194/egusphere-2023-628-AC1

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Andrew Jones on behalf of the Authors (21 Aug 2023): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (28 Aug 2023) by Stefano Galmarini

AR by Andrew Jones on behalf of the Authors (30 Aug 2023)

Supplement

https://doi.org/10.5194/egusphere-2023-628-supplement

Data sets

Hypothetical ensemble dispersion model runs with statistical verification, S. Leadbetter and A. Jones, https://doi.org/10.5281/zenodo.4770066


Short summary

The paper explores spread and calibration properties of ensemble atmospheric dispersion forecasts for hypothetical release events. Real-time forecasts from an ensemble weather prediction system were used to generate an ensemble of dispersion predictions and assessed against simulations produced using analysis meteorology. Results demonstrate good performance overall, but highlight more skilful predictions for material released in the upper air compared with releases near to the surface.

