This work is distributed under the Creative Commons Attribution 4.0 License.
Exploring extreme event attribution by using long-running meteorological observations
Abstract. Despite a growing interest in extreme event attribution, attributing individual weather events remains difficult and uncertain. We have explored extreme event attribution by comparing a widely adopted method for probabilistic extreme event attribution to a more analogue approach utilising the extensive and long-running network of meteorological observations available in Sweden. The long observational records enabled us to calculate the change in probability for two recent extreme events in Sweden without relying on the correlation to the global mean surface temperature, as is usually done in the reference method. Our results indicate that the two methods generally agree on the sign of attribution for an event based on daily maximum temperatures. However, the reference method results in a weaker indication of attribution compared to the observations, where 12 out of 15 stations indicate a stronger attribution than found by the reference method. On the other hand, for a recent extreme precipitation event, the reference method results in a stronger indication of attribution compared to the observations. For this event, only two out of ten stations exhibited results similar to the reference method.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint
Our results show that while both methods lead to similar conclusions for two recent weather events in Sweden, the commonly used method risks underestimating the strength of the connection between the event and changes to the climate.
Interactive discussion
Status: closed

RC1: 'Comment on egusphere-2023-2879', Vikki Thompson, 10 Jan 2024
Exploring extreme event attribution by using long-running meteorological observations
This paper assesses two methods of event attribution for two events in Sweden – a hot summer and an intense rainfall event. It is shown that the two different methods agree reasonably for the temperature event, but show more disagreement for rainfall.
I found the introduction and methods clearly written and enjoyable to read, with good background literature (although some perhaps less relevant to this particular study). The methods are clearly explained and the results from the second method well presented – but I struggled to identify the results from the first method, or a clear comparison between methods.
General comments:
Clearly labelling the two methods from the outset would be useful – the header of 2.2 is misleading, as observations are also used in the first method. Something like ‘GMST adjusted method’ and ‘using pre-industrial observations’ would be more accurate.
In order to provide a comparison of the two methods a more thorough presentation of the results of the GMST adjusted method is needed. The GMST adjusted method could be applied to the same datasets as used in the pre-industrial observations data – I am not clear if it is.
The study presents the use of pre-industrial data for attribution as a good alternative to the GMST adjusted method, without full discussion of possible problems with the method. Greater emphasis on possible downfalls of the pre-industrial observational data would be useful. One major advantage of using a shorter observational record with GMST is that the data needed is available for more locations and variables globally. Although long observational records are available in Sweden, there are many parts of the world where this is not the case – this should be better highlighted.
Specific comments:
Title – doesn't capture the content, too vague
Abstract – ‘analogue approach’: this term is widely used for a different method using dynamical analogues (e.g. Climameter). Perhaps adding ‘statistical’ would make it clearer what you are doing (see also comment above about labelling the two methods).
Paragraph at line 40 could come sooner in the introduction (around line 23) as the two paragraphs either side flow better together (and have some repetition in the local-global responses).
Lines 25-30 perhaps irrelevant to this study as physical processes are not covered in this statistical assessment.
Line 18 – what types of events?
Fig. 1 – I find this a little unclear: is p0 more likely hot than p1?
To include the event in question or not?
Why days greater than 25 °C, not just Tmax?
Fig. 3 caption typo in dates (1882-1992) – and Fig. 6
Fig. 6 – no stations have at least one year missing 15% of days (none have the cross)? Is that correct?
e.g. line 160, Fig. A3: interchanging use of historical and pre-industrial for the 1882-1911 period – I think it would be clearer to use pre-industrial throughout as historical could mean any past period (I think sometimes you use it to refer to the full historical/observational record).
Paragraph at line 88 could be shortened, as the methods described are not those used in this study – perhaps it would be better to start with paragraph at line 97 stating what is done in this study, then mention that there are other methods used elsewhere.
Line 112 – data for this study / event definition, a subheader would be useful here.
Header 2.2 – observations are used in the first method too
Section 2.3 – the climatic indicators have already been mentioned in the section above, maybe this should go into an event definition section – which perhaps could be section 2.1, before the two methods.
Citation: https://doi.org/10.5194/egusphere-2023-2879-RC1
AC1: 'Reply on RC1', Erik Holmgren, 20 Mar 2024

RC2: 'Comment on egusphere-2023-2879', Clair Barnes, 15 Feb 2024
For two classes of weather events – the frequency of extremely hot days and the maximum 1-day precipitation accumulation – this paper compares the estimated fraction of risk attributable to climate change (FAR) using two approaches to event attribution: first by estimating exceedance probabilities in 30-year time slices representing the factual and counterfactual climates, which are assumed to be stationary, at individual stations; and then using a non-stationary trend fitted to a spatial average computed from gridded data products. Both methods are found to produce relatively similar results for the FAR of the number of very hot days, while the FAR for extreme precipitation is found to be somewhat variable, particularly in the station data.
The paper is clearly written, and the discussion around potential homogeneity issues in the station observations is a useful and important one. However, it’s not entirely clear to me what the purpose of the comparison is here, or what the overall conclusions should be. This is perhaps because one method is used with station data, and another with gridded data, so it’s hard to understand whether differences in the results arise from the dataset or the method used: I think this could be a really useful comparison if both methods were used with both station and gridded data.
General comments:

The non-stationary method used here seems to only use 30 years of recent data to estimate the covariate β describing the strength of the relationship between the extreme and GMST (lines 114-115). This is a very short time series: usually in WWA studies we would use as much data as possible to estimate this parameter, partly because a large sample size is usually needed to get stable estimates of the model parameters and partly to reduce the risk of conflating the GMST trend with decadal variability. I would suggest using longer time series to fit the trends, which would give a really useful and interesting comparison of whether the linear regression really captures the changes between the two snapshots. If that’s not possible due to data availability, you should highlight that only 30 years of data were used to estimate the non-stationary model parameters, and discuss what the implications might be.
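The reviewer's sample-size point can be made concrete with a back-of-envelope sketch (the noise level and covariate here are hypothetical, not the paper's data): the standard error of an OLS slope shrinks roughly as 1/sqrt(n), so a century-long record constrains the GMST coefficient about twice as tightly as a 30-year slice.

```python
import numpy as np

def slope_se(n, noise_sd=1.0):
    """Approximate std. error of an OLS slope fitted to n annual values.

    x is a stand-in covariate (e.g. smoothed GMST anomaly) on a fixed
    range; noise_sd is hypothetical interannual scatter around the trend.
    """
    x = np.linspace(0.0, 1.0, n)
    # exact OLS result: se(beta) = sigma / sqrt(sum((x - mean(x))**2))
    return noise_sd / (x.std() * np.sqrt(n))

print(slope_se(30))   # ~0.61
print(slope_se(120))  # ~0.31: roughly twice as tight
```

The ~1/sqrt(n) shrinkage is the optimistic case; with decadal variability in the residuals the short-record estimate degrades further, which is the reviewer's conflation concern.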

A GEV or Gumbel distribution is used to model block maxima/minima: there’s no theoretical basis on which to use them to model txge25, which is a count variable. To simplify the statistical modelling, I’d suggest looking at maximum temperatures instead; if that’s not feasible, you could try fitting a nonstationary Gaussian distribution to the log of the counts.
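The fallback the reviewer suggests, a non-stationary Gaussian fitted to the log of the counts, might be sketched as follows (the data and all parameter values here are synthetic, purely illustrative, not taken from the paper):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Synthetic illustration: annual counts of hot days whose expected
# value grows with a smoothed GMST anomaly (hypothetical coefficients).
gmst = np.linspace(0.0, 1.0, 60)
counts = rng.poisson(np.exp(1.5 + 0.8 * gmst))

y = np.log(counts + 1.0)  # +1 guards against log(0) in low-count years

def nll(theta):
    """Negative log-likelihood: Gaussian log-counts, mean shifts with GMST."""
    mu0, beta, log_sigma = theta
    return -norm.logpdf(y, loc=mu0 + beta * gmst,
                        scale=np.exp(log_sigma)).sum()

res = minimize(nll, x0=[y.mean(), 0.0, np.log(y.std())],
               method="Nelder-Mead")
mu0_hat, beta_hat, log_sigma_hat = res.x
```

Parameterising the scale as exp(log_sigma) keeps it positive without needing a constrained optimiser; the fitted beta_hat is the non-stationary shift on the log scale.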

My understanding is that, since p_0 is non-negative and p_1 is positive, the FAR can never be greater than 1 (equation 2): however, in both Figures 5 and 7 it looks as though FARs above 1 occur, although the axes are truncated at 1 so it’s hard to see. Please check this, and also modify the axes so that the upper bounds of the confidence intervals are visible.
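The bound the reviewer invokes follows directly from the definition FAR = 1 - p_0/p_1: with p_0 >= 0 and p_1 > 0 the ratio is non-negative, so FAR cannot exceed 1 (it can, however, be arbitrarily negative). A minimal helper, purely to make the arithmetic concrete:

```python
def far(p0, p1):
    """Fraction of attributable risk, FAR = 1 - p0 / p1.

    p0 and p1 are exceedance probabilities in the counterfactual and
    factual climates; since p0 >= 0 and p1 > 0, FAR can never exceed 1.
    """
    if p0 < 0 or p1 <= 0:
        raise ValueError("need p0 >= 0 and p1 > 0")
    return 1.0 - p0 / p1

print(far(0.01, 0.05))  # 0.8: 80% of such events attributable
print(far(0.05, 0.01))  # about -4: below -1 is possible, above +1 is not
```

Any plotted FAR above 1 therefore points to a bug or an extra rescaling step rather than a property of the estimator.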
Specific comments:
Abstract: I find the terminology here a little vague: it’s not clear what is meant by ‘the reference method’ and, since both methods use observations of some sort, this doesn’t help to understand which is which. It would be useful to add a line explaining that the ‘widely adopted’ method uses a transient/non-stationary model, and rather than referring to the ‘analogue approach’ (which is becoming synonymous with another method), I would perhaps refer to a factual/counterfactual comparison.
46, 52: The reader doesn’t know what ‘the reference method’ is yet, or ‘shifting and scaling’ – this needs some introduction.
62-63 & 67-68: The rapid attribution method could also be used on the long-running meteorological observations, so I think it would be useful to distinguish more clearly between the two methods: maybe ‘we will also perform an analysis based on directly comparing the current and pre-industrial periods in data from several stations with long observational records’. Also some repetition here, so 61-63 could be removed altogether.
77: This should be more precisely defined: p_1 and p_0 are the probabilities of observing an event of equal or greater magnitude than some threshold value in the factual (current) and counterfactual (pre-industrial) climates (‘exceedance probabilities’).
80. I found this a bit unclear – maybe ‘FAR describes the proportion of events of the same (or greater) magnitude that can be attributed to the forced change’?
84. Change to ‘The exceedance probability’
88-96. I don’t think I’ve seen examples of climate models being used to estimate p_0, although they are certainly used to estimate probability ratios. I’d suggest moving this paragraph to the description of the datasets.
101-102. Not all distributions have these three parameters: to make this more general, I’d remove this line and simply say that ‘the mean \mu’ is shifted following…
105. ‘\mu and the standard deviation \sigma are…’
112-128. This breaks up the flow a bit – I’d move this (and maybe 88-96) into a separate subsection on datasets.
Figure 1. This was quite hard to read in black & white, could you change the colour scheme to something more colourblind-friendly?
129-130. The WWA approach outlined in Philip et al. (2020) uses maximum likelihood estimation to estimate the parameters of a non-stationary GEV distribution directly from (3-5), rather than first fitting a linear regression to estimate the trend and then estimating the parameters of a stationary GEV separately. I wouldn’t expect this to make much difference to the overall conclusions but this should be checked and commented on – you can fit the non-stationary GEV distributions using the online Climate Explorer tool provided by KNMI (first upload the time series, then choose the ‘trends in return times of extremes’ option).
It would also be useful to be clearer about which time period was used for the regression and parameter estimation – and, if only 30 years is used, this would be a good opportunity to discuss the implications of using a relatively short time series.
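The direct MLE fit of a non-stationary GEV that the reviewer describes can be sketched roughly as below (synthetic data and hypothetical coefficients; in practice Climate Explorer or a dedicated extremes package would be used):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

rng = np.random.default_rng(1)

# Synthetic illustration (not the paper's data): annual maxima whose
# location shifts linearly with a smoothed GMST anomaly.
gmst = np.linspace(0.0, 1.2, 80)
# NB: scipy's shape parameter c equals minus the climate convention's xi.
tmax = genextreme.rvs(c=0.1, loc=30.0 + 1.5 * gmst, scale=1.0,
                      random_state=rng)

def nll(theta):
    """Negative log-likelihood of a GEV whose location tracks GMST."""
    mu0, beta, log_sigma, c = theta
    return -genextreme.logpdf(tmax, c, loc=mu0 + beta * gmst,
                              scale=np.exp(log_sigma)).sum()

res = minimize(nll, x0=[tmax.mean(), 0.0, np.log(tmax.std()), 0.1],
               method="Nelder-Mead")
mu0_hat, beta_hat, log_sigma_hat, c_hat = res.x
```

Fitting all four parameters jointly, rather than regressing first and fitting a stationary GEV to residuals, propagates the trend uncertainty into the shape and scale estimates, which is the substance of the reviewer's point.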
131. How was this 95% interval estimated?
141-142. Does this mean that the spread of all members was used to determine the confidence bounds? How was this done – was a parametric distribution used, or order statistics?
158-159. Why was stationarity checked, and over which period? I can see the advantage of checking that each of the time periods studied could be treated as locally stationary, but as written, this could be read as suggesting that the full series was found to be stationary.
175-177. You could add a line to explain why this is: the GEV with negative shape parameter has a finite upper bound, which can lead to observed events becoming theoretically impossible in the shifted/scaled distribution. The Gumbel distribution, which has its shape parameter fixed at zero, has no upper limit and so does not exhibit this behaviour.
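The bounded upper tail the reviewer mentions can be verified in a couple of lines with scipy (note that scipy parameterises the GEV shape as c = -xi, so the climate convention's 'negative shape' corresponds to c > 0 here):

```python
from scipy.stats import genextreme, gumbel_r

# GEV with climate-convention shape xi = -0.2 (scipy c = +0.2):
bounded = genextreme(c=0.2, loc=0.0, scale=1.0)
print(bounded.support())    # (-inf, 5.0): finite upper bound at loc + scale/c

# Gumbel (xi = 0) has no upper limit, so shifted events stay possible:
unbounded = gumbel_r(loc=0.0, scale=1.0)
print(unbounded.support())  # (-inf, inf)
```

An observation above the fitted upper bound has zero likelihood, which is exactly why shifting/scaling a bounded GEV can render the observed event 'impossible' while the Gumbel cannot.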
178-179. As noted above, a Gumbel distribution isn’t theoretically justified for count data like txge25.
189. Please add a line interpreting this FAR in terms of the number of hot days.
191. How are percentiles of the FAR computed? Also, this notation is slightly confusing, because p_0 and p_1 have already been used to denote exceedance probabilities. Percentiles could perhaps be relabelled as Q_5.
Figure 4. I would expect the size of the circles to represent a range of values – it's not clear exactly what they refer to here.
213-214. Why are P_5 and P_25 given here, rather than an upper and lower limit?
Figure 5 & 7. I don’t understand how the FAR is greater than one in some of these cases – perhaps some additional scaling has been applied? Please extend the x-axis to show the upper bounds of the confidence intervals. It would also be useful to add a vertical line at 0, highlighting the critical threshold for evidence of an effect.
236-240. You could also discuss the fact that observations of precipitation are typically more variable than observations of temperatures; and that gridded data, by its very nature, will not tend to contain such extreme values as a single station, which may result in a better constrained distribution. When trying to fit a distribution to only 30 years of data we don’t really expect to get an accurate estimate of the return level of any events with a return period of greater than 30 years (or, conversely, an accurate estimate of the return period of particularly extreme events): this may also lead to inflated estimates of the return period, which can in turn make the PR and FAR estimates unstable.
251-258. I think this could fit better in section 2.3, where the climate indicators are introduced.
259-262. This discussion of spatial variability in the trend is really interesting and could be referred to in the discussion of variation between the stations – I’d like to see Figure A1 in the main text, perhaps with the station regression coefficients overlaid so that the similarities/differences between the gridded product and the stations are really clear.
292. You could mention that attribution of extreme precipitation events is known to be sensitive to the event definition, both in terms of the spatial domain and the duration of the event.
294-295. I think that most studies would try to use homogenised data, where available: you could frame this instead as highlighting the importance of using homogenised data.
296-299. The conclusions concerning the two different methods are rather weak, perhaps because it was never very clear what the purpose of the comparison actually is. Gridded datasets offer an invaluable opportunity to examine spatial variability in trends and FAR over a whole region, but should be validated against station data if possible to ensure that they are locally accurate. However, it’s hard to get a sense of their relative merits here because two different methods have also been used, so there’s very little common ground for comparison.
299. CC-scaling has not yet been defined.
Figure A7/A8. I don’t quite understand what these figures show. Is it the case that the upper bar shows the FAR computed from p_0 and p_1 computed from stationary distributions corresponding to the historical and current periods; while the second bar (shaded) shows the FAR based on a linear regression estimated over the ‘current’ climate only? Given that the shorter time periods have been tested for stationarity, and no trend signal could be detected, it’s not surprising that the confidence intervals of the hatched bars all include zero. A fairer and more useful comparison would be to estimate the regression coefficients over the whole period, to see whether the regression model adequately captures the observed difference between the two 30-year slices.
Citation: https://doi.org/10.5194/egusphere-2023-2879-RC2
AC2: 'Reply on RC2', Erik Holmgren, 20 Mar 2024


Peer review completion
Erik Kjellström