the Creative Commons Attribution 4.0 License.
Towards impact-based early warning of drought: a generic framework for drought impact prediction in the UK
Abstract. Drought impact forecasting is essential for enhancing preparedness and mitigation strategies. However, identifying key predictors and achieving reliable predictions remains challenging. Previous studies have shown promise in developing indicator-impact relationships and yet these are often region- and impact type-specific. Here, we utilized the European Drought Impact Inventory (EDII), and a wide range of meteorological and hydrological predictors, including the Standardized Precipitation Index (SPI), Standardized Precipitation-Evapotranspiration Index (SPEI), and soil moisture indices (SSMI), to develop a generalized forecasting framework for predicting drought impacts in the UK across different lead times. We first compared multiple machine learning models for drought impact prediction and identified Random Forest (RF) as the most effective model. Our results show that RF delivers the highest accuracy for short-term forecasts (0–3 months), with performance declining beyond six months, similar to trends observed in weather prediction models. At longer lead times, the model incorporates a broader set of predictors to maintain accuracy. Key findings highlight the importance of long-accumulation-period drought indicators, particularly SPEI24, and deep-layer soil moisture (SSMI L4), which were identified as the most influential predictors. A generalized model approach was employed, aggregating drought impacts from various regions, and the model was validated using unseen datasets from within the UK, using parts of the EDII UK dataset held back from the training, confirming its robustness. A pilot application to a completely different country (Germany) highlights the potential for extrapolation to new domains. Gridded impact predictions were also developed, and successfully captured the spatial distribution of observed impacts, and a spatially explicit evaluation showed reasonable agreement between predicted and observed drought impacts.
Although uncertainties persist, particularly for long lead times, our findings suggest that a generalized approach based on hydrometeorological indices provides an effective framework for operational drought impact forecasting, supporting early warning systems and decision-making in drought risk management.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-3176', Anonymous Referee #1, 08 Sep 2025
- RC2: 'Comment on egusphere-2025-3176', Kerstin Stahl, 28 Sep 2025
Dear Burak and colleagues,
apologies for the delay of this review and thank you for the opportunity to comment on this manuscript. I enjoyed reading about your study and provide my comments below. They are minor.
Regards
Kerstin
(Kerstin Stahl)
Summary
The manuscript reports a study that applies a range of advanced statistical models to the task of predicting drought impact occurrence under certain meteorological and soil moisture conditions in the UK. To do that, it uses training data from a text-based, category-coded impact database and indices from operational monitoring of hydrometeorological conditions. The contribution is a valuable application that tests and compares various statistical modelling options that have not previously been compared to that extent. The study also evaluates the potential operational use of this application, specifically prediction/forecasting of impacts with various lead times. The paper is generally well written and makes an important contribution. A few aspects need some improvement to provide a more focused and consistent message and therefore achieve the impact the paper deserves. They should be fairly easy to implement.
Main comments
The title is quite long. In light of potentially misleading terminology, I suggest that 'impact-based' as well as "generic" might be removed from the title. But there may also be other solutions.
My main comment relates to those two terms and the focus and consistent message of the paper. The two terms come with ambiguity and are used very differently in the literature. I think their precise use could be improved throughout the manuscript.
(1) "impact-based forecasting" is used interchangeably with "impact forecasting/prediction" in the title and text. While cited sources have used that term, some of the literature on climate impacts uses "impact-based forecasting" differently, e.g. for selecting ensemble members of a physical model based on impact information; the impact itself, however, is then not directly forecast. Strictly speaking, "impact forecasting/prediction" might therefore be more correct. At the least, the usage should be made consistent, and the introduction and discussion could address the distinction.
(2) "generic framework" suggests to me a standardized procedure for applying this in operation, as the manuscript indicates it will be, and/or a procedure that is transferable to other data and regions. This sits somewhat at odds with the rather detailed analytic comparison and analysis of several statistical models and the different forecasts, which is the main aim and, in my opinion, the main value and contribution of the study.
For consistency between the aims and the main contribution of the study, I strongly suggest either toning down this 'generic framework' aim or explaining in more detail what exactly the framework is in the end, perhaps including a flow chart or similar. The methods have generally been applied previously, so what the general methodological 'development' consists of might also be clarified.
Figure 1 goes a bit in that direction but is not titled "framework". So which part of it is the framework? And would the operational framework always use all model options, i.e. train all, but then apply/predict with the best? Or how is this transferred to the framework of application?
Data statement and line 102ff
The latest (and likely last) version of the EDII is available with doi and should be cited as:
Blauhut, V., Stephan, R., and Stahl, K.: The European Drought Impact report Inventory (EDII V2.0), [data set], Uni Freiburg, Freiburg, https://doi.org/10.6094/UNIFR/230922, 2022
This is our preferred reference, because the website that is given by the authors no longer functions correctly.
I don't see the reason to cite the new EDID database in the data statement as it was not used. Please consider that while EDID ingested a major part, but not all content of the former EDII, along with other databases, it uses different categories and different attributes than EDII. Therefore, naming it here and in lines 102ff as one and the same is misleading as using it for the same purpose might provide different time series of NI etc.
For your information: A paper on the new database is in progress and about to be submitted to NHESS. Furthermore, guidelines for interested contributors on how to transfer EDII categories into the new EDID system are already available:
Szillat, K., Hlavsová, M., Rossi, L., Blauhut, V., Stahl, K.: Transformation of text-based drought impact data from EDII (European Drought Impact report Inventory) to EDID (European Drought Impact Database): Guidelines. Freiburg HydroNotes no. 8. https://doi.org/10.6094/UNIFR/271380, 2025
Minor comments
"Short term" = 0-3 months? For weather/floods anything more than a few days would be considered long term and not short term. Discussion and use of terminology might be improved.
Line 13: 'Here, "we" used ...'?
Line 344: what is meant by 'regional information'? The impact occurrence? Why not use 'predictor' and 'predictand' or 'response' or, better still, a variable name?
Line 674: EDID is not global. Replace with something global or say 'Europe'.
Figures
Figure 5: what are the dashed and solid lines? A legend with symbols and line types is strongly preferable to caption text that is difficult to read and easy to miss.
Also, in Figure 2 it is confusing that some components have a legend and others don't. It should at least be consistent.
The caption of Fig. 11 needs to name all panels (a to d, etc.); singling out only some of them is inconsistent.
Citation: https://doi.org/10.5194/egusphere-2025-3176-RC2
The manuscript "Towards impact-based early warning of drought: a generic framework for drought impact prediction in the UK" is well written and presents a relevant study. The authors evaluate multiple models for predicting drought impacts and carefully train and validate them at different lead times. Overall, the study is rigorous, the results are clearly presented, and the framework seems solid and well analyzed. I identified two shortcomings that should be revised.
First, the discussion of multicollinearity is unsatisfactory. Readers need to trust the authors’ assertion that multicollinearity is present but varies. Finally, this concern is brushed aside in line 605 with the sentence: “Although SPI and SPEI indices are highly correlated in the UK, the RF model is capable of managing this multicollinearity.” However, multicollinearity is indeed a problem for Linear Regression and LASSOCV, which are compared in the first step before being outperformed. A more explicit quantification is necessary (e.g. showing a correlation matrix or variance inflation factor).
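As an illustration of the kind of quantification requested here, the following is a minimal sketch of a correlation matrix and variance inflation factors (VIFs) computed with the Python standard library only. The series names (`spi12`, `spei12`, `ssmi`) and all values are made up for illustration and are not taken from the manuscript; the sketch relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse correlation matrix.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Pairwise Pearson correlations between predictor columns."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def variance_inflation_factors(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical index series, for illustration only (not manuscript data).
spi12 = [-1.2, -0.4, 0.3, 1.1, 0.8, -0.9, -1.6, 0.2, 0.6, 1.4]
spei12 = [-1.0, -0.5, 0.4, 1.0, 0.9, -1.1, -1.5, 0.1, 0.5, 1.3]
ssmi = [0.3, -1.1, 0.9, -0.2, 1.2, 0.4, -0.6, -1.3, 0.7, 0.1]

corr = correlation_matrix([spi12, spei12, ssmi])
vifs = variance_inflation_factors([spi12, spei12, ssmi])
for name, vif in zip(["SPI12", "SPEI12", "SSMI"], vifs):
    print(f"{name}: VIF = {vif:.1f}")
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic collinearity for linear models. A table of this kind for the actual predictor set would let readers judge how strongly the Linear Regression and LASSOCV baselines might be affected.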
Second, the dataset (1970–2012) is quite outdated, even though more recent data (up to 2024) seems to be available, as noted in the data availability statement. This weakens the study’s relevance and raises questions about whether it still represents state-of-the-art work. This is particularly disappointing since the introduction references more recent drought events (2018–2019: Turner et al., 2021; 2022: Barker et al., 2024) and sets expectations that are not met when entering the methods section. I suggest clearly stating the training period already in the abstract (e.g. “trained with data from 1970–2012”) and including a discussion on whether the model would be capable of forecasting more recent, unprecedented droughts. Ideally, if possible, predictions for these years could be shown.
Minor comments
• Defining the upper tercile as "extremes" seems overstated, particularly since the threshold includes the upper 33%.
• Several passages are overly long and difficult to follow due to heavy nominalization. Please streamline them for clarity or remove them if they add no value. Examples: Lines 60, 142, and 449.
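To make the point about the upper tercile concrete, the short sketch below compares an upper-tercile threshold with a 90th-percentile threshold. It assumes a hypothetical series of monthly impact-report counts; the values are invented for illustration and are not EDII data, and the 90th percentile is only one possible stricter alternative, not one proposed in the manuscript.

```python
import statistics

# Hypothetical monthly impact-report counts (illustrative, not EDII data).
counts = [0, 0, 1, 0, 2, 5, 0, 1, 3, 8, 0, 0,
          1, 2, 12, 0, 4, 1, 0, 6, 2, 0, 1, 9]

# Second of the two tercile cut points: the boundary of the upper third.
upper_tercile = statistics.quantiles(counts, n=3)[1]
# Ninth of the nine decile cut points: the 90th percentile.
p90 = statistics.quantiles(counts, n=10)[8]

# Share of months that each threshold would label as "extreme".
share_tercile = sum(c > upper_tercile for c in counts) / len(counts)
share_p90 = sum(c > p90 for c in counts) / len(counts)

print(f"upper tercile threshold: {upper_tercile}, share flagged: {share_tercile:.2f}")
print(f"90th percentile threshold: {p90}, share flagged: {share_p90:.2f}")
```

With a tercile threshold, close to a third of all months fall into the "extreme" class by construction, which is why a term such as "high-impact months" may describe that class more accurately.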