https://doi.org/10.5194/egusphere-2026-1591
08 Apr 2026
Status: this preprint is open for discussion and under review for Natural Hazards and Earth System Sciences (NHESS).

Outrunning flash floods: XGBoost and sparse impact reports deliver global medium-range probabilistic forecasts of flash flood occurrence

Fatima M. Pillosu, Mariana Claire, Calum Baugh, Florian Pappenberger, Christel Prudhome, and Hannah L. Cloke

Abstract. Flash floods are the world's most frequent and deadly type of flood. Yet no medium-range forecasts of their occurrence exist over a continuous global domain, even though such forecasts are essential to fulfil the UN's "Early Warnings for All" target of protecting everyone with early warning systems. This study addressed this gap in two phases. In the first phase, regional, medium-range, data-driven forecasts of flash flood occurrence were developed by combining regional high-density, quality-controlled flash flood impact reports (e.g. NOAA's Storm Events Database over the contiguous US) with global reanalyses and forecasts (e.g. ERA5 for non-meteorological variables and ERA5-ecPoint for rainfall). Of all the models tested, XGBoost gradient boosting performed best: its discrimination skill remained high and stable across scores (e.g. ROC and precision-recall curves) and lead times, and forecast probabilities remained reliable below 10 % at day 1 and below 2 % at day 5. In the second phase, a spatially constrained sensitivity analysis evaluated how well the regional XGBoost model generalised to unseen regions. The analysis revealed that models trained on hydro-climatologically diverse and observation-dense sub-domains generalised better than those trained across the full domain with sparser data, suggesting a viable strategy for extending regionally trained forecasts of flash flood occurrence globally. Hence, this study provides the first empirical evidence that global, medium-range forecasts of flash flood occurrence are achievable with simple data-driven approaches and readily available data, closing one of the most pressing and long-standing gaps in modern hydrology.
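
As an illustration of the two verification ideas named in the abstract (discrimination, as summarised by the ROC curve, and reliability of forecast probabilities), the following minimal Python sketch may be helpful. It is not the authors' code: the function names `roc_auc` and `reliability_table`, the bin edges, and the eight forecast/observation pairs are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Toy verification sketch (assumed data, not results from the preprint).

def roc_auc(probs, labels):
    """Area under the ROC curve, computed as the fraction of
    (event, non-event) pairs in which the event received the higher
    forecast probability; ties count as half a win."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def reliability_table(probs, labels, edges):
    """For each non-empty bin [edges[i], edges[i+1]), return a tuple
    (mean forecast probability, observed event frequency, sample count).
    A reliable forecast has the first two values close to each other."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        members = [(p, y) for p, y in zip(probs, labels) if lo <= p < hi]
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            obs_freq = sum(y for _, y in members) / len(members)
            rows.append((mean_p, obs_freq, len(members)))
    return rows


# Hypothetical forecast probabilities and observed outcomes
# (1 = a flash flood report exists, 0 = no report).
probs = [0.01, 0.01, 0.01, 0.01, 0.03, 0.03, 0.08, 0.08]
labels = [0, 0, 0, 0, 0, 1, 0, 1]

auc = roc_auc(probs, labels)
rows = reliability_table(probs, labels, edges=[0.0, 0.02, 0.05, 0.10])

print(f"ROC AUC: {auc:.3f}")
for mean_p, obs_freq, count in rows:
    print(f"forecast {mean_p:.2f}  observed {obs_freq:.2f}  n={count}")
```

With so few samples the observed frequencies are far from the forecast values in the upper bins; in practice each bin would need many cases before reliability could be judged, which is particularly relevant for rare events such as flash floods.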

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 20 May 2026)

Latest update: 08 Apr 2026
Short summary
This paper presents the first global, medium-range probabilistic forecasts of flash flood occurrence. A single XGBoost model trained on coarse impact reports achieves skilful predictions, challenging assumptions that complex architectures and high-resolution data are required. At a time of heightened attention to flash flood risk and early warning, this work demonstrates that skilful global forecasting is achievable in data-sparse regions where flash flood risk is highest.