the Creative Commons Attribution 4.0 License.
Can Weather Patterns Contribute to Predicting Winter Flood Magnitudes Using Machine Learning?
Abstract. Fluvial floods pose severe socioeconomic and environmental risks globally, and are projected to change in frequency and severity in future decades. While it is crucial to understand these changes, the prediction of extreme events remains a significant challenge. Identifying predictable features driving extreme flood events provides a potential way forward with respect to improving such predictions. Weather patterns tend to be more stable and predictable than meteorological catchment-scale variables such as precipitation. However, the contribution of weather patterns to extreme flood prediction remains poorly understood. This study investigates the role of weather patterns, along with other sets of predictors, in influencing winter flood magnitudes above the 99th percentile within a large-sample machine learning framework, using natural benchmark catchments from the UK National River Flow Archive. Six generations of random forest models, each generation including additional sets of features, are explored on the national, regional, and catchment scale. Model results are interpreted using Shapley Additive Explanations (SHAP) to understand feature importance. Additionally, we analyze the conditional probabilities of the UK Met Office's MO-30 weather patterns during extreme flood events. Our findings show that weather patterns with cyclonic low pressure systems frequently co-occur with high flow magnitudes, which is also reflected in the SHAP value analysis. However, the predictive power of these weather patterns is limited and offers hardly any benefit. We also show regional nuances in the feature importance of predictors and model performance. The majority of the predictability comes from meteorological variables and antecedent precipitation. Our findings highlight the variability in model outcomes depending on the model structure and choice of predictors. 
This study also offers methodological guidance for developing large-sample machine learning models for flood estimation that integrate atmospheric predictors with traditional hydro-meteorological and geographical variables.
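The conditional-probability analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes a daily record of (weather pattern ID, flow) pairs for a single catchment, flags days above an empirical percentile threshold, and compares P(WP | extreme day) with the climatological P(WP). The function names, the nearest-rank percentile definition, and the toy data are assumptions made for illustration; they are not the authors' implementation.

```python
from collections import Counter
import math


def percentile_threshold(values, q):
    """Nearest-rank empirical percentile (q in 0..100); an assumed, simple definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def wp_conditional_probabilities(days, q=99.0):
    """days: list of (wp_id, flow). Returns {wp_id: (P(WP | extreme), P(WP))}."""
    threshold = percentile_threshold([flow for _, flow in days], q)
    all_counts = Counter(wp for wp, _ in days)
    # Extreme days are those strictly above the percentile threshold.
    extreme_counts = Counter(wp for wp, flow in days if flow > threshold)
    n_all = len(days)
    n_extreme = sum(extreme_counts.values())
    result = {}
    for wp, count in all_counts.items():
        p_cond = extreme_counts[wp] / n_extreme if n_extreme else float("nan")
        result[wp] = (p_cond, count / n_all)
    return result


# Toy record (hypothetical): WP 30 co-occurs with the highest flows.
toy_days = [(1, 5.0), (1, 6.0), (2, 4.0), (2, 7.0), (30, 20.0),
            (30, 25.0), (1, 5.5), (2, 6.5), (30, 8.0), (1, 4.5)]
probs = wp_conditional_probabilities(toy_days, q=80.0)
```

A pattern whose conditional probability is well above its climatological frequency co-occurs with extreme flows more often than chance would suggest. That alone does not establish predictive skill, which is the distinction the abstract draws.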
Competing interests: LS and MB are members of the editorial board of Hydrology and Earth System Sciences. The authors also have no other competing interests to declare.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Status: open (until 26 Jun 2025)
RC1: 'Comment on egusphere-2025-1493', Anonymous Referee #1, 11 Jun 2025
The manuscript uses machine learning to examine whether weather patterns contribute to predicting winter flood magnitudes. To my knowledge, floods are mainly caused by intense rainfall and antecedent soil conditions. The study considers these two factors and adds some other variables. My main questions are below.
(1) Table 2: I cannot understand the relationship between total event count, number of catchments, and catchment average event count.
(2) Line 154, ‘pre-filtered to contain only extreme flood magnitude days’: this does not ensure that each flood event is captured from beginning to end.
(3) Line 163: the categorization into small, medium, and large is not appropriate, because hydrology has a standard definition of small, medium, and large catchments.
(4) Line 278, ‘the WP associated with the most extreme precipitation, does not necessarily translate to the WP associated with extreme flood magnitude days across UK regions.’ I cannot understand the intrinsic relationship between WP, extreme precipitation, and extreme flood magnitude days.
(5) Line 363: CEE had the lowest baseline R2 (0.28), and only the final R2 of 0.37 in Generation 6 was statistically significant. Why is the precision so low? Are there any previous hydrological simulations for this region? Please compare this result with previous studies.
(6) Line 370, ‘The SE region’s relatively lower sensitivity to antecedent precipitation and hydrometeorological inputs suggests that other factors, such as urbanization and engineered drainage systems, may dominate flood generation.’ However, the watersheds you selected are not influenced by human activities.
(7) When using SHAP, you need to explain the definitions of aridity, runoff ratio….
(8) Line 490, ‘The SHAP summary plot further supports the limited contribution of the WPs’. In traditional flood analysis, rainfall and soil moisture are the main contributors, and WPs are never considered. This study focuses on WPs, yet their contribution is still limited. What is the innovation of this study?
(9) Line 501, ‘Interestingly, precipitation on the day of the event consistently ranks higher than antecedent precipitation’: this is actually common knowledge.
Citation: https://doi.org/10.5194/egusphere-2025-1493-RC1
RC2: 'Comment on egusphere-2025-1493', Anonymous Referee #2, 17 Jun 2025
Title: Can Weather Patterns Contribute to Predicting Winter Flood Magnitudes Using Machine Learning?
The paper analyses two main relevant topics in Hydrology. The first one is the predictability of extreme flooding events. The second topic is the difference between national (UK) and regional models. The main finding from the first one is that WP is not relevant for prediction, mainly because other attributes and forcing already share the same information. In the second topic, the national model exhibits the overall best performance, but with considerable variability between some regional models, indicating that, in many cases, regional models capture the dynamics of the region more effectively. The results align with other research; however, some concerns arise from the framework and the presentation of the results.
Main comments
The title is not aligned with the framework and the results. The authors used progressive feature incorporation to explore the benefit of including each feature set in the model. From this analysis, WP was not relevant or caused a deterioration of performance in most of the regions. Therefore, the authors should not use them; however, they insist on using them throughout the paper despite their lack of value. The same applies to other features in generations 2, 3, and 4. My suggestion is to reframe the title and the paper toward the characterization of extreme events through different types of models, and to keep only the relevant features, which will improve the interpretability of the results.
Another concern is how the results are presented. In Figure 4, the UK model is presented as the best model (Generation 6). This would mean the model should outperform regional models in at least 50% of the cases. However, Figure 5a shows that the UK model has a lower median than NW and NE. That is possible given the variability in the model, as the authors describe; however, the figure may be misleading because, for a real comparison, the authors should compare the UK model with each regional model on the same catchments in both, which appears not to be the case. In fact, Figure 5b shows that the concentration of blue dots is higher in the UK model, which is consistent with Figure 4. The authors describe Simpson’s paradox as the problem; however, they are responsible for ensuring a fair comparison and splitting of the data. Therefore, they should avoid the paradox, which, judging from the differences between Figures 4 and 5, they have not.
I suggest a major revision of the paper, given that the authors need to reframe the paper around the actual results and check that all results are presented fairly to avoid misleading readers.
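The like-for-like comparison requested above can be sketched as follows: score each catchment separately, restrict both models to the catchments they share, and only then take the median. This is a minimal illustration under assumed inputs (dictionaries mapping a catchment ID to observed or predicted values for the same test events). The function names and toy numbers are hypothetical and do not reproduce the manuscript's evaluation.

```python
from statistics import median


def r_squared(observed, predicted):
    """Coefficient of determination; negative when worse than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when observations have zero variance")
    return 1.0 - ss_res / ss_tot


def median_r2_on_common_catchments(observed, national_pred, regional_pred):
    """Each argument maps catchment_id -> list of values for the same test events.

    Only catchments present in all three mappings are scored, so both models
    are summarised over an identical set.
    """
    common = sorted(set(observed) & set(national_pred) & set(regional_pred))
    national = [r_squared(observed[c], national_pred[c]) for c in common]
    regional = [r_squared(observed[c], regional_pred[c]) for c in common]
    return common, median(national), median(regional)


# Hypothetical toy data: catchment "C" has no regional prediction and is excluded.
obs = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0], "C": [1.0, 1.5, 2.0]}
nat = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 5.0], "C": [1.0, 1.5, 2.0]}
reg = {"A": [3.0, 2.0, 1.0], "B": [2.0, 4.0, 6.0]}
common, nat_median, reg_median = median_r2_on_common_catchments(obs, nat, reg)
```

Because the metric is computed per catchment before aggregation, pooling effects of the Simpson's paradox kind cannot arise from differing catchment sets. The toy regional prediction for catchment "A" also shows that R2 can be negative.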
Minor comments:
Line 24: Check reference (?)
Line 29-30: What about events with no high intensity but longer duration?
Line 43: Any idea why they have been used before?
Line 45: How have other studies been done before? You should present the baseline to show a clear benefit of your approach.
Line 87: What about the lookup-table approach implemented in RF? How can this go against your results?
Line 98-99: Why do you need to cluster the model if the RF architecture already does clustering? Why do you think your clustering can do it better than RF?
Line 151-152: Is this analysis linear? If this is the case, what are the implications of that analysis?
Line 169: Check reference (year?)
Table 3: Add all variables considered in each category and the abbreviation used per group.
Line 173-175: This is a result, so it should not be in the methodology.
Line 179: Why did you use a two-period split when three periods are common practice in ML to avoid overfitting and information leakage?
Line 183-184: This is not completely true because you used this period for the hyperparameter search, so it is not unseen data.
Line 197: The metric can take negative values.
Figure 2: Numbers must be located at the center of the column (x-axis).
Figure 3: You should make the text sizes in the figure uniform.
Line 293: Could it be that WP30 is associated with the duration or total volume of the event? From Figure 1, it is clear that WP30 is related to how big the WP is and the overlap between the low-pressure area and the UK boundary.
Line 324: Check reference (?)
Line 381: Latitude and longitude do not have a hydrological meaning other than specific coordinates in space. For that reason, they are mainly used as an address to identify each catchment, and are much more specific than any catchment attributes. If you want to have more hydrological meaning, latitude and longitude should not be used as attributes.
Line 412: Please describe the intra/inter concept before using it.
Line 423-424: I agree that specific features could help; however, given the use of latitude and longitude, it will be hard to find features that are more specific than those. This is one of the problems of using attributes that are not hydrologically meaningful.
Line 429: This goes against the findings you already mentioned and Figures 4 and 5b.
Figure 5b: Could you replace the regional figure with the difference between the UK and the regional one?
Line 437: Check reference. I agree that it is an important issue; however, you are responsible for designing the framework to avoid it. How are you calculating the median for the local and global metrics? You should always calculate the metric per catchment and then compute the median independently of the model, over the same period and the same group of catchments.
Line 447-449: Could this issue be just part of the overfitting, given the low number of events? I think more work must be done to clarify how the R2 is calculated. Maybe this conclusion is just an artifact of the computing method.
Line 465: Given the use of latitude and longitude in all the generations, and the high performance with the first generation, it is strange that those features do not appear among the most important features in SHAP.
Line 486-489: Many of these variables have other collinear variables, so you should try to prune the model to use more independent variables. This way, you will have stronger relationships with the attributes.
Line 501: Precipitation is well known in hydrology as one of the most important variables, so I would not say that this is interesting.
Line 505: That the UK model has different importances does not mean it does not capture the local variability of the importance. The UK’s importance is just overall more important. You can think of the regional models as specific branches of a big tree. In that case, each branch has different importance because they are independent. Therefore, this comment is unfair to the UK model.
Line 517: There are ways to quantify uncertainty. In fact, RF already has an uncertainty quantification that you did not use (ensemble). Therefore, more effort should be made to consider it.
Line 518-530: Why should researchers spend time refining WP if they had zero importance? Maybe it could be more beneficial to refine catchment attributes that you have already proved are important.
Line 543-544: However, you defined the framework, so why did you not change the percentile to 95% or another value to have more data? Do you think that the important features or relationships would change if you used other percentiles?
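The three-period splitting raised in the comments on Line 179 and Line 183-184 can be sketched as below. This is a minimal illustration, assuming each event carries a year and that the split is chronological. The function name, the boundary years, and the toy events are hypothetical and are not taken from the manuscript.

```python
def chronological_split(events, train_end, val_end):
    """Split events (dicts with a 'year' key) into train/validation/test by time.

    train: year <= train_end; validation: train_end < year <= val_end;
    test: year > val_end. Hyperparameters would be tuned on validation only,
    leaving the test period untouched until the final evaluation.
    """
    if train_end >= val_end:
        raise ValueError("train_end must be earlier than val_end")
    train = [e for e in events if e["year"] <= train_end]
    val = [e for e in events if train_end < e["year"] <= val_end]
    test = [e for e in events if e["year"] > val_end]
    return train, val, test


# Hypothetical event years and boundaries, for illustration only.
events = [{"year": y, "flow": float(y % 7)} for y in range(1980, 2020)]
train, val, test = chronological_split(events, train_end=2004, val_end=2011)
```

With a separate validation period, the hyperparameter search never touches the test years, so the reported test score refers to genuinely unseen data.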
Citation: https://doi.org/10.5194/egusphere-2025-1493-RC2