the Creative Commons Attribution 4.0 License.
From Worst-Case Scenarios to Extreme Value Statistics: Local Counterfactuals in Flood Frequency Analysis
Abstract. Many aspects of flood risk management require flood frequency analysis (FFA) which is, however, often limited by short observational records – especially for flash floods in small basins. In order to address this issue, we propose to extend the underlying data by local counterfactual scenarios. To that end, heavy precipitation events (HPEs) from nearby, hydrologically similar catchments are used to simulate flood peaks which are then included in the FFA for the catchment of interest. In order to demonstrate the added value of this approach, we used 23 years of radar-based precipitation and a hydrological model, fitted the Generalized Extreme Value (GEV) distribution to three different datasets – observed peaks, counterfactual peaks, and their combination –, and evaluated the resulting three GEV fits by means of the quantile skill score (QSS). For a sample of more than 13,000 German headwater catchments, we could show that local counterfactuals improved quantile estimation, with the level of improvement increasing with return period. The improvement declines when the radius of the transposition domain is extended beyond 30 km. Overall, our results provide a tangible perspective to enhance traditional FFA, producing narrower confidence intervals and more robust estimates for design floods and risk assessments.
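For readers unfamiliar with the quantile skill score, the following is a minimal sketch of how such a score is commonly computed from the pinball (quantile) loss. It assumes the usual definition QSS = 1 - QS(estimate) / QS(reference); the manuscript's exact formulation and choice of reference fit may differ. The function names and the peak values are hypothetical and serve illustration only.

```python
def quantile_score(observations, quantile_estimate, p):
    """Mean pinball loss of one quantile estimate at probability level p."""
    total = 0.0
    for y in observations:
        diff = y - quantile_estimate
        # Under-prediction is weighted by p, over-prediction by (1 - p).
        total += p * diff if diff >= 0 else (p - 1.0) * diff
    return total / len(observations)


def quantile_skill_score(observations, estimate, reference_estimate, p):
    """QSS = 1 - QS(estimate) / QS(reference); positive values favour the estimate."""
    qs_ref = quantile_score(observations, reference_estimate, p)
    if qs_ref == 0.0:
        raise ValueError("reference quantile score is zero; QSS is undefined")
    return 1.0 - quantile_score(observations, estimate, p) / qs_ref


# Hypothetical annual peak discharges (m^3/s), purely illustrative.
peaks = [12.0, 8.5, 15.2, 9.8, 30.1, 11.4, 7.9, 13.3]
p = 1.0 - 1.0 / 100.0  # non-exceedance probability of the 100-year flood

# Assumed 100-year estimates from two competing fits (not real results).
print(quantile_skill_score(peaks, estimate=45.0, reference_estimate=60.0, p=p))
```

In this toy case every observation lies below both estimates, so the pinball loss reduces to (1 - p) times the mean overshoot and the smaller estimate scores better. That behaviour is worth keeping in mind when the target return period far exceeds the record length.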
Status: open (until 02 Jan 2026)
- RC1: 'Comment on egusphere-2025-4951', Anonymous Referee #1, 28 Nov 2025
- RC2: 'Comment on egusphere-2025-4951', Anonymous Referee #2, 05 Dec 2025
The authors present a method that, through a modeling exercise, combines storm transposition with catchment similarity to improve flood frequency analysis in small and medium catchments. They tested their method on a database of 13000 headwater catchments in Germany, where they applied an uncalibrated lumped hydrological model to 23 years of precipitation.
The method is interesting, well presented, and the results are promising. I think the paper can be accepted for publication once a few points are clarified.
Main comments:
- From a comprehensive analysis of 13000 catchments, I was hoping to see a map with the regional performance of this method, and the areas that might be problematic.
- What is the effect of the catchment area on method performance and the size of the transposition domain?
- The model you applied seems to have multiple limitations, and I am wondering what the effect on your results is. You chose to apply a CN model over lumped (small) basins. This implies that the response is driven by the cumulated value of precipitation and its distribution in time, and not by its intensity. Hortonian runoff is not simulated. The spatial variability of precipitation is lost, which might be acceptable if your subbasins are very small. You represent only "quick runoff", while many of those catchments have annual maxima in winter, when soils are wet and slow runoff contributes strongly to peaks.
- L121: In SST one of the most critical points is the definition of a similar transposition domain. With your method, you seem to transfer this to catchment similarity. Can you give more information on how the catchment similarity criteria are mixed, and how much your approach is sensitive to this choice?
- How sensitive is your method to not finding similar catchments in the transposition domain?
- One of your conclusions is that "the improvement declines when the radius of the transposition domain is extended beyond 30 km". I think this is based on Figure 2 alone. In this figure, performance seems to generally decline with larger radius, but I don't see a performance decline after 30 km. I'd argue that the performance seems mostly independent of the size of the transposition domain (while it might be more sensitive to the measure of catchment similarity).
- Transposing the HPE to the centroid of the catchment might result in biases (e.g. if the storm and the NC have an orientation, or if spatial distribution over the catchment is important; see Zhou et al., 2021).
Other comments:
- "Worst-Case scenarios" is in the title and the conclusions, but what do you define as a "worst case"? Is taking the strongest HPE within 10km the worst that can happen to a basin? Sometimes compound scenarios or very high (~500y?) return periods are considered.
- As you say, in your analysis FFA is strongly limited by the relatively short data availability (23 years). You apply a GEV over annual maxima using maximum likelihood, but usually for small data samples, POT and maybe L-moments estimation might be more appropriate.
- You use QS to evaluate particularly high return periods (200 years) over a short data record (23 years). If the quantile q is higher than every observation, is the best QS simply the one closest to your observations? Is it correct to say that it is a better estimate?
- L26: I'm not sure why you describe flash floods here. You didn't specifically analyze effect on flash floods, and your approach is general.
- L31: It's subjective, but I would not call a 750km2 basin "small". Maybe small + medium?
- L98: Isn't CORINE updated every 6 years?
- L106: you refer to your other paper for the model application, but I think it would be useful to add some more information on the model setup and characteristics that are important for your results.
- L106: Please clarify: if I understand correctly, you apply a lumped CN model over basins with a median size of 15.7 km2. These basins are also combined into larger basins. I was confused by how you sometimes talk about "upstream catchments" and "transposition to each catchment in the CoI".
- L134: do you apply the HPE multiple times by transposing the same HPE to the centroid of each subcatchment? If so, do you think it generates realistic precipitation fields?
- L157: do you mean that you disregard shape below 0 (0 is ok) and above 0.5?
- L195: GEV CoI is fitted over 22 years?
- L217: I cannot find the supplement, including online.
- L220: are the HPE over larger domains less typical than CoI or just less correlated? How much do the HPE of the CoI overlap with the floods over NC?
- L254: aren't NCs analyzed as a set? So with 230 values.
- L258, L266, L267: I can't find those numbers reflected in figure 4. Am I reading it wrong?
- L284: why would catchment biases cancel out?
Zhou, Zhengzheng, et al. "The impact of spatiotemporal structure of rainfall on flood frequency over a small urban watershed: an approach coupling stochastic storm transposition and hydrologic modeling." Hydrology and Earth System Sciences Discussions 2021 (2021): 1-25.
Citation: https://doi.org/10.5194/egusphere-2025-4951-RC2
- RC3: 'Comment on egusphere-2025-4951', Anonymous Referee #3, 15 Dec 2025
The authors introduce a method that uses modeling to integrate storm transposition and catchment similarity, enhancing flood frequency analysis for small and medium catchments. They tested this approach on a database of 13000 headwater catchments in Germany, applying an uncalibrated lumped hydrological model to 23 years of precipitation.
The approach is engaging, clearly presented, and yields promising results. I believe the paper can be accepted for publication once a few clarifications are made.
Main comments:
- The paper would benefit from clearer definitions and a distinct conceptual separation between several key ideas that are currently addressed somewhat implicitly. In particular, the notion of “local counterfactuals” should be more rigorously defined early in the manuscript, including its hydrological interpretation and how it differs from related concepts such as storm transposition, spatial counterfactuals, and regionalisation. While the background section is strong, readers unfamiliar with these concepts may find it difficult to immediately understand what is novel versus what is adapted from existing approaches.
- The selection of a 30 km radius and ten neighbouring catchments seems reasonable but is largely based on empirical judgement. The manuscript should offer a clearer rationale for these choices, whether based on meteorological homogeneity, hydrological similarity scales, or sensitivity analysis.
- The criteria used to define catchment similarity via the KDTree deserve further explanation.
- Applying an uncalibrated SCS-CN and GIUH-based lumped model across thousands of catchments introduces significant uncertainty. The model mainly captures fast runoff generation and overlooks Hortonian runoff, slow flow components, and spatial variability in precipitation. This is especially important for catchments with winter flood regimes, where soil saturation and slower processes may prevail. The authors should discuss how these simplifications could impact both annual maxima and GEV tail behaviour.
- The conclusions would benefit from clearer guidance on when the proposed method may be unsuitable.
- The manuscript frequently refers to “worst-case scenarios,” but this term is not clearly defined.
- The appropriateness of using annual maxima with maximum likelihood estimation for such short samples could be briefly discussed in relation to alternative approaches such as POT or L-moments.
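To make the L-moments alternative mentioned above concrete, here is a minimal sketch of GEV parameter estimation from sample L-moments, using Hosking's widely cited approximation for the shape parameter. It is not the authors' procedure; the function names are hypothetical, and the sign convention follows Hosking (k = -shape in the convention where a positive shape means a heavy upper tail).

```python
import math


def sample_l_moments(values):
    """Return (l1, l2, t3): first two sample L-moments and the L-skewness."""
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three values")
    # Unbiased probability-weighted moments b0, b1, b2 (0-based ranks).
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    if l2 == 0.0:
        raise ValueError("all values are identical; L-skewness is undefined")
    return l1, l2, l3 / l2


def gev_from_l_moments(values):
    """Approximate GEV (location, scale, k) in Hosking's convention."""
    l1, l2, t3 = sample_l_moments(values)
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-8:  # Gumbel limit
        scale = l2 / math.log(2.0)
        location = l1 - 0.5772156649 * scale
    else:
        g = math.gamma(1.0 + k)
        scale = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
        location = l1 - scale * (1.0 - g) / k
    return location, scale, k


def gev_quantile(location, scale, k, p):
    """Quantile at non-exceedance probability p (Hosking's convention)."""
    y = -math.log(p)
    if abs(k) < 1e-8:
        return location - scale * math.log(y)
    return location + scale * (1.0 - y ** k) / k


# Hypothetical 23-year series of annual maxima (m^3/s), illustrative only.
annual_maxima = [12.0, 8.5, 15.2, 9.8, 30.1, 11.4, 7.9, 13.3, 10.2, 18.7,
                 9.1, 14.6, 22.3, 8.8, 11.9, 16.4, 10.7, 12.8, 27.5, 9.5,
                 13.9, 11.1, 17.2]
loc, scale, k = gev_from_l_moments(annual_maxima)
print(gev_quantile(loc, scale, k, 1.0 - 1.0 / 100.0))
```

L-moment estimators are linear in the ordered data, which is why they are often preferred over maximum likelihood for records of this length. A comparison on the study's own data would be needed to say whether that holds here.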
Citation: https://doi.org/10.5194/egusphere-2025-4951-RC3
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 282 | 59 | 25 | 366 | 16 | 17 |
The term “local counterfactual” should be defined more firmly in the opening section so that readers unfamiliar with the concept can immediately grasp its hydrological meaning.
The introduction would benefit from a clearer explanation of why a 30-km neighborhood and ten neighboring catchments were selected as the basis for local counterfactual generation.
The background section is strong, but it would be helpful to distinguish more explicitly between catchment similarity and storm similarity, as the manuscript presently assumes these are equivalent.
The use of an uncalibrated SCS-CN and GIUH model across more than 13,000 catchments introduces considerable uncertainty, and the authors should include either a brief validation example or a reference to previous calibration results.
The criteria used to define "catchment similarity" deserve more explanation, especially regarding how the attributes were scaled and weighted in the KDTree analysis.
The assumption that storms producing high runoff in a nearby basin are hydrologically meaningful for the catchment of interest should be justified with either empirical evidence or literature support.
The manuscript should explain how independence among counterfactual annual maxima is ensured, given that neighboring catchments may experience correlated rainfall events.
Mixing factual and counterfactual peaks in a single GEV fit may violate standard assumptions, and this issue requires at least a clear justification in the methods section.
Although the QSS results show improvements, the authors should comment on the fact that GEVNCs outperforms GEVCoI even without using any data from the catchment of interest, which may indicate over-smoothing or strong regional influences.
The improvement of GEVNCs with increasing return period is convincingly shown, yet the manuscript should discuss why the lower tail benefits less from the counterfactual approach.
The discussion should reflect that counterfactual extremes depend strongly on the selected time window and may not represent the full range of possible events.
The authors appropriately highlight the short time series, but they omit discussion of potential non-stationarity in rainfall over the 2001–2023 period, which may influence GEV tail behaviour.
The conclusion section accurately summarizes the study, but it should offer clearer guidance on when the counterfactual method might be unsuitable—particularly in regions with strong orographic gradients or highly heterogeneous rainfall patterns.
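The questions raised above about attribute scaling, weighting, and the 30 km neighbourhood can be illustrated with a small sketch. It uses a brute-force search in standardised attribute space as a stand-in for the KDTree query; the attribute set (area, slope, curve number), the equal default weights, and all names and values are assumptions made for illustration, not the manuscript's configuration.

```python
import math
import statistics


def standardise(rows):
    """Z-score each attribute column so that no attribute dominates the distance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def similar_neighbours(catchments, target_id, radius_km=30.0, k=10, weights=None):
    """Rank catchments within radius_km of the target by weighted attribute distance."""
    ids = list(catchments)
    scaled = dict(zip(ids, standardise([catchments[i]["attributes"] for i in ids])))
    tx, ty = catchments[target_id]["centroid_km"]
    w = weights or [1.0] * len(scaled[target_id])
    ranked = []
    for cid in ids:
        if cid == target_id:
            continue
        cx, cy = catchments[cid]["centroid_km"]
        if math.hypot(cx - tx, cy - ty) > radius_km:
            continue  # outside the transposition domain
        d = math.sqrt(sum(wi * (a - b) ** 2
                          for wi, a, b in zip(w, scaled[cid], scaled[target_id])))
        ranked.append((d, cid))
    ranked.sort()
    return [cid for _, cid in ranked[:k]]


# Hypothetical catchments: centroid in km, attributes = [area_km2, slope, curve_number].
catchments = {
    "A": {"centroid_km": (0.0, 0.0), "attributes": [15.0, 0.08, 70.0]},
    "B": {"centroid_km": (10.0, 5.0), "attributes": [14.0, 0.09, 72.0]},
    "C": {"centroid_km": (20.0, -10.0), "attributes": [60.0, 0.02, 85.0]},
    "D": {"centroid_km": (50.0, 40.0), "attributes": [15.5, 0.08, 70.5]},
    "E": {"centroid_km": (5.0, 20.0), "attributes": [25.0, 0.06, 66.0]},
}
print(similar_neighbours(catchments, "A", radius_km=30.0, k=2))
```

In this toy set the most similar catchment, D, lies outside the 30 km radius and is therefore never selected, which shows how the radius and the similarity measure interact. Changing `weights` or the scaling step changes the ranking, which is the sensitivity the comments ask the authors to document.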