From Worst-Case Scenarios to Extreme Value Statistics: Local Counterfactuals in Flood Frequency Analysis
Abstract. Many aspects of flood risk management require flood frequency analysis (FFA), which is, however, often limited by short observational records, especially for flash floods in small basins. To address this issue, we propose extending the underlying data with local counterfactual scenarios: heavy precipitation events (HPEs) from nearby, hydrologically similar catchments are used to simulate flood peaks, which are then included in the FFA for the catchment of interest. To demonstrate the added value of this approach, we used 23 years of radar-based precipitation and a hydrological model, fitted the Generalized Extreme Value (GEV) distribution to three datasets (observed peaks, counterfactual peaks, and their combination), and evaluated the three resulting GEV fits with the quantile skill score (QSS). For a sample of more than 13,000 German headwater catchments, we show that local counterfactuals improved quantile estimation and that the improvement increased with return period. The improvement declines when the radius of the transposition domain is extended beyond 30 km. Overall, our results offer a practical way to enhance traditional FFA, yielding narrower confidence intervals and more robust estimates for design floods and risk assessments.
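The evaluation design summarised in the abstract (GEV fits to observed, counterfactual, and pooled peaks, compared through the QSS) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the data are synthetic and drawn from a single GEV, the observed-only fit is taken as the QSS reference, and a long synthetic record stands in for the verification data, because the abstract does not state how skill was verified. SciPy's `genextreme` uses the shape convention c = -ξ.

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic stand-in data (assumption, not the study's data): 23 observed annual
# maxima for the catchment of interest and 10 x 23 counterfactual maxima, all drawn
# from the same GEV so that pooling is justified by construction.
rng = np.random.default_rng(42)
C_TRUE, LOC_TRUE, SCALE_TRUE = -0.1, 50.0, 15.0  # SciPy shape convention: c = -xi


def draw(size):
    return genextreme.rvs(C_TRUE, loc=LOC_TRUE, scale=SCALE_TRUE, size=size, random_state=rng)


observed = draw(23)
counterfactual = draw(230)
combined = np.concatenate([observed, counterfactual])
verification = draw(20_000)  # long synthetic record used only to score the fits


def quantile_score(obs, q, tau):
    """Mean pinball (quantile) loss of the estimate q at non-exceedance level tau."""
    u = np.asarray(obs, dtype=float) - q
    return float(np.mean(u * (tau - (u < 0).astype(float))))


def quantile_skill_score(obs, q_model, q_ref, tau):
    """QSS = 1 - QS_model / QS_ref; positive values mean the model beats the reference."""
    return 1.0 - quantile_score(obs, q_model, tau) / quantile_score(obs, q_ref, tau)


# Maximum-likelihood GEV fits, each returned as (c, loc, scale).
fits = {name: genextreme.fit(data) for name, data in
        (("observed", observed), ("counterfactual", counterfactual), ("combined", combined))}


def t_year_quantile(params, return_period):
    c, loc, scale = params
    return float(genextreme.ppf(1.0 - 1.0 / return_period, c, loc=loc, scale=scale))


results = {}
for T in (10, 50, 100):
    tau = 1.0 - 1.0 / T
    q_ref = t_year_quantile(fits["observed"], T)  # observed-only fit as reference (assumption)
    results[T] = {name: quantile_skill_score(verification, t_year_quantile(fits[name], T), q_ref, tau)
                  for name in ("counterfactual", "combined")}
    print(T, {name: round(v, 3) for name, v in results[T].items()})
```

Pooling all 253 values treats them as independent and identically distributed, which is the assumption questioned in the comment on mixing factual and counterfactual peaks; the sketch does not address it.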
The term “local counterfactual” should be defined more precisely in the opening section so that readers unfamiliar with the concept can immediately grasp its hydrological meaning.
The introduction would benefit from a clearer explanation of why a 30-km neighborhood and ten neighboring catchments were selected as the basis for local counterfactual generation.
The background section is strong, but it would be helpful to distinguish more explicitly between catchment similarity and storm similarity, as the manuscript presently assumes these are equivalent.
The use of an uncalibrated hydrological model (SCS-CN runoff generation with GIUH routing) across more than 13,000 catchments introduces considerable uncertainty; the authors should include either a brief validation example or a reference to previous calibration results.
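To make the uncertainty concern concrete, the standard SCS-CN runoff relation (metric form, initial abstraction Ia = 0.2 S) can be evaluated for a range of curve numbers. This is an illustrative sketch only: the 60 mm storm depth and the curve numbers are hypothetical values, not taken from the manuscript, and the GIUH routing step is omitted.

```python
def scs_cn_runoff(p_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """Direct runoff depth Q (mm) for event rainfall P (mm), SCS Curve Number method (metric units)."""
    if not 0.0 < cn <= 100.0:
        raise ValueError("curve number must lie in (0, 100]")
    s = 25400.0 / cn - 254.0   # potential maximum retention S (mm)
    ia = ia_ratio * s          # initial abstraction Ia (mm), conventionally 0.2 S
    if p_mm <= ia:
        return 0.0             # all rainfall is abstracted, no direct runoff
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


# Sensitivity of runoff from a hypothetical 60 mm storm to the (uncalibrated) CN.
for cn in (65, 70, 75, 80, 85):
    print(f"CN={cn}: Q={scs_cn_runoff(60.0, cn):.1f} mm")
```

For this hypothetical storm, runoff roughly quadruples between CN 65 and CN 85, which illustrates why a validation example or prior calibration results would strengthen the manuscript.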
The criteria used to define "catchment similarity" deserve more explanation, especially regarding how the attributes were scaled and weighted in the KDTree analysis.
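One common way to make scaling and weighting explicit, shown here only as a hypothetical illustration of what the authors could document, is to z-score each attribute and multiply it by the square root of its weight, so that the Euclidean distance used by SciPy's `KDTree` equals the weighted distance sqrt(Σ w_j (z_j - z'_j)²). The attributes, weights, synthetic catchments, and the two-stage search (30 km spatial filter, then the ten most similar candidates) are assumptions for illustration, not the manuscript's procedure.

```python
import numpy as np
from scipy.spatial import KDTree

# Hypothetical catchment set: centroids in km and three attributes per catchment.
rng = np.random.default_rng(0)
n = 500
xy_km = rng.uniform(0.0, 200.0, size=(n, 2))
area_km2 = rng.lognormal(3.0, 0.8, n)
attrs = np.column_stack([
    np.log(area_km2),                # log area, to reduce skew before scaling
    rng.uniform(0.5, 20.0, n),       # mean slope (%)
    rng.uniform(55.0, 90.0, n),      # curve number
])
weights = np.array([1.0, 1.0, 2.0])  # hypothetical attribute weights

# z-score each attribute, then scale by sqrt(weight) so that the plain Euclidean
# distance in this space equals sqrt(sum_j w_j * (z_j - z'_j)**2).
z = (attrs - attrs.mean(axis=0)) / attrs.std(axis=0)
zw = z * np.sqrt(weights)
spatial_tree = KDTree(xy_km)


def similar_neighbours(i, radius_km=30.0, k=10):
    """Indices of the k most similar catchments within radius_km of catchment i."""
    cand = np.asarray(spatial_tree.query_ball_point(xy_km[i], r=radius_km), dtype=int)
    cand = cand[cand != i]                       # exclude the catchment itself
    if cand.size == 0:
        return cand
    _, idx = KDTree(zw[cand]).query(zw[i], k=min(k, cand.size))
    return cand[np.atleast_1d(idx)]              # ordered from most to least similar


print(similar_neighbours(0))
```

Reporting the transformed attributes, the scaling statistics, and the weights in this form would let readers reproduce the neighbour selection exactly.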
The assumption that storms producing high runoff in a nearby basin are hydrologically meaningful for the catchment of interest should be justified with either empirical evidence or literature support.
The manuscript should explain how independence among counterfactual annual maxima is ensured, given that neighboring catchments may experience correlated rainfall events.
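A simple diagnostic the authors could report is the mean pairwise rank correlation between the neighbours' counterfactual annual-maximum series, together with an effective number of independent series under a rough equicorrelation approximation, m_eff = m / (1 + (m - 1) ρ̄). The sketch below uses hypothetical, deliberately correlated data; the formula is a heuristic borrowed from the variance of a mean of equicorrelated variables, not a method taken from the manuscript.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical counterfactual annual maxima for 10 neighbouring catchments over
# 23 years: a shared regional storm signal plus local noise, so the series are
# positively correlated by construction.
rng = np.random.default_rng(1)
n_years, n_nb = 23, 10
regional = rng.gumbel(size=n_years)
maxima = regional[:, None] + 0.8 * rng.gumbel(size=(n_years, n_nb))


def effective_series(series):
    """Mean pairwise Spearman correlation of the columns (at least three) and the
    effective number of independent columns under an equicorrelation approximation,
    m_eff = m / (1 + (m - 1) * rho_bar), with rho_bar floored at 0."""
    m = series.shape[1]
    rho, _ = spearmanr(series)                   # columns are treated as variables
    rho_bar = float(rho[~np.eye(m, dtype=bool)].mean())
    m_eff = m / (1.0 + (m - 1) * max(rho_bar, 0.0))
    return rho_bar, m_eff


rho_bar, m_eff = effective_series(maxima)
print(f"mean rank correlation {rho_bar:.2f}; about {m_eff:.1f} of {n_nb} series "
      f"({m_eff * n_years:.0f} of {n_nb * n_years} maxima) are effectively independent")
```

A strongly reduced effective sample would indicate that the narrower confidence intervals partly reflect duplicated storm information rather than new data.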
Mixing factual and counterfactual peaks in a single GEV fit may violate the standard assumption that the sample consists of independent, identically distributed annual maxima, and this issue requires at least a clear justification in the methods section.
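Although the QSS results show improvements, the authors should comment on the fact that GEVNCs (the fit to counterfactual peaks from neighboring catchments' storms) outperforms GEVCoI (the fit to observed peaks of the catchment of interest) even though it uses no precipitation observed over the catchment of interest, which may indicate over-smoothing or strong regional influences.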
The improvement of GEVNCs with increasing return period is convincingly shown, yet the manuscript should discuss why the lower tail benefits less from the counterfactual approach.
The discussion should acknowledge that counterfactual extremes depend strongly on the selected time window and may not represent the full range of possible events.
The authors appropriately highlight the short time series, but they omit discussion of potential non-stationarity in rainfall over the 2001–2023 period, which may influence GEV tail behaviour.
The conclusion section accurately summarizes the study, but it should offer clearer guidance on when the counterfactual method might be unsuitable, particularly in regions with strong orographic gradients or highly heterogeneous rainfall patterns.