This work is distributed under the Creative Commons Attribution 4.0 License.
A temporally continuous, probabilistic framework for observed multi-decadal flood susceptibility evolution in Canadian watersheds
Abstract. This study advances flood susceptibility analysis by introducing a temporally continuous, uncertainty-aware framework that moves beyond static or snapshot-based mapping. We leverage outputs from a machine learning model trained on a multi-decadal record of historic flood events, which generated 24 annual flood susceptibility (FS) maps spanning 2000–2023. Annual watershed scores are derived from normalized pixel proportions and thresholds. Generalized Extreme Value (GEV) distributions fitted to these score series define watershed-specific tails of wetness and dryness, with uncertainty quantified via a moving-block bootstrap. Extreme years are refined using neighbour-year expansion to capture short-term hydroclimatic regimes and validated through change-point detection and Mann–Kendall trend analysis. Pixelwise envelopes are generated by aggregating FS values across selected extreme years and applying spatial smoothing for coherence. National-scale analysis reveals a clear increase in flood susceptibility in the 2020s across many watersheds, with notable clusters of extreme wet years from 2017–2023 in Atlantic Canada and the St. Lawrence River basin. The 2000s serve as a baseline period, the 2010s represent a transition decade with rising FS, and the 2020s demonstrate the strongest increase in wet extremes and spatial clustering. By explicitly treating flood susceptibility as a temporally evolving, stochastic process, this framework provides probabilistic bounds and diagnostic insights that extend beyond conventional static mapping, offering a robust basis for adaptive flood susceptibility assessment and long-term planning under changing hydroclimatic conditions.
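The core statistical pipeline summarized in the abstract (a GEV fitted to an annual watershed score series, with uncertainty from a moving-block bootstrap) can be sketched as follows. This is a minimal illustration on synthetic scores: the block length, bootstrap count, and 20-year return period are arbitrary choices for the sketch, not the study's settings.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
# Synthetic stand-in for a 24-year annual watershed score series (2000-2023).
scores = rng.gumbel(loc=0.4, scale=0.1, size=24)

# Fit a GEV to the annual scores and compute an illustrative 20-year return level.
shape, loc, scale = genextreme.fit(scores)
rl_20yr = genextreme.ppf(1 - 1 / 20, shape, loc=loc, scale=scale)

def moving_block_bootstrap(x, block_len=4, n_boot=200, rng=rng):
    """Resample contiguous blocks (preserving short-term persistence),
    refit the GEV, and return a 95% CI for the 20-year return level."""
    n = len(x)
    starts = np.arange(n - block_len + 1)
    levels = []
    for _ in range(n_boot):
        picks = rng.choice(starts, size=int(np.ceil(n / block_len)))
        idx = np.concatenate([np.arange(s, s + block_len) for s in picks])[:n]
        c, l, s_ = genextreme.fit(x[idx])
        levels.append(genextreme.ppf(1 - 1 / 20, c, loc=l, scale=s_))
    return np.percentile(levels, [2.5, 97.5])

ci_low, ci_high = moving_block_bootstrap(scores)
```

Resampling blocks rather than individual years is what lets the bootstrap respect the short-term hydroclimatic regimes the framework is designed to capture.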
Status: open (until 19 May 2026)
RC1: 'Comment on egusphere-2026-528', Anonymous Referee #1, 05 May 2026
AC1: 'Reply on RC1', Heather McGrath, 12 May 2026
We thank the reviewer for their careful reading of the manuscript and for the constructive comments provided. Below we address each comment in turn and describe the corresponding revisions made to the manuscript.

"general comments"

Comment: Overall, the work seems to be lacking novelty. The work is focused on manipulating a Canada-wide dataset that seems to have been derived and evaluated in previous work, including the approach of splitting the data into wet and dry classes for each WU from the flood susceptibility pixels, and having a wetness score. Adding 'uncertainty' or something that functions as such through a combination of bootstrap and GEV, in this reviewer's opinion, is not enough.

RESPONSE: We thank the reviewer for raising this concern and appreciate the opportunity to clarify the novelty of the work. We agree that the underlying annual FS maps and the machine learning framework build upon previously published work, and we do not present these components as a new susceptibility modelling approach. Rather, the novelty of this study lies in the development of a temporal probabilistic framework for characterizing FS dynamics and extremes using temporally continuous FS data at the national scale. In this framework, annual FS outputs are treated as a temporal signal from which long-term susceptibility trajectories, regime behaviour, and extreme wet and dry years can be diagnosed. The contribution therefore lies not in a single new algorithm, but in the integration of multiple methodological components into a unified approach for temporal FS characterization. The Abstract, Introduction, and Discussion have been revised to emphasize and clarify this.

"specific comments"

Comment: In the introduction the author mentions advances in flood mapping over the last two decades, followed with citations from a 10-year period [P2, L30-32].
In general, the intro/lit review of the manuscript is heavily biased to 2025; I suggest the author do a better review of the existing literature and have that reflected in the manuscript.

RESPONSE: Thank you for raising this. The intent in citing the recent studies was to demonstrate that the manuscript is situated within the current state of the literature. We agree, however, that statements referring to advances over the past decades should be supported by a broader range of foundational and earlier references. We have revised the Introduction accordingly.

Comment: It seems odd to look at a study focused on the western US (a subset of the US) and then comment on how that wouldn't translate to Canada [P2, L40-44].

RESPONSE: We thank the reviewer for this comment. The cited western US study is presented as a conceptually related example. Our intent was to clarify why a station-based, gauge-centric framework was not adopted as the primary approach here, given the comparatively sparse, uneven, and discontinuous gauge coverage across Canada. These constraints limit the feasibility of a consistent national-scale, spatially explicit analysis based solely on hydrometric observations. We have revised the Introduction (P2, L40-44) to clarify that the US study is methodologically distinct and that the proposed framework is designed to address the specific observational and spatial constraints of the Canadian context.

Comment: Statements like GEV being a cornerstone of FFA for estimating quantiles [P2, L45] maybe should have a few citations from several years showing that it has consistently been used (or at least a review paper).

RESPONSE: Thank you; we agree that describing the GEV distribution as a "cornerstone of flood frequency analysis" should be supported by a broader range of references.
We have therefore added multiple citations spanning foundational theory, applied hydrologic studies, and review papers to document the sustained and widespread use of the GEV framework for estimating extreme quantiles in flood frequency analysis over several decades.

Comment: "The training/test/validation dataset was balanced and had 268,049 samples." [P3, L71-72]. What is meant by 'balanced' here?

RESPONSE: By 'balanced' we mean balanced with respect to class labels. Approximately 50% of the samples correspond to wet (flooded) occurrences and the remaining 50% to dry (non-flooded) occurrences. The text has been revised to clarify this.

Comment: A statement is probably needed to describe what the F1-optimal threshold is [P3, L75] (the measure of success of a binary classifier?) and the reference for where the wet/dry cutoff value [P3, L75][P5, L105] came from should also be added.

RESPONSE: Thank you for this comment. The F1-optimal threshold refers to the probability cutoff that maximizes the F1 score, defined as the harmonic mean of precision and recall, thereby balancing false positives and false negatives in a binary classification. Specifically, in this work, the wet/dry cutoff was determined by evaluating model performance across candidate thresholds on a hold-out validation set and selecting the threshold that maximized the F1 score. We have revised the text to explicitly define it and added the reference to McGrath (2025), which describes this procedure in detail.

Comment: It seems like all the figures in the manuscript need to be updated/have increased resolution for readability; currently the readability issues are present both when printed and on screen.

RESPONSE: Thank you for this comment. We agree that figure readability is critical.
All figures have been revised with increased resolution and adjusted formatting to ensure clarity in both on-screen viewing and print reproduction, and the updated figures are included in the revised manuscript.

Comment: In section 3.3 the author mentions more than once that the SciPy python library was used for implementing the GEV. This is not necessary, as the method should work independently of language/library/etc. The section should be rewritten to focus on the GEV and how it was used, the formulation, etc.; then, if the author feels it necessary, a final statement could be added stating that Python/SciPy was used for the work done in this manuscript.

RESPONSE: Understood; we thank the reviewer for this suggestion and agree that Section 3.3 should focus on the GEV rather than the implementation details. We have revised the section to present the GEV framework, return-level formulation, and uncertainty treatment independent of any specific programming library or language, and references to SciPy in the main text have been removed.

Comment: In section 3.6 some tables are mentioned but seem to not be included in the manuscript; this should be addressed.

RESPONSE: Thank you for noting this. The tables referenced in Section 3.6 are provided in the Supplement due to their size and complexity. The manuscript text has been updated to explicitly reference the Supplementary Tables.

Comment: Section 3.7 mentions spatial envelopes; however, what is derived here is not really a spatial envelope of the data. Instead it is two raster datasets, one of which contains the maximum per-pixel values and the other containing the minimum per-pixel values from the set of years being evaluated. I suggest renaming/rewording to a more appropriate terminology or better explaining what the spatial envelope is/means here.

RESPONSE: Thank you for this comment. We agree that the term 'spatial envelope' may not be immediately clear to readers from all backgrounds.
In this study, the spatial envelope refers to pixelwise upper and lower bounds derived from the set of identified extreme wet and dry years, rather than a statistical confidence envelope. The Section 3.7 text has been revised to clarify this definition.

Comment: In section 3.7 (and results section 4.7) it is not clear what the impact of smoothing has on the results; it would be beneficial to show that impact with a table or figure. The raster maps also seem to not be provided in the manuscript or supplemental data.

RESPONSE: Thank you for this comment. We agree that, while Section 3.7 explains the motivation for spatial smoothing, its impact on the results benefits from clearer demonstration. To illustrate this impact, we have added a new figure (Fig. 8), within WU 01AL000, showing (a) an overview, (b) the unsmoothed wet FS values, (c) the smoothed FS values, and (d) a sample transect/cross-section showing the smoothed and unsmoothed pixel values. At the time of the initial submission, the derived rasters were still undergoing internal review and therefore could not be released. These datasets are now publicly available and can be accessed via the NRCan website or STAC browser, Trends and Extremes - Flood Susceptibility Mapping. The manuscript has been updated to include this data access reference.

Comment: Section 4.2 discusses/evaluates the wet and dry scores. These score equations can be rearranged into the following form(s): S = a + b·x/N − (2a+b)·n_wet/N or S = −a − b·x/N + (2a+b)·n_wet/N, where S is the score, x = n_wet − n_wet+, N = n_dry + n_wet, a = β, w_dry, or w_wet (since β = w_dry = w_wet = 1), and b = α or w_wet+. These are linear equations, both dependent on n_wet and n_wet+, just with opposite signs, so wouldn't it be obvious that results for the wet analysis are mirrored by those of the dry (other than rounding effects)?

RESPONSE: We thank the reviewer for this observation and agree that the wet and dry score formulations are algebraically related through the normalized class proportions.
Because %dry = 1 − %wet, both scores can be expressed as linear combinations of %wet and %wet+, and therefore are expected to exhibit substantial inverse correspondence. However, they are not exact sign inverses. The differing coefficients applied to the wet+ subset (w_wet+ = 1.5 versus α = 2.0) produce different sensitivities to high-intensity wet conditions, particularly in the distribution tails used for GEV fitting and extreme-year identification. As a result, the wet and dry analyses can select different years as extreme, despite their shared dependence on the same underlying proportions. We have clarified in the revised manuscript that the two scores should be interpreted as complementary but strongly related indicators, rather than fully independent or mirror-image measures.

Comment: Section 4.2 provides some numbers 'after BH correction', but what are the numbers before correction for comparison?

RESPONSE: Thank you for this comment. To provide context on the effect of multiple-testing correction, we have revised Section 4.2 to report summary counts of significant wet and dry trends both before and after BH adjustment. Specifically, before correction, 604 (45.5%) and 605 (45.6%) of Work Units exhibited significant wet and dry trends, respectively (α = 0.05), while after BH correction, 408 (30.7%) wet and 406 (30.6%) dry trends remained significant. The main analysis continues to use BH-adjusted q-values for trend classification and mapping to control the false discovery rate at the national scale.

Comment: Figure 7 mentions "standardized anomaly", but this does not seem to be discussed anywhere else and it is not clear what it is.

RESPONSE: We thank the reviewer for noting this. We agree that the use of the term 'standardized anomaly' was unclear. In Fig. 7, the red curve represents a per-gauge standardized (z-score) version of annual maximum discharge, included solely to aid visual comparison across sites with different discharge magnitudes.
It is not used for thresholding or analysis. The figure, caption, and associated text have been revised.

Comment: The dataset used seems to be entirely synthetic, and it is mentioned in the conclusions that external validation, including comparison with river gauge data, is a next step. I wonder why this has not been done in this manuscript (or the previous work where the dataset came from)?

RESPONSE: We thank the reviewer for this important comment. The susceptibility data are model-derived but are based on a framework trained on observational flood-related data. A direct comparison with hydrometric gauge records is methodologically challenging at the national scale considered here, because the analysis operates on watershed-scale susceptibility metrics, whereas gauges provide point-based measurements of discharge or stage. While local and event-based comparisons between FS values and gauge data have been explored in prior studies, these do not translate directly here. Nevertheless, to provide observational context, the manuscript includes exploratory comparisons in selected watersheds (Section 4.7) and qualitative consistency checks against Climate Trends and Variations Bulletins, the Canadian Disaster Database, and historical flood records. The primary contribution of this work is methodological: developing a framework to analyse flood susceptibility as a temporally evolving stochastic process. A more formal integration with hydrometric data is an important next step, but is beyond the scope of the present study.

Comment: Flooding matters most where people, infrastructure, and agriculture are found. It seems like, from the work where XGBoost was used to derive the dataset used here, urban was not well included. It also seems like little if any discussion of this was included in the manuscript; why?

RESPONSE: We agree that flooding has the greatest societal relevance where people, infrastructure, and agriculture are present.
This study intentionally focuses on FS rather than flood risk, where risk additionally depends on exposure and vulnerability. The FS maps used here include land use/land cover predictors containing urban, agricultural, and natural classes. Urban areas are therefore partially represented indirectly through their effects on hydrologic response and surface characteristics, rather than through explicit population or infrastructure variables. Detailed information on predictor selection, training data, and treatment of urban land cover is provided in a companion publication. The FS products can be combined with population, infrastructure, or agricultural datasets for application-specific risk assessments, at the user's discretion.

Comment: Assuming that the flood susceptibility of pixels has a relationship with proximity to lakes and rivers, and work unit/SSDA areas vary in size such that the ratio of permanent water pixels to total pixels is different (perhaps smaller in large WUs/SSDAs and larger in small WUs/SSDAs), how could/does this impact your analysis at the WU/SSDA scale? Was this accounted for?

RESPONSE: We agree that flood susceptibility is influenced by proximity to rivers and lakes, and that Work Units (WUs) vary in size and in the proportion of permanent water pixels. This heterogeneity is mitigated by the design of the framework, which focuses on within-WU temporal variability rather than cross-WU comparisons of absolute magnitude. At the WU scale, all metrics are computed as normalized pixel fractions (%wet, %wet+, %dry), ensuring that differences in WU size do not directly affect FS values. Proximity to rivers is explicitly represented at the pixel level through predictors such as Euclidean distance to the river network and the HAND index.
Importantly, the analysis is concerned primarily with flooded or flood-prone land surfaces, rather than permanent water bodies, which are not the target of susceptibility characterization. While we do not explicitly normalize by the areal extent of permanent water bodies, extreme wet and dry years are identified relative to each WU's own score distribution, such that static differences in river or lake fraction primarily affect baseline levels rather than the detection of temporal anomalies. Explicit normalization by permanent water area is a potential refinement for future work.

"technical corrections"

We thank the reviewer for identifying these technical and editorial issues. All points listed below have been addressed in the revised manuscript:
- The citation formatting in the sentence beginning “For example, Ibebuchi and Abu…” has been corrected to “For example, Ibebuchi and Abu (2025) applied an …” (P2, L40).
- The definition of overall accuracy has been clarified (P3, L65–66). Accuracy is now explicitly defined as the fraction of correctly classified wet and dry pixels derived from the confusion matrix, i.e., (TP+TN)/(TP+TN+FP+FN).
- The phrase “for full model details” following the citation list (P3, L66–67) has been removed for clarity, as the cited references sufficiently document the model.
- The sentence fragment “This resulted in a wet/dry cutoff θ_raw ≈ 0.383” (P3, L75) has been rewritten as a complete sentence describing the F1‑optimal threshold selection procedure: thresholds were scanned across the range of predicted probabilities, and the cutoff that maximized the F1 score (the harmonic mean of precision and recall) on a hold-out validation set was selected; this resulted in a wet/dry cutoff θ_raw ≈ 0.383.
- The reference for the National Hydro Network Sub‑Sub Drainage Areas (Work Units) has been clarified to cite the official NRCan NHN GeoBase dataset.
- Spelling inconsistencies between “neighbor” and “neighbour” have been corrected to ensure consistent Canadian English usage throughout the manuscript and figures.
- The acronym WU has been standardized as Work Unit throughout the manuscript.
- Redundant text describing the bounded nature of FS scores and the Weibull GEV domain (P7–P8) has been consolidated to improve readability.
- Minor wording corrections were made to mathematical expressions, including the phrasing of time‑varying parameters (e.g., μ(t), σ(t)).
- The variable n following Equation 12 has been explicitly defined as the number of annual observations in the watershed score series.
- Typographical errors (e.g., “wet and ry years”) have been corrected.
- References to precipitation and flooding in 2017 and 2021 were revised to reflect the availability of Environment and Climate Change Canada reporting, and the text was adjusted to avoid implying equivalent national precipitation bulletins for 2021. For 2017, this aligns with documented elevated precipitation conditions reported by Environment and Climate Change Canada (2017) and widespread flooding recorded in the Canadian Disaster Database (CDD) (Government of Canada, 2018). For 2021, similar increases are observed in association with documented flood activity, although no equivalent national precipitation bulletin is available for that year.
- Geographic naming errors (e.g., “Rocky Mounty range”) have been corrected.
- Statements regarding spatial persistence patterns were revised to include a supporting reference to the Supplement.
- Redundant explanatory statements that followed directly from score definitions were removed.
- The typographical error in the list of GEV‑identified wet years (2014–2015) was corrected.
- The undefined notation x_q(t) was removed and replaced with descriptive wording (“return‑level thresholds through time”) for clarity.
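The F1-optimal threshold scan described in the corrections above (scanning candidate cutoffs and keeping the one that maximizes the F1 score) can be sketched as follows. The labels and scores here are synthetic; the manuscript's θ_raw ≈ 0.383 comes from its own hold-out validation set, not from this sketch.

```python
import numpy as np

def f1_optimal_threshold(y_true, y_prob):
    """Scan every observed probability as a candidate cutoff and
    return the one maximizing F1 (harmonic mean of precision and recall)."""
    best_t, best_f1 = 0.5, -1.0
    for t in np.unique(y_prob):
        y_pred = y_prob >= t
        tp = np.sum(y_pred & (y_true == 1))
        fp = np.sum(y_pred & (y_true == 0))
        fn = np.sum(~y_pred & (y_true == 1))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=1000)                           # wet (1) / dry (0) labels
p = np.clip(0.6 * y + 0.2 + 0.2 * rng.random(1000), 0, 1)   # separable synthetic scores
theta, f1 = f1_optimal_threshold(y, p)
```

On this cleanly separable toy data the scan recovers a perfect F1 of 1.0; on real validation data the optimum trades off false positives against false negatives instead.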
We thank the reviewer once again for these valuable suggestions, which have contributed to improving the clarity and novelty of the manuscript.

Citation: https://doi.org/10.5194/egusphere-2026-528-AC1
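As context for the "spatial envelope" discussed in the reply (two rasters holding per-pixel maxima and minima of FS across the identified extreme years, then smoothed), a minimal sketch on synthetic grids; the array shapes and the 3x3 averaging window are illustrative assumptions, not the manuscript's settings.

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(2)
# Stack of annual FS grids for three identified extreme wet years (synthetic).
fs_extreme_wet = rng.random((3, 50, 50))

# Pixelwise upper envelope: per-pixel maximum across the selected extreme years.
# (A lower envelope from extreme dry years would use .min(axis=0) analogously.)
upper = fs_extreme_wet.max(axis=0)

# Spatial smoothing for coherence: a simple moving-average filter.
upper_smooth = uniform_filter(upper, size=3)
```

The smoothing step suppresses single-pixel noise at the cost of slightly blurring sharp susceptibility boundaries, which is exactly the trade-off the revised Fig. 8 transect is meant to show.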
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 547 | 286 | 71 | 904 | 230 | 54 | 93 |