the Creative Commons Attribution 4.0 License.
Technical Note: A Visual Diagnostic Framework for Identifying Non-Stationarity and Mixed Populations in Flood Series
Abstract. Practitioners are commonly faced with conducting flood frequency analysis (FFA) with a specific purpose in mind. They are faced with the temptation to use all the available data and assume that the conditions of FFA are met. Flood frequency analysis relies on the assumption that the flood time series are (1) stationary and (2) independent, conditions widely summarized as independent and identically distributed (i.i.d.). It is commonly understood that these conditions do not always hold. In many cases, the sample is composed of mixed populations, and low outliers often confuse the analyst by biasing the selection of a distribution. Magnitude outliers may come from a different generating mechanism than the main population of peaks, and timing outliers can also indicate alternative generating mechanisms. This note describes a diagnostic framework for visual screening of annual maxima and peaks-over-threshold data that can better inform the analyst of the nature of the flood series. Integrating these screens allows the identification of mixed populations that are often missed in standard routines.
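As an illustration of what screening for stationarity can involve, below is a minimal Python sketch of one widely used check, the Mann-Kendall test for a monotonic trend in an annual maximum series. It assumes the normal approximation with a tie-corrected variance and makes no adjustment for serial correlation; it is not the implementation in the authors' R package, which may use different tests, and the function name `mann_kendall` is hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist
from typing import Sequence, Tuple


def mann_kendall(series: Sequence[float]) -> Tuple[int, float, float]:
    """Return (S, Z, two-sided p-value) of the Mann-Kendall trend test.

    Sketch only: normal approximation, tie-corrected variance,
    no correction for serial correlation.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 values")
    # S counts concordant minus discordant pairs (later value vs earlier value).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under H0 (no trend), reduced for groups of tied values.
    ties = Counter(series).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18.0
    # Continuity-corrected standard normal score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, z, p
```

For a strictly increasing ten-year series, S = 45 and the two-sided p-value is well below 0.01, so the series would be flagged as non-stationary at the usual significance levels.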
Status: open (until 17 Jun 2026)
RC1: 'Comment on egusphere-2026-2067', Anonymous Referee #1, 25 May 2026
AC1: 'Reply on RC1', Paul Whitfield, 08 Jun 2026
Reviewer #1
While there is a useful scientific contribution here, I believe some substantial revisions are necessary to the way this note is presented (both the text and the figures) before it is ready for publication. This is especially important considering that the paper is meant to highlight a methodology for others to use rather than to present a new scientific knowledge contribution.
First, regarding the text and the tone of the note: I believe the manuscript is dismissive of the knowledge and experience of practitioners in performing flood frequency analysis. Phrases such as 'temptation' (abstract), 'confuse the analyst' (abstract), 'practitioner bias' (par. 250), 'outlier confusion' (255), and others are unfair to the knowledge of practicing water resources professionals and should be removed, especially if the intent is to create a tool that practitioners will use.
Response:
Thank you for this comment. It certainly was not our intention to ‘speak down’ to practitioners. We will revisit that text and revise the language so that it reads as sharing experience.
Second, regarding the overall text, there are examples where claims are provided without evidence. Example 1 (abstract) - 'in many cases the sample is composed of mixed populations'. No evidence is provided to support this claim, and in my experience in Canada this isn't true. Example 2 (par 250) refers to practitioner bias. It is not clear where the authors have observed this practitioner bias or how they know this is the case.
Response:
- While there are cases where samples are not mixed populations, it is common in Canada and in most cold regions for there to be a ‘main population’ and others. This is supported in the manuscript by several references, although we admit that these are not exhaustive (Waylen and Woo 1982; Iacobellis et al. 2010; Fischer et al. 2016). The key point is that ffa assumes that all events are caused by the same generating mechanism. If that is not the case, then alternatives should be used (e.g. Barth et al. 2019; Yu et al. 2019; Burn and Whitfield 2025; Fischer and Schumann 2024; Bai et al. 2026).
- “Practitioner bias”. We will use an alternative to this in the revision, and explain more fully the problems that are faced by those doing ffa. We intend to add text to the discussion similar to the following:
The challenges encountered in conducting Flood Frequency Analysis (FFA) lie in subjective decisions or methodological preferences that can inadvertently skew the calculated flood magnitudes and return periods (e.g., the 100-year flood). Because FFA inherently requires interpreting complex, short-duration datasets, analyst choices can significantly influence infrastructure design and hazard mapping (Zhang et al. 2020; Zhang & Stadnyk 2020; Pakhale et al. 2024; Zhao et al. 2024).
Common challenges include:
- Distribution Selection Bias: Analysts often default to preferred or familiar probability models (e.g., Log-Pearson Type III, Generalized Extreme Value). Choosing a distribution with a heavier or lighter upper tail can drastically alter the predicted magnitude of extreme, rare events (Kidson & Richards 2005; He et al. 2015).
- Historical Data Truncation: Practitioners might limit their analysis to a recent "base period" of systematic gauging records. This can introduce severe frequency bias, as short observation periods frequently miss massive historical or paleoflood events, leading to an underestimation of flood risks (Zhao et al. 2024).
- Outlier Handling: Subjective decisions on whether to include, adjust, or completely discard exceptionally high or low outliers. Incorrectly classifying a massive outlier as a "rogue" event rather than as part of the natural data can suppress design flood limits (Whitfield & Burn 2026).
- Stationarity Assumptions: Continuing to use traditional stationary FFA—assuming past climate and land use represent future conditions—despite observable non-stationarity from climate change and urbanization (Gado & Nguyen 2016; Vidrio-Sahagún et al. 2024).
- Data Series Preferences: Preferring Annual Maximum (AM) series over Peak-Over-Threshold (POT) methods. AM approaches can sometimes underestimate frequent, moderate flood events by missing secondary peaks that occur in a single water year (Dell'Aira et al. 2023).
The Federal Hydrologic and Hydraulic Procedures for Flood Hazard Delineation in Canada and Bulletin 17C in the United States provide guidance on standardized data screening and on how to mitigate these subjective biases.
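Bulletin 17C screens for potentially influential low floods with the Multiple Grubbs-Beck Test (MGBT). As a simpler illustration of the same idea, the Python sketch below implements the older single Grubbs-Beck low-outlier test from Bulletin 17B, using the commonly quoted approximation of its 10% critical value K_N. It assumes positive peaks, works in log10 space, and is intended for roughly 5 to 150 years of record. It is neither MGBT nor the authors' code, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def grubbs_beck_low_threshold(peaks: List[float]) -> float:
    """Low-outlier threshold (in flow units) from the single Grubbs-Beck test.

    Uses the Bulletin 17B approximation of the 10% critical value K_N,
    intended for roughly 5 <= N <= 150, applied to log10 peaks.
    """
    n = len(peaks)
    if n < 5:
        raise ValueError("need at least 5 peaks")
    if any(q <= 0 for q in peaks):
        raise ValueError("peaks must be positive for the log transform")
    logs = [math.log10(q) for q in peaks]
    log_n = math.log10(n)
    k_n = -0.9043 + 3.345 * math.sqrt(log_n) - 0.4046 * log_n
    # Threshold = mean - K_N * sd in log space, converted back to flow units.
    return 10 ** (mean(logs) - k_n * stdev(logs))


def flag_low_outliers(peaks: List[float]) -> Tuple[float, List[float]]:
    """Return the threshold and the peaks that fall below it."""
    threshold = grubbs_beck_low_threshold(peaks)
    return threshold, [q for q in peaks if q < threshold]
```

With N = 10 the approximation gives K_N close to 2.036. For the ten peaks [120, 135, 150, 160, 140, 155, 170, 145, 130, 5], the threshold comes out near 12, so only the value 5 is flagged. MGBT generalizes this single test by examining several of the smallest peaks in turn.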
Along with the overall issues with the tone and the assumptions about practitioners who perform ffa as part of their professional practice, the figures require substantial editing for clarity. In particular, on Figure 1: 1) there are items in the legend that do not appear on the actual figure; 2) it is not clear what the light dashed vertical lines represent; 3) text is placed throughout the figures and would be better placed in one consolidated note below each figure; 4) considering the long time series, panels a and b would be better placed as wider panels stacked on top of each other rather than side by side.
Response:
- For consistency, we chose to include in the legend all features that might appear. We will modify the figure caption to indicate that the symbol legend shows all possibilities.
- Thanks for noticing that. We will add to the caption to indicate that the gray dashed lines indicate WW reversals.
- We think this is a matter of preference. Remembering that these are screening plots, not presentation plots, our preference is to include them inside the figure. Further, all these results are produced in the output from the function.
- While we fully agree for presentation plots, here we are comparing two examples to demonstrate functionality. Users who want a different shape or layout can change it. We will consider offering an alternative layout as an option, with the time series spanning the page width and two roughly square plots for the others.
- We will modify the caption to include:
“Results are reported in plain text when the test conforms with practice. Red and bold is used to highlight areas that may be of concern.”
For Figure 2: 1) it is unclear what the colors of the bars in the proportion plot represent; 2) there appear to be colors on panel a points that aren't shown in the legend; 3) all points in the legend are hollow, whereas on the plot they are filled colors; 4) "proportion" is not a descriptive axis title for the sub-plot.
Response:
Sorry, this is simply not correct.
1&2&3. The caption clearly states:
“The colored symbols indicate the months of the year, thus indicating seasonal distributions of floods, and a histogram of these is placed in a corner of the plot to show the distribution of peaks across months.”
We will add text to the caption to state that the shape of the symbol indicates whether it is “normal” or an “outlier”.
- While proportion and fraction are mathematically correct and frequently used to describe this type of chart, proportion (or relative frequency) is considered more standard and professional in the field of statistics.
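The proportion (relative frequency) in the corner histogram of Figure 2 can be read as the share of peaks falling in each calendar month. A minimal Python sketch, assuming peak dates are available as `datetime.date` objects; the function name `monthly_proportions` is hypothetical and not part of the authors' package.

```python
from collections import Counter
from datetime import date
from typing import Dict, List


def monthly_proportions(peak_dates: List[date]) -> Dict[int, float]:
    """Share of peaks in each calendar month (1 = January through 12 = December)."""
    if not peak_dates:
        raise ValueError("no peak dates supplied")
    counts = Counter(d.month for d in peak_dates)
    n = len(peak_dates)
    # Months with no peaks are reported as 0.0 so that all 12 bars exist.
    return {m: counts.get(m, 0) / n for m in range(1, 13)}
```

For four peaks dated 10 May, 20 May, 1 June, and 15 October, May receives 0.5, June and October 0.25 each, and the twelve values sum to 1.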
For Figure 3: the polar plots are quite novel but still have a number of issues. 1) There are two legends, and some of the colors are reused on both legends, making it difficult to tell what the colors or symbols represent. 2) There is seemingly haphazard coloring and bolding of the note shown in the bottom left corner of the plot. 3) The clustering would likely be most useful in a table by year; is this provided with the analysis package?
Response:
- The only overlapping symbol colour is red. Red indicates an outlier of one of three types. We chose to be consistent so someone screening a plot would see outliers.
- This is certainly not haphazard. Normal text indicates conformance with practice. Bold indicates a deviation from practice, and red is used to highlight areas of concern.
- Indeed, the code returns an ordered set of cluster memberships.
- We will add text to the legend indicating the highlighting text rules.
“Results are reported in plain text when the test conforms with practice. Bold alone is used to indicate that clustering is warranted. Red and bold is used to highlight areas that may be of concern.”
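Polar plots of flood timing are commonly summarized with directional (circular) statistics: each peak date is mapped to an angle on the annual cycle, which gives a mean date of occurrence and a concentration measure r between 0 (timing spread evenly through the year) and 1 (all peaks on the same date). The Python sketch below assumes day-of-year inputs and a 365.25-day year; it is not taken from the authors' package, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def flood_seasonality(days_of_year: List[float],
                      days_in_year: float = 365.25) -> Tuple[float, float]:
    """Mean day of occurrence and concentration r (0 to 1) of flood dates."""
    if not days_of_year:
        raise ValueError("no flood dates supplied")
    # Map each date to an angle on the annual cycle.
    angles = [2.0 * math.pi * d / days_in_year for d in days_of_year]
    x_bar = sum(math.cos(a) for a in angles) / len(angles)
    y_bar = sum(math.sin(a) for a in angles) / len(angles)
    # Mean direction, wrapped to [0, 2*pi) and converted back to a day.
    mean_angle = math.atan2(y_bar, x_bar) % (2.0 * math.pi)
    mean_day = mean_angle * days_in_year / (2.0 * math.pi)
    # Length of the mean resultant vector measures concentration.
    r = math.hypot(x_bar, y_bar)
    return mean_day, r
```

The wrap-around handling is the point of the circular approach: peaks on day 1 and day 365 average to a date near the turn of the year with r close to 1, whereas an arithmetic mean of the day numbers (183) would wrongly place them in mid-year.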
Overall the figures could be simplified for better clarity, and many of the notes within the figures would be better provided as accompanying text rather than right on the figures.
Response:
With all respect, we disagree. These are designed to be screening plots, and indicating the results of tests directly on the figure means the user does not have to look elsewhere for a result. Again, all results are provided to the user in addition to these three figures. Our objective is to simplify screening and assist the user with ffa data.
References:
Barth, N. A., Villarini, G., and White, K.: Accounting for mixed populations in flood frequency analysis: Bulletin 17C perspective, Journal of Hydrologic Engineering, 24, 04019002, 10.1061/(ASCE)HE.1943-5584.000176, 2019.
Bobée, B.: Estimation des événements extrêmes de crue par l'analyse fréquentielle: une revue critique, La Houille Blanche, 100-105, 1999.
Bobée, B. and Rasmussen, P. F.: Recent advances in flood frequency analysis, Reviews of Geophysics, 33, 1111-1116, 1995.
Burn, D. H. and Whitfield, P. H.: Shifting cold regions streamflow regimes in North America affect flood frequency analysis, Hydrological Sciences Journal, 70, 51-70, 10.1080/02626667.2024.2422531, 2025.
Dell'Aira, F., Cancelliere, A., and Meier, C. I.: Underestimation of Frequent Floods when Using Annual Maxima for Frequency Analysis: Drivers, Spatial Variability, and Possible Solutions, Authorea Preprints, 2023.
Fischer, S. and Schumann, A. H.: Temporal changes in the frequency of flood types and their impact on flood statistics, Journal of Hydrology X, 100171, 10.1016/j.hydroa.2024.100171, 2024.
Fischer, S., Schumann, A., and Schulte, M.: Characterisation of seasonal flood types according to timescales in mixed probability distributions, Journal of Hydrology, 539, 38-56, 10.1016/j.jhydrol.2016.05.005, 2016.
Gado, T. A. and Nguyen, V.-T.-V.: An at-site flood estimation method in the context of nonstationarity I. A simulation study, Journal of Hydrology, 535, 710–721, 2016.
He, J., Anderson, A., and Valeo, C.: Bias compensation in flood frequency analysis, Hydrological Sciences Journal, 60, 381-401, 2015.
Iacobellis, V., Fiorentino, M., Gioia, A., and Manfreda, S.: Best fit and selection of theoretical flood frequency distributions based on different runoff generation mechanisms, Water, 2, 239-256, 2010.
Kidson, R. and Richards, K. S.: Flood frequency analysis: assumptions and alternatives, Progress in Physical Geography, 29, 392-410, 10.1191/0309133305pp454ra, 2005.
Olson, S. A.: Estimating flood discharges at selected annual exceedance probabilities for unregulated, rural streams in Vermont, 2023, US Geological Survey, 2328-0328, 2025.
Pakhale, G., Khosa, R., and Gosain, A. K.: In Today's World, Is It Worth Performing Flood Frequency Analysis Using Observed Streamflow Data? Environmental Advances, 15, 100485, 2024.
Vidrio-Sahagún, C. T., Ruschkowski, J., He, J., and Pietroniro, A.: A practice-oriented framework for stationary and nonstationary flood frequency analysis, Environmental Modelling & Software, 173, 105940, 2024.
Waylen, P. and Woo, M.-K.: Prediction of annual floods generated by mixed processes, Water Resources Research, 18, 1283-1286, 10.1029/WR018i004p01283, 1982.
Whitfield, P. H. and Burn, D. H.: Rogue and Extreme Floods in North America, Journal of Hydrology, 2026.
Yu, G., Wright, D. B., Zhu, Z., Smith, C., and Holman, K. D.: Process-based flood frequency analysis in an agricultural watershed exhibiting nonstationary flood seasonality, Hydrology and Earth System Sciences, 23, 2225-2243, 2019.
Zhang, Z. and Stadnyk, T. A.: Investigation of attributes for identifying homogeneous flood regions for regional flood frequency analysis in Canada, 2020.
Zhang, Z., Stadnyk, T. A., and Burn, D. H.: Identification of a preferred statistical distribution for at-site flood frequency analysis in Canada, Canadian Water Resources Journal/Revue canadienne des ressources hydriques, 45, 43-58, 2020.
Zhao, F., Lange, S., Goswami, B., and Frieler, K.: Frequency bias causes overestimation of climate change impacts on global flood occurrence, Geophysical Research Letters, 51, e2024GL108855, 2024.
Citation: https://doi.org/10.5194/egusphere-2026-2067-AC1
RC2: 'Comment on egusphere-2026-2067', Anonymous Referee #2, 08 Jun 2026
I am reviewing this technical note for the second time, as it was submitted some months ago to the Hydrological Sciences Journal. I did not read explicit responses to my points raised then (I couldn’t find them on the HESS website) but I can see some efforts made by the authors in revising the manuscript. The inclusion of additional European references improves the balance of the literature review, and the manuscript is now more explicit in stating that the underlying methods are described in other publications by the same authors (three papers in 2025-2026). Nevertheless, my main concern remains. The Technical Note still seems to derive most of its contribution from the presentation of graphical outputs generated by an R package, while the methodological basis for these outputs is largely placed outside the manuscript. Is this enough for a standalone contribution? I do not think that the existence of a plotting procedure is, on its own, sufficient for a scientific Technical Note unless it really is something new. HESS Technical Notes are expected to report new developments, significant advances, or novel aspects of experimental or theoretical methods and techniques. Although the plots presented here may be useful to practitioners, the manuscript does not sufficiently demonstrate, within the paper itself, the methodological novelty or standalone contribution required for a Technical Note in HESS.
Detailed comments are listed below.
Introduction: I would suggest reorganizing it so that the state of the art is described first and the objectives of this technical note are listed at the end. At the moment everything is mixed (e.g., “This note focuses…” at line 35, “In this work, …” at line 53, “This Technical Note describes…” at line 62) and the train of thought does not flow linearly.
Overall structure: there seem to be many repetitions of the same concepts many times in the paper. E.g., the figures are described in Section 2.2, and again in Section 3, and again in Section 4.
Line 85: shouldn’t annual maxima from partial years be discussed in section 2.1?
Line 88: years without observations greater than the threshold and missing years should be treated differently, in my opinion.
Line 90: I would suggest adding more details on the tests here, and in general on the methods used to produce the details in the figures.
Line 101: the sentence seems incomplete. I guess the Authors mean that MGBT is used in here.
Line 104: the Grubbs test was mentioned some lines before (for the low floods). Maybe it could be discussed there already.
Line 112: this sentence seems a residual from what is now explained at line 100.
Line 122: “allow”
Line 134: what is kappa? Is it the one described in Section 2.2.2?
Line 146: isn’t this description the same already done at line 88?
Line 165: why having seasonality histograms in Figure 2 when Figure 3 has that in more detail?
Line 172: the kernel densities could be scaled not to exceed the plot borders
Line 180: the methods are described in three other papers by the same authors from 2025 and 2026. Is the contribution of this Technical Note enough for HESS?
Lines 210-235: maybe Section 4 is not the right place to put the checklist, which is more general and reflects the discussions in Sections 2 and 3. Maybe the Authors meant to have a discussion section here (the content on page 9 also seems more like discussion than part of Section 4). In this discussion part, I would have expected something about having (or not having) all the information needed to produce the plots. Sometimes the season of old floods is not available, for example.
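The point raised for line 88, that years with no flows above the threshold differ from years with no data, can be made concrete in a peaks-over-threshold extraction. The Python sketch below is hypothetical: the function name, the data layout (one list of daily flows per year), and the 7-day separation rule are assumptions, and events spanning the turn of a year are not merged. An observed year with no exceedances returns an empty list, whereas a missing year returns None, so the two cannot be confused when exceedance rates are computed.

```python
from typing import Dict, List, Optional, Tuple

Peak = Tuple[int, float]  # (day index within the year, peak flow)


def extract_pot(
    daily: Dict[int, Optional[List[float]]],
    threshold: float,
    min_separation: int = 7,
) -> Dict[int, Optional[List[Peak]]]:
    """Hypothetical POT extraction that keeps missing years distinct.

    daily maps year -> list of daily flows, or None if the year is missing.
    Returns year -> list of (day_index, peak); [] means observed but no
    exceedance, None means no data for that year.
    """
    result: Dict[int, Optional[List[Peak]]] = {}
    for year, flows in daily.items():
        if flows is None:
            result[year] = None  # missing year: unknown, not "no flood"
            continue
        # 1) Each run of consecutive exceedances contributes its maximum.
        events: List[Peak] = []
        i, n = 0, len(flows)
        while i < n:
            if flows[i] > threshold:
                j = i
                while j + 1 < n and flows[j + 1] > threshold:
                    j += 1
                k = max(range(i, j + 1), key=lambda t: flows[t])
                events.append((k, flows[k]))
                i = j + 1
            else:
                i += 1
        # 2) Peaks closer than min_separation days are treated as one event,
        #    keeping the larger peak (a simple independence rule).
        kept: List[Peak] = []
        for day, q in events:
            if kept and day - kept[-1][0] < min_separation:
                if q > kept[-1][1]:
                    kept[-1] = (day, q)
            else:
                kept.append((day, q))
        result[year] = kept
    return result
```

Other independence criteria (for example, requiring flow to drop to a fraction of the smaller peak between events) could replace the fixed separation in step 2 without changing how missing years are reported.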
Citation: https://doi.org/10.5194/egusphere-2026-2067-RC2
RC3: 'Comment on egusphere-2026-2067', Anonymous Referee #3, 11 Jun 2026
Many people evaluate a new tool or workflow by asking whether it performs a task that could not previously be accomplished. However, in modern computational research, novelty can also arise from improvements in efficiency, accessibility, reproducibility, or ease of use. A workflow that reduces computation time, memory requirements, coding effort, or user expertise can provide significant value even if the underlying scientific objective is not new.
I believe a plotting tool such as this would be useful to the hydrologic community, and a technical report describing each component of the workflow would also be valuable, particularly for students, early-career researchers, and practicing engineers. As AI-assisted programming becomes increasingly common, many users are now able to develop and apply computational tools in fields where they may not have extensive software engineering experience. Well-documented, peer-reviewed descriptions of these workflows therefore become an important contribution, enabling users to understand the methodology, reproduce the results, and adapt the tool appropriately rather than treating it as a black box.
I therefore appreciate how the authors go through each component of the diagnostic tools in their methods section and provide substantial examples to the reader. I think the paper could benefit from a slight reframing as an introduction to the specific R package they have created. Figures are provided at the bottom of the preprint, so I am unsure where they will go in the manuscript, but having figures of each type described in the methods section accompanying the text would have made the explanations much easier to follow. Similarly, if figure limits allow, including figures illustrating the different trends, biases, outliers, etc. would be useful. For the example plots, having subplots correspond to the three plots for each of the examples might make more sense.
Generally, I think the paper needs further grammatical review and standardization, particularly in the use and definition of abbreviations: some are defined and not used; some are used but not defined. Also, the text doesn't need to break out of paragraph form into lists. I like the checklist for FFA, but it might be more appropriate as some sort of schematic rather than a list. Other examples of this are the two important problems with the framework (lines 246 to 256) and the list of when flood frequency analysis assumptions are violated (lines 262-265).
Line edits:
Line 13: grammatical error; change to “often confusing…”
Line 51: Indicate what CSHShydRology is.
Line 58: guidelines in countries are mentioned but not cited.
Line 70: amax and POT abbreviations are used but were not put in parentheses on previous line.
Line 130: Can you define possibilistic algorithms?
Line 153: LOESS line should be defined as Locally Estimated Scatterplot Smoothing somewhere in paper (I think loess was used earlier).
Line 161: Explain and cite why Gumbel is used for the annual maximum series and Pareto for the POT series.
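One standard rationale for the pairing questioned at line 161: if exceedances of a threshold u arrive as a Poisson process with rate λ per year and their magnitudes are exponential (a Generalized Pareto distribution with shape ξ = 0), the annual maxima follow a Gumbel distribution with location μ = u + σ ln λ and scale β = σ; with ξ ≠ 0 the annual maxima follow a GEV distribution. The T-year return levels are x_T = μ - β ln(-ln(1 - 1/T)) for Gumbel and x_T = u + (σ/ξ)[(λT)^ξ - 1] for the POT model (u + σ ln(λT) when ξ = 0). The Python sketch below computes both; the parameter values are illustrative and this is not the authors' fitting code.

```python
import math


def gumbel_return_level(mu: float, beta: float, T: float) -> float:
    """T-year level of an annual maximum series with Gumbel(mu, beta)."""
    if T <= 1:
        raise ValueError("return period must exceed 1 year")
    return mu - beta * math.log(-math.log(1.0 - 1.0 / T))


def gpd_return_level(u: float, sigma: float, xi: float,
                     rate: float, T: float) -> float:
    """T-year level of a POT series: GPD(sigma, xi) excesses over u,
    with `rate` = mean number of exceedances per year."""
    m = rate * T  # expected number of exceedances in T years
    if m <= 1:
        raise ValueError("rate * T must exceed 1")
    if abs(xi) < 1e-12:
        return u + sigma * math.log(m)  # exponential (xi = 0) limit
    return u + sigma / xi * (m ** xi - 1.0)
```

With u = 50, σ = 10, λ = 2 and ξ = 0, the 1000-year POT level and the Gumbel level computed with μ = u + σ ln λ and β = σ differ by about 0.005, illustrating the Poisson-exponential link described above; the gap shrinks as T grows.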
Citation: https://doi.org/10.5194/egusphere-2026-2067-RC3