Technical Note: A Visual Diagnostic Framework for Identifying Non-Stationarity and Mixed Populations in Flood Series

Whitfield, Paul; Burn, Donald

doi:10.5194/egusphere-2026-2067

Preprints

https://doi.org/10.5194/egusphere-2026-2067

22 Apr 2026

Abstract. Practitioners are commonly faced with conducting flood frequency analysis (ffa) with a specific purpose in mind. They are faced with the temptation to use all the available data and assume that the conditions of ffa are met. Flood frequency analysis relies on the assumptions that the flood time series are: [1] stationary, and, [2] independent, widely known as independent and identically distributed (i.i.d.). It is commonly understood that these conditions do not always exist. In many cases, the sample is composed of mixed populations and low outliers often confuse the analyst by biasing the selection of a distribution. Magnitude outliers may come from a different generating mechanism than the main population of peaks. Timing outliers can also indicate alternative generating mechanisms. A diagnostic framework for visual screening of annual maxima and peaks-over-threshold data is described that can better inform the analyst of the nature of the flood series. This integration allows the identification of mixed populations that are often missed in standard routines.

Received: 10 Apr 2026 – Discussion started: 22 Apr 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1: 'Comment on egusphere-2026-2067', Anonymous Referee #1, 25 May 2026

While there is a useful scientific contribution here, I believe there are some substantial revisions necessary to the way this note is presented (both the text and the figures) before it is ready for publication. This is especially so considering that the paper is meant to highlight a methodology for someone to use rather than presenting any new scientific knowledge contribution.
First is regarding the text and the tone of the note. I believe the manuscript is dismissive of the knowledge and experience of practitioners in performing flood frequency analysis. Phrases such as 'temptation' (abstract), 'confuse the analyst' (abstract), 'practitioner bias' (par 250), 'outlier confusion' (255), and more are unfair to the knowledge of practicing water resources professionals and should be removed, especially if the intent is to create a tool that practitioners will use.
Second, regarding the overall text, there are examples where claims are provided without evidence. Example 1 (abstract) - 'in many cases the sample is composed of mixed populations'. No evidence is provided to support this claim, and in my experience in Canada this isn't true. Example 2 (par 250) refers to practitioner bias. It is not clear where the authors have observed this practitioner bias or how they know this is the case.
Along with overall issues with the tone and assumptions of the practitioners who perform ffa as part of their professional practice, the figures require substantial editing for clarity. In particular, on Figure 1 - 1) there are items in the legend which do not appear on the actual figure. 2) it is not clear what the light dashed vertical lines represent. 3) Text on the figures is placed throughout and would be better placed in one consolidated note below each figure. 4) considering the long timeseries, figures a and b would be better placed as wider panels stacked on top of each other rather than side-by-side.
For figure 2 - 1) it is unclear what the colors of the bars in the proportion plot represent. 2) there appear to be colors on panel a points that aren't shown in the legend. 3) all points in the legend are hollow whereas on the plot they are filled colors. 4) "proportion" is not a descriptive axis title for the sub-plot.
For figure 3 - The polar plots are quite novel but still have a number of issues. 1) there are two legends, and some of the colors are reused on both legends, making it difficult to tell what the color or symbols represent. 2) there is seemingly haphazard coloring and bolding of the note shown in the bottom left corner of the plot. 3) the clustering would likely be most useful in a table by year; is this provided with the analysis package?
Overall the figures could be simplified for better clarity, and many of the notes within the figures would be better provided as accompanying text rather than right on the figures.

Citation: https://doi.org/10.5194/egusphere-2026-2067-RC1
- AC1: 'Reply on RC1', Paul Whitfield, 08 Jun 2026
  Reviewer #1
  While there is a useful scientific contribution here, I believe there are some substantial revisions necessary to the way this note is presented (both the text and the figures) before it is ready for publication. This is especially so considering that the paper is meant to highlight a methodology for someone to use rather than presenting any new scientific knowledge contribution.
  First is regarding the text and the tone of the note. I believe the manuscript is dismissive of the knowledge and experience of practitioners in performing flood frequency analysis. Phrases such as 'temptation' (abstract), 'confuse the analyst' (abstract), 'practitioner bias' (par 250), 'outlier confusion' (255), and more are unfair to the knowledge of practicing water resources professionals and should be removed, especially if the intent is to create a tool that practitioners will use.
  Response:
  Thank you for this comment. It certainly was not our intention to ‘speak down’ to practitioners. We will revisit that text and recast the language as a sharing of experience.
  
  Second, regarding the overall text, there are examples where claims are provided without evidence. Example 1 (abstract) - 'in many cases the sample is composed of mixed populations'. No evidence is provided to support this claim, and in my experience in Canada this isn't true. Example 2 (par 250) refers to practitioner bias. It is not clear where the authors have observed this practitioner bias or how they know this is the case.
  Response:
  While there are cases where samples are not mixed populations, it is common in Canada and in most cold regions for there to be a ‘main population’ and others. This is supported in the manuscript by several references, although we admit that the list is not exhaustive (Waylen and Woo 1982; Iacobellis et al. 2010; Fischer et al. 2016). The key point is that ffa assumes that all events are caused by the same generating mechanism. If that is not the case, then alternatives should be used (e.g. Barth et al. 2019; Yu et al. 2022; Burn and Whitfield 2025; Fischer and Schumann 2024; Bai et al. 2026).
  
  “Practitioner bias”. We will use an alternative to this in the revision, and explain more fully the problems that are faced by those doing ffa. We intend on adding text to the discussion similar to:
  
  The challenges encountered in conducting Flood Frequency Analysis (FFA) lie in the subjective decisions or methodological preferences that can inadvertently skew the calculation of flood magnitudes and return periods (e.g., the 100-year flood). Because FFA inherently requires interpreting complex, short-duration datasets, analyst choices can significantly influence infrastructure design and hazard mapping (Zhang et al. 2020; Zhang & Stadnyk 2020; Pakhale et al. 2024; Zhao et al. 2024).
  Common challenges include:
  Distribution Selection Bias: Analysts often default to preferred or familiar probability models (e.g., Log-Pearson Type III, Generalized Extreme Value). Choosing a distribution with a heavier or lighter upper tail can drastically alter the predicted magnitude of extreme, rare events (Kidson & Richards 2005; He et al. 2015).
  
  Historical Data Truncation: Practitioners might limit their analysis to a recent "base period" of systematic gauging records. This can introduce severe frequency bias, as short observation periods frequently miss massive historical or paleoflood events, leading to an underestimation of flood risks (Zhao et al. 2024).
  
  Outlier Handling: Analysts make subjective decisions on whether to include, adjust, or completely discard exceptionally high or low outliers. Incorrectly classifying a massive outlier as a "rogue" event rather than part of the natural data can suppress design flood limits (Whitfield & Burn 2026).
  
  Stationarity Assumptions: Analysts may continue to use traditional stationary FFA, which assumes that past climate and land use represent future conditions, despite observable non-stationarity from climate change and urbanization (Gado & Nguyen 2016; Vidrio-Sahagún et al. 2024).
  
  Data Series Preferences: Analysts may prefer Annual Maximum (AM) series over Peaks-Over-Threshold (POT) methods. AM approaches can sometimes underestimate frequent, moderate flood events by missing secondary peaks that occur in a single water year (Dell'Aira et al. 2023).
  
  Federal Hydrologic and Hydraulic Procedures for Flood Hazard Delineation in Canada and Bulletin 17C in the United States provide guidance on standardized data screening and how to mitigate these subjective biases.
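
The "Data Series Preferences" item above can be illustrated with a minimal sketch. It is not taken from the authors' R code: the peak values, the threshold of 250, and the function names `annual_maxima` and `peaks_over_threshold` are all hypothetical, the peaks are assumed to be already declustered (independent), and the calendar year stands in for the water year.

```python
from datetime import date

# Hypothetical independent flood peaks (date, discharge in m^3/s).
# Values are invented purely for illustration.
peaks = [
    (date(2001, 4, 20), 310.0),
    (date(2001, 6, 15), 280.0),   # secondary peak in 2001
    (date(2002, 5, 2), 150.0),
    (date(2003, 4, 28), 420.0),
    (date(2003, 7, 9), 295.0),    # secondary peak in 2003
    (date(2004, 5, 11), 120.0),
]


def annual_maxima(peaks):
    """Return {year: largest peak in that year} (calendar year assumed)."""
    amax = {}
    for day, q in peaks:
        if day.year not in amax or q > amax[day.year]:
            amax[day.year] = q
    return amax


def peaks_over_threshold(peaks, threshold):
    """Return every peak strictly above the threshold, in date order."""
    return sorted((day, q) for day, q in peaks if q > threshold)


amax = annual_maxima(peaks)
pot = peaks_over_threshold(peaks, threshold=250.0)

# Peaks that exceed the threshold but are not the maximum of their year:
# these are the events an annual-maximum series never sees.
missed_by_amax = [(day, q) for day, q in pot if q < amax[day.year]]
```

In this toy record the annual-maximum series keeps two small floods (2002 and 2004) that never exceed the threshold, while dropping two moderate secondary peaks that do. That is the asymmetry the item describes.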
  
  Along with overall issues with the tone and assumptions of the practitioners who perform ffa as part of their professional practice, the figures require substantial editing for clarity. In particular, on Figure 1 - 1) there are items in the legend which do not appear on the actual figure. 2) it is not clear what the light dashed vertical lines represent. 3) Text on the figures is placed throughout and would be better placed in one consolidated note below each figure. 4) considering the long timeseries, figures a and b would be better placed as wider panels stacked on top of each other rather than side-by-side.
  Response:
  1) For consistency, we chose to include in the legend all features that might appear. We will modify the figure caption to state that the symbol legend shows all possibilities.
  
  2) Thanks for noticing that. We will add text to the caption to indicate that the gray dashed lines mark WW reversals.
  
  3) We think this is a matter of preference. Remembering that these are screening plots, not presentation plots, our preference is to include the text inside the figure. Further, all these results are also produced in the output from the function.
  
  4) While we fully agree for presentation plots, here we are comparing two examples to demonstrate functionality. If users want a different shape or layout, they would be able to produce one. We will consider offering an optional layout in which the time series spans the page width, with two squarish plots for the other panels.
  
  We will modify the caption to include:
  
  “Results are reported in plain text when the test conforms with practice. Red and bold is used to highlight areas that may be of concern.”
  
  For figure 2 - 1) it is unclear what the colors of the bars in the proportion plot represent. 2) there appear to be colors on panel a points that aren't shown in the legend. 3) all points in the legend are hollow whereas on the plot they are filled colors. 4) "proportion" is not a descriptive axis title for the sub-plot.
  Response:
  Sorry, this is simply not correct.
  1&2&3. The caption clearly states:
  “The colored symbols indicated the months of the year thus indicating seasonal distributions of floods and a histogram of these is placed in a corner of the plot to show the distribution of peaks across months.”
  We will add text to the caption to state that the shape of the symbol indicates whether it is “normal” or an “outlier”.
  4. While “proportion” and “fraction” are both mathematically correct and frequently used to describe this type of chart, “proportion” (or “relative frequency”) is considered more standard and professional in the field of statistics.
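
As a minimal sketch of what a "proportion" axis represents here, the relative frequency of peaks per calendar month can be computed as below. This is an illustration only, not the authors' plotting code; the month values and the function name `monthly_proportions` are hypothetical.

```python
from collections import Counter

# Hypothetical months (1-12) in which annual maximum peaks occurred.
# Values are invented purely for illustration.
peak_months = [4, 5, 5, 4, 6, 5, 4, 5, 11, 5]


def monthly_proportions(months):
    """Relative frequency of peaks in each calendar month (values sum to 1)."""
    counts = Counter(months)
    n = len(months)
    return {m: counts.get(m, 0) / n for m in range(1, 13)}


props = monthly_proportions(peak_months)
```

Each bar height is then the share of all peaks that fell in that month, so the twelve bars sum to one regardless of record length.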
  
  For figure 3 - The polar plots are quite novel but still have a number of issues. 1) there are two legends, and some of the colors are reused on both legends, making it difficult to tell what the color or symbols represent. 2) there is seemingly haphazard coloring and bolding of the note shown in the bottom left corner of the plot. 3) the clustering would likely be most useful in a table by year; is this provided with the analysis package?
  Response:
  1) The only overlapping symbol colour is red. Red indicates an outlier of one of three types. We chose to be consistent so that someone screening a plot would see outliers.
  
  2) This is certainly not haphazard. Normal text indicates conformance with practice. Bold indicates a deviation from practice, and red is used to highlight areas of concern.
  
  3) Indeed, the code returns an ordered set of cluster memberships.
  
  We will add text to the legend indicating the text highlighting rules.
  
  “Results are reported in plain text when the test conforms with practice. Bold alone is used to indicate that clustering is warranted. Red and bold is used to highlight areas that may be of concern.”
  
  Overall the figures could be simplified for better clarity, and many of the notes within the figures would be better provided as accompanying text rather than right on the figures.
  Response:
  With all respect, we disagree. These are designed to be screening plots and the indication of results of tests directly on the figure means the user does not have to look somewhere else for a result. Again, the results of all are provided to the user in addition to these three figures. Our objective is to simplify screening and assist the user with ffa data.
  
  References:
  Barth, N. A., Villarini, G., and White, K.: Accounting for mixed populations in flood frequency analysis: Bulletin 17C perspective, Journal of Hydrologic Engineering, 24, 04019002, 10.1061/(ASCE)HE.1943-5584.000176, 2019.
  Bobée, B.: Estimation des événements extrêmes de crue par l'analyse fréquentielle: une revue critique, La Houille Blanche, 100-105, 1999.
  Bobée, B. and Rasmussen, P. F.: Recent advances in flood frequency analysis, Reviews of Geophysics, 33, 1111-1116, 1995.
  Burn, D. H. and Whitfield, P. H.: Shifting cold regions streamflow regimes in North America affect flood frequency analysis, Hydrological Sciences Journal, 70, 51-70, 10.1080/02626667.2024.2422531, 2025.
  Dell'Aira, F., Cancelliere, A., and Meier, C. I.: Underestimation of Frequent Floods when Using Annual Maxima for Frequency Analysis: Drivers, Spatial Variability, and Possible Solutions, Authorea Preprints, 2023.
  Fischer, S. and Schumann, A. H.: Temporal changes in the frequency of flood types and their Impact on flood statistics, Journal of Hydrology X, 100171, 10.1016/j.hydroa.2024.100171, 2024.
  Fischer, S., Schumann, A., and Schulte, M.: Characterisation of seasonal flood types according to timescales in mixed probability distributions, Journal of Hydrology, 539, 38-56, 10.1016/j.jhydrol.2016.05.005, 2016.
  Gado, T. A. and Nguyen, V.-T.-V.: An at-site flood estimation method in the context of nonstationarity I. A simulation study, Journal of Hydrology, 535, 710–721, 2016.
  He, J., Anderson, A., and Valeo, C.: Bias compensation in flood frequency analysis, Hydrological Sciences Journal, 60, 381-401, 2015.
  Iacobellis, V., Fiorentino, M., Gioia, A., and Manfreda, S.: Best fit and selection of theoretical flood frequency distributions based on different runoff generation mechanisms, Water, 2, 239-256, 2010.
  Kidson, R. and Richards, K. S.: Flood frequency analysis: assumptions and alternatives, Progress in Physical Geography, 29, 392-410, 10.1191/0309133305pp454ra, 2005.
  Olson, S. A.: Estimating flood discharges at selected annual exceedance probabilities for unregulated, rural streams in Vermont, 2023, US Geological Survey, 2328-0328, 2025.
  Pakhale, G., Khosa, R., and Gosain, A. K.: In Today's World, Is It Worth Performing Flood Frequency Analysis Using Observed Streamflow Data? Environmental Advances, 15, 100485, 2024.
  Vidrio-Sahagún, C. T., Ruschkowski, J., He, J., and Pietroniro, A.: A practice-oriented framework for stationary and nonstationary flood frequency analysis, Environmental Modelling & Software, 173, 105940, 2024.
  Waylen, P. and Woo, M.-K.: Prediction of annual floods generated by mixed processes, Water Resources Research, 18, 1283-1286, 10.1029/WR018i004p01283, 1982.
  Whitfield, P. H. and Burn, D. H.: Rogue and Extreme Floods in North America, Journal of Hydrology, 2026.
  Yu, G., Wright, D. B., Zhu, Z., Smith, C., and Holman, K. D.: Process-based flood frequency analysis in an agricultural watershed exhibiting nonstationary flood seasonality, Hydrology and Earth System Sciences, 23, 2225-2243, 2019.
  Zhang, Z. and Stadnyk, T. A.: Investigation of attributes for identifying homogeneous flood regions for regional flood frequency analysis in Canada, 2020.
  Zhang, Z., Stadnyk, T. A., and Burn, D. H.: Identification of a preferred statistical distribution for at-site flood frequency analysis in Canada, Canadian Water Resources Journal/Revue canadienne des ressources hydriques, 45, 43-58, 2020.
  Zhao, F., Lange, S., Goswami, B., and Frieler, K.: Frequency bias causes overestimation of climate change impacts on global flood occurrence, Geophysical Research Letters, 51, e2024GL108855, 2024.
  
  Citation: https://doi.org/10.5194/egusphere-2026-2067-AC1
RC2: 'Comment on egusphere-2026-2067', Anonymous Referee #2, 08 Jun 2026

I am reviewing this technical note for the second time, as it was submitted some months ago to the Hydrological Sciences Journal. I did not read explicit responses to my points raised then (I couldn’t find them on the HESS website) but I can see some efforts made by the authors in revising the manuscript. The inclusion of additional European references improves the balance of the literature review, and the manuscript is now more explicit in stating that the underlying methods are described in other publications by the same authors (three papers in 2025-2026). Nevertheless, my main concern remains. The Technical Note still seems to derive most of its contribution from the presentation of graphical outputs generated by an R package, while the methodological basis for these outputs is largely placed outside the manuscript. Is this enough for a standalone contribution? I do not think that the existence of a plotting procedure is, on its own, sufficient for a scientific Technical Note unless it really is something new. HESS Technical Notes are expected to report new developments, significant advances, or novel aspects of experimental or theoretical methods and techniques. Although the plots presented here may be useful to practitioners, the manuscript does not sufficiently demonstrate, within the paper itself, the methodological novelty or standalone contribution required for a Technical Note in HESS.
Detailed comments are listed below.
Introduction: I would suggest reorganizing it so that the state of the art is described first and the objectives of this technical note are listed at the end. At the moment everything is mixed (e.g., “This note focuses…” at line 35, “In this work, …” at line 53, “This Technical Note describes…” at line 62) and the train of thought does not flow linearly.
Overall structure: there seem to be many repetitions of the same concepts many times in the paper. E.g., the figures are described in Section 2.2, and again in Section 3, and again in Section 4.
Line 85: shouldn’t annual maxima from partial years be discussed in section 2.1?
Line 88: years without observations greater than the threshold and missing years should be treated differently, in my opinion.
Line 90: I would suggest to add more details on the tests here, and in general on the methods used in producing the details in the figures.
Line 101: the sentence seems incomplete. I guess the Authors mean that MGBT is used in here.
Line 104: the Grubb test was mentioned some lines before (for the low floods). Maybe it could be discussed there already.
Line 112: this sentence seems a residual from what is now explained at line 100.
Line 122: “allow”
Line 134: what is kappa? Is it the one described in Section 2.2.2?
Line 146: isn’t this description the same already done at line 88?
Line 165: why having seasonality histograms in Figure 2 when Figure 3 has that in more detail?
Line 172: the kernel densities could be scaled not to exceed the plot borders
Line 180: the methods are described in three other papers by the same authors from 2025 and 2026. Is the contribution of this Technical Note enough for HESS?
Lines 210-235: maybe Section 4 is not the right place for the checklist, which is more general and reflects the discussions in Sections 2 and 3. Maybe the Authors meant to have a discussion section here (the content on page 9 also seems more like a discussion than part of Section 4). In this discussion part, I would have expected something about having (or not having) all the information needed to produce the plots. Sometimes the season of old floods is not available, for example.

Citation: https://doi.org/10.5194/egusphere-2026-2067-RC2
- AC2: 'Reply on RC2', Paul Whitfield, 16 Jun 2026
  
  Reviewer #2
  I am reviewing this technical note for the second time, as it was submitted some months ago to the Hydrological Sciences Journal. I did not read explicit responses to my points raised then (I couldn’t find them on the HESS website) but I can see some efforts made by the authors in revising the manuscript. The inclusion of additional European references improves the balance of the literature review, and the manuscript is now more explicit in stating that the underlying methods are described in other publications by the same authors (three papers in 2025-2026). Nevertheless, my main concern remains. The Technical Note still seems to derive most of its contribution from the presentation of graphical outputs generated by an R package, while the methodological basis for these outputs is largely placed outside the manuscript. Is this enough for a standalone contribution? I do not think that the existence of a plotting procedure is, on its own, sufficient for a scientific Technical Note unless it really is something new. HESS Technical Notes are expected to report new developments, significant advances, or novel aspects of experimental or theoretical methods and techniques. Although the plots presented here may be useful to practitioners, the manuscript does not sufficiently demonstrate, within the paper itself, the methodological novelty or standalone contribution required for a Technical Note in HESS.
  Response:
  Thanks for taking the time to again review this manuscript. Your previous review held the same opinion that sharing an integrated version of screening methodologies was not “new”. Our opinion is that there is a need to share these methods with others who conduct ffa and enable them to screen their data to better assess how they might need to conduct their analysis. We describe the methods that are used and why and how these will help an analyst with ffa.
  
  Detailed comments are listed below.
  Introduction: I would suggest reorganizing it so that the state of the art is described first and the objectives of this technical note are listed at the end. At the moment everything is mixed (e.g., “This note focuses…” at line 35, “In this work, …” at line 53, “This Technical Note describes…” at line 62) and the train of thought does not flow linearly.
  Response:
  In revising the manuscript, we will reorganize the text in response to these comments.
  
  Overall structure: there seem to be many repetitions of the same concepts many times in the paper. E.g., the figures are described in Section 2.2, and again in Section 3, and again in Section 4.
  Line 85: shouldn’t annual maxima from partial years be discussed in section 2.1?
  Response:
  We will reorganize the text in response to these comments. Section 2 describes the methods that are used, and Section 3 describes how the results are presented in figures. We will add some text to emphasize this distinction, and reduce unnecessary overlap; however, some overlap will still exist.
  
  Line 88: years without observations greater than the threshold and missing years should be treated differently, in my opinion.
  Response:
  As we responded in the HSJ review you kindly provided:
  While we agree in principle, this is not practical. The user can screen amax data or POT data, but not both together. When amax data are screened, the missing years are those without an amax observation; when POT data are screened, the missing years are those without a POT observation. Therefore, they are marked similarly.
  Since the reviewer thinks that these should be treated differently, we have made it possible to change the colours used for missing years and for years without a POT observation. Thus, users can choose any colours that suit them if they want the lines coloured differently.
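
The distinction under discussion can be sketched as follows. This is a hypothetical illustration, not the package's interface: the function `classify_years`, its arguments, and the colour names are assumptions, and it presumes the analyst knows which years the gauge was operating.

```python
def classify_years(record_years, pot_years, start, end):
    """Label each year in [start, end] as 'peak', 'no_exceedance', or 'missing'.

    record_years: years in which the gauge has observations (assumed known).
    pot_years:    years with at least one peak over the threshold.
    """
    labels = {}
    for year in range(start, end + 1):
        if year not in record_years:
            labels[year] = "missing"          # no observations at all
        elif year in pot_years:
            labels[year] = "peak"             # at least one exceedance
        else:
            labels[year] = "no_exceedance"    # observed, nothing exceeded
    return labels


# Hypothetical user-selectable colours for the two kinds of empty year.
colours = {"missing": "grey", "no_exceedance": "lightblue"}

labels = classify_years(
    record_years={2000, 2001, 2003},
    pot_years={2000, 2003},
    start=2000,
    end=2003,
)
```

When only the POT series itself is available, the two empty cases cannot be told apart without the underlying record, which is consistent with the practical difficulty the response describes.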
  
  Line 90: I would suggest to add more details on the tests here, and in general on the methods used in producing the details in the figures.
  Response:
  These are pretty standard methods that are available in R. The revised Supplementary Material provides a list of methods, functions, packages, and key references.
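
As one example of such a standard method, the Mann-Kendall trend test can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used by the authors: it omits the tie correction to the variance, assumes serially independent data, and the function name `mann_kendall` and the sample values are hypothetical.

```python
import math


def mann_kendall(x):
    """Mann-Kendall trend test without tie correction.

    Returns (S, Z, two-sided p-value) under the normal approximation.
    """
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)      # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance of S when there are no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)        # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal tail probability
    return s, z, p


# Hypothetical annual maxima (m^3/s), invented for illustration.
annual_peaks = [210.0, 185.0, 240.0, 260.0, 230.0, 290.0, 310.0, 275.0, 330.0, 350.0]
s_stat, z_stat, p_value = mann_kendall(annual_peaks)
```

A small p-value suggests a monotonic trend, which would call the stationarity assumption into question. Records with many tied values or strong serial correlation need the corrected variants instead.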
  
  Line 101: the sentence seems incomplete. I guess the Authors mean that MGBT is used in here.
  Response:
  Thanks for noticing this. Corrected.
  
  Line 104: the Grubb test was mentioned some lines before (for the low floods). Maybe it could be discussed there already.
  Response:
  We specifically ordered it in this way because censoring of low peaks based on Cohn et al. (2013) is well established and often warranted. The censoring of high peaks is not advisable, but knowing that the peaks are outliers in magnitude and/or timing is important information. This is described in Whitfield and Burn (2026). So, we respected that chronology in this section of text.
  
  Line 112: this sentence seems a residual from what is now explained at line 100.
  Response:
  Thanks for this comment. We have corrected this text.
  
  Line 122: “allow”
  Response:
  Thanks for catching this.
  
  Line 134: what is kappa? Is it the one described in Section 2.2.2?
  Response:
  It is the same kappa, and the wording will be modified to reflect that.
  
  Line 146: isn’t this description the same already done at line 88?
  Response:
  We believe that the description of the figure should be complete here, so the reader will have these details when they consider Figure 1. We will minimize overlap in the revision.
  
  Line 165: why having seasonality histograms in Figure 2 when Figure 3 has that in more detail?
  Response:
  These are complementary, and this was done to be methodologically consistent. The return period plot visually indicates the month that each observation occurred. The polar plot with fuzzy clustering determines if there are different clusters suggesting different generating mechanisms.
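
The idea behind a polar timing plot can be sketched with basic circular statistics: each flood date becomes an angle on the annual circle, and the mean resultant length measures how concentrated the timing is. The sketch below is illustrative only and does not reproduce the authors' fuzzy clustering; the function name `circular_summary`, the year length of 365.25 days, and the sample days are assumptions.

```python
import math


def circular_summary(days_of_year, year_length=365.25):
    """Mean flood date, mean resultant length, and Rayleigh statistic.

    days_of_year: day-of-year values for each flood peak.
    Returns (mean_day, r_bar, rayleigh_z).
    """
    angles = [2.0 * math.pi * d / year_length for d in days_of_year]
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    r_bar = math.hypot(c, s)                        # near 0: spread out, near 1: concentrated
    mean_angle = math.atan2(s, c) % (2.0 * math.pi)
    mean_day = mean_angle * year_length / (2.0 * math.pi)
    rayleigh_z = n * r_bar ** 2                     # Rayleigh test statistic for uniformity
    return mean_day, r_bar, rayleigh_z


# Hypothetical spring-dominated record with one autumn event.
flood_days = [105, 110, 118, 121, 130, 300]
mean_day, r_bar, rayleigh_z = circular_summary(flood_days)
```

A single off-season event lowers the mean resultant length only slightly. That is one reason a clustering step, rather than a single summary statistic, is useful for flagging a possible second generating mechanism.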
  
  Line 172: the kernel densities could be scaled not to exceed the plot borders
  Response:
  While this would be possible, our view is that trimming them at the figure margin does not interfere with their interpretation. Otherwise, we would have to include scaling information.
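
For readers weighing the two options, the rescaling the reviewer suggests can be sketched as below. This is a hypothetical illustration, not the package's code: the Gaussian kernel, the fixed bandwidth, and the names `gaussian_kde` and `scale_to_panel` are assumptions. The returned factor is the "scaling information" that would then need to be reported.

```python
import math


def gaussian_kde(sample, grid, bandwidth):
    """Gaussian kernel density estimate of `sample` evaluated at each grid point."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)
        for x in grid
    ]


def scale_to_panel(density, panel_height, fill=0.9):
    """Rescale a density so its peak reaches `fill` of the panel height.

    Returns the scaled curve and the multiplicative factor that was applied.
    """
    factor = fill * panel_height / max(density)
    return [d * factor for d in density], factor


# Hypothetical sample and plotting grid, invented for illustration.
sample = [1.0, 2.0, 2.5, 4.0]
grid = [i * 0.1 for i in range(61)]
density = gaussian_kde(sample, grid, bandwidth=0.5)
scaled, factor = scale_to_panel(density, panel_height=10.0)
```

Rescaling preserves the shape of the curve but not its area, so the scaled curve is no longer a density in the strict sense. This is the trade-off behind the authors' preference for trimming.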
  
  Line 180: the methods are described in three other papers by the same authors from 2025 and 2026. Is the contribution of this Technical Note enough for HESS?
  Response:
  Obviously, we believe that sharing the methods for screening flood peak data based on the current state of methods is important. The reviewer’s point might be that others can recreate the methods from the literature themselves, which certainly is true.
  Our opinion is that there is a need to share these methods with others who conduct ffa and enable them to screen their data to better assess how they might need to conduct their analysis. Other reviewers have agreed with this.
  R3: “A workflow that reduces computation time, memory requirements, coding effort, or user expertise can provide significant value even if the underlying scientific objective is not new.”
  HSJ R1: “In my opinion, this is a concise, well-written, and practically valuable Technical Note that addresses a critical yet often underemphasized step in flood frequency analysis, the screening of annual maxima and peaks-over-threshold (POT) data. I really appreciate how the authors focus on ensuring the validity of fundamental statistical assumptions such as independence, stationarity, and homogeneity before moving into frequency estimation. The paper offers a clear and well-motivated rationale for the proposed screening approach, and I like that it is implemented in a reproducible R environment, which makes it directly useful for both researchers and practitioners.
  What I particularly like is that the authors tackle a very practical yet overlooked issue in applied hydrology, the tendency to rely on FFA assumptions without verifying whether the data actually meet them. The integration of multiple statistical and graphical methods (Mann–Kendall, Wald–Wolfowitz, Pettitt, Grubbs–Beck, Rayleigh, and fuzzy clustering) into one cohesive screening framework is both timely and significant, especially in the context of climate-driven shifts in flood-generating mechanisms. The paper manages to combine methodological strength with real-world usability.”
  
  Lines 210-235: maybe Section 4 is not the right place for the checklist, which is more general and reflects the discussions in Sections 2 and 3. Maybe the Authors meant to have a discussion section here (the content on page 9 also seems more like a discussion than part of Section 4). In this discussion part, I would have expected something about having (or not having) all the information needed to produce the plots. Sometimes the season of old floods is not available, for example.
  Response:
  In the revision we could expand this type of discussion to address these two points.
  We are not sure that these two points are within the scope of the Technical Note. Perhaps we could add some text that acknowledges them and notes that addressing them in screening is a good idea but beyond the scope of this note. We intended this screening to be applicable to the type of flood data that are commonly publicly available. Such data typically have a date and a magnitude, and sometimes a quality flag. The present code allows for quality codes to be absent.
  
  Citation: https://doi.org/10.5194/egusphere-2026-2067-AC2

RC3: 'Comment on egusphere-2026-2067', Anonymous Referee #3, 11 Jun 2026

Many people evaluate a new tool or workflow by asking whether it performs a task that could not previously be accomplished. However, in modern computational research, novelty can also arise from improvements in efficiency, accessibility, reproducibility, or ease of use. A workflow that reduces computation time, memory requirements, coding effort, or user expertise can provide significant value even if the underlying scientific objective is not new.

I believe a plotting tool such as this would be useful to the hydrologic community, and a technical report describing each component of the workflow would also be valuable, particularly for students, early-career researchers, and practicing engineers. As AI-assisted programming becomes increasingly common, many users are now able to develop and apply computational tools in fields where they may not have extensive software engineering experience. Well-documented, peer-reviewed descriptions of these workflows therefore become an important contribution, enabling users to understand the methodology, reproduce the results, and adapt the tool appropriately rather than treating it as a black box.

I therefore appreciate how the authors go through each component of the diagnostic tools in their methods section and provide substantial examples to the reader. I think the paper could benefit from a slight reframing as an introduction to the specific R package they have created. Figures are provided at the bottom of the preprint, so I am unsure where they will go in the manuscript, but having figures of each type described in the methods section accompany the text describing them would have made the explanations much easier to follow. Similarly, if figure limits allow, including figures illustrating the different trends, biases, outliers, etc. would be useful. For the example plots, having subplots that correspond to the three plots for each of the examples might make more sense.

Generally, I think the paper needs some further grammatical review and standardization, particularly in the use and definition of abbreviations. Some are defined and not used; some are used but not defined. Also, the text doesn’t need to break out of paragraph form into lists. I like the checklist for FFA, but it might be more appropriate as some sort of schematic rather than a list. Other examples of this are the two important problems with the framework (lines 246 to 256) and the list of when the assumptions of flood frequency analysis are violated (lines 262-265).

Line edits:

Line 13: grammatical error- change to “often confusing…”

Line 51: Indicate what CSHShydRology is.

Line 58: guidelines in countries are mentioned but not cited.

Line 70: amax and POT abbreviations are used but were not put in parentheses on previous line.

Line 130: Can you define possibilistic algorithms?

Line 153: LOESS line should be defined as Locally Estimated Scatterplot Smoothing somewhere in paper (I think loess was used earlier).

Line 161: Explain and cite why Gumbel is used for annual maximum series, and pareto used for POT series.

Citation: https://doi.org/10.5194/egusphere-2026-2067-RC3

AC3: 'Reply on RC3', Paul Whitfield, 17 Jun 2026

Reviewer #3

Response:

It is standard practice to provide the figures at the end of the submitted manuscript. The publisher will place them where appropriate in the final paper.

We provided Supplementary Materials where all the “features” are demonstrated. A cross-reference table is also provided there so the reader can see much more detail than can be provided in the main paper. For example:

Time series plot	Description	Example Figures
Entire or part years	Solid points indicate the maximum of a complete year, open symbols indicate less than a complete year. Are partial years representative?	S1a, S4a, S7a,

Response:

We will address the grammatical issues the reviewer points out.

We respectfully disagree about the use of numbered text and lists. The target reader will be better served by numbered lists, which we believe provide clearer, easier-to-follow descriptions.

Line edits:

Line 13: grammatical error- change to “often confusing…”

Response:

Corrected

Line 51: Indicate what CSHShydRology is.

Response:

We have added text to explicitly state that it is an R-package.

Line 58: guidelines in countries are mentioned but not cited.

Response:

We do not think that is necessary. Most readers should be aware of the specific guidelines that they need to follow.

Line 70: amax and POT abbreviations are used but were not put in parentheses on previous line.

Response:

Corrected

Line 130: Can you define possibilistic algorithms?

Response:

Some additional text will be added to explain the difference between possibilistic and probabilistic clustering.
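As a rough illustration of the distinction raised here, the sketch below contrasts the textbook fuzzy c-means membership (probabilistic in the sense that each point's memberships must sum to one) with the standard possibilistic typicality, which is computed per cluster with no such constraint. It is not taken from the manuscript or the R package; the function names, the fuzzifier `m`, and the per-cluster scale `eta` are assumptions made for the example.

```python
from typing import List


def fuzzy_memberships(distances: List[float], m: float = 2.0) -> List[float]:
    """Fuzzy c-means memberships of one point, given its distance to each centre.

    The memberships are relative: they always sum to 1 across clusters.
    """
    if any(d == 0.0 for d in distances):
        # The point sits exactly on one or more centres: share membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in distances) for d_i in distances]


def possibilistic_typicalities(
    distances: List[float], eta: List[float], m: float = 2.0
) -> List[float]:
    """Possibilistic typicalities of one point (hypothetical helper).

    Each value depends only on the distance to its own cluster and that
    cluster's scale eta, so the values need not sum to 1.
    """
    power = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (d * d / e) ** power) for d, e in zip(distances, eta)]
```

For a point that is equally far from two centres, `fuzzy_memberships([10.0, 10.0])` still assigns half of the membership to each cluster, whereas `possibilistic_typicalities([10.0, 10.0], eta=[1.0, 1.0])` returns small values for both, which is why a possibilistic formulation can flag an observation as atypical of every cluster.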

Line 153: LOESS line should be defined as Locally Estimated Scatterplot Smoothing somewhere in paper (I think loess was used earlier).

Response:

Now defined where first mentioned. Thanks.

Line 161: Explain and cite why Gumbel is used for annual maximum series, and pareto used for POT series.

Response:

Text and references will be added to address this.
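One common justification is extreme value theory: maxima over blocks such as years tend towards the generalized extreme value family (of which the Gumbel is the zero-shape member), while excesses over a high threshold tend towards the generalized Pareto distribution. The sketch below shows how a T-year return level follows in each case. It is illustrative only and not the package's implementation; the function and parameter names are assumptions, and the generalized Pareto shape uses the convention in which a positive value means a heavy upper tail.

```python
import math


def gumbel_return_level(location: float, scale: float, return_period: float) -> float:
    """T-year return level for a Gumbel distribution fitted to annual maxima."""
    if return_period <= 1.0:
        raise ValueError("return_period must exceed 1 year")
    non_exceedance = 1.0 - 1.0 / return_period
    return location - scale * math.log(-math.log(non_exceedance))


def gpd_return_level(
    threshold: float,
    scale: float,
    shape: float,
    events_per_year: float,
    return_period: float,
) -> float:
    """T-year return level for peaks over a threshold with generalized Pareto excesses."""
    # Expected number of threshold exceedances in T years.
    n_events = events_per_year * return_period
    if n_events <= 1.0:
        raise ValueError("events_per_year * return_period must exceed 1")
    if abs(shape) < 1e-12:
        # Zero shape: the excesses are exponentially distributed.
        return threshold + scale * math.log(n_events)
    return threshold + (scale / shape) * (n_events ** shape - 1.0)
```

The event rate appears only in the peaks-over-threshold case because that series can contain several events, or none, in a given year, whereas an annual maximum series has exactly one value per year.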

Citation: https://doi.org/10.5194/egusphere-2026-2067-AC3

Status: closed

RC1: 'Comment on egusphere-2026-2067', Anonymous Referee #1, 25 May 2026

While there is a useful scientific contribution here, I believe there are some substantial revisions necessary to the way this note is presented (both the text and the figures) before it is ready for publication. This is especially so considering that the paper is meant to highlight a methodology for someone to use rather than to present any new scientific knowledge contribution.
First is the text and the tone of the note. I believe the manuscript is dismissive of the knowledge and experience of practitioners in performing flood frequency analysis. Phrases such as 'temptation' (abstract), 'confuse the analyst' (abstract), 'practitioner bias' (par 250), 'outlier confusion' (255), and more are unfair to the knowledge of practicing water resources professionals and should be removed, especially if the intent is to create a tool that practitioners will use.
Second, regarding the overall text, there are examples where claims are made without evidence. Example 1 (abstract) - 'in many cases the sample is composed of mixed populations'. No evidence is provided to support this claim, and in my experience in Canada this isn't true. Example 2 (par 250) refers to practitioner bias. It is not clear where the authors have observed this practitioner bias or how they know this is the case.
Along with the overall issues with the tone and the assumptions about the practitioners who perform ffa as part of their professional practice, the figures require substantial editing for clarity. In particular, on Figure 1: 1) there are items in the legend which do not appear on the actual figure; 2) it is not clear what the light dashed vertical lines represent; 3) text on the figures is placed throughout and would be better placed in one consolidated note below each figure; 4) considering the long time series, figures a and b would be better placed as wider panels stacked on top of each other rather than side by side.
For Figure 2: 1) it is unclear what the colors of the bars in the proportion plot represent; 2) there appear to be colors on the panel a points that aren't shown in the legend; 3) all points in the legend are hollow, whereas on the plot they are filled colors; 4) "proportion" is not a descriptive axis title for the sub-plot.
For Figure 3: the polar plots are quite novel but still have a number of issues. 1) There are two legends, and some of the colors are reused on both legends, making it difficult to tell what the colors or symbols represent. 2) There is seemingly haphazard coloring and bolding of the note shown in the bottom left corner of the plot. 3) The clustering would likely be most useful in a table by year; is this provided with the analysis package?
Overall the figures could be simplified for better clarity, and many of the notes within the figures would be better provided as accompanying text rather than right on the figures.

Citation: https://doi.org/10.5194/egusphere-2026-2067-RC1
- AC1: 'Reply on RC1', Paul Whitfield, 08 Jun 2026
  Reviewer #1
  While there is a useful scientific contribution here, I believe there are some substantial revisions necessary to the way this note is presented (both the text and the figures) before it is ready for publication. This is especially so considering that the paper is meant to highlight a methodology for someone to use rather than to present any new scientific knowledge contribution.
  First is the text and the tone of the note. I believe the manuscript is dismissive of the knowledge and experience of practitioners in performing flood frequency analysis. Phrases such as 'temptation' (abstract), 'confuse the analyst' (abstract), 'practitioner bias' (par 250), 'outlier confusion' (255), and more are unfair to the knowledge of practicing water resources professionals and should be removed, especially if the intent is to create a tool that practitioners will use.
  Response:
  Thank you for this comment. It certainly was not our intention to ‘speak down’ to practitioners. We will revisit that text and recast the language as one of ‘sharing’.
  
  Second, regarding the overall text, there are examples where claims are provided without evidence. Example 1 (abstract) - 'in many cases the sample is composed of mixed populations'. No evidence is provided to support this claim, and in my experience in Canada this isn't true. Example 2 (par 250) refers to practitioner bias. It is not clear where the authors have observed this practitioner bias or how they know this is the case.
  Response:
  While there are cases where samples are not mixed populations, it is common in Canada and in most cold regions that there is a ‘main population’ and others. This is supported in the manuscript by several references, though we admit that the list is not exhaustive (Waylen and Woo 1982; Iacobellis et al. 2010; Fischer et al. 2016). The key point is that ffa assumes that the events are all caused by the same generating mechanism. If that is not the case, then alternatives should be used (e.g. Barth et al. 2019; Yu et al. 2019; Burn and Whitfield 2025; Fischer and Schumann 2024; Bai et al. 2026).
  
  “Practitioner bias”. We will use an alternative to this in the revision, and explain more fully the problems that are faced by those doing ffa. We intend on adding text to the discussion similar to:
  
  The challenges encountered in conducting Flood Frequency Analysis (FFA) lie in the subjective decisions or methodological preferences that can inadvertently skew the calculation of flood magnitudes and return periods (e.g., the 100-year flood). Because FFA inherently requires interpreting complex, short-duration datasets, analyst choices can significantly influence infrastructure design and hazard mapping (Zhang et al. 2020; Zhang & Stadnyk 2020; Pakhale et al. 2024; Zhao et al. 2024).
  Common challenges include:
  Distribution Selection Bias: Analysts often default to preferred or familiar probability models (e.g., Log-Pearson Type III, Generalized Extreme Value). Choosing a distribution with heavier or lighter upper tails can drastically alter the predicted magnitude of extreme, rare events (Kidson & Richards 2005; He et al. 2015).
  
  Historical Data Truncation: Practitioners might limit their analysis to a recent "base period" of systematic gauging records. This can introduce severe frequency bias, as short observation periods frequently miss massive historical or paleoflood events, leading to an underestimation of flood risks (Zhao et al. 2024).
  
  Outlier Handling: Analysts make subjective decisions on whether to include, adjust, or completely discard exceptionally high or low outliers. Incorrectly classifying a massive outlier as a "rogue" event rather than part of the natural data can suppress design flood estimates (Whitfield & Burn 2026).
  
  Stationarity Assumptions: Continuing to use traditional stationary FFA (assuming past climate and land use represent future conditions) despite observable non-stationarity from climate change and urbanization (Gado & Nguyen 2016; Vidrio-Sahagún et al. 2024).
  
  Data Series Preferences: Preferring Annual Maximum (AM) series over Peak-Over-Threshold (POT) methods. AM approaches can sometimes underestimate frequent, moderate flood events by missing secondary peaks that occur in a single water year (Dell'Aira et al. 2023).
  
  Federal Hydrologic and Hydraulic Procedures for Flood Hazard Delineation in Canada and Bulletin 17C in the United States provide guidance on standardized data screening and how to mitigate these subjective biases.
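To make the point about data series preferences concrete, the sketch below extracts both series from the same record. It is a simplified illustration, not the procedure in either guideline or in the package: it groups by calendar year rather than water year, and the independence rule (a hypothetical `min_separation_days` window in which only the larger peak is kept) is deliberately crude.

```python
from datetime import date
from typing import Dict, List, Tuple

Record = Tuple[date, float]  # (date of peak, flow magnitude)


def annual_maxima(series: List[Record]) -> Dict[int, Record]:
    """Largest observation in each calendar year."""
    best: Dict[int, Record] = {}
    for day, flow in series:
        if day.year not in best or flow > best[day.year][1]:
            best[day.year] = (day, flow)
    return best


def peaks_over_threshold(
    series: List[Record], threshold: float, min_separation_days: int = 7
) -> List[Record]:
    """Exceedances of the threshold, in date order.

    An exceedance closer than min_separation_days to the most recently
    retained peak is treated as the same event, and the larger is kept.
    """
    peaks: List[Record] = []
    for day, flow in sorted(series):
        if flow <= threshold:
            continue
        if peaks and (day - peaks[-1][0]).days < min_separation_days:
            if flow > peaks[-1][1]:
                peaks[-1] = (day, flow)
        else:
            peaks.append((day, flow))
    return peaks
```

With a record holding a spring peak of 500 and an autumn peak of 420 in one year and a single peak of 300 in the next, the annual maxima keep 500 and 300, while a threshold of 400 keeps 500 and 420: each series discards an event that the other retains.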
  
  Along with the overall issues with the tone and the assumptions about the practitioners who perform ffa as part of their professional practice, the figures require substantial editing for clarity. In particular, on Figure 1: 1) there are items in the legend which do not appear on the actual figure; 2) it is not clear what the light dashed vertical lines represent; 3) text on the figures is placed throughout and would be better placed in one consolidated note below each figure; 4) considering the long time series, figures a and b would be better placed as wider panels stacked on top of each other rather than side by side.
  Response:
  1) For consistency, we chose to include in the legend all features that might appear. We will modify the figure caption to indicate that the symbol legend shows all possibilities.
  
  2) Thanks for noticing that. We will add to the caption that the gray dashed lines indicate Wald–Wolfowitz (WW) reversals.
  
  3) We think this is a matter of preference. Remembering that these are screening plots, not presentation plots, our preference is to include the text inside the figure. Further, all these results are produced in the output from the function.
  
  4) While we fully agree for presentation plots, here we are comparing two examples to demonstrate functionality. If the user wants a different shape or layout, they would be able to produce it. We will consider offering an optional layout in which the time series spans the page width, with two squarer plots for the others.
  
  We will modify the caption to include:
  
  “Results are reported in plain text when the test conforms with practice. Red and bold is used to highlight areas that may be of concern.”
  
  For Figure 2: 1) it is unclear what the colors of the bars in the proportion plot represent; 2) there appear to be colors on the panel a points that aren't shown in the legend; 3) all points in the legend are hollow, whereas on the plot they are filled colors; 4) "proportion" is not a descriptive axis title for the sub-plot.
  Response:
  Sorry, this is simply not correct.
  1&2&3. The caption clearly states:
  “The colored symbols indicated the months of the year thus indicating seasonal distributions of floods and a histogram of these is placed in a corner of the plot to show the distribution of peaks across months.”
  We will add text to the caption to state that the shape of the symbol indicates whether it is “normal” or an “outlier”.
  4. While both “proportion” and “fraction” are mathematically correct and frequently used to describe this type of chart, “proportion” (or relative frequency) is considered more standard and professional in the field of statistics.
  
  For Figure 3: the polar plots are quite novel but still have a number of issues. 1) There are two legends, and some of the colors are reused on both legends, making it difficult to tell what the colors or symbols represent. 2) There is seemingly haphazard coloring and bolding of the note shown in the bottom left corner of the plot. 3) The clustering would likely be most useful in a table by year; is this provided with the analysis package?
  Response:
  The only overlapping symbol colour is red. Red indicates an outlier of one of three types. We chose to be consistent so someone screening a plot would see outliers.
  
  This is certainly not haphazard. Normal text indicates conformance with practice. Bold indicates a deviation from practice, and red is used to highlight areas of concern.
  
  Indeed, the code returns an ordered set of cluster memberships.
  
  We will add text to the legend indicating the highlighting text rules.
  
  “Results are reported in plain text when the test conforms with practice. Bold alone is used to indicate that clustering is warranted. Red and bold is used to highlight areas that may be of concern.”
  
  Overall the figures could be simplified for better clarity, and many of the notes within the figures would be better provided as accompanying text rather than right on the figures.
  Response:
  With all respect, we disagree. These are designed to be screening plots, and indicating the results of tests directly on the figure means the user does not have to look somewhere else for a result. Again, the results of all tests are provided to the user in addition to these three figures. Our objective is to simplify screening and assist the user with ffa data.
  
  References:
  Barth, N. A., Villarini, G., and White, K.: Accounting for mixed populations in flood frequency analysis: Bulletin 17C perspective, Journal of Hydrologic Engineering, 24, 04019002, 10.1061/(ASCE)HE.1943-5584.000176, 2019.
  Bobée, B.: Estimation des événements extrêmes de crue par l'analyse fréquentielle: une revue critique, La Houille Blanche, 100-105, 1999.
  Bobée, B. and Rasmussen, P. F.: Recent advances in flood frequency analysis, Reviews of Geophysics, 33, 1111-1116, 1995.
  Burn, D. H. and Whitfield, P. H.: Shifting cold regions streamflow regimes in North America affect flood frequency analysis, Hydrological Sciences Journal, 70, 51-70, 10.1080/02626667.2024.2422531, 2025.
  Dell'Aira, F., Cancelliere, A., and Meier, C. I.: Underestimation of Frequent Floods when Using Annual Maxima for Frequency Analysis: Drivers, Spatial Variability, and Possible Solutions, Authorea Preprints, 2023.
  Fischer, S. and Schumann, A. H.: Temporal changes in the frequency of flood types and their Impact on flood statistics, Journal of Hydrology X, 100171, 10.1016/j.hydroa.2024.100171, 2024.
  Fischer, S., Schumann, A., and Schulte, M.: Characterisation of seasonal flood types according to timescales in mixed probability distributions, Journal of Hydrology, 539, 38-56, 10.1016/j.jhydrol.2016.05.005, 2016.
  Gado, T. A. and Nguyen, V.-T.-V.: An at-site flood estimation method in the context of nonstationarity I. A simulation study, Journal of Hydrology, 535, 710–721, 2016.
  He, J., Anderson, A., and Valeo, C.: Bias compensation in flood frequency analysis, Hydrological Sciences Journal, 60, 381-401, 2015.
  Iacobellis, V., Fiorentino, M., Gioia, A., and Manfreda, S.: Best fit and selection of theoretical flood frequency distributions based on different runoff generation mechanisms, Water, 2, 239-256, 2010.
  Kidson, R. and Richards, K. S.: Flood frequency analysis: assumptions and alternatives, Progress in Physical Geography, 29, 392-410, 10.1191/0309133305pp454ra, 2005.
  Olson, S. A.: Estimating flood discharges at selected annual exceedance probabilities for unregulated, rural streams in Vermont, 2023, US Geological Survey, 2328-0328, 2025.
  Pakhale, G., Khosa, R., and Gosain, A. K.: In Today's World, Is It Worth Performing Flood Frequency Analysis Using Observed Streamflow Data? Environmental Advances, 15, 100485, 2024.
  Vidrio-Sahagún, C. T., Ruschkowski, J., He, J., and Pietroniro, A.: A practice-oriented framework for stationary and nonstationary flood frequency analysis, Environmental Modelling & Software, 173, 105940, 2024.
  Waylen, P. and Woo, M.-K.: Prediction of annual floods generated by mixed processes, Water Resources Research, 18, 1283-1286, 10.1029/WR018i004p01283, 1982.
  Whitfield, P. H. and Burn, D. H.: Rogue and Extreme Floods in North America, Journal of Hydrology, 2026.
  Yu, G., Wright, D. B., Zhu, Z., Smith, C., and Holman, K. D.: Process-based flood frequency analysis in an agricultural watershed exhibiting nonstationary flood seasonality, Hydrology and Earth System Sciences, 23, 2225-2243, 2019.
  Zhang, Z. and Stadnyk, T. A.: Investigation of attributes for identifying homogeneous flood regions for regional flood frequency analysis in Canada, 2020.
  Zhang, Z., Stadnyk, T. A., and Burn, D. H.: Identification of a preferred statistical distribution for at-site flood frequency analysis in Canada, Canadian Water Resources Journal/Revue canadienne des ressources hydriques, 45, 43-58, 2020.
  Zhao, F., Lange, S., Goswami, B., and Frieler, K.: Frequency bias causes overestimation of climate change impacts on global flood occurrence, Geophysical Research Letters, 51, e2024GL108855, 2024.
  
  Citation: https://doi.org/10.5194/egusphere-2026-2067-AC1
RC2: 'Comment on egusphere-2026-2067', Anonymous Referee #2, 08 Jun 2026

I am reviewing this technical note for the second time, as it was submitted some months ago to the Hydrological Sciences Journal. I did not read explicit responses to my points raised then (I couldn’t find them on the HESS website) but I can see some efforts made by the authors in revising the manuscript. The inclusion of additional European references improves the balance of the literature review, and the manuscript is now more explicit in stating that the underlying methods are described in other publications by the same authors (three papers in 2025-2026). Nevertheless, my main concern remains. The Technical Note still seems to derive most of its contribution from the presentation of graphical outputs generated by an R package, while the methodological basis for these outputs is largely placed outside the manuscript. Is this enough for a standalone contribution? I do not think that the existence of a plotting procedure is, on its own, sufficient for a scientific Technical Note unless it really is something new. HESS Technical Notes are expected to report new developments, significant advances, or novel aspects of experimental or theoretical methods and techniques. Although the plots presented here may be useful to practitioners, the manuscript does not sufficiently demonstrate, within the paper itself, the methodological novelty or standalone contribution required for a Technical Note in HESS.
Detailed comments are listed below.
Introduction: I would suggest reorganizing it so that the state of the art is described first and the objectives of this technical note are listed at the end. At the moment everything is mixed (e.g., “This note focuses…” at line 35, “In this work, …” at line 53, “This Technical Note describes…” at line 62) and the train of thought does not flow linearly.
Overall structure: the same concepts seem to be repeated many times in the paper. E.g., the figures are described in Section 2.2, and again in Section 3, and again in Section 4.
Line 85: shouldn’t annual maxima from partial years be discussed in section 2.1?
Line 88: years without observations greater than the threshold and missing years should be treated differently, in my opinion.
Line 90: I would suggest adding more details on the tests here, and in general on the methods used in producing the details in the figures.
Line 101: the sentence seems incomplete. I guess the Authors mean that MGBT is used here.
Line 104: the Grubbs test was mentioned some lines before (for the low floods). Maybe it could be discussed there already.
Line 112: this sentence seems a residual from what is now explained at line 100.
Line 122: “allow”
Line 134: what is kappa? Is it the one described in Section 2.2.2?
Line 146: isn’t this description the same already done at line 88?
Line 165: why have seasonality histograms in Figure 2 when Figure 3 has that in more detail?
Line 172: the kernel densities could be scaled so as not to exceed the plot borders.
Line 180: the methods are described in three other papers by the same authors in 2025 and 2026. Is the contribution of this Technical Note enough for HESS?
Lines 210-235: maybe Section 4 is not the right place to put the checklist, which is more general and reflects the discussions in Sections 2 and 3. Maybe the Authors meant to have a discussion section here (the content on page 9 also seems more a discussion than part of Section 4). In this discussion part, I would have expected something about having (or not) all the information needed to produce the plots. Sometimes the season of old floods is not available, for example.

Citation: https://doi.org/10.5194/egusphere-2026-2067-RC2
- AC2: 'Reply on RC2', Paul Whitfield, 16 Jun 2026
  
  Reviewer #2
  I am reviewing this technical note for the second time, as it was submitted some months ago to the Hydrological Sciences Journal. I did not read explicit responses to my points raised then (I couldn’t find them on the HESS website) but I can see some efforts made by the authors in revising the manuscript. The inclusion of additional European references improves the balance of the literature review, and the manuscript is now more explicit in stating that the underlying methods are described in other publications by the same authors (three papers in 2025-2026). Nevertheless, my main concern remains. The Technical Note still seems to derive most of its contribution from the presentation of graphical outputs generated by an R package, while the methodological basis for these outputs is largely placed outside the manuscript. Is this enough for a standalone contribution? I do not think that the existence of a plotting procedure is, on its own, sufficient for a scientific Technical Note unless it really is something new. HESS Technical Notes are expected to report new developments, significant advances, or novel aspects of experimental or theoretical methods and techniques. Although the plots presented here may be useful to practitioners, the manuscript does not sufficiently demonstrate, within the paper itself, the methodological novelty or standalone contribution required for a Technical Note in HESS.
  Response:
  Thanks for taking the time to again review this manuscript. Your previous review held the same opinion that sharing an integrated version of screening methodologies was not “new”. Our opinion is that there is a need to share these methods with others who conduct ffa and enable them to screen their data to better assess how they might need to conduct their analysis. We describe the methods that are used and why and how these will help an analyst with ffa.
  
  Detailed comments are listed below.
  Introduction: I would suggest reorganizing it so that the state of the art is described first and the objectives of this technical note are listed at the end. At the moment everything is mixed (e.g., “This note focuses…” at line 35, “In this work, …” at line 53, “This Technical Note describes…” at line 62) and the train of thought does not flow linearly.
  Response:
  In revising the manuscript, we will reorganize the text in response to these comments.
  
  Overall structure: the same concepts seem to be repeated many times in the paper. E.g., the figures are described in Section 2.2, and again in Section 3, and again in Section 4.
  Line 85: shouldn’t annual maxima from partial years be discussed in section 2.1?
  Response:
  We will reorganize the text in response to these comments. Section 2 describes the methods that are used, and Section 3 describes how the results are presented in figures. We will add some text to emphasize this distinction, and reduce unnecessary overlap; however, some overlap will still exist.
  
  Line 88: years without observations greater than the threshold and missing years should be treated differently, in my opinion.
  Response:
  As we responded in the HSJ review you kindly provided:
  While we agree in principle, this is not practical. The user can screen amax data or POT data, but not both together. When amax data are screened, the missing years are those without an amax observation; when POT data are screened, the missing years are those without a POT observation. They are therefore marked similarly.
  Since the reviewer thinks that these should be treated differently, we have made it possible to change the colours used for missing years and for years without a POT observation. Thus, users can choose any colours that suit them if they want the lines coloured differently.
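The distinction the reviewer asks for can only be drawn when the record also says in which years the gauge was operating, which is an assumption that typical public peak-flow files may not satisfy. Under that assumption, a minimal sketch of the classification might look as follows; `classify_years` and the `gauged_years` input are hypothetical and are not part of the package.

```python
from typing import Dict, Iterable, Set


def classify_years(
    first_year: int,
    last_year: int,
    gauged_years: Set[int],
    event_years: Iterable[int],
) -> Dict[int, str]:
    """Label each year of the record period.

    "event"         : at least one peak is recorded in the year
    "no exceedance" : the gauge operated but no peak is recorded
    "missing"       : no evidence that the gauge operated
    """
    events = set(event_years)
    status: Dict[int, str] = {}
    for year in range(first_year, last_year + 1):
        if year in events:
            status[year] = "event"
        elif year in gauged_years:
            status[year] = "no exceedance"
        else:
            status[year] = "missing"
    return status
```

Each label could then be mapped to its own line colour. Without the gauging information, the last two categories cannot be told apart, which is consistent with the reply above that they are marked similarly.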
  
  Line 90: I would suggest adding more details on the tests here, and in general on the methods used in producing the details in the figures.
  Response:
  These are pretty standard methods that are available in R. The revised Supplementary Material provides a list of methods, functions, packages, and key references.
  
  Line 101: the sentence seems incomplete. I guess the Authors mean that MGBT is used here.
  Response:
  Thanks for noticing this. Corrected.
  
  Line 104: the Grubbs test was mentioned some lines before (for the low floods). Maybe it could be discussed there already.
  Response:
  We specifically ordered it in this way because censoring of low peaks based on Cohn et al. (2013) is well established and often warranted. The censoring of high peaks is not advisable, but knowing that the peaks are outliers in magnitude and/or timing is important information. This is described in Whitfield and Burn (2026). So, we respected that chronology in this section of text.
  
  Line 112: this sentence seems a residual from what is now explained at line 100.
  Response:
  Thanks for this comment. We have corrected this text.
  
  Line 122: “allow”
  Response:
  Thanks for catching this.
  
  Line 134: what is kappa? Is it the one described in Section 2.2.2?
  Response:
  It is the same kappa, and the wording will be modified to reflect that.
  
  Line 146: isn’t this description the same already done at line 88?
  Response:
  We believe that the description of the figure should be complete here, so the reader will have these details when they consider Figure 1. We will minimize overlap in the revision.
  
  Line 165: why have seasonality histograms in Figure 2 when Figure 3 has that in more detail?
  Response:
  These are complementary, and this was done to be methodologically consistent. The return period plot visually indicates the month in which each observation occurred. The polar plot with fuzzy clustering determines if there are different clusters suggesting different generating mechanisms.
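Placing flood dates on a polar plot amounts to treating the day of the year as an angle. The sketch below shows that mapping together with two standard circular summaries, the mean direction and the mean resultant length, which is the quantity that tests of circular uniformity such as the Rayleigh test are built on. It is an illustration under stated assumptions rather than the package's code: the function names are hypothetical, and centring each day at its midpoint is a choice made for the example.

```python
import calendar
import math
from datetime import date
from typing import Iterable, Tuple


def day_angle(day: date) -> float:
    """Map a calendar date to an angle in radians on a circle spanning one year."""
    days_in_year = 366 if calendar.isleap(day.year) else 365
    day_of_year = day.timetuple().tm_yday
    return 2.0 * math.pi * (day_of_year - 0.5) / days_in_year


def seasonality(dates: Iterable[date]) -> Tuple[float, float]:
    """Return (mean direction in [0, 2*pi), mean resultant length in [0, 1])."""
    angles = [day_angle(d) for d in dates]
    if not angles:
        raise ValueError("at least one date is required")
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    direction = math.atan2(mean_sin, mean_cos) % (2.0 * math.pi)
    return direction, math.hypot(mean_cos, mean_sin)
```

A mean resultant length near 1 indicates peaks concentrated in one season, while a value near 0 indicates either no preferred season or two opposing seasons that cancel. The second case is where a monthly histogram or a clustering step adds information that this single summary cannot provide.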
  
  Line 172: the kernel densities could be scaled so as not to exceed the plot borders.
  Response:
  While this would be possible, our view is that trimming them at the figure margin does not interfere with their interpretation. Otherwise, we would have to include scaling information.
  
  Line 180: the methods are described in three other papers by the same authors in 2025 and 2026. Is the contribution of this Technical Note enough for HESS?
  Response:
  Obviously, we believe that sharing the methods for screening flood peak data based on the current state of methods is important. The reviewer’s point might be that others can recreate the methods from the literature themselves, which certainly is true.
  Our opinion is that there is a need to share these methods with others who conduct ffa and enable them to screen their data to better assess how they might need to conduct their analysis. Other reviewers have agreed with this.
  R3: “A workflow that reduces computation time, memory requirements, coding effort, or user expertise can provide significant value even if the underlying scientific objective is not new.”
  HSJ R1: “In my opinion, this is a concise, well-written, and practically valuable Technical Note that addresses a critical yet often underemphasized step in flood frequency analysis, the screening of annual maxima and peaks-over-threshold (POT) data. I really appreciate how the authors focus on ensuring the validity of fundamental statistical assumptions such as independence, stationarity, and homogeneity before moving into frequency estimation. The paper offers a clear and well-motivated rationale for the proposed screening approach, and I like that it is implemented in a reproducible R environment, which makes it directly useful for both researchers and practitioners.
  What I particularly like is that the authors tackle a very practical yet overlooked issue in applied hydrology, the tendency to rely on FFA assumptions without verifying whether the data actually meet them. The integration of multiple statistical and graphical methods (Mann–Kendall, Wald–Wolfowitz, Pettitt, Grubbs–Beck, Rayleigh, and fuzzy clustering) into one cohesive screening framework is both timely and significant, especially in the context of climate-driven shifts in flood-generating mechanisms. The paper manages to combine methodological strength with real-world usability.”
  
  Lines 210-235: maybe Section 4 is not the right place where to put the checklist, which is more general and reflects the discussions in Sections 2 and 3. Maybe the Authors meant to have a discussion section here (also the content in page 9 seems more a discussion than part of Section 4). In this discussion part, I would have expected something about having (or not) all the information needed to produce the plots. Sometimes the season of old floods is not available, for example.
  Response:
  In the revision we could expand this type of discussion to address these two points.
  We are not sure that these two points are within the scope of the Technical Note. Perhaps we could add some text that acknowledges them and notes that addressing them during screening is a good idea but beyond the scope of the note. We intended this screening to be applicable to the type of flood data that are commonly publicly available. Such data typically have a date and a magnitude, and sometimes a quality flag. The present code allows for quality codes to be absent.
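
  As a rough illustration of the minimal input described above (a date, a magnitude, and an optional quality flag), the sketch below extracts annual maxima in plain Python. It is not the authors' R code: the names `Peak` and `annual_maxima`, the record layout, and the grouping by calendar year are all assumptions made for this example.

```python
from datetime import date
from typing import Dict, Iterable, NamedTuple, Optional


class Peak(NamedTuple):
    day: date                      # date of the observation
    flow: float                    # magnitude, e.g. discharge in m^3/s
    quality: Optional[str] = None  # quality flag; None when not supplied


def annual_maxima(records: Iterable[Peak]) -> Dict[int, Peak]:
    """Return the largest record in each calendar year.

    Grouping by calendar year is an assumption of this sketch;
    a water year could be substituted.
    """
    best: Dict[int, Peak] = {}
    for rec in records:
        year = rec.day.year
        if year not in best or rec.flow > best[year].flow:
            best[year] = rec
    return best


# Hypothetical records: the second one has no quality flag at all.
records = [
    Peak(date(2001, 4, 12), 310.0, "A"),
    Peak(date(2001, 6, 3), 420.0),
    Peak(date(2002, 5, 20), 275.0, "E"),
]
amax = annual_maxima(records)
```

  Because the flag defaults to `None`, records with and without quality codes pass through the same code path, and the flag is simply carried along for any later plotting or filtering.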
  
  Citation: https://doi.org/10.5194/egusphere-2026-2067-AC2

RC3: 'Comment on egusphere-2026-2067', Anonymous Referee #3, 11 Jun 2026

Line edits:

Line 13: grammatical error- change to “often confusing…”

Line 51: Indicate what CSHShydRology is.

Line 58: guidelines in countries are mentioned but not cited.

Line 70: amax and POT abbreviations are used but were not put in parentheses on previous line.

Line 130: Can you define possibilistic algorithms?

Line 153: LOESS line should be defined as Locally Estimated Scatterplot Smoothing somewhere in paper (I think loess was used earlier).

Line 161: Explain and cite why Gumbel is used for annual maximum series, and pareto used for POT series.

Citation: https://doi.org/10.5194/egusphere-2026-2067-RC3

AC3: 'Reply on RC3', Paul Whitfield, 17 Jun 2026

Reviewer #3

Response:

It is common practice to provide the figures at the end of the manuscript. The publisher will place them where appropriate in the final paper.

Time series plot	Description	Example Figures
Entire or part years	Solid points indicate the maximum of a complete year, open symbols indicate less than a complete year. Are partial years representative?	S1a, S4a, S7a,

Response:

We will address the grammatical issues the reviewer points out.

We respectfully disagree about the use of numbered text and lists. The target reader will be better served by numbered lists, which we believe provide clearer, easier-to-follow descriptions.

Line edits:

Line 13: grammatical error- change to “often confusing…”

Response:

Corrected

Line 51: Indicate what CSHShydRology is.

Response:

We have added text to explicitly state that it is an R-package.

Line 58: guidelines in countries are mentioned but not cited.

Response:

We do not think that this is necessary. Most readers should be aware of the specific guidelines that they need to follow.

Line 70: amax and POT abbreviations are used but were not put in parentheses on previous line.

Response:

Corrected

Line 130: Can you define possibilistic algorithms?

Response:

Some additional text will be added to explain the difference between possibilistic and probabilistic clustering.
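
The distinction can be sketched numerically. The example below is an illustration only: the note does not state which formulas its clustering uses, so the standard fuzzy c-means membership and the commonly used possibilistic c-means typicality stand in here. The function names, the fuzzifier `m`, and the scale parameters `etas` are assumptions for this sketch.

```python
from typing import List, Sequence


def fuzzy_memberships(dists: Sequence[float], m: float = 2.0) -> List[float]:
    """Fuzzy c-means memberships of one point, given its distance to each
    cluster centre. Memberships are relative and always sum to 1."""
    if any(d == 0.0 for d in dists):
        # The point coincides with a centre: share full membership there.
        zeros = sum(1 for d in dists if d == 0.0)
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


def possibilistic_typicalities(
    dists: Sequence[float], etas: Sequence[float], m: float = 2.0
) -> List[float]:
    """Possibilistic typicalities: each value depends only on the distance
    to its own centre, so the values need not sum to 1."""
    power = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (d * d / eta) ** power) for d, eta in zip(dists, etas)]


# A hypothetical event far from both cluster centres (a potential outlier).
far = [10.0, 10.0]
u = fuzzy_memberships(far)                             # split evenly between clusters
t = possibilistic_typicalities(far, etas=[1.0, 1.0])   # small for both clusters
```

Under the sum-to-one constraint the distant event still receives a membership of 0.5 in each cluster, whereas its typicality is low for both, which is what allows an atypical event to be flagged rather than forced into a cluster.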

Line 153: LOESS line should be defined as Locally Estimated Scatterplot Smoothing somewhere in paper (I think loess was used earlier).

Response:

Now defined where first mentioned. Thanks.

Line 161: Explain and cite why Gumbel is used for annual maximum series, and pareto used for POT series.

Response:

Text and references will be added to address this.
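
For context, the usual textbook motivation comes from extreme value theory, and may differ from the text and references the authors eventually add. Suitably normalised block maxima converge to the generalised extreme value (GEV) distribution, of which the Gumbel is the zero-shape special case, while exceedances over a high threshold converge to the generalised Pareto distribution (GPD). The symbols below (location \mu, scale \sigma, shape \xi, threshold u) are standard notation, not notation taken from the manuscript.

```latex
% GEV for block (annual) maxima, defined where 1 + \xi (x - \mu)/\sigma > 0
G(x) = \exp\left\{ -\left[ 1 + \xi \, \frac{x - \mu}{\sigma} \right]^{-1/\xi} \right\},
\qquad
\lim_{\xi \to 0} G(x) = \exp\left\{ -\exp\left( -\frac{x - \mu}{\sigma} \right) \right\}
\quad \text{(Gumbel)}

% GPD for exceedances y = x - u > 0 over a high threshold u,
% with \tilde{\sigma} = \sigma + \xi (u - \mu)
H(y) = 1 - \left( 1 + \xi \, \frac{y}{\tilde{\sigma}} \right)^{-1/\xi},
\qquad
\lim_{\xi \to 0} H(y) = 1 - \exp\left( -\frac{y}{\tilde{\sigma}} \right)
```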

Citation: https://doi.org/10.5194/egusphere-2026-2067-AC3

Supplement

https://doi.org/10.5194/egusphere-2026-2067-supplement


Short summary

A visual screening of annual maxima and peaks-over-threshold data is introduced to improve the reliability of flood frequency analysis (FFA). The screening uses magnitude and timing outlier detection to identify mixed populations and alternative generating mechanisms that are frequently overlooked. It provides practitioners with a structured method to validate assumptions, avoid bias, and enhance flood risk assessments.

