Typology of flood trends across West Africa
Abstract. West Africa is a region highly vulnerable to global change, where hydro-climatic extremes are becoming increasingly frequent and intense. While previous flood studies have been limited to a few river basins with scattered observational data, the availability of a recent regional in-situ hydrometric database offers the opportunity to study hydrological extremes at the West African scale. This study investigates trends in hydrological extremes using annual maximum streamflow (AMAX) over 80 West African catchments from 1950 to 2018. A rigorous flood frequency framework based on Extreme Value Theory is used for trend detection in return levels, comparing a wide variety of non-stationary models. In order to synthesize the regional patterns, an objective typology of flood evolution trajectories is developed using k-means clustering. The results reveal widespread non-stationarity in flood extremes, affecting 85 % of the catchments, with contrasting trajectories of extremes. The clustering reveals 6 main types of trends in hydrological extremes. While most catchments exhibit a general decline in flood magnitude until the major droughts affecting the region (1970s–1990s), recent decades show divergent evolutions, ranging from stabilization to moderate or strong recovery. At the regional scale, a north–south gradient emerges. The Sahelian catchments display trajectories ranging from persistent decreases to weak flood intensification, whereas the Sudano-Guinean basins are predominantly characterized by decreasing trends, with additional nuances related to the shape and magnitude of these declines. Overall, these results challenge the assumption of a homogeneous signal of hydrological intensification over West Africa and provide a more nuanced depiction of typical Sahelian and Sudano-Guinean flood evolution patterns.
Furthermore, the contrasting trends identified are only weakly explained by catchment physical or hydrological characteristics, underscoring the complexity of non-stationary hydrological dynamics in the region. By documenting the diversity of long-term flood trajectories over 1950–2018, this study refines the regional narrative of hydrological changes in West Africa and has important implications for anticipating future hydro-climatic extremes and for strengthening the resilience of ecosystems and societies.
Dear Authors,
I have carefully evaluated the manuscript entitled “Typology of flood trends across West Africa.” The study addresses a scientifically important problem for hydrology and environmental risk management in a region where long-term discharge observations are difficult to access and where flood hazards interact with rapid demographic and land-use changes. The manuscript’s main strengths lie in its regional scope, its use of an in-situ hydrometric database covering 80 catchments over several decades, its attempt to move beyond simple linear trend detection, and its effort to synthesize heterogeneous flood trajectories through a typology of return-level evolution. The combination of non-stationary extreme value modelling and clustering has clear potential to enrich the literature on flood non-stationarity in West Africa.
Despite these strengths, the manuscript contains important weaknesses that should be addressed before publication. Several methodological choices need clearer justification and sensitivity testing, especially the extraction of AMAX series, the non-stationary GEV model selection procedure, the treatment of change points, the interpretation of 2-year return levels, and the clustering strategy. Some conclusions are stronger than the evidence currently supports, particularly those concerning regional robustness, spatial organization, and hydrological mechanisms.
I therefore recommend a Major Revision.
Lines 25-36: The introduction begins with a broad framing around global change, agricultural production, ecosystems, adaptation strategies, local knowledge, and vegetation recovery. These elements are relevant to West African environmental change, but the link with the manuscript’s actual object, namely AMAX flood trends, remains indirect. This broad framing risks diluting the hydrological motivation of the paper and may suggest that the study will address socio-ecological resilience or adaptation processes, which it does not. The introduction would be stronger if this section more explicitly distinguished contextual background from the specific hydrological problem being studied. The authors could retain the broader regional context, but should more quickly connect it to observed flood regimes, design hydrology, and the need for regional-scale trend typologies.
Lines 37-48: The discussion of increasing flood damage and obsolete design methods is important, but the text tends to merge three different dimensions of flood risk: hydrological hazard, exposure, and vulnerability. An increase in affected populations does not necessarily demonstrate an increase in hydrological extremes, and this distinction is central because the manuscript later shows that flood magnitudes have not increased uniformly across West Africa. If the argument is that stationarity-based design may be unsafe, the manuscript should separate evidence for changing discharge extremes from evidence for increasing damages. Recent West African flood-risk studies could help clarify this distinction, for example 10.1007/s43621-026-02881-y and 10.1080/19475705.2026.2675778. This would make the motivation more consistent with the later finding that hazard trends are spatially heterogeneous.
Lines 87-107: The distinction between Sahelian Hortonian-dominated systems and Sudano-Guinean Hewlettian-dominated systems is useful, but the manuscript does not clearly specify how catchments are assigned to these zones in the later interpretation. Since the north-south gradient becomes one of the central results, the classification of stations into Sahelian, Sudanian, Sudano-Guinean, or other hydro-climatic domains should be explicit and reproducible. A map or table assigning each station to a zone would reduce ambiguity. Without this, the regional interpretation may be influenced by visual inspection rather than by a defined spatial criterion.
Lines 116-125: The procedure used to define the flood season is central to AMAX extraction, but several parameters are not reproducibly specified. The minimum peak width is given, yet the “sufficient relative peak height” threshold is not stated, and the manual fine-tuning procedure is not described in enough detail. Since missing data are assessed within the detected flood season, small differences in season boundaries could affect whether an annual maximum is retained or discarded. The authors should provide the exact threshold values, explain how the climatological hydrograph was constructed under missing data, describe the manual inspection protocol, and indicate whether the resulting flood-season windows are available in supplementary material.
Lines 124-129: The 10% missing-data threshold within the flood season is a pragmatic choice, but its effect on AMAX selection is not evaluated. In catchments with short flood seasons or sharp flood peaks, a few missing days can be critical if they coincide with peak-flow conditions. This may introduce temporal bias if missingness changes over time, especially after the sharp decline in valid maxima reported later. A sensitivity analysis using alternative missing-data thresholds, or at least a summary of how many AMAX values are lost under different thresholds, would strengthen confidence in the trend results.
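To illustrate the kind of summary requested here, a minimal sketch follows. It uses synthetic daily flows, a hypothetical flood-season window (day indices 180 to 300), and hypothetical function names; none of these are taken from the manuscript, and the authors' own season windows and records would replace them.

```python
import numpy as np

def missing_fraction(daily_flow, start, end):
    """Share of missing (NaN) days inside the flood-season window [start, end)."""
    return float(np.mean(np.isnan(daily_flow[start:end])))

def retained_years(flows, start, end, threshold):
    """Years whose flood-season missing fraction does not exceed `threshold`."""
    return [yr for yr, q in flows.items()
            if missing_fraction(q, start, end) <= threshold]

# Synthetic example: 30 years of daily flow, each with one random gap (assumption).
rng = np.random.default_rng(0)
flows = {}
for year in range(1950, 1980):
    q = rng.gamma(shape=2.0, scale=50.0, size=365)
    gap_start = rng.integers(0, 365)
    gap_len = rng.integers(0, 40)
    q[gap_start:gap_start + gap_len] = np.nan
    flows[year] = q

season_start, season_end = 180, 300   # hypothetical flood-season window
counts = {thr: len(retained_years(flows, season_start, season_end, thr))
          for thr in (0.0, 0.05, 0.10, 0.20)}
print(counts)  # number of years whose AMAX would be kept under each threshold
```

Reporting such counts per station and per decade would show whether the 10% rule removes years unevenly through time.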
Lines 126-133: The selection criteria require at least 35 years of data, no more than seven consecutive missing years, and at least 10 years before and after 1980. These criteria are defensible, but their influence on spatial and temporal representativeness is not sufficiently shown. Because only 80 of 395 stations remain, the selected network may overrepresent basins with better monitoring continuity or particular institutional histories. The statement that recent records remain sufficiently well distributed regionally needs quantitative support, such as maps of retained stations by decade, distributions of record length and last observation year, and a sensitivity test showing whether the main cluster patterns persist under stricter or looser selection rules.
Lines 135-138: The methodology is described as generic and replicable, but parts of the workflow depend on manual inspection of flood seasonality and on a specific chain of model comparisons. Replicability requires more than a conceptual description. The manuscript should specify whether code, station lists, AMAX series, selected flood-season windows, fitted model parameters, and clustering outputs will be made available. Without these elements, independent reproduction of the typology would be difficult, particularly because small differences in AMAX extraction or model selection may change cluster membership.
Lines 165-178: The non-stationary model families are appropriate for capturing different temporal shapes, but the constraints on the transition year need clearer justification. The text states that the transitional year is estimated around the 1970s/1980s, while the breakpoint is imposed in the second third of observed years. For stations with different record lengths and missing periods, this may imply different allowable calendar ranges. Since many results are interpreted around the drought period, the authors should provide the exact admissible range for each station or the general rule in calendar years. Otherwise, there is a risk that the model structure partly imposes the timing of the inferred changes.
Lines 175-178: The treatment of the scale parameter as σ(t) = σ1 × µ(t) deserves more discussion. This formulation effectively imposes a constant coefficient of variation and ties variability changes directly to location changes. It may prevent independent variance trends and can create constraints if µ(t) approaches low values. Because the manuscript later interprets time-varying σ as evidence of changes in interannual variability, the authors should justify this link function, discuss its limitations, and consider whether a log-link formulation for σ(t), or a simpler independent time trend in log σ, would provide a more interpretable alternative.
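To make the implication of this link concrete, a short worked derivation is sketched below. It assumes the formulation exactly as quoted above, a time-constant shape parameter with 0 ≠ ξ < 1/2 (sign convention in which ξ > 0 is heavy-tailed), and µ(t) > 0. The symbols y_T, g_1, g_2, a, and b are introduced here for illustration and are not taken from the manuscript.

```latex
% Assumption: \sigma(t) = \sigma_1 \mu(t), with \xi constant, \xi \neq 0, \xi < 1/2
% Notation introduced here: y_T = -\log(1 - 1/T), \quad g_k = \Gamma(1 - k\xi)
\begin{align}
z_T(t) &= \mu(t) + \frac{\sigma(t)}{\xi}\left(y_T^{-\xi} - 1\right)
        = \mu(t)\left[1 + \frac{\sigma_1}{\xi}\left(y_T^{-\xi} - 1\right)\right], \\
\mathrm{E}[X_t] &= \mu(t)\left[1 + \frac{\sigma_1}{\xi}\left(g_1 - 1\right)\right], \qquad
\mathrm{SD}[X_t] = \mu(t)\,\frac{\sigma_1}{|\xi|}\sqrt{g_2 - g_1^{2}}, \\
\mathrm{CV}[X_t] &= \frac{\mathrm{SD}[X_t]}{\mathrm{E}[X_t]}
        = \frac{\sigma_1\sqrt{g_2 - g_1^{2}}}{|\xi|\left[1 + \sigma_1 (g_1 - 1)/\xi\right]}
        \quad \text{(independent of } t\text{)}.
\end{align}
% Log-link alternative with hypothetical coefficients a and b:
\begin{equation}
\log \sigma(t) = a + b\,t
\end{equation}
```

Under these assumptions every return level is a fixed multiple of µ(t), so the spread of the distribution cannot evolve independently of its location, whereas the log-link form keeps σ(t) positive and lets it vary on its own.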
Lines 180-218: The model selection procedure combines multiple likelihood ratio tests, direct likelihood comparisons, and a final BIC comparison. This multi-step structure may be sensitive to the order of comparisons and to repeated testing. The initial stationarity test accepts non-stationarity if at least one non-stationary model rejects the stationary model, but the significance level is not specified and no adjustment is made for testing several alternatives. This can inflate the probability of detecting non-stationarity under the null. The authors should state the test threshold, examine false-positive rates through simulation under stationary GEV or Gumbel samples with similar record lengths, and report whether the 85% non-stationarity result remains stable.
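The simulation check suggested above could look like the following sketch. It is a simplified stand-in for the authors' procedure, not a reproduction of it: it assumes Gumbel samples with arbitrary parameters, only two alternative models (a linear trend in µ and a step in µ at a fixed mid-record position), a 5% level, and a chi-square reference with one degree of freedom. It relies on NumPy and SciPy; all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def gumbel_nll(params, x, cov):
    """Negative log-likelihood of a Gumbel model with mu = params[0] (+ params[2] * cov)."""
    mu = params[0] if cov is None else params[0] + params[2] * cov
    sigma = np.exp(params[1])            # log-parameterised so sigma stays positive
    z = (x - mu) / sigma
    with np.errstate(over="ignore"):     # very poor trial values may overflow to inf
        return float(np.sum(np.log(sigma) + z + np.exp(-z)))

def lrt_pvalues(x, covariates):
    """LRT p-values of the stationary model against each one-covariate alternative."""
    s0 = np.std(x) * np.sqrt(6.0) / np.pi               # moment-based start values
    start = [np.mean(x) - 0.5772 * s0, np.log(s0)]
    fit0 = minimize(gumbel_nll, start, args=(x, None), method="Nelder-Mead")
    pvals = []
    for cov in covariates:
        fit1 = minimize(gumbel_nll, list(fit0.x) + [0.0], args=(x, cov),
                        method="Nelder-Mead")
        stat = max(2.0 * (fit0.fun - fit1.fun), 0.0)
        pvals.append(float(chi2.sf(stat, df=1)))
    return pvals

def false_positive_rates(n_sim=100, n_years=60, alpha=0.05, seed=0):
    """Rejection rates under a stationary truth: one alternative vs. 'any of two'."""
    rng = np.random.default_rng(seed)
    t = np.linspace(-0.5, 0.5, n_years)     # centred, scaled time
    step = (t > 0).astype(float)            # fixed mid-record step (not estimated)
    n_trend, n_any = 0, 0
    for _ in range(n_sim):
        x = rng.gumbel(loc=100.0, scale=30.0, size=n_years)   # stationary sample
        p_trend, p_step = lrt_pvalues(x, [t, step])
        n_trend += int(p_trend < alpha)
        n_any += int(min(p_trend, p_step) < alpha)
    return n_trend / n_sim, n_any / n_sim

rate_trend, rate_any = false_positive_rates()
print(f"single alternative: {rate_trend:.3f}, any of two alternatives: {rate_any:.3f}")
```

The "any of two" rate can only be equal to or larger than the single-alternative rate. With the full set of candidate models and an estimated breakpoint the inflation would be expected to be larger, which is why the check should be run with the authors' actual model set and record lengths.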
Lines 182-192: The likelihood ratio test assumptions are not fully satisfied for several candidate models. Change-point models often involve parameters that are not identified under the null, and the standard chi-square approximation may not be valid. This is particularly important here because breakpoint and multi-linear models play a major role in the typology. The authors should either justify the asymptotic approximation for their specific model set or use a parametric bootstrap to assess the significance of non-stationarity and breakpoint terms. Without such a check, part of the detected complexity may reflect model-selection artefacts rather than hydrological change.
Lines 219-223: The interpretation of the 2-year return level is potentially misleading. A 2-year return level is not a rare event in flood-frequency terms, and an increase in the 2-year return level does not, by itself, mean that “rare events are becoming more frequent.” It means that the discharge associated with a 50% annual exceedance probability has changed. To make statements about frequency, the authors should calculate how the exceedance probability of a fixed historical threshold evolves through time. This distinction matters because the manuscript uses the 2-year return level as the basis for trend classification, while risk management applications often depend on less frequent floods.
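The calculation suggested above can be sketched as follows. The example fixes the threshold at the 2-year return level of a reference year and tracks its annual exceedance probability as the location parameter changes. The parameter values, the piecewise-linear µ(t), and the function names are hypothetical and chosen only for illustration; the sign convention takes ξ > 0 as heavy-tailed.

```python
import math

def gev_return_level(T, mu, sigma, xi):
    """Discharge exceeded with annual probability 1/T under GEV(mu, sigma, xi)."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:                       # Gumbel limit
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)

def gev_exceedance_prob(threshold, mu, sigma, xi):
    """Annual probability that the maximum exceeds a fixed threshold."""
    s = (threshold - mu) / sigma
    if abs(xi) < 1e-12:                       # Gumbel limit
        return 1.0 - math.exp(-math.exp(-s))
    arg = 1.0 + xi * s
    if arg <= 0.0:                            # threshold outside the GEV support
        return 1.0 if xi > 0 else 0.0
    return 1.0 - math.exp(-arg ** (-1.0 / xi))

def mu_of_year(year):
    """Hypothetical location trajectory: decline to 1985, partial recovery after."""
    if year <= 1985:
        return 400.0 - 100.0 * (year - 1950) / 35.0
    return 300.0 + 50.0 * (year - 1985) / 33.0

sigma, xi = 120.0, 0.1                        # hypothetical, held constant in time
threshold = gev_return_level(2, mu_of_year(1950), sigma, xi)
for year in (1950, 1985, 2018):
    p = gev_exceedance_prob(threshold, mu_of_year(year), sigma, xi)
    print(year, round(p, 3))
```

By construction the probability equals 0.5 in the reference year; how far it departs from 0.5 in later years is the frequency statement that the return-level trend alone does not provide.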
Lines 225-228: The claim that using 10-year return levels would barely change the results needs to be demonstrated rather than asserted. The manuscript notes that this would hold unless σ(t) shows a strong trend, but 29% of non-stationary models include time-varying scale. Differences in shape parameter selection between GEV and Gumbel models may also become more important for higher return periods. A supplementary sensitivity analysis comparing clustering results for 2-, 5-, 10-, and perhaps 20-year return levels would help determine whether the typology describes general flood evolution or mainly changes around frequent annual maxima.
Lines 229-241: The clustering method is described only in general terms. For reproducibility and interpretation, the manuscript should specify the distance metric, initialization method, number of random starts, convergence criterion, random seed, and whether cluster stability was assessed. The normalization by the mean return level over 1950-2020 also needs justification because the discharge records end in 2018 and the final two years are model-derived. This probably has little effect, but it should be clarified. Since the clustering is central to the paper’s originality, its sensitivity to normalization choices and initialization should be reported.
Lines 245-268: The results describe non-stationarity as predominant and identify a clear zonal organization. These are important findings, but the strength of the wording should reflect uncertainty in model selection and data coverage. For instance, the conclusion that time-varying σ indicates greater interannual flood variability is not straightforward because σ is not estimated independently from µ in the “+” models. The authors should report confidence intervals or uncertainty bands for return-level trajectories and provide model-selection uncertainty, for example through BIC differences, bootstrap frequencies, or alternative model-choice criteria. This would make it clearer which stations have strong evidence for a specific trend type and which are more ambiguous.
Lines 270-276: The decision to use six clusters, despite the highest silhouette score occurring at five clusters, is understandable but weakly justified. The term “messy” does not provide a scientific basis for selecting K. Since the typology is a key contribution, the authors should show the silhouette scores for a range of K values, provide the centroids for K = 5 and K = 6, and quantify cluster stability under resampling. If six clusters are retained for interpretability, the manuscript should state this as an expert-guided choice supported by diagnostics, not as an entirely objective classification.
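As an indication of the diagnostics requested here, the sketch below computes silhouette scores over a range of K with a fixed seed and several random starts. It is a plain NumPy implementation of Lloyd's k-means and of the mean silhouette with Euclidean distance, applied to synthetic normalized trajectories drawn from three made-up shapes; the data, shapes, and function names are assumptions and do not reproduce the manuscript's typology.

```python
import numpy as np

def kmeans(X, k, n_init=20, max_iter=100, seed=0):
    """Lloyd's k-means with several random starts; labels of the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_inertia, best_labels = np.inf, None
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist2.argmin(axis=1)
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = dist2.min(axis=1).sum()
        if inertia < best_inertia:
            best_inertia, best_labels = inertia, labels
    return best_labels

def silhouette(X, labels):
    """Mean silhouette width using Euclidean distances."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:                   # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Synthetic normalized trajectories: decline, V-shape, flat (assumed shapes).
rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 30)
shapes = [1.3 - 0.6 * t, 0.7 + 1.2 * np.abs(t - 0.5), np.ones_like(t)]
X = np.vstack([s + rng.normal(0.0, 0.02, size=(20, t.size)) for s in shapes])

scores = {k: silhouette(X, kmeans(X, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Repeating the same loop on resampled station subsets and recording how often each station keeps its cluster would give the stability measure mentioned above.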
Lines 277-296: The Upward cluster contains only three stations, all located near Niamey tributaries, and should be interpreted separately from the regional typology. A cluster of three nearby stations may reflect a local hydrological signal, shared data characteristics, or a specific set of catchment changes rather than a general flood-evolution type. Since the Sirba and nearby tributaries are central to the strongest increase reported, the manuscript should discuss local hydrological evidence more deeply, including rainfall-runoff dynamics and known drivers in this area. The following references are relevant for this specific discussion: 10.1088/2515-7620/adaac9 and 10.3390/hydrology11030034.
Lines 284-291: The spatial interpretation of clusters needs refinement. The manuscript states that the Upward and V-shape clusters are co-located in the northern part of the region, but also states that no spatial coherence is observed for the V-shape stations. These statements can both be true at different spatial scales, yet the text does not clearly distinguish broad zonal coherence from basin-scale coherence. The authors should clarify whether spatial structure is assessed visually or statistically. A formal test of spatial clustering, even a simple comparison against random station-label permutations, would support claims about regional organization.
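The permutation comparison mentioned above could be set up along the following lines. This sketch uses the mean distance between stations sharing a cluster label as the test statistic, Euclidean distance on longitude and latitude in degrees as a rough approximation, and synthetic station coordinates; the statistic, the coordinates, and the function names are assumptions rather than the authors' method.

```python
import numpy as np

def mean_within_cluster_distance(coords, labels):
    """Mean pairwise distance between distinct stations that share a label."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)             # exclude each station paired with itself
    return float(dist[same].mean())

def spatial_permutation_test(coords, labels, n_perm=999, seed=0):
    """One-sided p-value: are same-label stations closer than under random labels?"""
    rng = np.random.default_rng(seed)
    observed = mean_within_cluster_distance(coords, labels)
    permuted = np.array([mean_within_cluster_distance(coords, rng.permutation(labels))
                         for _ in range(n_perm)])
    p_value = (1 + np.sum(permuted <= observed)) / (1 + n_perm)
    return observed, float(p_value)

# Synthetic stations (lon, lat in degrees) with a purely zonal labelling (assumption).
rng = np.random.default_rng(1)
coords = np.column_stack([rng.uniform(-5.0, 5.0, 40), rng.uniform(5.0, 17.0, 40)])
zonal_labels = (coords[:, 1] > np.median(coords[:, 1])).astype(int)

obs, p_zonal = spatial_permutation_test(coords, zonal_labels)
print(f"observed mean within-cluster distance: {obs:.2f}, p = {p_zonal:.3f}")
```

Applying the same test within individual basins as well as over the whole domain would separate broad zonal coherence from basin-scale coherence.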
Lines 297-304: The analysis of 24 catchment features using Kolmogorov-Smirnov tests against a uniform distribution is interesting, but the statistical design is fragile. The manuscript conducts many tests across clusters and variables, with small sample sizes for some clusters and likely dependence among descriptors. Without correction for multiple testing or assessment of effect size, isolated significant features may be misleading. The authors should consider false-discovery control, permutation tests, or a multivariate approach that accounts for correlated features. The goal should be to distinguish weak descriptive signals from evidence of hydrological controls.
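For the false-discovery control suggested above, a minimal sketch of the Benjamini-Hochberg adjustment is given below. The p-values are made up for illustration, and the function name is hypothetical. The procedure assumes independent or positively dependent tests, which should be checked given the likely correlation among descriptors.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)            # p_(i) * n / i
    # Enforce monotonicity from the largest p-value downwards.
    adjusted_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

# Made-up p-values standing in for the feature-by-cluster tests.
raw = [0.003, 0.020, 0.041, 0.048, 0.300, 0.700]
adj = benjamini_hochberg(raw)
for r, a in zip(raw, adj):
    print(f"raw = {r:.3f}  adjusted = {a:.3f}")
```

Reporting adjusted p-values next to an effect size, such as the median percentile shift of each feature, would show which cluster-feature associations survive correction.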
Lines 297-304 and Appendix A: The feature set includes geomorphological, climatic, land-cover, and hydrological descriptors, but many hydrological indices are computed from the same discharge records used to detect trends. This creates a risk of circular interpretation: clusters derived from AMAX trends are then compared with flow signatures derived from the same underlying streamflow series. Such comparisons may still be useful descriptively, but they should not be interpreted as independent explanatory features. The manuscript should separate external descriptors, such as area, altitude, aridity, and land cover, from discharge-derived signatures, and discuss this dependence explicitly.
Lines 305-314: The descriptions of cluster characteristics are more definitive than the statistical evidence appears to allow. For example, statements that stationary behaviors correspond to the smallest and predominantly agricultural catchments, or that V-shape catchments have sustained flows and low variability, should be supported by effect sizes and uncertainty, not only by boxplots and significance stars. Because the authors also acknowledge that many features are weakly discriminative, the cluster-feature interpretation should be phrased more cautiously. A table summarizing the most discriminant features, their median percentile shifts, and adjusted p-values would improve transparency.
Lines 318-335: The statement that flood non-stationarity since 1950 can be considered a robust regional characteristic is stronger than the current evidence supports. The finding is clearly important for the selected 80 AMAX series and the chosen model-selection framework, but the manuscript has not yet shown how sensitive the 85% estimate is to data selection, missingness thresholds, LRT assumptions, or model family choices. A more defensible conclusion would be that non-stationary models are frequently selected among the retained stations under the proposed modelling framework. If the authors wish to retain the stronger wording, they should support it with sensitivity experiments and uncertainty estimates.
Lines 351-356: The comparison between observed trends and future flood projections is relevant, but the manuscript should avoid implying a direct inconsistency unless metrics, spatial domains, baseline periods, and modelling assumptions are comparable. The observed analysis is based on AMAX-derived return levels from selected gauges, while the cited projections likely depend on climate and hydrological modelling chains. Differences may therefore arise from scale, forcing, model structure, or internal variability. The discussion would be more convincing if it explicitly framed this comparison as a qualitative contrast and specified what would be required for a formal evaluation.
Lines 363-380: The discussion correctly acknowledges that the feature analysis does not reveal clear attribution pathways. However, the subsequent discussion of Hortonian versus Hewlettian processes, persistent changes, and irreversible hydrological behavior remains largely conceptual. The presented results suggest spatial associations that may be consistent with different runoff-generation processes, but they do not demonstrate process shifts. The authors should make this distinction explicit. If mechanistic interpretation is retained, it should be supported by time-varying rainfall intensity, soil-surface condition, land-cover, runoff coefficient, or baseflow analyses.
Lines 397-409: The comparison between BIC and AIC is useful, but it appears only in the discussion and not in the methods or results. Since model-choice sensitivity directly affects the station-level trends and the subsequent clusters, these results should be reported more systematically. A station-level table or supplementary map showing where AIC and BIC disagree would help readers identify whether disagreements are spatially structured or concentrated in key basins. The manuscript should also clarify whether cluster membership changes when AIC-selected models are used.
Lines 401-409: The manuscript recognizes that precipitation and land-use covariates could improve the models, but this limitation has broader implications for the study’s conclusions. In this regard, the following study could be mentioned: 10.1016/j.scitotenv.2020.143792.
Lines 410-415: The clustering limitations are discussed, but key supporting evidence is said to be “not shown.” Since the number of clusters and their interpretability are central to the manuscript, diagnostics for K = 4, 5, 6, 7, and 8 should be included in supplementary material. This should include centroids, cluster sizes, silhouette scores, and perhaps a measure of membership stability. Without these diagnostics, readers cannot assess whether the six-cluster typology is a stable structure in the data or a useful but somewhat subjective simplification.
Lines 418-445: The conclusion should be more closely aligned with the demonstrated evidence. Phrases such as “robust regional characteristic,” “clear north-south gradient,” and “challenge the previously assumed homogeneous signal” are plausible, but they should be tempered by the limitations of station selection, missing data, model-selection uncertainty, and the absence of formal attribution. The conclusion would be stronger if it distinguished three levels of inference: what is directly shown by the retained AMAX series, what is inferred from the clustering, and what remains hypothetical regarding mechanisms and future changes. This would preserve the significance of the study while avoiding overinterpretation.