the Creative Commons Attribution 4.0 License.
Multi-Machine Learning Ensemble Regionalization of Hydrological Parameters for Enhances Flood Prediction in Ungauged Mountainous Catchments
Abstract. Machine learning-based parameter regionalization is an important method for flood prediction in ungauged mountainous catchments. However, single machine learning parameter regionalization often exhibits limitations in prediction accuracy and robustness. Therefore, this study proposes a multi-machine learning ensemble regionalization method that integrates Gradient Boosting Machine (GBM), K-Nearest Neighbors (KNN), and Extremely Randomized Trees (ERT) methods (GBM-KNN-ERT) to regionalize the sensitive parameters of the Topography-Based Subsurface Storm Flow (Top-SSF) model. Validated across 80 mountainous catchments in southwestern China, the GBM-KNN-ERT method demonstrates superior performance with 90 % of ungauged catchments achieving the Nash-Sutcliffe Efficiency (NSE) above 0.9, representing a 67.44 % improvement over single machine learning parameter regionalization. Notably, the GBM-KNN-ERT method shows improved robustness to climate change and changes in the number of donor catchments compared to other regionalization methods. An optimal balance between accuracy and computational efficiency was achieved using 20–40 high quality donor catchments (NSE greater than 0.85). This study provides systematic evidence that multi-machine learning ensemble can effectively address regionalization challenges in ungauged mountainous regions, offering a reliable tool for water resource management and flood disaster mitigation.
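For reference, the Nash-Sutcliffe Efficiency cited in the abstract is conventionally defined as one minus the ratio of the squared simulation error to the variance of the observations about their mean, so it cannot exceed 1. The sketch below is a minimal illustration of that standard definition, not the authors' implementation; the function name `nse` and the discharge values are hypothetical.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared error between observed and simulated discharge
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Spread of the observations about their own mean
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - sse / spread


# Hypothetical discharge values, for illustration only
obs = [1.0, 2.0, 3.0, 4.0]
perfect = nse(obs, obs)            # simulation identical to observations
mean_only = nse(obs, [2.5] * 4)    # simulation equal to the observed mean
reversed_fit = nse(obs, [4.0, 3.0, 2.0, 1.0])  # worse than the mean
```

A perfect simulation gives the upper bound of 1, a simulation no better than the observed mean gives 0, and poorer simulations give negative values.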
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-1795', Saeed Golian, 16 Jun 2025
Comments on manuscript entitled ‘Multi-Machine Learning Ensemble Regionalization of Hydrological Parameters for Enhances Flood Prediction in Ungauged Mountainous Catchments’ by Li et al.
The manuscript develops a multi-machine learning ensemble method for regionalizing the parameters of a hydrologic model (Top-SSF) over 80 catchments in southwestern China. The authors show that the multi-machine learning method improves performance over single methods. While the manuscript is well structured and the results are clearly presented, some points need to be addressed before publication. Please find the comments as follows:
Line 107: What is the range of catchment areas?
Legend of Figure 1: please use the term ‘Hydrometry station’
Line 122: Hourly flow data
Line 150: TOPMODEL not TOPMODE
Section 3.1: More details should be provided. For example: What kind of hydrologic model is Top-SSF? Continuous or event-based? Lumped or (semi)distributed? And how is it going to be applied in this research? To simulate flood events, or a whole time series (continuous modelling)? What are the inputs to the model, e.g. precipitation and temperature data?
Result section, Lines 362-365: Why is the performance of the different machine learning methods for parameter regionalization compared against the Top-SSF model and not against the observed flood events?
Figures 11a and d: How can the NSE be greater than 1?
Section 5.4: It is not clear how the calculations were carried out to simulate peak discharges. Which future events were selected for this analysis? Was the whole time series of projected precipitation in the baseline and future periods fed to the hydrologic model, or were just a few storms selected?
Citation: https://doi.org/10.5194/egusphere-2025-1795-RC1
AC1: 'Reply on RC1', Kai Li, 12 Aug 2025
RC2: 'Comment on egusphere-2025-1795', Paul Muñoz, 18 Aug 2025
Overall evaluation
The manuscript addresses an important problem: improving flood prediction in ungauged mountainous catchments through machine learning (ML) regionalization of hydrological parameters. The work has merit, particularly in its effort to combine multiple ML models into an ensemble and test the approach under a climate change scenario. That said, several aspects require clarification and expansion before the manuscript can be recommended for publication. In particular, the rationale for model choice, the justification for ensemble performance, interpretability of results, and methodological transparency need to be strengthened. The manuscript would benefit from stronger connections between the ML methodology and hydrological processes.
Specific comments
- The selected ML methods represent different learning paradigms (tree-based, instance-based, etc.). However, more complex techniques such as multilayer perceptrons or deep learning networks were not included. The authors should justify why these were excluded and explain how they ensure fair complementarity across models that learn from very different principles.
- The highlight statement that “The GBM-KNN-ERT method demonstrates superior performance compared to other methods” is vague. Please clarify which performance metrics are referred to, and quantify the magnitude of improvement.
- The manuscript does not clearly explain how the trade-off between predictive accuracy and computational efficiency was considered. Given that ensemble methods can be computationally demanding, the authors should discuss the optimal balance and whether the proposed approach is practical for operational use.
- The manuscript claims that the GBM-KNN-ERT method exhibits stability under climate change. However, the explanation of how climate change is incorporated is insufficient. While SSP585 projections (2022–2100) are mentioned, the methods used to integrate these into the ML framework and evaluate stability should be described in greater detail.
- In the introduction, floods in mountainous catchments are mentioned, but it is unclear whether the focus is on general floods or flash floods (typically defined as occurring within 6 hours of rainfall). Given the rapid response of mountainous catchments, the authors should explicitly state which type of events are considered.
- The terms purple soil, yellow soil, and red soil are used without explanation. Please clarify whether this classification is standard in Chinese soil taxonomy, and provide references or definitions.
- In Fig. 1, the catchments and provincial borders are both shown in grey, making them difficult to distinguish. Please revise the figure with clearer colour contrasts.
- In Fig. 2, placeholders such as MLx, and P are not clearly defined. These should be explicitly labelled with their meanings (e.g., precipitation, slope, land cover index) rather than generic placeholders.
- Qp and Tp, introduced around line 271, should be defined at first mention for clarity.
- The manuscript reports Tp values of 2–4 hours during calibration/validation for the benchmarking model, but does not discuss whether these response times are realistic for flash flood conditions in mountainous catchments. Please provide context on catchment response times and evaluate whether a Tp of 4 hours is sufficient.
- Terminology: “calibration/validation” terminology is more common in physically based models, while ML studies usually refer to “training/testing.” This should be acknowledged for clarity.
- The manuscript states that donor catchments were selected either by mode 1 or mode 2. However, it would be more scientifically justifiable to select donor catchments based on similarity in physical and climatic characteristics (e.g., area, slope, precipitation regime, land cover).
- The manuscript notes that multi-model ensembles improve performance, but does not explain why. Please discuss what learning principles of the individual ML models (e.g., robustness of tree-based splits, flexibility of KNN, etc.) contribute to improvements in parameter estimation, and why the ensemble captures strengths across models.
- The sentence “75 of the catchments had NSE > 0,” might be incomplete. Please revise to show the correct threshold (e.g., NSE > 0.0).
- The manuscript should provide optimal parameters used in each ML model (e.g., number of trees, learning rates, neighbours in KNN) either in the main text or as supplementary material. This is necessary for reproducibility.
- While the manuscript presents aggregated performance metrics (NSE, Qp, Tp), it would be very valuable to also show hydrograph examples comparing observed vs. simulated discharge for both a high-performing and a low-performing catchment. Such visualizations would illustrate how the multi-model ensemble improves (or fails to improve) peak flow timing and magnitude compared to single ML models.
- While the ensemble approach clearly improves technical performance, the paper should strengthen its scientific justification by explaining whether the gains are due to model complementarity, data-dependence, or calibration bias. Without this, it remains unclear whether the ensemble would generalize to other regions or datasets.
Citation: https://doi.org/10.5194/egusphere-2025-1795-RC2
Viewed
- HTML: 360
- PDF: 46
- XML: 15
- Total: 421
- Supplement: 24
- BibTeX: 8
- EndNote: 22