Multi-Machine Learning Ensemble Regionalization of Hydrological Parameters for Enhances Flood Prediction in Ungauged Mountainous Catchments

Li, Kai; Guo, Linmao; Wang, Genxu; Gao, Jihui; Sun, Xiangyang; Huang, Peng; Li, Jinlong; Ma, Jiapei; Zhang, Xinyu

doi:10.5194/egusphere-2025-1795

Preprints

10 Jun 2025

Abstract. Machine learning-based parameter regionalization is an important method for flood prediction in ungauged mountainous catchments. However, regionalization based on a single machine learning method often shows limited prediction accuracy and robustness. This study therefore proposes a multi-machine learning ensemble regionalization method that integrates Gradient Boosting Machine (GBM), K-Nearest Neighbors (KNN), and Extremely Randomized Trees (ERT) (GBM-KNN-ERT) to regionalize the sensitive parameters of the Topography-Based Subsurface Storm Flow (Top-SSF) model. Validated across 80 mountainous catchments in southwestern China, the GBM-KNN-ERT method demonstrates superior performance, with 90 % of ungauged catchments achieving a Nash-Sutcliffe Efficiency (NSE) above 0.9, a 67.44 % improvement over single machine learning parameter regionalization. Notably, the GBM-KNN-ERT method is more robust to climate change and to changes in the number of donor catchments than the other regionalization methods. An optimal balance between accuracy and computational efficiency was achieved using 20–40 high-quality donor catchments (NSE greater than 0.85). This study provides systematic evidence that a multi-machine learning ensemble can effectively address regionalization challenges in ungauged mountainous regions, offering a reliable tool for water resource management and flood disaster mitigation.
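
To illustrate the ensemble idea in the abstract, the sketch below combines one parameter estimate from each of the three methods by a weighted average. The abstract does not state how the GBM, KNN, and ERT outputs are actually combined, so the averaging rule, the function name `combine_predictions`, the equal default weights, and the numeric values are all assumptions made for illustration only.

```python
def combine_predictions(per_model, weights=None):
    """Weighted average of one parameter's predictions across models.

    per_model maps a model name to its predicted parameter value for a
    single ungauged catchment. Equal weights are used when none are given
    (an assumption, not the authors' documented scheme).
    """
    names = sorted(per_model)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * per_model[name] for name in names) / total


# Hypothetical predictions of one Top-SSF parameter for one catchment.
predictions = {"GBM": 0.42, "KNN": 0.38, "ERT": 0.46}
ensemble_value = combine_predictions(predictions)
```

Averaging is only one possible design; a stacked model or performance-based weights would fit the same interface by changing how `weights` is chosen.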

Received: 16 Apr 2025 – Discussion started: 10 Jun 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 2091 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Supplement (452 KB)

Journal article(s) based on this preprint

14 Jan 2026

Multi-Machine Learning Ensemble Regionalization of Hydrological Parameters for Enhancing Flood Prediction in Ungauged Mountainous Catchments

Kai Li, Linmao Guo, Genxu Wang, Jihui Gao, Xiangyang Sun, Peng Huang, Jinlong Li, Jiapei Ma, and Xinyu Zhang

Hydrol. Earth Syst. Sci., 30, 205–225, https://doi.org/10.5194/hess-30-205-2026, 2026

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1795', Saeed Golian, 16 Jun 2025

Comments on the manuscript entitled ‘Multi-Machine Learning Ensemble Regionalization of Hydrological Parameters for Enhances Flood Prediction in Ungauged Mountainous Catchments’ by Li et al.
The manuscript develops a multi-machine learning ensemble method for regionalizing the parameters of a hydrologic model (Top-SSF) over 80 catchments in southwestern China. The authors show that the multi-machine learning method improves performance over single methods. While the manuscript is well structured and the results are clearly presented, some points need to be addressed before publication. Please find the comments as follows:
Line 107: What is the range of catchment areas?
Legend of Figure 1: Please use the term ‘Hydrometry station’.
Line 122: Hourly flow data.
Line 150: TOPMODEL, not TOPMODE.
Section 3.1: More details should be provided. For example: What kind of hydrologic model is Top-SSF? Continuous or event-based? Lumped or (semi)distributed? How is it applied in this research: to simulate flood events, or a whole time series (continuous modelling)? What are the inputs to the model, e.g. precipitation and temperature data?
Results section, Lines 362-365: Why is the performance of the different machine learning methods for parameter regionalization compared against the Top-SSF model and not against the observed flood events?
Figures 11a and d: How can the NSE be greater than 1?
Section 5.4: It is not clear how the calculations were carried out to simulate peak discharges. Which future events were selected for this analysis? Was the whole time series of projected precipitation in the baseline and future periods fed to the hydrologic model, or were just a few storms selected?
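
For context on the question about Figures 11a and d: under the standard definition, NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2), so the value cannot exceed 1 because the subtracted ratio is never negative. The following is a minimal sketch of that standard formula; the function name `nse` and the sample discharge values are hypothetical, and nothing here reproduces the manuscript's own calculation or explains what the figures actually plot.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency of a simulated series against observations."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared model errors.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined for constant observations")
    return 1.0 - residual / variance


# Hypothetical hourly discharges (m^3/s) for a short flood event.
obs = [1.0, 3.0, 5.0, 3.0]
perfect = nse(obs, obs)                # simulation identical to observations
mean_only = nse(obs, [3.0] * len(obs))  # simulation equal to the observed mean
```

A perfect simulation gives the upper bound of 1, and a simulation no better than the observed mean gives 0; values above 1 would indicate a plotting or computation issue rather than better skill.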

Citation: https://doi.org/10.5194/egusphere-2025-1795-RC1
- AC1: 'Reply on RC1', Kai Li, 12 Aug 2025
  
  Many thanks for your comments. Please find our detailed answers in the attached document.
  Kind regards,
  The Authors
  
  Citation: https://doi.org/10.5194/egusphere-2025-1795-AC1
RC2:
'Comment on egusphere-2025-1795', Paul Muñoz, 18 Aug 2025
Overall evaluation
The manuscript addresses an important problem: improving flood prediction in ungauged mountainous catchments through machine learning (ML) regionalization of hydrological parameters. The work has merit, particularly in its effort to combine multiple ML models into an ensemble and test the approach under a climate change scenario. That said, several aspects require clarification and expansion before the manuscript can be recommended for publication. In particular, the rationale for model choice, the justification for ensemble performance, interpretability of results, and methodological transparency need to be strengthened. The manuscript would benefit from stronger connections between the ML methodology and hydrological processes.
Specific comments
The selected ML methods represent different learning paradigms (tree-based, instance-based, etc.). However, more complex techniques such as multilayer perceptrons or deep learning networks were not included. The authors should justify why these were excluded and explain how they ensure fair complementarity across models that learn from very different principles.

The highlight statement that “The GBM-KNN-ERT method demonstrates superior performance compared to other methods” is vague. Please clarify which performance metrics are referred to, and quantify the magnitude of improvement.

The manuscript does not clearly explain how the trade-off between predictive accuracy and computational efficiency was considered. Given that ensemble methods can be computationally demanding, the authors should discuss the optimal balance and whether the proposed approach is practical for operational use.

The manuscript claims that the GBM-KNN-ERT method exhibits stability under climate change. However, the explanation of how climate change is incorporated is insufficient. While SSP585 projections (2022–2100) are mentioned, the methods used to integrate these into the ML framework and evaluate stability should be described in greater detail.

In the introduction, floods in mountainous catchments are mentioned, but it is unclear whether the focus is on general floods or flash floods (typically defined as occurring within 6 hours of rainfall). Given the rapid response of mountainous catchments, the authors should explicitly state which type of events are considered.

The terms purple soil, yellow soil, and red soil are used without explanation. Please clarify whether this classification is standard in Chinese soil taxonomy, and provide references or definitions.

In Fig. 1, the catchments and provincial borders are both shown in grey, making them difficult to distinguish. Please revise the figure with clearer colour contrasts.

In Fig. 2, placeholders such as MLx and P are not clearly defined. These should be explicitly labelled with their meanings (e.g., precipitation, slope, land cover index) rather than left as generic placeholders.

Qp and Tp, introduced around line 271, should be defined at first mention for clarity.

The manuscript reports Tp values of 2–4 hours during calibration/validation for the benchmarking model, but does not discuss whether these response times are realistic for flash flood conditions in mountainous catchments. Please provide context on catchment response times and evaluate whether a Tp of 4 hours is sufficient.

Terminology: “calibration/validation” terminology is more common in physically based models, while ML studies usually refer to “training/testing.” This should be acknowledged for clarity.

The manuscript states that donor catchments were selected either by mode 1 or mode 2. However, it would be more scientifically justifiable to select donor catchments based on similarity in physical and climatic characteristics (e.g., area, slope, precipitation regime, land cover).

The manuscript notes that multi-model ensembles improve performance, but does not explain why. Please discuss what learning principles of the individual ML models (e.g., robustness of tree-based splits, flexibility of KNN, etc.) contribute to improvements in parameter estimation, and why the ensemble captures strengths across models.

The sentence “75 of the catchments had NSE > 0,” might be incomplete. Please revise to show the correct threshold (e.g., NSE > 0.0).

The manuscript should provide optimal parameters used in each ML model (e.g., number of trees, learning rates, neighbours in KNN) either in the main text or as supplementary material. This is necessary for reproducibility.

While the manuscript presents aggregated performance metrics (NSE, Qp, Tp), it would be very valuable to also show hydrograph examples comparing observed vs. simulated discharge for both a high-performing and a low-performing catchment. Such visualizations would illustrate how the multi-model ensemble improves (or fails to improve) peak flow timing and magnitude compared to single ML models.

While the ensemble approach clearly improves technical performance, the paper should strengthen its scientific justification by explaining whether the gains are due to model complementarity, data-dependence, or calibration bias. Without this, it remains unclear whether the ensemble would generalize to other regions or datasets.
Citation: https://doi.org/10.5194/egusphere-2025-1795-RC2
- AC2: 'Reply on RC2', Kai Li, 19 Sep 2025
  
  Many thanks for your comments. Please find our detailed answers in the attached document.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1795-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (08 Oct 2025) by Elena Toth

AR by Kai Li on behalf of the Authors (12 Oct 2025) Author's tracked changes Manuscript

EF by Mario Ebel (13 Oct 2025) Supplement

EF by Vitaly Muravyev (15 Oct 2025) Author's response

ED: Referee Nomination & Report Request started (15 Oct 2025) by Elena Toth

RR by Saeed Golian (27 Oct 2025)

RR by Paul Muñoz (01 Dec 2025)

Suggestions for revision or reasons for rejection

Specific comment 1 in my first review
Indeed, LSTM-type architectures are not suitable for static regionalization, because they are designed for sequential data. The concern about dataset size (80 catchments) is also legitimate, since many deep networks require larger datasets to generalize well.
However, I have to disagree with the statement that “DL models often function as black boxes, contrary to our goal of developing an interpretable tool”. The field has advanced considerably, and there exists substantial literature on interpretable and physically informed neural networks (e.g., PINNs, hybrid models, physics-constrained loss functions). These approaches explicitly address the interpretability issue and are widely used in water sciences. To avoid unintentionally misrepresenting the current state of the field, I suggest removing or revising the claim that deep learning is unsuitable due to being a “black box.” Instead, the authors could emphasize more defensible points such as the limited dataset size, which makes simpler ML models more robust and less prone to overfitting, or the desire to maintain transparency through model structure or feature-based interpretability. Something else that could be included is that transparency does not only depend on the choice of algorithm. It can also be improved through feature engineering (e.g., transformations, dimensionality reduction, constraint-based preprocessing) or by modifying the architecture of the ML model itself to encode physical knowledge or monotonic relationships.
Specific comment 3
I recommend adding one more clarification to ensure transparency and reproducibility. Training time alone can be subjective because it depends on the computational resources used (e.g., CPU model, number of cores, RAM, and software environment). Without this information, it is difficult for readers to interpret or compare the absolute running times shown in Table 4. I therefore suggest that the authors briefly specify the hardware that was used, to provide a clear reference point and avoid potential misunderstandings about computational cost.
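
One way to follow this suggestion is to record the execution environment alongside each measured training time. The sketch below uses only Python standard-library calls; the helper names `describe_environment` and `timed` are hypothetical, and the manuscript's actual hardware and timing procedure are not known from this page.

```python
import os
import platform
import time


def describe_environment():
    """Collect basic hardware and software details to report with run times."""
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "python": platform.python_version(),
        "logical_cpus": os.cpu_count(),
    }


def timed(fn, *args, **kwargs):
    """Run fn and return its result together with elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; a real use would wrap the model training call instead.
environment = describe_environment()
result, elapsed_seconds = timed(sum, [1, 2, 3])
```

Reporting `environment` next to each run time gives readers a reference point for comparing absolute values; RAM size is not available from the standard library and would need to be noted separately.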

ED: Publish subject to minor revisions (review by editor) (13 Dec 2025) by Elena Toth

AR by Kai Li on behalf of the Authors (18 Dec 2025) Author's response Author's tracked changes Manuscript

ED: Publish as is (30 Dec 2025) by Elena Toth

AR by Kai Li on behalf of the Authors (05 Jan 2026) Manuscript

Supplement

https://doi.org/10.5194/egusphere-2025-1795-supplement

Viewed

Total article views: 971 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
808	133	30	971	70	21	34

Views and downloads (calculated since 10 Jun 2025)

Month	HTML	PDF	XML	Total
Jun 2025	153	21	9	183
Jul 2025	46	8	1	55
Aug 2025	121	16	5	142
Sep 2025	396	13	4	413
Oct 2025	38	11	3	52
Nov 2025	33	25	4	62
Dec 2025	18	32	3	53
Jan 2026	3	6	1	10
Feb 2026	0	0	0	0
Mar 2026	0	1	0	1
Apr 2026	0	0	0	0

Viewed (geographical distribution)

Total article views: 956 (including HTML, PDF, and XML) Thereof 956 with geography defined and 0 with unknown origin.

Latest update: 11 Apr 2026

Short summary

We propose a multi-machine learning ensemble (GBM-KNN-ERT) to improve Top-SSF parameter regionalization for flood prediction in ungauged mountainous catchments, overcoming the limits of single machine learning methods. Validated in 80 mountainous catchments in southwestern China, the ensemble achieved NSE greater than 0.9 for 90 % of catchments and was more accurate and more robust to climate change and donor catchment variability than single-method regionalization.

