This work is distributed under the Creative Commons Attribution 4.0 License.
Technical note: Does Multiple Basin Training Strategy Guarantee Superior Machine Learning Performance for Streamflow Predictions in Gaged Basins?
Abstract. In recent years, machine learning (ML) has gained prominence in hydrological science, offering ease of use without the extensive hydrological expertise or complexity associated with process-based models. Debate persists over optimal training approaches, with some researchers advocating multi-basin training while questioning the validity of single-basin approaches. This study examines the relationship between training dataset size (number of basins) and model performance. Through comparative analysis, we found that increasing the number of basins used for ML training does not necessarily improve the performance of the trained model. Specifically, the state-of-the-art global ML model (G model), trained by Google on nearly 6,000 basins worldwide, underperforms regional ML models trained on hundreds of basins in the contiguous US and Great Britain when predicting streamflow in both gauged and ungauged basins. Furthermore, we compared the G model with our single-basin (S) ML models, trained individually for 609 locations worldwide, and found that the G model does not consistently outperform the S models: the S models outperform the G model in 46 % of case studies. Therefore, the training approach should not be a criterion for judging model validity; instead, the focus should be on the trained model's performance.
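To illustrate the kind of per-basin tally behind the 46 % figure, the sketch below computes the Nash–Sutcliffe efficiency (NSE) for a global and a single-basin model in each basin and counts where the single-basin model wins. All names and data are hypothetical placeholders; the study's actual inputs, models, and evaluation metric are not reproduced here.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect; 0 means no better than the mean of obs."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Hypothetical placeholders for observed flow and the two models' simulations
# in each basin; in the study these would be the G (global) and S (single-basin)
# model outputs, which are not reproduced here.
rng = np.random.default_rng(0)
obs = {f"basin_{i:03d}": rng.gamma(2.0, 5.0, size=365) for i in range(10)}
sim_g = {k: v + rng.normal(0.0, 2.0, v.size) for k, v in obs.items()}
sim_s = {k: v + rng.normal(0.0, 2.0, v.size) for k, v in obs.items()}

# Fraction of basins in which the single-basin model scores higher than the global one.
s_wins = sum(nse(obs[k], sim_s[k]) > nse(obs[k], sim_g[k]) for k in obs)
print(f"S beats G in {100 * s_wins / len(obs):.0f}% of basins")
```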
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-769', Frederik Kratzert, 10 Jul 2025
  - AC1: 'Reply on RC1', Vinh Ngoc Tran, 17 Aug 2025
- RC2: 'Comment on egusphere-2025-769', Anonymous Referee #2, 08 Aug 2025
While the authors raise several important and timely questions regarding training strategies in hydrological machine learning, I believe the current version of the manuscript is affected by some methodological issues that limit the strength of its conclusions. After reviewing both the manuscript and Reviewer 1’s detailed comments, I respectfully offer the following major and minor suggestions for improvement.
Major Comments
- Model Comparison Framework and Associated Limitations
I share Reviewer 1’s concern regarding the comparability of the models evaluated in this study. The current experimental design involves comparisons among models trained on datasets with different resolutions, quality levels, and real-time availability. In particular, the contrast between:
- Model G (a global operational model using coarse, real-time data),
- Regional models (using high-quality reanalysis data), and
- Single-basin models with lagged streamflow inputs (S-6),
raises concerns about fairness and interpretability. Because these models operate under different data assumptions, it becomes challenging to isolate the effect of training strategies alone. As such, the conclusions drawn about the relative performance of global, regional, and single-basin approaches may be difficult to support in their current form.
- Alignment Between Research Question and Study Design
The manuscript appears to address two related but distinct questions:
- Whether models trained on multi-basin data outperform those trained on single-basin data (as stated in the title), and
- Whether locally optimized models using basin-specific data can outperform globally trained models using only limited local information (the implicit research question).
These are both important questions, but they require different analytical frameworks and modeling assumptions. To strengthen the manuscript, it may be helpful for the authors to clarify the primary research question and ensure that the experimental design is tailored to directly address it. Additionally, the first question has already been explored in earlier studies.
- Constructive Outlook on Global Model Development
Despite the challenges highlighted in this study, I remain optimistic about the ongoing development of global hydrological models. As data availability and computational methods continue to improve, I believe there is great potential for globally trained models to better incorporate local information.
Rather than viewing the current findings as a critique of global approaches, I suggest framing them as an opportunity to guide future research toward:
- Better integration of local knowledge within global models,
- Scalable methods for collecting and assimilating basin-specific data, and
- Hybrid modeling strategies that combine global generality with local specificity.
This perspective may help position the work within a more forward-looking and solution-oriented context.
Minor Comments
- Overlap Criterion for Gauges
The criterion used to identify overlapping gauges (“with distances not exceeding 1 km”) would benefit from further justification. Given potential uncertainties in station locations and historical relocations, a brief explanation or reference supporting this threshold would help strengthen the methodological transparency.
- Peak Flow Threshold Consistency
The thresholds for identifying peak flow events differ between figures (e.g., 95th percentile in Figures 1 and 2, versus 99th percentile in Figure 3). Clarifying the rationale behind these choices would improve the consistency and comparability of the analyses.
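To make the overlap criterion above concrete, here is a minimal sketch of a great-circle (haversine) distance check against the quoted 1 km threshold. The coordinates and function name are hypothetical illustrations, not the authors' implementation.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical gauge coordinates; two gauges would be treated as "overlapping"
# when their separation does not exceed the 1 km threshold quoted in the manuscript.
gauge_a = (42.280, -83.740)
gauge_b = (42.285, -83.745)
print(haversine_km(*gauge_a, *gauge_b) <= 1.0)
```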
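Similarly, a minimal sketch of percentile-based peak-flow selection shows how moving the threshold from the 95th to the 99th percentile shrinks the event set entering the evaluation; the flow series here is a random placeholder.

```python
import numpy as np

def peak_flow_events(flow, percentile=95):
    """Return the indices of time steps whose flow exceeds the given percentile."""
    flow = np.asarray(flow, float)
    threshold = np.percentile(flow, percentile)
    return np.flatnonzero(flow > threshold)

# Hypothetical daily streamflow record; a stricter percentile keeps fewer,
# larger peaks, which changes which events are compared across models.
rng = np.random.default_rng(1)
flow = rng.gamma(2.0, 5.0, size=3 * 365)
print(len(peak_flow_events(flow, 95)), len(peak_flow_events(flow, 99)))
```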
Citation: https://doi.org/10.5194/egusphere-2025-769-RC2
- AC2: 'Reply on RC2', Vinh Ngoc Tran, 17 Aug 2025
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
1,385 | 57 | 14 | 1,456 | 27 | 40