This work is distributed under the Creative Commons Attribution 4.0 License.
Technical note: Does Multiple Basin Training Strategy Guarantee Superior Machine Learning Performance for Streamflow Predictions in Gaged Basins?
Abstract. In recent years, machine learning (ML) has gained prominence in hydrological science, offering ease of use without the extensive hydrological expertise or complexity associated with process-based models. Debate persists over optimal training approaches, with some researchers advocating multi-basin training while questioning the validity of single-basin approaches. This study examines the relationship between training dataset size (number of basins) and model performance. Through comparative analysis, we found that increasing the number of basins used for ML training does not necessarily improve the performance of the trained model. Specifically, the state-of-the-art global ML model (G model), trained by Google on nearly 6,000 basins worldwide, underperforms regional ML models trained on hundreds of basins in the contiguous US and Great Britain when predicting streamflow in both gauged and ungauged basins. Furthermore, we compared the G model with our single-basin (S) ML models, trained individually for 609 locations worldwide, and found that the G model does not consistently outperform the S models: the S models outperform the G model in 46 % of case studies. Therefore, the training approach should not be a criterion for judging model validity; instead, the focus should be on the trained model's performance.
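To illustrate the kind of per-basin tally behind the 46 % figure, the sketch below computes the Nash–Sutcliffe efficiency (NSE) for a global and a single-basin model in each basin and counts where the single-basin model wins. All names and data are hypothetical placeholders; the study's actual inputs, models, and evaluation metric are not reproduced here.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect; 0 means no better than the mean of obs."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Hypothetical placeholders for observed flow and the two models' simulations
# in each basin; in the study these would be the G (global) and S (single-basin)
# model outputs, which are not reproduced here.
rng = np.random.default_rng(0)
obs = {f"basin_{i:03d}": rng.gamma(2.0, 5.0, size=365) for i in range(10)}
sim_g = {k: v + rng.normal(0.0, 2.0, v.size) for k, v in obs.items()}
sim_s = {k: v + rng.normal(0.0, 2.0, v.size) for k, v in obs.items()}

# Fraction of basins in which the single-basin model scores higher than the global one.
s_wins = sum(nse(obs[k], sim_s[k]) > nse(obs[k], sim_g[k]) for k in obs)
print(f"S beats G in {100 * s_wins / len(obs):.0f}% of basins")
```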
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-769', Frederik Kratzert, 10 Jul 2025
  - AC1: 'Reply on RC1', Vinh Ngoc Tran, 17 Aug 2025
- RC2: 'Comment on egusphere-2025-769', Anonymous Referee #2, 08 Aug 2025
While the authors raise several important and timely questions regarding training strategies in hydrological machine learning, I believe the current version of the manuscript is affected by some methodological issues that limit the strength of its conclusions. After reviewing both the manuscript and Reviewer 1’s detailed comments, I respectfully offer the following major and minor suggestions for improvement.
Major Comments
- Model Comparison Framework and Associated Limitations
I share Reviewer 1’s concern regarding the comparability of the models evaluated in this study. The current experimental design involves comparisons among models trained on datasets with different resolutions, quality levels, and real-time availability. In particular, the contrast between:
- Model G (a global operational model using coarse, real-time data),
- Regional models (using high-quality reanalysis data), and
- Single-basin models with lagged streamflow inputs (S-6),
raises concerns about fairness and interpretability. Because these models operate under different data assumptions, it becomes challenging to isolate the effect of training strategies alone. As such, the conclusions drawn about the relative performance of global, regional, and single-basin approaches may be difficult to support in their current form.
- Alignment Between Research Question and Study Design
The manuscript appears to address two related but distinct questions:
- Whether models trained on multi-basin data outperform those trained on single-basin data (as stated in the title), and
- Whether locally optimized models using basin-specific data can outperform globally trained models using only limited local information (the implicit research question).
These are both important questions, but they require different analytical frameworks and modeling assumptions. To strengthen the manuscript, it may be helpful for the authors to clarify the primary research question and ensure that the experimental design is tailored to directly address it. Additionally, the first question has already been explored in earlier studies.
- Constructive Outlook on Global Model Development
Despite the challenges highlighted in this study, I remain optimistic about the ongoing development of global hydrological models. As data availability and computational methods continue to improve, I believe there is great potential for globally trained models to better incorporate local information.
Rather than viewing the current findings as a critique of global approaches, I suggest framing them as an opportunity to guide future research toward:
- Better integration of local knowledge within global models,
- Scalable methods for collecting and assimilating basin-specific data, and
- Hybrid modeling strategies that combine global generality with local specificity.
This perspective may help position the work within a more forward-looking and solution-oriented context.
Minor Comments
- Overlap Criterion for Gauges
The criterion used to identify overlapping gauges (“with distances not exceeding 1 km”) would benefit from further justification. Given potential uncertainties in station locations and historical relocations, a brief explanation or reference supporting this threshold would help strengthen the methodological transparency.
- Peak Flow Threshold Consistency
The thresholds for identifying peak flow events differ between figures (e.g., 95th percentile in Figures 1 and 2, versus 99th percentile in Figure 3). Clarifying the rationale behind these choices would improve the consistency and comparability of the analyses.
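To make the overlap criterion above concrete, here is a minimal sketch of a great-circle (haversine) distance check against the quoted 1 km threshold. The coordinates and function name are hypothetical illustrations, not the authors' implementation.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical gauge coordinates; two gauges would be treated as "overlapping"
# when their separation does not exceed the 1 km threshold quoted in the manuscript.
gauge_a = (42.280, -83.740)
gauge_b = (42.285, -83.745)
print(haversine_km(*gauge_a, *gauge_b) <= 1.0)
```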
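Similarly, a minimal sketch of percentile-based peak-flow selection shows how moving the threshold from the 95th to the 99th percentile shrinks the event set entering the evaluation; the flow series here is a random placeholder.

```python
import numpy as np

def peak_flow_events(flow, percentile=95):
    """Return the indices of time steps whose flow exceeds the given percentile."""
    flow = np.asarray(flow, float)
    threshold = np.percentile(flow, percentile)
    return np.flatnonzero(flow > threshold)

# Hypothetical daily streamflow record; a stricter percentile keeps fewer,
# larger peaks, which changes which events are compared across models.
rng = np.random.default_rng(1)
flow = rng.gamma(2.0, 5.0, size=3 * 365)
print(len(peak_flow_events(flow, 95)), len(peak_flow_events(flow, 99)))
```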
Citation: https://doi.org/10.5194/egusphere-2025-769-RC2
- AC2: 'Reply on RC2', Vinh Ngoc Tran, 17 Aug 2025
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
1,385 | 57 | 14 | 1,456 | 27 | 40