the Creative Commons Attribution 4.0 License.
Process diagnostics of snowmelt runoff in global hydrological models: Part I – Model evaluation from the perspective of robustness
Abstract. Accurate simulation of snowmelt runoff (SMR) is critical for water resource management. However, despite the abundance of global hydrological models, little is known about their SMR performance. This study first presents a comprehensive evaluation of SMR across 15 state-of-the-art large-scale models and runoff products, focusing on their biases in first-order indices, i.e., the total volume (Qsum), peak flow (Qmax), and centroid timing (CTQ) of runoff in the snowmelt period. Then, using 1,513 snow-dominated basins spanning increasing basin complexity, we propose a novel model robustness metric to quantify how model performance changes with basin complexity. Our results reveal that (1) most models underestimate Qsum and Qmax and predict CTQ too early. These biases are particularly pronounced in regions such as the western United States, northern Europe, and northeastern China. (2) Model biases systematically increase with basin complexity: CTQ is most sensitive to increasing mean elevation and topographic variability, whereas the sensitivity of Qsum and Qmax is mainly shaped by mean elevation and the diversity of vegetation types in the basin. (3) The robustness assessment further shows that observation-constrained runoff products perform best, followed by the ISIMIP3a and ISIMIP2a models. Overall, global hydrological models simulate SMR better than land surface models. Notably, land surface models perform substantially better for CTQ than for Qsum or Qmax, highlighting their structural advantage in capturing melt timing relative to runoff magnitude. This study provides a benchmark for SMR evaluation and a new framework for assessing model performance under basin complexity, offering crucial insights for future model development and uncertainty reduction.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-6071', Anonymous Referee #1, 15 Mar 2026
- RC2: 'Comment on egusphere-2025-6071', Anonymous Referee #2, 16 Mar 2026
This study presents a large-scale evaluation of snowmelt runoff (SMR) simulation across 15 global hydrological models and runoff products using 1,513 snow-dominated basins worldwide. The authors evaluate three key hydrograph characteristics, total runoff (Qsum), peak flow (Qmax), and centroid timing (CTQ), and introduce a new robustness index to quantify how model performance degrades with increasing basin complexity. The results show that most models underestimate runoff magnitude and predict earlier snowmelt timing, and that model performance degrades as basin complexity increases. Overall, the manuscript is well written and provides a valuable large-sample diagnostic of model performance. The concept of evaluating model robustness across environmental complexity gradients is particularly interesting and could offer useful insights for model development. However, several issues need clarification before the manuscript is suitable for publication.
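For readers unfamiliar with the three indices, the following is a minimal sketch of how they might be computed from a daily runoff series over one snowmelt period. The function names are hypothetical, CTQ is assumed to be the flow-weighted mean day of the period, and the bias convention (relative bias for Qsum and Qmax, a difference in days for CTQ) is an assumption for illustration rather than the manuscript's stated formulation.

```python
from typing import Dict, Sequence


def snowmelt_indices(q: Sequence[float]) -> Dict[str, float]:
    """First-order indices for a daily runoff series covering one melt period."""
    total = sum(q)
    if total <= 0:
        raise ValueError("runoff series must have a positive total volume")
    # Assumed CTQ definition: flow-weighted mean day index (day 0 = period start).
    ctq = sum(day * flow for day, flow in enumerate(q)) / total
    return {"Qsum": total, "Qmax": max(q), "CTQ": ctq}


def index_bias(sim: Sequence[float], obs: Sequence[float]) -> Dict[str, float]:
    """Simulated-minus-observed bias (assumed convention, see lead-in)."""
    s, o = snowmelt_indices(sim), snowmelt_indices(obs)
    return {
        "Qsum": (s["Qsum"] - o["Qsum"]) / o["Qsum"],  # relative bias
        "Qmax": (s["Qmax"] - o["Qmax"]) / o["Qmax"],  # relative bias
        "CTQ": s["CTQ"] - o["CTQ"],                   # days; negative = too early
    }


observed = [1.0, 2.0, 3.0, 2.0, 1.0]
simulated = [0.5, 2.0, 2.0, 1.0, 0.5]
biases = index_bias(simulated, observed)
```

With these made-up series, all three biases come out negative, which mirrors the pattern described in the summary: underestimated volume and peak, and a centroid that arrives too early.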
Major comments:
- The robustness metric is central to the manuscript but requires further justification and interpretation. The index combines the Stratified Mean Absolute Bias (SMAB) and the slope of bias vs. complexity into a Euclidean distance metric. Why was Euclidean distance selected as the combination method? Though the authors mentioned it was done in prior studies, it would be more helpful for readers if some justification can be added here. Also, is the robustness index comparable across different runoff metrics (Qsum, Qmax, CTQ)?
- The basin complexity index is defined as the sum of normalized DEM, DEMstd, LAI, and PFTh. While this approach is straightforward, the four variables may not contribute equally to hydrological complexity. Some factors, such as elevation and topographic variability, may already be strongly correlated. It would be better if the authors could discuss why these four metrics were selected and whether dependencies among these variables influence the analysis.
- In this study, all runoff outputs are routed using RAPID to ensure comparability. However, the manuscript assumes that routing effects are minimal because long-term mean metrics are used. This assumption needs more justification because routing parameters can influence peak flow magnitude and CTQ. The authors should either provide a short sensitivity analysis, or cite studies showing that routing effects are negligible at the spatial scale considered.
- Page 19, line 340: What is the potential reason that ISIMIP3a outperforms ISIMIP2a in simulating Qsum and Qmax? Is it because of the forcing data?
- The manuscript concludes that GHMs outperform LSMs for Qsum and Qmax, while LSMs perform better for CTQ. This is an interesting finding, but the discussion remains somewhat speculative. The authors attribute the differences to energy-balance representations and runoff parameterizations, but more concrete explanations or references would strengthen this argument. In addition, some differences could result from calibration, forcing datasets, and resolution differences. These factors should be discussed more carefully.
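To make the first two major comments concrete, here is a minimal sketch of a complexity index and a robustness index of the kind described above. Everything beyond "sum of normalized attributes" and "Euclidean distance of SMAB and slope" is an assumption: min-max normalization, equal-width complexity bins for the stratified mean absolute bias, and an ordinary least-squares slope of absolute bias against complexity. The function names are hypothetical and this is not the authors' implementation.

```python
import math
from statistics import mean


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to zeros (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def complexity_index(dem, dem_std, lai, pfth):
    """Equal-weight sum of the four normalized basin attributes."""
    columns = [min_max(col) for col in (dem, dem_std, lai, pfth)]
    return [sum(row) for row in zip(*columns)]


def stratified_mab(complexity, abs_bias, n_bins=4):
    """Mean of per-bin mean absolute bias over equal-width complexity bins."""
    lo, hi = min(complexity), max(complexity)
    width = (hi - lo) / n_bins or 1.0  # guard against identical complexities
    bins = [[] for _ in range(n_bins)]
    for c, b in zip(complexity, abs_bias):
        bins[min(int((c - lo) / width), n_bins - 1)].append(b)
    return mean(mean(b) for b in bins if b)


def ols_slope(x, y):
    """Least-squares slope of y against x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


def robustness_index(complexity, abs_bias, n_bins=4):
    """Euclidean distance from the origin in (SMAB, slope) space."""
    smab = stratified_mab(complexity, abs_bias, n_bins)
    return math.hypot(smab, ols_slope(complexity, abs_bias))
```

In this sketch the two components carry different units (bias versus bias per unit of complexity), so the distance depends on how each is scaled. That is one way to see why comparability across Qsum, Qmax, and CTQ deserves an explicit statement, and why correlated inputs such as DEM and DEMstd effectively receive double weight in an equal-weight sum.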
Minor comments:
- The snowmelt period is defined as the interval between maximum SWE and when SWE falls below 1 mm. This definition may not capture multi-peak melt seasons or rain-on-snow events. A brief discussion of limitations would be helpful.
- ERA5 SWE is used to define snowmelt timing. However, ERA5 has known biases in mountainous regions. The manuscript should briefly discuss how this may affect the analysis.
- “Stern conditions” is not commonly used in scientific literature, especially in hydrology or Earth system science papers. It’s a bit awkward in this context. Do you mean “challenging conditions” or “complex environmental conditions”?
- Page 18, in the caption of Figure 6, change “blue circles denote GHMs, yellow squares denote LSMs, green triangles denote DGVMS, and grey diamonds denote data products.” to “circles denote GHMs, squares denote LSMs, triangles denote DGVMS, and diamonds denote data products.”, because shapes represent model types and colors show the robustness.
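The first minor comment can be illustrated with a short sketch of the snowmelt-period rule as described: start at the day of maximum SWE and end on the first later day when SWE falls below 1 mm. The function name and the example series are hypothetical, and tie-breaking on the first occurrence of the maximum is an assumption.

```python
from typing import Optional, Sequence, Tuple


def snowmelt_period(
    swe_mm: Sequence[float], threshold_mm: float = 1.0
) -> Tuple[int, Optional[int]]:
    """Return (start_day, end_day) for one snow season of daily SWE values."""
    if not swe_mm:
        raise ValueError("SWE series is empty")
    # Start at the first day on which SWE reaches its seasonal maximum.
    start = max(range(len(swe_mm)), key=swe_mm.__getitem__)
    # End on the first subsequent day with SWE below the threshold.
    for day in range(start + 1, len(swe_mm)):
        if swe_mm[day] < threshold_mm:
            return start, day
    return start, None  # snowpack never depleted within the series


single_peak = [0.0, 50.0, 120.0, 80.0, 30.0, 0.5, 0.0]
multi_peak = [0.0, 60.0, 20.0, 0.4, 0.0, 90.0, 40.0, 0.2]

single_result = snowmelt_period(single_peak)
multi_result = snowmelt_period(multi_peak)
```

For the multi-peak series, the rule selects only the later, larger accumulation, so the earlier melt pulse falls entirely outside the analysis window. This is the limitation the comment asks the authors to discuss.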
Citation: https://doi.org/10.5194/egusphere-2025-6071-RC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 247 | 139 | 19 | 405 | 19 | 34 |
Overall, this manuscript presents a substantial amount of analysis on snowmelt runoff characteristics across a large sample of basins and multiple models/products, and the writing and presentation are generally clear. The topic is relevant and the study has clear value for large-scale model evaluation in cold-region hydrology. In particular, the authors made considerable efforts in constructing the intercomparison framework and diagnosing runoff volume, peak, and timing during snowmelt periods. However, several issues remain insufficiently addressed, especially regarding the parameter calibration and the formulation and interpretation of the newly proposed RI metric. My detailed comments are below.
Major comments:
Other: