https://doi.org/10.5194/egusphere-2025-6460
02 Feb 2026
Status: this preprint is open for discussion (until 16 Mar 2026) and under review for Hydrology and Earth System Sciences (HESS).

Technical note: Separating signal from noise in large-domain hydrologic model evaluation – Benchmarking model performance under sampling uncertainty

Gaby J. Gründemann, Wouter J. M. Knoben, Yalan Song, Katie van Werkhoven, and Martyn P. Clark

Abstract. Large-domain hydrologic modeling studies are becoming increasingly common. The evaluation of the resulting models is, however, often limited to aggregated performance scores that show where model accuracy is higher and lower. Moreover, the inherent uncertainty in such scores, stemming from the choice of time periods used for their calculation, often remains unaccounted for. Here we use a collection of simple benchmarks, whilst accounting for this sampling uncertainty, to provide context for the performance scores of a large-domain hydrologic model. The benchmarks suggest that there are considerable constraints on the model's performance in approximately one-third of the basins used for model calibration and in approximately half of the basins where model parameters are regionalized. Sampling uncertainty has limited impact: in most basins the model is either clearly better or clearly worse than the benchmarks, though accounting for sampling uncertainty remains important when the performance of different models is more similar. The areas where the benchmarks outperform the model only partially overlap with the areas where the model achieves lower performance scores, suggesting that improvements may be possible in more regions than a first glance at model performance values would indicate. A key advantage of these benchmarks is that they are easy and fast to compute, particularly compared to the cost of configuring and running the model. This makes benchmarking a valuable tool that can complement more detailed model evaluation techniques by quickly identifying areas that should be investigated more thoroughly.
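To make the approach concrete, the sketch below shows one way such a benchmark comparison could be set up for a single basin: an interannual daily-climatology benchmark, plus a bootstrap over whole years that estimates how much a Nash-Sutcliffe efficiency (NSE) score varies with the choice of evaluation period. The abstract does not specify which benchmarks, scores, or uncertainty method the paper uses; the choices here (climatology benchmark, NSE, year-block bootstrap, synthetic data) are illustrative assumptions only.

    # Illustrative sketch only: a simple climatology benchmark plus a bootstrap
    # over whole years to estimate sampling uncertainty in a performance score.
    import numpy as np
    import pandas as pd

    def nse(sim, obs):
        """Nash-Sutcliffe efficiency of simulated values against observations."""
        return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

    def climatology_benchmark(obs):
        """Predict each day's flow as the mean observed flow on that calendar day."""
        return obs.groupby(obs.index.dayofyear).transform("mean")

    def bootstrap_scores(sim, obs, n_samples=1000, seed=0):
        """Resample whole years with replacement to mimic the effect of the
        choice of evaluation period on the score (sampling uncertainty)."""
        rng = np.random.default_rng(seed)
        years = obs.index.year.unique()
        scores = np.empty(n_samples)
        for i in range(n_samples):
            drawn = rng.choice(years, size=len(years), replace=True)
            idx = np.concatenate([np.flatnonzero(obs.index.year == y) for y in drawn])
            scores[i] = nse(sim.to_numpy()[idx], obs.to_numpy()[idx])
        return scores

    # Synthetic daily series standing in for one basin's observed and modeled flow.
    dates = pd.date_range("2001-01-01", "2010-12-31", freq="D")
    rng = np.random.default_rng(1)
    obs = pd.Series(np.abs(np.sin(np.arange(len(dates)) / 58.0)) + 0.1
                    + 0.05 * rng.standard_normal(len(dates)), index=dates)
    sim = obs + 0.1 * rng.standard_normal(len(dates))

    bench = climatology_benchmark(obs)
    model_scores = bootstrap_scores(sim, obs)    # same seed in both calls pairs
    bench_scores = bootstrap_scores(bench, obs)  # the resampled evaluation periods

    # Overlapping score distributions mean sampling uncertainty matters; clearly
    # separated distributions mean the model beats (or loses to) the benchmark.
    print("model NSE (5/50/95 pct):    ", np.percentile(model_scores, [5, 50, 95]).round(3))
    print("benchmark NSE (5/50/95 pct):", np.percentile(bench_scores, [5, 50, 95]).round(3))

Using the same random seed for both bootstrap calls pairs the year samples, so the model and benchmark score distributions are compared over identical evaluation periods.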

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.


Data sets

Data for "Separating Signal from Noise in Large-Domain Hydrologic Model Evaluation: Benchmarking model performance under sampling uncertainty" Gaby Gründemann, Wouter Knoben, Yalan Song, Katie van Werkhoven, and Martyn Clark https://doi.org/10.5281/zenodo.18028487

Short summary
The quality of large-domain hydrologic model simulations is often quantified with so-called accuracy metrics. Here we use simple benchmarks to provide relevant context for these accuracy metrics. Results show that areas where the model cannot beat the benchmarks do not always align with areas where the accuracy metrics are low. This suggests that model improvements are possible in regions where, under more typical model evaluation approaches (i.e., without benchmarks), this potential might not be obvious.