On the importance of observation uncertainty when evaluating and comparing models: a hydrological example
Abstract. Model comparison in the geosciences involves either refining a single model or comparing different model structures. Such comparison studies are potentially invalid, however, if the uncertainty estimates of the observations are not considered when evaluating relative model performance. The temporal sampling of the observation and simulation time series is an additional source of uncertainty, as a few observation-simulation pairs, in the form of outliers, can have a disproportionate effect on the model skill score. In this study we highlight the importance of including observation uncertainty and temporal sampling uncertainty when evaluating and comparing hydrological models.
Large-sample hydrology datasets contain collections of catchments with hydro-meteorological time series, catchment boundaries, and catchment attributes, and provide an excellent test bed for model evaluation and comparison studies. Here, two model experiments serving different evaluation purposes are set up using 396 catchments from the CAMELS-GB dataset. The first, intra-model, experiment mimics a model refinement case by evaluating the streamflow estimates of the distributed wflow_sbm hydrological model with and without additional calibration. The second, inter-model, experiment compares the streamflow estimates of the distributed PCR-GLOBWB and wflow_sbm hydrological models.
The temporal sampling uncertainty, which results from outliers in observation-simulation pairs, is found to be substantial throughout the case study area. High temporal sampling uncertainty indicates that the model skill scores used to evaluate model performance are heavily influenced by only a few data points in the time series. This is the case for half of the simulations (210) in the intra-model experiment and for 53 catchment simulations in the inter-model experiment, where the sampling uncertainty is larger than the difference in the KGE-NP model skill score. These cases highlight the importance of reporting temporal sampling uncertainty, and determining its cause, before drawing conclusions on model performance from large-sample hydrology. The streamflow observation uncertainty analysis shows similar results. One third of the catchment simulations (123) in the intra-model experiment show smaller differences between streamflow simulations than the streamflow observation uncertainties, compared to only 4 catchment simulations in the inter-model experiment, owing to the larger differences between the streamflow simulations there. These catchment simulations should be excluded before drawing conclusions based on large samples of catchments. The results of this study demonstrate that benchmark efforts based on large samples of catchments must include streamflow observation uncertainty and temporal sampling uncertainty to obtain robust results.
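The sensitivity of a skill score to a few outlier pairs, as described above, can be probed with leave-one-out (jackknife) resampling. The following is a minimal sketch in Python/NumPy, using the standard KGE formula (Gupta et al., 2009) rather than the non-parametric KGE-NP variant used in the study, and synthetic flows rather than CAMELS-GB data; all names and values here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def kge(obs, sim):
    # Standard Kling-Gupta efficiency: correlation, variability ratio,
    # and bias ratio combined into a single score (perfect score = 1).
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    beta = np.mean(sim) / np.mean(obs)
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def jackknife_scores(obs, sim):
    # Recompute the score with each observation-simulation pair removed;
    # a wide spread means a few pairs dominate the skill score.
    n = len(obs)
    idx = np.arange(n)
    return np.array([kge(obs[idx != i], sim[idx != i]) for i in range(n)])

# Synthetic example: a generally good model that badly misses one flood peak.
rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 5.0, 365)            # synthetic daily flows
sim = obs * rng.normal(1.0, 0.1, 365)     # ~10 % multiplicative noise
peak = np.argmax(obs)
sim[peak] = obs[peak] * 3.0               # one badly overestimated peak

full = kge(obs, sim)
scores = jackknife_scores(obs, sim)
lo, hi = scores.min(), scores.max()
# Leaving out the corrupted peak raises the score well above `full`,
# so the jackknife range [lo, hi] exposes the outlier's influence.
```

A large jackknife range relative to the score difference between two models is, in this sketch, the analogue of the paper's criterion that sampling uncertainty exceeds the difference in KGE-NP.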
Status: final response (author comments only)