Metrics that Matter: Objective Functions and Their Impact on Signature Representation in Conceptual Hydrological Models
Abstract. Although objective functions (OFs) are widely discussed in the literature, many modelling studies still default to a few common metrics, without much consideration of their relative strengths and weaknesses. This paper systematically investigates the impact of OF choice on the representation of various streamflow characteristics across 47 conceptual models and 10 hydro-climatically diverse catchments selected from the CARAVAN dataset. We use eight different OFs for calibration, including the Kling–Gupta efficiency (KGE), Nash–Sutcliffe efficiency (NSE), and their respective logarithmic variants, as well as four more recently proposed metrics. We evaluate the representation of 15 hydrological signatures that capture a relevant selection of streamflow characteristics to determine generalizable strengths and weaknesses of individual OFs across different models and catchments. Results show that the choice of OF can significantly affect a model's capability to simulate different hydrological signatures such as runoff ratios, extreme flow percentiles, and certain baseflow characteristics. While certain signatures, particularly those related to flow variability, are relatively insensitive to OF choice, others exhibit large performance shifts across different OFs. Generally, no single OF simultaneously achieved high performance across all tested signatures, highlighting that a single-objective calibration is unlikely to lead to an all-purpose model. Our results reinforce calls to choose objective functions deliberately and in line with the objectives of a study. They also provide initial guidance on which metrics highlight particular facets of streamflow behaviour.
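The abstract centres on the Kling–Gupta efficiency (KGE) and Nash–Sutcliffe efficiency (NSE). For readers unfamiliar with them, a minimal sketch of the standard textbook formulas follows; the function names and test data are illustrative only and are not taken from the paper under review.

```python
# Two common calibration objective functions named in the abstract:
# Nash-Sutcliffe efficiency (NSE) and Kling-Gupta efficiency (KGE).
# Standard formulas; variable names are illustrative, not from the paper.
from statistics import mean, pstdev


def nse(obs, sim):
    """NSE = 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2); 1 is perfect."""
    obs_mean = mean(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """KGE (Gupta et al., 2009): 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    mo, ms = mean(obs), mean(sim)
    so, ss = pstdev(obs), pstdev(sim)
    # Pearson correlation between observed and simulated flows
    r = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / (len(obs) * so * ss)
    alpha = ss / so  # variability ratio
    beta = ms / mo   # bias ratio
    return 1.0 - ((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2) ** 0.5
```

The logarithmic variants mentioned in the abstract are typically obtained by applying the same formulas to log-transformed flows, which shifts the weight of the metric from high flows towards low flows.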
It is somewhat depressing that, after all the decades of past work on aleatory and epistemic uncertainties in data, in models, and in the identification of parameters, there are still studies that essentially ignore their impacts (except for concluding that parameter uncertainty is not an issue because different random seeds give similar optimal parameter sets for some OFs).
But why have you not considered that some of the data you are using might be disinformative for model evaluation; that there may be parameter sets close to your optima that give similar "performance", however that is measured; that different periods of data (with different errors) will give different optimal parameter sets; and so on? (See, for example, Beven, K. J., 2024, A short history of philosophies of hydrological model evaluation and hypothesis testing, WIREs Water, e1761, https://doi.org/10.1002/wat2.1761, and the references therein.)
We have known these things for a very long time, but the real issue to be addressed is whether a model (even when optimised, as in this study) can really be considered fit for purpose when there are often glaring visual issues in performance, for example during wetting-up periods at the end of summer, that are glossed over by the types of global OFs used here. That was one of the reasons why I rejected the concept of optimal parameter sets more than 30 years ago, in favour of seeking models that might be consistent with the observations and what we know about their uncertainties. Trying to assess those uncertainties is, of course, a much more difficult problem than simply applying an optimisation algorithm (particularly for the epistemic uncertainties), but just thinking about what might be involved in doing so is a really valuable exercise.
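The alternative sketched here, seeking models consistent with the observations rather than a single optimum, is the idea behind limits-of-acceptability approaches such as GLUE. A minimal illustration of that spirit follows; the function names, error bounds, and acceptance threshold are all illustrative assumptions, not a specific published procedure.

```python
# Minimal sketch of a limits-of-acceptability screen, in the spirit of
# GLUE: instead of ranking parameter sets by one global OF, retain every
# set whose simulation stays within observation uncertainty bounds at
# (nearly) all time steps. Thresholds here are illustrative assumptions.


def behavioural(obs, sim, rel_err=0.2, min_frac=0.95):
    """Accept a simulation if at least min_frac of time steps fall inside
    +/- rel_err relative uncertainty bounds around each observation.
    Assumes strictly positive flow values."""
    inside = sum(
        1
        for o, s in zip(obs, sim)
        if o * (1 - rel_err) <= s <= o * (1 + rel_err)
    )
    return inside / len(obs) >= min_frac


def screen(obs, simulations):
    """Return the indices of parameter sets judged behavioural: an
    ensemble of acceptable models, rather than a single 'optimum'."""
    return [i for i, sim in enumerate(simulations) if behavioural(obs, sim)]
```

The point of the design is that the output is a set, possibly empty, in which case every tested model is rejected and the modeller must do better, which is exactly the hypothesis-testing stance argued for below.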
Apologies in advance for this little rant, but if we do not approach the modelling process with a bit more depth of thought, how are we going to progress the science? That surely requires ways of rejecting models and then trying to do better, not accepting that an optimised model is de facto satisfactory.
Keith Beven