A signal processing-based interpretation of the Nash-Sutcliffe efficiency
 ^{1}Institute of Engineering Innovation, University of Tokyo, Tokyo, 113-8656, Japan
 ^{2}Meteorological Research Institute, Tsukuba, 305-0052, Japan
Abstract. The Nash-Sutcliffe efficiency (NSE) is a widely used score in hydrology but is not common in the other environmental sciences. One of the reasons for its unpopularity is that its scientific meaning is somewhat unclear in the literature. This study attempts to establish a solid foundation for NSE from the viewpoint of signal processing. Thus, a forecast is viewed as a received signal containing a wanted signal (observations) contaminated by an unwanted signal (noise). This view underlines the important role of the error model between forecasts and observations.
By assuming an additive error model, it is easy to point out that NSE is equivalent to an important quantity in signal processing: the signal-to-noise ratio. Moreover, NSE and the Kling-Gupta efficiency (KGE) are shown to be equivalent, at least when there are no biases, in the sense that they both measure the relative magnitude of the power of noise compared with the power of variation of observations. The scientific meaning of NSE explains why it is reasonable to choose NSE=0 as the boundary between skilful and unskilful forecasts in practice, and this has no relation with the benchmark forecast that is equal to the mean of observations. Corresponding to NSE=0, the critical value of KGE is given approximately by 0.5.
In the general case, when the additive error model is replaced by a mixed additive-multiplicative error model, the traditional NSE is shown not to be a well-defined notion. Therefore, an extension of NSE is derived, which only requires dividing the traditional noise-to-signal ratio by the multiplicative factor. This has a practical implication: if the multiplicative factor is not considered, the traditional NSE and KGE underestimate (overestimate) the generalized ones when the multiplicative factor is greater (smaller) than one. In particular, the benchmark forecast turns out to be the worst forecast under the view of the generalized NSE.
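The central identity claimed in the abstract can be sketched in standard notation (the symbols below are assumed for illustration and are not taken verbatim from the paper): under an additive error model f_t = o_t + ε_t, NSE reduces to one minus the reciprocal of the signal-to-noise ratio.

```latex
% Sketch (assumed notation): NSE under an additive error model
% f_t = o_t + \varepsilon_t, with noise variance \sigma_\varepsilon^2
% and observation variance \sigma_o^2.
\[
  \mathrm{NSE}
  = 1 - \frac{\sum_{t=1}^{N} (f_t - o_t)^2}{\sum_{t=1}^{N} (o_t - \bar{o})^2}
  \approx 1 - \frac{\sigma_\varepsilon^{2}}{\sigma_o^{2}}
  = 1 - \mathrm{SNR}^{-1},
  \qquad
  \mathrm{SNR} := \frac{\sigma_o^{2}}{\sigma_\varepsilon^{2}}.
\]
```

On this reading, NSE >= 0 exactly when SNR >= 1, i.e. when the power of the variation of observations is at least that of the noise, which is the boundary between skilful and unskilful forecasts referred to above.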
Le Duc and Yohei Sawada
Status: final response (author comments only)

CC1: 'Comment on egusphere-2022-955', John Ding, 30 Sep 2022
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/egusphere2022955/egusphere2022955CC1supplement.pdf

AC1: 'Reply on CC1', Le Duc, 02 Oct 2022
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/egusphere2022955/egusphere2022955AC1supplement.pdf

RC1: 'Comment on egusphere-2022-955', Anonymous Referee #1, 06 Nov 2022
The paper deals with a novel interpretation of the NSE score, starting from the observation that it is mostly used in hydrology and poorly exploited in other sciences. This interpretation is based on a signal processing viewpoint. While the paper is interesting and useful, the following concerns are at hand. The NSE interpretation provided is based on an error model for the forecast, given in eq. (5), which is basically driven by a Gaussian random error, this being equal to noise in the signal processing viewpoint. This basic foundation by itself entails a lack of generality with respect to the application of NSE in other sciences, including hydrology. In hydrology, NSE is intended as a model performance metric where the difference between model and observation is not limited to noise. Differences between model and observation, but also differences between observation and reality, and differences between different models, can be analyzed by means of NSE or KGE, whose meaning is quite clear and leaves no doubt in my personal opinion. In a hydrologic model, but also in other earth sciences, different models may arise because different processes are modelled in a stochastic or deterministic way, and/or because some processes are described or neglected depending on how important they are for the time-space scale of application and the modelling purpose. Hence the difference between model output and observations may be very different from what is given in eq. (5): it can be deterministic or stochastic, and affected by deterministic or stochastic (or both) variability. As a consequence, the proposed analysis, while interesting and well founded in the context of the signal processing field (or any other field where only noise separates model output from observation), is of limited generality.
In the same light, one may not accept the "general case" version of NSE, which is obtained by considering a multiplicative error, besides the additive error, defined in eqs. (32). Even in this case, the "general case" should be addressed as relative to a particular field of application, besides signal processing or related fields. I believe the authors should strongly address this issue in a revised version of the manuscript. In every other respect the paper is technically and scientifically sound; nevertheless, I have to say I could not thoroughly check the notation and mathematics.

AC2: 'Reply on RC1', Le Duc, 25 Nov 2022
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/egusphere2022955/egusphere2022955AC2supplement.pdf

RC2: 'Review of egusphere-2022-955', Anonymous Referee #2, 07 Nov 2022
This is a challenging review because the paper presents cool new insights into well-known quantities but at the same time does not connect well to the existing hydrological modelling literature and practice (as will become clear below).
This manuscript proposes to explain the well-known Nash-Sutcliffe efficiency (NSE) from a signal processing viewpoint and also analyzes the Kling-Gupta efficiency (KGE). It has the important merit of putting the finger on the fact that NSE, KGE, or even the correlation between observations and simulations are all measures of the same thing, which the authors here call the irreducible part of the noise-to-signal ratio. It has always been clear in model calibration that different variance-based model performance criteria all measure the same thing – i.e. that they all have their optimal value for the same model simulation. This is of course completely clear from a mathematical viewpoint: there is a single simulation (i.e. a single parameter set) that leads to a minimal squared error. This is also visible in the numerous scatter plots that show model performances for different variance-based model performance criteria in hydrological modelling papers.
This fact might indeed have become blurred in recent discussions of how to make more efficient use of variancebased model criteria, e.g. with the emergence of KGE. However, as I will discuss below, the paper is probably confusing for many readers and requires rewriting to become easily accessible (thus interesting) for a wider readership.
The paper adopts a forecasting viewpoint where the purpose of a model is to forecast the next state (in time) based on observed values of the current state. This use of environmental input-output models is a very special case and not the standard use of NSE; NSE is essentially used for the calibration of input-output models in a simulation mode starting from initial conditions (not in a forecasting mode!) and not for the assessment of forecast quality.
The current version of the paper is thus not ready for publication, in my view because:
 First of all, the paper presents a link to signal processing without always explicitly mentioning i) what is known in the signal processing (or other) literature and is presented here for a hydrological readership and ii) what is new. The paper demonstrates the link between correlation and what is called here the irreducible noise-to-signal ratio component for an additive error model, without clearly saying what is new about this presentation of the correlation as a measure of noisiness of the observed signal under the additive error model.
 As mentioned above, NSE is primarily used in hydrologic model calibration and not in forecast assessment. In classical model calibration, the simulation model is not updated with past observations of the system state and all observations are available at the time of calibration. Due to this terminology issue, some discussed points are confusing (e.g. line 95 following). Accordingly, in the present state, this paper might contribute to further confusion rather than a better understanding of NSE in other fields of environmental science where models are rarely used in a forecast setting.
 The new proposed view in terms of signal and noise leads the authors to the statement that for additive errors “when NSE goes below zero, the power of noise starts dominating the power of variation of observations”; this is what was already presented in similar terms in the work of Ding (2018) (see also his comment on the present paper in the public discussion); the NSE value is the ratio of the difference between the observations variance and the residual variance to the observations variance. The relation to the “benchmark model” (being the observed mean) is nevertheless present in the hydrological model calibration literature: the requirement that a simulation should be better than the observed mean simply corresponds to the requirement that there should be more signal than noise. This is not problematic in model calibration, where all observations are accessible at the time of calibration (again: NSE is primarily used for calibration; in forecasting, the requirement also makes sense: the model should be better than the simplest possible forecast, which is the mean of *past* observations). Rather than trying to show that there is no link between the “benchmark concept”, the variance view (Ding, 2018) and the signal processing view (this paper), it would be more interesting to show the actual link. And it would be of prime importance to discuss the actual problem of NSE: it is not useful to compare model performance across different case studies.
 Furthermore, it is important to point out (see above) that it has long been known that NSE and KGE and correlation are all equivalent in terms of identifying a model with the minimum error (smallest error variance); this is not new but identifying the solution with the smallest error variance is rarely the single main objective of hydrologic model calibration (read e.g. key model estimation literature in hydrology by Keith Beven or Hoshin Gupta).
 The paper omits to discuss one of the key issues with using NSE for model performance assessment: if the signal is very strong, it is easy to obtain a good signal-to-noise ratio – i.e. easy to achieve good model performance; in other words, NSE should not be used to compare different case studies (different modelled signals), a problem that is still not sufficiently recognized when using NSE. This problem leads to many erroneous conclusions of the type: “NSE is better for simulation of signal A than for simulation of signal B – we can conclude that the model can better reproduce signal A than signal B”.
 The part that is interesting from my viewpoint is the part on a score that is invariant under any general translation; from a hydrological process modelling viewpoint, however, we would need to be able to assign the translation to a clear physical phenomenon such as measurement bias. The definition of an upper limit for NSE under translation (i.e. bias correction) and the explicit link between NSE >= 0 and a corresponding correlation value (rho >= 0.7) (under an additive error model) is certainly also interesting. This could open new ways of comparing model performance for different case studies, for example.
 However, I do not think it is appropriate to assign this new skill score a name related to NSE – an abbreviation that should be exclusively used for the original formulation. Any new skill score should have a new name.
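The case-study comparison pitfall raised above can be illustrated numerically (a sketch with made-up synthetic data; the `nse` helper and signal choices below are assumptions, not taken from the paper): the same level of additive noise yields a high NSE for a strong signal and a much lower NSE for a weak one.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
noise = rng.normal(0.0, 0.5, t.size)   # identical noise in both cases

strong = 5.0 * np.sin(t)               # high-variance signal A
weak = 0.5 * np.sin(t)                 # low-variance signal B

# The "model" reproduces each observed signal up to the same additive noise,
# yet the scores differ greatly because the signal power differs.
print(round(nse(strong + noise, strong), 3))  # close to 1
print(round(nse(weak + noise, weak), 3))      # much lower
```

The absolute model error is identical in both runs; only the variance of the observed signal differs, which is exactly why comparing NSE values across different case studies can mislead.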
Additional detailed comments (see also the commented pdf):
 I think the term forecast should not be used in this paper; in hydrology, today, forecast does not have the general meaning of “a simulated value” but is related to real-time forecasting; in general, there is an unclear use of the term forecast; most hydrological models are not used in a forecasting mode (predicting future states based on the current observed state) but in a simulation mode starting with initial conditions and observed inputs only; accordingly, the discussion related to forecast availability in lines 5-6 is misleading; NSE is primarily used in model calibration and not in forecast skill assessment; in calibration, the availability of the observations is a precondition
 Similarly, I would avoid “skillful forecast”; in hydrology, NSE is used in the context of model calibration and performance assessment but not primarily to judge the skill of forecasts
 What is meant by “and this has no relation with the benchmark forecast that is equal to the mean of observations”? Obviously the value of NSE=0 has a very clear relation to the benchmark, since if all simulations equal the mean of the observations, we get NSE=0;
 What is meant by “Corresponding to NSE=0, the critical value of KGE is given approximately by 0.5.”? How do you define the critical value?
 “In the general case, when the additive error model is replaced by a mixed additive-multiplicative error model, the traditional NSE is shown not to be a well-defined notion”: the notion of NSE cannot depend on the error model, since NSE characterizes the model performance independent of an error model assumption; how could e.g. the notion of a bias depend on the error model assumption?
 Why do we need the requirement that “the generalized NSE is invariant under affine transformations”? This is your requirement, not the hydrologist’s requirement
 Conclusion: “Its choice is dictated by the fact that at this value the power of noise starts dominating the power of variation of observations.” Who dictates it? Why would this interpretation be superior to previous explanations?
 Why would we need to adjust NSE for other error models? It was never intended to be used in conjunction with error models but to yield an easy-to-interpret performance measure; most authors do not specify an error model; this difference should become clear;
 Regarding the apparent debate on the meaning of Nash: this needs a reference; did someone else say this or is this your interpretation? I have never heard / seen anyone saying that there is a discussion about what Nash means; there is simply no need to use Nash in other disciplines
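The referee's remark that NSE=0 has a very clear relation to the benchmark can be checked directly (a minimal sketch; the `nse` helper and the numbers are assumed for illustration): a constant simulation equal to the observed mean scores exactly zero, and anything whose squared error exceeds the observed variance scores below zero.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

obs = np.array([2.0, 4.0, 3.0, 7.0, 5.0, 6.0])

# Benchmark: every simulated value equals the mean of the observations.
benchmark = np.full_like(obs, obs.mean())
print(nse(obs, benchmark))   # 0.0 by construction

# A simulation with a large constant offset: squared error exceeds the
# observed variance, so the score is negative (worse than the benchmark).
print(nse(obs, obs + 3.0))
```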

RC3: 'Reply on RC2', Anonymous Referee #2, 07 Nov 2022
 AC4: 'Reply on RC3', Le Duc, 25 Nov 2022

AC3: 'Reply on RC2', Le Duc, 25 Nov 2022
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/egusphere2022955/egusphere2022955AC3supplement.pdf