the Creative Commons Attribution 4.0 License.
Technical Note: High Nash Sutcliffe Efficiencies conceal poor simulations of interannual variance in tropical, alpine, and polar catchments
Abstract. Streamflow time series can be decomposed into interannual, seasonal, and irregular components, and the contribution of each component varies by region. Seasonal variance dominates in many tropical, alpine, and polar regions, while irregular variance dominates in most other regions. Interannual variability in streamflow is known to strongly influence human and ecological systems and is likely to increase under climate change, yet we find that historical interannual variance is usually only a small fraction of the total variance. We show that hydrologic models often simulate one component well while failing to simulate the others, a fact hidden by popular performance metrics such as the Nash-Sutcliffe Efficiency (NSE) and the Kling-Gupta Efficiency (KGE), which aggregate performance into a single number. We analyse 18 regional and global hydrologic models and find that in highly seasonal catchments, where the NSE and KGE are consistently highest, the models are almost always worse at simulating interannual variability. The NSE of the interannual component is lower in highly seasonal catchments, and simulated year-to-year changes in ecologically relevant hydrologic signatures are less accurate. This is concerning because it indicates that these hydrologic models may struggle to predict long-term responses to climate change, especially in tropical, alpine, and polar regions, which are among the regions most vulnerable to climate change.
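The abstract does not say how the decomposition is carried out, and the reviewers below ask for it to be explained. As a minimal sketch, assuming a simple additive two-way (year × calendar-month) mean decomposition of monthly flows, which may well differ from the method used in the preprint, the Python/NumPy code below splits a flow series into interannual, seasonal, and irregular components and compares the NSE and KGE of the full series with the NSE of the interannual component. The function names and the synthetic series are illustrative assumptions, not the authors' code or data.

```python
import numpy as np


def decompose(q):
    """Split an (n_years, 12) array of monthly flows into additive components.

    Assumed two-way (year x calendar-month) mean decomposition:
    q = grand_mean + interannual + seasonal + irregular.
    """
    q = np.asarray(q, dtype=float)
    grand_mean = q.mean()
    # Interannual: each year's mean anomaly, repeated for every month of that year.
    interannual = np.broadcast_to(q.mean(axis=1, keepdims=True) - grand_mean, q.shape)
    # Seasonal: the mean anomaly of each calendar month, repeated for every year.
    seasonal = np.broadcast_to(q.mean(axis=0, keepdims=True) - grand_mean, q.shape)
    # Irregular: whatever the first two components do not explain.
    irregular = q - grand_mean - interannual - seasonal
    return interannual, seasonal, irregular


def variance_fractions(q):
    """Share of the total sum of squares in each component.

    For this decomposition the three components are mutually orthogonal,
    so the shares add up to 1.
    """
    q = np.asarray(q, dtype=float)
    total = np.sum((q - q.mean()) ** 2)
    return [float(np.sum(part ** 2) / total) for part in decompose(q)]


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / sum of squares about the observed mean."""
    obs = np.asarray(obs, dtype=float).ravel()
    sim = np.asarray(sim, dtype=float).ravel()
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))


def kge(obs, sim):
    """Kling-Gupta Efficiency from correlation, variability ratio and bias ratio."""
    obs = np.asarray(obs, dtype=float).ravel()
    sim = np.asarray(sim, dtype=float).ravel()
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


# Synthetic example (not data from the preprint): 10 years of monthly flow with a
# strong seasonal cycle, a small year-to-year signal, and no irregular part.
months = np.arange(12)
seasonal_true = 50.0 * np.sin(2 * np.pi * months / 12)
interannual_true = 5.0 * np.array([1, -1, 2, -2, 0, 3, -3, 1, -1, 0])
obs = 100.0 + interannual_true[:, None] + seasonal_true[None, :]

# A "model" that repeats the mean seasonal cycle and has no interannual variability.
sim = 100.0 + np.tile(seasonal_true, (10, 1))

obs_interannual = decompose(obs)[0][:, 0]  # one anomaly per year
sim_interannual = decompose(sim)[0][:, 0]

print("variance shares (interannual, seasonal, irregular):", variance_fractions(obs))
print("NSE, full series:", nse(obs, sim))
print("KGE, full series:", kge(obs, sim))
print("NSE, interannual component:", nse(obs_interannual, sim_interannual))
```

With these synthetic inputs the seasonal component carries about 94 % of the variance, the full-series NSE is about 0.94 and the KGE about 0.96, yet the NSE of the interannual component is about 0: the toy model is no better than a constant at reproducing year-to-year variability, which is the pattern the abstract describes.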
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-3851', Anonymous Referee #1, 26 Nov 2025
- AC2: 'Reply on RC1', Sacha Ruzzante, 19 Dec 2025
RC2: 'Comment on egusphere-2025-3851', Anonymous Referee #2, 02 Dec 2025
This Technical Note (TN) uses streamflow simulations available from several datasets to show that high NSE and KGE values in seasonal catchments do not necessarily translate into good model performance for the interannual variability of streamflow. This is a relevant topic within the scope of HESS. I agree with all points raised in RC1 and provide a few comments below.
TITLE – The title could state more directly that it concerns highly seasonal streamflow regimes, rather than specifying tropical, alpine, and polar catchments.
ABSTRACT – The short summary reads better than the abstract. The introductory part of the abstract is too long. There should be only one opening sentence (e.g., “…common metrics used to evaluate hydrological models…”) followed by a sentence stating the scientific gap (e.g., “however, simulating interannual variability might be a problem…”). It should be made clear that the paper is mostly based on simulations available in the literature (i.e., the sentences “we show that hydrologic models…” and “we analyse 18 regional and global hydrologic models…” are ambiguous about the nature of this technical note).
L13 – is “irregular variance” the best term here?
L20 – how were “ecologically relevant” signatures determined?
L21-23 – It would be better to end the abstract with the important technical implications for hydrologic modeling (what should we do now?) rather than a general comment about climate change and vulnerable regions, which is not the core topic of this TN.
INTRODUCTION – The story is not clear. The first 12 lines are about streamflow and climate change, but this TN is about performance metrics. The most important paragraph seems to start at L50. This paragraph should be developed further to clarify the relevance of this TN.
L37 & L56 – What is relevant about each of these references? Either expand on them or cut them.
METHODS –
Section 2.1 describes several data selection choices. Perhaps moving Section 3 before Section 2 would be better.
L74-77 – this is not completely clear.
Section 2.2 – Was this done by running a model or by using available simulations? The language is ambiguous here.
L91 – Doesn't the NSE use the average streamflow as a benchmark?
Section 2.3 – Again, using “Modelling” in the title makes the methods ambiguous.
L110 – Where do I_o and I_s come from?
L128 – Why is this interesting? It should be clear in the intro why changes in hydrological signatures should be evaluated. There are several important references missing here.
DATA – The use of data and simulations should be clarified in the abstract and introduction.
RESULTS AND DISCUSSION
Section 4.1 – What is a highly seasonal catchment? What are the signature values that were used to classify the catchments?
L194-205 – There is a lot of climatological explanation here, but nothing about important hydrological catchment characteristics. What are the areas of the chosen catchments? What is the average annual rainfall? What is the ET? Why were these three catchments selected?
Section 4.3 – The discussion here is not linear and is difficult to follow. This section could be shortened considerably, and the paragraphs should be grouped around the main messages.
L275 – Is this hypothesis exhaustive? Can you think of any other case where this would happen, or any exception to it?
CONCLUSIONS AND RECOMMENDATIONS – This section is a bit convoluted and repetitive. The conclusions should address only the new knowledge and the recommendations, without repeating the results section (e.g., “higher than 0.8. We observe, in Figure 3…”).
L378-383 – a bit of repetition of the introduction.
L387 – “Lastly…” Why is that? How much is enough?
Citation: https://doi.org/10.5194/egusphere-2025-3851-RC2
- AC1: 'Reply on RC2', Sacha Ruzzante, 19 Dec 2025
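On Referee #2's question about L91 of the preprint: under the standard definition, the NSE does use the mean observed flow as its benchmark, because its denominator is the squared error of a model that predicts \(\bar{Q}_o\) at every time step. Assuming the observed and simulated series are each split into additive, mutually orthogonal interannual (\(I\)), seasonal (\(S\)), and irregular (\(R\)) components (true, for example, of a two-way year × calendar-month mean decomposition; the preprint's own decomposition is not described on this page), the denominator splits into one term per component. The last line then shows, for a simulation whose errors are confined to the interannual component, why a large seasonal term can mask those errors.

```latex
\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T} \left(Q_{s,t} - Q_{o,t}\right)^2}{\sum_{t=1}^{T} \left(Q_{o,t} - \bar{Q}_o\right)^2},
\qquad
\sum_{t=1}^{T} \left(Q_{o,t} - \bar{Q}_o\right)^2 = \sum_{t=1}^{T} I_{o,t}^2 + \sum_{t=1}^{T} S_{o,t}^2 + \sum_{t=1}^{T} R_{o,t}^2 ,

\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T} \left(I_{s,t} - I_{o,t}\right)^2}{\sum_{t=1}^{T} I_{o,t}^2 + \sum_{t=1}^{T} S_{o,t}^2 + \sum_{t=1}^{T} R_{o,t}^2}
\quad \text{if } \bar{Q}_s = \bar{Q}_o,\ S_{s,t} = S_{o,t},\ R_{s,t} = R_{o,t} .
```

Even when \(\sum_t (I_{s,t} - I_{o,t})^2 \ge \sum_t I_{o,t}^2\), i.e. the interannual component alone has an NSE of zero or less, the overall NSE stays close to 1 whenever \(\sum_t S_{o,t}^2\) dominates the denominator, as it does in highly seasonal catchments.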
The authors address a very important topic that is (fortunately) receiving increasing attention: should we blindly trust our traditional performance metrics for hydrological modeling? Among other interesting insights, they discuss a sad (although necessary) truth: high NSEs (or even KGEs) do not necessarily mean that the simulations are adequate. In some respects, this underlines our need, as modelers, to improve our optimization metrics. The paper is definitely a fit for HESS and should be published, but, as might be expected, some concerns should first be clarified, corrected, or improved, and I also offer several suggestions.
1. I believe that the methodology used for the time-series decomposition needs to be explained in more detail; if needed, the authors could use an appendix or the Supporting Information. This is a crucial part and must be easy for readers to follow.
2. Related to this, I feel that the authors could better justify the choice of decomposition. Was it motivated by previous work? Are there more references? This needs to be made clear in the text.
3. The authors call the seasonal component the long-term seasonality of the basins. Our rivers are changing, and their seasonality is consequently changing too. I think this could be addressed a bit better in the text. I understand the choice (L85-89), and I also believe that much of the change is captured in the irregular component, but the text would benefit from some clarification of these choices.
4. Simulations: If I understood correctly, the authors used simulated data from several models (and in one case ran the simulations themselves). Did the authors check the different calibration/evaluation/test periods for all the models, or use an overlapping period? Did the authors use only what was classified as the test period? My main concern is that, in the model comparison, the authors might be using streamflow simulated during the test period for some models and during the calibration period for others, or even mixing single-basin and regional simulations. I see no problem in using different settings, but this needs to be reported in detail and discussed in the results. For example, I have the feeling that for the PREVAH-CH simulations the authors might have used the whole simulation (including calibration) rather than only the evaluation period (I might be wrong). My suggestion is to review these aspects and incorporate this information in the manuscript.
5. Regarding Figure 3 (and also L275 onwards): were the models that performed better in highly seasonal catchments the ones with the lowest overall performance, or is that just my impression? I think you should discuss this further, perhaps by showing the median performances (e.g., in a box plot in an appendix), to clarify whether the models that do better in seasonal catchments are simply those with poor overall performance. Also related to point 4: how were these simulations obtained by the original authors? Did they report them as the evaluation phase, or are they actually for the calibration period? This would be worth clarifying for the readers.
6. L328-332: This may need to be rephrased after reviewing points 4 and 5.
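Point 5 asks whether the models that do better in highly seasonal catchments are simply those with poor overall performance. One way to check this, sketched here in Python/NumPy under assumed inputs, is to compare each model's median NSE over all catchments with its median interannual NSE in highly seasonal catchments using a rank correlation. The function names, the dictionary input format, and the scores in the example are all hypothetical and are not taken from the preprint; the simple rank correlation also assumes there are no tied medians.

```python
import numpy as np


def median_scores(scores_by_model):
    """Median per-catchment score for each model, ignoring missing (NaN) values.

    scores_by_model maps a model name to a sequence of scores, e.g. one NSE per
    catchment. The dictionary layout is a hypothetical choice for this sketch.
    """
    return {name: float(np.nanmedian(np.asarray(s, dtype=float)))
            for name, s in scores_by_model.items()}


def spearman_no_ties(x, y):
    """Spearman rank correlation, assuming neither sequence contains ties."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


def compare_models(overall_scores, seasonal_interannual_scores):
    """Rank correlation between each model's median overall score and its median
    interannual score in highly seasonal catchments.

    A strongly negative value would support the concern that the models doing
    best in seasonal catchments are those with the weakest overall performance.
    """
    overall = median_scores(overall_scores)
    seasonal = median_scores(seasonal_interannual_scores)
    models = sorted(overall.keys() & seasonal.keys())  # models present in both
    return spearman_no_ties([overall[m] for m in models], [seasonal[m] for m in models])


# Made-up scores for three hypothetical models, for illustration only.
overall_nse = {"model_a": [0.8, 0.7, 0.9], "model_b": [0.5, 0.6, 0.4], "model_c": [0.3, np.nan, 0.2]}
interannual_nse_seasonal = {"model_a": [0.1, 0.0], "model_b": [0.3, 0.2], "model_c": [0.5, 0.6]}
print(compare_models(overall_nse, interannual_nse_seasonal))
```

For these made-up numbers the rank correlation is -1, the pattern the referee suspects. With only 18 models a single coefficient is coarse, so the per-model box plots suggested in point 5 would still be needed to show the full distributions behind the medians.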