How well do hydrological models simulate streamflow extremes and drought-to-flood transitions?

Muñoz-Castro, Eduardo; Anderson, Bailey J.; Astagneau, Paul C.; Swain, Daniel L.; Mendoza, Pablo A.; Brunner, Manuela I.

doi:10.5194/egusphere-2025-781


24 Mar 2025


Abstract. The impacts of floods can be enhanced when they occur shortly after drought. Models can be a useful tool to better understand the processes and mechanisms driving the response of floods occurring in close succession to streamflow drought. However, it remains unclear how well hydrological models capture these compound extreme events and which modeling decisions are most important for high model performance. To address this research gap, we calibrated four conceptual bucket-type hydrological models with different structures (GR4J, GR5J, GR6J, and TUW) for 63 catchments in Chile and Switzerland using different calibration strategies. We tested different configurations of the Kling-Gupta efficiency (KGE) formulation for model calibration to assess their performance in simulating and detecting observed transitions. We assessed the relative importance of different methodological choices including model structure, streamflow transformation, and KGE formulation and weights. We demonstrate that model performance as expressed by the KGE or NSE does not guarantee a good performance in terms of detecting streamflow extremes and their transitions. Furthermore, we show that a model's performance with respect to capturing extreme events primarily depends on how well it captures streamflow timing (i.e., correlation between observations and simulations) rather than other hydrological signatures or variables such as evapotranspiration or snow water equivalent. Our results also highlight that model structure and catchment characteristics as well as meteorological forcing play a key role in the detection of transitions. Specifically, we demonstrate that drought-to-flood transitions are more difficult to capture in semi-arid high-mountain catchments than in humid low-elevation catchments. Finally, our study provides guidelines for further model improvements with respect to drought-to-flood transitions, which can support process understanding related to these compound events, help identify regions prone to this type of event, and contribute to improved risk management, all of which will enhance preparedness.
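For readers unfamiliar with the calibration metric named in the abstract, the following is a minimal sketch of a weighted KGE with optional streamflow transformation. It uses the standard correlation, variability ratio, and bias ratio components; the way the weights enter the Euclidean distance, the `eps` offset that avoids division by zero in the inverse-flow case, and the reading of "average of both" as the mean of the two KGE values are assumptions made for illustration and may differ from the formulations tested in the preprint.

```python
import math
from statistics import fmean, pstdev


def pearson_r(obs, sim):
    """Linear correlation between observed and simulated series."""
    mo, ms = fmean(obs), fmean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    return cov / (so * ss)


def kge(obs, sim, weights=(1.0, 1.0, 1.0)):
    """Kling-Gupta efficiency with optional component weights.

    weights = (w_r, w_alpha, w_beta) is a hypothetical weighting scheme;
    (1, 1, 1) gives the unweighted KGE.
    """
    r = pearson_r(obs, sim)                 # timing
    alpha = pstdev(sim) / pstdev(obs)       # variability ratio
    beta = fmean(sim) / fmean(obs)          # bias ratio
    w_r, w_a, w_b = weights
    distance = math.sqrt(
        w_r * (r - 1.0) ** 2 + w_a * (alpha - 1.0) ** 2 + w_b * (beta - 1.0) ** 2
    )
    return 1.0 - distance


def kge_transformed(obs, sim, transform="Q", eps=0.01, weights=(1.0, 1.0, 1.0)):
    """KGE on raw flows ("Q"), inverse flows ("1/Q"), or the mean of both ("mean").

    eps is an assumed small offset that keeps 1/Q finite for zero flows.
    """
    kge_q = kge(obs, sim, weights)
    if transform == "Q":
        return kge_q
    kge_inv = kge(
        [1.0 / (o + eps) for o in obs], [1.0 / (s + eps) for s in sim], weights
    )
    if transform == "1/Q":
        return kge_inv
    if transform == "mean":
        return 0.5 * (kge_q + kge_inv)
    raise ValueError(f"unknown transform: {transform!r}")
```

A simulation that doubles every observed flow has perfect correlation but variability and bias ratios of 2, so its unweighted KGE is 1 - sqrt(2); setting the variability and bias weights to zero would score the same simulation as perfect, which illustrates how the weighting shifts what calibration rewards.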

Received: 19 Feb 2025 – Discussion started: 24 Mar 2025

Competing interests: One of the co-authors is a member of the editorial board of Hydrology and Earth System Sciences (HESS).

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 17766 KB)

Supplement (2749 KB)


Status: final response (author comments only)

RC1:
'Comment on egusphere-2025-781', Anonymous Referee #1, 22 Apr 2025
This manuscript presents a well-structured large-sample hydrology modeling experiment assessing the ability of four conceptual hydrological models (GR4J, GR5J, GR6J, TUW) to capture compound hydrological extremes, with a specific focus on drought-to-flood transitions. In the paper the authors examine the influence of various modeling decisions—model structure, calibration metrics, streamflow transformations, and weights—on model performance across 63 catchments in Chile and Switzerland.

The topic is relevant to the field of hydrology and fills some gaps in our understanding of model behavior under extreme events (drought-to-flood transitions), which are of growing concern in the context of climate change.

Hence, the paper deserves to be published in HESS after some minor corrections.

General comments

Most of the paragraphs (e.g., L20-L38, L295-310) would benefit from being shortened or split by idea. Generally speaking, introducing one idea per paragraph would improve the readability of the text. Currently, the paragraphs are somewhat hard to follow because of their length and the mix of ideas.

Three of the four models come from the GRXJ family, which means that model structure diversity is somewhat limited. Could you please better justify this choice in the text? Could you also explain the reasoning for not including another conceptual model structure besides the GRXJ models?

The paper is dense. Could you summarize your conclusions in a maximum of three or four bullet points? Much can be concluded from your study, but readers would benefit from a summary of the main conclusions in this part rather than everything. Consider, for example, returning to your hypotheses here.

Specific comments

Figure 1: It is difficult to distinguish the basin boundaries in subplots A and B. Consider reducing the line weight of the country boundaries in A and B, using another color for the basins, and increasing the size of subplots B, C, and D.

L128: I think this section would benefit from this reference:

Clerc-Schwarzenbach, F. M., Selleri, G., Neri, M., Toth, E., van Meerveld, I., and Seibert, J.: HESS Opinions: A few camels or a whole caravan?, EGUsphere [preprint], https://doi.org/10.5194/egusphere-2024-864, 2024.

In their study they show that most of the time using local information (as you did) can be beneficial for model simulations. If you feel that fits, please consider inserting it.

L329-333: I feel that this part should rather be placed in the discussion section.

Figure 8: The current choice of line colors and types makes it hard to distinguish among the different models. Please consider restructuring it to make it easier for readers.

Section 4.5: Start by introducing the figure, then make your statements. The current structure of the section is a bit confusing. I also see the possibility of having two paragraphs here rather than just one.

Section 4.6: Again, please start by introducing the figure, then you can make your statements.

L472-L473: Statement repetition. This idea has already been presented.

L501: Not Figure 10?

L523-L528: This idea has already been presented in the study area. Please consider keeping it just here in the discussion.
Citation: https://doi.org/10.5194/egusphere-2025-781-RC1
- AC1: 'Reply on RC1', Eduardo Muñoz-Castro, 19 Jul 2025
  
  We thank Referee #1 for reviewing our manuscript and providing constructive feedback. Our responses to each point raised are attached as a supplement.
  
  Citation: https://doi.org/10.5194/egusphere-2025-781-AC1
RC2:
'Comment on egusphere-2025-781', Wouter Knoben, 09 May 2025

Summary:
This paper presents a comprehensive analysis of the use of the Kling-Gupta Efficiency to configure conceptual models for accurate simulation of droughts, floods, and the transitions from droughts to floods. The authors use 63 basins, 4 conceptual models, 5 different KGE formulations, 3 streamflow transformations (Q, 1/Q, average of both), and 4 different weights for each KGE's variability component. They investigate (1) the performance of the models for extreme event detection with the Critical Success Index (CSI); (2) the simulation of various hydrologic states and fluxes (streamflow, evapotranspiration, snow water equivalent, soil moisture, baseflow) with the Nash-Sutcliffe Efficiency; (3) the correlation between catchment attributes and CSI values; (4) the relative importance of different parameter values; as well as (5) the relative importance of all tested factors in terms of CSI through an ANOVA approach. Conclusions suggest (1) that higher KGE scores for streamflow simulation do not necessarily imply good performance on other variables (such as ET or SWE) or other metrics (such as CSI and NSE); and (2) that model structure, catchment attributes and forcing data quality are key controls on our ability to accurately simulate extreme events.
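As a rough illustration of the event-detection score mentioned in the summary, the sketch below computes a Critical Success Index, CSI = hits / (hits + misses + false alarms), from lists of observed and simulated event dates. How events are defined and how simulated events are paired with observed ones is not described on this page, so the integer time indices, the `tolerance` window, and the greedy nearest-match rule are all assumptions chosen for illustration.

```python
def critical_success_index(obs_events, sim_events, tolerance=0):
    """CSI from event time indices (e.g., day numbers of event onsets).

    A simulated event counts as a hit if it lies within `tolerance` time
    steps of an observed event; each simulated event can match only once.
    The greedy nearest-match rule is an assumption, not the preprint's method.
    """
    observed = sorted(set(obs_events))
    unmatched_sim = sorted(set(sim_events))
    hits = 0
    for t in observed:
        candidates = [s for s in unmatched_sim if abs(s - t) <= tolerance]
        if candidates:
            nearest = min(candidates, key=lambda s: abs(s - t))
            unmatched_sim.remove(nearest)
            hits += 1
    misses = len(observed) - hits
    false_alarms = len(unmatched_sim)
    total = hits + misses + false_alarms
    # With no events in either series the score is undefined.
    return hits / total if total else float("nan")
```

With observed events on days 10, 50, and 90 and simulated events on days 12, 55, and 200, a tolerance of 3 days gives one hit, two misses, and two false alarms (CSI = 0.2), while a tolerance of 5 days gives two hits, one miss, and one false alarm (CSI = 0.5). The score is therefore sensitive to timing errors, which is consistent with the abstract's emphasis on streamflow timing.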
Assessment:
Dear authors,
I have completed my review of your paper. It is comprehensive and interesting (as well as ambitious), but I believe further work is needed before this manuscript can be published. I will upload a PDF with individual comments, and attempt to summarize what I consider most important here.
[1] First, let me say that I recognize the sheer amount of work being presented here. However, this can be as much a weakness as a strength. Right now my feeling is that this manuscript is trying to do too much at once. The analysis is very complex, the details don't always get the attention that I think they need (I tried to highlight these instances in the PDF), and part of the methodology seems only explained in the Results sections. You are asking quite a lot of the reader throughout, and I wonder if a more streamlined and focused manuscript wouldn't present a clearer, more easily digestible message. In the end it is of course the authors' choice what to present but I wanted to point out that I found the manuscript quite difficult to follow at times due to the sheer complexity of the work being presented.
[2] Second, I think there are some technical questions/concerns that should be addressed.
[2a] Primarily, I think there are reasons to believe that some of the conclusions about the importance of model structure may be artifacts of improper calibration of the GR5J and GR6J models. It is my understanding that GR5J is so close to GR4J in terms of structure that it can actually become GR4J if its X5 parameter is set to 0. The GR6J case is less clear to me, but looking at the model schematics side-by-side, it should be rather close to GR4J in terms of capabilities. However, Figure 4 shows very large differences in the calibration KGE scores between GR4J on the one hand (these are higher), and GR5J and GR6J on the other (these scores are much lower). This suggests that something has not gone quite right during the calibration of GR5J and GR6J: if GR4J is strictly a subset of GR5J, then the calibration should be able to find those parameter sets that make GR5J (that currently has the much lower scores) mimic GR4J (which currently has the much higher scores). A similar argument could (should?) apply to GR6J. My first guess would be that the ranges for parameters X5 and X6 were too restrictive, but I could not find the parameter ranges that were used for calibration in the document and thus cannot confirm this. To me, this aspect of the study needs considerable attention because it affects all the analysis that comes after.
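The nesting argument in this comment can be written compactly. The notation below is introduced here for illustration only, and the premise that GR5J can reproduce GR4J exactly is the referee's understanding, which is not verified on this page. The inequality concerns the best attainable calibration score, so it says nothing about what a particular optimizer run or the evaluation period will deliver.

```latex
% Assumed notation: \Theta_4 and \Theta_5 are the calibrated parameter
% spaces of GR4J and GR5J; Q_4 and Q_5 are the simulated streamflow series.
% Premise (the referee's): for every \theta_4 there is a value x_5^{*}
% inside the calibration bounds such that
\[
  Q_5(\theta_4, x_5^{*}) = Q_4(\theta_4)
  \quad \text{for all } \theta_4 \in \Theta_4 .
\]
% Under that premise, the optimum over the larger space cannot be lower:
\[
  \max_{\theta_5 \in \Theta_5} \mathrm{KGE}\bigl(Q_5(\theta_5)\bigr)
  \;\ge\;
  \max_{\theta_4 \in \Theta_4} \mathrm{KGE}\bigl(Q_4(\theta_4)\bigr).
\]
```

If the premise holds and calibrated GR5J scores are nevertheless much lower, the gap points to the search (bounds or optimizer convergence) rather than to the model structure, which is the concern raised above.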
[2b] I'm not fully convinced of the value of adding precipitation and temperature correction factors to the calibration procedure. I think there is a real risk that this makes the calibration problem (even more) poorly constrained by increasing equifinality during parameter selection, and the benefits of adding these parameters are unclear to me. These two parameters are mainly used to support the conclusion that more accurate forcing data is helpful, but I think this is obvious. The drawbacks of adding these parameters (letting the calibration procedure compensate for other shortcomings in the modelling chain by adjusting the model inputs; i.e. getting the right results for the wrong reasons) are insufficiently discussed in the manuscript.
[2c] As far as I can tell, many (all?) of the comparisons between different setups are based on visual assessment of the figures shown in the manuscript. I believe the use of statistical tests would be more appropriate to quantify the extent to which decision X leads to different modeling outcomes than decision Y.
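One way to make such a comparison quantitative is a paired test on per-catchment scores. The comment does not name a specific test, so the sketch below is only one illustrative option: a two-sided paired sign-flip permutation test on the mean difference, using the Python standard library. The function name, the number of resamples, and the fixed seed are assumptions, and this is not a description of what the authors did.

```python
import random
from statistics import fmean


def paired_permutation_test(scores_x, scores_y, n_resamples=10000, seed=0):
    """Two-sided p-value for the mean paired difference between two setups.

    scores_x[i] and scores_y[i] are the scores (e.g., CSI) of setups X and Y
    for the same catchment i. Under the null hypothesis of no systematic
    difference, the sign of each paired difference is exchangeable.
    """
    if len(scores_x) != len(scores_y):
        raise ValueError("scores must be paired per catchment")
    diffs = [x - y for x, y in zip(scores_x, scores_y)]
    observed = abs(fmean(diffs))
    rng = random.Random(seed)  # fixed seed for reproducibility
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        # Small tolerance guards against floating-point ties.
        if abs(fmean(flipped)) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_resamples + 1)
```

Pairing by catchment matters because between-catchment variability in scores is typically much larger than the difference between setups; an unpaired comparison would bury the effect in that spread.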
[3] Finally, I think some of the statements made in the paper are too general. In particular, the first conclusion presented in Section 6 is: "Drought events are better captured by hydrological models than flood events." This is, in my opinion, too bold of a claim for the results shown in this work and needs some sort of mention about the study constraints this conclusion is valid within (number of models, types, number of basins, locations, etc.). I believe this applies to a few other statements in the paper and tried to highlight these in the PDF.
I hope these comments are helpful in some way.
Kind regards,

Wouter Knoben

Citation: https://doi.org/10.5194/egusphere-2025-781-RC2
- AC2: 'Reply on RC2', Eduardo Muñoz-Castro, 19 Jul 2025
  
  We thank Referee #2 for reviewing our manuscript and providing constructive feedback. Our responses to each point raised are attached as a supplement.
  
  Citation: https://doi.org/10.5194/egusphere-2025-781-AC2


Supplement

https://doi.org/10.5194/egusphere-2025-781-supplement

Data sets

Implementation of four conceptual rainfall-runoff models to simulate drought-to-flood transitions in Chile and Switzerland. Eduardo Muñoz-Castro, Bailey J. Anderson, and Manuela I. Brunner. https://doi.org/10.5281/zenodo.14803501



Short summary

Flood impacts can be enhanced when they occur after droughts, yet the effectiveness of hydrological models in simulating these events remains unclear. Here, we calibrated four conceptual hydrological models across 63 catchments in Chile and Switzerland to assess their ability to detect streamflow extremes and their transitions. We show that drought-to-flood transitions are more difficult to capture in semi-arid high-mountain catchments than in humid low-elevation catchments.

