Impact of bias adjustment strategy on ensemble projections of hydrological extremes

Astagneau, Paul C.; Wood, Raul R.; Vrac, Mathieu; Kotlarski, Sven; Vaittinada Ayar, Pradeebane; François, Bastien; Brunner, Manuela I.

https://doi.org/10.5194/egusphere-2024-3966

03 Mar 2025

Abstract. Hydrological climate change impact studies typically rely on hydrological projections generated by hydrological models driven with bias-adjusted climate simulations. Such hydrological projections are influenced by internal climate variability, which can mask the emergence of robust climate trends. To account for this internal variability in climate projections, single model initial-condition large ensembles (SMILEs) can be employed. SMILEs are generated by running a single global/regional climate model many times with slightly perturbed initial conditions. However, it remains challenging to select an appropriate bias adjustment method for SMILEs used in hydrological impact studies, because this requires weighing the importance of inter-variable dependence against the preservation of both climate variability and the climate change signal. To facilitate such selection, we compare different bias adjustment methods applied to SMILEs and their effect on hydrological impact assessments. Specifically, we investigate how climate and hydrological extremes change in 87 catchments in the Swiss Alps when using (a) univariate vs. bivariate, (b) ensemble vs. individual-member, and (c) change-preserving vs. non-change-preserving bias adjustment methods. To do so, we adjust the biases of a 50-member SMILE with the different adjustment methods and use the adjusted simulations to drive a hydrological model that simulates and projects high and low flows. Our comparison shows (1) no clear benefits from using bivariate instead of univariate bias adjustment methods when the SMILE already efficiently simulates the dependence between temperature and precipitation; (2) that the choice of using ensemble vs. individual-member and change-preserving vs. non-change-preserving bias adjustments leads to large differences in temperature, precipitation and streamflow signal-to-noise ratios and streamflow and precipitation time-of-emergence. These influences need to be considered when selecting an appropriate bias adjustment method for a given application. Based on our comparison, we generally recommend to apply change-preserving and ensemble bias adjustment methods in future hydrological impact studies using SMILEs.
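The three method contrasts named in the abstract can be illustrated with a short sketch. The Python code below is not the authors' implementation; it assumes a continuous variable such as daily temperature, plain empirical quantile mapping (QM) as the non-change-preserving method, additive quantile delta mapping (QDM, Cannon et al., 2015) as one example of a change-preserving method, and it reads "ensemble" adjustment as fitting one transfer function to all members pooled and "individual-member" adjustment as fitting one transfer function per member. The helper names (`empirical_qm`, `qdm_additive`, `adjust_members`) are hypothetical, wet-day handling for precipitation is omitted, and bivariate methods are not shown.

```python
import numpy as np


def empirical_qm(obs_ref, mod_ref, mod_target, n_q=101):
    """Empirical quantile mapping (not change-preserving).

    Builds a transfer function from reference-period model quantiles to
    observed quantiles and applies it to mod_target.
    """
    probs = np.linspace(0.0, 1.0, n_q)
    q_mod = np.quantile(mod_ref, probs)
    q_obs = np.quantile(obs_ref, probs)
    # values outside the reference range are clamped to the end points
    return np.interp(mod_target, q_mod, q_obs)


def qdm_additive(obs_ref, mod_ref, mod_fut):
    """Additive quantile delta mapping (change-preserving, e.g. for temperature)."""
    n = mod_fut.size
    # non-exceedance probability of each future value within the future model sample
    tau = (np.argsort(np.argsort(mod_fut)) + 0.5) / n
    # raw model change at that quantile, which is carried over to the output
    delta = mod_fut - np.quantile(mod_ref, tau)
    return np.quantile(obs_ref, tau) + delta


def adjust_members(obs_ref, ens_ref, ens_target, pooled):
    """ens_ref, ens_target: arrays of shape (n_members, n_times)."""
    if pooled:
        # ensemble adjustment: one transfer function fitted on all members together
        pooled_ref = ens_ref.ravel()
        return np.stack([empirical_qm(obs_ref, pooled_ref, m) for m in ens_target])
    # individual-member adjustment: one transfer function per member
    return np.stack([empirical_qm(obs_ref, r, t) for r, t in zip(ens_ref, ens_target)])


rng = np.random.default_rng(0)
obs = rng.normal(10.0, 2.0, 3000)      # synthetic "observations"
mod_ref = rng.normal(12.0, 3.0, 3000)  # biased model, reference period
mod_fut = rng.normal(14.0, 3.0, 3000)  # same model, warmer future period

qdm_fut = qdm_additive(obs, mod_ref, mod_fut)
raw_change = mod_fut.mean() - mod_ref.mean()
qdm_change = qdm_fut.mean() - obs.mean()

# three members whose reference climates differ slightly (internal variability)
offsets = np.array([-0.5, 0.0, 0.5])
ens_ref = 12.0 + offsets[:, None] + rng.normal(0.0, 3.0, (3, 3000))
member_means_ind = adjust_members(obs, ens_ref, ens_ref, pooled=False).mean(axis=1)
member_means_ens = adjust_members(obs, ens_ref, ens_ref, pooled=True).mean(axis=1)
```

In this synthetic setup, the QDM-adjusted mean change stays close to the raw model change, individual-member adjustment pulls all reference-period member means onto the observed mean, and pooled ensemble adjustment retains part of the inter-member spread.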

Received: 16 Dec 2024 – Discussion started: 03 Mar 2025

Competing interests: Manuela Brunner is an associate editor of HESS. The authors declare no other competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

23 Oct 2025

Impact of bias adjustment strategy on ensemble projections of hydrological extremes

Paul C. Astagneau, Raul R. Wood, Mathieu Vrac, Sven Kotlarski, Pradeebane Vaittinada Ayar, Bastien François, and Manuela I. Brunner

Hydrol. Earth Syst. Sci., 29, 5695–5718, https://doi.org/10.5194/hess-29-5695-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-3966', Faranak Tootoonchi, 23 Apr 2025
This paper is very well-written and highly relevant. The assessment of the impact of bias-adjustment techniques on SMILEs is both timely and novel. The authors have clearly put significant effort into considering important steps for bias adjustment. The results section is thorough and addresses all the proposed research questions and even goes beyond them.
I have a few minor remarks:
In my view, the paper is too long and requires multiple rounds of thorough reading to absorb all the information. I understand it is not easy to shorten a paper like this or to present it in a simpler way. Nonetheless, I encourage the authors to read the paper again and see whether more plots can be moved to the supplement and whether some parts of the results section can be summarized. I think certain plots from the historical analysis can be omitted. Figure 7 is particularly difficult to interpret, and I am not entirely sure I understood its key point. I could have skipped Figure 10 and limited the plots to what is shown for runoff in Figure 11. But even Figure 11 is challenging to grasp, as it represents the final output of multiple subtractions. Again, I understand that it is not easy to shorten this paper, but I think doing the laborious work of summarizing it would help readability.

The authors did not find significant benefits of the multivariate bias adjustment method compared to the univariate approaches, and I find this result reasonable. They attribute this outcome to the well-preserved correlation in this particular SMILE. In my view, the relatively low P–T correlation in the observational data (Figure 4b) also contributed to this result, as there was no strong correlation that needed to be preserved. When the correlation is weak, bias adjustment for separate months may be sufficient to maintain a reasonable dependence between precipitation and temperature. In such cases, I would argue that preserving temporal order might be more important. Ultimately, I would recommend that impact modelers evaluate whether correlation (or even chronology) is important for their specific application and choose a simple method that adjusts just enough, but not more. If the authors agree with this point, I suggest including it in the final discussion and recommendations.
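The point about weak P–T correlation can be made concrete. Univariate quantile mapping applied separately to precipitation and temperature within one calendar month is a strictly increasing transform of each variable (for continuous data without ties, within the calibration range), so it leaves each variable's ranks, and therefore the rank (Spearman) correlation between them, unchanged: the adjusted data inherit the model's dependence rather than the observed one. The sketch below checks this on synthetic data; the helper names are hypothetical, and dry days are ignored, since their ties would make the argument only approximate.

```python
import numpy as np


def ranks(x):
    # 0-based ranks; assumes no ties, which holds for continuous synthetic data
    return np.argsort(np.argsort(x))


def spearman(x, y):
    # Spearman rank correlation = Pearson correlation of the ranks
    return np.corrcoef(ranks(x), ranks(y))[0, 1]


def empirical_qm(obs_ref, mod_ref, mod_target, n_q=101):
    # univariate empirical quantile mapping, fitted separately per variable
    probs = np.linspace(0.0, 1.0, n_q)
    return np.interp(mod_target, np.quantile(mod_ref, probs), np.quantile(obs_ref, probs))


rng = np.random.default_rng(42)
n = 2000  # e.g. daily values of one calendar month pooled over many years

# hypothetical model output: warmer days tend to be drier (weak negative dependence)
t_mod = rng.normal(15.0, 3.0, n)
p_mod = np.exp(1.0 - 0.05 * (t_mod - 15.0) + rng.normal(0.0, 0.5, n))

# hypothetical observations with different marginal distributions
t_obs = rng.normal(13.0, 2.0, n)
p_obs = rng.gamma(2.0, 2.0, n)

t_adj = empirical_qm(t_obs, t_mod, t_mod)
p_adj = empirical_qm(p_obs, p_mod, p_mod)

rho_raw = spearman(t_mod, p_mod)
rho_adj = spearman(t_adj, p_adj)
```

Because only the marginal distributions change, a bivariate method would mainly add value where this inherited rank dependence departs from the observed one.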

In Section 2.4 (evaluation), would it help to have a table listing all the indicators you evaluated, separately for P, T and Q, and for the present and future?

Specific comments:
L3: You can remove 'this' from 'this internal variability'.
L136: Mention what the five setups are and then, in the title of Table 1, mention that the combinations in the last two columns encompass five bias adjustment setups.
L171: Why not the dependence?
L185-186: The sentence here is somewhat a repetition of L180-182.
L219 and then L253: Why are P1 and P2 introduced in the text but not used in any part of the results? It is true that you want to cross-validate, but if the results are shown all together, is it really necessary to introduce an abbreviation? And, considering what is mentioned in the text, why is Figure 3 only for one sub-period? Why not show it for the entire historical period? And what is efficiency in this figure?
Does it make sense to already mention in L219 what is later mentioned in L253? And did I understand correctly that you call the runoff simulation obtained through this joint combination a 'control run'? If so, please mention this already in the text. I had some difficulty understanding which period Figure 2 is showing.
L233: Change 'however' to 'instead'. The whole of L233-238 also requires some rewriting: it reads more like a statement than a description of what has been done in the paper.
L249: The term ‘use’ is unclear to me. It is unclear ‘how’ you evaluated it.
L259: the term ‘signal’ is unclear to me. Do you mean the difference between averages?
L265: Remove 'second'. There are two 'firsts' in the previous paragraph, so it is unclear which 'first' this 'second' follows. I would personally have rephrased the previous paragraph to avoid those 'firsts'.
L271: Remove the time-of-emergence and join the two sentences.
L275: Until here it was not mentioned that you would look at groups of catchments with different elevation levels (or did I miss it?). Cool that you did. But would it make sense to bring this up earlier in the text and to group the catchments in Figure 1 into the three elevation categories, to signal this to the reader?
L331-332: Doesn't this belong in another section rather than in the results?
Figure 7: Unfortunately, I did not understand Figure 7 and its aim after many tries. If it is not only me, please consider rewriting the section and re-visualizing the figure, or removing the plot and the text altogether.
L375-376: Somewhat repeats the beginning of the section in L355.
L414: I think 'setups' is better than 'methods', since not all items mentioned in the parentheses are methods.
Figure 11 is slightly complicated. Instead of showing the subtractions can you show the actual boxplots separately for each of the pairs?
L434-446: This part and Figure 12 are very interesting. However, I think some parts of the text belong in the discussion. I would have loved to see a plot similar to Figure 9 but for runoff, to see how the methods behave for all simulated runoff components in the catchments.
L514: Unclear what strategies mean here.
L525: Cite the plot for precipitation.
L558: I agree that change preservation is inherently more in line with the aim of future impact studies. But I slightly disagree with the rest of this paragraph: apart from showing the same performance for precipitation, the combination of the change-preserving and individual bias adjustment strategies resulted in a very different high-flow signal for the Saltina at Brig compared with the other setups (Figure 12). One might argue that the 99th percentile is too extreme, but then essentially all methods are more or less similar for moderate or moderately extreme percentiles. Based on your results, your third point sounds more concrete to me. So my suggestion is to swap the second and third points and to use an even more cautious tone in suggesting the second point.
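The comments on L259 and L271 above hinge on how "signal", "noise", signal-to-noise ratio and time of emergence are computed. One common set of definitions, assumed here and not taken from the manuscript, is: the signal is the ensemble-mean relative change of an annual indicator between a reference and a future window, the noise is the standard deviation of that change across members, and the time of emergence is the first window end-year after which |signal/noise| stays above 1. The Python sketch below uses hypothetical function names and synthetic data.

```python
import numpy as np


def relative_change(annual, years, ref, fut):
    """Per-member relative change (%) of the window mean between two periods.

    annual: (n_members, n_years) annual indicator, e.g. the annual 99th flow percentile.
    """
    in_ref = (years >= ref[0]) & (years <= ref[1])
    in_fut = (years >= fut[0]) & (years <= fut[1])
    base = annual[:, in_ref].mean(axis=1)
    return 100.0 * (annual[:, in_fut].mean(axis=1) - base) / base


def signal_to_noise(changes):
    signal = changes.mean()      # ensemble-mean change
    noise = changes.std(ddof=1)  # spread of the change across members
    return signal / noise


def time_of_emergence(annual, years, ref, window=30, threshold=1.0):
    """First window end-year after which |SNR| stays above the threshold (None if never)."""
    ends = years[years >= ref[1] + window]  # future windows that do not overlap ref
    snr = np.array([
        signal_to_noise(relative_change(annual, years, ref, (end - window + 1, end)))
        for end in ends
    ])
    above = np.abs(snr) > threshold
    for i, end in enumerate(ends):
        if above[i:].all():
            return int(end)
    return None


rng = np.random.default_rng(1)
years = np.arange(1950, 2100)
n_members = 50
noise = rng.normal(0.0, 10.0, (n_members, years.size))
trend = 0.5 * np.clip(years - 2000, 0, None)  # synthetic forced change after 2000
with_trend = 100.0 + trend + noise
without_trend = 100.0 + noise

ref = (1961, 1990)
toe_trend = time_of_emergence(with_trend, years, ref)
toe_flat = time_of_emergence(without_trend, years, ref)
snr_end = signal_to_noise(relative_change(with_trend, years, ref, (2070, 2099)))
```

Expressing changes in percent, as here, makes the noise a standard deviation of relative changes rather than a coefficient of variation of the raw indicator, so it helps to state explicitly which of the two a figure shows.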
Citation: https://doi.org/10.5194/egusphere-2024-3966-RC1
- AC1: 'Reply on RC1', Paul C. Astagneau, 08 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3966/egusphere-2024-3966-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-3966-AC1
CC1:
'Comment on egusphere-2024-3966', Thomas Bosshard, 02 Jun 2025

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3966/egusphere-2024-3966-CC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2024-3966-CC1
- RC3:
  'Reply on CC1', Thomas Bosshard, 09 Jun 2025
  
  I reposted my review under RC2 and thus, the comment CC1 can be disregarded. I apologize for any confusion caused by having submitted it first under the wrong type.
  
  Citation: https://doi.org/10.5194/egusphere-2024-3966-RC3
  - AC4: 'Reply on RC3', Paul C. Astagneau, 08 Jul 2025
    
    Thank you for the clarification. We replied to RC2.
    
    Citation: https://doi.org/10.5194/egusphere-2024-3966-AC4
RC2:
'Comment on egusphere-2024-3966', Thomas Bosshard, 09 Jun 2025

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3966/egusphere-2024-3966-RC2-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2024-3966-RC2
- AC2: 'Reply on RC2', Paul C. Astagneau, 08 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3966/egusphere-2024-3966-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-3966-AC2
RC4:
'Comment on egusphere-2024-3966', Anonymous Referee #3, 16 Jun 2025

The paper presents a performance and sensitivity analysis of different climate model bias correction methods and their effect on hydrological simulations of streamflow extremes (low and high flow). By comparing various existing methods across a range of basins in Switzerland, the paper is able to draw useful conclusions about the performance of existing bias correction methods for simulating streamflow in a historical period, and about the sensitivity of future streamflow projections to different bias correction methods.
The methodology is well designed, the paper is clearly structured, and the analysis is comprehensive, with different parts of the analysis logically connecting to one another. My comments are as follows.
1. The abstract should be improved, both in terms of clarity and in terms of better capturing the relevant conclusions of the study. A few pointers:
-line 12: "no clear benefits from using bivariate instead of univariate bias adjustment methods when the SMILE already efficiently simulates the dependence between temperature and precipitation". I wonder how robust/general this conclusion is. Wouldn't independent (univariate) bias correction potentially alter the dependence between variables?
-lines 15 and 16: not clear what is meant by "precipitation and streamflow signal-to-noise ratios" and by "streamflow and precipitation time-of-emergence". This only becomes clear after reading the paper.
-line 17: "we generally recommend to apply change-preserving and ensemble bias adjustment methods in future hydrological impact studies using SMILEs". To make the abstract more informative, it would be good to clarify in the abstract how this conclusion was reached. The abstract says that there are large differences between bias-correction methods, but does not specify why some methods are preferred over others.
-the abstract could also mention shortcomings identified in existing methods, i.e. which improvements are necessary based on the findings in this study. The need for more research into bias correction methods is mentioned in sections 4.3 and 4.4, but without identifying which improvements are needed, even though the detailed evaluation in this study presumably provided some useful insights on this.
2. The conclusions section (section 5) should be improved: it seems to largely focus on precipitation and temperature rather than streamflow.
3. The limitations and perspectives section (section 4.4) is currently very short. Several issues identified in the comments here could potentially be addressed in this section.
4. Basin selection (section 2.1): basins with glaciers are excluded from the analysis because the hydrological model does not account for glaciers. It would be good to come back to this in the discussion, i.e. how relevant are the results and conclusions for basins with glaciers, as these regions are especially vulnerable to climate change.
5. One of the conclusions is that differences between bias correction methods are significant. One wonders whether these differences are still significant when considering all other uncertainties in the climate change modeling chain (data errors, model errors, forcing/scenario uncertainties...). Some discussion/reflection on this would be welcome.
6. Model errors: evaluation of the hydrological model against streamflow observations is reported in terms of KGE, which gives an indication of overall model performance (across all flow levels). Since the paper focuses on flow extremes, it would be good to know how the model performs in terms of reproducing the flow quantiles studied in figure 2 and later figures (i.e. the 1%, 50% and 99% annual flow quantiles). For example, this can help put the differences between bias correction methods into perspective.
7. Data errors: "observations" of precipitation and temperature are based on gridded (interpolated) meteorological station data, which are used as benchmark ('ground-truth') in this paper (line 155). To what extent do bias and noise in these data affect the results? For example, typical sources of bias are the under-catch of precipitation gauges (especially for snow) and the absence of stations at high elevations.
8. Evaluation: for evaluation of the bias correction methods, the authors adopt a method presented by Suarez-Gutierrez et al. 2021; specifically, they quantify the fraction of observations that fall in the 75% ensemble confidence interval. Note that those same authors also look at other aspects, e.g. they suggest making a rank histogram, which should look uniform (see their figure 1). A cdf version of the same idea is in Laio et al. 2007 (figure 2 in https://hess.copernicus.org/articles/11/1267/2007/). It seems this would allow for a more complete evaluation of the ensembles. Can the authors comment on whether these methods are applicable here and why they were not considered?
9. Consistency in terminology: on line 136, we are introduced to "five bias adjustment setups". Later on, a distinction is made between 3 bias adjustment methods and 2 ensemble adjustment methods (e.g. figure 12), while figure 10 refers to these combinations as bias adjustment options. Would be good to be consistent and for example introduce the naming used in figure 10 from the start and use it consistently throughout the paper.
10. Line 185: "We run the adjustments at the grid scale rather than the catchment scale to avoid adding a downscaling step to the procedure, and because the catchments are of different sizes." The reasoning here is not clear to me, i.e. how does bias adjustment at the catchment scale add a downscaling step (compared to adjustment at grid scale followed by moving to catchment scale), and how does catchment size come into play?
11. Overall structure of the results section: even though this section already flows quite nicely, you could consider splitting up section 3.2 into two further sub-sections (precip/temp and streamflow), and using the same split in section 3.1 (precip/temp and streamflow). Currently, section 3.1 starts with streamflow, so opposite order of section 3.2. Not super crucial, but readability may improve by breaking up the results into smaller pieces and using consistent order in sections 3.1 and 3.2.
12. Figure 2: clarify what is meant by "control runs"; I know it is mentioned in the methodology section, but it should be clear from the figure caption. The figure axis should also make clear which variable we're looking at. I suggest renaming the bias adjustment method "raw" to "none" or "unadjusted". I assume these are box plots; it would be good to mention that explicitly. And which variability is captured by these box plots? Is it variability across the 87 basins?
13. Figure 2 and other figures focus on the 75% ensemble interval for streamflow. Why did you pick 75% and would your conclusions change if you pick another percentage? See also comment 8.
14. Why does figure 3 show results for one of the evaluation periods whereas figure 2 shows results for both? Also, the color bar title ("fraction of control runs") should make clear that we're looking at streamflow.
15. Figure 10: figure/axis title should make clear we're looking at precipitation. Same for figures 11 and 12, make sure the figure/axis title mentions 'streamflow'.
16. Figure 12: noise is expressed as %. Is this the coefficient of variation? The axis title calls it standard deviation?
17. Line 587: "ensemble adjustments combined with the change-preserving method are less efficient for the tails of the precipitation and temperature distributions in the historical period, probably because the raw change signals are small compared to the internal variability for many catchments". This is not clear. How does the climate change signal (second part of sentence) affect performance of the bias correction method in the historical period (first part of sentence)?
18. Line 128: biased --> bias
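Comment 6 notes that KGE summarises performance across all flow levels. The sketch below is a hypothetical illustration on synthetic daily flows: it computes the original KGE formulation (Gupta et al., 2009; the manuscript may use a variant) alongside the relative bias of the mean annual 1 %, 50 % and 99 % flow quantiles. The function names and the "model" that only misrepresents low flows are assumptions made for illustration.

```python
import numpy as np


def kge(sim, obs):
    """Kling-Gupta efficiency in its original form (Gupta et al., 2009)."""
    r = np.corrcoef(sim, obs)[0, 1]  # linear correlation
    alpha = sim.std() / obs.std()    # variability ratio
    beta = sim.mean() / obs.mean()   # bias ratio
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def annual_quantile_bias(sim, obs, years, q):
    """Relative bias (%) of the mean annual q-quantile of daily flow."""
    sim_q = np.array([np.quantile(sim[years == y], q) for y in np.unique(years)])
    obs_q = np.array([np.quantile(obs[years == y], q) for y in np.unique(years)])
    return 100.0 * (sim_q.mean() - obs_q.mean()) / obs_q.mean()


rng = np.random.default_rng(3)
years = np.repeat(np.arange(1991, 2021), 365)     # 30 synthetic years of daily data
obs = np.exp(rng.normal(0.0, 1.0, years.size))    # synthetic daily flows
sim = np.maximum(obs, np.quantile(obs, 0.05))     # hypothetical model: low flows too high

scores = {
    "KGE": kge(sim, obs),
    "Q1": annual_quantile_bias(sim, obs, years, 0.01),
    "Q50": annual_quantile_bias(sim, obs, years, 0.50),
    "Q99": annual_quantile_bias(sim, obs, years, 0.99),
}
```

In this example the KGE stays close to 1 although the mean annual 1 % quantile is strongly overestimated, while the 50 % and 99 % quantiles are unchanged, which is why quantile-specific errors can help put differences between bias adjustment setups into perspective.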

Citation: https://doi.org/10.5194/egusphere-2024-3966-RC4
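Comments 8 and 13 of RC4 concern how ensemble consistency with observations is measured. The sketch below contrasts two diagnostics on synthetic data: the fraction of observations inside the ensemble's central 75 % range (the approach the reviewer attributes to Suarez-Gutierrez et al., 2021) and a rank histogram, which should be roughly flat when observations are statistically indistinguishable from the members. The function names and synthetic ensembles are hypothetical, and the interval level is a parameter so that other percentages can be tried.

```python
import numpy as np


def interval_coverage(obs, ens, level=0.75):
    """Fraction of observations inside the central `level` range of the ensemble.

    obs: (n_times,) observed values; ens: (n_members, n_times) member values.
    """
    lower = np.quantile(ens, (1.0 - level) / 2.0, axis=0)
    upper = np.quantile(ens, 1.0 - (1.0 - level) / 2.0, axis=0)
    return np.mean((obs >= lower) & (obs <= upper))


def rank_histogram(obs, ens):
    """Counts of the observation's rank among the members (0 .. n_members)."""
    rank = (ens < obs[None, :]).sum(axis=0)  # ties are negligible for continuous data
    return np.bincount(rank, minlength=ens.shape[0] + 1)


rng = np.random.default_rng(7)
n_members, n_times = 50, 20000
obs = rng.normal(0.0, 1.0, n_times)
consistent = rng.normal(0.0, 1.0, (n_members, n_times))       # same distribution as obs
underdispersive = rng.normal(0.0, 0.4, (n_members, n_times))  # spread too small

cov_consistent = interval_coverage(obs, consistent)
cov_under = interval_coverage(obs, underdispersive)
hist_consistent = rank_histogram(obs, consistent)
hist_under = rank_histogram(obs, underdispersive)
outer_share_under = (hist_under[0] + hist_under[-1]) / n_times
```

For the consistent 50-member ensemble, the coverage comes out slightly below the nominal 75 % because the interval is estimated from a finite sample, and the rank histogram is close to flat; the underdispersive ensemble shows low coverage and a U-shaped histogram with many observations outside all members.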
- AC3: 'Reply on RC4', Paul C. Astagneau, 08 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3966/egusphere-2024-3966-AC3-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-3966-AC3

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-3966', Faranak Tootoonchi, 23 Apr 2025
This paper is very well-written and highly relevant. The assessment of the impact of bias-adjustment techniques on SMILEs is both timely and novel. The authors have clearly put significant effort into considering important steps for bias adjustment. The results section is thorough and addresses all the proposed research questions and even goes beyond them.
I have a few minor remarks:
In my view, the paper is too long and requires multiple rounds of thorough reading to absorb all the information. I understand it is not easy to cut a paper like this shorter or present it in a simpler way. Nonetheless, I encourage the authors to read the paper again and see whether more plots can be moved to the supplementary section and whether some parts of the result section can be summarized. I think certain plots from the historic analysis can be omitted. Figure 7 is particularly difficult to interpret and I am not entirely sure if I understood its key point. I could have skipped Figure 10 and limited the plots to what is shown for runoff in Figure 11. But even Figure 11 is challenging to grasp, as it represents the final output of multiple subtractions. Again, I understand that it is not too easy to cut this paper short but I think doing the laborious work of summarizing it, helps with readability.

The authors did not find significant benefits of the multivariate bias adjustment method compared to the univariate approaches, and I find this result reasonable. They attribute this outcome to the well-preserved correlation in this particular SMILE. In my view, the relatively low P–T correlation in the observational data (Figure 4b) also contributed to this result, as there was no strong correlation that needed to be preserved. When the correlation is weak, bias adjustment for separate months may be sufficient to maintain a reasonable dependence between precipitation and temperature. In such cases, I would argue that preserving temporal order might be more important. Ultimately, I would recommend that impact modelers evaluate whether correlation (or even chronology) is important for their specific application and choose a simple method that adjusts just enough, but not more. If the authors agree with this point, I suggest including it in the final discussion and recommendations.

In section 2.4 (evaluation) does it help to have a table with all the indicators you evaluated, separate for P, T and Q, present and future?

Specific comments:
L3: You can remove this from ‘this internal’ variability.
L136: Mention what the five setups are and then in table 1, in the title mention that the combinations in the last two columns encompasses five bias adjustment setups.
L171: Why not the dependence?
L185-186: The sentence here is somewhat a repetition of L180-182.
L219 and then L253: Why P1 and P2 are introduced in the text but are not used in any part of the result? True that you want to cross validate but if the results are shown all together, is it really necessary to introduce an abbreviation? And then considering what mentioned in the text why Figure 3 is only for one sub period? Why not to show it for the entire historic period? And what is efficiency in this figure?
Does it make sense to already mention in L219 what is later mentioned in L253? And Did I understand correctly that you name the runoff simulation through this joint combination control run? If it is so, please already mention it in the text. I had a bit of difficulty understanding what period Figure 2 is showing.
L233: Change however to instead. And the whole L233-238 requires some rewriting. The section sounds more like an statement rather than what has been done in the paper.
L249: The term ‘use’ is unclear to me. It is unclear ‘how’ you evaluated it.
L259: the term ‘signal’ is unclear to me. Do you mean the difference between averages?
L265: Remove second. There are two firsts in the previous paragraph. So it is unclear which first this second comes after. I would have personally rephrased the previous paragraph to avoid those firsts.
L271: Remove the time-of-emergence and join the two sentences.
L275: Until here it was not mentioned that you will look at groups of catchment with different elevation levels (or did I miss it?). Cool that you did. But does it make sense to already bring it up earlier in the text and group the catchments in Figure 1 based on the three categories of elevation, to signal this to the reader?
L331-332: Doesn’t this belong to any other section but not the result?
Figure 7: I unfortunately did not understand Figure 7 and its aim after many tries. If it is not only me, please consider both rewriting the section and re-visualizing it, or instead think of removing the plot and the text all together.
L375-376: Somewhat repeats the beginning of the section in L355.
L414: I think setup is better than methods. Not all mentioned in the parenthesis are methods.
Figure 11 is slightly complicated. Instead of showing the subtractions can you show the actual boxplots separately for each of the pairs?
L434-446: This part and Figure 12 is very interesting. However, I think some part of the text belong to discussion. I would have loved to see a plot similar to Figure 9 but for runoff just to see how the methods behave for all runoff simulated components in the catchments.
L514: Unclear what strategies mean here.
L525: Cite the plot for precipitation.
L558: I agree that change preserving is inherently more in line with the aim of future impact studies. But I slightly disagree with the rest of this paragraph: Apart from having the same performance for precipitation, combination of change preserving and individual bias adjustment strategy resulted in very different signal for high flow in Saltina at Brig compared to the rest (Figure 12). One might argue that 99^th percentile is too extreme, but then essentially all methods are more or less similar when it comes to moderate or moderately extreme percentiles. Based on your results, your third point sounds more concrete to me. So my suggestion is to reshuffle third and second point and use an even more cautious tone in suggesting second point.
Citation: https://doi.org/10.5194/egusphere-2024-3966-RC1
- AC1: 'Reply on RC1', Paul C. Astagneau, 08 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3966/egusphere-2024-3966-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-3966-AC1
CC1:
'Comment on egusphere-2024-3966', Thomas Bosshard, 02 Jun 2025

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3966/egusphere-2024-3966-CC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2024-3966-CC1
- RC3:
  'Reply on CC1', Thomas Bosshard, 09 Jun 2025
  
  I reposted my review under RC2 and thus, the comment CC1 can be disregarded. I apologize for any confusion caused by having submitted it first under the wrong type.
  
  Citation: https://doi.org/10.5194/egusphere-2024-3966-RC3
  - AC4: 'Reply on RC3', Paul C. Astagneau, 08 Jul 2025
    
    Thank you for the clarification. We replied to RC2.
    
    Citation: https://doi.org/10.5194/egusphere-2024-3966-AC4
RC2:
'Comment on egusphere-2024-3966', Thomas Bosshard, 09 Jun 2025

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3966/egusphere-2024-3966-RC2-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2024-3966-RC2
- AC2: 'Reply on RC2', Paul C. Astagneau, 08 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3966/egusphere-2024-3966-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-3966-AC2
RC4:
'Comment on egusphere-2024-3966', Anonymous Referee #3, 16 Jun 2025

The paper presents a performance and sensitivity analysis of different climate model bias correction methods and their effect on hydrological simulations of streamflow extremes (low and high flow). By comparing various existing methods across a range of basins in Switzerland, the paper is able to draw useful conclusions about the performance of existing bias correction methods for simulating streamflow in a historical period, and about the sensitivity of future streamflow projections to different bias correction methods.
The methodology is well designed, the paper is clearly structured, and the analysis is comprehensive, with different parts of the analysis logically connecting to one another. My comments are as follows.
1. The abstract should be improved, both in terms of clarity and in terms of better capturing the relevant conclusions of the study. A few pointers:
-line 12: "no clear benefits from using bivariate instead of univariate bias adjustment methods when the SMILE already efficiently simulates the dependence between temperature and precipitation". I wonder how robust/general this conclusion is. Wouldn't independent (univariate) bias correction potentially alter the dependence between variables?
-lines 15 and 16: not clear what is meant by "precipitation and streamflow signal-to-noise ratios" and by "streamflow and precipitation time-of-emergence". This only becomes clear after reading the paper.
-line 17: "we generally recommend to apply change-preserving and ensemble bias adjustment methods in future hydrological impact studies using SMILEs". To make the abstract more informative, it would be good to clarify in the abstract how this conclusion was reached. The abstract says that there are large differences between bias-correction methods, but does not specify why some methods are preferred over others.
-the abstract could also mention shortcomings identified in existing methods, i.e. which improvements are necessary based on the findings in this study. The need for more research into bias correction methods is mentioned in sections 4.3 and 4.4, but without identifying which improvements are needed, even though the detailed evaluation in this study presumably provided some useful insights on this.
2. The conclusions section (section 5) should be improved: it seems to largely focus on precipitation and temperature rather than streamflow.
3. The limitations and perspectives section (section 4.4) is currently very short. Several issues identified in the comments here could potentially be addressed in this section.
4. Basin selection (section 2.1): basins with glaciers are excluded from the analysis because the hydrological model does not account for glaciers. It would be good to come back to this in the discussion, i.e. how relevant are the results and conclusions for basins with glaciers, as these regions are especially vulnerable to climate change.
5. One of the conclusions is that differences between bias correction methods are significant. One wonders whether these differences are still significant when considering all other uncertainties in the climate change modeling chain (data errors, model errors, forcing/scenario uncertainties...). Some discussion/reflection on this would be welcome.
6. Model errors: evaluation of the hydrological model against streamflow observations is reported in terms of KGE, which gives an indication of overall model performance (across all flow levels). Since the paper focuses on flow extremes, it would be good to know how the model performs in terms of reproducing the flow quantiles studied in figure 2 and later figures (i.e. the 1%, 50% and 99% annual flow quantiles). For example, this can help put the differences between bias correction methods into perspective.
7. Data errors: "observations" of precipitation and temperature are based on gridded (interpolated) meteorological station data, which are used as benchmark ('ground-truth') in this paper (line 155). To what extent does bias and noise in these data affect the results? E.g. typical sources of bias are under-catch of precipitation gauge measurements (especially for snow) and the absence of stations at high elevations.
8. Evaluation: for evaluation of the bias correction methods, the authors adopt a method presented by Suarez-Gutierrez et al. 2021; specifically, they quantify the fraction of observations that fall in the 75% ensemble confidence interval. Note that the same authors also look at other aspects, e.g. they suggest making a rank histogram, which should look uniform (see their figure 1). A cdf version of the same idea is in Laio et al. 2007 (figure 2 in https://hess.copernicus.org/articles/11/1267/2007/). These approaches would allow a more complete evaluation of the ensembles. Can the authors comment on whether these methods are applicable here and why they were not considered?
9. Consistency in terminology: on line 136, we are introduced to "five bias adjustment setups". Later on, a distinction is made between 3 bias adjustment methods and 2 ensemble adjustment methods (e.g. figure 12), while figure 10 refers to these combinations as bias adjustment options. It would be good to be consistent, for example by introducing the naming used in figure 10 from the start and using it throughout the paper.
10. Line 185: "We run the adjustments at the grid scale rather than the catchment scale to avoid adding a downscaling step to the procedure, and because the catchments are of different sizes." The reasoning here is not clear to me, i.e. how does bias adjustment at the catchment scale add a downscaling step (compared to adjustment at grid scale followed by moving to catchment scale), and how does catchment size come into play?
11. Overall structure of the results section: even though this section already flows quite nicely, you could consider splitting up section 3.2 into two further sub-sections (precip/temp and streamflow), and using the same split in section 3.1 (precip/temp and streamflow). Currently, section 3.1 starts with streamflow, so opposite order of section 3.2. Not super crucial, but readability may improve by breaking up the results into smaller pieces and using consistent order in sections 3.1 and 3.2.
12. Figure 2: clarify what is meant by "control runs". I know it is mentioned in the methodology section, but it should be clear from the figure caption. Also, the figure axis should make clear which variable we're looking at. I suggest renaming the bias adjustment method "raw" to "none" or "unadjusted". I assume these are box plots; it would be good to mention that explicitly. And which variability is captured by these box plots? Is it variability across the 87 basins?
13. Figure 2 and other figures focus on the 75% ensemble interval for streamflow. Why did you pick 75% and would your conclusions change if you pick another percentage? See also comment 8.
14. Why does figure 3 show results for one of the evaluation periods whereas figure 2 shows results for both? Also, the color bar title ("fraction of control runs") should make clear that we're looking at streamflow.
15. Figure 10: figure/axis title should make clear we're looking at precipitation. Same for figures 11 and 12, make sure the figure/axis title mentions 'streamflow'.
16. Figure 12: noise is expressed as %. Is this the coefficient of variation? The axis title refers to it as standard deviation.
17. Line 587: "ensemble adjustments combined with the change-preserving method are less efficient for the tails of the precipitation and temperature distributions in the historical period, probably because the raw change signals are small compared to the internal variability for many catchments". This is not clear. How does the climate change signal (second part of sentence) affect performance of the bias correction method in the historical period (first part of sentence)?
18. Line 128: biased --> bias
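Comment 8 refers to the rank histogram that Suarez-Gutierrez et al. (2021) suggest as a complement to interval coverage. Below is a minimal Python sketch of how such a histogram can be computed. It assumes the ensemble is stored as an array of shape (time steps, members) aligned with a single observed series. The function name `rank_histogram` is hypothetical. Ties are not randomized, which is a simplification.

```python
import numpy as np


def rank_histogram(ensemble, observations):
    """Hypothetical helper: rank of each observation within its ensemble.

    ensemble: array of shape (n_times, n_members)
    observations: array of shape (n_times,)
    Returns counts for ranks 0..n_members (n_members + 1 bins).
    Ties are not broken randomly here (a simplifying assumption).
    """
    ensemble = np.asarray(ensemble, dtype=float)
    observations = np.asarray(observations, dtype=float)
    # Rank = number of members strictly below the observation at each time step
    ranks = np.sum(ensemble < observations[:, None], axis=1)
    n_members = ensemble.shape[1]
    # One bin per possible rank, including "below all" (0) and "above all" (n_members)
    return np.bincount(ranks, minlength=n_members + 1)
```

A roughly flat histogram suggests that the observations behave like one more ensemble member. A U shape points to an under-dispersive ensemble, and a dome shape to an over-dispersive one. A sloped histogram indicates a systematic bias.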

Citation: https://doi.org/10.5194/egusphere-2024-3966-RC4
- AC3: 'Reply on RC4', Paul C. Astagneau, 08 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3966/egusphere-2024-3966-AC3-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-3966-AC3

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (04 Aug 2025) by Thom Bogaard

AR by Manuela Irene Brunner on behalf of the Authors (07 Aug 2025): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (08 Aug 2025) by Thom Bogaard

RR by Faranak Tootoonchi (14 Aug 2025)

RR by Anonymous Referee #3 (30 Aug 2025)

Suggestions for revision or reasons for rejection

I appreciate the authors' responses and revisions. I have two remaining minor comments related to my previous comment #6 and #8.

-previous comment #6: "Since the paper focuses on flow extremes, it would be good to know how the model performs in terms of reproducing the flow quantiles studied in figure 2 and later figures (i.e. the 1%, 50% and 99% annual flow quantiles). For example, this can help put the differences between bias correction methods into perspective."

The authors responded that adding this is not worthwhile since "the performance of the hydrological model in simulating streamflows should not significantly impact the results".

I see the point, but since the paper focuses on flow extremes, it seems natural (and relatively straightforward) to check how well the model simulates these extremes (after all, the authors do report the overall KGE values, suggesting model performance is not completely irrelevant). And it can put the results in perspective (e.g. are differences in flow between bias correction methods significant compared to errors in flow simulation).
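The check requested here can be sketched in Python as follows. The sketch compares simulated and observed annual 1%, 50% and 99% daily flow quantiles. It assumes daily series of equal length with a year label for each value. Each annual quantile is averaged over all years before a relative error is computed. The function name and this aggregation choice are illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def annual_quantile_errors(q_obs, q_sim, years, probs=(0.01, 0.5, 0.99)):
    """Hypothetical helper: relative error (%) of simulated annual flow quantiles.

    q_obs, q_sim: daily streamflow series of equal length (same time steps)
    years: array giving the calendar year of each daily value
    For each year, the requested quantiles of daily flow are computed; these
    are averaged over all years and compared as 100 * (sim - obs) / obs.
    """
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    years = np.asarray(years)
    obs_q, sim_q = [], []
    for year in np.unique(years):
        mask = years == year
        obs_q.append(np.quantile(q_obs[mask], probs))
        sim_q.append(np.quantile(q_sim[mask], probs))
    # Multi-year mean of each annual quantile
    obs_mean = np.mean(obs_q, axis=0)
    sim_mean = np.mean(sim_q, axis=0)
    rel_err = 100.0 * (sim_mean - obs_mean) / obs_mean
    return {p: float(e) for p, e in zip(probs, rel_err)}
```

Errors of this kind could then be set beside the spread between bias adjustment methods, which is the comparison the referee has in mind.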

-previous comment #8: "for evaluation of the bias correction methods, the authors adopt a method presented by Suarez-Gutierrez et al. 2021; specifically they quantify the fraction of observations that fall in the 75% ensemble confidence interval. Note that those same authors also look at other aspects, e.g. they suggest making a rank histogram which should look uniform".

The authors responded: “We evaluated the performance of the bias adjustment methods in the historical period by looking at the 75% ensemble confidence interval introduced by Suarez-Gutierrez et al. (2021). One could investigate other confidence intervals and perform a rank analysis to explore more aspects of bias adjustment performance.”

This however leaves the concern that such a rank analysis could potentially change the conclusions of the paper (or that the conclusions change if you look at a different confidence interval). So, a useful addition could be if the authors can argue (or show) that the conclusions are robust to the choice of confidence interval.
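One simple way to probe this robustness is to compute interval coverage for several nominal levels instead of only 75%. The Python sketch below makes the same array-shape assumptions as a rank-histogram analysis: an ensemble of shape (time steps, members) and one observed series. For a level L, the interval spans the (1 − L)/2 and (1 + L)/2 ensemble quantiles at each time step. The function name `interval_coverage` and the default levels are hypothetical.

```python
import numpy as np


def interval_coverage(ensemble, observations, levels=(0.5, 0.75, 0.9)):
    """Hypothetical helper: fraction of observations inside central ensemble intervals.

    ensemble: array of shape (n_times, n_members)
    observations: array of shape (n_times,)
    A reliable ensemble should give coverage close to each nominal level,
    not only for 0.75.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    observations = np.asarray(observations, dtype=float)
    coverage = {}
    for level in levels:
        # Lower and upper bounds of the central interval at every time step
        lower = np.quantile(ensemble, (1 - level) / 2, axis=1)
        upper = np.quantile(ensemble, (1 + level) / 2, axis=1)
        inside = (observations >= lower) & (observations <= upper)
        coverage[level] = float(inside.mean())
    return coverage
```

With a finite ensemble (e.g. 50 members), the empirical interval tends to be slightly narrower than nominal. As a result, even a reliable ensemble can show coverage a little below L. Comparing the bias adjustment setups at several levels side by side shows whether their ranking depends on the 75% choice.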


ED: Publish subject to minor revisions (review by editor) (03 Sep 2025) by Thom Bogaard

AR by Paul C. Astagneau on behalf of the Authors (08 Sep 2025): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (09 Sep 2025) by Thom Bogaard

AR by Paul C. Astagneau on behalf of the Authors (09 Sep 2025)

Journal article(s) based on this preprint

23 Oct 2025

Impact of bias adjustment strategy on ensemble projections of hydrological extremes

Paul C. Astagneau, Raul R. Wood, Mathieu Vrac, Sven Kotlarski, Pradeebane Vaittinada Ayar, Bastien François, and Manuela I. Brunner

Hydrol. Earth Syst. Sci., 29, 5695–5718, https://doi.org/10.5194/hess-29-5695-2025, 2025

Supplement

https://doi.org/10.5194/egusphere-2024-3966-supplement

Viewed

Total article views: 1,075 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
936	112	27	1,075	53	23	39

Views and downloads (calculated since 03 Mar 2025)

Month	HTML	PDF	XML	Total
Mar 2025	131	26	3	160
Apr 2025	53	13	3	69
May 2025	33	11	1	45
Jun 2025	60	19	13	92
Jul 2025	44	10	2	56
Aug 2025	118	18	3	139
Sep 2025	461	11	1	473
Oct 2025	36	4	1	41

Viewed (geographical distribution)

Total article views: 1,042 (including HTML, PDF, and XML). Of these, 1,042 have a defined geography and 0 are of unknown origin.

Latest update: 23 Oct 2025

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

To study how floods and droughts are likely to change in the future, we use climate projections from climate models. However, we first need to adjust the systematic biases of these projections at the catchment scale before using them in hydrological models. Our study compares statistical methods that can adjust these biases, specifically for climate projections that enable a quantification of internal climate variability. We provide recommendations on the most appropriate methods.


Total:	0
HTML:	0
PDF:	0
XML:	0