the Creative Commons Attribution 4.0 License.
On the use of streamflow transformations for hydrological model calibration
Abstract. The calibration of hydrological models through the use of automatic algorithms aims at identifying parameter sets that minimize the deviation of simulations from observations (often streamflows). It is a widespread technique that has been the subject of much research in the past. Indeed, the choice of objective function (i.e. the criterion or combination of criteria to optimize) can significantly impact the parameter set values identified as optimal by the algorithm. Besides, the actual goal of the model application (flood or low-flow estimation, for instance) influences the way calibration is undertaken. This article discusses how mathematical transformations, which are sometimes applied to the target variable before calculating the objective function, impact model simulations. Such transformations, for example square root or logarithmic, aim at increasing the weight of errors made in specific ranges of the hydrograph. Typically, a logarithmic transformation tends to increase the fit of streamflows to lower values, compared to no transformation. We show in a catchment set that the impact of these transformations on the obtained time series can sometimes be different from what could be expected. Extreme transformations, such as squared or inverse of squared transformations, lead to models that are specialized for extreme streamflows, but show poor performance outside the range of the targeted streamflows and are less robust. Other transformations, such as the power 0.2, the Box–Cox and the logarithmic transformations, can be qualified as more generalist, and show a good performance for the intermediate range of streamflows, along with an acceptable performance for extreme streamflows.
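As a minimal illustration of the mechanism described in the abstract, the sketch below applies a transformation to observed and simulated streamflows before computing the Nash–Sutcliffe efficiency (NSE). The function names, the set of transformations and the toy flow values are assumptions made for illustration only; they are not taken from the article, and the logarithmic and inverse transformations as written assume strictly positive flows.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency of sim against obs."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Hypothetical selection of transformations (illustrative, not the article's list).
TRANSFORMS = {
    "none": lambda q: q,
    "sqrt": np.sqrt,
    "log": np.log,                 # assumes q > 0
    "inverse": lambda q: 1.0 / q,  # assumes q > 0
    "squared": lambda q: q ** 2,
}

def transformed_nse(obs, sim, name):
    """NSE computed on transformed flows instead of raw flows."""
    f = TRANSFORMS[name]
    return nse(f(np.asarray(obs, dtype=float)), f(np.asarray(sim, dtype=float)))

# Toy series: one simulation misses the peak, the other misses the lowest flow.
obs = np.array([1.0, 2.0, 4.0, 8.0, 100.0])
sim_peak_error = np.array([1.0, 2.0, 4.0, 8.0, 80.0])
sim_low_error = np.array([2.0, 2.0, 4.0, 8.0, 100.0])

for name in TRANSFORMS:
    print(name,
          round(transformed_nse(obs, sim_peak_error, name), 4),
          round(transformed_nse(obs, sim_low_error, name), 4))
```

On this toy series, the logarithmic transformation penalizes the low-flow error more heavily than the peak error, whereas the untransformed NSE does the opposite.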
Status: final response (author comments only)

RC1: 'Comment on egusphere-2023-775 - Contribution could be more fundamental', Anonymous Referee #1, 23 May 2023
The use of data transformation has a long tradition in the context of hydrologic model calibration, which makes this an interesting topic to review and analyse as the authors do. Here are some comments to further improve the study and its context.
[1] In lines 36-46, the authors cite a lot of literature where transformations have been used. I find this paragraph very difficult to read. Would it not be useful to place all these papers in a table and simply report the percentage of studies in which each transformation has been used? It is quite difficult to find the non-reference text in this paragraph.
[2] More explanation would be helpful in places to be clearer about what previous authors found and what the state of knowledge is. The authors cite studies, but it is not clear what relevance the conclusions of these papers have. A couple of examples:
"Peña-Arancibia et al. (2015) showed that a squared root transformation with the Nash–Sutcliffe efficiency leads to a better calibration and a reduced parameter uncertainty than no transformation or a logarithmic transformation." – To what extent did it lead to better calibration? What does better calibration mean in this context? A better NSE value?
"Sadegh et al. (2018) investigated the role of several transformations in three catchments and two models and deduced that data transformations might be more helpful for evaluation and analysis of model behaviour than model inference." – Why did they conclude that? Why the difference in result for evaluation and inference? Is this conclusion not in conflict with the conclusion of Peña-Arancibia et al.? What does ‘analysis’ mean in this context?
[3] Why do the authors select the objective functions shown in section 2.3? The authors state that they analyze the following: ‘in order to estimate how transformations impact the simulated time series’. But this is not really what the authors do. They assess performance differences with respect to a couple of popular metrics; they do not analyze how the actual time series changes beyond assessing model performance.
[4] I am a bit confused by the transformations introduced in section 2.4. Aren’t some of the transformations included in others? E.g. the log transformation is a specific case of the Box–Cox transformation. Why not use the minimum number of transformations and then test the influence of the scaling parameter used in the transformation? Using just the Box–Cox transformation and a Q^x transformation, with lambda and x varying, would capture most cases and would allow for a more general analysis. You could use the two flexible transformations and plot the result against the lambda and x values used and against the streamflow percentiles to get a better fundamental overview of what is happening.
[5] What lambda value has been used for the Box–Cox transformation? The result should depend on that choice, given that the transformation is flexible. Previous studies suggested that a lambda value of 0.3 is suitable for streamflow data to obtain more balanced calibration results (e.g. Vrugt et al. (2006), Journal of Hydrology, doi: 10.1016/j.hydrol.2005.10.041). How much does the result depend on that choice?
[6] In line 245 you state: "In addition, the transformations that show the best average rank are not widely used in the literature (0.2, log and boxcox)." – Are you sure about this? Log and Box–Cox (lambda of 0.3) transformations are a standard way to reduce the focus on high flows. They might not have been a focus in very recent years, but certainly from the late 90s to some years ago, they were widely used.
Some (random) examples:
Lerat et al. (2020). Journal of Hydrology, doi.org/10.1016/j.jhydrol.2020.125129
van Werkhoven et al. (2008). Water Resour. Res., doi:10.1029/2007WR006271
Huang et al. (2023). Journal of Hydrology, doi.org/10.1016/j.jhydrol.2023.129347
[7] For section 4.3, could the authors not organize the catchments into those dominated by slow and fast behavior, e.g. using the (central) slope of the flow duration curve or some other signature metric? There might be different reasons why a catchment varies in this regard (snow, pervious geology, …), which might not be easily captured by the characteristics available.
All in all, an interesting study, though I think the authors could (should?) provide some more fundamental insight still, for example by varying the parameter of the Box–Cox or other flexible transformations.
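The relationship raised in comments [4] and [5] can be sketched numerically. The block below uses the classical one-parameter Box–Cox definition, (Q^lambda - 1) / lambda, with the logarithm as the limiting case for lambda = 0. This is an assumption: the manuscript may use a different Box–Cox variant, and the flow values and lambda values are purely illustrative.

```python
import numpy as np

def box_cox(q, lam):
    """Classical Box-Cox transformation; reduces to log(q) when lam == 0.

    Assumes strictly positive flows q.
    """
    q = np.asarray(q, dtype=float)
    if lam == 0:
        return np.log(q)
    return (q ** lam - 1.0) / lam

# Illustrative flows spanning low to high values.
q = np.array([0.5, 1.0, 10.0, 100.0])

# As lambda decreases towards 0, the transformed values approach log(q);
# lambda = 1 is a simple shift of the untransformed flows.
for lam in (1.0, 0.3, 0.01, 0.0):
    print(lam, np.round(box_cox(q, lam), 3))
```

Sweeping lambda between 0 and 1 in this way moves continuously from a logarithmic weighting to no transformation, which is the kind of parameter scan the comment suggests.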
Citation: https://doi.org/10.5194/egusphere-2023-775-RC1
AC1: 'Reply on RC1', Guillaume Thirel, 30 Aug 2023
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-775/egusphere-2023-775-AC1-supplement.pdf

RC2: 'Comment on egusphere-2023-775', Anonymous Referee #2, 12 Jun 2023
Summary
In this paper, the authors examine the extent to which variations of a given calibration objective function (e.g., NSE, KGE) – formulated through mathematical transformations of streamflow – affect mean absolute errors. Such impact is characterized using ranks for different streamflow categories in order to facilitate the interpretation of results for low, medium and high flow applications. The approach is demonstrated with the GR4J (Perrin et al. 2003) and GR6J (Pushpalatha et al. 2011) conceptual rainfall-runoff models, coupled with the CemaNeige snow module (Valéry et al. 2014) in those basins where snow is an important component of the water cycle. The authors first illustrate the proposed framework at the Fecht River at Wintzenheim, and then expand their analyses to 325 catchments (all of them in France). The results show that extreme transformations like squared or inverse of squared provide parameter sets that yield good simulations of extreme streamflows, but poor quality outside that range. The study also unveils the versatility of other transformations (e.g., power 0.2 and the Box–Cox) to simulate streamflow values of different magnitudes. Finally, the authors show that these conclusions are insensitive to their choice of calibration criteria (which includes NSE and two versions of KGE) or model structure (by comparing GR4J against GR6J).
This is a very relevant topic for the hydrological modeling community and, to the best of my knowledge, no previous studies have conducted a systematic assessment like the one presented here. Additionally, the state of the art and the research motivation are clearly presented by the authors. Having said that, I strongly recommend that the authors rethink the methodological design and the way the results are presented in order to make their study reproducible and more impactful. This manuscript will be suitable for publication in HESS once the authors have made this effort (which might involve a substantial amount of work).
Major comments
1. Methods: I think that showing the impact of transformations on ranks, rather than on the actual absolute errors that were used to generate those ranks, may hide the real effect that the choice of mathematical functions has on streamflow simulations and, more importantly, may greatly distort the differences in performance (and their perception) among the various types of transformation. For example, what is the difference in mean absolute error between rank 1 and 10? I encourage the authors to show the effects of transformations more directly; for example, they could use a normalized mean absolute error for different streamflow categories, to make the results comparable among catchments.
Additional suggestions to make their analysis more impactful:
- Since NSE is formulated as a function of the sum of squared errors, the authors could report the fractional contribution to the total squared error of the 1, 10, 100, 1000 largest error days obtained with the various transformations (see Figure 10 in Newman et al. 2015). This could provide quantitative support to some statements that the authors make (e.g., L192-193, L338) referring to the number of days where a specific transformation has more weight.
- Show the impact on some streamflow characteristics (e.g., Pool et al. 2017), also known as hydrological signatures (e.g., Addor et al. 2018; McMillan 2020).
2. In my opinion, some figures are incredibly complex (e.g., Figures 5 and 8), making the communication of the main messages unnecessarily cumbersome. What do the numbers 1 to 11 represent? Are they related to the number of transformations? Figures 6, 9, 10, 11 and 12 are better to show inter-method differences, though these could (should?) show results of actual mean absolute errors. Additionally, Figures 10, 11 and 12 could be merged into one to facilitate the comparison (the same comment applies to Tables 3, 4 and 5).
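The two quantitative diagnostics suggested in major comment 1 can be sketched as follows. This is only an illustration under stated assumptions: the flow categories are defined here by hypothetical 30% and 70% quantiles of the observed flows, the normalization uses the mean observed flow, and the function names and toy data are invented for the example. None of these choices come from the manuscript or from Newman et al. (2015).

```python
import numpy as np

def nmae_by_category(obs, sim, quantiles=(0.3, 0.7)):
    """Mean absolute error per flow category, divided by the mean observed flow.

    Categories are defined on observed flows; the quantile thresholds are
    hypothetical and should be adapted to the study design.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    lo, hi = np.quantile(obs, quantiles)
    masks = {
        "low": obs <= lo,
        "medium": (obs > lo) & (obs < hi),
        "high": obs >= hi,
    }
    scale = obs.mean()
    return {name: float(np.mean(np.abs(sim[m] - obs[m])) / scale)
            for name, m in masks.items()}

def top_n_error_fraction(obs, sim, n):
    """Fraction of the total squared error contributed by the n largest-error days."""
    sq = (np.asarray(sim, dtype=float) - np.asarray(obs, dtype=float)) ** 2
    total = sq.sum()
    if total == 0:
        return 0.0
    return float(np.sort(sq)[::-1][:n].sum() / total)

# Toy daily series: small error on the lowest flow, larger error on the peak.
obs = np.arange(1, 11, dtype=float)
sim = obs.copy()
sim[0] += 1.0
sim[-1] += 3.0

print(nmae_by_category(obs, sim))
print(top_n_error_fraction(obs, sim, 1))
```

Because the error is divided by each catchment's mean observed flow, the per-category values are dimensionless and can be compared across catchments, which is the purpose of the normalization the comment asks for.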
Minor comments
3. L9-10: “…can sometimes be different from what could be expected…”. I recommend the authors avoid including vague sentences like this throughout the manuscript, especially in the abstract.
4. L19-20: In my view, there is general consensus in the community that no universal hydrological model structure exists, since each one is an assembly of hypotheses on the functioning of a specific hydrological system (Clark et al. 2011). This has motivated a proliferation of flexible modeling platforms such as FUSE (Clark et al. 2008), SUPERFLEX (Fenicia et al. 2011), Noah-MP (Niu et al. 2011), SUMMA (Clark et al. 2015a,b, 2021), MARRMoT (Knoben et al. 2019), Raven (Craig et al. 2020) and even airGR with its variants GR5J and GR6J. I think this is a good place to make this point.
5. L24: This is a good place to cite previous studies showing the impact of subjective calibration criteria selection on hydrological modeling applications (e.g., Mendoza et al. 2016; Fowler et al. 2018; Melsen et al. 2019).
6. L33: I think you should refer to Figure 1a.
7. L52-58: I suggest citing these studies in chronological order.
8. Figure 1: I suggest including the model being used and the simulation year in the figure caption.
9. L70-72: This sentence is very confusing. "Alteration" may be interpreted by some readers as human intervention. I suggest rewording.
10. L82-83: Did the authors examine whether the calibration and evaluation periods are hydroclimatically different? Please clarify.
11. I think that much of the text in section 2.2 corresponds to methodology, and therefore should be included in the methods section.
12. L98: Can you please clarify how you determined snowfall occurrence in your basins?
13. L101: Why did you choose five elevation bands and not more/less? Did you try other configurations? I think this needs a proper justification, given the large effects that this decision may have on simulated states and fluxes (Murillo et al. 2022).
14. Table 1: I think it would be more informative to show these attributes as maps with a color bar (see, for example, Addor et al. 2017; Alvarez-Garreton et al. 2018).
15. L111: please specify whether your simulations consider a spin-up period.
16. L160-163: I think this text should be in the methods section.
17. Figure 3d: the numbers in the y axis are not legible.
18. Figure 5 (caption): is CemaNeige implemented in this basin?
19. L185: ‘average rank of transformations’. How do you compute that average?
20. L185-190: all these comparisons are very hard to visualize. You could use symbols in Figures 6 and 9 to 12 to help readers see what you want them to see. For example, use X for negative transformations, square for log, circle for Box–Cox, etc.
21. L261: this sentence is unclear. What do you mean?
22. L279: ‘to behave much worse’. Note that you are judging based on the ranks, and not on the actual sum of absolute errors. I think it would be much more honest if you showed the latter.
23. L296: You have ranks for 9/11 transformations. Did you obtain the same number of correlations?
24. L301: Are these correlations statistically significant?
25. Section 4.3: I suggest that the authors add to their analysis the aridity index, the seasonality of aridity (Knoben et al. 2018) and maybe the center of time of runoff (Stewart et al. 2005).
Some suggested edits
26. L30: ‘have been’ > ‘has been’ (‘a wide panel’ is singular).
27. L36: I suggest deleting ‘more specifically’.
28. L44: ‘some other works’ > ‘other studies’.
29. L48-49: delete ‘Nevertheless, some authors tried to investigate this issue. For instance,’.
30. L59: ‘Still, most of the time’ > ‘To the best of our knowledge’.
31. L59: ‘are not’ > ‘have not been’.
32. L61-62: I strongly encourage the authors to write that finding in their own words instead of quoting.
33. L63 and anywhere else: I recommend that the authors use past tense (i.e., ‘used’ and ‘justified’) when referring to previous studies.
34. L68: ‘tends to illustrate’ > ‘illustrates these assertions to some degree’. Delete ‘we feel that’.
35. L69: delete ‘in this article’.
36. L75: ‘Data’ > ‘We used data from…’. I strongly encourage the authors to use active voice.
37. L95: ‘Maximal’ > ‘Maximum’.
38. L101: ‘take into account the catchment heterogeneity’ > ‘consider intra-catchment variability’.
39. L103: delete ‘while GR4J is the main model used’ and write ‘In this work, we also use the GR6J model to assess the transferability...’.
40. L124: ‘with N the total number’ > ‘being N the total number’.
41. L131: ‘as this focuses’ > ‘as it focuses’.
42. L300: Delete ‘Unfortunately, only a few correlations could be identified’.
43. L301: ‘Anticorrelations’ reads awkwardly. I suggest writing ‘negative correlations’ instead.
References
Addor, N., A. J. Newman, N. Mizukami, and M. P. Clark, 2017: The CAMELS data set: Catchment attributes and meteorology for large-sample studies. Hydrol. Earth Syst. Sci., doi:10.5194/hess-21-5293-2017.
Addor, N., G. Nearing, C. Prieto, A. J. Newman, N. Le Vine, and M. P. Clark, 2018: A Ranking of Hydrological Signatures Based on Their Predictability in Space. Water Resour. Res., 54, 8792–8812, doi:10.1029/2018WR022606.
Alvarez-Garreton, C., and Coauthors, 2018: The CAMELS-CL dataset: Catchment attributes and meteorology for large sample studies – Chile dataset. Hydrol. Earth Syst. Sci., 22, 5817–5846, doi:10.5194/hess-22-5817-2018.
Clark, M. P., A. G. Slater, D. E. Rupp, R. A. Woods, J. A. Vrugt, H. V. Gupta, T. Wagener, and L. E. Hay, 2008: Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resour. Res., 44, W00B02, doi:10.1029/2007WR006735.
——, D. Kavetski, and F. Fenicia, 2011: Pursuing the method of multiple working hypotheses for hydrological modeling. Water Resour. Res., 47, W09301, doi:10.1029/2010WR009827.
Clark, M. P., and Coauthors, 2015a: A unified approach for process-based hydrologic modeling: 1. Modeling concept. Water Resour. Res., doi:10.1002/2015WR017198.
——, and Coauthors, 2015b: A unified approach for process-based hydrologic modeling: 2. Model implementation and case studies. Water Resour. Res., doi:10.1002/2015WR017200.
Clark, M. P., and Coauthors, 2021: The numerical implementation of land models: Problem formulation and laugh tests. J. Hydrometeorol., 22, 1627–1648, doi:10.1175/JHM-D-20-0175.1.
Craig, J. R., and Coauthors, 2020: Flexible watershed simulation with the Raven hydrological modelling framework. Environ. Model. Softw., 129, 104728, doi:10.1016/j.envsoft.2020.104728. https://doi.org/10.1016/j.envsoft.2020.104728.
Fenicia, F., D. Kavetski, and H. H. G. Savenije, 2011: Elements of a flexible approach for conceptual hydrological modeling: 1. Motivation and theoretical development. Water Resour. Res., 47, W11510, doi:10.1029/2010WR010174.
Fowler, K., M. Peel, A. Western, and L. Zhang, 2018: Improved Rainfall-Runoff Calibration for Drying Climate: Choice of Objective Function. Water Resour. Res., 54, 3392–3408, doi:10.1029/2017WR022466.
Knoben, W. J. M., R. A. Woods, and J. E. Freer, 2018: A Quantitative Hydrological Climate Classification Evaluated With Independent Streamflow Data. Water Resour. Res., 54, 5088–5109, doi:10.1029/2018WR022913. https://onlinelibrary.wiley.com/doi/abs/10.1029/2018WR022913.
——, J. E. Freer, K. J. A. Fowler, M. C. Peel, and R. A. Woods, 2019: Modular Assessment of Rainfall–Runoff Models Toolbox (MARRMoT) v1.2: an open-source, extendable framework providing implementations of 46 conceptual hydrologic models as continuous state-space formulations. Geosci. Model Dev., 12, 2463–2480, doi:10.5194/gmd-12-2463-2019.
McMillan, H., 2020: Linking hydrologic signatures to hydrologic processes: A review. Hydrol. Process., 34, 1393–1409, doi:10.1002/hyp.13632.
Melsen, L., A. J. Teuling, P. J. J. F. Torfs, M. Zappa, N. Mizukami, P. A. Mendoza, M. P. Clark, and R. Uijlenhoet, 2019: Subjective modeling decisions can significantly impact the simulation of flood and drought events. J. Hydrol., 568, 1093–1104, doi:10.1016/j.jhydrol.2018.11.046.
Mendoza, P. A., M. P. Clark, N. Mizukami, E. D. Gutmann, J. R. Arnold, L. D. Brekke, and B. Rajagopalan, 2016: How do hydrologic modeling decisions affect the portrayal of climate change impacts? Hydrol. Process., 30, 1071–1095, doi:10.1002/hyp.10684.
Murillo, O., P. A. Mendoza, N. Vásquez, N. Mizukami, and Á. Ayala, 2022: Impacts of Subgrid Temperature Distribution Along Elevation Bands in Snowpack Modeling: Insights From a Suite of Andean Catchments. Water Resour. Res., 58, under review, doi:10.1029/2022WR032113.
Newman, A. J., and Coauthors, 2015: Development of a large-sample watershed-scale hydrometeorological data set for the contiguous USA: data set characteristics and assessment of regional variability in hydrologic model performance. Hydrol. Earth Syst. Sci., 19, 209–223, doi:10.5194/hess-19-209-2015. http://www.hydrol-earth-syst-sci.net/19/209/2015/.
Niu, G.-Y., and Coauthors, 2011: The community Noah land surface model with multiparameterization options (Noah-MP): 1. Model description and evaluation with local-scale measurements. J. Geophys. Res., 116, D12109, doi:10.1029/2010JD015139.
Perrin, C., C. Michel, and V. Andréassian, 2003: Improvement of a parsimonious model for streamflow simulation. J. Hydrol., 279, 275–289, doi:10.1016/S0022-1694(03)00225-7.
Pool, S., M. J. P. Vis, R. R. Knight, and J. Seibert, 2017: Streamflow characteristics from modeled runoff time series – Importance of calibration criteria selection. Hydrol. Earth Syst. Sci., 21, 5443–5457, doi:10.5194/hess-21-5443-2017.
Pushpalatha, R., C. Perrin, N. Le Moine, T. Mathevet, and V. Andréassian, 2011: A downward structural sensitivity analysis of hydrological models to improve low-flow simulation. J. Hydrol., 411, 66–76, doi:10.1016/j.jhydrol.2011.09.034. http://dx.doi.org/10.1016/j.jhydrol.2011.09.034.
Stewart, I. T., D. R. Cayan, and M. D. Dettinger, 2005: Changes toward earlier streamflow timing across western North America. J. Clim., 18, 1136–1155, doi:10.1175/JCLI3321.1.
Valéry, A., V. Andréassian, and C. Perrin, 2014: ‘As simple as possible but not simpler’: What is useful in a temperature-based snow-accounting routine? Part 2 – Sensitivity analysis of the Cemaneige snow accounting routine on 380 catchments. J. Hydrol., 517, 1176–1187, doi:10.1016/j.jhydrol.2014.04.058.
Citation: https://doi.org/10.5194/egusphere-2023-775-RC2
AC2: 'Reply on RC2', Guillaume Thirel, 30 Aug 2023
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-775/egusphere-2023-775-AC2-supplement.pdf