the Creative Commons Attribution 4.0 License.
Adjoint-based spatially distributed calibration of a grid GR-based parsimonious hydrological model over 312 French catchments with SMASH platform
François Colleoni
Pierre-André Garambois
Pierre Javelle
Maxime Jay-Allemand
Patrick Arnaud
Abstract. Reducing uncertainty and improving robustness and spatiotemporal extrapolation capabilities remain key challenges in hydrological modeling, especially for flood forecasting over large areas. Parsimonious model structures and effective optimization strategies are crucially needed to tackle the difficult issue of distributed hydrological model calibration from sparse integrative discharge data, which is in general a high-dimensional inverse problem. This contribution presents the first evaluation of Variational Data Assimilation (VDA), very well suited to this context but still rarely employed in hydrology because of its high technicality, successfully applied here to the spatially distributed calibration of a newly tailored grid-based parsimonious model structure and its corresponding adjoint, over a large sample. It is based on the Variational Data Assimilation (VDA) framework of the SMASH (Spatially distributed Modelling and ASsimilation for Hydrology) platform, underlying the French national flash flood forecasting system Vigicrues Flash. It proposes an upgraded distributed hourly rainfall-runoff model structure employing GR-based operators, including a non-conservative flux, and its adjoint obtained by automatic differentiation for VDA. The performance of the approach is assessed over annual, seasonal and flood timescales via standard performance metrics and in spatiotemporal validation. The gain of using the proposed non-conservative 6-parameter model structure is highlighted in terms of performance and robustness, compared to a simpler 3-parameter structure. Spatially distributed calibrations lead to a significant gain in terms of reaching high performances in calibration and temporal validation on the catchment sample, with median efficiencies of NSE = 0.88 (resp. 0.85) and NSE = 0.8 (resp. 0.79) over the total time window on period p2 (resp. p1). Simulated signatures in temporal validation over 1443 (resp. 1522) flood events on period p2 (resp. p1) are quite good, with median flood (NSE; KGE) of (0.63; 0.59) (resp. (0.55; 0.53)). Spatiotemporal validations, i.e. on pseudo-ungauged cases, also lead to encouraging performances. Moreover, the influence of certain catchment characteristics on model performance and parametric sensitivity is analyzed. The best performances are obtained for Oceanic and Mediterranean basins, whereas the model performs less well over Uniform basins with significant influence of multi-frequency hydrogeological processes. Interestingly, regional sensitivity analysis revealed that the non-conservative water exchange parameter and the production parameter, impacting the simulated runoff amount, are the most sensitive parameters, along with the routing parameter, especially for faster-responding catchments. This study is a first step in the construction of a flexible and versatile multi-model and optimization framework with hybrid methods for regional hydrological modeling with multi-source data assimilation.
François Colleoni et al.
Status: closed

RC1: 'unable to review scientific content of manuscript due to non-compliance with Copernicus data policy', Rolf Hut, 24 Oct 2022
The topic of the paper “Adjoint-based spatially distributed calibration of a grid GR-based parsimonious hydrological model over 312 French catchments with SMASH platform” seemed interesting to me since the abstract mentioned use of a new data assimilation scheme (VDA) and a platform for running hydrological models with greater ease.
On reading the paper though, I believe that it does not comply with the Copernicus data policy and should therefore not be published in HESS (in its current form). The points in which I believe the paper does not comply with the Copernicus data policy include (but may not be limited to):
 The paper does not contain a ‘data availability’ section. This is a required section. Copernicus requires authors to state where the data used in / generated by their work can be obtained. Data should ideally be published in openly available repositories with a valid persistent identifier (preferably DOI).
 The paper does not include any link to the software generated and used in this work, whilst it heavily relies on it for its results. The authors introduce a platform for running the SMASH model integrated with data assimilation, but fail to provide that platform to the hydrological community. The only link I could find was a reference to a poster (Jay-Allemand et al. 2022) which does not have a DOI, and upon finding that poster (at https://hal.archives-ouvertes.fr/hal-03683657) I discovered that the software mentioned is hosted at the IRSTEA gitlab page, which is only accessible to IRSTEA employees (gitlab.irstea.fr). The software that contains the main point the authors want to make, and that is used to generate the hydrological analyses they present in the paper, should be openly available to the hydrological community in general and to reviewers of the paper in particular.
If I'm wrong and I misread the paper I deeply apologise and I owe the authors a round of drinks (at least). After reading the paper twice I have failed to locate either the data generated by the experiments described, or the model code or experiment code used to generate the results presented in the paper.
While the authors are first and foremost responsible for making sure that their submission complies with journal regulations, I would also like to stress that it is the responsibility of the editorial (support) team at the publisher to make sure that any submission complies with journal regulations and guidelines before sending manuscripts out to reviewers. When I get a request to review a paper I expect that checks against journal regulations, such as reference style but more importantly open science requirements, have been conducted by the editorial (support) team.
I ask the editorial team of HESS to make agreements on procedures with the editorial support staff to make sure that in the future these checks are done before a manuscript gets sent out for review.
Looking forward to reviewing a new manuscript by the authors that does comply with Copernicus Data Policy.
Rolf Hut
PS: after spotting the above-mentioned issues in the manuscript I did not execute a further detailed review. I did read it through twice, and if the authors decide to amend the above-mentioned issues I do have some further first suggestions to improve the manuscript: in its current form the manuscript tries to do too many things: introduce the SMASH model, introduce a new data assimilation method into hydrology, and introduce a new platform to do hydrological research. I suggest turning this into separate manuscripts to allow for focus in each one of them. The introduction of the model may fit better in GMD than in HESS; the study of the improvement in predictions when using the new data assimilation scheme would be (in my opinion) suited for HESS.
Citation: https://doi.org/10.5194/egusphere-2022-506-RC1
AC1: 'Reply on RC1', Pierre-André Garambois, 25 Oct 2022
Dear colleague,
We will add a data policy section.
The code is available upon request and will be opened online in a couple of months; we can send it to you right now for the review. Nevertheless, we believe that this research article, addressing the questions mentioned in the introduction, is well understandable with the proposed material and fits within the scope of HESS. It focuses on the evaluation of a spatially distributed variational calibration algorithm applied to a parsimonious distributed model over a large sample. This will be clarified in the manuscript.
Best regards,
P.A. Garambois on behalf of the authors
Citation: https://doi.org/10.5194/egusphere-2022-506-AC1

RC2: 'Comment on egusphere-2022-506', Anonymous Referee #2, 24 Nov 2022
The article presents the application of the SMASH distributed hydrological model on a large set of French catchments. The authors evaluate the performance of two model versions, the original and a modified one, and conclude that the modified version is more efficient. The two models outperform a lumped model, which is also applied to the same catchments.
I have several major concerns about this article. I found that the way the modifications were introduced in the model is overall not justified. There are also several results difficult to understand, typically on the interception store.
I suggest major revision before the article could be reconsidered for publication.
Major comments
 Section 2.1: The authors introduced three modifications taken from model versions developed by other authors and tested them all together without explaining why they are all necessary individually. Therefore it is very difficult to understand what brings some improvement in model performance. Were all these modifications actually necessary?
 L134: The authors choose to introduce two routing stores with the same mathematical formulation. In the work they cite by Pushpalatha et al., there is one powerlaw store and one exponential store. Could the authors explain why they made a different choice here?
 L199 and 219: Nothing is said about the GR5H model tested here nor about the calibration algorithm used. It is difficult to know which version was used and how it was implemented. Furthermore, the purpose of including this model in the article is not clear. It is stated twice (lines 255 and 262) that GR5H is used as a reference but not for benchmarking purposes (actually I do not understand the difference between the two in the context of this article). If the objective is to show that SMASH is better than a model "taken from the shelf", I wonder why the choice was made to take GR5H. It is very similar to the SMASH unit brick (model structure), as mentioned in the article, but not really the same. Therefore it is difficult to conclude anything from the comparison proposed here. Does the better performance of SMASH come from the fact that it is distributed or from the fact that there are differences in the structure of the unit brick? I found it would be more useful to test the original and modified versions of the SMASH model structure in a lumped mode to answer the previous question.
 Table 2: I do not understand why the upper bound for the interception store is so high. It is physical nonsense to have an interception store of 100 mm. An interception store capacity is typically less than 10 mm. If the capacity is that high in the calibration process, it means that this store does not only play the role of an interception store.
 Table 2: I also did not understand how the calibration process ensures that the "fast" (r) and "slow" (l) tanks actually play this role, i.e. that ctr is lower than ctl. If there is no explicit constraint in the optimisation process, there must be catchments where this is not the case, depending on the proportion of the base flow. It may end up in the fast routing store simulating slower recession than the slow routing store. Furthermore, the fact that both stores have the same formulation must generate equivalent parameter sets for some basins that would only need one reservoir, and thus generate indetermination in the response surface.
 Section 4.4: The comments on the greater variability (greater standard deviation) of some parameters are in my view biased by the fact that the parameters vary over very different orders of magnitude. It might have been more appropriate to compare the coefficients of variation to assess which parameters are indeed the most variable relative to their mean value. Some of the comments (L350-352, L357-359, L360-361) remain rather general, and hypotheses that are difficult to verify as they stand. I therefore question the added value of these comments.
 Tables 4 and 5: I was quite surprised by the average values taken by the capacity of the interception reservoir here (15 to 25 mm sometimes). One could expect a capacity ten times lower. Clearly, the role taken by this reservoir goes beyond a simple interception function, with a possible interaction on the water balance function (exchange in particular) or a smoothing role on rainfall inputs going beyond the interception process. In Ficchi's work cited by the authors, this reservoir had capacities of a few millimetres on average. How can these differences be interpreted? Furthermore, the stability of the average values of the model parameters does not mean that the values are stable when comparing periods basin by basin. Have scatterplots been drawn to check that the apparent stability of the averages is confirmed when looking at the basins individually? Last, for version D, have spatial analyses of the stability of parameter fields been carried out? There may be equivalent fields in terms of mean or variability, but with very different spatial configurations.
Other comments
 Abstract: It could probably be reduced. Some introductory sentences sound very general and some details about the results do not seem essential.
 General: most of the table and figure captions are not detailed enough to fully understand what is shown. This should be improved.
 L5 and 7: VDA is defined twice.
 L20: Unclear what “Uniform basins” means here.
 L46: please explain what semi-lumped means
 L70: “over a large catchment sample”
 L73: “the initial study by”
 L82-84: It is unclear why the sensitivity analysis is performed.
 L91: “a uniform calibration”
 L108: Here again there is no justification why this cell-to-cell routing was chosen. There are many options possible. Furthermore, the authors never discuss the possible interactions between the in-cell routing and the cell-to-cell routing. There are probably many cases where one compensates for the other. The overall stability of all these parameters between periods may give some answers.
 L109: “described in Fig. 1”
 L158 and Eq. A20: The NSE is defined by 1 minus the ratio of quadratic error and variance. The authors should stick to this definition to avoid confusion.
 L180: “continental France” (?)
 L199: The GR5H model is mentioned here without having been presented before.
 L204: QM/PM min/max are not defined. Is it the min/max over the twelve months?
 L208: I understand from this sentence that there was no criterion applied to remove basins subject to artificial influences such as dams. Is it actually the case? How could the model simulate artificial behaviours?
 Figure 3: Add basin boundaries and the main river network for better readability. It is only mentioned in the legend of this figure that there are two subsamples (upstream and downstream). This should be mentioned in the text.
 L226-229: Is the initialization period of one year sufficient for basins where groundwater is dominant? Often it is not the case for such basins. Furthermore, it is not clear whether this initialization period was also used in validation.
 L237: “in calibration, i.e. considered as ungauged” (?)
 L254-255, L261-262: Again, I do not understand these sentences (see comment above on the use of GR5H in the article)
 L285288: I do not understand why this assumption is made without being verified. It should be possible to check whether there is a link, in the sample of basins, between the hydroclimatic differences observed between the two subperiods and the evolution of model performance.
 L300: “each flood event”
 L302: "Sutcliffe”
 L318-321: How significant is this improvement in S6 over S3? Give some quantitative evidence of this result.
 Tables 4 and 5: Put cr and ml in the same order as Table 2.
 Table 4: Were the ctr and ctl values compared? The mean values seem very close on some catchments, which may indicate that a single store would be sufficient and that the extra complexity of using two stores is not justified and may even create identification issues. Was this investigated?
 Lines 395-397: Which fixed value would this be in this case? I found this conclusion strange. It should be further investigated in light of the comments above on the odd values of the interception store capacity.
 Line 437: "French".
 Appendix A6: The formula for NSE and KGE used to illustrate performance is "1 −" the formulas given here.
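[For context on this exchange about Appendix A6: the conventional efficiency forms of these scores are indeed "1 minus" an error term. The sketch below is an illustrative implementation of the standard NSE (Nash-Sutcliffe) and KGE (Kling-Gupta) definitions, not the code used in the manuscript.]

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the sum of squared
    errors to the variance (sum of squared deviations) of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta efficiency: 1 minus the Euclidean distance from the ideal
    point (r, alpha, beta) = (1, 1, 1) in correlation/variability/bias space."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation coefficient
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Both scores equal 1 for a perfect simulation; omitting the leading "1 −" (as in the appendix formulas the reviewer points to) would instead give the normalized error, which is 0 for a perfect simulation.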
Citation: https://doi.org/10.5194/egusphere-2022-506-RC2
AC2: 'Reply on RC2', Pierre-André Garambois, 19 Dec 2022
We thank the reviewer for his detailed review and helpful comments on our work. We propose a detailed answer below, including propositions of revisions of our study with the presentation of additional results, analyses and enriched discussions and perspectives. This should help the reader to better understand the choices made to reach the forward model structure with six parameters, whose performance is illustrated over a large catchment sample, especially for spatially distributed calibration with a variational data assimilation algorithm, as well as to clarify the results on the interception store. Overall, we believe that the reviewer's comments along with the proposed revisions should enable us to clarify and enrich our work.
"Major comments
Section 2.1: The authors introduced three modifications taken from model versions developed by other authors and tested them all together without explaining why they are all necessary individually. Therefore it is very difficult to understand what brings some improvement in model performance. Were all these modifications actually necessary?"
Thank you for this question, which will be answered through the presentation of additional results and analysis on the effect of each operator taken from lumped models of the literature and combined to reach the 6-parameter structure of the studied distributed model. This combination of operators in the distributed model has been made in the spirit of, and following, the work done on the lumped parsimonious GR models. Our modeling choices result from several numerical experiments that were not shown in the first version of the article for the sake of brevity. Nevertheless, in the revised version, we propose to clarify (§4.1 and Fig. 1 will be modified) the effect of each additional model component by providing a comparison of performances obtained with intermediate model structures of increasing complexity, starting from the 3-parameter structure and adding (1) an exchange term (ml param) to the transfer store (ctr), (2) another transfer store (ctl). This will help to clarify the gain of performance due to model structure modifications. Note that the role/constraint of the interception store will also be analyzed, as proposed in the answers to the next questions.
"L134: The authors choose to introduce two routing stores with the same mathematical formulation. In the work they cite by Pushpalatha et al., there is one powerlaw store and one exponential store. Could the authors explain why they made a different choice here?"
Adding a second transfer store with the same equation led to a performance improvement, as will be shown and analyzed more clearly with the comparison of intermediate structures of increasing complexity proposed to answer the above question. Note that in structure S6, an exchange function is applied to one of the two transfer reservoirs, which makes the behaviors of the two “transfer sub-branches” different. This choice leads to good performances and coherent model behavior, which will be better analyzed with the proposed revisions. Testing other variants/operators for transfer/routing is an interesting and large topic voluntarily left for further research.
"L199 and 219: Nothing is said about the GR5H model tested here nor about the calibration algorithm used. It is difficult to know which version was used and how it was implemented. Furthermore, the purpose of including this model in the article is not clear. It is stated twice (lines 255 and 262) that GR5H is used as a reference but not for benchmarking purposes (actually I do not understand the difference between the two in the context of this article). If the objective is to show that SMASH is better than a model "taken from the shelf", I wonder why the choice was made to take GR5H. It is very similar to the SMASH unit brick (model structure), as mentioned in the article, but not really the same. Therefore it is difficult to conclude anything from the comparison proposed here. Does the better performance of SMASH comes from the fact that it is distributed or from the fact that there are differences in the structure of the unit brick? I found it would be more useful to test the original and modified versions of the SMASH model structure in a lumped mode to answer the previous question."
We agree with your comment and we decided to remove GR5H from this study, since the focus is spatially distributed hydrological modeling and calibration with a variational data assimilation algorithm. As suggested by the reviewer and proposed in the above answers, we will put more emphasis on the comparison of the original, intermediate and modified SMASH model structures.
"Table 2: I do not understand why the upper bound for the interception store is so high. It is physically a nonsense to have an interception store of 100 mm. An interception store capacity is typically less than 10 mm. If the capacity is that high in the calibration process, it means that this store does not only play the role of an interception store."
Thank you for this remark, which will be answered with additional results and analysis. Note that we have chosen relatively large bounds, without searching for physical meaning with this parsimonious conceptual model structure, to study its performance with variational distributed calibration over this large sample of French catchments. The analysis is performed both in terms of global parametric sensitivity analysis and spatially distributed optimization results. High performances were obtained in calibration/validation, along with coherent functioning points and marked parametric sensitivities, as analyzed over the whole set and by catchment groups. Note that the median calibrated interception parameter for each catchment group over the whole dataset ranges between 14 and 26 mm. Note also that, as already analyzed in the article, a low sensitivity to interception capacity is found. Following your comments, we propose to add to the discussion some new results and analysis over the whole sample with a smaller interception capacity, which could be determined for example with a flux matching method (Ficchi et al. 2017, https://doi.org/10.1016/j.jhydrol.2019.05.084).
"Table 2: I also did not understand how the calibration process ensures that the "fast" (r) and "slow" (l) tanks actually play this role, i.e. that ctr is lower than ctl. If there is no explicit constraint in the optimization process, there must be catchments where this is not the case, depending on the proportion of the base flow. It may end up in the fast routing store simulating slower recession than the slow routing store. Furthermore, the fact that both stores have the same formulation must generate equivalent parameter sets for some basins that would only need one reservoir, and thus generate indetermination in the response surface."
We applied different calibration bounds for the two transfer reservoirs of this conceptual model, for which no physical meaning is sought. We agree with you and we will correct the typo in the appendix: “fast” and “slow” reservoir will be changed into “first” and “second” reservoir, as already done in the main text. The bounds applied to their capacity in the calibration process are already given in the manuscript, as well as information on calibrated parameters. We only leave the potential for one reservoir to reach a higher capacity, without other constraint in the optimization. Good performances and coherent transfer values are obtained, as already analyzed in the manuscript. You are right, for a few cases $\overline{ctl} < \overline{ctr}$, which will be clarified in the text of the revised article.
Regarding the response surface, a global sensitivity analysis has already been presented over the whole sample as well as calibrated parameter values that depict the sensitivities and functioning points for each model parameter including the respective transfer reservoirs. Potential compensations between parameters may arise as it is generally the case in hydrological modeling, and the computation of higher order sensitivities is an interesting and full research topic left for further research. Nevertheless this will be discussed in the revised manuscript.
Note that the response surface of the model is not an issue for the calibration algorithm, which converged very efficiently for each high-dimensional optimization problem solved by catchment-period. This is demonstrated by the high performances obtained in calibration-validation over the large sample, and especially by the improvements found with distributed calibration. This will be better discussed and reinforced thanks to the new results and analysis that will be proposed in the revised version.
"Section 4.4: The comments on the greater variability (greater standard deviation) of some parameters are in my view biased by the fact that the parameters vary over very different orders of magnitude. It might have been more appropriate to compare the coefficients of variation to assess which parameters are indeed the most variable relative to their mean value. Some of the comments (L350-352, L357-359, L360-361) remain rather general, and hypotheses that are difficult to verify as they stand. I therefore question the added value of these comments."
Thank you for this suggestion; we will have a look at the coefficients of variation of the parameters' spatial variability. The analysis will be improved and the general statements pointed out will be deepened.
"Tables 4 and 5: I was quite surprised by the average values taken by the capacity of the interception reservoir here (15 to 25 mm sometimes). One could expect a capacity ten times lower. Clearly, the role taken by this reservoir goes beyond a simple interception function, with a possible interaction on the water balance function (exchange in particular) or a smoothing role on rainfall inputs going beyond the interception process. In Ficchi's work cited by the authors, this reservoir had capacities of a few millimetres on average. How can these differences be interpreted? Furthermore, the stability of the average values of the model parameters does not mean that the values are stable when comparing periods basin by basin. Have scatterplots been drawn to check that the apparent stability of the averages is confirmed when looking at the basins individually? Last, for version D, have spatial analyses of the stability of parameter fields been carried out? There may be equivalent fields in terms of mean or variability, but with very different spatial configurations."
Thank you for these questions that will be used to deepen the analysis.
First, regarding interception capacity, again, following your comments, we propose to add in discussion some new results and analysis over the whole sample with smaller interception capacity determined for example with a flux matching method (Ficchi et al. 2017).
Next, the stability of uniform and distributed calibrated parameters will be better analyzed and discussed. As already stated in the manuscript (this will be clarified), spatial constraint/regionalization (of conceptual models) is a difficult research question left for further work. Thank you for the suggestion of scatterplots, which will be examined and possibly added to the article with complementary analysis and discussions.
"Other comments :" (...)
Thank you for your detailed comments, which will all be seriously taken into account, following the proposed revisions above. This should help to improve this article.
Kind regards,
The authors
Citation: https://doi.org/10.5194/egusphere-2022-506-AC2
Status: closed

RC1: 'unable to review scientific content of manuscript due to noncompliance with Copernicus data policy', Rolf Hut, 24 Oct 2022
The topic of the paper “Adjointbased spatially distributed calibration of a grid GRbased parsimonious hydrological model over 312 French catchments with SMASH platform” seemed interesting to me since the abstract mentioned use of a new data assimilation scheme (VDA) and a platform for running hydrological models with greater ease.
On reading the paper though, I believe that it does not comply with the Copenicus data policy and should therefore not be published in HESS (in its current form). The points in which I believe the paper does not comply with the Copernicus data policy include (but may not be limited to):
 The paper does not contain a ‘data availability’ section. This is a required section. Copernicus requires authors to state where the data used in / generated by their work can be obtained. Data should ideally be published in openly available repositories with a valid persistent identifier (preferably DOI).
 The paper does not include any link to the software generated and used in this work, whilst it heavily relies on it for its results. The authors introduce a platform for running the SMASH model integrated with data assimilation, but fail to provide that platform to the hydrological community. The only link I could find was a reference to a poster (JayAllemand et. al. 2022) which does not has a DOI and upon finding that poster (at https://hal.archivesouvertes.fr/hal03683657) I discovered that the software mentioned is hosted at the ISTREA gitlab page which is only accessable to ISTREA employees. (gitlab.irstea.fr). The software that contains the main point the authors want to make, and that is used to generate the hydrological analyses they present in the paper, should be openly available to the hydrological community in general and to reviewers of the paper in particular.
If I'm wrong and I misread the paper I deeply apologise and I owe the authors a round of drinks (at least). After reading the paper twice I have failed to locate either the data generated by the experiments described, or the model code or experiment code used to generate the results presented in the paper.
While the authors are first and foremost responsible for making sure that their submission complies with journal regulations I also like to stress that it is the responsibility of the editorial (support) team at the publisher to make sure that any submission complies with journal regulations and guidelines before sending manuscripts out to reviewers. When I get a request to review a paper I expect that checks against journal regulations such as reference style, but more importantly: open science requirements, have been conducted by the editorial (support) team.
I ask the editorial team of HESS to make agreements on procedures with the editorial support staff to make sure that in the future these checks are done before a manuscript gets send out for review.
Looking forward to reviewing a new manuscript by the authors that does comply with Copernicus Data Policy.
Rolf Hut
PS after spotting the above mentioned issues in the manuscript I did not execute a further detailed review. I did read it through twice and if the authors decide to amend the above mentioned issues I do have some further first suggestions to improve the manuscript: in its current form the manuscript tries to do too many things: introduce the SMASH model, introduce a new data assimilation method into hydrology and introduce a new platform to do hydrological research. I suggest to turn this into separate manuscripts to allow for focus in each one of them. The introduction of the model maybe fits better in GMD than in HESS, the study of the improvement in predictions when using the new data assimilation scheme would be (in my opinion) suited for HESS.
Citation: https://doi.org/10.5194/egusphere2022506RC1 
AC1: 'Reply on RC1', PierreAndré Garambois, 25 Oct 2022
Dear colleague,
We will add a data policy section.
The code is available upon request and will be made open online in a couple of months; we can send it to you right now for the review. Nevertheless, we believe that this research article, on the questions mentioned in the introduction, is well understandable with the proposed material and fits within the scope of HESS. It focuses on the evaluation of a spatially distributed variational calibration algorithm applied to a parsimonious distributed model over a large sample. This will be clarified in the manuscript.
Best regards,
P.A. Garambois on behalf of the authors
Citation: https://doi.org/10.5194/egusphere2022506AC1

RC2: 'Comment on egusphere2022506', Anonymous Referee #2, 24 Nov 2022
The article presents the application of the SMASH distributed hydrological model on a large set of French catchments. The authors evaluate the performance of two model versions, the original and a modified one, and conclude that the modified version is more efficient. The two models outperform a lumped model, which is also applied to the same catchments.
I have several major concerns about this article. I found that the way the modifications were introduced in the model is overall not justified. There are also several results difficult to understand, typically on the interception store.
I suggest major revision before the article could be reconsidered for publication.
Major comments
 Section 2.1: The authors introduced three modifications taken from model versions developed by other authors and tested them all together without explaining why they are all necessary individually. Therefore it is very difficult to understand what brings some improvement in model performance. Were all these modifications actually necessary?
 L134: The authors choose to introduce two routing stores with the same mathematical formulation. In the work they cite by Pushpalatha et al., there is one power-law store and one exponential store. Could the authors explain why they made a different choice here?
 L199 and 219: Nothing is said about the GR5H model tested here nor about the calibration algorithm used. It is difficult to know which version was used and how it was implemented. Furthermore, the purpose of including this model in the article is not clear. It is stated twice (lines 255 and 262) that GR5H is used as a reference but not for benchmarking purposes (actually I do not understand the difference between the two in the context of this article). If the objective is to show that SMASH is better than a model "taken from the shelf", I wonder why the choice was made to take GR5H. It is very similar to the SMASH unit brick (model structure), as mentioned in the article, but not really the same. Therefore it is difficult to conclude anything from the comparison proposed here. Does the better performance of SMASH come from the fact that it is distributed or from the fact that there are differences in the structure of the unit brick? I find it would be more useful to test the original and modified versions of the SMASH model structure in a lumped mode to answer the previous question.
 Table 2: I do not understand why the upper bound for the interception store is so high. It is physical nonsense to have an interception store of 100 mm. An interception store capacity is typically less than 10 mm. If the capacity is that high in the calibration process, it means that this store does not only play the role of an interception store.
 Table 2: I also did not understand how the calibration process ensures that the "fast" (r) and "slow" (l) tanks actually play this role, i.e. that ctr is lower than ctl. If there is no explicit constraint in the optimisation process, there must be catchments where this is not the case, depending on the proportion of the base flow. It may end up in the fast routing store simulating slower recession than the slow routing store. Furthermore, the fact that both stores have the same formulation must generate equivalent parameter sets for some basins that would only need one reservoir, and thus generate indetermination in the response surface.
 Section 4.4: The comments on the greater variability (greater standard deviation) of some parameters are in my view biased by the fact that the parameters vary over very different orders of magnitude. It might have been more appropriate to compare the coefficients of variation to assess which parameters are indeed the most variable relative to their mean value. Some of the comments (L350-352, L357-359, L360-361) remain rather general, and hypotheses that are difficult to verify as they stand. I therefore question the added value of these comments.
 Tables 4 and 5: I was quite surprised by the average values taken by the capacity of the interception reservoir here (15 to 25 mm sometimes). One could expect a capacity ten times lower. Clearly, the role taken by this reservoir goes beyond a simple interception function, with a possible interaction on the water balance function (exchange in particular) or a smoothing role on rainfall inputs going beyond the interception process. In Ficchi's work cited by the authors, this reservoir had capacities of a few millimetres on average. How can these differences be interpreted? Furthermore, the stability of the average values of the model parameters does not mean that the values are stable when comparing periods basin by basin. Have scatterplots been drawn to check that the apparent stability of the averages is confirmed when looking at the basins individually? Last, for version D, have spatial analyses of the stability of parameter fields been carried out? There may be equivalent fields in terms of mean or variability, but with very different spatial configurations.
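The identifiability concern raised above (two transfer stores sharing the same formulation) can be illustrated with a minimal, hypothetical sketch. The store equation below is a simple linear reservoir chosen purely for illustration, not the actual SMASH transfer operator; with an equal input split, swapping the two capacities produces an identical hydrograph, i.e. two distinct parameter sets that discharge data alone cannot distinguish:

```python
import numpy as np

def two_store_transfer(inflow, c1, c2, split=0.5):
    """Route an inflow series through two parallel conceptual stores
    that share the same (hypothetical) linear-reservoir formulation:
    outflow = state / capacity at each time step."""
    s1 = s2 = 0.0
    out = np.empty_like(inflow)
    for t, p in enumerate(inflow):
        s1 += split * p
        s2 += (1.0 - split) * p
        q1 = s1 / c1  # identical formulation for both stores
        q2 = s2 / c2
        s1 -= q1
        s2 -= q2
        out[t] = q1 + q2
    return out

rng = np.random.default_rng(0)
rain = rng.exponential(2.0, size=500)

q_a = two_store_transfer(rain, c1=50.0, c2=200.0)
q_b = two_store_transfer(rain, c1=200.0, c2=50.0)  # swapped capacities

# With an equal split and identical store formulations, swapping
# (c1, c2) yields exactly the same total hydrograph: the two
# parameter sets are equivalent, creating a flat direction
# (indetermination) in the response surface.
print(np.allclose(q_a, q_b))  # True
```

In the real model an exchange flux or different bounds on the two stores would break this symmetry, which is exactly the kind of constraint the comment asks about.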
Other comments
 Abstract: It could probably be shortened. Some introductory sentences sound very general and some details about the results do not seem essential.
 General: most of the table and figure captions are not detailed enough to fully understand what is shown. This should be improved.
 L5 and 7: VDA is defined twice.
 L20: Unclear what “Uniform basins” means here.
 L46: please explain what semi-lumped means
 L70: “over a large catchment sample”
 L73: “the initial study by”
 L82-84: It is unclear why the sensitivity analysis is performed.
 L91: “a uniform calibration”
 L108: Here again there is no justification for why this cell-to-cell routing was chosen. There are many possible options. Furthermore, the authors never discuss the possible interactions between the in-cell routing and the cell-to-cell routing. There are probably many cases where one compensates for the other. The overall stability of all these parameters between periods may give some answers.
 L109: “described in Fig. 1”
 L158 and Eq. A20: The NSE is defined by 1 minus the ratio of quadratic error and variance. The authors should stick to this definition to avoid confusion.
 L180: “continental France” (?)
 L199: The GR5H model is mentioned here without having been presented before.
 L204: QM/PM min/max are not defined. Is it the min/max over the twelve months?
 L208: I understand from this sentence that there was no criterion applied to remove basins subject to artificial influences such as dams. Is it actually the case? How could the model simulate artificial behaviours?
 Figure 3: Add basin boundaries and the main river network for better readability. It is only mentioned in the legend of this figure that there are two subsamples (upstream and downstream). This should be mentioned in the text.
 L226-229: Is the initialization period of one year sufficient for basins where groundwater is dominant? Often it is not the case for such basins. Furthermore, it is not clear whether this initialization period was also used in validation.
 L237: “in calibration, i.e. considered as ungauged” (?)
 L254-255, L261-262: Again, I do not understand these sentences (see comment above on the use of GR5H in the article)
 L285-288: I do not understand why this assumption is made without being verified. It should be possible to check whether there is a link, in the sample of basins, between the hydroclimatic differences observed between the two subperiods and the evolution of model performance.
 L300: “each flood event”
 L302: "Sutcliffe”
 L318-321: How significant is this improvement of S6 over S3? Give some quantitative evidence for this result.
 Tables 4 and 5: Put cr and ml in the same order as Table 2.
 Table 4: Were the ctr and ctl values compared? The mean values seem very close for some catchments, which may indicate that a single store would be sufficient and that the extra complexity of using two stores is not justified and may even create identification issues. Was this investigated?
 Lines 395-397: Which fixed value would this be in this case? I find this conclusion strange. It should be further investigated in light of the comments above on the odd values of the interception store capacity.
 Line 437: "French".
 Appendix A6: The formulas for NSE and KGE used to illustrate performance are "1 −" the formulas given here.
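For reference, the standard definitions alluded to in this comment, as commonly written in the hydrology literature (not copied from the manuscript's appendix), are:

```latex
\mathrm{NSE} = 1 - \frac{\sum_{t}\left(Q_{\mathrm{obs},t} - Q_{\mathrm{sim},t}\right)^{2}}
                        {\sum_{t}\left(Q_{\mathrm{obs},t} - \overline{Q}_{\mathrm{obs}}\right)^{2}},
\qquad
\mathrm{KGE} = 1 - \sqrt{\left(r - 1\right)^{2} + \left(\alpha - 1\right)^{2} + \left(\beta - 1\right)^{2}},
```

where $r$ is the linear correlation between simulated and observed discharge, $\alpha$ the ratio of their standard deviations, and $\beta$ the ratio of their means; the point of the comment is that the appendix apparently gives only the error terms, without the leading "1 −".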
Citation: https://doi.org/10.5194/egusphere2022506RC2 
AC2: 'Reply on RC2', PierreAndré Garambois, 19 Dec 2022
We thank the reviewer for his detailed review and helpful comments on our work. We propose a detailed answer below, including proposed revisions of our study with additional results, analyses and enriched discussions and perspectives. This should help the reader to better understand the choices made to reach the forward model structure with six parameters, whose performance is illustrated over a large catchment sample, especially for spatially distributed calibration with a variational data assimilation algorithm, as well as to clarify the results on the interception store. Overall we believe that the reviewer's comments, along with the proposed revisions, should enable us to clarify and enrich our work.
"Major comments
Section 2.1: The authors introduced three modifications taken from model versions developed by other authors and tested them all together without explaining why they are all necessary individually. Therefore it is very difficult to understand what brings some improvement in model performance. Were all these modifications actually necessary?"
Thank you for this question, which will be answered through the presentation of additional results and analysis on the effect of each operator taken from lumped models of the literature and combined to reach the 6-parameter structure of the studied distributed model. This combination of operators in the distributed model has been made in the spirit of, and following, the work done on the lumped parsimonious GR models. Our modeling choices result from several numerical experiments that were not shown in the first version of the article for the sake of brevity. Nevertheless, in the revised version, we propose to clarify (§4.1 and Fig. 1 will be modified) the effect of each additional model component by providing a comparison of the performances obtained with intermediate model structures of increasing complexity, starting from the 3-parameter structure and adding (1) an exchange term (ml parameter) to the transfer store (ctr), and (2) another transfer store (ctl). This will help to clarify the gain in performance due to the model structure modifications. Note that the role/constraint of the interception store will also be analyzed, as proposed in the answers to the next questions.
"L134: The authors choose to introduce two routing stores with the same mathematical formulation. In the work they cite by Pushpalatha et al., there is one powerlaw store and one exponential store. Could the authors explain why they made a different choice here?"
Adding a second transfer store with the same equation led to a performance improvement, as will be shown and analyzed more clearly with the proposed comparison of intermediate structures of increasing complexity introduced in the answer to the question above. Note that in structure S6, an exchange function is applied to one of the two transfer reservoirs, which makes the behaviors of the two "transfer sub-branches" different. This choice leads to good performance and coherent model behavior, which will be better analyzed in the proposed revisions. Testing other variants/operators for transfer/routing is an interesting and large topic voluntarily left for further research.
"L199 and 219: Nothing is said about the GR5H model tested here nor about the calibration algorithm used. It is difficult to know which version was used and how it was implemented. Furthermore, the purpose of including this model in the article is not clear. It is stated twice (lines 255 and 262) that GR5H is used as a reference but not for benchmarking purposes (actually I do not understand the difference between the two in the context of this article). If the objective is to show that SMASH is better than a model "taken from the shelf", I wonder why the choice was made to take GR5H. It is very similar to the SMASH unit brick (model structure), as mentioned in the article, but not really the same. Therefore it is difficult to conclude anything from the comparison proposed here. Does the better performance of SMASH comes from the fact that it is distributed or from the fact that there are differences in the structure of the unit brick? I found it would be more useful to test the original and modified versions of the SMASH model structure in a lumped mode to answer the previous question."
We agree with your comment and have decided to remove GR5H from this study, since the focus is spatially distributed hydrological modeling and calibration with a variational data assimilation algorithm. As suggested by the reviewer, and as proposed in the answers above, we will put more emphasis on the comparison of the original, intermediate and modified SMASH model structures.
"Table 2: I do not understand why the upper bound for the interception store is so high. It is physically a nonsense to have an interception store of 100 mm. An interception store capacity is typically less than 10 mm. If the capacity is that high in the calibration process, it means that this store does not only play the role of an interception store."
Thank you for this remark, which will be answered with additional results and analysis. Note that we chose relatively large bounds, without seeking physical meaning for this parsimonious conceptual model structure, in order to study its performance with variational distributed calibration over this large sample of French catchments. The analysis is performed both in terms of global parametric sensitivity analysis and spatially distributed optimization results. High performances were obtained in calibration/validation, along with coherent functioning points and marked parametric sensitivities, as analyzed over the whole set and by catchment group. Note that the median calibrated interception parameter for each catchment group over the whole dataset ranges between 14 and 26 mm. Note also that, as already analyzed in the article, a low sensitivity to interception capacity is found. Following your comments, we propose to add to the discussion some new results and analysis over the whole sample with a smaller interception capacity, which could be determined for example with a flux matching method (Ficchi et al. 2017, https://doi.org/10.1016/j.jhydrol.2019.05.084).
"Table 2: I also did not understand how the calibration process ensures that the "fast" (r) and "slow" (l) tanks actually play this role, i.e. that ctr is lower than ctl. If there is no explicit constraint in the optimization process, there must be catchments where this is not the case, depending on the proportion of the base flow. It may end up in the fast routing store simulating slower recession than the slow routing store. Furthermore, the fact that both stores have the same formulation must generate equivalent parameter sets for some basins that would only need one reservoir, and thus generate indetermination in the response surface."
We applied different calibration bounds to the two transfer reservoirs of this conceptual model, for which no physical meaning is sought. We agree with you, and we will correct the typo in the appendix: "fast" and "slow" reservoir will be changed into "first" and "second" reservoir, as already done in the main text. The bounds applied to their capacities in the calibration process are already given in the manuscript, as well as information on the calibrated parameters. We only allow for the possibility that one reservoir reaches a higher capacity, without any other constraint in the optimization. Good performances and coherent transfer values are obtained, as already analyzed in the manuscript. You are right that for a few cases $\overline{ctl} < \overline{ctr}$; this will be clarified in the text of the revised article.
Regarding the response surface, a global sensitivity analysis has already been presented over the whole sample as well as calibrated parameter values that depict the sensitivities and functioning points for each model parameter including the respective transfer reservoirs. Potential compensations between parameters may arise as it is generally the case in hydrological modeling, and the computation of higher order sensitivities is an interesting and full research topic left for further research. Nevertheless this will be discussed in the revised manuscript.
Note that the response surface of the model is not an issue for the calibration algorithm, which converged very efficiently for each high-dimensional optimization problem solved per catchment and period. This is demonstrated by the high performances obtained in calibration/validation over the large sample, and especially by the improvements found with distributed calibration. This will be better discussed and reinforced thanks to the new results and analysis that will be proposed in the revised version.
"Section 4.4: The comments on the greater variability (greater standard deviation) of some parameters are in my view biased by the fact that the parameters vary over very different orders of magnitude. It might have been more appropriate to compare the coefficients of variation to assess which parameters are indeed the most variable relative to their mean value. Some of the comments (L350-352, L357-359, L360-361) remain rather general, and hypotheses that are difficult to verify as they stand. I therefore question the added value of these comments."
Thank you for this suggestion; we will look at the coefficients of variation of the parameters' spatial variability. The analysis will be improved and the general statements pointed out will be deepened.
"Tables 4 and 5: I was quite surprised by the average values taken by the capacity of the interception reservoir here (15 to 25 mm sometimes). One could expect a capacity ten times lower. Clearly, the role taken by this reservoir goes beyond a simple interception function, with a possible interaction on the water balance function (exchange in particular) or a smoothing role on rainfall inputs going beyond the interception process. In Ficchi's work cited by the authors, this reservoir had capacities of a few millimetres on average. How can these differences be interpreted? Furthermore, the stability of the average values of the model parameters does not mean that the values are stable when comparing periods basin by basin. Have scatterplots been drawn to check that the apparent stability of the averages is confirmed when looking at the basins individually? Last, for version D, have spatial analyses of the stability of parameter fields been carried out? There may be equivalent fields in terms of mean or variability, but with very different spatial configurations."
Thank you for these questions that will be used to deepen the analysis.
First, regarding interception capacity, again, following your comments, we propose to add in discussion some new results and analysis over the whole sample with smaller interception capacity determined for example with a flux matching method (Ficchi et al. 2017).
Next, the stability of uniform and distributed calibrated parameters will be better analyzed and discussed. As already stated in the manuscript (and as will be clarified), spatial constraint/regionalization (of conceptual models) is a difficult research question left for further work. Thank you for the suggestion of scatterplots, which will be examined and possibly added to the article with complementary analysis and discussion.
"Other comments :" (...)
Thank you for your detailed comments that will all be seriously taken into account, following the proposed revisions above. This should help to improve this article.
Kind regards,
The authors
Citation: https://doi.org/10.5194/egusphere2022506AC2