The benefits and trade-offs of multi-variable calibration of WGHM in the Ganges and Brahmaputra basins

Hasan, H. M. Mehedi; Döll, Petra; Hosseini-Moghari, Seyed-Mohammad; Papa, Fabrice; Güntner, Andreas

doi:https://doi.org/10.5194/egusphere-2023-2324

Preprints

07 Nov 2023

Abstract. While global hydrological models (GHMs) are affected by large uncertainties regarding model structure, forcing and calibration data, and parameters, observations of model output variables are rarely used to calibrate the model. Pareto dominance-based multi-objective calibration, often referred to as Pareto-Optimal Calibration (POC), may serve to estimate model parameter sets and analyse trade-offs among different objectives during calibration. Within a POC framework, we determined optimal parameter sets for the WaterGAP Global Hydrology Model (WGHM) in the two largest basins of the Indian subcontinent, the Ganges and the Brahmaputra, which together support nearly 580 million inhabitants. The selected model parameters, determined through a multi-variable multi-signature sensitivity analysis, were estimated using up to four types of observations: in-situ streamflow (Q), GRACE and GRACE Follow-On total water storage anomalies (TWSA), LandFlux evapotranspiration (ET), and surface water storage anomalies (SWSA) derived from multi-satellite observations. While our sensitivity analysis ensured that the model parameters that are most influential for the four variables were identified in a transparent and comprehensive way, the rather large number of calibration parameters, 10 for the Ganges and 16 for the Brahmaputra, had a negative impact on parameter identifiability during the calibration process. Calibration against observed Q proved crucial for reasonable streamflow simulations, while additional calibration against TWSA was crucial for the Ganges basin and helpful for the Brahmaputra basin to obtain a reasonable simulation of both Q and T. Calibrating also against the other two observation types enhanced the overall model performance and enabled a more accurate representation of the water balance.
We identified several trade-offs among the calibration objectives, with the nature of these trade-offs closely tied to the physiographic and hydrologic characteristics of the study basins. The trade-offs were particularly pronounced in the Ganges basin, especially between Q and SWSA and between Q and ET. When the observational uncertainty of the calibration data is considered, model performance decreases in most cases. This indicates that the calibration algorithm overfits to the single observation time series. We therefore propose a transparent algorithm to identify high-performing Pareto solutions under consideration of observational uncertainties of the calibration data. Recognizing these uncertainties, we anticipate that actual model performance may be lower in roughly 90 % of cases.

Received: 10 Oct 2023 – Discussion started: 07 Nov 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

30 Jan 2025

The benefits and trade-offs of multi-variable calibration of the WaterGAP global hydrological model (WGHM) in the Ganges and Brahmaputra basins

Howlader Mohammad Mehedi Hasan, Petra Döll, Seyed-Mohammad Hosseini-Moghari, Fabrice Papa, and Andreas Güntner

Hydrol. Earth Syst. Sci., 29, 567–596, https://doi.org/10.5194/hess-29-567-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2023-2324', Anonymous Referee #1, 02 Feb 2024

I enjoyed reading the manuscript.
My main concerns are (1) the temporal-only calibration of a distributed hydrologic model and (2) the use of coarse meteorological inputs although ERA5-Land offers 0.1° inputs.

Other comments:

Section 3.3: More details on the sensitivity analysis (SA) should be provided. Morris is a more elaborate SA method than one-at-a-time local methods, so it requires many more runs. How many runs were required for the 24-parameter model (Line 263)?
Can Morris identify the effects of parameter interactions on the sensitivities, as Sobol's method does? Why did you choose Morris rather than simply examining the Jacobian matrix?

L402: Running the first year five times? I could not understand how this works. A single 1985-89 spin-up run should be enough to reach equilibrium, shouldn't it?

Eq. 2: Why is only NSE used as a performance metric? Why is only temporal calibration pursued for a distributed hydrologic model that can produce flux maps? How did you deal with unit differences between satellite AET (W/m²) and model outputs in mm/day? The same may apply to the GRACE anomaly values and the recharge output of the model.

NSE is a bias-sensitive metric, and it might be necessary to use bias-insensitive spatial pattern metrics in the calibration.

The introduction misses recent work on satellite-based evaluation and calibration of distributed hydrologic models using actual ET. Also, trade-offs in multi-objective Pareto calibration of hydrologic models have been studied in the literature. Please update your literature review with studies from 2018 to January 2024 from top journals (HESS and WRR). Compare your results with them in the discussion.

Conclusion: Very different than conventional conclusion sections. Detailed results (numbers) should not be given here but just the conclusions drawn from the results should be provided in bullets. It is lengthy and not easy to follow. Research Questions are repeated and probably not necessary.
The reader needs the main messages from the study and not the repetition of the results.

Citation: https://doi.org/10.5194/egusphere-2023-2324-RC1
- AC1: 'Reply on RC1', HM Mehedi Hasan, 15 May 2024
  
  We thank you very much for your helpful comments and constructive suggestions for improving the manuscript. Below, each comment (indicated by “RC”) is followed by our answer (indicated by “AC”). Proposed new text in the revised manuscript is written in bold.
  RC: I enjoyed reading the manuscript.
  
  My main concerns are (1) the temporal-only calibration of a distributed hydrologic model and (2) the use of coarse meteorological inputs although ERA5-Land offers 0.1° inputs.
  AC: 1) We agree that conducting a spatio-temporal calibration analysis would be the preferred approach from a conceptual point of view for a distributed model. However, such an approach would lead to a significantly expanded parameter space, which could render calibration impractical within a reasonable timeframe and acceptable accuracy range. Another major limitation is the insufficient capacity of observations to effectively constrain a large number of model parameters. As discussed in the introduction, most hydrological observations can effectively constrain only around 4 to 6 model parameters. While the hydrological modelling community is exploring methods to parameterize spatially distributed parameters for good reasons, and we are also interested in conducting such a spatio-temporal calibration, our current study focuses on basin-scale parameter calibration with more observables than are usually used.
  2) At present, the WaterGAP Global Hydrological Model (WGHM version 2.2e) can only operate at 0.5-degree resolution and thus with 0.5-degree meteorological inputs. We thus do not expect significant added value from using higher-resolution forcing data that would need to be aggregated to 0.5-degree resolution. However, if the model resolution changes in the future, the methods employed in this study could be adapted and applied.
  RC:
  
  Other comments:
  
  Section 3.3: More details on the sensitivity analysis (SA) should be provided. Morris is a more elaborate SA method than one-at-a-time local methods, so it requires many more runs. How many runs were required for the 24-parameter model (Line 263)?
  AC: We will provide the details of the Morris method in the Appendix (see below in one of our responses). We reported that out of 24 model parameters, we excluded two parameters in the SA – EP-NM and P-PM (in Line 263). These two parameters directly modify model forcing, i.e., precipitation and net radiation, leading to very high changes in most target variables, which suppresses the relative influence of the other parameters. Thus, 22 parameters were considered in the sensitivity analysis (mentioned in Lines 398-399). For the 22 parameters, we needed to evaluate 23,000 samples for each basin. The number of model runs required in Morris's method is calculated as r × (m + 1), where m is the number of parameters and r is the number of elementary effects to be used. Additional details will be provided in the method description in the Appendix. To specify the number of model runs in the sensitivity analysis, we will add a statement after Lines 398-399 and rephrase Lines 399-402.
  Instead of “For the sensitivity analysis, model simulations for the period 1990-2019 were used, with 1985-1989 taken as the model spin-up period and the first year of the spin-up was run 5 times to allow the water storages to fill up to an equilibrium state”, we will write “During the SA, a total of 23,000 samples were analysed for each of the river basins. Model simulations were conducted for the period 1990-2019, with the spin-up period from 1985 to 1989 and the initial year of the spin-up was run five times to allow water storages to reach an equilibrium state.”
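As a rough check of the run count quoted in this reply, the relation r × (m + 1) can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, and the value of r is not stated in the reply; it is inferred here from the reported 23,000 samples and m = 22.

```python
def morris_run_count(num_parameters: int, num_trajectories: int) -> int:
    """Model runs needed by a Morris design: r * (m + 1)."""
    return num_trajectories * (num_parameters + 1)


m = 22              # parameters in the SA (24 minus EP-NM and P-PM), from the reply
total_runs = 23_000  # samples per basin, from the reply

# ASSUMPTION: r is not given in the reply; it is back-calculated here.
r = total_runs // (m + 1)

assert morris_run_count(m, r) == total_runs
print(r)  # 1000
```

Under this reading, each basin would have used about 1000 elementary effects per parameter, which is consistent with the stated total but should be confirmed against the appendix.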
  RC:
  
  Can Morris identify the effects of parameter interactions on the sensitivities, as Sobol's method does? Why did you choose Morris rather than simply examining the Jacobian matrix?
  AC: The Morris method calculates the partial derivatives (i.e., elementary effects) at various points in the parameter space, similar to those in the Jacobian matrix. The sensitivity index is determined by averaging these partial derivatives. This approach provides a more accurate estimation of a parameter's effect compared to local methods like the Jacobian matrix.
  Unlike the variance decomposition method of Sobol, Morris's method does not explicitly differentiate interaction terms. However, it does produce a variance term for the elementary effect that accounts for parameter interactions and the functional non-linearity of the model response. We utilized this variance term in the parameter selection process. We acknowledge the importance of providing comprehensive sensitivity analysis details and will include them in the appendix.
  We will add to the text that the Morris SA is a global sensitivity analysis. We will reformulate Lines 391-393 as follows: “The sensitivity index of the EET method averages out the local influences by taking samples from many locations in the parameter space, making it a global sensitivity analysis method (Pianosi et al., 2016)”.
  In addition, we will add the following text to section 3.3 after Line 387:
  
  “While the Morris method does not explicitly show interaction terms, it produces a variance term for the elementary effect that accounts for parameter interactions and the functional non-linearity of the model response. We computed the standard error of the sensitivity index from this variance term and used it for parameter selection (Algorithm 4 in Appendix A).”
  Please note that "Appendix A: Elementary Effect Test (EET) method of Morris (1991)" is provided as a supplemental document with this response letter.
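To illustrate how elementary effects yield both a mean sensitivity and a spread that reflects interactions and non-linearity, the following minimal sketch applies a simplified radial one-at-a-time design to a toy function. It is not the authors' Algorithm 4 and not the WGHM setup: the function names, the toy model, the step size `delta`, and the choice to take the standard error as the standard deviation of the absolute effects divided by sqrt(r) are all assumptions made for illustration.

```python
import numpy as np


def elementary_effects(model, lower, upper, r=50, delta=0.1, seed=0):
    """Estimate Morris-type sensitivity measures with a radial OAT design.

    Uses r * (m + 1) model runs and returns (mu_star, sigma, std_err),
    each an array of length m.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    m = lower.size
    ee = np.empty((r, m))
    for j in range(r):
        # Base point in the unit cube, leaving room for a +delta step.
        x = rng.uniform(0.0, 1.0 - delta, size=m)
        y0 = model(lower + x * (upper - lower))
        for i in range(m):
            x_step = x.copy()
            x_step[i] += delta  # perturb one parameter at a time
            y1 = model(lower + x_step * (upper - lower))
            ee[j, i] = (y1 - y0) / delta
    abs_ee = np.abs(ee)
    mu_star = abs_ee.mean(axis=0)        # mean absolute elementary effect
    sigma = ee.std(axis=0, ddof=1)       # spread: interactions / non-linearity
    std_err = abs_ee.std(axis=0, ddof=1) / np.sqrt(r)  # assumed definition
    return mu_star, sigma, std_err


def toy_model(x):
    # Hypothetical response: x0 is additive, x1 and x2 interact, x3 is inert.
    return 2.0 * x[0] + x[1] * x[2] + 0.0 * x[3]


mu_star, sigma, std_err = elementary_effects(toy_model, [0.0] * 4, [1.0] * 4)
```

In this toy case the purely additive parameter has a near-zero spread, the interacting parameters have a clearly positive spread, and the inert parameter has zero mean effect, which is the kind of information the reply describes using for parameter selection.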
  RC:
  
  L402: Running the first year five times? I could not understand how this works. A single 1985-89 spin-up run should be enough to reach equilibrium, shouldn't it?
  AC: The WaterGAP model offers two methods to achieve equilibrium in the state variables through spin-up runs: (i) repeating the simulation of the first year multiple times, and (ii) starting the model from a sufficiently early point in time. Since no general guideline is available for WGHM that specifies how many spin-up years are required for the storage variables to reach equilibrium, we used both options.
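The idea behind option (i) can be sketched with a toy storage. The example below is not WGHM: it assumes a single hypothetical linear reservoir with a slow release coefficient `k` and synthetic daily inflow, and simply shows how repeating the first year lets the end-of-year storage settle.

```python
import numpy as np


def run_year(storage, daily_inflow, k=0.002):
    """Advance a toy linear reservoir through one year of daily steps."""
    for inflow in daily_inflow:
        storage += inflow - k * storage
    return storage


rng = np.random.default_rng(1)
# Synthetic inflow for the first spin-up year (hypothetical values, mm/day).
first_year_inflow = rng.gamma(shape=0.5, scale=6.0, size=365)

storage = 0.0  # start from an empty store
end_of_year = []
for _ in range(5):  # repeat the first year five times
    storage = run_year(storage, first_year_inflow)
    end_of_year.append(storage)

# Change in end-of-year storage between successive repetitions.
changes = np.abs(np.diff(end_of_year))
```

With a slow store such as this one, the change between repetitions shrinks steadily but is still noticeable after the first pass, which is why a single pass may not be enough for slowly responding storages.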
  RC:
  
  Eq. 2: Why is only NSE used as a performance metric? Why is only temporal calibration pursued for a distributed hydrologic model that can produce flux maps? How did you deal with unit differences between satellite AET (W/m²) and model outputs in mm/day? The same may apply to the GRACE anomaly values and the recharge output of the model.
  AC:
  
  We chose NSE as our performance metric because it is widely used in the field of hydrology, although there are significant concerns associated with its use. For example, NSE is sensitive to outliers, biases, and seasonality, and it uses the observed mean as the benchmark which may not be an adequate reference for most hydrologic variables (Schaefli and Gupta, 2007). Livneh and Lettenmaier (2012), however, noted that the NSE can be a useful indicator for inter-basin performance comparison since it normalizes the mean squared error (MSE) by the observed variance (σ_o²) of each basin. While we acknowledge that we have not addressed all the limitations of NSE, we considered it sufficient for our study as our primary objective was to evaluate the benefits and trade-offs of multi-variable calibration. Nonetheless, our methodology allows for the use of alternative performance metrics. Please note that we used other commonly used indices such as RMSE and correlation for model validation (Table 12), and in the supplementary materials, we have provided the Kling-Gupta Efficiency (KGE) and its three components for the overall compromise solutions (Tables S8, S9, and S10).
  The WaterGAP Global Hydrology Model (WGHM) indeed generates spatially distributed data of water fluxes. However, in our already complex study, we opted to utilize basin-scale observations for all variables to obtain more accurate estimates of observational errors. Consequently, we were unable to leverage the flux maps and explore the potential use of spatial pattern-based metrics.
  We used the LandFlux-EVAL multi-dataset synthesis ET product developed by Mueller et al. (2013), which reports ET values in units of mm/day. Similarly, for the GRACE anomaly, we incorporated basin-scale total water storage anomaly (TWSA) data processed in units of water height equivalent (mm). These GRACE TWSA data, including propagated errors, were prepared by the University of Bonn following the methodology outlined by Gerdener et al. (2020).
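The bias sensitivity of NSE discussed in this reply can be made concrete with a small example. The sketch below assumes the usual definition of NSE (one minus the mean squared error normalized by the observed variance) and the commonly used formulation of KGE with its three components (correlation r, variability ratio alpha, bias ratio beta); the function names and the synthetic series are hypothetical.

```python
import numpy as np


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - MSE / variance of the observations."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge_components(sim, obs):
    """Kling-Gupta efficiency and its components (r, alpha, beta)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    kge = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return kge, r, alpha, beta


obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # synthetic observations
biased = obs + 1.0                          # perfect timing, constant bias

nse_biased = nse(biased, obs)
kge_biased, r, alpha, beta = kge_components(biased, obs)
```

Here the biased series has perfect correlation and variability, yet its NSE falls to 0.5; the KGE components attribute the loss entirely to the bias ratio, which is why reporting them alongside NSE helps interpretation.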
  RC:
  
  NSE is a bias-sensitive metric, and it might be necessary to use bias-insensitive spatial pattern metrics in the calibration.
  AC: As mentioned in the response to earlier comments, we did not use spatial pattern-based performance metrics because we employed basin-scale monthly average observations. As we also want to improve the simulated water balance for the target area at the basin scale, a bias-sensitive performance metric seems to be a reasonable choice.
  RC:
  
  The introduction misses recent work on satellite-based evaluation and calibration of distributed hydrologic models using actual ET. Also, trade-offs in multi-objective Pareto calibration of hydrologic models have been studied in the literature. Please update your literature review with studies from 2018 to January 2024 from top journals (HESS and WRR). Compare your results with them in the discussion.
  AC: We will update the literature review in the introduction and include the following statements after Lines 149-152 of the current manuscript.
  
  “Demirel et al. (2018) demonstrated successful enhancement of spatial pattern performance in a distributed hydrological model through multi-objective calibration using discharge and remote-sensing-based ET observations. Additionally, Demirel et al. (2024) provide a discussion on the trade-offs between temporal and spatial pattern calibration of the same distributed model using discharge and ET observations.”
  Also, we will insert the following text after Lines 162-165:
  “Hulsman et al. (2021) utilized in-situ discharge, satellite-based evapotranspiration (ET), and GRACE Total Water Storage Anomaly (TWSA) data to calibrate a process-based distributed hydrological model in a large semi-arid basin in Africa, aiming to incrementally improve the model's process representation.”
  RC:
  
  Conclusion: Very different than conventional conclusion sections. Detailed results (numbers) should not be given here but just the conclusions drawn from the results should be provided in bullets. It is lengthy and not easy to follow. Research Questions are repeated and probably not necessary.
  The reader needs the main messages from the study and not the repetition of the results.
  AC: We will reformulate the entire conclusion chapter based on the suggestions, presenting the main findings clearly so that readers can quickly grasp the key messages. In the new conclusion chapter, we will avoid repeating the research questions. Please refer to the suggested new conclusion chapter provided in one of our responses to the comments of the second anonymous referee.
  References
  
  Campolongo, F., Saltelli, A., and Cariboni, J.: From screening to quantitative sensitivity analysis. A unified approach, Comput. Phys. Commun., 182, 978–988, https://doi.org/10.1016/j.cpc.2010.12.039, 2011.
  Demirel, M. C., Koch, J., Rakovec, O., Kumar, R., Mai, J., Müller, S., Thober, S., Samaniego, L., and Stisen, S.: Tradeoffs Between Temporal and Spatial Pattern Calibration and Their Impacts on Robustness and Transferability of Hydrologic Model Parameters to Ungauged Basins, Water Resour. Res., 60, e2022WR034193, https://doi.org/10.1029/2022WR034193, 2024.
  Demirel, M. C., Mai, J., Mendiguren, G., Koch, J., Samaniego, L., and Stisen, S.: Combining satellite data and appropriate objective functions for improved spatial pattern performance of a distributed hydrologic model, Hydrol. Earth Syst. Sci., 22, 1299–1315, https://doi.org/10.5194/hess-22-1299-2018, 2018.
  Gerdener, H., Engels, O., and Kusche, J.: A framework for deriving drought indicators from the Gravity Recovery and Climate Experiment (GRACE), Hydrol. Earth Syst. Sci., 24, 227–248, https://doi.org/10.5194/hess-24-227-2020, 2020.
  Hulsman, P., Savenije, H. H. G., and Hrachowitz, M.: Learning from satellite observations: increased understanding of catchment processes through stepwise model improvement, Hydrol. Earth Syst. Sci., 25, 957–982, https://doi.org/10.5194/hess-25-957-2021, 2021.
  Livneh, B. and Lettenmaier, D. P.: Multi-criteria parameter estimation for the Unified Land Model, Hydrol. Earth Syst. Sci., 16, 3029–3048, https://doi.org/10.5194/hess-16-3029-2012, 2012.
  Morris, M. D.: Factorial Sampling Plans for Preliminary Computational Experiments, Technometrics, 33, 161–174, https://doi.org/10.1080/00401706.1991.10484804, 1991.
  Mueller, B., Hirschi, M., Jimenez, C., Ciais, P., Dirmeyer, P. A., Dolman, A. J., Fisher, J. B., Jung, M., Ludwig, F., Maignan, F., Miralles, D. G., McCabe, M. F., Reichstein, M., Sheffield, J., Wang, K., Wood, E. F., Zhang, Y., and Seneviratne, S. I.: Benchmark products for land evapotranspiration: LandFlux-EVAL multi-data set synthesis, Hydrol. Earth Syst. Sci., 17, 3707–3720, https://doi.org/10.5194/hess-17-3707-2013, 2013.
  Pianosi, F., Sarrazin, F., and Wagener, T.: A Matlab toolbox for Global Sensitivity Analysis, Environ. Model. Softw., 70, 80–85, https://doi.org/10.1016/j.envsoft.2015.04.009, 2015.
  Schaefli, B. and Gupta, H. V.: Do Nash values have value?, Hydrol. Process., 21, 2075–2080, https://doi.org/10.1002/hyp.6825, 2007.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2324-AC1
RC2:
'Comment on egusphere-2023-2324', Anonymous Referee #2, 09 Feb 2024
This study presents a very thorough analysis of multi-variable calibrations considering different variables for a global hydrological model. The model was applied on two exemplary basins using in-situ and multi-satellite data. The authors did an excellent job in considering a variety of aspects that are important for modelling (e.g., required number of model runs, Pareto frontier, parameter sensitivity/importance). I also liked the selection of the authors of the data that was considered for the multi-variable calibration scenarios.
I am recommending minor revision because even though the modelling analyses seem thorough, the presentation for the reader could be improved. As a reader, it was rather difficult to extract the main aspects of this research as the text was not very concise. Additionally, a slightly unconventional structure regarding the results and discussion, and conclusion was used. I recommend shortening the manuscript or summarizing several points into one point to make it easier for future readers to follow it and to get the main points of the study. This is a very general recommendation, so I have picked out examples for you to explain what I mean.
Make your sentences more concise:

Lines 554-566 could for example be shortened into something like this: “Several parameters influence most or all response variables across various signatures. However, certain parameters affect only one or two signatures of the response variables. For instance, the Runoff Coefficient (SL-RC) significantly influences monthly means (MM) of ET in the Ganges basin and MTS of streamflow. Similarly, the snow melt temperature (SN-MT) is important for some cases in snow-dominated catchments in the Brahmaputra basin. These parameters may also affect other response variables and signatures to some extent but do not meet the defined threshold for calibration selection (Figure 4).”

You created ten (lovely) figures and twelve tables. This is nice in the sense of replicability. However, in my opinion, this is too much to present in the main text. Please consider moving some of the tables that are not essential for the main outcomes of this study to the supplementary. Table 4 could be deleted entirely as it does not contain additional information to Figure 4.

Regarding the structure of the paper: For me the content of the conclusion chapter would be (the main) part of the discussion. Overall, the discussion part had become a bit short by being combined with the results. I recommend renaming the current conclusion chapter and writing a more common conclusion chapter. This will help the reader a lot to understand what you did. Also shorten the current conclusion chapter and do not present the results again.

Minor comments or examples:
1 Introduction:
You switch between the terms multi-variable, multi-signature, and multi-objective throughout the manuscript. Please clearly define them in the introduction and use the terms consistently throughout the manuscript. E.g., line 595: “multi-objective” and in the title of the manuscript: “multi-variable”. I assume the same is meant in both cases.

Line 34: “T” not explained anywhere.

Lines 53-53: local or regional hydrological models

Lines 62-64: Abbreviations are placed inconsistently. Maybe do: For example, the Water - Global Assessment and Prognosis (WaterGAP) Global Hydrological Model (WGHM, Müller...)..

Lines 70-75: Add a reference

Lines 88-91: Sentence is a bit difficult to follow. Please rephrase.

Lines 138-144: Maybe mention earlier?

Line 175: First time using the term “signature”. Mention and explain it before.

Lines 191-193: Delete.

2 Study area:
Table 1: Over which period are the means calculated (e.g., mean summer temperatures)?

How did you decide on the two basins? What are the differences between the two basins? What was the reasoning for not choosing two very different basins (regarding climate, geology, water abstractions etc.) to see the influence of these characteristics on the modelling scenarios?

Highlight these differences or similarities between the basins also in the interpretation and comparison of the modeling results of the two basins. Why were different parameters selected between the different basins (Figure 4)? E.g., lines 693-696: Why do you think that is the case? Is there any explanation for that?

3 Data and methods
Why not present available data in the chapter study area?

Line 343: Title of chapter 3.2.5 Water balance closure is a bit confusing as it’s a subchapter of 3.3 observations. Water balance closure is not an observation. Maybe call it storage change (which is also not exactly an observation, but still might fit better)?

Lines 436-448: Move them after line 465.

Line 473: One comma too many

4 Results and discussion
Lines 554-566: Explanation of parameters could also be in the method section when parameters are being presented or is this meant as a discussion?

Line 583: Maybe add a short sentence why P-PM was added later or refer to the method section (lines 263-268). Why is EP-NM not added?

Line 648: Livneh and Lettenmaier (2012)

Table 6 and 7: Are those the NSE values of the calibration? I am not sure if I got that correctly, but due to data scarcity you could not calculate the NSE for all variables for the validation period (only for Q and TWSA). Is that correct?

Figure 6 and Figure 7 are a bit small.

The authors chose to have a combined results and discussion chapter. Sometimes an explanation as to why the results turned out the way they did was missing.

Lines 864-865: From the following text of that paragraph, I still did not understand why the Ganges and Brahmaputra basins had different identifiability regarding their parameter comparison. Could you please explain that more clearly?

Could you add an outlook at the end of this section considering the following points: What do you expect for other basins? Could this method be applied to other basins? What would be the challenges?

5 Conclusion
Lines 1031-1032: repetitive
Citation: https://doi.org/10.5194/egusphere-2023-2324-RC2
- AC2: 'Reply on RC2', HM Mehedi Hasan, 15 May 2024
  
  We thank you very much for your helpful comments and constructive suggestions for improving the manuscript. Below, each comment (indicated by “RC”) is followed by our answer (indicated by “AC”). Proposed new text in the revised manuscript is written in bold.
  RC: This study presents a very thorough analysis of multi-variable calibrations considering different variables for a global hydrological model. The model was applied on two exemplary basins using in-situ and multi-satellite data. The authors did an excellent job in considering a variety of aspects that are important for modelling (e.g., required number of model runs, Pareto frontier, parameter sensitivity/importance). I also liked the selection of the authors of the data that was considered for the multi-variable calibration scenarios.
  AC: Thank you for your positive and encouraging feedback on our manuscript.
  RC: I am recommending minor revision because even though the modelling analyses seem thorough, the presentation for the reader could be improved. As a reader, it was rather difficult to extract the main aspects of this research as the text was not very concise. Additionally, a slightly unconventional structure regarding the results and discussion, and conclusion was used. I recommend shortening the manuscript or summarizing several points into one point to make it easier for future readers to follow it and to get the main points of the study. This is a very general recommendation, so I have picked out examples for you to explain what I mean.
  AC: To concisely report the main outcomes of our work, we will reformulate the conclusion chapter. Please refer to the draft of the revised conclusion chapter provided in one of our responses. To shorten the manuscript, we will remove several tables (Tables 4, 9, and 10) from the main text. Additionally, we intend to make the following changes to make the writing more concise and shorten the manuscript.
  We will rewrite Lines 589-591 as follows:
  
  Instead of “In total, 4.8 million samples were evaluated during the study which approximately consumed over 3.2 million CPU hours of execution time for the WGHM model to assess those samples”, we will write “Overall, the study involved the evaluation of 4.8 million samples, requiring approximately 3.2 million CPU hours of model run time”.
  We will rewrite Lines 594-601 as follows:
  
  Instead of “We obtained a good number of non-dominated solutions, i.e. Pareto-optimal parameter sets, in most of the multi-objective calibrations (Table 5). The cardinality (number of solutions) of the non-dominated solution set of a multi-objective calibration depends mainly on the shape of the Pareto frontier (PF) and the crowding distance of the members. The crowding distance is controlled in the Borg algorithm by the epsilon parameters which was 0.005 for all objectives. The greater solution cardinality in the Ganges basin experiments, when compared to those in the Brahmaputra basin, already indicates heightened trade-offs among the objectives, especially between NSE_Q and NSE_SWSA, as well as between NSE_SWSA and NSE_TWSA”,
  we will write “A high cardinality, i.e., a high number of solutions in the non-dominated Pareto solution set, was obtained in most multi-objective calibrations. The cardinality depends on the shape of the Pareto frontier (PF) and the allowed crowding distance, which was constant (0.005) for all objectives in all experiments. A wider PF resulting in high cardinality reflects a high trade-off between the objectives. The high cardinality observed in the Ganges experiments indicates marked trade-offs among objectives, especially between NSE_Q and NSE_SWSA, as well as between NSE_SWSA and NSE_TWSA”.
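The link between the epsilon setting and the cardinality of the non-dominated set can be illustrated with a small sketch. This is a simplified epsilon-box filter written for illustration, not the Borg algorithm's archive logic: the function names and the example NSE pairs are hypothetical, and all objectives are assumed to be maximized.

```python
import numpy as np


def pareto_front(scores):
    """Indices of non-dominated rows when every column is maximized."""
    scores = np.asarray(scores, dtype=float)
    keep = []
    for i, s in enumerate(scores):
        # A row dominates s if it is >= in all objectives and > in at least one.
        dominates_s = np.all(scores >= s, axis=1) & np.any(scores > s, axis=1)
        if not dominates_s.any():
            keep.append(i)
    return keep


def epsilon_front(scores, eps=0.005):
    """Non-dominated set after mapping solutions to epsilon-sized boxes.

    At most one solution (the first encountered) is kept per box.
    """
    scores = np.asarray(scores, dtype=float)
    boxes = np.floor(scores / eps)
    seen, keep = set(), []
    for i in pareto_front(boxes):
        key = tuple(boxes[i])
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return keep


# Hypothetical (NSE_Q, NSE_SWSA) pairs for five parameter sets.
points = np.array([
    [0.802, 0.403],
    [0.803, 0.401],  # trades off with the first point, but shares its box
    [0.701, 0.602],
    [0.601, 0.702],
    [0.651, 0.552],  # dominated by the third point
])

front = pareto_front(points)      # [0, 1, 2, 3]
thinned = epsilon_front(points)   # [0, 2, 3]
```

A front that stretches across many epsilon boxes keeps many members, so with a fixed epsilon a larger cardinality points to a wider spread of compromise solutions, as the revised text argues.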
  We will delete Lines 619-625: “This is a common occurrence in multi-objective optimization scenarios (Meyer Oliveira et al., 2021; Livneh and Lettenmaier, 2012). However, this comes at the cost of performance loss for the other variables that were not considered for calibration. The standard calibration of WGHM for mean annual streamflow (Müller Schmied et al., 2021) resulted in poorer results for all performance criteria in both the Ganges and Brahmaputra basins than the uncalibrated model for both basins. The mean NSE of all four objectives (µ_NSE,ALL) was used as a simple indicator of the overall performance of an experiment”.
  We will rewrite Lines 638-643 as follows:
  
  Instead of “Different from the Brahmaputra, calibration against only Q in the Ganges basin (both the calibration method presented here and the standard WGHM method) resulted in worse fits to all three other variables as compared to the uncalibrated model version. Multi-variable calibration, however, works best if streamflow observations are included because the average fit to all observations is, in the case of both 2-objective and 3-objective calibration cases, highest if NSE_Q is one of the calibration targets (Table 6 and Table 7)”,
  we will write “Different from the Brahmaputra, calibration against only Q in the Ganges basin resulted in worse fits to all three other variables as compared to the uncalibrated model version. Multi-variable calibration, however, works best if streamflow observations are included. Excluding NSE_Q as an objective in any calibration resulted in significantly poorer performance in streamflow simulation (Table 6 and Table 7)”.
  We will delete Lines 650-651: “However, in the majority of the calibration cases, the performance in streamflow simulation was very poor when the model was not constrained by streamflow observations.”
  We will delete Lines 699-711: “In contrast to the apparent trade-offs among objectives, there could be other non-traditional interactions among the objectives. For instance, in all replications of the calibration with only NSE_Q in the Ganges basin, we observed negative NSE_TWSA (not shown). But using only NSE_TWSA as the calibration objective, we consistently observed very high values in NSE_Q for all replications. Likewise, when NSE_ET is used as the only calibration objective, NSE_Q exhibited a significant decrease across replications in the two basins. However, when NSE_Q is employed as the only objective, reasonable performance in ET simulation is observed. Hence, the nature of the association between a pair of objectives, when attempting to describe the trade-offs, is neither unidirectional nor easily traceable through correlation analysis. Furthermore, there could be three-way trade-offs and so forth in a high-dimensional objective space, making them challenging to detect. While the association and causality of such relationships are indeed intriguing, examining the nature of trade-offs among the objectives is beyond the scope of the current study.”
  We will also delete Lines 735-736: “This indicates the high reliability of our findings regarding the trade-offs among objectives discussed in the earlier paragraphs”.
  RC:
  
  ● Make your sentences more concise: Lines 554-566 could for example be shortened into something like this: “Several parameters influence most or all response variables across various signatures. However, certain parameters affect only one or two signatures of the response variables. For instance, the Runoff Coefficient (SL-RC) significantly influences monthly means (MM) of ET in the Ganges basin and MTS of streamflow. Similarly, the snow melt temperature (SN-MT) is important for some cases in snow-dominated catchments in the Brahmaputra basin. These parameters may also affect other response variables and signatures to some extent but do not meet the defined threshold for calibration selection (Figure 4).”
  AC: Lines 554-566 will be replaced by:
  
  “Several parameters influence most or all response variables across various signatures. However, certain parameters affect only one or two signatures of the response variables. For instance, the Runoff Coefficient (SL-RC) – which is one of the parameters considered in the standard WGHM calibration – significantly influences monthly means (MM) of ET in the Ganges basin and MTS of streamflow. Similarly, the snow melt temperature (SN-MT) is important for some cases in snow-dominated catchments in the Brahmaputra basin. These parameters may also affect other response variables and signatures to some extent but do not meet the defined threshold for calibration selection (Figure 4).”
  RC:
  
  ● You created ten (lovely) figures and twelve tables. This is nice in the sense of replicability. However, in my opinion, this is too much to present in the main text. Please consider moving some of the tables that are not essential for the main outcomes of this study to the supplementary. Table 4 could be deleted entirely as it does not contain additional information to Figure 4.
  AC: In the revised manuscript, we will move Tables 9 and 10 to the supplementary material and delete Table 4.
  RC:
  
  ● Regarding the structure of the paper: For me the content of the conclusion chapter would be (the main) part of the discussion. Overall, the discussion part had become a bit short by being combined with the results. I recommend renaming the current conclusion chapter and writing a more common conclusion chapter. This will help the reader a lot to understand what you did. Also shorten the current conclusion chapter and do not present the results again.
  AC: To better communicate our main findings, we propose to replace the conclusion chapter with the following text.
  "Conclusions
  
  In this study, we have introduced a multi-objective calibration framework for estimating basin-specific optimal parameter sets for global hydrological models that can utilize observations of multiple model output variables as well as multiple signatures of each variable. Applying this approach to the simulation of the Ganges and Brahmaputra basins by the global hydrological model WGHM, we analysed the impacts, benefits and challenges of multi-variable multi-signature sensitivity analysis and multi-variable calibration.
  The multi-variable multi-signature sensitivity analysis facilitated the identification of important parameters that would have remained unidentified if not all variables or signatures were considered. A separate sensitivity analysis has to be done for each spatial unit for which parameters are to be estimated. The proposed parameter selection method, which selects parameters according to their impact relative to that of all model parameters, can be modified regarding the selected thresholds, and some weighting of variables and signatures can be introduced depending on the modelling purpose.
  An increased number of parameters in calibration enhances the potential for model equifinality, a factor that must be considered when employing a multi-variable multi-signature sensitivity analysis. Although we achieved a reasonably good level of parameter identifiability in the multi-variable calibrations, our results do not provide evidence that using multiple observational variables increases parameter identifiability. Certain combinations of observations demonstrated improved parameter identifiability in calibrations, with variations observed between basins. Also, our study found that parameter identifiability is inversely related to the number of parameters selected for calibration.
  The inclusion of additional observational variables in the calibration consistently improved overall model performance that takes into account all observational variables. The value of Q and TWSA observations for the overall performance was higher than that of ET and SWSA observations. The extent of improvement depends on basin characteristics as well as the trade-offs and interactions among the objectives of the associated variables, which also depend on the capability of the model to simulate important hydrological processes in the basin. Streamflow observations were found to be essential for achieving accurate streamflow simulations, which are a primary target for most hydrological model applications.
  We used straightforward metrics to assess two sources of uncertainty in the calibration process, those arising from the search algorithm used to identify the non-dominated Pareto-optimal parameter sets and those stemming from observational errors. As the random seeds used in the BORG algorithm lead to non-negligible variations in performance, in particular for the unobserved variable (Table 7), a sufficient number of replications of the calibration runs is vital. Our analysis revealed that a large portion of the variation of “optimal” parameter sets can be attributed to observational uncertainties, a factor often overlooked in calibration exercises. We demonstrated that in the presence of observational uncertainty, relying solely on a ‘best solution’ or compromise solution can become unreliable, leading to decreased overall efficiency. To address this challenge, we propose a method to select an ensemble of ‘acceptable’ solutions from the Pareto solutions derived by the search algorithm, taking into account uncertainties in the observation data used for calibration (section 4.2.4).
  The methodology presented in this study should be applied to calibrate GHMs for all large river basins of the globe where diverse model output variables can be observed. Additionally, it is imperative to explore how accounting for observation uncertainties can enhance the robustness of calibration outcomes. Developing uncertainty-based performance metrics would represent a significant advancement in this direction. In regions with limited data availability, leveraging remote sensing-based streamflow observations such as HydroSAT (http://hydrosat.gis.uni-stuttgart.de) or SWOT can provide new insights, complementing TWSA data from GRACE, GRACE-FO, and GRACE-C (GRACE-Continuity). Given the availability of numerous contemporary ET products, future calibration efforts should explore the benefits of considering several of these ET data sources.
  "
  RC: Minor comments or examples:
  AC: We greatly value all the insightful comments and are committed to incorporating changes that will enhance the manuscript.
  RC:
  
  1 Introduction:
  
  ● You switch between the terms multi-variable, multi-signature, and multi-objective throughout the manuscript. Please clearly define them in the introduction and use the terms consistently throughout the manuscript. E.g., line 595: “multi-objective” and in the title of the manuscript: “multi-variable”. I assume the same is meant in both cases.
  AC: We used the terms ‘multi-variable’, ‘multi-signature’, and ‘multi-objective’ in their literal meanings in the manuscript. We utilized ‘multi-signature’ specifically in sensitivity analysis (SA), where the effects of parameters on multiple aspects of each variable were explored. The term ‘multi-objective’ is employed in the context of calibration exercises when more than one objective is utilized in the calibration process. Conversely, the term ‘multi-variable’ is applicable in both SA and calibration, as these analyses involve multiple variables. While ‘multi-objective calibration’ and ‘multi-variable calibration’ are not always synonymous—given that multiple objectives can be associated with a single variable, and multiple variables may contribute to a single composite objective—in our study, they are used interchangeably because each objective corresponds to a separate variable. To clarify the meanings of these terms, we will include the following paragraph in the revised manuscript before Line 173.
  “The terms ‘multi-objective’ and ‘multi-variable’ are not always interchangeable, as multiple objectives can stem from the same variable and multiple variables can contribute to a single composite objective. We use these terms contextually based on their literal meanings. Our multi-objective calibration analyses involve multiple objectives and multiple variables, with one objective corresponding to each variable. In the remainder of the text, we use both terms interchangeably. However, to highlight the involvement of multiple variables, we specifically use the term 'multi-variable'. A ‘signature’ of a data series consists of quantitative metrics or indices that describe its statistical or dynamic properties (McMillan, 2021). In this context, the term ‘multi-signature’ refers to a scenario where multiple quantitative properties of a data series are considered simultaneously.”
  In addition, for clarity, we will reformulate lines 173-179 as follows:
  
  “In this study, we present a comprehensive multi-objective calibration framework for estimating optimal basin-specific parameter values for a global hydrological model by taking into account observations of multiple model output variables. The framework consists of 1) an approach for selecting model parameters that is based on a global sensitivity analysis and considers multiple signatures of each variable and 2) a multi-objective parameter optimization that includes multiple variables. We apply the framework to WGHM and estimate, for the Ganges and the Brahmaputra basins of the Indian subcontinent, the most important model parameters using multi-variable multi-signature sensitivity analysis and multi-variable parameter optimization.”
  RC:
  
  ● Line 34: “T” not explained anywhere.
  AC: Line 34 will be corrected by replacing “T” with “TWSA”.
  RC:
  
  ● Lines 52-53: local or regional hydrological models
  AC: In Lines 52-53, by the statement “Even more than local to regional hydrological models, GHMs suffer from high predictive uncertainties ...”, we intend to compare all models that lie between local and regional scales to the global scale models. If, however, the expression is not clear in the statement, we may change it to “local or regional hydrological models”.
  RC:
  
  ● Lines 62-64: Abbreviations are placed inconsistently. Maybe do: For example, the Water - Global Assessment and Prognosis (WaterGAP) Global Hydrological Model (WGHM, Müller...)..
  AC:
  
  To avoid inconsistencies in the abbreviation and full name of WGHM, we will write “the WaterGAP Global Hydrological Model (WGHM)” as used in the reference model description paper for WGHM by Müller Schmied et al. (2021) and will leave out the full form of WaterGAP in Lines 62-64.
  RC:
  
  ● Lines 70-75: Add a reference
  AC: We will refer to the study of Cheng et al. (2005) where they mentioned that “...with more parameters, it takes longer time to accomplish the optimization procedure. This may result in premature termination of the optimization process which will adversely affect the quality of the results.”
  RC:
  
  ● Lines 88-91: Sentence is a bit difficult to follow. Please rephrase.
  AC: The statement in Lines 88-91 will be written as follows:
  
  “The equifinality thesis proposed by Beven (1993) challenges the notion of a singular optimal model – whether in terms of structure, input, or parameters – particularly in the presence of multifaceted uncertainties. Instead, it suggests that there can be alternative models that exhibit comparable predictive capabilities while differing in their specific configurations.”
  RC:
  
  ● Lines 138-144: Maybe mention earlier!?
  AC: We will move the Lines 138-144 up and merge them with the paragraph that starts at Line 117. The paragraph would read as follows: “The equifinality thesis proposed by Beven (1993) challenges …… specific configurations. In the context of multi-objective calibration, Efstratiadis and Koutsoyiannis ….”.
  RC:
  
  ● Line 175: First time using the term “signature”. Mention and explain it before.
  AC: In the revised manuscript, we will add the definition of ‘signature’ in the introduction chapter. Please see our reply to the earlier comment, namely “You switch between the terms multi-variable, multi-signature, and multi-objective throughout the manuscript. Please clearly define them in the introduction and use the terms consistently throughout the manuscript. E.g., line 595: “multi-objective” and in the title of the manuscript: “multi-variable”. I assume the same is meant in both cases.”
  RC:
  
  ● Lines 191-193: Delete.
  AC: Lines 191-193 will be deleted in the revised version of the manuscript.
  RC:
  
  2 Study area:
  
  ● Table 1: Over which period are the means calculated (e.g., mean summer temperatures)?
  AC: We will include a note in Table 1 indicating that the temperature means were estimated using data from 1969 to 2004.
  RC:
  
  ● How did you decide on the two basins? What are the differences between the two basins? What was the reasoning for not choosing two very different basins (regarding climate, geology, water abstractions etc.) to see the influence of these characteristics on the modelling scenarios?
  AC: The Ganges and Brahmaputra basins were selected for this study due to their significant geopolitical importance. Both basins are transboundary and characterized by very high population densities and substantial water demands. They are situated in a critical region where climate change poses a serious threat to water availability, with potentially severe impacts on human lives. Despite these challenges, these basins also exhibit numerous distinct features of scientific interest. The hydrological processes governing these basins differ substantially. The Brahmaputra is dominated by snowmelt, whereas the Ganges basin encompasses a wide range of climatic zones from arid to semi-arid to humid. Agricultural water use exerts the most significant influence on human-nature interactions in the basin. Although the impacts of various geomorphological and physiographic characteristics are intriguing, our study's limited scope prevents us from exploring these interactions further.
  RC: Highlight these differences or similarities between the basins also in the interpretation and comparison of the modeling results of the two basins. Why were different parameters selected between the different basins (Figure 4)? E.g., lines 693-696: Why do you think that is the case? Is there any explanation for that?
  AC: We observe different sensitivities in the two study basins due to their distinct dominant hydrological processes. For instance, the Brahmaputra basin, being snow-dominated in its upstream parts, exhibits sensitivity of most response variables to snow parameters, with all four snow parameters being selected as important for this basin. In contrast, snow parameters are not significant for the Ganges basin, as only a small fraction of the basin is affected by snow processes.
  Regarding the differences in the Pareto fronts between the basins (as mentioned in Lines 693-696), we were unable to provide a definitive explanation. Apart from differences in dominant hydrological processes, variations in the error structure of observations could also contribute to the differences observed in the shape of the Pareto fronts. We believe that an in-depth investigation is necessary to elucidate the potential causes underlying these interactions.
  RC:
  
  3 Data and methods
  
  ● Why not present available data in the chapter study area?
  AC: We intend to present the available data in a separate chapter due to the length of the text created by detailing the sources, processing, and the descriptions of the error information for each observation series. This approach will enhance the readability of both ‘study area’ and ‘data and methods’ chapters. Also, because of the importance of the different observables used for model calibration in this study, we think that a separate chapter on the data is justified. Moreover, as we partly use observables that are available at the global scale and not only at the basin scale, we did not consider subsuming the data description under the study site description.
  RC:
  
  ● Line 343: Title of chapter 3.2.5 Water balance closure is a bit confusing as it’s a subchapter of 3.3 observations. Water balance closure is not an observation. Maybe call it storage change (which is also not exactly an observation, but still might fit better)?
  AC:
  
  The subchapter '3.2.5 Water balance closure' discusses an inherent imbalance in the water balance components of the observational data, pointing to lower quality observations and the presence of inconsistencies that could impact the calibration process. To enhance clarity, we will rename this subsection from '3.2.5 Water balance closure' to '3.2.5 Water balance closure of observations'. We need to stress that water balance closure as discussed here is not the storage change as derived from Precipitation-ET-runoff, but the imbalance of the components – Precipitation, ET, and storage change, all individually derived from observations.
  RC:
  
  ● Lines 436-448: Move them after line 465.
  AC: We will move Lines 436-448 after Line 465, as a new paragraph.
  RC:
  
  ● Line 473: One comma too many
  AC: The comma in Line 474 will be deleted.
  RC:
  
  4 Results and discussion
  
  ● Lines 554-566: Explanation of parameters could also be in the method section when parameters are being presented or is this meant as a discussion?
  AC: The explanations of the parameters are given in Table 2 within the ‘data and methods’ chapter; however, the detailed descriptions of these parameters and their physical (and/or hypothetical) meanings are not presented in the text. The readers will be directed to Müller Schmied et al. (2021) for further explanation of those parameters. In lines 554-566, we intend to discuss some of the results of our sensitivity analysis.
  RC:
  
  ● Line 583: Maybe add a short sentence why P-PM was added later or refer to the method section (lines 263-268). Why is EP-NM not added?
  AC: We will add the following statement to explain why P-PM was selected for calibration:
  
  “Nevertheless, P-PM was selected as an additional calibration parameter because precipitation forcing data, in contrast to radiation data, contain high uncertainties and biases which need to be corrected during model calibration, if possible. Recently, Goteti & Famiglietti (2024) pointed out the underestimation of precipitation in data sets for India, which needs to be corrected (here by P-PM) to avoid non-physical or process-based compensation by calibration of other parameters.”
  RC:
  
  ● Line 648: Livneh and Lettenmaier (2012)
  AC: We will correct Line 648; the statement will read “In their study, Livneh and Lettenmaier (2012) concluded that …”.
  RC:
  
  ● Table 6 and 7: Are those the NSE values of the calibration? I am not sure if I got that correctly, but due to data scarcity you could not calculate the NSE for all variables for the validation period (only for Q and TWSA). Is that correct?
  AC: Tables 6 and 7 report the mean and standard deviation of the performance metric NSE of 8 compromise solutions of each calibration experiment. We reran the WGHM model updating the parameters of the compromise solutions and computed the NSEs of all target variables to obtain the performance of those solutions across all variables. The NSE values remained unchanged for the variables that had been used as objectives in the calibration experiments. To enhance clarity, we intend to rename the title of Table 6 as follows:
  “Table 6: Mean and standard deviation of model performance indicator NSE for the compromise solutions (N = 8) of the calibration experiments in the Ganges river basin during calibration periods. The WGHM model was rerun using parameters from the compromise solutions to compute NSEs of all variables. The μ_{NSE, ALL} represents the mean NSE across all objectives over all eight compromise solutions per experiment. The highest NSE for each objective is highlighted in bold face; the highlighted mean across objectives (μ_{NSE, ALL}) is the highest value in each group (2-objective, 3-objective, and 4-objective). The objective obtained in the standard calibration and in the uncalibrated model is also shown.”
  The title of Table 7 will remain unchanged.
  The NSE values in Tables 6 and 7 were calculated based on the monthly values during the calibration period. As you correctly pointed out, during validation, we were only able to compute the efficiency score for Q and TWSA, as no data were available for the other variables during the validation periods (2010-variable years).
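  For readers less familiar with the metric, the sketch below shows the standard Nash-Sutcliffe efficiency computed on a monthly series, and a simple mean across variables in the spirit of μ_{NSE, ALL}. It is an illustration only: the function names and the numbers are hypothetical, and the averaging over the eight compromise solutions described above is omitted.

```python
import numpy as np


def nse(sim, obs) -> float:
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared model
    error to the variance of the observations around their mean."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def mean_nse(nse_by_variable: dict) -> float:
    """Unweighted mean of the NSE values of all variables."""
    return float(np.mean(list(nse_by_variable.values())))


# Hypothetical monthly observations (arbitrary units), illustration only.
obs = np.array([10.0, 14.0, 30.0, 55.0, 40.0, 18.0])
perfect = nse(obs, obs)                              # simulation equals observation
climatology = nse(np.full_like(obs, obs.mean()), obs)  # simulation equals the observed mean

# Hypothetical per-variable scores of one parameter set.
overall = mean_nse({"Q": 0.8, "TWSA": 0.6, "ET": 0.4, "SWSA": 0.2})
```

  A perfect simulation yields an NSE of 1, while a simulation that always returns the observed mean yields 0. Negative values therefore indicate a simulation that performs worse than the observed mean.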
  RC:
  
  ● Figure 6 and Figure 7 are a bit small.
  AC: We will increase the size of Figures 6 and 7.
  RC:
  
  ● The authors chose to have a combined results and discussion chapter. Sometimes an explanation as to why the results turned out the way they did was missing.
  AC: We agree that some of the results were not adequately discussed. We will provide the missing explanations of why some of the results turned out as they did in the revised manuscript.
  RC:
  
  ● Lines 864-865: From the following text of that paragraph, I still did not understand why the Ganges and Brahmaputra basins had different identifiability regarding their parameter comparison. Could you please explain that more clearly?
  AC: The parameter identifiability has a typical inverse association with the dimensionality of the search space. In the Ganges experiments, relatively higher parameter identifiability was observed due to the use of a smaller number of parameters in calibration. We investigated the relationship between parameter identifiability and the sensitivity of the response variables. Additionally, we examined evidence regarding the impact of multi-variable calibration on increasing parameter identifiability. To clarify our statement, we will rewrite the statement in Lines 864-865 as follows:
  “Due to the fewer parameters involved in the Ganges calibration experiments, better parameter identifiability is observed within the basin compared to experiments in the Brahmaputra basin. We investigated how individual observations influence parameter identifiability during calibration and explored the impact of sensitivity on parameter identifiability.”
  Related to this issue, we will reformulate Lines 985-899. Instead of “In the Brahmaputra basin, the identifiability of parameters tends to be lower than in the Ganges basin. Four parameters (P-PM, SN-MT, SN-TG, and SL-RC) are constrained well (i.e., they have low coverage of their a-priori range in the compromise solution sets) with the variable Q, two parameters (SN-MT and SL-MSM) by the ET variable, and two (SN-TG and SW-RRM) by the SWSA observations”, we will write “In the Brahmaputra basin, four parameters (P-PM, SN-MT, SN-TG, and SL-RC) are constrained well (i.e., they have low coverage of their a-priori range in the compromise solution sets) with the variable Q, two parameters (SN-MT and SL-MSM) by the ET variable, and two (SN-TG and SW-RRM) by the SWSA observations”.
  RC:
  
  ● Could you add an outlook at the end of this section considering the following points: What do you expect for other basins? Could this method be applied to other basins? What would be the challenges?
  AC: We will revise and rewrite our conclusion provided in one of our earlier responses. Additionally, we will include a concluding paragraph emphasizing that the methodologies employed in this study can be effectively applied to other basins and other global hydrological models provided that error information regarding the observations is available.
  RC:
  
  5 Conclusion
  
  ● Lines 1031-1032: repetitive
  AC: In the new conclusion chapter, this statement will be omitted.
  References
  Cheng, C. T., Wu, X. Y., and Chau, K. W.: Multiple criteria rainfall–runoff model calibration using a parallel genetic algorithm in a cluster of computers / Calage multi-critères en modélisation pluie–débit par un algorithme génétique parallèle mis en œuvre par une grappe d’ordinateurs, Hydrological Sciences Journal, 50(6), 1087, https://doi.org/10.1623/hysj.2005.50.6.1069, 2005.
  Goteti, G. and Famiglietti, J.: Extent of gross underestimation of precipitation in India, Hydrol. Earth Syst. Sci. Discuss. [preprint], https://doi.org/10.5194/hess-2024-18, in review, 2024.
  McMillan, H. K.: A review of hydrologic signatures and their applications, WIREs Water, 8, e1499, https://doi.org/10.1002/wat2.1499, 2021.
  Müller Schmied, H., Cáceres, D., Eisner, S., Flörke, M., Herbert, C., Niemann, C., Peiris, T. A., Popat, E., Portmann, F. T., Reinecke, R., Schumacher, M., Shadkam, S., Telteu, C.-E., Trautmann, T., and Döll, P.: The global water resources and use model WaterGAP v2.2d: model description and evaluation, Geosci Model Dev, 14, 1037–1079, https://doi.org/10.5194/gmd-14-1037-2021, 2021.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2324-AC2

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2023-2324', Anonymous Referee #1, 02 Feb 2024

I enjoyed reading the manuscript.
My main concerns are: 1) the temporal-only calibration of a distributed hydrologic model and 2) the use of coarse meteorological inputs while ERA5-Land offers 0.1° inputs.

Other comments:

Section 3.3: More details on the SA should be provided. Morris is an elaborate SA method compared to one-at-a-time local methods, so many more runs are required in Morris. How many runs were required for a 24-parameter model (Line 263)?
Can Morris identify effects of parameter interactions on the sensitivities, as in Sobol's method? Why did you choose Morris instead of simply looking at the Jacobian matrix?

L402: Five runs of the first year? I could not understand how. A 1985-89 spin-up run, performed once, should be enough to reach equilibrium, shouldn't it?

Eq2: Why is only NSE used as performance metric? Why is only temporal calibration pursued for a distributed hydrologic model which can produce flux maps? How did you deal with unit differences between satellite AET (watt/m2) and model outputs in mm/day? The same may apply to GRACE anomaly values and the recharge output of the model.

NSE is a bias sensitive metric and it might be necessary to use bias insensitive spatial pattern metrics in the calibration.

Introduction misses recent works on satellite based evaluation and calibration of the distributed hydrologic models using actual ET. Also, trade offs in multi objective Pareto calibration of hydrologic models have been studied in the literature. Please update your literature review with studies from 2018 to Jan 2024 from top journals (HESS and WRR). Compare your results with them in the discussions.

Conclusion: Very different than conventional conclusion sections. Detailed results (numbers) should not be given here but just the conclusions drawn from the results should be provided in bullets. It is lengthy and not easy to follow. Research Questions are repeated and probably not necessary.
The reader needs the main messages from the study and not the repetition of the results.

Citation: https://doi.org/10.5194/egusphere-2023-2324-RC1
- AC1: 'Reply on RC1', HM Mehedi Hasan, 15 May 2024
  
  We thank you very much for your helpful comments and constructive suggestions for improving the manuscript. Below, each comment (indicated by “RC”) is followed by our answer (indicated by “AC”). Proposed new text in the revised manuscript is written in bold.
  RC: I enjoyed reading the manuscript.
  
  My main concerns are: 1) the temporal-only calibration of a distributed hydrologic model and 2) the use of coarse meteorological inputs while ERA5-Land offers 0.1° inputs.
  AC: 1) We agree that conducting a spatio-temporal calibration analysis would be the preferred approach from a conceptual point of view for a distributed model. However, such an approach would lead to a significantly expanded parameter space, which could render calibration impractical within a reasonable timeframe and acceptable accuracy range. Another major limitation is the insufficient capacity of observations to effectively constrain a large number of model parameters. As discussed in the introduction, most hydrological observations can effectively constrain only around 4 to 6 model parameters. While the hydrology modelling community is exploring methods to parameterize spatially distributed parameters for good reasons and we are also interested in conducting such a spatio-temporal calibration, our current study focuses on basin-scale parameter calibration with more observables than are usually considered.
  2) At present, the WaterGAP Global Hydrological Model (WGHM version 2.2e) can only operate at 0.5-degree resolution and thus with 0.5-degree meteorological inputs. We thus do not expect significant added value from using higher-resolution forcing data that need to be aggregated to 0.5-degree resolution. However, if the model resolution changes in the future, the methods employed in this study could be adapted and applied.
  RC:
  
  Other comments:
  
  Section 3.3: More details on the SA should be provided. Morris is an elaborate SA method compared to one-at-a-time local methods, so many more runs are required in Morris. How many runs were required for a 24-parameter model (Line 263)?
  AC: We will provide the details of the Morris method in the Appendix (see below in one of our responses). We reported that out of 24 model parameters, we excluded two parameters in the SA – EP-NM and P-PM (in Line 263). These two parameters directly modify model forcing, i.e., precipitation and net radiation, leading to very high changes in most target variables, which suppresses the relative influence of the other parameters. Thus, 22 parameters were considered in the sensitivity analysis (mentioned in Lines 398-399). For the 22 parameters, we needed to evaluate 23,000 samples for each basin. The number of model runs required in Morris's method is calculated as r × (m + 1), where m is the number of parameters and r is the number of elementary effects to be used. Additional details will be provided in the method description in the Appendix. To specify the number of model runs in the sensitivity analysis, we will add a statement after Lines 398-399 and rephrase Lines 399-402.
  Instead of “For the sensitivity analysis, model simulations for the period 1990-2019 were used, with 1985-1989 taken as the model spin-up period and the first year of the spin-up was run 5 times to allow the water storages to fill up to an equilibrium state”, we will write “During the SA, a total of 23,000 samples were analysed for each of the river basins. Model simulations were conducted for the period 1990-2019, with the spin-up period from 1985 to 1989 and the initial year of the spin-up was run five times to allow water storages to reach an equilibrium state.”
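The run-count formula quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the value r = 1000 is not stated in the response but is inferred here from the reported 23,000 samples and m = 22 parameters.

```python
def morris_run_count(n_params: int, n_elementary_effects: int) -> int:
    """Model runs needed by a Morris design: r * (m + 1)."""
    return n_elementary_effects * (n_params + 1)


m = 22                 # parameters included in the sensitivity analysis
total_runs = 23_000    # samples per basin reported in the response

# Assumption: r is inferred from the formula, not taken from the manuscript.
r = total_runs // (m + 1)

print(f"r = {r}, runs = {morris_run_count(m, r)}")
```

Under this reading, each of the r base points contributes one baseline run plus one run per perturbed parameter.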
  RC:
  
  Can Morris identify effects of parameter interactions on the sensitivities like in Sobols’ method? Why did you choose Morris instead of looking at Jacobian matrix in simple terms?
  AC: The Morris method calculates the partial derivatives (i.e., elementary effects) at various points in the parameter space, similar to those in the Jacobian matrix. The sensitivity index is determined by averaging these partial derivatives. This approach provides a more accurate estimation of a parameter's effect compared to local methods like the Jacobian matrix.
  Unlike the variance decomposition method of Sobol, Morris's method does not explicitly differentiate interaction terms. However, it does produce a variance term for the elementary effect that accounts for parameter interactions and the functional non-linearity of the model response. We utilized this variance term in the parameter selection process. We acknowledge the importance of providing comprehensive sensitivity analysis details and will include them in the appendix.
  We will add to the text that the Morris SA is a global sensitivity analysis. We will reformulate Lines 391-393 as follows: “The sensitivity index of the EET method averages out the local influences by taking samples from many locations in the parameter space, making it a global sensitivity analysis method (Pianosi et al., 2016)”.
  In addition, we will add the following text to section 3.3 after Line 387:
  
  “While the Morris method does not explicitly show interaction terms, it produces a variance term for the elementary effect that accounts for parameter interactions and the functional non-linearity of the model response. We computed the standard error of the sensitivity index from this variance term and used it for parameter selection (Algorithm 4 in Appendix A).”
  Please note that "Appendix A: Elementary Effect Test (EET) method of Morris (1991)" is provided as a supplemental document with this response letter.
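As a rough illustration of the elementary-effect idea described in this response, the following minimal sketch computes elementary effects along random one-at-a-time trajectories in the unit hypercube and summarises them by their mean absolute value, their standard deviation, and a standard error. It is not the authors' Algorithm 4 or the toolbox they used; the function names, the step size `delta`, the toy model, and the standard-error formula (standard deviation divided by the square root of r) are all assumptions made for this example.

```python
import numpy as np


def elementary_effects(model, n_params, n_trajectories, delta=0.1, seed=0):
    """Return an (r, m) array of elementary effects for a scalar-valued model."""
    rng = np.random.default_rng(seed)
    ee = np.empty((n_trajectories, n_params))
    for t in range(n_trajectories):
        # Base point, leaving room for a +delta step in every dimension.
        x = rng.uniform(0.0, 1.0 - delta, size=n_params)
        y = model(x)
        # Perturb one parameter at a time, in random order: m + 1 runs in total.
        for i in rng.permutation(n_params):
            x_next = x.copy()
            x_next[i] += delta
            y_next = model(x_next)
            ee[t, i] = (y_next - y) / delta
            x, y = x_next, y_next
    return ee


def morris_indices(ee):
    """Mean absolute effect, spread of effects, and an assumed standard error."""
    r = ee.shape[0]
    mu_star = np.abs(ee).mean(axis=0)   # overall influence of each parameter
    sigma = ee.std(axis=0, ddof=1)      # interactions and non-linearity
    std_err = sigma / np.sqrt(r)        # assumption: SE derived from sigma
    return mu_star, sigma, std_err


def toy_model(x):
    """Hypothetical response: x[0] acts linearly, x[1] and x[2] interact."""
    return 2.0 * x[0] + x[1] * x[2]


ee = elementary_effects(toy_model, n_params=3, n_trajectories=50)
mu_star, sigma, std_err = morris_indices(ee)
print(mu_star, sigma, std_err)
```

In the toy model, the purely linear parameter has a spread close to zero, while the two interacting parameters show a clearly non-zero spread, which is the behaviour the variance term is meant to reveal.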
  RC:
  
  L402: 5 times run of first year? I couldn’t understand how? 1985-89 spin up run and one time should be enough to reach equilibrium, shouldn’t be?
  AC: The WaterGAP model offers two methods to achieve equilibrium in the state variables by spin-up runs: (i) repeating the simulation of the first year multiple times, and (ii) starting the model from a sufficiently early point in time. Since there is no general guideline available for the WGHM model that specifies how many spin-up years are required to reach equilibrium states of the storage variables, we utilized both available options.
  RC:
  
  Eq2: Why only NSE is used as performance metric? Why only temporal calibration is pursued for a distributed hydrologic model which can produce flux maps? How did you deal with unit differences from satellite AET (watt/m2) and model outputs at mm/day? The same may apply to Grace anomaly values and recharge output of the model.
  AC:
  
  We chose NSE as our performance metric because it is widely used in the field of hydrology, although there are significant concerns associated with its use. For example, NSE is sensitive to outliers, biases, and seasonality, and it uses the observed mean as the benchmark which may not be an adequate reference for most hydrologic variables (Schaefli and Gupta, 2007). Livneh and Lettenmaier (2012), however, noted that the NSE can be a useful indicator for inter-basin performance comparison since it normalizes the mean squared error (MSE) by the observed variance (σ_o²) of each basin. While we acknowledge that we have not addressed all the limitations of NSE, we considered it sufficient for our study as our primary objective was to evaluate the benefits and trade-offs of multi-variable calibration. Nonetheless, our methodology allows for the use of alternative performance metrics. Please note that we used other commonly used indices such as RMSE and correlation for model validation (Table 12), and in the supplementary materials, we have provided the Kling-Gupta Efficiency (KGE) and its three components for the overall compromise solutions (Tables S8, S9, and S10).
  The WaterGAP Global Hydrology Model (WGHM) indeed generates spatially distributed data of water fluxes. However, in our already complex study, we opted to utilize basin-scale observations for all variables to obtain more accurate estimates of observational errors. Consequently, we were unable to leverage the flux maps and explore the potential use of spatial pattern-based metrics.
  We used the LandFlux-EVAL multi-dataset synthesis ET product developed by Mueller et al. (2013), which reports ET values in units of mm/day. Similarly, for the GRACE anomaly, we incorporated basin-scale total water storage anomaly (TWSA) data processed in units of water height equivalent (mm). These GRACE TWSA data, including propagated errors, were prepared by the University of Bonn following the methodology outlined by Gerdener et al. (2020).
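For reference, the NSE discussed above normalizes the mean squared error by the observed variance, NSE = 1 - MSE / σ_o². The short sketch below is a generic implementation for illustration, not the code used in the study; the sample values are invented.

```python
import numpy as np


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - MSE / variance of the observations."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    mse = np.mean((sim - obs) ** 2)
    var_obs = np.mean((obs - obs.mean()) ** 2)
    return 1.0 - mse / var_obs


# Invented monthly values, only to show the behaviour of the metric.
obs = np.array([10.0, 30.0, 80.0, 120.0, 60.0, 20.0])
perfect = nse(obs, obs)                           # identical series
benchmark = nse(np.full_like(obs, obs.mean()), obs)  # observed mean as simulation
biased = nse(obs + 15.0, obs)                     # correct dynamics, constant bias
print(perfect, benchmark, biased)
```

The biased case illustrates the bias sensitivity mentioned in the discussion: a simulation with perfect timing but a constant offset still scores below 1.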
  RC:
  
  NSE is a bias sensitive metric and it might be necessary to use bias insensitive spatial pattern metrics in the calibration.
  AC: As mentioned in the response to earlier comments, we did not use spatial pattern-based performance metrics because we employed basin-scale monthly average observations. As we also want to improve the simulated water balance for the target area at the basin scale, a bias-sensitive performance metric seems to be a reasonable choice.
  RC:
  
  Introduction misses recent works on satellite based evaluation and calibration of the distributed hydrologic models using actual ET. Also, trade offs in multi objective Pareto calibration of hydrologic models have been studied in the literature. Please update your literature review with studies from 2018 to Jan 2024 from top journals (HESS and WRR). Compare your results with them in the discussions.
  AC: We will update the literature review in the introduction and include the following statements after Lines 149-152 of the current manuscript.
  
  “Demirel et al. (2018) demonstrated successful enhancement of spatial pattern performance in a distributed hydrological model through multi-objective calibration using discharge and remote-sensing-based ET observations. Additionally, Demirel et al. (2024) provide a discussion on the trade-offs between temporal and spatial pattern calibration of the same distributed model using discharge and ET observations.”
  Also, we will insert the following text after Lines 162-165:
  “Hulsman et al. (2021) utilized in-situ discharge, satellite-based evapotranspiration (ET), and GRACE Total Water Storage Anomaly (TWSA) data to calibrate a process-based distributed hydrological model in a large semi-arid basin in Africa, aiming to incrementally improve the model's process representation.”
  RC:
  
  Conclusion: Very different than conventional conclusion sections. Detailed results (numbers) should not be given here but just the conclusions drawn from the results should be provided in bullets. It is lengthy and not easy to follow. Research Questions are repeated and probably not necessary.
  The reader needs the main messages from the study and not the repetition of the results.
  AC: We will reformulate the entire conclusion chapter based on the suggestions, presenting the main findings clearly so that readers can quickly grasp the key messages. In the new conclusion chapter, we will avoid repeating the research questions. Please refer to the suggested new conclusion chapter provided in one of our responses to the comments of the second anonymous referee.
  References
  
  Campolongo, F., Saltelli, A., and Cariboni, J.: From screening to quantitative sensitivity analysis. A unified approach, Comput. Phys. Commun., 182, 978–988, https://doi.org/10.1016/j.cpc.2010.12.039, 2011.
  Demirel, M. C., Koch, J., Rakovec, O., Kumar, R., Mai, J., Müller, S., Thober, S., Samaniego, L., and Stisen, S.: Tradeoffs Between Temporal and Spatial Pattern Calibration and Their Impacts on Robustness and Transferability of Hydrologic Model Parameters to Ungauged Basins, Water Resources Research, 60, e2022WR034193, https://doi.org/10.1029/2022WR034193, 2024.
  Demirel, M. C., Mai, J., Mendiguren, G., Koch, J., Samaniego, L., and Stisen, S.: Combining satellite data and appropriate objective functions for improved spatial pattern performance of a distributed hydrologic model, Hydrol. Earth Syst. Sci., 22, 1299–1315, https://doi.org/10.5194/hess-22-1299-2018, 2018.
  Gerdener, H., Engels, O., and Kusche, J.: A framework for deriving drought indicators from the Gravity Recovery and Climate Experiment (GRACE), Hydrol. Earth Syst. Sci., 24, 227–248, https://doi.org/10.5194/hess-24-227-2020, 2020.
  Hulsman, P., Savenije, H. H. G., and Hrachowitz, M.: Learning from satellite observations: increased understanding of catchment processes through stepwise model improvement, Hydrol. Earth Syst. Sci., 25, 957–982, https://doi.org/10.5194/hess-25-957-2021, 2021.
  Livneh, B. and Lettenmaier, D. P.: Multi-criteria parameter estimation for the Unified Land Model, Hydrol. Earth Syst. Sci., 16, 3029–3048, https://doi.org/10.5194/hess-16-3029-2012, 2012.
  Morris, M. D.: Factorial Sampling Plans for Preliminary Computational Experiments, Technometrics, 33, 161–174, https://doi.org/10.1080/00401706.1991.10484804, 1991.
  Mueller, B., Hirschi, M., Jimenez, C., Ciais, P., Dirmeyer, P. A., Dolman, A. J., Fisher, J. B., Jung, M., Ludwig, F., Maignan, F., Miralles, D. G., McCabe, M. F., Reichstein, M., Sheffield, J., Wang, K., Wood, E. F., Zhang, Y., and Seneviratne, S. I.: Benchmark products for land evapotranspiration: LandFlux-EVAL multi-data set synthesis, Hydrol. Earth Syst. Sci., 17, 3707–3720, https://doi.org/10.5194/hess-17-3707-2013, 2013.
  Pianosi, F., Sarrazin, F., and Wagener, T.: A Matlab toolbox for Global Sensitivity Analysis, Environmental Modelling & Software, 70, 80–85, https://doi.org/10.1016/j.envsoft.2015.04.009, 2015.
  Schaefli, B. and Gupta, H. V.: Do Nash values have value?, Hydrological Processes, 21, 2075–2080, https://doi.org/10.1002/hyp.6825, 2007.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2324-AC1
RC2:
'Comment on egusphere-2023-2324', Anonymous Referee #2, 09 Feb 2024
This study presents a very thorough analysis of multi-variable calibrations considering different variables for a global hydrological model. The model was applied on two exemplary basins using in-situ and multi-satellite data. The authors did an excellent job in considering a variety of aspects that are important for modelling (e.g., required number of model runs, Pareto frontier, parameter sensitivity/importance). I also liked the selection of the authors of the data that was considered for the multi-variable calibration scenarios.
I am recommending minor revision because even though the modelling analyses seem thorough, the presentation for the reader could be improved. As a reader, it was rather difficult to extract the main aspects of this research as the text was not very concise. Additionally, a slightly unconventional structure regarding the results and discussion, and conclusion was used. I recommend shortening the manuscript or summarizing several points into one point to make it easier for future readers to follow it and to get the main points of the study. This is a very general recommendation, so I have picked out examples for you to explain what I mean.
Make your sentences more concise:

Lines 554-566 could for example be shortened into something like this: “Several parameters influence most or all response variables across various signatures. However, certain parameters affect only one or two signatures of the response variables. For instance, the Runoff Coefficient (SL-RC) significantly influences monthly means (MM) of ET in the Ganges basin and MTS of streamflow. Similarly, the snow melt temperature (SN-MT) is important for some cases in snow-dominated catchments in the Brahmaputra basin. These parameters may also affect other response variables and signatures to some extent but do not meet the defined threshold for calibration selection (Figure 4).”

You created ten (lovely) figures and twelve tables. This is nice in the sense of replicability. However, in my opinion, this is too much to present in the main text. Please consider moving some of the tables that are not essential for the main outcomes of this study to the supplementary. Table 4 could be deleted entirely as it does not contain additional information to Figure 4.

Regarding the structure of the paper: For me the content of the conclusion chapter would be (the main) part of the discussion. Overall, the discussion part had become a bit short by being combined with the results. I recommend renaming the current conclusion chapter and writing a more common conclusion chapter. This will help the reader a lot to understand what you did. Also shorten the current conclusion chapter and do not present the results again.

Minor comments or examples:
1 Introduction:
You switch between the terms multi-variable, multi-signature, and multi-objective throughout the manuscript. Please clearly define them in the introduction and use the terms consistently throughout the manuscript. E.g., line 595: “multi-objective” and in the title of the manuscript: “multi-variable”. I assume the same is meant in both cases.

Line 34: “T” not explained anywhere.

Lines 53-53: local or regional hydrological models

Lines 62-64: Abbreviations are placed inconsistently. Maybe do: For example, the Water - Global Assessment and Prognosis (WaterGAP) Global Hydrological Model (WGHM, Müller...)..

Lines 70-75: Add a reference

Lines 88-91: Sentence is a bit difficult to follow. Please rephrase.

Lines: 138-144: Maybe mention earlier!?

Line 175: First time using the term “signature”. Mention and explain it before.

Lines 191-193: Delete.

2 Study area:
Table 1: Over which period are the means calculated (e.g., mean summer temperatures)?

How did you decide on the two basins? What are the differences between the two basins? What was the reasoning for not choosing two very different basins (regarding climate, geology, water abstractions etc.) to see the influence of these characteristics on the modelling scenarios?

Highlight these differences or similarities between the basins also in the interpretation and comparison of the modeling results of the two basins. Why were different parameters selected between the different basins (Figure 4)? E.g., lines 693-696: Why do you think that is the case? Is there any explanation for that?

3 Data and methods
Why not present available data in the chapter study area?

Line 343: Title of chapter 3.2.5 Water balance closure is a bit confusing as it’s a subchapter of 3.3 observations. Water balance closure is not an observation. Maybe call it storage change (which is also not exactly an observation, but still might fit better)?

Lines 436-448: Move them after line 465.

Line 473: One comma too many

4 Results and discussion
Lines 554-566: Explanation of parameters could also be in the method section when parameters are being presented or is this meant as a discussion?

Line 583: Maybe add a short sentence why P-PM was added later or refer to the method section (lines 263-268). Why is EP-NM not added?

Line 648: Livneh and Lettenmaier (2012)

Table 6 and 7: Are those the NSE values of the calibration? I am not sure if I got that correctly, but due to data scarcity you could not calculate the NSE for all variables for the validation period (only for Q and TWSA). Is that correct?

Figure 6 and Figure 7 are a bit small.

The authors chose to have a combined results and discussion chapter. Sometimes an explanation as to why the results turned out the way they did was missing.

Lines 864-865: From the following text of that paragraph, I still did not understand why the Ganges and Brahmaputra basins had different identifiability regarding their parameter comparison. Could you please explain that more clearly?

Could you add an outlook at the end of this section considering the following points: What do you expect for other basins? Could this method be applied to other basins? What would be the challenges?

5 Conclusion
Lines 1031-1032: repetitive
Citation: https://doi.org/10.5194/egusphere-2023-2324-RC2
- AC2: 'Reply on RC2', HM Mehedi Hasan, 15 May 2024
  
  We thank you very much for your helpful comments and constructive suggestions for improving the manuscript. Below, each comment (indicated by “RC”) is followed by our answer (indicated by “AC”). Proposed new text in the revised manuscript is written in bold.
  RC: This study presents a very thorough analysis of multi-variable calibrations considering different variables for a global hydrological model. The model was applied on two exemplary basins using in-situ and multi-satellite data. The authors did an excellent job in considering a variety of aspects that are important for modelling (e.g., required number of model runs, Pareto frontier, parameter sensitivity/importance). I also liked the selection of the authors of the data that was considered for the multi-variable calibration scenarios.
  AC: Thank you for your positive and encouraging feedback on our manuscript.
  RC: I am recommending minor revision because even though the modelling analyses seem thorough, the presentation for the reader could be improved. As a reader, it was rather difficult to extract the main aspects of this research as the text was not very concise. Additionally, a slightly unconventional structure regarding the results and discussion, and conclusion was used. I recommend shortening the manuscript or summarizing several points into one point to make it easier for future readers to follow it and to get the main points of the study. This is a very general recommendation, so I have picked out examples for you to explain what I mean.
  AC: To concisely report the main outcomes of our work, we will reformulate the conclusion chapter. Please refer to the draft of the revised conclusion chapter provided in one of our responses. To shorten the manuscript, we will remove several tables (Tables 4, 9, and 10) from the main text. Additionally, we intend to make the following changes, which will make the writing more concise and shorten the manuscript.
  We will rewrite Lines 589-591 as follows:
  
  Instead of “In total, 4.8 million samples were evaluated during the study which approximately consumed over 3.2 million CPU hours of execution time for the WGHM model to assess those samples”, we will write “Overall, the study involved the evaluation of 4.8 million samples, requiring approximately 3.2 million CPU hours of model run time”.
  We will rewrite Lines 594-601 as follows:
  
  Instead of “We obtained a good number of non-dominated solutions, i.e. Pareto-optimal parameter sets, in most of the multi-objective calibrations (Table 5). The cardinality (number of solutions) of the non-dominated solution set of a multi-objective calibration depends mainly on the shape of the Pareto frontier (PF) and the crowding distance of the members. The crowding distance is controlled in the Borg algorithm by the epsilon parameters which was 0.005 for all objectives. The greater solution cardinality in the Ganges basin experiments, when compared to those in the Brahmaputra basin, already indicates heightened trade-offs among the objectives, especially between NSE_Q and NSE_SWSA, as well as between NSE_SWSA and NSE_TWSA”,
  we will write “A high cardinality, i.e., a high number of solutions in the non-dominated Pareto solution set, was obtained in most multi-objective calibrations. The cardinality depends on the shape of the Pareto frontier (PF) and the allowed crowding distance, which was constant (0.005) for all objectives in all experiments. A wider PF resulting in high cardinality reflects a high trade-off between the objectives. The high cardinality observed in the Ganges experiments indicates marked trade-offs among objectives, especially between NSE_Q and NSE_SWSA, as well as between NSE_SWSA and NSE_TWSA”.
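To make the link between cardinality, the shape of the Pareto frontier, and the epsilon resolution concrete, the following minimal sketch filters a small set of objective vectors (to be maximised, as with NSE) for non-dominated members and then thins them with a simplified epsilon-box rule. This is not the Borg algorithm: the box rule keeps the first point found in each box, the function names are hypothetical, and the objective values are invented.

```python
import numpy as np


def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return bool(np.all(a >= b) and np.any(a > b))


def non_dominated(points):
    """Indices of the non-dominated rows of `points` (maximisation)."""
    points = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(points):
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i):
            keep.append(i)
    return keep


def epsilon_front(points, eps=0.005):
    """Simplified epsilon-box thinning: one point per box, then box dominance."""
    points = np.asarray(points, dtype=float)
    boxes = np.floor(points / eps).astype(int)
    # Keep the first point that falls into each box (a simplification).
    _, first = np.unique(boxes, axis=0, return_index=True)
    first = np.sort(first)
    keep = non_dominated(boxes[first])
    return [int(first[k]) for k in keep]


# Invented (NSE_Q, NSE_SWSA) pairs for five candidate parameter sets.
candidates = np.array([
    [0.902, 0.402],
    [0.802, 0.602],
    [0.702, 0.552],   # dominated by the second candidate
    [0.602, 0.802],
    [0.801, 0.603],   # falls into the same epsilon box as the second candidate
])

front = non_dominated(candidates)
thinned = epsilon_front(candidates, eps=0.005)
print(len(front), len(thinned))
```

With a fixed epsilon, a frontier that spans a wider range of objective values occupies more boxes, so a larger solution set is one indication of stronger trade-offs between the objectives.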
  We will delete the Lines 619-625: “This is a common occurrence in multi-objective optimization scenarios (Meyer Oliveira et al., 2021; Livneh and Lettenmaier, 2012). However, this comes at the cost of performance loss for the other variables that were not considered for calibration. The standard calibration of WGHM for mean annual streamflow (Müller Schmied et al., 2021) resulted in poorer results for all performance criteria in both the Ganges and Brahmaputra basins than the uncalibrated model for both basins. The mean NSE of all four objectives (µ_NSE,ALL) was used as a simple indicator of the overall performance of an experiment”.
  We will rewrite Lines 638-643 as follows:
  
  Instead of “Different from the Brahmaputra, calibration against only Q in the Ganges basin (both the calibration method presented here and the standard WGHM method) resulted in worse fits to all three other variables as compared to the uncalibrated model version. Multi-variable calibration, however, works best if streamflow observations are included because the average fit to all observations is, in the case of both 2-objective and 3-objective calibration cases, highest if NSE_Q is one of the calibration targets (Table 6 and Table 7)”,
  we will write “Different from the Brahmaputra, calibration against only Q in the Ganges basin resulted in worse fits to all three other variables as compared to the uncalibrated model version. Multi-variable calibration, however, works best if streamflow observations are included. Excluding NSE_Q as an objective in any calibration resulted in significantly poorer performance in streamflow simulation (Table 6 and Table 7)”.
  We will delete Lines 650-651: “However, in the majority of the calibration cases, the performance in streamflow simulation was very poor when the model was not constrained by streamflow observations.”
  We will delete Lines 699-711: “In contrast to the apparent trade-offs among objectives, there could be other non-traditional interactions among the objectives. For instance, in all replications of the calibration with only NSE_Q in the Ganges basin, we observed negative NSE_TWSA (not shown). But using only NSE_TWSA as the calibration objective, we consistently observed very high values in NSE_Q for all replications. Likewise, when NSE_ET is used as the only calibration objective, NSE_Q exhibited a significant decrease across replications in the two basins. However, when NSE_Q is employed as the only objective, reasonable performance in ET simulation is observed. Hence, the nature of the association between a pair of objectives, when attempting to describe the trade-offs, is neither unidirectional nor easily traceable through correlation analysis. Furthermore, there could be three-way trade-offs and so forth in a high-dimensional objective space, making them challenging to detect. While the association and causality of such relationships are indeed intriguing, examining the nature of trade-offs among the objectives is beyond the scope of the current study.”
  We will also delete Lines 735-736: “This indicates the high reliability of our findings regarding the trade-offs among objectives discussed in the earlier paragraphs”.
  RC:
  
  ● Make your sentences more concise: Lines 554-566 could for example be shortened into something like this: “Several parameters influence most or all response variables across various signatures. However, certain parameters affect only one or two signatures of the response variables. For instance, the Runoff Coefficient (SL-RC) significantly influences monthly means (MM) of ET in the Ganges basin and MTS of streamflow. Similarly, the snow melt temperature (SN-MT) is important for some cases in snow-dominated catchments in the Brahmaputra basin. These parameters may also affect other response variables and signatures to some extent but do not meet the defined threshold for calibration selection (Figure 4).”
  AC: Lines 554-566 will be replaced by:
  
  “Several parameters influence most or all response variables across various signatures. However, certain parameters affect only one or two signatures of the response variables. For instance, the Runoff Coefficient (SL-RC) – which is one of the parameters considered in the standard WGHM calibration – significantly influences monthly means (MM) of ET in the Ganges basin and MTS of streamflow. Similarly, the snow melt temperature (SN-MT) is important for some cases in snow-dominated catchments in the Brahmaputra basin. These parameters may also affect other response variables and signatures to some extent but do not meet the defined threshold for calibration selection (Figure 4).”
  RC:
  
  ● You created ten (lovely) figures and twelve tables. This is nice in the sense of replicability. However, in my opinion, this is too much to present in the main text. Please consider moving some of the tables that are not essential for the main outcomes of this study to the supplementary. Table 4 could be deleted entirely as it does not contain additional information to Figure 4.
  AC: In the revised manuscript, we will move Tables 9 and 10 to the supplementary materials. We will delete Table 4.
  RC:
  
  ● Regarding the structure of the paper: For me the content of the conclusion chapter would be (the main) part of the discussion. Overall, the discussion part had become a bit short by being combined with the results. I recommend renaming the current conclusion chapter and writing a more common conclusion chapter. This will help the reader a lot to understand what you did. Also shorten the current conclusion chapter and do not present the results again.
  AC: To better communicate our main findings, we propose to replace the conclusion chapter with the following text.
  "Conclusions
  
  In this study, we have introduced a multi-objective calibration framework for estimating basin-specific optimal parameter sets for global hydrological models that can utilize observations of multiple model output variables as well as multiple signatures of each variable. Applying this approach to the simulation of the Ganges and Brahmaputra basins by the global hydrological model WGHM, we analysed the impacts, benefits and challenges of multi-variable multi-signature sensitivity analysis and multi-variable calibration.
  The multi-variable multi-signature sensitivity analysis facilitated the identification of important parameters that would have remained unidentified if not all variables or signatures were considered. A separate sensitivity analysis has to be done for each spatial unit for which parameters are to be estimated. The proposed parameter selection method, which selects parameters based on their relative impact compared to that of all model parameters, can be modified regarding selected thresholds, and some weighting regarding variables and signatures can be introduced depending on the modelling purpose.
  An increased number of parameters in calibration enhances the potential for model equifinality, a factor that must be considered when employing a multi-variable multi-signature sensitivity analysis. Although we achieved a reasonably good level of parameter identifiability in the multi-variable calibrations, our results do not provide evidence that using multiple observational variables increases parameter identifiability. Certain combinations of observations demonstrated improved parameter identifiability in calibrations, with variations observed between basins. Also, our study found that parameter identifiability is inversely related to the number of parameters selected for calibration.
  The inclusion of additional observational variables in the calibration consistently improved overall model performance that takes into account all observational variables. The value of Q and TWSA observations for the overall performance was higher than that of ET and SWSA observations. The extent of improvement depends on basin characteristics as well as the trade-offs and interactions among the objectives of the associated variables, which also depend on the capability of the model to simulate important hydrological processes in the basin. Streamflow observations were found to be essential for achieving accurate streamflow simulations, which are a primary target for most hydrological model applications.
  We used straightforward metrics to assess two sources of uncertainty in the calibration process, those arising from the search algorithm used to identify the non-dominated Pareto-optimal parameter sets and those stemming from observational errors. As the random seeds used in the Borg algorithm lead to non-negligible variations in performance, in particular for the unobserved variable (Table 7), a sufficient number of replications of the calibration runs is vital. Our analysis revealed that a large portion of the variation of “optimal” parameter sets can be attributed to observational uncertainties, a factor often overlooked in calibration exercises. We demonstrated that in the presence of observational uncertainty, relying solely on a ‘best solution’ or compromise solution can become unreliable, leading to decreased overall efficiency. To address this challenge, we propose a method to select an ensemble of ‘acceptable’ solutions from the Pareto solutions derived by the search algorithm, taking into account uncertainties in the observation data used for calibration (section 4.2.4).
  The methodology presented in this study should be applied to calibrate GHMs for all large river basins of the globe where diverse model output variables can be observed. Additionally, it is imperative to explore how accounting for observation uncertainties can enhance the robustness of calibration outcomes. Developing uncertainty-based performance metrics would represent a significant advancement in this direction. In regions with limited data availability, leveraging remote sensing-based streamflow observations such as HydroSAT (http://hydrosat.gis.uni-stuttgart.de) or SWOT can provide new insights, complementing TWSA data from GRACE, GRACE-FO, and GRACE-C (GRACE-Continuity). Given the availability of numerous contemporary ET products, future calibration efforts should explore the benefits of considering a number of these ET data sources.
  "
  RC: Minor comments or examples:
  AC: We greatly value all the insightful comments and are committed to incorporating changes that will enhance the manuscript.
  RC:
  
  1 Introduction:
  
  ● You switch between the terms multi-variable, multi-signature, and multi-objective throughout the manuscript. Please clearly define them in the introduction and use the terms consistently throughout the manuscript. E.g., line 595: “multi-objective” and in the title of the manuscript: “multi-variable”. I assume the same is meant in both cases.
  AC: We used the terms ‘multi-variable’, ‘multi-signature’, and ‘multi-objective’ in their literal meanings in the manuscript. We utilized ‘multi-signature’ specifically in sensitivity analysis (SA), where the effects of parameters on multiple aspects of each variable were explored. The term ‘multi-objective’ is employed in the context of calibration exercises when more than one objective is utilized in the calibration process. Conversely, the term ‘multi-variable’ is applicable in both SA and calibration, as these analyses involve multiple variables. While ‘multi-objective calibration’ and ‘multi-variable calibration’ are not always synonymous—given that multiple objectives can be associated with a single variable, and multiple variables may contribute to a single composite objective—in our study, they are used interchangeably because each objective corresponds to a separate variable. To clarify the meanings of these terms, we will include the following paragraph in the revised manuscript before Line 173.
  “The terms ‘multi-objective’ and ‘multi-variable’ are not always interchangeable, as multiple objectives can stem from the same variable and multiple variables can contribute to a single composite objective. We use these terms contextually based on their literal meanings. Our multi-objective calibration analyses involve multiple objectives and multiple variables, with one objective corresponding to each variable. In the remainder of the text, we use both terms interchangeably. However, to highlight the involvement of multiple variables, we specifically use the term ‘multi-variable’. A ‘signature’ of a data series consists of quantitative metrics or indices that describe its statistical or dynamic properties (McMillan, 2021). In this context, the term ‘multi-signature’ refers to a scenario where multiple quantitative properties of a data series are considered simultaneously.”
  In addition, for clarity, we will reformulate lines 173-179 as follows:
  
  “In this study, we present a comprehensive multi-objective calibration framework for estimating optimal basin-specific parameter values for a global hydrological model by taking into account observations of multiple model output variables. The framework consists of 1) an approach for selecting model parameters that is based on a global sensitivity analysis and considers multiple signatures of each variable and 2) a multi-objective parameter optimization that includes multiple variables. We apply the framework to WGHM and estimate, for the Ganges and the Brahmaputra basins of the Indian subcontinent, the most important model parameters using multi-variable multi-signature sensitivity analysis and multi-variable parameter optimization.”
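As a minimal illustration of the ‘signature’ definition quoted above, the following Python sketch computes three simple signatures of a monthly series. The choice of signatures (mean, standard deviation, seasonal amplitude), the function name `signatures`, and the synthetic data are assumptions made for illustration only; the response does not state which signatures were used in the sensitivity analysis.

```python
import numpy as np

def signatures(monthly_values):
    """Return a few illustrative signatures of a monthly data series.

    Assumes the series starts in January and has no gaps.
    """
    x = np.asarray(monthly_values, dtype=float)
    months = np.arange(x.size) % 12  # calendar-month index 0..11
    # Mean value of each calendar month across all years
    climatology = np.array([x[months == m].mean() for m in range(12)])
    return {
        "mean": float(x.mean()),
        "std": float(x.std(ddof=1)),
        "seasonal_amplitude": float(climatology.max() - climatology.min()),
    }

# Synthetic two-year series with a clean annual cycle (made-up data)
t = np.arange(24)
series = 100.0 + 50.0 * np.sin(2.0 * np.pi * t / 12.0)
sig = signatures(series)
```

A ‘multi-signature’ analysis in this sense would evaluate parameter sensitivity against each entry of such a dictionary rather than against a single summary metric.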
  RC:
  
  ● Line 34: “T” not explained anywhere.
  AC: Line 34 will be corrected by replacing “T” with “TWSA”.
  RC:
  
  ● Lines 53-53: local or regional hydrological models
  AC: In Lines 52-53, by the statement “Even more than local to regional hydrological models, GHMs suffer from high predictive uncertainties ...”, we intend to compare all models that lie between local and regional scales to the global scale models. If, however, the expression is not clear in the statement, we may change it to “local or regional hydrological models”.
  RC:
  
  ● Lines 62-64: Abbreviations are placed inconsistently. Maybe do: For example, the Water - Global Assessment and Prognosis (WaterGAP) Global Hydrological Model (WGHM, Müller...).
  AC: To avoid inconsistencies in the abbreviation and full name of WGHM, we will write “the WaterGAP Global Hydrological Model (WGHM)” as used in the reference model description paper for WGHM by Müller Schmied et al. (2021) and will leave out the full form of WaterGAP in Lines 62-64.
  RC:
  
  ● Lines 70-75: Add a reference
  AC: We will refer to the study of Cheng et al. (2005) where they mentioned that “...with more parameters, it takes longer time to accomplish the optimization procedure. This may result in premature termination of the optimization process which will adversely affect the quality of the results.”
  RC:
  
  ● Lines 88-91: Sentence is a bit difficult to follow. Please rephrase.
  AC: The statement in Lines 88-91 will be written as follows:
  
  “The equifinality thesis proposed by Beven (1993) challenges the notion of a singular optimal model – whether in terms of structure, input, or parameters – particularly in the presence of multifaceted uncertainties. Instead, it suggests that there can be alternative models that exhibit comparable predictive capabilities while differing in their specific configurations.”
  RC:
  
  ● Lines 138-144: Maybe mention earlier!?
  AC: We will move Lines 138-144 up and merge them with the paragraph that starts at Line 117. The paragraph would read as follows: “The equifinality thesis proposed by Beven (1993) challenges …… specific configurations. In the context of multi-objective calibration, Efstratiadis and Koutsoyiannis ….”.
  RC:
  
  ● Line 175: First time using the term “signature”. Mention and explain it before.
  AC: In the revised manuscript, we will add the definition of ‘signature’ in the introduction chapter. Please see our reply to the earlier comment, namely “You switch between the terms multi-variable, multi-signature, and multi-objective throughout the manuscript. Please clearly define them in the introduction and use the terms consistently throughout the manuscript. E.g., line 595: “multi-objective” and in the title of the manuscript: “multi-variable”. I assume the same is meant in both cases.”
  RC:
  
  ● Lines 191-193: Delete.
  AC: Lines 191-193 will be deleted in the revised version of the manuscript.
  RC:
  
  2 Study area:
  
  ● Table 1: Over which period are the means calculated (e.g., mean summer temperatures)?
  AC: We will include a note in Table 1 indicating that the temperature means were estimated using data from 1969 to 2004.
  RC:
  
  ● How did you decide on the two basins? What are the differences between the two basins? What was the reasoning for not choosing two very different basins (regarding climate, geology, water abstractions etc.) to see the influence of these characteristics on the modelling scenarios?
  AC: The Ganges and Brahmaputra basins were selected for this study due to their significant geopolitical importance. Both basins are transboundary and characterized by very high population densities and substantial water demands. They are situated in a critical region where climate change poses a serious threat to water availability, with potentially severe impacts on human lives. Despite these challenges, these basins also exhibit numerous distinct features of scientific interest. The hydrological processes governing these basins differ substantially. The Brahmaputra is dominated by snowmelt, whereas the Ganges basin encompasses a wide range of climatic zones from arid to semi-arid to humid. Agricultural water use exerts the most significant influence on human-nature interactions in the basin. Although the impacts of various geomorphological and physiographic characteristics are intriguing, our study's limited scope prevents us from exploring these interactions further.
  RC: Highlight these differences or similarities between the basins also in the interpretation and comparison of the modeling results of the two basins. Why were different parameters selected between the different basins (Figure 4)? E.g., lines 693-696: Why do you think that is the case? Is there any explanation for that?
  AC: We observe different sensitivities in the two study basins due to their distinct dominant hydrological processes. For instance, the Brahmaputra basin, being snow-dominated in its upstream parts, exhibits sensitivity of most response variables to snow parameters, with all four snow parameters being selected as important for this basin. In contrast, snow parameters are not significant for the Ganges basin, as only a small fraction of the basin is affected by snow processes.
  Regarding the differences in the Pareto fronts between the basins (as mentioned in Lines 693-696), we were unable to provide a definitive explanation. Apart from differences in dominant hydrological processes, variations in the error structure of observations could also contribute to the differences observed in the shape of the Pareto fronts. We believe that an in-depth investigation is necessary to elucidate the potential causes underlying these interactions.
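For readers less familiar with the Pareto fronts discussed above, the following Python sketch shows how non-dominated solutions can be filtered from a set of candidate solutions. It is a generic illustration of Pareto dominance, not the optimization algorithm used in the study; the function name `pareto_front_mask` and the score values are hypothetical, and objectives are assumed to be maximized (as with NSE).

```python
import numpy as np

def pareto_front_mask(scores):
    """Flag non-dominated rows of an (n_solutions, n_objectives) array.

    Larger values are assumed to be better for every objective.
    """
    s = np.asarray(scores, dtype=float)
    mask = np.ones(s.shape[0], dtype=bool)
    for i in range(s.shape[0]):
        # Solution j dominates i if it is at least as good in all
        # objectives and strictly better in at least one.
        dominates_i = np.all(s >= s[i], axis=1) & np.any(s > s[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask

# Made-up scores for two objectives (e.g. NSE of Q and NSE of TWSA)
scores = [[0.9, 0.5], [0.7, 0.8], [0.6, 0.6], [0.9, 0.4]]
mask = pareto_front_mask(scores)
```

In this toy example the first two solutions form the front: improving one objective beyond either of them requires giving up performance in the other.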
  RC:
  
  3 Data and methods
  
  ● Why not present available data in the chapter study area?
  AC: We intend to present the available data in a separate chapter due to the length of the text created by detailing the sources, processing, and the descriptions of the error information for each observation series. This approach will enhance the readability of both ‘study area’ and ‘data and methods’ chapters. Also, because of the importance of the different observables used for model calibration in this study, we think that a separate chapter on the data is justified. Moreover, as we partly use observables that are available at the global scale and not only at the basin scale, we did not consider subsuming the data description under the study site description.
  RC:
  
  ● Line 343: Title of chapter 3.2.5 Water balance closure is a bit confusing as it’s a subchapter of 3.3 observations. Water balance closure is not an observation. Maybe call it storage change (which is also not exactly an observation, but still might fit better)?
  AC: The subchapter '3.2.5 Water balance closure' discusses an inherent imbalance in the water balance components of the observational data, pointing to lower quality observations and the presence of inconsistencies that could impact the calibration process. To enhance clarity, we will rename this subsection from '3.2.5 Water balance closure' to '3.2.5 Water balance closure of observations'. We need to stress that water balance closure as discussed here is not the storage change as derived from Precipitation-ET-runoff, but the imbalance of the components – Precipitation, ET, and storage change, all individually derived from observations.
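The distinction drawn above can be sketched in a few lines of Python. The sketch assumes that the imbalance is the residual P − ET − Q − dS, with every term taken from an independent observation and expressed in the same units; including runoff Q in the residual, the function name `closure_residual`, and all numbers are assumptions for illustration and are not taken from the manuscript.

```python
import numpy as np

def closure_residual(p, et, q, ds):
    """Residual of the observed water balance, P - ET - Q - dS.

    All inputs are independent observations in the same units
    (e.g. mm/month); a non-zero residual indicates inconsistency
    among the observation products.
    """
    p, et, q, ds = (np.asarray(a, dtype=float) for a in (p, et, q, ds))
    return p - et - q - ds

# Made-up monthly values in mm/month
p = [200.0, 150.0, 50.0]    # precipitation
et = [80.0, 70.0, 40.0]     # evapotranspiration
q = [90.0, 70.0, 30.0]      # runoff
ds = [20.0, 5.0, -25.0]     # observed storage change
residual = closure_residual(p, et, q, ds)
```

By contrast, deriving storage change as P − ET − Q would force the residual to zero by construction and hide any inconsistency among the products.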
  RC:
  
  ● Lines 436-448: Move them after line 465.
  AC: We will move Lines 436-448 to the end of Line 465 in a new paragraph.
  RC:
  
  ● Line 473: One comma too many
  AC: The comma in Line 474 will be deleted.
  RC:
  
  4 Results and discussion
  
  ● Lines 554-566: Explanation of parameters could also be in the method section when parameters are being presented or is this meant as a discussion?
  AC: The explanations of the parameters are given in Table 2 within the ‘data and methods’ chapter; however, the detailed descriptions of these parameters and their physical (and/or hypothetical) meanings are not presented in the text. The readers will be directed to Müller Schmied et al. (2021) for further explanation of those parameters. In lines 554-566, we intend to discuss some of the results of our sensitivity analysis.
  RC:
  
  ● Line 583: Maybe add a short sentence why P-PM was added later or refer to the method section (lines 263-268). Why is EP-NM not added?
  AC: We will add the following statement to explain why P-PM was selected for calibration:
  
  “Nevertheless, P-PM was selected as an additional calibration parameter because precipitation forcing data, in contrast to radiation data, contain high uncertainties and biases which need to be corrected during model calibration, if possible. Recently, Goteti & Famiglietti (2024) pointed out the underestimation of precipitation in data sets of India that need to be corrected (here by P-PM) to avoid non-physical or process-based compensation by calibration of other parameters.”
  RC:
  
  ● Line 648: Livneh and Lettenmaier (2012)
  AC: We will correct Line 648; the statement will read “In their study, Livneh and Lettenmaier (2012) concluded that …”.
  RC:
  
  ● Table 6 and 7: Are those the NSE values of the calibration? I am not sure if I got that correctly, but due to data scarcity you could not calculate the NSE for all variables for the validation period (only for Q and TWSA). Is that correct?
  AC: Tables 6 and 7 report the mean and standard deviation of the performance metric NSE of the 8 compromise solutions of each calibration experiment. We reran the WGHM model with the parameters of the compromise solutions and computed the NSEs of all target variables to obtain the performance of those solutions across all variables. The NSE values remained unchanged for the variables that had been used as objectives in the calibration experiments. To enhance clarity, we intend to rename the title of Table 6 as follows:
  “Table 6: Mean and standard deviation of model performance indicator NSE for the compromise solutions (N = 8) of the calibration experiments in the Ganges river basin during calibration periods. The WGHM model was rerun using parameters from the compromise solutions to compute NSEs of all variables. The μ_{NSE, ALL} represents the mean NSE across all objectives over all eight compromise solutions per experiment. The highest NSE for each objective is highlighted in bold face; the highlighted mean across objectives (μ_{NSE, ALL}) likewise shows the highest value in each group (2-objective, 3-objective, and 4-objective). The objective values obtained in the standard calibration and in the uncalibrated model are also shown.”
  The title of Table 7 will remain unchanged.
  The NSE values in Tables 6 and 7 were calculated based on the monthly values during the calibration period. As you correctly pointed out, during validation, we were only able to compute the efficiency score for Q and TWSA, as no data were available for the other variables during the validation periods (2010-variable years).
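The computation described above, an NSE per compromise solution followed by the mean and standard deviation across solutions, can be sketched as follows. The NSE formula is the standard Nash-Sutcliffe efficiency; the data, the three hypothetical solutions, and the use of the sample standard deviation (`ddof=1`) are assumptions for illustration, since the response does not specify these details.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((obs-sim)^2) / sum((obs-mean(obs))^2)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Made-up monthly observations and three hypothetical compromise solutions
obs = np.array([1.0, 2.0, 3.0, 4.0])
sims = [
    obs.copy(),                 # perfect simulation
    obs + 1.0,                  # constant bias of +1
    np.full(4, obs.mean()),     # simulation equal to the observed mean
]
scores = np.array([nse(obs, s) for s in sims])

# Summary statistics across solutions, as reported per experiment
mu_nse = scores.mean()
sigma_nse = scores.std(ddof=1)
```

An NSE of 1 indicates a perfect fit, while an NSE of 0 means the simulation performs no better than the mean of the observations.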
  RC:
  
  ● Figure 6 and Figure 7 are a bit small.
  AC: We will increase the size of Figures 6 and 7.
  RC:
  
  ● The authors chose to have a combined results and discussion chapter. Sometimes an explanation as to why the results turned out the way they did was missing.
  AC: We agree that some of the results were not adequately discussed. We will provide the missing explanations of why some of the results turned out as they did in the revised manuscript.
  RC:
  
  ● Lines 864-865: From the following text of that paragraph, I still did not understand why the Ganges and Brahmaputra basins had different identifiability regarding their parameter comparison. Could you please explain that more clearly?
  AC: The parameter identifiability has a typical inverse association with the dimensionality of the search space. In the Ganges experiments, relatively higher parameter identifiability was observed due to the use of a smaller number of parameters in calibration. We investigated the relationship between parameter identifiability and the sensitivity of the response variables. Additionally, we examined evidence regarding the impact of multi-variable calibration on increasing parameter identifiability. To clarify our statement, we will rewrite the statement in Lines 864-865 as follows:
  “Due to the fewer parameters involved in the Ganges calibration experiments, better parameter identifiability is observed within the basin compared to experiments in the Brahmaputra basin. We investigated how individual observations influence parameter identifiability during calibration and explored the impact of sensitivity on parameter identifiability.”
  Related to this issue, we will reformulate Lines 985-899. Instead of “In the Brahmaputra basin, the identifiability of parameters tends to be lower than in the Ganges basin. Four parameters (P-PM, SN-MT, SN-TG, and SL-RC) are constrained well (i.e., they have low coverage of their a-priori range in the compromise solution sets) with the variable Q, two parameters (SN-MT and SL-MSM) by the ET variable, and two (SN-TG and SW-RRM) by the SWSA observations”, we will write “In the Brahmaputra basin, four parameters (P-PM, SN-MT, SN-TG, and SL-RC) are constrained well (i.e., they have low coverage of their a-priori range in the compromise solution sets) with the variable Q, two parameters (SN-MT and SL-MSM) by the ET variable, and two (SN-TG and SW-RRM) by the SWSA observations”.
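The notion of a parameter being ‘constrained well’, i.e. having low coverage of its a-priori range in the compromise solution set, can be illustrated with a short Python sketch. It assumes coverage is measured as the spread (maximum minus minimum) of the calibrated values divided by the width of the a-priori range; this definition, the parameter names `param_a` and `param_b`, and all values are hypothetical and may differ from the measure used in the manuscript.

```python
import numpy as np

def range_coverage(values, lower, upper):
    """Fraction of the a-priori range [lower, upper] spanned by the values."""
    values = np.asarray(values, dtype=float)
    return float((values.max() - values.min()) / (upper - lower))

# Made-up parameter values from a set of compromise solutions
solutions = {
    "param_a": [4.0, 5.0, 6.0],   # tightly clustered
    "param_b": [1.0, 5.0, 9.0],   # spread over most of the range
}
# Made-up a-priori bounds
bounds = {"param_a": (0.0, 10.0), "param_b": (0.0, 10.0)}

coverage = {
    name: range_coverage(vals, *bounds[name])
    for name, vals in solutions.items()
}
```

Under this measure, `param_a` (coverage 0.2) would count as better identified than `param_b` (coverage 0.8).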
  RC:
  
  ● Could you add an outlook at the end of this section considering the following points: What do you expect for other basins? Could this method be applied to other basins? What would be the challenges?
  AC: We will revise and rewrite our conclusion provided in one of our earlier responses. Additionally, we will include a concluding paragraph emphasizing that the methodologies employed in this study can be effectively applied to other basins and other global hydrological models provided that error information regarding the observations is available.
  RC:
  
  5 Conclusion
  
  ● Lines 1031-1032: repetitive
  AC: In the new conclusion chapter, this statement will be omitted.
  References
  Cheng, C. T., Wu, X. Y., and Chau, K. W.: Multiple criteria rainfall–runoff model calibration using a parallel genetic algorithm in a cluster of computers / Calage multi-critères en modélisation pluie–débit par un algorithme génétique parallèle mis en œuvre par une grappe d’ordinateurs, Hydrological Sciences Journal, 50(6), 1087, https://doi.org/10.1623/hysj.2005.50.6.1069, 2005.
  Goteti, G. and Famiglietti, J.: Extent of gross underestimation of precipitation in India, Hydrol. Earth Syst. Sci. Discuss. [preprint], https://doi.org/10.5194/hess-2024-18, in review, 2024.
  McMillan, H. K.: A review of hydrologic signatures and their applications, WIREs Water, 8, e1499, https://doi.org/10.1002/wat2.1499, 2021.
  Müller Schmied, H., Cáceres, D., Eisner, S., Flörke, M., Herbert, C., Niemann, C., Peiris, T. A., Popat, E., Portmann, F. T., Reinecke, R., Schumacher, M., Shadkam, S., Telteu, C.-E., Trautmann, T., and Döll, P.: The global water resources and use model WaterGAP v2.2d: model description and evaluation, Geosci Model Dev, 14, 1037–1079, https://doi.org/10.5194/gmd-14-1037-2021, 2021.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2324-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (03 Jul 2024) by Elham R. Freund

AR by HM Mehedi Hasan on behalf of the Authors (22 Aug 2024) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (02 Oct 2024) by Elham R. Freund

RR by Anonymous Referee #1 (03 Oct 2024)

RR by Anonymous Referee #2 (07 Oct 2024)

ED: Publish as is (20 Nov 2024) by Elham R. Freund

AR by HM Mehedi Hasan on behalf of the Authors (26 Nov 2024) Manuscript

Journal article(s) based on this preprint

30 Jan 2025

The benefits and trade-offs of multi-variable calibration of the WaterGAP global hydrological model (WGHM) in the Ganges and Brahmaputra basins

Howlader Mohammad Mehedi Hasan, Petra Döll, Seyed-Mohammad Hosseini-Moghari, Fabrice Papa, and Andreas Güntner

Hydrol. Earth Syst. Sci., 29, 567–596, https://doi.org/10.5194/hess-29-567-2025, 2025

Supplement

https://doi.org/10.5194/egusphere-2023-2324-supplement

Viewed

Total article views: 706 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
505	150	51	706	42	32	31

Views and downloads (calculated since 07 Nov 2023)

Month	HTML	PDF	XML	Total
Nov 2023	121	50	5	176
Dec 2023	26	10	6	42
Jan 2024	32	2	2	36
Feb 2024	70	21	6	97
Mar 2024	33	6	4	43
Apr 2024	29	12	7	48
May 2024	51	10	7	68
Jun 2024	39	12	4	55
Jul 2024	15	5	3	23
Aug 2024	16	4	6	26
Sep 2024	15	1	0	16
Oct 2024	11	3	0	14
Nov 2024	20	2	1	23
Dec 2024	13	5	0	18
Jan 2025	14	7	0	21

Viewed (geographical distribution)

Total article views: 676 (including HTML, PDF, and XML) Thereof 676 with geography defined and 0 with unknown origin.

Latest update: 30 Jan 2025

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

We calibrate a global hydrological model using multiple observations to analyse the benefits and trade-offs of multi-variable calibration. We found such an approach to be very important for understanding the real-world system. However, some observations, in particular streamflow, are essential to the system. We also showed uncertainties in the calibration results, which are often useful for making informed decisions. We emphasise the need to consider observation uncertainty in model calibration.