This work is distributed under the Creative Commons Attribution 4.0 License.
To Bucket or not to Bucket? Analyzing the performance and interpretability of hybrid hydrological models with dynamic parameterization
Abstract. Hydrological hybrid models have been proposed as an option to combine the enhanced performance of deep learning methods with the interpretability of process-based models. Among the various hybrid methods available, the dynamic parameterization of conceptual models using LSTM networks has shown high potential. We explored this method further to evaluate specifically if the flexibility given by the dynamic parameterization overwrites the physical interpretability of the process-based part. We conducted our study using a subset of the CAMELS-GB dataset. First, we show that the hybrid model can reach state-of-the-art performance, fully comparable with a regional LSTM, and surpassing the performance of conceptual models in the same area. We then modified the conceptual model structure to assess if the dynamic parameterization can compensate for structural deficiencies of the model. Our results demonstrated the ability of the deep learning method to effectively compensate for deficiencies and implausible model structures in the hydrological models. This indicates that the hydrological model did not provide a regularization strong enough to reduce the hybrid model's performance. A model selection based purely on the performance in predicting streamflow is hence not advisable for this type of hybrid model. However, this does not entail that such hybrid models cannot be used to gain a better understanding of a hydrological system by studying hydrological fluxes and states other than discharge. Comparisons with external data, as well as the internal functioning of the hybrid model, reiterate that if a well-tested model architecture is combined with an LSTM, the deep learning model can learn to operate the process-based model in a consistent manner. In conclusion, this study demonstrated that hybrid models, if set up cautiously, can combine the enhanced performance of deep learning methods with good interpretability in the process-based part.
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-1980', Anonymous Referee #1, 06 Oct 2023
General Comments
The authors introduce and analyse a hybrid hydrological model consisting of a conceptual hydrological model and an LSTM data-driven model that estimates time-dependent model parameters from the same inputs as used to drive the conceptual model. The intention is to keep the excellent performance of data-driven approaches that has been demonstrated in recent years, but also to keep or improve the interpretability of such data-driven approaches.
In general, I am in favour of an intensive analysis of such approaches, and think the manuscript is well suited for the readership of HESS, in continuation of a significant number of important papers in this area in the same journal.
It is in general well written and the figures support the understanding and flow of the text! However, I have a number of major and minor comments/suggestions that I believe would improve the manuscript and should be addressed before final publication.
- The authors motivate their work by a paper of Feng et al., who propose a general framework of hybrid dPL modelling. They use the HBV model as a basis and estimate static and dynamic HBV parameters using catchment parameters and meteorological input (as used to force HBV). This paper extends and slightly varies this approach by analysing simple bucket-based models as well as (what they call) a NonSense model. Dynamic parameters are estimated with an LSTM DL. Research question 1 is “do conceptual models serve as a regionalization mechanism for the dynamic parameterization?” I do think this is an important question (and I miss the reference of Frame et al., 2022 in this context); however, I believe it is not addressed in as rigorous a way as would be needed here. Conceptual models can range over a large range of complexity. What if we would just apply a simple equation relating rainfall to runoff (Q = c(x,t) * P) and allow c to be estimated by an LSTM as suggested? This is the simplest model I can think of, and then I would systematically increase the complexity of the conceptual models.
(Frame, J. M., Kratzert, F., Klotz, D., Gauch, M., Shalev, G., Gilon, O., Qualls, L. M., Gupta, H. V., and Nearing, G. S.: Deep learning rainfall–runoff predictions of extreme events, Hydrol. Earth Syst. Sci., 26, 3377–3392, https://doi.org/10.5194/hess-26-3377-2022, 2022.)
In that procedure I would suggest using a much wider set of catchments and characteristics in order to see under what physio-geographical properties and climate conditions the approach works (as has been done in plenty of other previous applications), to answer research question 1 in a more general way!
- Research question 2 addresses the physical interpretability of conceptual models and whether it is compromised by data-driven dynamic parameterization. Fig. 8 shows some of the parameters for 2 catchments and how they vary in time. I am missing a few points that should be discussed: i) Are the variations of parameters due to structural limitations of the conceptual model component, or are they just needed because of averaging non-linear processes over spatially variable catchment characteristics, or are they compensating for biases in the ERA5 input data? Or all three? What do I learn from Fig. 8? Which weight is assigned to each individual input for driving the variation? ii) How does the methodology compare to “more classical/statistical approaches” such as state- and time-dependent parameter estimation techniques? iii) How does the methodology compare in philosophy and potential to approaches that have been introduced by e.g. Feigl et al. (2022)? What do we learn here in this approach from mistakes?
(Feigl et al., 2022, Learning from mistakes - Assessing the performance and uncertainty in process-based models. Hydrological Processes 36.)
- Overall, I miss a kind of “surprise” concerning the analysis – could that be more emphasized?
Specific/technical Comments
I would like to make the following minor comments/suggestions:
- L9ff: The last part of the abstract is hard to understand/follow – I read it before the rest of the text and did not know what was meant.
- L20: Reference needed.
- L136: how is ETp calculated? (maybe one short sentence)
- L161: how do you calculate the gradients for if/then statements and iterative loops with state updates?
- L214: is 855 batches true when you consider that one data point considers the 180 previous days as input?
- L216: Why not optimizing the initial conditions?
- L232: this refers to one major comment – when is the model complex enough so that the LSTM is able to produce the full output space just by varying parameters!? Is this already possible with the structure I suggested? Where can I see limitations/restrictions?
- L265: what is the criterion for overfitting? Have you used ensembles of optimized networks to see how robust the results are?
- Fig. 6: it is hard to see any differences; perhaps you can enlarge an interesting part of the time series!
- L309: I would guess that ERA5-Land data are also computed and not observed quantities. So it is a model state intercomparison!
- L329: why look at average values and not show the distribution?
- L385: what has this paper contributed to a better understanding in this context! Be specific!
- L402: What is new compared to Feng et al., what are the different findings?
- L417: States instead of variables!?
- L421: correlation is a very weak goodness-of-fit measure, especially when dealing with cyclic data/processes.
Overall, I feel the manuscript has in general the potential to be a valuable contribution to HESS; however, the questions and issues raised in the general comments would need to be addressed and discussed to a significant extent before final acceptance.
AC1: 'Reply on RC1', Eduardo Acuna, 23 Oct 2023
Response to RC1: Comment on egusphere-2023-1980, Anonymous Referee #1, 06 Oct 2023
We want to thank the referee for the detailed evaluation of our paper. In this document we answer the questions, comments and suggestions given. We will address those comments individually. For clarity, the original comments posted by the referee are written in italic, while our answers are written in bold.
The authors introduce and analyse a hybrid hydrological model consisting of a conceptual hydrological model and an LSTM data-driven model that estimates time-dependent model parameters from the same inputs as used to drive the conceptual model. The intention is to keep the excellent performance of data-driven approaches that has been demonstrated in recent years, but also to keep or improve the interpretability of such data-driven approaches.
In general, I am in favour of an intensive analysis of such approaches, and think the manuscript is well suited for the readership of HESS, in continuation of a significant number of important papers in this area in the same journal.
It is in general well written and figures support the understanding and flow of the text! However, I have a number of major and minor comments/suggestion that I believe would improve the manuscript and should be addressed before final publication.
• The authors motivate their work by a paper of Feng et al., who propose a general framework of hybrid dPL modelling. They use the HBV model as a basis and estimate static and dynamic HBV parameters using catchment parameters and meteorological input (as used to force HBV). This paper extends and slightly varies this approach by analysing simple bucket-based models as well as (what they call) a NonSense model. Dynamic parameters are estimated with an LSTM DL.
We thank the referee for the well-structured summary of our paper until this point.
• Research question 1 is “do conceptual models serve as a regionalization mechanism for the dynamic parameterization?”
We assume there was a typo in the word regionalization, as in line 61 of our original manuscript the research question was: “Do conceptual models serve as an efficient regularization mechanism…”. Therefore, we will answer the following comments/suggestions assuming the word regularization.
• I do think this is an important question (and I miss the reference of Frame et al, 2022 in this context), …
Frame et al. (2022) evaluate the performance of deep learning methods for rainfall-runoff modelling in predicting extreme events. According to the authors: “The primary objective of this study is to test the hypothesis that data-driven models lose predictive accuracy in extreme events more than models based on process understanding.” To accomplish this objective, they compared the performance of an LSTM network, a mass-conservative LSTM (MC-LSTM), a conceptual model (SAC-SMA) and a process-based model (NWM) for predicting extreme events. In their study they showed that the data-driven models were better than the conceptual and process-based models at predicting peak flows under almost all conditions.
Most of their study is dedicated to answering their main objective, which is not directly related to our research. However, in the last paragraph of the conclusions the authors do discuss the differences between pure ML and physics-informed ML. They argue that “there is only one type of situation in which adding any type of constraint (physically based or otherwise) to a data-driven model can add value: if constraints help optimization”.
Given the relevance of this last paragraph, we will add this reference in a revised version of the manuscript. We will include the reference in the introduction. We thank the referee for pointing out this study.
• …however, I believe it is not addressed in as rigorous a way as would be needed here. Conceptual models can range over a large range of complexity. What if we would just apply a simple equation relating rainfall to runoff (Q = c(x,t) * P) and allow c to be estimated by an LSTM as suggested? This is the simplest model I can think of, and then I would systematically increase the complexity of the conceptual models.
The general idea of the hybrid models in our study is to test if we can reach the performance of data-driven methods while maintaining interpretability and access to untrained variables. A model Q = c(x,t) * P would very likely be able to reach a similar performance as a stand-alone LSTM, as the performance would be given by the data-driven part (the coefficient c(x,t)). However, we would not be gaining any interpretability or access to untrained variables. Also, the stand-alone LSTM that we are using receives the precipitation as an input and therefore has access to the precipitation to make the discharge prediction; we argue that the case Q = c(x,t) * P is therefore already covered.
We evaluated in our study multiple conceptual structures: LSTM+Bucket, LSTM+NonSense and LSTM+SHM, which intended to cover a representative spectrum of conceptual models. The first case (LSTM+Bucket) removed most of the hydrological understanding we normally impose in our process-based model through its components (multiple buckets) and the fluxes between them. With the LSTM+Bucket model we only impose: mass conservation, the idea that some water may not reach the river (evapotranspiration is present) and the idea that the outflow is somehow proportional to the water content of the basin (Q = k*S). Even with this limited information, the LSTM+Bucket model was able to achieve similar performance as the other cases which indicated that the data-driven part can compensate for missing processes and flux interactions. The second case (LSTM+NonSense) allowed us to test that the data-driven part can even compensate for erroneous structure. Finally, the LSTM+SHM model covered the case where a well-structured conceptual model is given. This allowed us to evaluate the interpretability of our conceptual part and the access to untrained variables.
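For concreteness, the LSTM+Bucket idea described above can be sketched in a few lines. This is our own toy code and toy numbers, not the paper's implementation: a single mass-conservative reservoir with Q = k*S and storage-limited evapotranspiration, where the time-varying parameter series `k_series` stands in for what the LSTM would propose at each step.

```python
# Minimal sketch of a dynamically parameterized bucket (illustrative only).
def run_bucket(precip, etp, k_series, s0=0.0):
    """Single linear reservoir: mass-conservative, ET limited by storage."""
    storage, discharge = s0, []
    for p, e, k in zip(precip, etp, k_series):
        storage = max(storage + p - e, 0.0)  # some water never reaches the river
        q = k * storage                       # outflow proportional to storage
        storage -= q
        discharge.append(q)
    return discharge

precip = [5.0, 0.0, 2.0, 0.0]              # toy forcing [mm/day]
etp = [1.0, 1.0, 1.0, 1.0]
k_series = [0.2, 0.3, 0.25, 0.4]           # would come from the LSTM in the hybrid model
print(run_bucket(precip, etp, k_series))
```

In the actual hybrid model the LSTM maps the meteorological input sequence to `k_series`, so the regularization imposed here is only mass conservation, the presence of ET, and Q = k*S.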
Therefore, we argue that we are evaluating our research questions in a rigorous way, as the spectrum of cases that helps us achieve our objectives is covered. Testing other conceptual structures would be associated with the specific case of application and with which untrained variables one is interested in recovering; however, this is not the main objective of the study.
In that procedure I would suggest using a much wider set of catchments and characteristics in order to see under what physio-geographical properties and climate conditions the approach works (as has been done in plenty of other previous applications), to answer research question 1 in a more general way!
About using CAMELS-GB:
Feng et al. (2022) conducted a study using a similar method in CAMELS-US. We wanted to test our method on a different dataset, which would increase the general testing conditions of studies involving hybrid models.
About using a subset of CAMELS-GB:
In our study we used 60 basins and 25 years of data per basin, which is not negligible for producing robust conclusions. As we explained in section 2.1, using a subset of the whole CAMELS-GB had several reasons.
First, we wanted to ensure a fair comparison between the models, on an even playing field. Therefore, we removed basins with high anthropogenic impacts, as the process-based models do not consider these effects in their structure. We also considered the fact that we are using a daily resolution, so the basins should have a sufficient size such that the discharge variations can be resolved by daily data. Second, as shown in Figure 1 of the manuscript, the spatial location of the 60 basins covers most of the original range. So even if the overall range of hydroclimatic conditions in the (CAMELS-) UK may not be as wide as in the (CAMELS-) US, we made sure that it was fully represented by our test data set. Third, our performance measurements aligned with the benchmark set by Lees et al. (2021), who trained a data-driven method on the full CAMELS-GB dataset. Lastly, to have good baselines for our study we also calibrated the stand-alone conceptual models: for each basin, the SHM-only, Bucket-only and NonSense-only models. During this process, to mitigate potential calibration biases that may favor our hybrid models, we calibrated each conceptual model with three different methods: SCE-UA, DREAM and gradient descent. Therefore, using a subset of 60 basins, we performed 3 (models) * 3 (calibration methods) * 60 (basins) = 540 model calibrations. Hence, using a subset of the whole CAMELS-GB dataset was important to maintain a reasonable computational cost.
• Research question 2 addresses the physical interpretability of conceptual models and whether it is compromised by data-driven dynamic parameterization. Fig. 8 shows some of the parameters for 2 catchments and how they vary in time. I am missing a few points that should be discussed: i) Are the variations of parameters due to structural limitations of the conceptual model component, or are they just needed because of averaging non-linear processes over spatially variable catchment characteristics, or are they compensating for biases in the ERA5 input data? Or all three?
With the methodology we used in this study we were not trying to differentiate which deficiencies in the conceptual model our data-driven part was compensating for. However, the three possibilities that the referee suggested are very likely to be included.
For example, our experiment with the different conceptual structures indicates that the data-driven part can compensate for structural deficiencies and missing processes. Moreover, as we explained in line 288, in the LSTM+NonSense variation, the LSTM is reducing as much as possible the initial lag caused by the baseflow and interflow modules, which suggests the data-driven part can even “turn-off” parts of the conceptual model that are not useful.
The possibility that the data-driven part is compensating for the limitations of averaging non-linear processes was discussed in line 367, where we indicated that all our conceptual models are operated in a lumped manner. Lumped models handle multiple uncertainties and subprocesses with a single parameter, which is indeed a limitation. Therefore, the LSTM can vary the parameters in time to compensate for this limitation and get a better performance. In a similar study, Feng et al. (2022) partially covered this problem by using 16 conceptual models parameterized by an LSTM, to consider a semi-distributed version. Moreover, they showed two models, one with static and one with dynamic parameters. The fact that the dynamic parameterization achieved a better performance may indicate that even with a semi-distributed model, there are some deficiencies in the model structure that the LSTM is still able to compensate for.
Lastly, our model can be compensating for biases in the input data (however, we use CAMELS-GB input data not ERA5). It is known that, due to their structure, data-driven models can compensate for biased input data, and there is no reason to suggest that our model is not doing this.
Therefore, the data-driven part is compensating for multiple limitations of the model. However, disentangling which particular limitation is being compensated for neither aligns with nor affects our objective of evaluating whether hybrid models maintain interpretability and provide access to untrained variables. With respect to the topic raised by the referee here, we therefore suggest keeping the manuscript as it is.
• What do I learn from Fig. 8? Which weight is assigned to each individual input for driving the variation?
The LSTM processes the sequence of input variables using a series of gates (forget, input and output). Through weights, biases, and context-dependent gates, the network encodes the information in hidden and cell states to produce an output. However, because of how the information is used, there is no one-to-one assignment of how much each input contributes to each output. Moreover, our study focuses on the interpretability remaining in the conceptual model structure and not on the internal functioning of the data-driven part. Figure 8 allowed us to analyze the time variation of the parameters and link these variations to our hydrological knowledge.
ii) How does the methodology compare to “more classical/statistical approaches” such as state and time dependent parameter estimation techniques.
Lan et al. (2020) indicate that the most common approach for dynamic parameterization of hydrological models is calibration for different subperiods. They support this statement by referencing over 20 studies on this subject published in the last 15 years. According to the authors, this method divides the data into subperiods, considering seasonal characteristics or clustering approaches, and proposes a set of parameters for each subperiod. The idea is to capture the temporal variations of the catchment characteristics.
Our dynamic parameterization technique is also intended to capture the temporal variation of the catchment characteristics. Specifically, we use a recurrent neural network that analyzes a given sequence length, so the proposed parameters are context informed, and reflect the current state of the catchment.
Therefore, in philosophy our technique is similar to “more classical approaches”. The main difference is that our dynamic parameterization is much more flexible, as a custom parameterization can be proposed for each prediction, and it is not constrained to a typical small set of predefined subperiods. Also, one can include as input of the LSTM any information that is considered useful to make an informed parameter inference, even if this is not used later in the conceptual part of the model.
We thank the referee for this question. We will include this information in a revised version of the manuscript.
iii) How does the methodology compare in philosophy and potential to approaches that have been introduced by e.g. Feigl et al. (2022)? What do we learn here in this approach from mistakes?
The study by Feigl et al. (2022) proposes a technique where they use a machine learning technique to map the residuals of a conceptual model to deficiencies in model structure. This idea is quite interesting, but it does differ in philosophy and potential to the method we propose.
Feigl's method is based on the hypothesis that the residuals are caused, in part, by deficiencies in the model structure. They then use an ML algorithm to associate these residuals with specific limitations and modify the structure of the process-based model accordingly. Therefore, the data-driven part is used to analyze the deficiencies, and based on the results of those analyses the structure of the conceptual model is modified.
In our case, the dynamic parameterization provided by the data-driven part also showed the capability to compensate for deficiencies in the model structure. However, this compensation is done directly through the dynamic parameterization, and there are no intermediate steps to analyze the residuals and map those residuals to changes in the process-based part.
Therefore, even though both methods use a data-driven method to increase the performance of a process-based model, the idea of how this is done is quite different. We thank the referee for pointing out this reference. We will include this information in a revised version of the manuscript.
• Overall, I miss a kind of “surprise” concerning the analysis – could that be more emphasized?
Even though the dynamic parameterization of conceptual models had been applied before by Kraft et al. (2022) and Feng et al. (2022), our study does present novelty:
We applied the hybrid model approach to CAMELS-GB, which to the best of our knowledge had not been done before. With this we increased the application range of the models, which contributed to testing the robustness of the approach. To the best of our knowledge, this is also the first time that the capability of an LSTM to compensate for structural deficiencies in the process-based model has been tested. The LSTM+Bucket and LSTM+NonSense models allowed us to show that the hyper-flexibility of the data-driven method can overwrite the physical regularization given by the conceptual part. However, we also showed that if a meaningful conceptual model structure is given, physical interpretability can be maintained, which is consistent with previous studies.
Overall, we argue that there is novelty in the study and that the conclusions we draw from our analysis are consistent with the results.
Specific/technical Comments
I would like to make the following minor comments/suggestions:
• L9ff: The last part of the abstract is hard to understand/follow – I read it before the rest of the text and did not know what was meant.
We will modify the abstract in a revised version of the manuscript.
• L20: Reference needed.
We will add a reference in a revised version of the manuscript.
• L136: how is ETp calculated? (maybe one short sentence)
ETp is read directly from CAMELS-GB, so we did not calculate it. According to Coxon et al. (2020) ETp was calculated using the Penman–Monteith equation. We will add this information in a revised version of the manuscript.
• L161: how do you calculate the gradients for if/then statements and iterative loops with state updates?
The gradients are calculated using automatic differentiation, which is already implemented in PyTorch. This technique does not have a problem with if/then statements, as the derivative is calculated along whichever path the if/then statement takes. Including loops with state updates is also not a problem, as the computational graph is built dynamically during the forward run.
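As an illustration of this point, here is a toy PyTorch sketch (our own hypothetical `bucket_step`, threshold and parameter values, not the study's model code): autograd records the operations actually executed, so the gradient follows whichever branch was taken and accumulates through the loop's state updates.

```python
import torch

def bucket_step(storage, precip, k):
    """One step of a toy bucket with an if/then branch in the outflow."""
    storage = storage + precip
    if storage > 1.0:                  # branch taken depends on the state
        outflow = k * storage          # gradient flows through this path...
    else:
        outflow = 0.5 * k * storage    # ...or through this one
    return storage - outflow, outflow

k = torch.tensor(0.3, requires_grad=True)   # parameter to differentiate
storage = torch.tensor(0.0)
for p in (0.4, 0.9, 0.2):                   # iterative loop with state updates
    storage, q = bucket_step(storage, torch.tensor(p), k)
q.backward()                                # backprop through branch and loop
print(k.grad)                               # finite gradient, no special handling
```

The caveat is that the derivative is only taken along the executed branch; the branch condition itself is not differentiated.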
• L214: is 855 batches true when you consider that one data point considers the 180 previous days as input?
855 is the number of batches (each batch has 256 elements), while 180 is the sequence length. Therefore, to make a prediction, the LSTM considers the information of the last 180 days. However, this is independent of the number of batches used.
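To illustrate the distinction (with made-up split sizes, not the paper's exact numbers): the batch count is set by the number of training samples and the batch size, while the sequence length only determines how many days of context each sample carries (apart from the days lost at the start of each basin's record).

```python
import math

def n_batches(days_per_basin, n_basins, seq_len=180, batch_size=256):
    # Each sample is one target day plus its seq_len preceding days,
    # so only the first seq_len days per basin are lost as targets.
    samples_per_basin = days_per_basin - seq_len
    return math.ceil(samples_per_basin * n_basins / batch_size)

# e.g. with hypothetical 10-year training records for 60 basins:
print(n_batches(3650, 60))  # → 814
```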
• L216: Why not optimizing the initial conditions?
We are using a warmup period of one year to stabilize the internal states of the conceptual model, therefore optimizing the initial conditions is not necessary.
• L232: this refers to one major comment – when is the model complex enough so that the LSTM is able to produce the full output space just by varying parameters!? Is this already possible with the structure I suggested? Where can I see limitations/restrictions?
The LSTM+Bucket is the simplest hybrid structure we can think of that still includes some hydrological concepts in the regularization. In the paper we showed that this model already reaches state-of-the-art performance. However, this hybrid model behaves as an LSTM variant, and the bucket regularization does not give us any extra information about our hydrological system. Therefore, the simplest model we can think of already produces the full output space, because of the flexibility of the LSTM.
• L265: what is the criterion for overfitting? Have you used ensembles of optimized networks to see how robust the results are?
We are using dropout and tracking the validation loss during training to avoid overfitting. And yes, we have used ensembles, and the models are quite robust. We presented those results at EGU-2023. However, as this was not directly aligned with the main objective of our paper, we did not include this information.
• Fig. 6: it is hard to see any differences; perhaps you can enlarge an interesting part of the time series!
The idea of Figure 6 was exactly this: to show that, regardless of the regularization we used, there are some basins in which the simulated time series are almost identical. In a revised version of the manuscript we can reduce the time series period we show; however, this would not change the concept of what we are showing.
• L309: I would guess that ERA5-Land data are also computed and not observed quantities. So it is a model state intercomparison!
Yes, ERA5-Land data is also computed. We justified why we used this type of data in section 2.2. In a revised version of the manuscript, we can include the model state intercomparison term.
• L329: why look at average values and not show the distribution?
We used the average as we were trying to give a general similarity metric of the models along all the testing basins. However, in a revised version of the manuscript we can add other metrics.
• L385: what has this paper contributed to a better understanding in this context! Be specific!
In line 385 we were doing a recap of previous studies and what motivated our research, but we were not referring to our research yet. From line 392 forward is where we stated the specific conclusions of our paper. There we described in detail the process we followed to answer our two research questions, and the results we obtained.
• L402: What is new compared to Feng et al., what are the different findings?
In line 402 we indicated that our hybrid model achieved similar performance as stand-alone LSTMs and outperformed the conceptual models, and that these findings align with existing literature, including Feng et al. (2022). This paragraph is intended to compare part of our results with existing studies, which we argue is always good practice, because it increases the robustness of the methods.
Nevertheless, these are not the only findings we summarized in the conclusions. From line 404 on, we described the specific findings of our study and the answers to the research questions, which to the best of our knowledge had not been studied before.
• L417: States instead of variables!?
In a revised version of the manuscript, we will use the word states (or state variables) instead of variables.
• L421: correlation is a very weak goodness-of-fit measure, especially when dealing with cyclic data/processes.
We use the correlation metric to compare the dynamics of the unsaturated zone against ERA5-Land data. For this test, following the process proposed by Ehret et al. (2020), we normalized the data before comparison (mapping the values to a 0-1 range), as we are interested in comparing the dynamics of the series and not the specific values. Therefore, given the purpose of the test (comparing the dynamics of the normalized series), we think the correlation coefficient gives us the information we need. Are there other specific metrics that are better suited for this case?
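A minimal sketch of this comparison (the two series below are invented for illustration, not the study's data): min-max normalize both series, then compute the Pearson correlation. Note that Pearson r is itself invariant under the linear min-max rescaling, so the normalization mainly serves to compare and plot the series on a common scale.

```python
import statistics

def minmax(series):
    """Map a series to the 0-1 range."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

model_state = [0.1, 0.5, 0.9, 0.4, 0.2]   # e.g. simulated unsaturated-zone storage
era5_state  = [1.0, 3.0, 5.0, 2.5, 1.5]   # e.g. the ERA5-Land counterpart
r = pearson(minmax(model_state), minmax(era5_state))
print(round(r, 3))  # → 1.0 (these toy series are perfectly linearly related)
```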
Overall, I feel the manuscript has in general the potential to be a valuable contribution to HESS; however, the questions and issues raised in the general comments would need to be addressed and discussed to a significant extent before final acceptance.
We thank the referee for the overall positive evaluation of our manuscript and hope we could address the questions raised in a satisfactory manner.
References
• Coxon, G., Addor, N., Bloomfield, J. P., Freer, J., Fry, M., Hannaford, J., Howden, N. J. K., Lane, R., Lewis, M., Robinson, E. L., Wagener, T., and Woods, R.: CAMELS-GB: hydrometeorological time series and landscape attributes for 671 catchments in Great Britain, Earth System Science Data, 12, 2459–2483, https://doi.org/10.5194/essd-12-2459-2020, 2020.
• Ehret, U., van Pruijssen, R., Bortoli, M., Loritz, R., Azmi, E., and Zehe, E.: Adaptive clustering: reducing the computational costs of distributed (hydrological) modelling by exploiting time-variable similarity among model elements, Hydrology and Earth System Sciences, 24, 4389–4411, https://doi.org/10.5194/hess-24-4389-2020, 2020.
• Frame, J. M., Kratzert, F., Klotz, D., Gauch, M., Shalev, G., Gilon, O., Qualls, L. M., Gupta, H. V., and Nearing, G. S.: Deep learning rainfall–runoff predictions of extreme events, Hydrol. Earth Syst. Sci., 26, 3377–3392, https://doi.org/10.5194/hess-26-3377-2022, 2022.
• Feigl, M., Roesky, B., Herrnegger, M., Schulz, K., & Hayashi, M. (2022). Learning from mistakes—Assessing the performance and uncertainty in process‐based models. Hydrological Processes, 36(2), e14515.
• Feng, D., Liu, J., Lawson, K., and Shen, C.: Differentiable, Learnable, Regionalized Process-Based Models With Multiphysical Outputs can Approach State-Of-The-Art Hydrologic Prediction Accuracy, Water Resources Research, 58, e2022WR032 404, https://doi.org/https://doi.org/10.1029/2022WR032404, e2022WR032404 2022WR032404, 2022.
• Kraft, B., Jung, M., Körner, M., Koirala, S., and Reichstein, M.: Towards hybrid modeling of the global hydrological cycle, Hydrology and Earth System Sciences, 26, 1579–1614, https://doi.org/10.5194/hess-26-1579-2022, 2022.
• Lan, T., Lin, K., Xu, C. Y., Tan, X., & Chen, X. (2020). Dynamics of hydrological-model parameters: mechanisms, problems and solutions. Hydrology and Earth System Sciences, 24(3), 1347-1366.
• Lees, T., Buechel, M., Anderson, B., Slater, L., Reece, S., Coxon, G., and Dadson, S. J.: Benchmarking data-driven rainfall-runoff models in Great Britain: a comparison of long short-term memory (LSTM)-based models with four lumped conceptual models, Hydrology and Earth System Sciences, 25, 5517–5534, https://doi.org/10.5194/hess-25-5517-2021, 2021.Citation: https://doi.org/10.5194/egusphere-2023-1980-AC1
- The authors motivate they work by a paper of Feng et al. who propose a general framework of hybrid dPL modelling. They use the HBV model as a basis and estimate static and dynamically HBV parameters using Catchment parameters and meteorological input (as used do force HBV). This paper extends and slightly varies the this approach by analysing simple bucket based models as well as (what they call) NonSense model. Dynamic parameters are estimated with an LSTM DL. Research question 1 is “do conceptual models serve as a regionalization mechanism for thwe dynamic parameterization? I do think this is an important question (and I miss the reference of Frame et al, 2022 in this context), however, I believe it is not addressed in such a rigorous way as would be needed here. Conceptual models can range over a large range of complexity. Wha,t if we would just apply a simple equation relating Rainfall to runoff (Q = c(x,t) * P) and allow c to be estimated by a LSTM as suggested. This is the simplest model I can think of, and then I would systematically increase the complexity of the conceptual models.
-
RC2: 'Comment on egusphere-2023-1980', Grey Nearing, 13 Nov 2023
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-1980/egusphere-2023-1980-RC2-supplement.pdf
-
AC2: 'Reply on RC2', Eduardo Acuna, 04 Dec 2023
We want to thank Grey Nearing for the detailed evaluation of our paper. We attach the responses to his questions/comments in a PDF file. We believe that the changes proposed here will increase the quality of the manuscript and hope we addressed the questions raised satisfactorily.
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2023-1980', Anonymous Referee #1, 06 Oct 2023
General Comments
The authors introduce and analyse a hybrid hydrological model consisting of a conceptual hydrological model and an LSTM data-driven model that estimates time-dependent model parameters from the same inputs as are used to drive the conceptual model. The intention is to keep the excellent performance of data-driven approaches that has been demonstrated in recent years, but also to keep or improve the interpretability of such data-driven approaches.
In general, I am in favour of an intensive analysis of such approaches, and think the manuscript is well suited for the readership of HESS, in continuation of a significant number of important papers in this area in the same journal.
It is in general well written and the figures support the understanding and flow of the text! However, I have a number of major and minor comments/suggestions that I believe would improve the manuscript and should be addressed before final publication.
- The authors motivate their work by a paper of Feng et al., who propose a general framework of hybrid dPL modelling. They use the HBV model as a basis and estimate static and dynamic HBV parameters using catchment parameters and meteorological input (as used to force HBV). This paper extends and slightly varies this approach by analysing simple bucket-based models as well as (what they call) a NonSense model. Dynamic parameters are estimated with an LSTM DL. Research question 1 is “do conceptual models serve as a regionalization mechanism for the dynamic parameterization?” I do think this is an important question (and I miss the reference of Frame et al., 2022 in this context), however, I believe it is not addressed in such a rigorous way as would be needed here. Conceptual models can range over a large range of complexity. What if we would just apply a simple equation relating rainfall to runoff (Q = c(x,t) * P) and allow c to be estimated by an LSTM as suggested. This is the simplest model I can think of, and then I would systematically increase the complexity of the conceptual models.
(Frame, J. M., Kratzert, F., Klotz, D., Gauch, M., Shalev, G., Gilon, O., Qualls, L. M., Gupta, H. V., and Nearing, G. S.: Deep learning rainfall–runoff predictions of extreme events, Hydrol. Earth Syst. Sci., 26, 3377–3392, https://doi.org/10.5194/hess-26-3377-2022, 2022.)
In that procedure I would suggest using a much wider set of catchments and characteristics, in order to see under what physio-geographical properties and climate conditions the approach works (as has been done in plenty of other previous applications), to answer research question 1 in a more general way!
- Research question 2 addresses the physical interpretability of conceptual models and whether it is compromised by data-driven dynamic parameterization. Fig. 8 shows some of the parameters for 2 catchments and how they vary in time. I am missing a few points that should be discussed: i) Are the variations of parameters due to structural limitations of the conceptual model component, or is it just needed because of averaging non-linear processes over spatially variable catchment characteristics, or is it compensating for biases in the ERA5 input data? Or all three? What do I learn from Fig. 8? Which weight is assigned to each individual input for driving the variation? ii) How does the methodology compare to “more classical/statistical approaches” such as state and time dependent parameter estimation techniques. iii) How does the methodology compare in philosophy and potential to approaches that have been introduced by e.g. Feigl et al. (2022), what do we learn here in this approach from mistakes?
(Feigl et al., 2022, Learning from mistakes—Assessing the performance and uncertainty in process-based models. Hydrological Processes 36.)
- Overall, I miss a kind of “surprise” concerning the analysis – could that be more emphasized?
Specific/technical Comments
The following minor comments/suggestions I would like to make:
- L9ff: The last part of the abstract is hard to understand/follow – I read it before the rest of the text and did not know what is meant.
- L20: Reference needed.
- L136: how is ETp calculated (maybe one short sentence)
- L161: how do you calculate the gradients for if/then statements and iterative loops with state updates?
- L214: is 855 batches true when you consider that one data point considers the 180 previous days as input?
- L216: Why not optimize the initial conditions?
- L232: this refers to one major comment – when is the model complex enough so that the LSTM is able to produce the full output space just by varying parameters!? Is this already possible with the structure I suggested? Where can I see limitations/restrictions?
- L265: what is the criterion for overfitting! Have you used ensembles of optimized networks to see how robust the results are?
- Fig. 6: it is hard to see any differences, perhaps you can enlarge an interesting part of the time series!
- L309: I would guess that ERA5-Land data are also computed and not observed quantities. So it is a model state intercomparison!
- L329: why look at average values and not show the distribution?
- L385: what has this paper contributed to a better understanding in this context! Be specific!
- L402: What is new compared to Feng et al., what are the different findings!
- L417: States (instead of variables)!?
- L421: correlation is a very weak measure of goodness-of-fit, especially when dealing with cyclic data/processes!
Overall, I feel, the manuscript has in general the potential to be a valuable contribution to HESS, however, questions and issues raised in the general comments would need to be addressed and discussed to a significant part before final acceptance.
-
AC1: 'Reply on RC1', Eduardo Acuna, 23 Oct 2023
Response to RC1: Comment on egusphere-2023-1980. Anonymous Referee #1. 06 Oct 2023
We want to thank the referee for the detailed evaluation of our paper. In this document we answer the questions, comments and suggestions given. We will address those comments individually. For clarity, the original comments posted by the referee are written in italic, while our answers are written in bold.
The authors introduce and analyse a hybrid hydrological model consisting of a conceptual hydrological model and an LSTM data-driven model that estimates time-dependent model parameters from the same inputs as are used to drive the conceptual model. The intention is to keep the excellent performance of data-driven approaches that has been demonstrated in recent years, but also to keep or improve the interpretability of such data-driven approaches.
In general, I am in favour of an intensive analysis of such approaches, and think the manuscript is well suited for the readership of HESS, in continuation of a significant number of important papers in this area in the same journal.
It is in general well written and the figures support the understanding and flow of the text! However, I have a number of major and minor comments/suggestions that I believe would improve the manuscript and should be addressed before final publication.
• The authors motivate their work by a paper of Feng et al., who propose a general framework of hybrid dPL modelling. They use the HBV model as a basis and estimate static and dynamic HBV parameters using catchment parameters and meteorological input (as used to force HBV). This paper extends and slightly varies this approach by analysing simple bucket-based models as well as (what they call) a NonSense model. Dynamic parameters are estimated with an LSTM DL.
We thank the referee for the well-structured summary of our paper until this point.
• Research question 1 is “do conceptual models serve as a regionalization mechanism for the dynamic parameterization?”
We assume there was a typo in the word regionalization, as in line 61 of our original manuscript the research question was: “Do conceptual models serve as an efficient regularization mechanism…”. Therefore, we will answer the following comments/suggestions assuming the word regularization.
• I do think this is an important question (and I miss the reference of Frame et al, 2022 in this context), …
Frame et al. (2022) evaluate the performance of deep learning methods for rainfall-runoff modelling in predicting extreme events. According to the authors: “The primary objective of this study is to test the hypothesis that data-driven models lose predictive accuracy in extreme events more than models based on process understanding.” To accomplish this objective, they compared the performance of an LSTM network, a mass-conserving LSTM (MC-LSTM), a conceptual model (SAC-SMA) and a process-based model (NWM) for predicting extreme events. In their study they showed that the data-driven models were better than the conceptual and process-based models at predicting peak flows under almost all conditions.
Most of their study is dedicated to answering their main objective, which is not directly related to our research. However, in the last paragraph of the conclusions the authors do discuss the differences between pure ML and physics-informed ML. They argue that “there is only one type of situation in which adding any type of constraint (physically based or otherwise) to a data-driven model can add value: if constraints help optimization”.
Given the relevance of this last paragraph, we will add this reference in a revised version of the manuscript. We will include the reference in the introduction. We thank the referee for pointing out this study.
• …however, I believe it is not addressed in such a rigorous way as would be needed here. Conceptual models can range over a large range of complexity. What if we would just apply a simple equation relating rainfall to runoff (Q = c(x,t) * P) and allow c to be estimated by an LSTM as suggested. This is the simplest model I can think of, and then I would systematically increase the complexity of the conceptual models.
The general idea of the hybrid models in our study is to test if we can reach the performance of data-driven methods while maintaining interpretability and access to untrained variables. A model Q = c(x,t) * P will very likely be able to reach a similar performance as a stand-alone LSTM, as the performance will be given by the data-driven part (the coefficient c(x,t)). However, we would not be gaining any interpretability or access to untrained variables. Also, the stand-alone LSTM that we are using receives the precipitation as an input, and therefore has access to the precipitation when making the discharge prediction, so we argue that the case Q = c(x,t) * P is already covered.
We evaluated in our study multiple conceptual structures: LSTM+Bucket, LSTM+NonSense and LSTM+SHM, which were intended to cover a representative spectrum of conceptual models. The first case (LSTM+Bucket) removed most of the hydrological understanding we normally impose on our process-based model through its components (multiple buckets) and the fluxes between them. With the LSTM+Bucket model we only impose: mass conservation, the idea that some water may not reach the river (evapotranspiration is present) and the idea that the outflow is somehow proportional to the water content of the basin (Q = k*S). Even with this limited information, the LSTM+Bucket model was able to achieve similar performance as the other cases, which indicated that the data-driven part can compensate for missing processes and flux interactions. The second case (LSTM+NonSense) allowed us to test whether the data-driven part can even compensate for an erroneous structure. Finally, the LSTM+SHM model covered the case where a well-structured conceptual model is given. This allowed us to evaluate the interpretability of our conceptual part and the access to untrained variables.
Therefore, we argue that we are evaluating our research questions in a rigorous way, as the spectrum of cases that helps us achieve our objectives is being covered. Testing other conceptual structures would be associated with the specific case of application and with which untrained variables one is interested in recovering; however, this is not the main objective of our study.
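To make the dynamic-parameterization idea behind the LSTM+Bucket case concrete, it can be sketched as follows. This is a hypothetical minimal implementation, not the code used in the study: an LSTM reads the forcing sequence and proposes per-time-step values for the outflow coefficient k and an ET factor, which then drive a single mass-conserving bucket with Q = k*S. All names, dimensions and the choice of forcings are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LSTMBucket(nn.Module):
    """Hypothetical sketch of a hybrid LSTM+Bucket model: the LSTM proposes
    time-varying parameters, the bucket enforces mass conservation."""
    def __init__(self, n_inputs: int, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_inputs, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # per-step outflow coeff. k and ET factor

    def forward(self, x, precip, pet):
        # x: (batch, time, n_inputs) forcings; precip, pet: (batch, time)
        h, _ = self.lstm(x)
        params = torch.sigmoid(self.head(h))        # map parameters into (0, 1)
        k, et_factor = params[..., 0], params[..., 1]
        s = torch.zeros(x.shape[0])                 # bucket storage state
        flows = []
        for t in range(x.shape[1]):                 # loop with state updates
            et = et_factor[:, t] * pet[:, t]        # actual ET as fraction of PET
            s = torch.relu(s + precip[:, t] - et)   # mass balance, storage >= 0
            q = k[:, t] * s                         # linear reservoir: Q = k * S
            s = s - q
            flows.append(q)
        return torch.stack(flows, dim=1)            # (batch, time) discharge

model = LSTMBucket(n_inputs=3)
x = torch.randn(4, 180, 3)       # four basins, 180-day input sequences
precip = torch.rand(4, 180)
pet = torch.rand(4, 180)
q = model(x, precip, pet)        # differentiable end to end
```

Because the bucket step is written entirely in PyTorch operations, gradients flow from the simulated discharge back into the LSTM weights, which is what allows the whole hybrid to be trained on a discharge loss alone.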
In that procedure I would suggest using a much wider set of catchments and characteristics, in order to see under what physio-geographical properties and climate conditions the approach works (as has been done in plenty of other previous applications), to answer research question 1 in a more general way!
About using CAMELS-GB:
Feng et al. (2022) conducted a study using a similar method in CAMELS-US. We wanted to test our method on a different dataset, which would increase the general testing conditions of studies involving hybrid models.
About using a subset of CAMELS-GB:
In our study we used 60 basins and 25 years of data per basin, which is not a negligible amount for producing robust conclusions. As we explained in Section 2.1, using a subset of the whole CAMELS-GB dataset had several reasons.
First, we wanted to ensure a fair comparison between the models, on an even playing field. Therefore, we removed basins with high anthropogenic impacts, as the process-based models did not consider these effects in their structure. We also considered the fact that we are using a daily resolution, so the basins should have a sufficient size such that the discharge variations can be resolved by daily data. Second, as shown in Figure 1 of the manuscript, the spatial location of the 60 basins covers most of the original range. So even if the overall range of hydroclimatic conditions in the (CAMELS-) UK may not be as wide as in the (CAMELS-) US, we made sure that it was fully represented by our test data set. Third, our performance measurements aligned with the benchmark set by Lees et al. (2021), who trained a data-driven method on the full CAMELS-GB dataset. Lastly, to have good baselines for our study we also calibrated the stand-alone conceptual models. We calibrated for each basin the SHM-only, Bucket-only and NonSense-only models. During this process, to mitigate potential calibration biases that may favor our hybrid models, we calibrated each conceptual model with three different methods: SCE-UA, DREAM and gradient descent. Therefore, using a subset of 60 basins, we did 3 (models) * 3 (calibration methods) * 60 (basins) = 540 model calibrations. Hence, using a subset of the whole CAMELS-GB dataset was important to maintain a reasonable computational cost.
• Research question 2 addresses the physical interpretability of conceptual models and whether it is compromised by data-driven dynamic parameterization. Fig. 8 shows some of the parameters for 2 catchments and how they vary in time. I am missing a few points that should be discussed: i) Are the variations of parameters due to structural limitations of the conceptual model component, or is it just needed because of averaging non-linear processes over spatially variable catchment characteristics, or is it compensating for biases in the ERA5 input data? Or all three?
With the methodology we used in this study we were not trying to differentiate which deficiencies in the conceptual model our data-driven part was compensating for. However, the three possibilities that the referee suggested are very likely to be included.
For example, our experiment with the different conceptual structures indicates that the data-driven part can compensate for structural deficiencies and missing processes. Moreover, as we explained in line 288, in the LSTM+NonSense variation the LSTM reduces as much as possible the initial lag caused by the baseflow and interflow modules, which suggests the data-driven part can even “turn off” parts of the conceptual model that are not useful.
The possibility that the data-driven part is compensating for the limitations of averaging non-linear processes was discussed in line 367, where we indicated that all our conceptual models are operated in a lumped manner. Lumped models handle multiple uncertainties and subprocesses with a single parameter, which is indeed a limitation. Therefore, the LSTM can vary the parameters in time to compensate for this limitation and get a better performance. In a similar study, Feng et al. (2022) partially covered this problem by using 16 conceptual models parameterized by an LSTM, to obtain a semi-distributed version. Moreover, they presented two models, one with static and one with dynamic parameters. The fact that the dynamic parameterization achieved a better performance may indicate that even with a semi-distributed model there are some deficiencies in the model structure that the LSTM is still able to compensate for.
Lastly, our model may be compensating for biases in the input data (note, however, that we use CAMELS-GB input data, not ERA5). It is known that, due to their structure, data-driven models can compensate for biased input data, and there is no reason to suggest that our model is not doing this.
Therefore, the data-driven part is compensating for multiple limitations of the model. However, disentangling which particular limitation is being compensated for does not align with, nor affect, our objective of evaluating whether hybrid models maintain interpretability and provide access to untrained variables. With respect to the topic raised by the referee here, we therefore suggest keeping the manuscript as it is.
• What do I learn from Fig. 8? Which weight is assigned to each individual input for driving the variation?
The LSTM processes the sequence of input variables using a series of gates (forget, input and output). Through weights, biases and context-dependent gates, the network encodes the information in hidden and cell states to produce an output. However, because of how the information is used, there is no one-to-one attribution of how much each input contributes to each output. Moreover, our study focuses on the interpretability remaining in the conceptual model structure and not on the internal functioning of the data-driven part. Figure 8 allowed us to analyze the time variation of the parameters and link these variations to our hydrological knowledge.
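For readers unfamiliar with the gating mechanism, a single LSTM cell step can be sketched as below. This is a plain NumPy toy with illustrative sizes, not the network used in the study; it shows why no single weight links one input to one output, since every output mixes the full input history through the gates and the carried states.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell: input, forget and output gates mix
    the current input x with the previous hidden state h, so the output
    depends on the entire input history, not on a single weighted input."""
    z = W @ x + U @ h + b                              # all four gate pre-activations
    i, f, o, g = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell-state update
    h_new = sigmoid(o) * np.tanh(c_new)                # hidden state / output
    return h_new, c_new

rng = np.random.default_rng(42)
n_in, n_hid = 3, 8                        # illustrative sizes
W = 0.1 * rng.standard_normal((4 * n_hid, n_in))
U = 0.1 * rng.standard_normal((4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):                        # process a short input sequence
    h, c = lstm_cell_step(rng.standard_normal(n_in), h, c, W, U, b)
```

Each new hidden state depends on every previous input through c and h, which is why a per-input weight attribution, as one would read off a linear regression, does not exist for an LSTM.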
ii) How does the methodology compare to “more classical/statistical approaches” such as state and time dependent parameter estimation techniques.
Lan et al. (2020) indicate that the most common approach for the dynamic parameterization of hydrological models is calibration for different subperiods. They support this statement by referencing over 20 studies on this subject published in the last 15 years. According to the authors, this method divides the data into subperiods, considering seasonal characteristics or clustering approaches, and proposes a set of parameters for each subperiod. The idea is to capture the temporal variations of the catchment characteristics.
Our dynamic parameterization technique is also intended to capture the temporal variation of the catchment characteristics. Specifically, we use a recurrent neural network that analyzes a given sequence length, so the proposed parameters are context informed, and reflect the current state of the catchment.
Therefore, in philosophy our technique is similar to “more classical approaches”. The main difference is that our dynamic parameterization is much more flexible, as a custom parameterization can be proposed for each prediction, and it is not constrained to a typically small set of predefined subperiods. Also, one can include as input to the LSTM any information that is considered useful for making an informed parameter inference, even if this information is not used later in the conceptual part of the model.
We thank the referee for this question. We will include this information in a revised version of the manuscript.
iii) How does the methodology compare in philosophy and potential to approaches that have been introduced by e.g. Feigl et al. (2022), what do we learn here in this approach from mistakes?
The study by Feigl et al. (2022) proposes a technique that uses machine learning to map the residuals of a conceptual model to deficiencies in the model structure. This idea is quite interesting, but it does differ in philosophy and potential from the method we propose.
Feigl's method is based on the hypothesis that the residuals are caused, in part, by deficiencies in the model structure. They then use an ML algorithm to associate these residuals with specific limitations and modify the structure of the process-based model accordingly. Therefore, the data-driven part is used to analyze the deficiencies, and based on the results of those analyses the structure of the conceptual model is modified.
In our case, the dynamic parameterization provided by the data-driven part also showed the capability to compensate for deficiencies in the model structure. However, this compensation is done directly through the dynamic parameterization, and there are no intermediate steps to analyze the residuals and map them to changes in the process-based part.
Therefore, even though both methods use a data-driven component to increase the performance of a process-based model, the idea of how this is done is quite different. We thank the referee for pointing out this reference. We will include this information in a revised version of the manuscript.
• Overall, I miss a kind of “surprise” concerning the analysis – could that be more emphasized?
Even though the dynamic parameterization of conceptual models had been applied before by Kraft et al. (2022) and Feng et al. (2022), our study does present novelty:
We applied the hybrid model approach to CAMELS-GB, which to the best of our knowledge had not been done before. With this we increased the application range of the models, which contributed to testing the robustness of the approach. To the best of our knowledge, this is also the first time that the capability of the LSTM to compensate for structural deficiencies in the process-based model has been tested. The LSTM+Bucket and LSTM+NonSense models allowed us to show that the hyper-flexibility of the data-driven method can overwrite the physical regularization given by the conceptual part. However, we also showed that if a meaningful conceptual model structure is given, physical interpretability can be maintained, which is consistent with previous studies.
Overall, we argue that there is novelty in the study and that the conclusions we draw from our analysis are consistent with the results.
Specific/technical Comments
The following minor comments/suggestions I would like to make:
• L9ff: The last part of the abstract is hard to understand/follow – I read it before the rest of the text and did not know what is meant.
We will modify the abstract in a revised version of the manuscript.
• L20: Reference needed.
We will add a reference in a revised version of the manuscript.
• L136: how is ETp calculated (maybe one short sentence)
ETp is read directly from CAMELS-GB, so we did not calculate it. According to Coxon et al. (2020) ETp was calculated using the Penman–Monteith equation. We will add this information in a revised version of the manuscript.
• L161: how do you calculate the gradients for if/then statements and iterative loops with state updates?
The gradients are calculated using automatic differentiation, which is already implemented in PyTorch. This technique does not have a problem with if/then statements, as the derivative is calculated along the path the if/then statement actually takes. Including loops is also not a problem.
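As a minimal illustration (a toy example, not code from the study), PyTorch's autograd differentiates through both a branch and a stateful loop, because the computation graph is built from the operations that were actually executed:

```python
import torch

def thresholded_bucket(precip, k, threshold=2.0):
    """Toy routine with an if/then branch and an iterative loop with state
    updates; autograd records whichever branch is actually taken each step."""
    s = torch.zeros(())          # storage state carried through the loop
    q_total = torch.zeros(())
    for p in precip:
        s = s + p
        if s > threshold:        # branch: gradient flows along the taken path
            q = k * s
        else:
            q = 0.1 * k * s
        s = s - q
        q_total = q_total + q
    return q_total

precip = torch.tensor([1.0, 2.0, 0.5, 3.0])
k = torch.tensor(0.3, requires_grad=True)
q_total = thresholded_bucket(precip, k)
q_total.backward()               # gradient w.r.t. k is well defined
```

Note that the gradient is exact for the executed control-flow path; the branch condition itself is treated as fixed, which is the standard behavior of define-by-run automatic differentiation.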
• L214: is 855 batches true when you consider that one data point considers the 180 previous days as input?
855 is the number of batches (each batch has 256 elements), while 180 is the sequence length. Therefore, to make a prediction, the LSTM considers the information of the last 180 days. However, this is independent of the number of batches used.
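The independence of the two numbers can be illustrated with back-of-the-envelope arithmetic (the basin and day counts below are assumptions for illustration, not the exact training setup):

```python
import math

# Each training sample pairs one target day with its 180-day input sequence;
# the sequence length affects the sample's shape, not how many samples exist.
n_basins = 60
target_days_per_basin = 3650   # assumed ~10 years of trainable target days
batch_size = 256
seq_length = 180               # lookback window; does not enter the count

n_samples = n_basins * target_days_per_basin
n_batches = math.ceil(n_samples / batch_size)   # 856 with these assumed numbers
```

Doubling the sequence length would double the memory per sample but leave the batch count unchanged.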
• L216: Why not optimize the initial conditions?
We are using a warmup period of one year to stabilize the internal states of the conceptual model; therefore, optimizing the initial conditions is not necessary.
• L232: this refers to one major comment – when is the model complex enough so that the LSTM is able to produce the full output space just by varying parameters!? Is this already possible with the structure I suggested? Where can I see limitations/restrictions?
The LSTM+Bucket is the simplest hybrid structure we can think of that still includes some hydrological concepts in the regularization. In the paper we showed that this model already reaches state-of-the-art performance. However, this hybrid model behaves as an LSTM variant, and the bucket regularization does not give us any extra information about our hydrological system. Therefore, the simplest model we can think of already produces the full output space, because of the flexibility of the LSTM.
• L265: what is the criterion for overfitting! Have you used ensembles of optimized networks to see how robust the results are?
We are using dropout and tracking the validation loss during training to avoid overfitting. And yes, we have used ensembles, and the models are quite robust. We presented those results at EGU-2023. However, as this was not directly aligned with the main objective of our paper, we did not include this information.
• Fig. 6: it is hard to see any differences, perhaps you can enlarge an interesting part of the time series!
The idea of Figure 6 was exactly this: to show that regardless of the regularization we used, there are some basins in which the simulated time series are almost identical. In a revised version of the manuscript we can reduce the time series period we show; however, this would not change the concept of what we are showing.
• L309: I would guess that ERA5-Land data are also computed and not observed quantities. So it is a model state intercomparison!
Yes, ERA5-Land data is also computed. We justified why we used this type of data in section 2.2. In a revised version of the manuscript, we can include the model state intercomparison term.
• L329: why look at average values and not show the distribution?
We used the average as we were trying to give a general similarity metric of the models along all the testing basins. However, in a revised version of the manuscript we can add other metrics.
• L385: what has this paper contributed to a better understanding in this context! Be specific!
In line 385 we were doing a recap of previous studies and what motivated our research, but we were not referring to our research yet. From line 392 forward is where we stated the specific conclusions of our paper. There we described in detail the process we followed to answer our two research questions, and the results we obtained.
• L402: What is new compared to Feng et al., what are the different findings!
In line 402 we indicated that our hybrid model achieved similar performance as stand-alone LSTMs and outperformed the conceptual models, and that these findings align with the existing literature, including Feng et al. (2022). This paragraph is intended to compare part of our results with existing studies, which we argue is always good practice, because it increases the robustness of the methods.
Nevertheless, these are not the only findings we summarized in the conclusions. From line 404 on, we described the specific findings of our study and the answers to the research questions, which to the best of our knowledge had not been studied before.
• L417: States (instead of variables)!?
In a revised version of the manuscript, we will use the word states (or state variables) instead of variables.
• L421: correlation is a very weak measure of goodness-of-fit, especially when dealing with cyclic data/processes!
We use the correlation metric to compare the dynamics of the unsaturated zone against ERA5-Land data. For this test, following the process proposed by Ehret et al. (2020), we normalized the data before comparison (mapping the values to a 0-1 range), as we are interested in comparing the dynamics of the series and not the specific values. Therefore, given the purpose of the test (comparing the dynamics of the normalized series), we think the correlation coefficient can give us the information we need. Are there other specific metrics that are better suited for this case?
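The normalize-then-correlate check can be sketched as follows, using synthetic series for illustration. One caveat worth noting: Pearson correlation is itself invariant to linear rescaling, so the 0-1 normalization mainly aids visual comparison rather than changing the coefficient.

```python
import numpy as np

def minmax(x):
    """Map a series to the 0-1 range before comparing its dynamics."""
    return (x - x.min()) / (x.max() - x.min())

rng = np.random.default_rng(0)
t = np.arange(365)
simulated = np.sin(2 * np.pi * t / 365) + 0.1 * rng.standard_normal(365)
reference = 5.0 * np.sin(2 * np.pi * t / 365) + 2.0  # same dynamics, other scale

r = np.corrcoef(minmax(simulated), minmax(reference))[0, 1]
# r is close to 1: the two series share their seasonal dynamics even though
# their absolute values differ by scale and offset.
```

For strongly seasonal series, a high r can be driven by the shared annual cycle alone; computing the correlation on seasonal anomalies is one common complement.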
Overall, I feel, the manuscript has in general the potential to be a valuable contribution to HESS, however, questions and issues raised in the general comments would need to be addressed and discussed to a significant part before final acceptance.
We thank the referee for the overall positive evaluation of our manuscript and hope we could address the questions raised in a satisfactory manner.
References
• Coxon, G., Addor, N., Bloomfield, J. P., Freer, J., Fry, M., Hannaford, J., Howden, N. J. K., Lane, R., Lewis, M., Robinson, E. L., Wagener, T., and Woods, R.: CAMELS-GB: hydrometeorological time series and landscape attributes for 671 catchments in Great Britain, Earth System Science Data, 12, 2459–2483, https://doi.org/10.5194/essd-12-2459-2020, 2020.
• Ehret, U., van Pruijssen, R., Bortoli, M., Loritz, R., Azmi, E., and Zehe, E.: Adaptive clustering: reducing the computational costs of distributed (hydrological) modelling by exploiting time-variable similarity among model elements, Hydrology and Earth System Sciences, 24, 4389–4411, https://doi.org/10.5194/hess-24-4389-2020, 2020.
• Frame, J., Kratzert, F., Klotz, D., Gauch, M., Shelev, G., Gilon, O., ..., and Nearing, G. S.: Deep learning rainfall-runoff predictions of extreme events, Hydrology and Earth System Sciences Discussions, 2021, 1–20, 2021.
• Feigl, M., Roesky, B., Herrnegger, M., Schulz, K., and Hayashi, M.: Learning from mistakes – Assessing the performance and uncertainty in process-based models, Hydrological Processes, 36, e14515, 2022.
• Feng, D., Liu, J., Lawson, K., and Shen, C.: Differentiable, Learnable, Regionalized Process-Based Models With Multiphysical Outputs can Approach State-Of-The-Art Hydrologic Prediction Accuracy, Water Resources Research, 58, e2022WR032404, https://doi.org/10.1029/2022WR032404, 2022.
• Kraft, B., Jung, M., Körner, M., Koirala, S., and Reichstein, M.: Towards hybrid modeling of the global hydrological cycle, Hydrology and Earth System Sciences, 26, 1579–1614, https://doi.org/10.5194/hess-26-1579-2022, 2022.
• Lan, T., Lin, K., Xu, C.-Y., Tan, X., and Chen, X.: Dynamics of hydrological-model parameters: mechanisms, problems and solutions, Hydrology and Earth System Sciences, 24, 1347–1366, 2020.
• Lees, T., Buechel, M., Anderson, B., Slater, L., Reece, S., Coxon, G., and Dadson, S. J.: Benchmarking data-driven rainfall-runoff models in Great Britain: a comparison of long short-term memory (LSTM)-based models with four lumped conceptual models, Hydrology and Earth System Sciences, 25, 5517–5534, https://doi.org/10.5194/hess-25-5517-2021, 2021.
- The authors motivate their work by a paper of Feng et al., who propose a general framework of hybrid dPL modelling. They use the HBV model as a basis and estimate static and dynamic HBV parameters using catchment attributes and meteorological input (as used to force HBV). This paper extends and slightly varies this approach by analysing simple bucket-based models as well as (what they call) a NonSense model. Dynamic parameters are estimated with an LSTM. Research question 1 is: "do conceptual models serve as a regionalization mechanism for the dynamic parameterization?" I do think this is an important question (and I miss the reference of Frame et al., 2022 in this context); however, I believe it is not addressed in as rigorous a way as would be needed here. Conceptual models can range over a large range of complexity. What if we would just apply a simple equation relating rainfall to runoff (Q = c(x,t) * P) and allow c to be estimated by an LSTM as suggested? This is the simplest model I can think of, and I would then systematically increase the complexity of the conceptual models.
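The minimal baseline the reviewer proposes, Q = c(x,t) * P, could be sketched as follows. This is an illustrative toy with synthetic data only: for simplicity a single static coefficient c is fitted by least squares, whereas in the hybrid setup suggested above c(x,t) would instead be predicted by an LSTM.

```python
import numpy as np

# Synthetic forcing and runoff for a hypothetical catchment
rng = np.random.default_rng(0)
P = rng.gamma(shape=2.0, scale=3.0, size=1000)    # precipitation
Q = 0.4 * P + rng.normal(0.0, 0.5, size=1000)     # "observed" runoff

# Least-squares estimate of a single runoff coefficient c in Q = c * P
c = np.sum(P * Q) / np.sum(P * P)
Q_sim = c * P
print(round(c, 2))  # close to the true coefficient 0.4
```

Systematically increasing complexity from this one-parameter model towards full bucket models would then isolate how much structure the conceptual part actually contributes.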
-
RC2: 'Comment on egusphere-2023-1980', Grey Nearing, 13 Nov 2023
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-1980/egusphere-2023-1980-RC2-supplement.pdf
-
AC2: 'Reply on RC2', Eduardo Acuna, 04 Dec 2023
We want to thank Grey Nearing for the detailed evaluation of our paper. We attach the responses to his questions/comments in a PDF file. We believe that the changes proposed here will increase the quality of the manuscript and hope we addressed the questions raised satisfactorily.
Eduardo Acuña Espinoza
Ralf Loritz
Manuel Álvarez Chaves
Nicole Bäuerle
Uwe Ehret
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.