the Creative Commons Attribution 4.0 License.
Analyzing the generalization capabilities of hybrid hydrological models for extrapolation to extreme events
Abstract. Data-driven techniques have shown the potential to outperform process-based models for rainfall-runoff simulation. Recently, hybrid models, which combine data-driven methods with process-based approaches, have been proposed to leverage the strengths of both methodologies, aiming to enhance simulation accuracy while maintaining a degree of interpretability. Expanding the set of test cases to evaluate hybrid models under different conditions, we test their generalization capabilities for extreme hydrological events, comparing their performance against Long Short-Term Memory (LSTM) networks and process-based models. Our results indicate that hybrid models perform similarly to LSTM networks in most cases. However, hybrid models reported slightly lower errors in the most extreme cases and were able to produce higher peak discharges.
Status: final response (author comments only)
-
CC1: 'Comment on egusphere-2024-2147', Chaopeng Shen, 22 Aug 2024
Dear authors,
Thanks for the contribution.
We have run similar experiments on our end, which show that our version of a single hybrid model, dHBV, outperformed LSTM in nearly all return-period categories. These results are documented here: https://t.co/BnWtEy6NEk. The conclusions seem to differ moderately from Espinoza24.
To understand where the discrepancies lie, we ran multiple experiments with the same setups as the authors. We thank the authors for making their code available, which enabled this exploration. It seems that the neuralhydrology vs. hydroDL implementations of differentiable HBV are the main cause of the discrepancies, which have the potential to change the conclusions. We would like to share the findings with the authors. While Espinoza24 conducted an experiment to verify that NH-hybrid could reproduce earlier results from Feng et al. (2022), it is important to note that this does not imply that other experiments would yield the same outcome. The claimed equivalence is not established here.
Please see the attached PDF for details.
-
AC1: 'Reply on CC1', Eduardo Acuna, 09 Sep 2024
Please find attached our response to the comment on egusphere-2024-2147 by Chaopeng Shen
-
CC3: 'Reply on AC1', Chaopeng Shen, 19 Sep 2024
Please see the attachment. We ran the model dHBV1.0 with more random seeds. Every single time, it has a lower error than LSTM or the "hybrid" in the authors' plot. Hence, the authors' assumption about statistical noise is not correct, and the difference in appearance is still due to the difference between dHBV1.0 and the "hybrid" that the authors trained.
- AC3: 'Reply on CC3', Eduardo Acuna, 04 Oct 2024
-
CC2: 'Comment on egusphere-2024-2147', John Ding, 30 Aug 2024
AR2 second-order autoregressive process of the streamflow
Besides the LSTM, HBV and a hybrid of the two, the authors may wish to revisit an autoregressive baseline model called AR(2) or AR2. This, an acceleration-based metric, is expressed by:
Qar2[t+1]=2Qobs[t]-Qobs[t-1],
see Azmi et al. (2021, SC1, Eq. 1).
The subject was previously discussed between me and Uwe Ehret, the current closing author, on a storm event scale in a different but related context (ibid., AC1, Table 1).
To summarize my take of our discussion, below are two main points:
1) a third-order AR model, AR-3 (Model-07, therein) when rounding off the time lag coefficients, is identical to AR2, and
2) it outperforms an ANN model (Model-08) by an NSE value of 0.99 to 0.12.
For the 531 CAMELS-US basins (Lines 125-130, and Figure 2), can we infer from point 2 above that an AR2 will be a better performing model? Let’s consider this a hypothesis for falsification in another open discussion forum.
In theory, an AR2 projection hydrograph over/under shoots the observed peak/trough flows - just visualize a USDA-SCS triangular unit hydrograph having an upslope and a downslope projection. This is in contrast to the authors' finding that 'all [three of their] models underestimated extreme flow scenarios,' (Line 243).
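As a rough illustration of the AR2 projection and the overshoot argument above, the following minimal sketch applies the formula Qar2[t+1]=2Qobs[t]-Qobs[t-1] to a synthetic triangular hydrograph. The function names `ar2_forecast` and `nse`, and the hydrograph values, are assumptions made for illustration only; they are not taken from the manuscript, from Azmi et al. (2021), or from the CAMELS-US data.

```python
import numpy as np


def ar2_forecast(q_obs):
    """One-step-ahead AR2 projection: Q_ar2[t+1] = 2*Q_obs[t] - Q_obs[t-1].

    The first two entries are NaN because no projection exists for them.
    """
    q_obs = np.asarray(q_obs, dtype=float)
    q_pred = np.full_like(q_obs, np.nan)
    q_pred[2:] = 2.0 * q_obs[1:-1] - q_obs[:-2]
    return q_pred


def nse(q_obs, q_sim):
    """Nash-Sutcliffe efficiency over time steps where a simulation exists."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    valid = ~np.isnan(q_sim)
    obs, sim = q_obs[valid], q_sim[valid]
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


# Hypothetical triangular hydrograph (arbitrary units): linear rise, linear fall.
q_obs = np.array([0, 0, 2, 4, 6, 8, 10, 8, 6, 4, 2, 0, 0], dtype=float)
q_ar2 = ar2_forecast(q_obs)

# The projection extends the rising limb one step past the observed peak
# and extends the falling limb one step below the final trough.
print("observed peak:", q_obs.max(), "projected max:", np.nanmax(q_ar2))
print("observed min:", q_obs.min(), "projected min:", np.nanmin(q_ar2))
print("NSE of AR2 on this toy series:", round(nse(q_obs, q_ar2), 3))
```

On this toy series the projection matches the observations along each straight limb and deviates only where the slope changes, which is where the over- and undershoot appear. Whether the same behavior holds for daily flows in real basins is the hypothesis raised above, not something this sketch can show.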
References
Azmi, E., Ehret, U., Weijs, S. V., Ruddell, B. L., and Perdigão, R. A. P.: Technical note: “Bit by bit”: a practical and general approach for evaluating model computational complexity vs. model performance, Hydrol. Earth Syst. Sci., 25, 1103–1115, https://doi.org/10.5194/hess-25-1103-2021, 2021.
Citation: https://doi.org/10.5194/egusphere-2024-2147-CC2
- AC2: 'Reply on CC2', Eduardo Acuna, 09 Sep 2024
-
RC1: 'Comment on egusphere-2024-2147', Basil Kraft, 05 Sep 2024
Dear Authors,
Thank you for sharing this manuscript. It was a pleasure to review your study on comparing a hybrid hydrological model with both a neural network and a conceptual model. Such investigations are crucial for understanding the strengths and limitations of hybrid approaches.
My full review is attached.
Best regards,
Basil Kraft
- AC4: 'Reply on RC1', Eduardo Acuna, 04 Oct 2024
-
RC2: 'Comment on egusphere-2024-2147', Shijie Jiang, 06 Oct 2024
The manuscript "Analyzing the generalization capabilities of hybrid hydrological models for extrapolation to extreme events" compares the generalization capabilities of hybrid models, LSTM networks, and process-based models for rainfall-runoff simulations, with a particular focus on extreme events. The study examines whether hybrid models provide a meaningful advantage over standalone data-driven or process-based models. The results suggest that hybrid models show marginal improvements in predicting extreme peak flows, but overall perform similarly to LSTM networks. The authors argue that given the comparable performance, the choice of model depends on user needs. Overall, the study does a great job of providing a balanced perspective on the hybrid models. The paper is valuable in stimulating further discussion in the field.
Major comments
1) One of the central claims for hybrid models is that they combine the predictive power of data-driven approaches with the interpretability of process-based models. However, the manuscript focuses more on marginal differences in predictive performance than on the added interpretability that might justify hybrid models. I suggest including a discussion of the trade-off between accuracy and interpretability. For example, does the hybrid model help to better understand the causes of extreme flows, such as snowmelt, soil moisture dynamics, or precipitation anomalies? Could the explicit encoding of hydrologic concepts in the hybrid model be more valuable for decision making, even if the predictive gains are minimal?
2) While the paper touches on model errors during extreme events, it does not provide an analysis of where and why each model is better or worse, e.g., under which geophysical, climatic, or soil conditions. This could be helpful to better understand the strengths and limitations of each model type and provide a useful guide to when hybrid / LSTM models are most beneficial.
3) A related comment is that while the authors conclude that the choice of model depends on user needs, the manuscript does not provide clear guidance on how to make this choice. For example, in data-poor environments where high-quality or long-term observational data may not be available, should hybrid models be preferred because they incorporate process-based knowledge that could compensate for sparse data? Is it possible to make a comparison that assumes limited data? I think it would be helpful for practitioners working in regions with poor monitoring infrastructure.
Specific comments:
L12, the term “out-of-sample conditions” is somewhat ambiguous. Please specify what type of generalization is meant (temporal or spatial domains).
L16, the phrase "notion of interpretability" could be clearer. What does "notion" mean in this context? It sounds vague. If interpretability is considered to be a key reason for adopting hybrid models over purely data-driven ones, it should be more clearly defined and quantified. Does interpretability mean the ability to interpret the parameters, processes, or outputs in a hydrologically meaningful way? Or are you suggesting that it's a "so-called" interpretability?
L30, what specific structural deficiencies are you referring to here?
L35, the focus on "higher predictive accuracy" may overlook the fact that accuracy alone may not be the best criterion for assessing model suitability. Authors should clarify that other criteria (such as robustness, model transparency, applicability) besides accuracy may be equally important in model evaluation.
L100, the explanation of the hybrid model’s parameterization is complex and may not be easily understood by just reading this paper. At least a clearer explanation of the buckets and parameters is needed.
L127, without discussing the potential limitations of the HBV model, this claim seems overly simplistic. It would be useful to explain here why the HBV model underperformed, even though it has been examined in previous studies.
L150, again, this conclusion of equivalence is overly simplistic and could lead to believing that there are no meaningful differences between the models. Are there certain types of basins or hydrological conditions (e.g., arid basins) where one model clearly outperforms the other?
L167, it is hard to read the "slightly lower errors" from the figure.
L215, this observation is important but lacks sufficient follow-up. If the dynamic parameterization reaches its limits during extreme events, it indicates a potential flaw in the model design, but the text does not discuss how this issue could be addressed or what its implications are. Could the predefined intervals be adjusted or extended to better handle extreme events?
L220, I am very confused here. How does the snowmelt effect indicate the potential bias in the input data? If the snowmelt flux is high, it's not surprising to see a discrepancy between precipitation and runoff. This statement also raises the question of a structural flaw in the HBV model, but it is not elaborated. I'm left wondering what specific deficiencies in the snow module are responsible for the poor performance and how these deficiencies could be addressed in future work. For example, is the snowmelt process not adequately modeled due to insufficient temperature data, or is the parameterization of the snow module too simplistic?
L225, it's vague and doesn't provide enough insight into what types of hybrid architectures might yield different results. In my opinion, the hybrid model used in this paper is one that uses a conceptual model as the backbone and neural networks for parameter learning. It would be more actionable to point out some other types of hybrid models, e.g., component replacement or more conceptual frameworks (e.g., https://hess.copernicus.org/articles/26/1579/2022/) that might address some of the limitations identified in the study.
L230, I'm afraid this recommendation is too general and simplistic...
L241, is it possible to use more precise numbers or statistical analysis to support the claim of “slight” outperformance? If the differences are marginal, do you think they might still matter in practical scenarios?
L245, the mention of "possible bias in the input data" is speculative without further analysis. And if that's the case, does it imply that LSTM is insensitive to the bias?
L249, the statement about dynamic parameterization is not sufficiently elaborated. It doesn't provide enough detail about how this adaptation happens or why it is particularly useful for extreme events. Also, the comparison with LSTM gating is interesting, but lacks further discussion.
Citation: https://doi.org/10.5194/egusphere-2024-2147-RC2
- AC5: 'Reply on RC2', Eduardo Acuna, 15 Oct 2024