the Creative Commons Attribution 4.0 License.
How well do hydrological models learn from limited discharge data? A comparison of process- and data-driven models
Abstract. A widespread assumption is that data-driven models only achieve good results with sufficiently large training data, while process-based models are usually expected to be superior in data-poor situations. In our study, we investigate this assumption by calibrating several process-based and data-driven hydrological models with training data sets of observed discharge that differ in the number of data points and the type of data selection. The tested models include four commonly used process-based models (GR4J, HBV, mHM, and SWAT+) and four data-driven models (conditional probability distributions, regression trees, ANN, and LSTM), which are calibrated for three meso-scale catchments representing three different landscapes in Germany: the Iller in the Alpine region, the Saale in the low mountain ranges, and the Selke in the Central German lowlands. We used conditional entropy to evaluate model performance and the learning capability of a model (i.e., change in model performance with increasing sample size).
In addition to the main question of this study, i.e., to what extent the performance of the different models depends on the training data set, we also investigated whether the selection of the training data (random or according to information content, selection of contiguous time periods, or independent time points) plays a role. We also investigated whether there is a relationship between the information contained in the data and the shape of the learning curve for different models that allows prediction of the achievable model performance, and whether the use of more spatially distributed model inputs leads to improved model performance compared to spatially lumped inputs.
Process-based models outperformed data-driven models for small amounts of training data due to their predefined structure based on process representation. However, with increasing amounts of training data, the learning curve of process-based models quickly saturates, and using about 2 to 5 years of training data, the data-driven LSTM consistently outperforms all process-based models. In particular, the LSTM continues to learn from more training data without approaching saturation. Surprisingly, fully random sampling of training data points for the HBV model leads to better learning results not only compared to consecutive random sampling but also compared to optimal sampling in terms of information content. Analyzing multivariate catchment data allows predictions about how these data can be used to predict discharge. When no memory was considered, the conditional entropy was large, but as soon as some memory was introduced in the form of a past day or past week, the conditional entropy became smaller, suggesting that memory is a very important component in the data and that capturing it improves model performance. This was particularly the case for the catchment from the low mountain ranges and the Alpine region.
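The role of memory described above can be illustrated with a minimal sketch of a binned conditional-entropy estimate. This is not the authors' implementation (their code is in the Zenodo repository listed under Data sets). The equal-width binning, the number of bins, the synthetic precipitation and discharge series, and all function names are assumptions made for illustration only.

```python
import numpy as np


def discretize(x, n_bins):
    """Map a 1-D array to integer bin indices 0..n_bins-1 (equal-width bins)."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Use interior edges only so that min maps to 0 and max to n_bins-1.
    return np.digitize(x, edges[1:-1])


def entropy_bits(labels):
    """Plug-in Shannon entropy in bits of the rows of an integer array."""
    labels = np.asarray(labels)
    if labels.ndim == 1:
        labels = labels[:, None]
    _, counts = np.unique(labels, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def conditional_entropy(target, predictors, n_bins=8):
    """Estimate H(target | predictors) = H(target, predictors) - H(predictors)."""
    y = discretize(target, n_bins)
    xs = np.column_stack([discretize(x, n_bins) for x in predictors])
    joint = np.column_stack([y, xs])
    return entropy_bits(joint) - entropy_bits(xs)


# Synthetic example (assumption): discharge as a slowly draining store
# fed by random precipitation, so today's flow depends on yesterday's.
rng = np.random.default_rng(0)
n = 5000
precip = rng.gamma(shape=0.5, scale=4.0, size=n)
discharge = np.empty(n)
discharge[0] = precip[0]
for t in range(1, n):
    discharge[t] = 0.9 * discharge[t - 1] + 0.1 * precip[t]

q_t, q_prev, p_t = discharge[1:], discharge[:-1], precip[1:]

# No memory: predict discharge from same-day precipitation only.
h_no_memory = conditional_entropy(q_t, [p_t])
# One day of memory: add the previous day's discharge as a predictor.
h_one_day = conditional_entropy(q_t, [p_t, q_prev])

print(f"H(Q_t | P_t)          = {h_no_memory:.3f} bits")
print(f"H(Q_t | P_t, Q_(t-1)) = {h_one_day:.3f} bits")
```

In this sketch a lower value means less remaining uncertainty about discharge given the predictors. Adding predictors can never increase the plug-in estimate, but with many bins and few samples it is biased low, so bin counts should be chosen with the sample size in mind.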
Competing interests: At least one of the (co-)authors is a member of the editorial board of Hydrology and Earth System Sciences.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Status: open (until 15 May 2025)
RC1: 'Comment on egusphere-2025-1076', Salvatore Manfreda, 18 Apr 2025
This study provides a comprehensive and timely comparison between process-based (PB) and data-driven (DD) hydrological models in data-scarce conditions, using three representative German catchments. The work is well-structured, methodologically sound, and contributes valuable insights, particularly regarding the learning behavior of LSTM models over increasing training data volumes. Below are my detailed comments and suggestions for improvement.
Major Comments
- The abstract clearly outlines the study’s objectives and key findings. However, the novelty of systematically comparing PB and DD models under limited data scenarios deserves more emphasis upfront. The finding that LSTM outperforms PB models after just 2–5 years of data is particularly impactful and should be highlighted earlier.
- The mention of “conditional entropy” as a key metric is intriguing, but its relevance is not explained. A brief statement on why this information-theoretic approach was chosen over conventional performance metrics (e.g., NSE, KGE) would improve accessibility for readers unfamiliar with this concept.
- The introduction effectively positions PB and DD models in the hydrological modeling landscape. That said, the discussion of “limited data” could benefit from greater contextual depth. Recent literature on hybrid or semi-physically based models as a response to data scarcity could be cited to further motivate the study.
- The research questions (Q1–Q4) are well articulated. However, Q3 (related to information content) would be more compelling if its practical utility—such as in guiding monitoring network design or data prioritization—were better explained.
- The selection of PB models is appropriate and widely accepted. For DD models, while the choices are generally valid, the authors could briefly justify why other promising alternatives (e.g., NARX networks, Random Forests) were not included.
- The experiment on sampling strategies (random, consecutive, Douglas-Peucker) is a strong point. However, using only the HBV model for this analysis (E2) limits the generalizability of results. Extending this comparison to at least one DD and one additional PB model would add robustness.
- The explanation of entropy-based evaluation is informative, though potentially dense for some readers. Including a simplified table comparing conditional entropy with more familiar metrics like NSE or KGE (perhaps in the appendix) would be helpful.
- The core finding—PB models plateau early, while LSTM continues to improve with more data—is well demonstrated. The authors could enhance this discussion by elaborating on why LSTM is particularly effective (e.g., its ability to capture long-term temporal dependencies via memory cells).
- The observed advantage of HBV under random sampling is noteworthy. However, the paper would benefit from discussing whether similar trends are seen in other PB models, such as SWAT+ or mHM.
- The discussion is candid and thoughtful, particularly regarding limitations of model performance (e.g., mHM) and sampling methods. One area for improvement is the generalizability of the findings. The study is focused on humid temperate catchments—how well might these conclusions hold in arid, snow-dominated, or tropical regions?
- The finding that fully random sampling outperforms the Douglas-Peucker method is intriguing and counterintuitive. The authors should explore potential reasons, such as whether event-based sampling might inadvertently introduce overfitting or neglect broader variability.
- The conclusions are concise and well aligned with the results. However, the broader implications—especially for practitioners—could be emphasized more clearly. For example, LSTM’s scalability and adaptability make it promising for ungauged or data-poor basins, and the observed model behavior supports the development of hybrid PB-DD approaches.
Minor Comment
- Section 2.3 offers detailed descriptions of model architectures, which could be streamlined. A reference to supplementary material (if available) or a summary table might improve readability without sacrificing depth.
Overall Recommendation:
This manuscript presents a valuable and methodologically rigorous contribution to hydrological modeling. With minor revisions to better emphasize novelty, expand on generalizability, and clarify certain methodological choices, the paper will significantly enhance understanding of model behavior in data-scarce contexts.
Citation: https://doi.org/10.5194/egusphere-2025-1076-RC1
Data sets
MariStau/IMPRO_infotheory_Data_Code: Data and code used to calculate conditional entropy values Maria Staudinger and Uwe Ehret https://doi.org/10.5281/zenodo.14938050
Viewed
- HTML: 185
- PDF: 19
- XML: 6
- Total: 210
- Supplement: 10
- BibTeX: 8
- EndNote: 5
Viewed (geographical distribution)
Country | # | Views | % |
---|---|---|---|
United States of America | 1 | 61 | 23 |
Germany | 2 | 49 | 18 |
China | 3 | 24 | 9 |
Canada | 4 | 15 | 5 |
undefined | 5 | 12 | 4 |