the Creative Commons Attribution 4.0 License.
Exploring the ability of LSTM-based hydrological models to simulate streamflow time series for flood frequency analysis
Abstract. An increasing number of studies have shown the prowess of Long Short-Term Memory (LSTM) networks for hydrological modelling and forecasting. One commonly cited drawback of these methods, however, is the requirement for large amounts of training data to properly reproduce streamflow events. For annual maximum streamflow events, this can be problematic since they are by definition less common than mid- or low-flows, leading to under-representation in the model’s training set and, ultimately, in its parameterization. This study investigates six methods to improve peak streamflow simulation skill of LSTM models used to extend streamflow observation time series for flood frequency analysis (FFA). Methods include adding meteorological data variables, providing streamflow simulations from a distributed hydrological model, oversampling peak streamflow events, adding multihead attention mechanisms, adding data from a large set of “donor” catchments and combining some of these elements in a single model. Furthermore, results are compared to those obtained by the distributed hydrological model HYDROTEL. The study is performed on 88 catchments in the province of Quebec using a leave-one-out cross-validation implementation and an FFA is applied using observations as well as model simulations. Results show that LSTM-based models are able to simulate peak streamflow as well (for a simple LSTM model implementation) or better (with hybrid LSTM-hydrological model implementations) than the distributed hydrological model. Multiple pathways forward to further improve the LSTM-based model’s ability to predict peak streamflow are provided and discussed.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-2134', Emilio Graciliano Ferreira Mercuri, 21 Sep 2024
The manuscript entitled "Exploring the ability of LSTM-based hydrological models to simulate streamflow time series for flood frequency analysis" presents an interesting comparison between a distributed hydrological model (HYDROTEL) and Long Short-Term Memory (LSTM) deep learning models. Below are some points regarding its methodology, results, and potential areas for improvement:
1. LSTM is one class of machine learning algorithms. There are other types being used with good quality of results such as Convolutional Neural Networks (CNNs), Random Forests, or Gradient Boosted Trees. This should be considered in the literature review and/or as a future development.
2. One of the key methods tested, oversampling of extreme peak streamflow events, performed poorly. This suggests a more nuanced approach to data augmentation might be required. Future work could explore advanced synthetic data generation techniques like the Synthetic Minority Over-sampling Technique (SMOTE) rather than simply replicating extreme events. One example is the paper: Wu, Yirui, Yukai Ding, and Jun Feng. "SMOTE-Boost-based sparse Bayesian model for flood prediction." EURASIP Journal on Wireless Communications and Networking 2020 (2020): 1-12.
3. The multihead attention mechanism did not significantly improve the LSTM model’s performance. This raises questions about whether it was fully optimized or if a different attention configuration could be more effective. The complexity added by the attention mechanism might not have been justified, given the size of the dataset. I know that the code was shared, but a diagram and/or a more complete description of the attention mechanism would be a useful addition to help future research in the area.
4. One of the paper's recurring challenges is the inherent scarcity of extreme flood events, which makes it difficult for LSTMs to train effectively. Although the study attempts to mitigate this issue, it highlights that LSTMs struggle with rare event prediction without sufficient data. The paper could benefit from exploring more advanced techniques for handling imbalanced datasets, such as ensemble methods or using generative models to simulate extreme events.
5. Given the results across different test periods, there seems to be a risk of overfitting, particularly in models like LSTM-Combined. The paper could benefit from a more thorough discussion and results presentation on the loss function variation during training and testing epochs.
6. The authors could provide some explanation of why floods occur in Quebec, Canada. Is their frequency increasing over the years? Are soil or land use contributing factors? Is it related to climate change?
Overall, the paper provides valuable insights into the utility of LSTMs for hydrological modeling, especially in terms of hybrid model approaches.
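As an illustration of the difference raised in point 2 between replicating extreme events and generating synthetic ones, the sketch below interpolates between a sample and one of its nearest neighbours, which is the core idea of SMOTE. It is a minimal sketch only: the function name `smote_like`, the parameter `k` and the example feature vectors are assumptions made for illustration, and this is neither the oversampling scheme used in the manuscript nor the method of Wu et al. (2020). Applying such interpolation to the multi-day input windows of an LSTM would require further design choices that are not covered here.

```python
import numpy as np

def smote_like(samples, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly chosen
    sample and one of its k nearest neighbours (SMOTE-style interpolation)."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed to interpolate")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances; the diagonal is masked so that a sample
    # is never selected as its own neighbour.
    dists = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    picks = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return samples[base] + gap * (samples[picks] - samples[base])

# Hypothetical peak-event feature vectors (e.g. precipitation, peak flow).
peak_events = np.array([[40.0, 310.0], [55.0, 420.0], [62.0, 390.0], [80.0, 515.0]])
synthetic = smote_like(peak_events, n_new=6)
print(synthetic.shape)
```

Unlike plain replication, each synthetic row lies between two observed events, so the model does not see exact copies of the same peak.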
Citation: https://doi.org/10.5194/egusphere-2024-2134-RC1
AC1: 'Reply on RC1', Jean-Luc Martel, 07 Nov 2024
RC2: 'Comment on egusphere-2024-2134', Andre Ballarin, 26 Dec 2024
OVERVIEW
This paper evaluates the performance of different LSTM-based frameworks for simulating streamflow time series, focusing on their ability to characterize extreme events and, consequently, enhance flood frequency analysis (FFA). To achieve this, the authors applied a set of 7 different LSTM configurations to simulate streamflow from 88 catchments in Quebec, Canada. These configurations included 1 baseline model (LSTM-base) and 6 alternative schemes, which incorporated observed meteorological inputs, a multihead attention structure, and/or hydrological model-based simulations as inputs in addition to the original ERA5-based data, among other aspects.
In my perspective, this work holds relevance for the hydrology field and is suitable for publication in HESS.
The use of ML-methods for hydrological simulations is rapidly gaining traction in the hydrological community and hence, improving our understanding of their benefits and drawbacks is paramount. As mentioned by the authors, there is still no consensus on how LSTM-based simulations perform on representing extreme flood events and how this can influence FFA.
In my opinion, the methods are sound, and their results are well presented. Overall, it was a pleasant read. I have, however, some questions and suggestions pinpointed below which I believe will help the authors to improve the overall quality of the paper. Once addressed, I believe the paper will be a good addition to HESS.
General Comments:
- The authors state in the Introduction and Discussion sections that one of the primary goals of the paper is to assess the “potential of LSTM to extend streamflow records.” However, this aspect is not clearly demonstrated in the manuscript. I did not see any analysis or experiment specifically designed to evaluate this claim. So my question is: how are the authors addressing this in their work? Extending streamflow records is indeed a promising application of LSTM techniques with potential benefits for FFA. For example, to address this gap, the authors could consider an experiment using catchments with longer datasets (e.g., 40 years of data), training the LSTM on subsets (e.g., 20–30%) and assessing how effectively it extends the records and how this lengthening improves FFA. Alternatively, they could revise the manuscript to remove this objective and avoid any misunderstanding.
- Some of the methodological choices require further clarification. For example, it is not clear why the authors opted to use the Gumbel and GEV distributions for different stations, why the Cunnane plotting position was chosen, and which parameter estimation method was used. I suggest that the authors revisit their manuscript to better detail these aspects and improve its reproducibility.
- Although the manuscript provides valuable insights into LSTM performance in characterizing extreme events, I missed a more in-depth analysis and discussion of the different LSTM configurations and how they perform in FFA. For instance, I believe the manuscript would benefit from an expansion of the results section, including the FFA-based assessment not only for 4 catchments but for all evaluated catchments, discussing their spatial distribution, general performance, differences between LSTM and HYDROTEL FFA for catchments with different data availability, and uncertainties, which were not included in the original manuscript.
- Regarding the multihead and oversampling approaches, were the lower performances somewhat expected? Given the inherent scarcity of extreme flood data, exploring alternative data lengthening approaches—such as using synthetic series (e.g., Papalexiou 2022)—could enrich the discussion and provide directions for future research.
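For readers less familiar with the FFA choices questioned in the comments above, the following sketch shows what they involve in practice: the Cunnane plotting position, p_i = (i - 0.4) / (n + 0.2) for ascending rank i among n annual maxima, and a fit of the Gumbel and GEV distributions. It is a minimal sketch under stated assumptions: the annual maxima are synthetic, the helper names `cunnane_positions` and `return_level` are hypothetical, and maximum likelihood (the default of SciPy's `fit`) is used only because the estimation method of the manuscript is not specified in this discussion.

```python
import numpy as np
from scipy import stats

def cunnane_positions(annual_maxima):
    """Return the sorted maxima and their empirical non-exceedance
    probabilities from the Cunnane formula (i - 0.4) / (n + 0.2)."""
    x = np.sort(np.asarray(annual_maxima, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)  # rank 1 is the smallest maximum
    return x, (ranks - 0.4) / (n + 0.2)

def return_level(dist, params, return_period):
    """Quantile with non-exceedance probability 1 - 1/T for a fitted distribution."""
    return dist.ppf(1.0 - 1.0 / return_period, *params)

# Synthetic annual maxima (m3/s), an assumption for illustration only.
maxima = stats.gumbel_r.rvs(loc=100.0, scale=30.0, size=40, random_state=42)
x_sorted, p_empirical = cunnane_positions(maxima)

# Maximum-likelihood fits (SciPy default); assumed, not the authors' stated method.
gumbel_params = stats.gumbel_r.fit(maxima)   # (loc, scale)
gev_params = stats.genextreme.fit(maxima)    # (c, loc, scale); SciPy's c is minus the usual GEV shape

for T in (2, 20, 100):
    q_gumbel = return_level(stats.gumbel_r, gumbel_params, T)
    q_gev = return_level(stats.genextreme, gev_params, T)
    print(f"T={T:>3} yr  Gumbel={q_gumbel:7.1f}  GEV={q_gev:7.1f}")
```

Plotting `x_sorted` against the return periods 1 / (1 - `p_empirical`) alongside the fitted curves gives the usual frequency plot; the choice between the two-parameter Gumbel and the three-parameter GEV is typically made per station from goodness of fit, which is the kind of detail the comment asks the authors to document.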
Minor Comments
L78 – (Shen and Lawson, 2021) – Review the reference format
Figure 1 - Is it possible to improve this figure by adding some additional information in subpanels, such as a histogram of available years of observed streamflow (besides its spatial distribution) and climatic variables for each catchment (such as P, PET, ...)?
L170 – PETas – Typo
L185 - I believe the text would benefit here from 1 or 2 short sentences explaining the two different configurations of HYDROTEL (2.3.1 and 2.3.2). It is not clear here whether the authors will use both configurations as different inputs for the LSTM or whether the regional model was used only as an initial step (for example, by recalibrating only 11 out of the 27 parameters).
L323 - Briefly explaining what the standard scaler is would help readers.
L401 - Is Figure 3a, b the same as Figure 2, but displaying all 7 approaches in the same panel?
Figure 3 - Is it possible to include additional details on model performance in panels c and d? It is challenging to distinguish model performance. A bar or pie chart summarizing the percentage of catchments where each model performed better could enhance clarity and complement the existing text.
L420 – Suggestion: while Figures 4 and 5 are interesting, they contribute less to the main text. Consider moving them to the Supplementary Material (suggestion only).
Figure 6 – I believe using distinct background colors (e.g., grayscale) instead of a line for the different periods (training, validation, test) would improve visualization and readability. Also, is it possible to include the daily streamflow KGE for all periods (HYDROTEL and LSTM-Combined)? It would help readers to assess and compare performances.
L453 - For plot d this is not valid. I suggest the authors include some metrics such as NRMSE here to support what they are claiming. Also, try to avoid hyperbolic language such as "much more similar" or "much better".
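The minor comments above refer to three quantities that are simple to state in code: the Kling-Gupta efficiency (KGE), a normalized RMSE (NRMSE), and standard scaling. The sketch below is illustrative only. The KGE follows the common formulation 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), with alpha the ratio of standard deviations and beta the ratio of means. The NRMSE is normalized by the mean of the observations, which is an assumption since other normalizations (range, standard deviation) are also in use. The function names and the flow values are hypothetical and are not taken from the manuscript or its code.

```python
import numpy as np

def kge(obs, sim):
    """Kling-Gupta efficiency; 1 is a perfect match."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]      # linear correlation
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs.mean()       # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def nrmse(obs, sim):
    """RMSE divided by the mean observation (assumed normalization)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return np.sqrt(np.mean((sim - obs) ** 2)) / obs.mean()

def standard_scale(train, other):
    """Standard scaling: subtract the training mean and divide by the training
    standard deviation, reusing the training statistics for other data."""
    train, other = np.asarray(train, dtype=float), np.asarray(other, dtype=float)
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    return (train - mu) / sigma, (other - mu) / sigma

# Hypothetical daily flows (m3/s) for illustration.
observed = np.array([12.0, 15.0, 40.0, 95.0, 60.0, 30.0, 18.0])
simulated = np.array([10.0, 16.0, 35.0, 80.0, 66.0, 33.0, 20.0])
print(f"KGE = {kge(observed, simulated):.3f}, NRMSE = {nrmse(observed, simulated):.3f}")
```

Computing the scaling statistics on the training period only, as in `standard_scale`, avoids leaking information from the validation and test periods into the model inputs.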
REFERENCES
Papalexiou, S. M. (2022). Rainfall Generation Revisited: Introducing CoSMoS-2s and Advancing Copula-Based Intermittent Time Series Modeling. Water Resources Research, 58, 1-33.
Citation: https://doi.org/10.5194/egusphere-2024-2134-RC2
Data sets
HYSETS - A 14425 watershed Hydrometeorological Sandbox over North America R. Arsenault, F. Brissette, J. L. Martel, M. Troin, G. Lévesque, J. Davidson-Chaput, M. Castañeda Gonzalez, A. Ameli, and A. Poulin https://doi.org/10.17605/OSF.IO/RPC3W
Model code and software
LSTM for FFA - codes and data R. Arsenault, J.-L. Martel, and F. Brissette https://osf.io/zwtnq/