Exploring the ability of LSTM-based hydrological models to simulate streamflow time series for flood frequency analysis

Martel, Jean-Luc; Arsenault, Richard; Turcotte, Richard; Castañeda-Gonzalez, Mariana; Brissette, François; Armstrong, William; Mailhot, Edouard; Pelletier-Dumont, Jasmine; Lachance-Cloutier, Simon; Rondeau-Genesse, Gabriel; Caron, Louis-Philippe

doi:https://doi.org/10.5194/egusphere-2024-2134

Preprints

20 Aug 2024

Abstract. An increasing number of studies have shown the prowess of Long Short-Term Memory (LSTM) networks for hydrological modelling and forecasting. One commonly cited drawback of these methods, however, is the requirement for large amounts of training data to properly reproduce streamflow events. For maximum annual streamflow events, this can be problematic since they are by definition less common than mid- or low-flows, leading to under-representation in the model’s training set and, ultimately, parameterization. This study investigates six methods to improve the peak streamflow simulation skill of LSTM models used to extend streamflow observation time series for flood frequency analysis (FFA). Methods include adding meteorological data variables, providing streamflow simulations from a distributed hydrological model, oversampling peak streamflow events, adding multihead attention mechanisms, adding data from a large set of “donor” catchments, and combining some of these elements in a single model. Furthermore, results are compared to those obtained by the distributed hydrological model HYDROTEL. The study is performed on 88 catchments in the province of Quebec using a leave-one-out cross-validation implementation, and an FFA is applied using observations as well as model simulations. Results show that LSTM-based models are able to simulate peak streamflow as well as (for a simple LSTM model implementation) or better than (with hybrid LSTM-hydrological model implementations) the distributed hydrological model. Multiple pathways forward to further improve the LSTM-based model’s ability to predict peak streamflow are provided and discussed.

Received: 09 Jul 2024 – Discussion started: 20 Aug 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

06 Oct 2025

Exploring the ability of LSTM-based hydrological models to simulate streamflow time series for flood frequency analysis

Hydrol. Earth Syst. Sci., 29, 4951–4968, https://doi.org/10.5194/hess-29-4951-2025, 2025

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2024-2134', Emilio Graciliano Ferreira Mercuri, 21 Sep 2024

The manuscript entitled "Exploring the ability of LSTM-based hydrological models to simulate streamflow time series for flood frequency analysis" presents an interesting comparison between a distributed hydrological model (HYDROTEL) and Long Short-Term Memory (LSTM) deep learning models. Below are some points regarding its methodology, results, and potential areas for improvement:
1. LSTM is one class of machine learning algorithms. Other classes, such as Convolutional Neural Networks (CNNs), Random Forests, or Gradient Boosted Trees, are also being used with good results. This should be considered in the literature review and/or as a future development.
2. One of the key methods tested, oversampling of extreme peak streamflow events, performed poorly. This suggests a more nuanced approach to data augmentation might be required. Future work could explore advanced synthetic data generation techniques like the Synthetic Minority Over-sampling Technique (SMOTE) rather than simply replicating extreme events. One example is the paper: Wu, Yirui, Yukai Ding, and Jun Feng. "SMOTE-Boost-based sparse Bayesian model for flood prediction." EURASIP Journal on Wireless Communications and Networking 2020 (2020): 1-12.
3. The multihead attention mechanism did not significantly improve the LSTM model’s performance. This raises questions about whether it was fully optimized or whether a different attention configuration could be more effective. The complexity added by the attention mechanism might not have been justified, given the size of the dataset. I know that the code was shared, but adding a diagram and/or a more complete description of the attention mechanism would help future research in the area.
4. One of the paper's recurring challenges is the inherent scarcity of extreme flood events, which makes it difficult for LSTMs to train effectively. Although the study attempts to mitigate this issue, it highlights that LSTMs struggle with rare event prediction without sufficient data. The paper could benefit from exploring more advanced techniques for handling imbalanced datasets, such as ensemble methods or using generative models to simulate extreme events.
5. Given the results across different test periods, there seems to be a risk of overfitting, particularly in models like LSTM-Combined. The paper could benefit from a more thorough discussion and results presentation on the loss function variation during training and testing epochs.
6. The authors could explain why floods occur in Quebec, Canada. Is their frequency increasing over the years? Are soil or land use among the reasons? Is it related to climate change?
Overall, the paper provides valuable insights into the utility of LSTMs for hydrological modeling, especially in terms of hybrid model approaches.
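
To make the distinction raised in point 2 concrete, the following is a minimal sketch contrasting plain replication of peak samples with a SMOTE-style interpolation between them. It is not the authors' implementation and not the method of Wu et al.: the function names, the threshold-based definition of a "peak", the 2-D feature layout (samples by features) and the interpolation of the regression target are all assumptions made for illustration. Real SMOTE also draws partners from the k nearest neighbours, whereas this simplified version pairs peak samples at random.

```python
import numpy as np


def replicate_peaks(X, y, threshold, n_copies=2):
    """Naive oversampling: append n_copies exact copies of samples with y >= threshold."""
    mask = y >= threshold
    X_rep = np.repeat(X[mask], n_copies, axis=0)
    y_rep = np.repeat(y[mask], n_copies)
    return np.concatenate([X, X_rep]), np.concatenate([y, y_rep])


def interpolate_peaks(X, y, threshold, n_new, seed=0):
    """SMOTE-style oversampling (simplified): create n_new synthetic samples on the
    line segments joining randomly chosen pairs of peak samples.

    Assumption: the target y is interpolated with the same weight as the features,
    which adapts the classification-oriented SMOTE idea to a regression target.
    """
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(y >= threshold)
    if idx.size < 2:
        raise ValueError("need at least two peak samples to interpolate")
    a = rng.choice(idx, size=n_new)   # first endpoint of each segment
    b = rng.choice(idx, size=n_new)   # second endpoint (random, not nearest neighbour)
    lam = rng.random(n_new)           # position along the segment, in [0, 1)
    X_new = X[a] + lam[:, None] * (X[b] - X[a])
    y_new = y[a] + lam * (y[b] - y[a])
    return np.concatenate([X, X_new]), np.concatenate([y, y_new])
```

Replication only re-weights events the model has already seen, while interpolation produces new feature and target combinations that lie between observed peaks; neither creates events larger than the largest observed one.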

Citation: https://doi.org/10.5194/egusphere-2024-2134-RC1
- AC1: 'Reply on RC1', Jean-Luc Martel, 07 Nov 2024
  
  Please see the attached PDF for our detailed response.
  
  Citation: https://doi.org/10.5194/egusphere-2024-2134-AC1
RC2: 'Comment on egusphere-2024-2134', Andre Ballarin, 26 Dec 2024
OVERVIEW
This paper evaluates the performance of different LSTM-based frameworks for simulating streamflow time series, focusing on their ability to characterize extreme events and, consequently, enhance flood frequency analysis (FFA). To achieve this, the authors applied a set of 7 different LSTM configurations to simulate streamflow from 88 catchments in Quebec, Canada. These configurations included 1 baseline model (LSTM-base) and 6 alternative schemes, which incorporated observed meteorological inputs, a multihead attention structure, and/or hydrological model-based simulations as inputs in addition to the original ERA5-based data, among other aspects.
From my perspective, this work is relevant to the hydrology field and is suitable for publication in HESS.
The use of ML methods for hydrological simulations is rapidly gaining traction in the hydrological community; hence, improving our understanding of their benefits and drawbacks is paramount. As mentioned by the authors, there is still no consensus on how well LSTM-based simulations represent extreme flood events and how this can influence FFA.
In my opinion, the methods are sound, and the results are well presented. Overall, it was a pleasant read. I have, however, some questions and suggestions listed below, which I believe will help the authors improve the overall quality of the paper. Once these are addressed, I believe the paper will be a good addition to HESS.
General Comments:
The authors state in the Introduction and Discussion sections that one of the primary goals of the paper is to assess the “potential of LSTM to extend streamflow records.” However, this aspect is not clearly demonstrated in the manuscript. I did not see any analysis or experiment specifically designed to evaluate this claim. So my question is: how are the authors addressing this in their work? Extending streamflow records is indeed a promising application of LSTM techniques with potential benefits for FFA. For example, to address this gap, the authors could consider an experiment using catchments with longer datasets (e.g., 40 years of data), training the LSTM on subsets (e.g., 20–30%) and assessing how effectively it extends the records and how this lengthening improves FFA. Alternatively, they could revise the manuscript to remove this objective and avoid any misunderstanding.

Some of their methodological choices require further clarification. For example, it is not clear why they opted to use the Gumbel and GEV distributions for different stations, why the Cunnane plotting position was chosen, and which parameter estimation method was used. I suggest the authors revisit the manuscript to better detail these aspects and improve its reproducibility.
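
For readers unfamiliar with the terms in this comment, the sketch below shows the Cunnane plotting position, p_i = (i - 0.4) / (n + 0.2) for the i-th smallest of n annual maxima, together with a Gumbel fit and return-period quantiles. The manuscript's parameter estimation method is exactly what the comment asks about, so the method of moments used here is an assumption chosen for brevity, and the function names are hypothetical.

```python
import numpy as np


def cunnane_positions(annual_maxima):
    """Return the sorted sample and its Cunnane non-exceedance probabilities."""
    x = np.sort(np.asarray(annual_maxima, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)            # rank 1 = smallest annual maximum
    return x, (ranks - 0.4) / (n + 0.2)


def fit_gumbel_moments(annual_maxima):
    """Gumbel location and scale by the method of moments (assumed estimator)."""
    x = np.asarray(annual_maxima, dtype=float)
    scale = np.std(x, ddof=1) * np.sqrt(6.0) / np.pi
    loc = np.mean(x) - np.euler_gamma * scale
    return loc, scale


def gumbel_quantile(loc, scale, return_period):
    """Streamflow with non-exceedance probability 1 - 1/T under a Gumbel distribution."""
    p = 1.0 - 1.0 / np.asarray(return_period, dtype=float)
    return loc - scale * np.log(-np.log(p))
```

Plotting the sorted maxima against the return periods 1 / (1 - p_i) and overlaying `gumbel_quantile` gives the usual frequency plot used to compare observed and simulated peaks.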

Although the manuscript provides valuable insights into the performance of LSTMs in characterizing extreme events, I missed a more in-depth analysis and discussion of the different LSTM configurations and how they perform in FFA. For instance, I believe the manuscript would benefit from an expansion of the results section, including the FFA-based assessment not only for 4 catchments but for all evaluated catchments, and discussing their spatial distribution, general performance, differences between LSTM and HYDROTEL FFA for catchments with different data availability, and uncertainties, which were not included in the original manuscript.

Regarding the multihead and oversampling approaches, were the lower performances somewhat expected? Given the inherent scarcity of extreme flood data, exploring alternative data lengthening approaches—such as using synthetic series (e.g., Papalexiou 2022)—could enrich the discussion and provide directions for future research.

Minor Comments
L78 – (Shen and Lawson, 2021) – Review the reference format.
Figure 1 - Is it possible to improve this figure by adding information in subpanels, such as a histogram of the available years of observed streamflow (in addition to its spatial distribution) and climatic variables for each catchment (such as P, PET, ...)?
L170 – PETas – Typo
L185 - I believe the text would benefit here from 1 or 2 short sentences explaining the two different configurations of HYDROTEL (2.3.1 and 2.3.2). It is not clear here whether the authors will use both configurations as different inputs for the LSTM or whether the regional model was used only as an initial step (for example, by recalibrating only 11 out of the 27 parameters).

L323 - Briefly explaining what the standard scaler is will help readers.
L401 - Is Figure 3a, b the same as Figure 2 but displaying all 7 approaches in the same panel?
Figure 3 - Is it possible to include additional details on model performance in panels c and d? It is challenging to distinguish model performance. A bar or pie chart summarizing the percentage of catchments where each model performed better could enhance clarity and complement the existing text.
L420 – Suggestion: while Figures 4 and 5 are interesting, they contribute less to the main text. Consider moving them to the Supplementary Material (suggestion only).
Figure 6 – I believe using distinct background colors (e.g., grayscale) instead of a line for the different periods (training, validation, test) would improve visualization and readability. Also, is it possible to include the daily streamflow KGE for all periods (HYDROTEL and LSTM-Combined)? It would help readers to assess and compare performances.
L453 - For plot d this is not valid. I suggest the authors include some metrics such as NRMSE here to support what they are claiming. Also try to avoid hyperbolic language, such as "much more similar", "much better", etc.
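
The standard scaler, KGE and NRMSE mentioned in the comments above can be sketched as follows. This is an illustration under assumptions, not the manuscript's code: the KGE shown is the original Gupta et al. (2009) formulation, the NRMSE is normalized by the observed mean (other normalizations, such as the observed range, are also in use), and the function names are hypothetical.

```python
import numpy as np


def standard_scale(x):
    """Standard scaler: subtract the mean and divide by the standard deviation.

    Returns the scaled series plus the mean and std needed to invert the scaling.
    """
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    return (x - mean) / std, mean, std


def kge(obs, sim):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]       # linear correlation
    alpha = sim.std() / obs.std()         # variability ratio
    beta = sim.mean() / obs.mean()        # bias ratio
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def nrmse(obs, sim):
    """Root-mean-square error divided by the observed mean (assumed normalization)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return np.sqrt(np.mean((sim - obs) ** 2)) / obs.mean()
```

A KGE of 1 and an NRMSE of 0 both indicate a perfect match; reporting them per period (training, validation, test) would let readers compare HYDROTEL and LSTM-Combined on equal terms.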
REFERENCES
Papalexiou, S. M.: Rainfall Generation Revisited: Introducing CoSMoS-2s and Advancing Copula-Based Intermittent Time Series Modeling, Water Resources Research, 58, 1-33, 2022.
Citation: https://doi.org/10.5194/egusphere-2024-2134-RC2
- AC2: 'Reply on RC2', Jean-Luc Martel, 24 Jan 2025
  
  Please see the attached PDF for our detailed response.
  
  Citation: https://doi.org/10.5194/egusphere-2024-2134-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (27 Feb 2025) by Zhongbo Yu

AR by Jean-Luc Martel on behalf of the Authors (11 Apr 2025)

ED: Referee Nomination & Report Request started (22 Apr 2025) by Zhongbo Yu

RR by Andre Ballarin (23 Apr 2025)

ED: Publish subject to minor revisions (review by editor) (17 Jun 2025) by Zhongbo Yu

AR by Jean-Luc Martel on behalf of the Authors (27 Jun 2025)

ED: Publish as is (30 Jun 2025) by Zhongbo Yu

AR by Jean-Luc Martel on behalf of the Authors (30 Jun 2025)

Data sets

HYSETS - A 14425 watershed Hydrometeorological Sandbox over North America R. Arsenault, F. Brissette, J. L. Martel, M. Troin, G. Lévesque, J. Davidson-Chaput, M. Castañeda Gonzalez, A. Ameli, and A. Poulin https://doi.org/10.17605/OSF.IO/RPC3W

Model code and software

LSTM for FFA - codes and data R. Arsenault, J.-L. Martel, and F. Brissette https://osf.io/zwtnq/

Viewed

Total article views: 1,556 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,129	348	79	1,556	34	51

Views and downloads (calculated since 20 Aug 2024)

Month	HTML	PDF	XML	Total
Aug 2024	116	52	4	172
Sep 2024	59	25	50	134
Oct 2024	33	11	0	44
Nov 2024	53	31	3	87
Dec 2024	35	23	1	59
Jan 2025	45	24	2	71
Feb 2025	20	13	0	33
Mar 2025	41	17	8	66
Apr 2025	29	18	3	50
May 2025	36	18	1	55
Jun 2025	38	17	2	57
Jul 2025	50	15	1	66
Aug 2025	115	24	3	142
Sep 2025	442	51	1	494
Oct 2025	17	9	0	26

Latest update: 06 Oct 2025

Short summary

This study explores six methods to improve the ability of Long Short-Term Memory (LSTM) neural networks to predict peak streamflows, crucial for flood analysis. By enhancing data inputs and model techniques, the research shows LSTM models can match or surpass traditional hydrological models in simulating peak flows. Tested on 88 catchments in Quebec, Canada, these methods offer promising strategies for better flood prediction.

Cited

1 citation as recorded by Crossref.

