This work is distributed under the Creative Commons Attribution 4.0 License.
Assessing the adequacy of traditional hydrological models for climate change impact studies: A case for long-short-term memory (LSTM) neural networks
Abstract. Climate change impact studies are essential for understanding the effects on water resources under changing climate conditions. This paper assesses the effectiveness of Long Short-Term Memory (LSTM) neural networks versus traditional hydrological models for these studies. Traditional hydrological models, which rely on historical climate data and simplified process parameterization, are scrutinized for their capability to accurately predict future hydrological streamflow in scenarios of significant warming. In contrast, LSTM models, known for their ability to learn from extensive sequences of data and capture temporal dependencies, present a viable alternative. This study utilizes a domain of 148 catchments to compare four traditional hydrological models, each calibrated on individual catchments, against two LSTM models. The first LSTM model is trained regionally across the study domain of 148 catchments, while the second incorporates an additional 1,000 catchments at the continental scale, many of which are in climate zones indicative of the future climate within the study domain. The climate sensitivity of all six hydrological models is evaluated using four straightforward climate scenarios (+3 °C, +6 °C, -20 %, and +20 % mean annual precipitation), as well as using an ensemble of 22 CMIP6 GCMs under the SSP5-8.5 scenario. Results indicate that LSTM-based models exhibit a different climate sensitivity compared to traditional hydrological models. Furthermore, analyses of precipitation elasticity to streamflow and multiple streamflow simulations on analogue catchments suggest that the continental LSTM model is most suited for climate change impact studies, a conclusion that is also supported by theoretical arguments.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-2133', Anonymous Referee #1, 26 Aug 2024
The study is sound and the methodology is robust. The experimental design is clever, the dataset curation (especially the train/validation data setup) is well suited to answering the research questions, and the rationale for choosing the climate change scenarios and climate models is also reasonable. Language, presentation quality, and significance are good, and the results and conclusions are sound and relevant. I recommend publication, though only after minor revisions.
1) You can merely infer "between the lines" in the method section that, for the hydrological model runs evaluating climate change, the hydro-models are also trained on the hindcasts from the same GCMs that are used for the projections (as it should be done, so the model is trained on the same population that it is tested on). If it was done like this, then maybe just add a small paragraph or sentence, e.g. to 2.4.2.3, to make this crystal clear. If I am mistaken and it was not done like this, we have a bigger problem and you need to redo the results and train on the hindcasts.
2) The model seems overly complicated, given the fact that up to recently, the state of the art model only comprised one single LSTM layer (https://doi.org/10.5194/hess-25-2685-2021). It would be useful to benchmark (i.e. plot/compare) against the above-mentioned studies’ performance to see if it actually gives an advantage in performance. The structure obviously doesn’t hurt performance, as can be seen from the high scores shown in the manuscript, but does it help? As a comment without need for action: probably 90% of your network is inactive, but the remaining 10% is why it still works well.
3) Chapter 4.4.1: results and literature stand next to each other disconnectedly; no true conclusion is drawn in the chapter. Please add one.
4) Results and discussion are formally separated from each other, but then there is another bunch of analysis and four (!) figures in the discussion section, which renders the separation of results and discussion irrational. Either clearly separate results from discussion, or – much preferred, because much easier to follow/ understand in general – make a single “results and discussion” section and discuss noteworthy point right next to the figures.
5) the conclusion is a stump and contains mostly commonplace statements. Revise to revolve around actual key conclusions from your results.
6) typos:
- Line 576: “mpdels” → “models”
- Line 671: “streamflow electricity” → “streamflow elasticity”
Citation: https://doi.org/10.5194/egusphere-2024-2133-RC1
AC1: 'Reply on RC1', Jean-Luc Martel, 30 Oct 2024
RC2: 'Comment on egusphere-2024-2133', Anonymous Referee #2, 03 Sep 2024
Paper: Assessing the adequacy of traditional hydrological models for climate change impact studies: A case for long-short-term memory (LSTM) neural networks
The study aimed to compare process-based models (PB) and machine learning models (ML) when used for scenario analysis. Specifically, they assessed three common PBs and two Long-Short Term Memory (LSTM) configurations. The researchers focused on examining the sensitivity of streamflow to different forcing perturbations. Their findings indicated that LSTM trained at a continental scale is a more reliable model due to its training on a wider range of variability. However, the study suggested that in most cases, the sensitivity between PB and ML is similar, with a few exceptions.
Major comments:
- The use of latitude, longitude, and static climatic attributes in the LSTM models restricts the model to learning local information rather than capturing the spatial variability in the dataset. This could affect the model's ability to accurately analyze sensitivity under different climatic conditions. I suggest running the model without latitude and longitude and using the minimum number of climatic attributes, while changing them according to the sensitivity being analyzed (e.g., a 20 % increase in precipitation means a 20 % increase in the mean annual precipitation attribute).
- There is an excessive number of figures and sections presented. Some results are repetitive, and certain sections may not add significant information due to high uncertainty (ex. 4.4.2). I recommend including only the figures and sections that directly support the main conclusion of the report while considering moving additional figures and analyses to the appendix or supplemental information.
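A minimal sketch of the consistent perturbation RC2's first major comment asks for: when the daily forcing is perturbed, the static climatic descriptor fed to the LSTM is updated by the same factor. The attribute key `mean_annual_pcp` is hypothetical; the actual descriptor names in the manuscript may differ.

```python
import numpy as np

def perturb_forcing(daily_pcp, static_attrs, pcp_factor=1.20, tmp_delta=0.0,
                    daily_tmp=None):
    """Delta-change perturbation that keeps static climatic attributes
    consistent with the perturbed forcing.

    daily_pcp    : 1-D array of daily precipitation (mm)
    static_attrs : dict with a (hypothetical) 'mean_annual_pcp' entry (mm/yr)
    """
    pcp_new = daily_pcp * pcp_factor
    attrs_new = dict(static_attrs)
    # A +20 % precipitation scenario should also raise the mean-annual-
    # precipitation descriptor by 20 %, as the referee suggests.
    attrs_new["mean_annual_pcp"] = static_attrs["mean_annual_pcp"] * pcp_factor
    tmp_new = None if daily_tmp is None else daily_tmp + tmp_delta
    return pcp_new, tmp_new, attrs_new
```

The point of the sketch is only that forcing and attributes must agree: otherwise the model sees a wetter climate paired with a descriptor that still claims the historical climatology.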
Minor comments:
Line 14. The comment about traditional hydrological models relying on historical climate data is misleading. ML models rely on historical data too.
Line 85. The definition of what is long and short is subjective. In fact, how long the memory in an LSTM model remains an open question.
Line 109. The problem is not the new climatic scenario, the problem is how to define when LSTM is in extrapolation mode. Because we can have an extreme condition in one catchment but the same would be normal in another, so the model can infer the relationship. In that case, the model is still in interpolation mode.
Line 118. You should have an introduction between sections and subsections. A title such as "Dataset" or "Data" would probably match what you have better.
Line 132. More data is better, but how did you define this number?
Line 161 – 166. You do not need that paragraph. You can use a reference to explain this in more detail.
Figure 1. It needs a legend.
Line 193. This attribute shouldn’t be used because you are anchoring the dynamic to a location which is exactly what you are trying to avoid. That can have serious effects on the sensitivity of your model.
Line 195. How are you disentangling the correlation between catchment attributes and the meteorological forcing?
Line 317. How did you apply that modification? only testing or in the entire period?
Line 404. This is not the reason for not presenting the validation period. All your results should be in a period that was never used during the training and validation.
Figure 3. I recommend using CDF plots. This format has been widely used in streamflow models. I suggest adding a line where the best value is found.
Line 453 – 454. This could be a consequence of using latitude and longitude as input. The LSTM model is fixed to the location so it is less sensitive to precipitation because part of the precipitation correlation is shared with those attributes or with the climatic ones.
Line 461 – 462. Something similar could be happening here with elevation. Temperature and elevation are very correlated. An interesting experiment would be increasing temperature and decreasing elevation by using an altitudinal gradient. Should the sensitivity increase or decrease?
Figure 4. I would decrease the y-axis range. Probably [-60,20] for a and b. [-50,50] for c and d.
Figures 5 and 6. Move it to the appendix or supplement information.
Line 479 – 480. Explain why it is clear to you.
Figure 7. In many cases the differences look not significant, you should do hypothesis testing to check the level of significance. Moreover, you should mention something about the differences in the variability between some models. All the sub-figures must have the same y-axis range to do a fair comparison. I do not think you need all the sub-figures; you should show here only the most significant.
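A sketch of the kind of paired, per-catchment significance test the referee suggests for Figure 7, here an exact two-sided sign test implemented from scratch so no distributional assumptions are needed (the referee does not prescribe a specific test; this is one simple option):

```python
import math

def sign_test_p(x, y):
    """Exact two-sided sign test for paired samples (ties dropped).

    x, y : paired per-catchment scores for two models.
    Returns the p-value under H0: P(x_i > y_i) = 0.5.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    tail = min(k, n - k)
    # exact binomial tail probability, doubled for a two-sided test
    p = 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(p, 1.0)
```

Applied to the 148 paired scores behind each pair of boxplots, this gives a defensible statement of whether the visual differences are significant.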
Line 537. A null result is still a result, but in that case you could move the entire section to the supplemental information.
Line 558. But exactly for this reason you have the third period. Are you trying to say that this period is not representative enough? If this is the case, you should check that.
Line 559 – 561. This is not part of the discussion; this is just part of the methodology.
Line 563 – 564. This is a strong statement without support. Remember that the number of parameters is not comparable between PB and ML models.
Line 568 – 569. This sentence is exactly the opposite you said in line 558. You should put everything in one paragraph to tell a more consistent story.
Line 571 – 572. 1000 sounds like a large number of catchments so I disagree.
Line 575 – 576. Are you talking about distributed models? If this is the case, this would not be a fair comparison. If you want to add more degrees of freedom to PB, for example, you could combine the different sources of precipitation. Moreover, remember that the parameters in a PB encode the local descriptors (local characteristics) that the model needs.
Line 580. I disagree. More variables increase performance but decrease interpretability. How could you do the same sensitivity analysis with 10 hydroclimatic variables highly correlated?
Line 590 – 591. You do not need to be sorry for finding that those models are the worst, this is just part of the results. Delete the sentence.
Line 605 – 605. I disagree. The situation is exactly the opposite. You are not accounting for all the sensitivities; a temperature change can change the precipitation too, meaning that the final sensitivity of streamflow can be higher or lower. So your analysis is a simplified sensitivity analysis, which does not mean you are more accurate.
Line 622. What family is that?
Line 625. This is a counter-argument about sensitivity coming from structure.
Section 4.4. You focus just on which is the best. You must analyze the benefit of the ensemble of models (multi-representation approach). The concept of the best model does not exist.
Line 653. That is not true. There is a lot of research on interpretability. We do not have yet the same level of interpretability as PB, but this does not mean we are not going to get it in the future.
Figure 9. I would prefer a table or a CDF figure showing the distribution. It is very hard to compare models within each group.
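The CDF comparison the referee prefers (here and for Figure 3) can be built from a small helper that returns one curve per model; the `(x, F)` pairs are ready to draw as a step line, with the plotting call itself omitted:

```python
import numpy as np

def ecdf(values):
    """Empirical CDF of a metric (e.g., per-catchment NSE or KGE scores).

    Returns (x, F): sorted values and cumulative fractions, ready for
    a step plot such as plt.step(x, F, where="post").
    """
    x = np.sort(np.asarray(values, dtype=float))
    f = np.arange(1, x.size + 1) / x.size
    return x, f
```

One ECDF per model on shared axes makes the full score distributions comparable at a glance, which is exactly what the referee finds hard to do with the grouped format of Figure 9.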
Line 701 – 703. I agree about the catchment attributes used; however, you considered only input similarity. This is not enough to define dynamic similarity; at the very least you should consider adding similarity in the streamflow signatures.
Line 705. If this is the case, you should use a uniform weight distribution. Add information supporting your decision to use something different than uniform.
Figure 10. Given the huge difference between the analogues and the models, it is impossible to say that one model is better than the other. Moreover, if I suppose that the catchments presented are the best ones, it is impossible to infer more from this type of comparison.
Line 759 – 764. Exactly for this reason I would drop the entire section. The results from this section are not different than before but with a huge uncertainty.
Citation: https://doi.org/10.5194/egusphere-2024-2133-RC2
AC3: 'Reply on RC2', Jean-Luc Martel, 30 Oct 2024
RC3: 'Comment on egusphere-2024-2133', Anonymous Referee #3, 07 Sep 2024
The paper evaluates LSTM-based hydrological models and traditional hydrological models in Climate Change impact assessments. The models are assessed regarding their ability to simulate future streamflow and against their sensitivity to climatic changes. The authors conclude that LSTMs, given they are trained with sufficient data, are a viable and most likely a better alternative for climate change impact assessments.
In my opinion the title does not fully express what you actually did in the study. You assessed the adequacy of both traditional models and LSTMs. Not sure, you might want to highlight that.
The authors address an important topic and use a suitable dataset and methods for the evaluation. However, the main problem I have with the paper is the structure:
Nowhere in the paper do the authors refer back to the three objectives outlined at the end of the introduction. I would expect that you explain how you are going to achieve the objectives in the methods, show the respective results and discuss those, and ideally come back to the objectives in the conclusion. The paper seems unstructured in that regard and it is hard to understand which of the subchapters contributes to which objective, thus disconnecting the analysis from the objectives.
Another structural problem is already evident in the abstract: Your last sentence of the abstract highlights the analysis of precipitation elasticity and catchment analogues. I like these two analyses, but I find it strange that these are shown (including figures) in the discussion only. I do not see a reason why these analyses should not be structured into methods, results, discussion.
Additional comments:
l.130-134ff: It seems unclear at this point if the current study is also limited to > 500 km² and 30 yr of data. Suggest to add a brief explanation, particularly as you mention later (l.159) that you restrict to 20 yr of streamflow data.
l.136: I think the term 'scenario' is somewhat misleading in this context. I suggest to write something along the lines "An extra set of 1000 donor catchments was selected for an additional LSTM application."
l.149: against the background of the previous sentence, it is unclear what you mean by 'common denominator for all models'
l.153-154: suggest to check Tarek et al. if that is generalizable for any catchment / region.
l.195: I assume the climatic descriptors are kept constant under climate change? Do you think considering a possible spatial shift of these climatic conditions under climate change could further improve the models? Perhaps a point worth discussing.
l.255: µ is missing in explanation
l.289-290. I do not understand this part. Why is data combined from multiple catchments for the computation of the objective function? And why was the NSE used for the LSTM's objective function and not the KGE as for the classical hydrological models?
l.305: Suggest to introduce LSTM-R and LSTM-C earlier in the manuscript and mention the respective 148 and 1000 catchments.
l.322-327: I assume when implementing one test, all other variables were held constant (so no combination of TMP and PCP changes)? I suggest to mention that.
l.384-385: why no low flow metric, such as the annual minimum streamflow? Also, I suggest to add the time periods for which these metrics are calculated (I assume the hindcast and the future climate?)
l.395-397: You mention NSE and KGE at three locations now. I am confused what data and models are used for which metric. And if different metrics were used for different models, I see a significant bias here given that you compare other metrics (see 2 comments further below). I suggest to mention the KGE and NSE metrics only once in the manuscript to avoid confusion. Also, in l.403 you mention NRMSE which was not mentioned in the methods.
l.410: You did not introduce the optimum values for each metric. I suggest to add this information to where the metrics are introduced first, or/and the optimum could be added as a line to the diagrams.
l.431ff: Could the difference you see in the variability ratio between conventional and LSTM models be due to the different objective functions you used (KGE vs NSE)? The tradeoff between the different performance criteria is interesting. For further discussion of the relationship between the performance criteria, you can look into Guse et al. 2020 (https://doi.org/10.1080/02626667.2020.1734204)
l.442-443: suggest to summarize the main results of the other metrics here. I would assume the QMM is of interest to many readers.
l.465: I do not understand why a more pronounced response to temperature changes implies providing more accurate projections. Without comparing this to observations, I think this is not defendable. Please explain.
l.500-501: I am not sure about this statement. The classical hydrological models project an increase in streamflow. That would mean precipitation is overruling temperature.
l.509: what do you mean by "near the surface"?
l.532: Why did you evaluate this section only for QMA?
l.561: I do not understand what you mean with "to prevent contamination"
l.576-577: Why do you write here that the LSTMs only use the climatic data? I suggest to add "..rain, and snow besides the catchment descriptors represent a fraction...".
l.582-585: There are many hydrological models around that can make use of additional physical and temporal data. While I understand the advantage of complex LSTMs that can ingest all this data, I think you should not overstress this 'advantage' merely because you chose traditional lumped hydrological models that cannot use additional data.
l.625-629: Isn't it also reassuring that the different classical hydrological models performed similar? It could also mean that the projections can be considered robust.
l.634-635: Well, this is clearly beyond scope for your study, but you could at least discuss that there are studies that used historical streamflow change to evaluate models (see for instance Eyring et al. 2019, https://doi.org/10.1038/s41558-018-0355-y; or Kiesel et al. 2020, https://doi.org/10.1007/s10584-020-02854-8, who both propose out-of-sample evaluations). And also, Krysanova et al. (2018, already cited in your manuscript) provide a 5-step validation procedure that allows an assessment of how well models are suitable for climate change impact assessments. This might also be worth discussing.
l.646-650: I do not completely agree with this statement based on your study, considering that other traditional models exist that utilize additional data sources as well. I suggest to add to l.648: ... a theoretical advantage over the four traditional hydrological models used in this study.
l.660: Why is this a disadvantage of conceptual hydrological models? You also fixed the LSTM parameters after training?
l.685-694: In l.690, could you mention percentages instead of "a few" or "most"? Also, you could simply calculate the elasticity ratios yourself based on your historical data.
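The elasticity ratios the referee mentions can indeed be computed directly from the historical record, e.g. with the nonparametric median estimator of Sankarasubramanian et al. (2001); a sketch assuming annual streamflow and precipitation series are available per catchment:

```python
import numpy as np

def precip_elasticity(annual_q, annual_p):
    """Nonparametric precipitation elasticity of streamflow
    (median-based estimator of Sankarasubramanian et al., 2001).

    annual_q : annual mean streamflow, one value per year
    annual_p : annual precipitation, one value per year
    Note: years where annual_p equals its mean produce an undefined
    ratio and would need to be masked in practice.
    """
    q = np.asarray(annual_q, dtype=float)
    p = np.asarray(annual_p, dtype=float)
    qm, pm = q.mean(), p.mean()
    # per-year relative change in Q divided by relative change in P
    ratio = (q - qm) / (p - pm) * (pm / qm)
    return float(np.median(ratio))
```

A value of 1 means streamflow changes proportionally with precipitation; values above 1 indicate amplification, which is the quantity the precipitation-elasticity discussion in the manuscript turns on.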
l.773-777: I suggest to also mention that you did not include more physically-based hydrological models that can utilize additional data which could react differently to the future climate forcings.
Minor comments:
l.31 consider "studies evaluate" to avoid repeating "assess"
l.182 comprehnsive -> comprehensive
l.280 ...performance gains in other... ?
l.370 you express pcp change in %, therefore *100 should be added to the calculation example.
l.511: wet vs dry models
l.576: mpdels -> models
l.671: electricity -> elasticity
l.720: unit should be m3 s-1 km-2 ?
l.730 and supplementary material Figures: suggest to add or mention the RMSE unit of normalized streamflow.
Citation: https://doi.org/10.5194/egusphere-2024-2133-RC3
AC2: 'Reply on RC3', Jean-Luc Martel, 30 Oct 2024
Data sets
HYSETS - A 14425 watershed Hydrometeorological Sandbox over North America R. Arsenault, F. Brissette, J. L. Martel, M. Troin, G. Lévesque, J. Davidson-Chaput, M. Castañeda Gonzalez, A. Ameli, and A. Poulin https://doi.org/10.17605/OSF.IO/RPC3W
Model code and software
LSTM climate change paper codes and data R. Arsenault, J.-L. Martel, and F. Brissette https://osf.io/5yw4u/