the Creative Commons Attribution 4.0 License.
Assessing the stability of LSTM runoff projections in Switzerland under climate scenarios
Abstract. Climate change is intensifying the global water cycle, altering both mean runoff and extremes, and strengthening the need for reliable hydrological projections to support adaptation. Traditionally, such projections have relied on process-based models. More recently, machine learning models, and in particular Long Short-Term Memory (LSTM) networks, have shown strong skill in predicting and reconstructing runoff from observations, raising interest in their use for hydrological projections. However, their ability to provide stable and physically credible results when forced with future climates beyond their training domain remains largely unexplored. Here we evaluate this question in Switzerland, a region strongly exposed to warming due to its alpine environment and glacier influence. An LSTM trained on observed meteorological and discharge data is driven with CH2018 climate and glacier projections for 1981–2100, and benchmarked against Hydro-CH2018 simulations from the process-based model PREVAH under identical forcings. Results show that the LSTM reproduces key hydrological signals closely – wetter winters, drier summers, and elevation-dependent trends – consistently across catchments and climate chains. Divergences are most pronounced in alpine and glacier-fed catchments, where runoff dynamics are more complex, yet the main governing patterns are captured. The largest limitation arises for extremes, where the LSTM underestimates peak flows, consistent with previously reported saturation effects. Overall, this study demonstrates that LSTMs can deliver robust mean-flow projections and trends comparable to a process-based benchmark, while highlighting persistent challenges in representing hydrological extremes.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-6058', Anonymous Referee #1, 20 Jan 2026
- RC2: 'Comment on egusphere-2025-6058', Anonymous Referee #2, 24 Jan 2026
The study by Courvoisier et al. compares an LSTM architecture with a process-based model regarding runoff behaviour across Switzerland - for historical as well as future (extreme) climate change projections.
The paper addresses an important topic and I generally like how the authors approached the problem, presented and discussed the results. I enjoyed reading the work, particularly the discussion and interpretation.
However, from a process-based perspective and for explaining differences between the models, I am missing some crucial information, and I think a few additions to the results could help with the interpretation. While I feel the paper has the quality to be accepted in HESS, I think a few of my comments require this to be a 'major revision'.
I hope the authors find my comments useful to improve the manuscript.
Comments to individual sections:
Section 2.1: I suggest explaining more clearly in the text what the differently labelled catchments in Figure 1 are used for; e.g. it is unclear at this stage what "second part of our analysis" means in the caption.
It is unclear why 96 minimally impacted catchments were chosen and then projected on 307 - you introduce a bias already (e.g. missing out on reservoir impacts, to name one obvious point)?
Section 2.5: The glaciers play a crucial role in the future runoff predictions, I think. Could you add some more detail here? E.g. which/how many of the basins have no glacier extent in summer by the end of the century, and is this consistently applied in both the LSTM and PREVAH?
Section 2.6 and 3.1: You compare these two models, and try to attribute differences - but I don't know where the models actually differ. I strongly suggest to give an overview (a table with a side-by-side comparison would work nicely) of the diverging input data and model structure of PREVAH vs the LSTM - observed meteo, cc data (and meteo parameters used), cal-val-test approach, glacier data and simulation approach in PREVAH, spatial representation, observational vs projection catchments, you train the LSTM on natural catchments - what about reservoirs in the projection catchments - are those included in PREVAH...?
In this sense, I don't see a value in section 3.1 without mentioning PREVAH. I think it is more important to understand the differences than explaining the LSTM in detail alone (e.g. I'd be interested in how PREVAH simulates glaciers, snow and ET, which is needed to understand the differences presented, e.g. in section 4.5).
Section 4.3/4.4/4.7: I like the maps. However, I wonder what a scatter plot of LSTM vs PREVAH runoff (you could add the corresponding r²) would look like next to the maps (one dot = one catchment, x = LSTM runoff, y = PREVAH runoff; the color of the dot could represent observational, projection or selected catchments, the elevation band, the ecoregion ... whatever you find most appropriate). This would give a better sense of the actual differences between the models and allow more intuitive diagnostics for why you see the differences.
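To illustrate the scatter-plot diagnostic suggested for Sections 4.3/4.4/4.7, here is a minimal sketch of the r² that could accompany such a plot. The function name `pearson_r2` and all runoff values are hypothetical placeholders chosen for illustration; none of them come from the manuscript.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined when one input is constant")
    return sxy ** 2 / (sxx * syy)


# Hypothetical per-catchment mean runoff (one entry = one catchment);
# illustrative numbers only, NOT results from the manuscript.
lstm_runoff = [820.0, 1150.0, 640.0, 1480.0, 990.0]
prevah_runoff = [800.0, 1210.0, 600.0, 1600.0, 1010.0]

r2 = pearson_r2(lstm_runoff, prevah_runoff)

# Mean difference (LSTM minus PREVAH) as a complementary measure:
# r2 alone does not reveal a systematic offset between the two models.
mean_diff = sum(l - p for l, p in zip(lstm_runoff, prevah_runoff)) / len(lstm_runoff)
```

Reporting the mean difference next to r² may be worthwhile, since two models can correlate strongly across catchments while one is consistently wetter or drier than the other.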
Major comments:
l.132ff I'd like to understand the bias adjustment and cc data better: Which "Swiss gridded observations" - are those the same that you forced the model with (chapter 2.2)? Did you conduct the QM yourself or does CH2018 provide that? I suggest to discuss the implications if the bias adjustment was done on a meteorology dataset different to the forcing dataset. Also, how did you prepare the cc data for the models: Spatial aggregation->Bias adjustment. Or Bias adjustment->spatial aggregation?
l.220ff I like this cross validation methodology. I assume, per fold, you have 12 catchments that are validation, 12 test and 72 train. Is it correct that you took for valid: year 2016, 2017, 2018 of the 12 validation catchments; year 2019-2024 of the 12 testing catchments and the remaining years of the 72 training catchments for train? But what do the grey shades in Fig 2ab mean - do they correspond? What is the light grey in b that is not part of a?
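My reading of the cross-validation setup in l.220ff can be written down as a small sketch. It assumes 96 catchments split into 8 equally sized groups (inferred from 12 + 12 + 72, not stated in the manuscript), and the rotation rule used here (the next group serves as validation) is purely hypothetical, as are the function name `make_folds` and the catchment identifiers.

```python
def make_folds(catchments, n_folds=8):
    """Partition catchments into spatial folds of train/validation/test.

    Assumption: fold k uses group k for testing and group k+1 (wrapping
    around) for validation; all remaining groups are used for training.
    """
    n = len(catchments)
    if n % n_folds != 0:
        raise ValueError("number of catchments must be divisible by n_folds")
    size = n // n_folds
    groups = [catchments[i * size:(i + 1) * size] for i in range(n_folds)]

    folds = []
    for k in range(n_folds):
        val_idx = (k + 1) % n_folds
        train = [c for j, g in enumerate(groups) if j not in (k, val_idx) for c in g]
        folds.append({"train": train, "validation": groups[val_idx], "test": groups[k]})
    return folds


# Hypothetical identifiers for the 96 observational catchments.
catchment_ids = [f"c{i:02d}" for i in range(96)]
folds = make_folds(catchment_ids)
```

Under this reading every catchment is held out for testing exactly once. The temporal split (validation years 2016-2018, test years 2019-2024) would be applied on top of this spatial partition, and that is the part I would like to see confirmed.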
l.242/369ff As far as I understand, your LSTM model architecture is the same as in Kraft et al. 2024, with the only difference being that you added dynamic glacier data. I'd have expected the LSTM to perform slightly better with this additional information. Do you have a (short) reasoning why it didn't (including Kraft et al.'s LSTM in the suggested table might help to explain this - see comment to 2.6)?
l.339 I am confused by this. Yes, fig 8 shows a stronger decline for PREVAH vs the LSTM, but I can't see that in fig 7. For 7b,c,d,f I even see the opposite.
l.347ff, 413ff and l.529ff I think the key question here is: Why is this happening and which model is closer to reality? I know that you cannot answer this question as it's the future. This likely goes beyond your scope, but I wonder if it would help (you could add your thoughts to the discussion):
1. when comparing to observed data. I assume these future events are significantly outside your training data? If that wouldn't be too extreme, you could look at the test data of the LSTM (section 3.5) and extract the maximum/minimum precip periods and check the same events for PREVAH, and evaluate which model is closer to obs (this could help for your discussion in chapter 5.7)?
2. to attribute this to PREVAH's 'knowledge' of the driving processes (such as the additional climate data it gets to calculate ET), different constraints (or no constraints) for the glacier extent, or the internal mass balance constraint that the LSTM doesn't have (could the application of a mass-conserving LSTM improve the situation)?
l.584 You earlier mention that this interpretation of low-flow performance requires caution, and I don't see this adequately evaluated in your paper to merit mentioning it in the short conclusion.
Minor comments:
l.93 When does the data end / what was your cutoff? Mention the spatial resolution.
l.96 PRISM and SYMAP - reference and spell out on first use
l.102 how did you spatially average in detail? Extract entire cells or did you 'split' cells? If the catchment area is small and the grids large, you can introduce errors particularly in mountainous terrain.
l.116 Can you give a rationale for why you used topsoil information only?
l.126 suggest to add that it is based on the CMIP5 framework
l.129 you earlier write that the resolution was 2km - how is the product downscaled from the 12-50km CORDEX resolution to 2km?
l.148 Suggest to make it clearer whether you did any glacier simulations or if this was the work of Brunner et al. 2019b.
l.150 I think you mean section 3.2
l.185 in section 2.5 you cite Brunner et al. 2019b as the source for the glacier data
l.201 I don't see the 24 tested alternatives in Kraft et al. 2025 section 3.6 - seems you employed the pre-print version of their approach?
l.251 fig 4: suggest to add the number of catchments per boxplot (n=x) in the caption.
l.261 I don't think DJF is a period where much melt dynamics is going on in Switzerland and JJA is not really low flow in most of the catchments (you mention this yourself in l.271-272)
l.262 add that this is about the annual panel
l.307 I think the LSTM surpasses PREVAH by 2060, doesn't it?
l.327 fig 8 caption "(mm period−1 vs 1991–2020) for 2071-2100" I think you mean something like 2071-2100 - 1991–2020?
l.476-482 the language here suggests we are still in results. Suggest to rephrase.
l.539 deeper deeper
l.576 whether
l.715, l.623 provide links to final paper?
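To make the l.102 comment on spatial averaging concrete, the sketch below contrasts the two options: counting every intersecting grid cell in full versus weighting each cell by the fraction of its area inside the catchment. It assumes equal-area grid cells; the function name `catchment_mean` and all values are hypothetical and chosen only for illustration.

```python
def catchment_mean(values, fractions, split_cells=True):
    """Average gridded values over a catchment.

    values:    one value per grid cell touching the catchment
    fractions: fraction (0..1) of each cell's area inside the catchment
    split_cells=True  -> weight each cell by its covered fraction
    split_cells=False -> count every touched cell in full
    Assumption: all grid cells have the same area.
    """
    if len(values) != len(fractions):
        raise ValueError("values and fractions must have the same length")
    if split_cells:
        weights = list(fractions)
    else:
        weights = [1.0 if f > 0 else 0.0 for f in fractions]
    total = sum(weights)
    if total == 0:
        raise ValueError("catchment does not overlap any grid cell")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical small alpine catchment touching three cells with a strong
# precipitation gradient (illustrative values only).
precip = [4.0, 10.0, 16.0]   # e.g. mm per day in each cell
frac = [1.0, 0.5, 0.1]       # share of each cell inside the catchment

weighted = catchment_mean(precip, frac)                  # cells split by area
whole = catchment_mean(precip, frac, split_cells=False)  # entire cells extracted
```

In this made-up case the two approaches differ by roughly a third, which is why the choice matters for small catchments on coarse grids in mountainous terrain.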
Citation: https://doi.org/10.5194/egusphere-2025-6058-RC2
Viewed
- HTML: 243
- PDF: 133
- XML: 15
- Total: 391
- BibTeX: 24
- EndNote: 26