This work is distributed under the Creative Commons Attribution 4.0 License.
BiasCast: Learning and adjusting real time biases from meteorological forecasts to enhance runoff predictions
Abstract. Deep learning models are increasingly used in operational flood forecasting. Such operational systems face performance degradation when transitioning from high-quality reanalysis data to less accurate meteorological forecast data. This study investigates training strategies and Long Short-Term Memory (LSTM) network architectures to mitigate forecast-induced bias in maximum daily discharge predictions, using the Extended LamaH-CE dataset and a subset of 451 basins. We systematically evaluated cross-domain generalization, transfer learning approaches, Encoder–Decoder LSTMs, Sequential Forecast LSTMs, and the role of input embeddings and of integrating past discharge observations. The results show that domain shifts between reanalysis and forecast data lead to substantial skill loss, with the median Nash–Sutcliffe Efficiency (NSE) decreasing from 0.58 to 0.33. Among the tested strategies, the Sequential Forecast LSTM demonstrated the most stable improvements, achieving a median NSE of 0.63. Integrating recent discharge observations further enhanced performance, raising the median NSE to 0.71 and surpassing even the reanalysis-driven baseline. In contrast, integrating archived forecasts or using more complex input embeddings did not yield consistent benefits and in some cases degraded model stability. These findings highlight the value of training strategies that allow models to learn bias correction directly during forecast transitions, and they emphasize the operational potential of combining sequential processing with near-real-time discharge observations.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-4978', Anonymous Referee #1, 02 Jan 2026
- RC2: 'Comment on egusphere-2025-4978', Anonymous Referee #2, 01 Mar 2026
Review of HESS Manuscript: “BiasCast: Learning and adjusting real time biases from meteorological forecasts to enhance runoff predictions”
Dear editor, please find attached my review of the manuscript.
1. Scope
The article falls within the scope of HESS.
2. Summary
The authors propose methods to improve model performance under forecast-induced bias (the performance drop when models trained on reanalysis data are driven with forecast data). They test different strategies, including Encoder–Decoder LSTMs, Sequential LSTMs, and transfer learning. Moreover, they investigate how linear embeddings and the inclusion of past observed discharge influence performance.
3. General comments
I think the article is really well written, with a good introduction, clear objectives and well-posed experiments. The results are presented in a clean way. Even with many model variations, it was easy to follow, which is not always the case. The only major limitation of the study is that the authors limit themselves to one-day-ahead prediction, even though the LSTM architectures and the available data would already allow them to increase the lead times. This limits the conclusions that can be drawn, especially regarding the effect of decaying forecast quality as the lead time increases. From lines 95 and 489, I understand that this article is a stepping stone toward multi-day prediction, so I understand that the authors want to leave that for a future study, but then they should clearly state this in the limitations section.
4. Specific comments
Line 139-142: It is good that you used embeddings, but this paragraph makes it sound as if an LSTM without embeddings can neither capture non-linear relationships nor learn how to combine the features, which it actually can. The main advantages of embeddings are that (1) you can reduce large input dimensions into smaller latent spaces, (2) the embeddings can learn to compensate for systematic biases in your data, for example if you use a different embedding for hindcast and forecast, and (3) if you use different types/groups of inputs with different numbers of variables, you can map them to a shared dimension for further processing (e.g., Acuna2025 for multiple frequencies or Gauch2025 for missing data).
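As a minimal sketch of point (3), two input groups with different numbers of variables can each get their own linear embedding into a shared latent dimension (all shapes and names here are hypothetical, not taken from the manuscript):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input groups with different numbers of variables:
# hindcast meteorology (8 variables) and forecast meteorology (5 variables).
x_hindcast = rng.normal(size=(365, 8))   # (timesteps, n_vars)
x_forecast = rng.normal(size=(1, 5))

latent_dim = 16  # shared embedding dimension fed to the LSTM

# One linear embedding per input group maps both to the same latent size;
# the per-group bias can also absorb systematic offsets between sources.
W_h, b_h = rng.normal(size=(8, latent_dim)), np.zeros(latent_dim)
W_f, b_f = rng.normal(size=(5, latent_dim)), np.zeros(latent_dim)

z_hindcast = x_hindcast @ W_h + b_h
z_forecast = x_forecast @ W_f + b_f

# Both groups now share a dimension and can feed one LSTM cell.
print(z_hindcast.shape, z_forecast.shape)
```

After this projection, a single recurrent cell can process both data sources without any change to its input size.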
Line 146-148: I do not understand what you are trying to say, can you please rephrase or further explain?
Line 259-262: The problem with using tanh as the activation of the embeddings is that tanh saturates. Saturation is a known problem for LSTMs (Kratzert2024, Acuna2025, Baste2025), and I think it can be further aggravated if you also saturate the input before it goes into the LSTM. Was there a specific reason you used tanh? Have you tested whether ReLU gives you better results, especially considering that in section 3.5 you indicate that more complex embeddings gave you worse performance for the enc-dec and seq-lstm? Are you using dropout in the more complex embeddings to avoid overfitting?
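To illustrate the saturation argument numerically (a stdlib-only sketch, unrelated to the manuscript's code): the gradient of tanh collapses for large pre-activations, while the ReLU gradient stays at 1 for positive inputs.

```python
import math

def dtanh(x):
    # Derivative of tanh: 1 - tanh(x)^2, which vanishes as |x| grows.
    return 1.0 - math.tanh(x) ** 2

def drelu(x):
    # Derivative of ReLU (for x != 0): 1 for positive inputs, else 0.
    return 1.0 if x > 0 else 0.0

# For a large pre-activation (e.g. an extreme precipitation input),
# tanh's gradient nearly vanishes while ReLU's does not.
for x in (0.5, 2.0, 5.0):
    print(f"x={x}: dtanh={dtanh(x):.4f}, drelu={drelu(x):.1f}")
```

This is why a saturating embedding in front of the (already tanh-based) LSTM gates can compound the vanishing-gradient problem for extreme inputs.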
Section 3.1: Can you further explain the difference between BaseLine Reanalysis and CrossDomain (Reanalysis, Pretrain)?
Line 324-326: The sequential data processing is not only done in the sequential LSTM, is it? I agree with what you say at the end of the paragraph, that the sequential LSTM is better than the encoder-decoder because the hindcast-forecast transition is done in the same LSTM instead of having to initialize a new one, especially in your case, where the forecast part is only run for one day. But the current phrasing makes it sound as if only the sequential LSTM processes data sequentially and in temporal order, which is not true.
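A toy sketch of why the state handoff matters (a hand-rolled single-gate-matrix LSTM step with random weights; purely illustrative, not the authors' architecture): the forecast step that reuses the hindcast state produces a different hidden state than one started from zeros.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden = 4, 8

# One combined weight matrix for the four gates (i, f, g, o).
W = rng.normal(scale=0.3, size=(n_in + n_hidden, 4 * n_hidden))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    z = np.concatenate([x, h]) @ W
    i, f, g, o = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

hindcast = rng.normal(size=(30, n_in))   # reanalysis-driven warm-up
forecast = rng.normal(size=(1, n_in))    # one-day-ahead forecast input

# Sequential processing: the forecast step reuses the hindcast state.
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in hindcast:
    h, c = lstm_step(x, h, c)
h_seq, _ = lstm_step(forecast[0], h, c)

# Without a state handoff, the forecast step would start from zeros.
h_cold, _ = lstm_step(forecast[0], np.zeros(n_hidden), np.zeros(n_hidden))

# The warm-up state changes the forecast-step output.
print(np.abs(h_seq - h_cold).max())
```

Both variants process their inputs sequentially and in temporal order; what differs is only whether the forecast step inherits the hindcast state or not.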
Line 435: The limitation of integrating reanalysis data depends on the test case. Multiple meteorological services provide real-time observed data (from stations or radar), which can be included in the hindcast period, with the forecast data then coming from the meteorological models. I understand that if you are thinking of a global or continental scale, you might need reanalysis data, but in national-scale applications you can directly use observed data (if the country has it available).
5. Recommendation
Dear editor, given the quality of the preprint, I recommend accepting it subject to minor revisions.
References:
Acuña Espinoza, E., Kratzert, F., Klotz, D., Gauch, M., Álvarez Chaves, M., Loritz, R., & Ehret, U. (2025). Technical note: An approach for handling multiple temporal frequencies with different input dimensions using a single LSTM cell. Hydrology and Earth System Sciences, 29(6), 1749–1758.
Acuña Espinoza, E., Loritz, R., Kratzert, F., Klotz, D., Gauch, M., Álvarez Chaves, M., & Ehret, U. (2025). Analyzing the generalization capabilities of a hybrid hydrological model for extrapolation to extreme events. Hydrology and Earth System Sciences, 29(5), 1277–1294. https://doi.org/10.5194/hess-29-1277-2025
Baste, S., Klotz, D., Acuña Espinoza, E., Bardossy, A., & Loritz, R. (2025). Unveiling the limits of deep learning models in hydrological extrapolation tasks. Hydrology and Earth System Sciences, 29(21), 5871–5891. https://doi.org/10.5194/hess-29-5871-2025
Gauch, M., Kratzert, F., Klotz, D., Nearing, G., Cohen, D., & Gilon, O. (2025). How to deal with missing input data. Hydrology and Earth System Sciences, 29(21), 6221–6235. https://doi.org/10.5194/hess-29-6221-2025
Kratzert, F., Gauch, M., Klotz, D., & Nearing, G. (2024). HESS Opinions: Never train a Long Short-Term Memory (LSTM) network on a single basin. Hydrology and Earth System Sciences, 28(17), 4187–4201. https://doi.org/10.5194/hess-28-4187-2024
Citation: https://doi.org/10.5194/egusphere-2025-4978-RC2
Data sets
Experimental Setups and Results for "BiasCast: Learning and adjusting real time biases from meteorological forecasts to enhance runoff predictions" Oliver Konold et al. https://doi.org/10.5281/zenodo.17241922
Extended LamaH-CE: LArge-SaMple DAta for Hydrology and Environmental Sciences for Central Europe Oliver Konold et al. https://doi.org/10.5281/zenodo.17119634
Model code and software
Forked NeuralHydrology Version Oliver Konold https://github.com/conestone/neuralhydrology
Interactive computing environment
Experiments and Results Code for "BiasCast: Learning and adjusting real time biases from meteorological forecasts to enhance runoff predictions" Oliver Konold https://github.com/conestone/biascast
This manuscript addresses the challenge of deploying machine-learning hydrological models in operational forecasting by explicitly considering the domain shift between reanalysis and forecast meteorological inputs. The authors explore alternative training strategies and LSTM architectures to improve 1-day streamflow forecasts, and the results suggest that architectures combining hindcast and forecast phases, which use reanalysis and forecast data respectively, provide the greatest performance gains. The study tackles an important problem, presents interesting results, and is well structured. Some additional analysis and clarifications would further strengthen the interpretation of the experiments and results.
General comments
Specific comments
Technical corrections