This work is distributed under the Creative Commons Attribution 4.0 License.
From Weather Data to River Runoff: Leveraging Spatiotemporal Convolutional Networks for Comprehensive Discharge Forecasting
Abstract. The quality of river runoff data determines the quality of regional climate projections for coastal oceans and estuaries. This study presents a novel approach to river runoff forecasting using Convolutional Long Short-Term Memory (ConvLSTM) networks. Our method accurately predicts daily runoff for 97 rivers within the Baltic Sea catchment by modeling runoff as a spatiotemporal sequence defined by atmospheric forcing. The ConvLSTM model performs similarly to traditional hydrological models, effectively capturing the intricate spatial and temporal patterns that influence individual river runoff across the Baltic Sea region. Our model offers the advantages of faster processing and easier integration into climate models.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-2685', Anonymous Referee #1, 05 Nov 2024
General Comments: The authors present a machine learning approach to forecasting river runoff
from weather data using convolutional long short-term memory (ConvLSTM) neural networks. They present
convincing evidence that the ML model produces results of quality equal to its training data. At
the same time, the ML method offers faster processing speeds and thus easier direct integration
into regional climate models. With this approach, they make a scientifically significant,
high-quality contribution to the integration of river runoff forecasting into climate models. While the
manuscript shows great potential, I think that it requires minor revisions. My comments are listed
below.
Minor Comments: As I do not possess a deeper understanding of river runoff modelling and
come from the machine learning side, I will limit my comments mostly to the technical aspects.
First, it does not become clear to me exactly how well your training data performs in comparison to
other state-of-the-art models. I understand that your ConvLSTM is able to reproduce its training
data's quality, but I am not fully able to grasp the strengths and weaknesses of the utilized training
model, which I assume are transferred to the ML model. It would be helpful to extend the
technical details section or the model section with a short description of the training data and
especially its strengths and weaknesses compared to other possible runoff forecasting models.
Although I see that the point of the paper is more the proof that the ConvLSTM is able to reproduce
state-of-the-art river runoff forecasting, and not the exact strengths and weaknesses of the utilized
training data, it would help give perspective on the strengths of your method.
For example, your training data seems to present a bias compared to observational data (Figure 7b),
which the network reproduces.
In connection to that, you describe that you utilize the time period from 1979 to 2011 because
these data are not bias-corrected. As a bias correction seems to be usually conducted, I would like to
know whether one can similarly be performed on the ConvLSTM outputs.
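On the bias-correction question above: a post-hoc correction of the ConvLSTM outputs against a reference series is in principle straightforward. As an illustrative sketch (my own, not from the manuscript), a simple empirical quantile-mapping correction in NumPy could look like this:

```python
import numpy as np

def quantile_map(pred, ref, n_quantiles=100):
    """Map the empirical quantiles of `pred` onto those of a
    reference series `ref` (e.g. observed runoff)."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    pred_q = np.quantile(pred, q)  # quantiles of the model output
    ref_q = np.quantile(ref, q)    # quantiles of the reference
    # Replace each predicted value by the reference value at the
    # same empirical quantile (piecewise-linear interpolation).
    return np.interp(pred, pred_q, ref_q)

# Toy example: predictions with a constant positive bias of 50 m3/s
rng = np.random.default_rng(0)
obs = rng.gamma(shape=2.0, scale=100.0, size=5000)
pred = obs + 50.0
corrected = quantile_map(pred, obs)
```

For a constant offset such as this toy case, the mapping removes the bias exactly; for real runoff one would fit the quantiles on a calibration period and apply them to an independent period.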
Connected to that, have you tried to train the ConvLSTM on any other runoff models? Training for
400 epochs on daily training data from 32 years is a lot of training input. Just out of interest, have
you tried training on less data, and how does the performance of the ConvLSTM differ? I would
guess that not all hydrological models provide such a comprehensive dataset. Could you thus
comment on how easy it would be to extend this method to other runoff prediction models and how
much training data would be required?
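To make the data-volume question concrete, one common ablation protocol keeps the validation and test periods fixed and shrinks only the training window, so that scores remain comparable across runs. A minimal sketch (my reconstruction; the sample count assumes daily data for 1979-2011 inclusive, and the chronological ordering is an assumption, not taken from the manuscript):

```python
import numpy as np

n_days = 12_053            # daily samples for 1979-2011 (inclusive)
idx = np.arange(n_days)

n_val = n_test = int(0.10 * n_days)
splits = {}
# Keep validation/test fixed at the end of the record and shrink
# only the training window, so learning curves are comparable.
for frac in (1.0, 0.5, 0.25):
    n_train = int(frac * (n_days - n_val - n_test))
    splits[frac] = (idx[:n_train],                   # training subset
                    idx[-(n_val + n_test):-n_test],  # validation
                    idx[-n_test:])                   # test
```

Reporting test scores at each fraction would directly answer how much training data the method actually needs.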
I would also be interested in whether all ocean/regional climate models are able to utilize runoff
predictions from similar sources, or whether they require their own in-model-consistent runoff
forcing. If other climate models required the ConvLSTM to be trained on different runoff
predictions, it would significantly limit this method's applicability should that runoff model need
to possess as comprehensive a training dataset as the E-HYPE model presented in your study.
Additionally, I would be interested, out of curiosity, in how many timesteps are necessary for the LSTM
to significantly improve the CNN output. Have you tried training with significantly fewer than 30
timesteps? What was your reasoning behind choosing these 30 days, or was it just based on model
performance/loss functions? Finally, you claim that "While the initial training of the model requires
substantial computational resources, it remains significantly less intensive than running
comprehensive hydrological models" (Page 17). Could you give an estimate of how big this "significant" reduction of computational resources is? In the end, this time saving is the
important improvement of your method compared to other numerical prediction systems/models.
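On the 30-timestep question above: the input construction can be pictured as a sliding window over the forcing fields, which makes "how many timesteps" a single, easily varied parameter. A minimal sketch (my reconstruction; only the 30-day window and the 4 channels come from the paper, the grid size and zero-valued arrays are placeholders):

```python
import numpy as np

# Placeholder stand-in for the 4-channel atmospheric forcing on a
# small grid: (days, channels, height, width). Sizes are illustrative.
days, channels, h, w = 120, 4, 8, 8
forcing = np.zeros((days, channels, h, w))
runoff = np.zeros(days)  # daily target series

T = 30  # input sequence length, as in the manuscript
# Each sample pairs the previous T days of forcing with the runoff
# on the current day.
X = np.stack([forcing[t - T:t] for t in range(T, days)])
y = runoff[T:]
```

Sweeping `T` over, say, 5, 10, and 30 while holding everything else fixed would quantify how much temporal context the LSTM actually exploits.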
In general I felt the content of the paper was novel and the method would be of interest to others in
the field, but some details should be explained further or are lacking background information.
Citation: https://doi.org/10.5194/egusphere-2024-2685-RC1
RC2: 'Comment on egusphere-2024-2685', Anonymous Referee #2, 10 Nov 2024
General Comments:
The authors introduce an impressive new way to forecast river runoff using ConvLSTM network models at the scale of the Baltic Sea catchment. Their study demonstrates that ConvLSTM can accurately predict daily runoff for 97 rivers in the Baltic Sea area by using weather data. The study shows the trained ConvLSTM model, on daily timescales, can predict river runoff with an accuracy of ± 5% compared to the original E-HYPE data. Impressively, the ConvLSTM model performs just as well as traditional hydrological models in capturing runoff patterns, but with faster processing and greater computational efficiency. This makes it a great fit for integration into regional climate models, enabling real-time runoff forecasting and improving the accuracy of coastal climate impact predictions. Overall, the authors make a strong case for ConvLSTM networks being well suited for integration into regional climate models and a valuable tool for real-time river runoff prediction during climate projections. The ConvLSTM model proves reliable when its river runoff is used in a comprehensive ocean model of the Baltic Sea to predict salinity.
While I think this is a strong article, I have a few minor comments that I think could enhance it.
Specific Comments:
My suggestions for the technical sections are as follows:
- In Väli et al. (2019), they originally generated 97 potential freshwater input locations from rivers in the Baltic Sea area, but this was later reduced to 91 in the final dataset. Could you clarify this discrepancy and explain why you state that 97 inputs are used in the ConvLSTM model?
- In the "Runoff Data for Training" section, it would be useful to add context around why the BMIP project runoff data is necessary. Specifically, it would help to mention that a new homogeneous runoff dataset was created because no consistent river discharge data was available for the full period (1961–2018), and the E-HYPE model originally only covered a few recent years.
- In the "Runoff Data for Training" section, while you clearly state that the 1979-2011 period of E-HYPE hindcast simulation data is used, it would improve clarity to specify that this data is on a daily scale. Additionally, while you indicate which periods were excluded (1961-1978 and 2012-2018), providing more insight into the spatial and temporal adjustments applied to these excluded periods would help to justify the selection of the 1979-2011 data. For example:
- The 1961-1978 data, based on Bergstrom & Carlsson (1994), was interpolated from monthly to daily values.
- The 2012-2018 data is an E-HYPE forecast product, but further clarification on why this recent data was omitted would be beneficial.
- Additionally, as noted in Väli et al. (2019), the Neva River is an exception, with its data coming from observational records (1961-2016) from the Russian State Hydrological Institute, rather than E-HYPE hindcasts. This exception should be explicitly highlighted as it is one of the four river locations evaluated in detail.
- Finally, in the "Runoff Data for Training" section, the statement that the “quality of the runoff was extensively evaluated” is a bit broad. Since you are comparing ConvLSTM model output against E-HYPE hindcast data, it would help to detail the methods used for this evaluation. Including a statement about confidence in the BMIP data would also be valuable. Specifically, you might note from Väli et al. (2019) that the BMIP dataset closely aligns with historical observations for various rivers and with the dataset of Bergstrom & Carlsson (1994), showing a difference of under 1% for total Baltic Sea runoff. This would reinforce the reliability of the BMIP data in the ConvLSTM modelling.
- In the "Atmospheric Forcing" and "Ocean" sections, please confirm that the temporal resolution of the Essential Climate Variables (ECV) is daily.
- In the "Atmospheric Forcing" and "Ocean" sections, I think the horizontal resolution of the models should be expressed in consistent units. It would be clearer to use kilometres (km) throughout rather than miles.
- The ConvLSTM model was trained and tested using daily data from 1979 to 2011, with 80% for training, 10% for validation, and 10% for testing. It performed well on both the training and test data. Have you thought about how reducing the training data might impact the model's performance? This could give you some insight into the model robustness with less data.
- It might be helpful to mention that the rivers feeding freshwater into the Baltic Sea have runoff data that is not stationary. One of the benefits of using LSTM models over other machine learning (ML) methods is that they’re specifically designed to capture patterns and dependencies in sequences, making them a great fit for non-stationary data like this. Could you comment on alternative ML models that might be suitable for runoff prediction for freshwater inputs into the Baltic Sea catchment?
- The paper goes into a lot of detail about the ConvLSTM model architecture in the ‘Implemented model architecture’ section (Section 2), but the ‘Neural network hyperparameters’ section 3.4 could use a bit more explanation. It would be helpful to explain the model architecture a bit more such as whether a sequential model was used, which lets you stack layers in a simple, linear way. Also, it would help to clarify how the hyperparameter values in Table 1 were determined.
- Neural network hyperparameters section 3.4, Table 1 shows details for only one layer, but it is unclear how many units (neurons) were in that layer. Did you consider adding more layers to help the model capture more complex patterns? Also, did you include a Dropout layer after the LSTM layer to help prevent overfitting by randomly dropping some neurons during training? For the Fully Connected Layer, did you use a dense layer to create the final output? And when compiling the ConvLSTM model, what loss function (like MSE or MAE) and optimizer (e.g., Adam) did you use?
- When fitting the ConvLSTM model to the training data, how did you decide on the number of epochs (400), batch size (50), and learning rate? Did you choose these values through trial and error, or did you use a more structured approach like grid search or randomized search to find the best model and parameters? Also, was Early Stopping used to prevent overfitting by stopping training when the validation loss started increasing?
- Additionally, it would help to explain why 30-day timesteps were chosen for the 4-channel atmospheric inputs, even though the runoff data is daily. This would make it clearer how the model is handling temporal input.
- Neural network hyperparameters section 3.4, you mention that "the model performance can be described as relatively robust when slightly changing the set of hyperparameters." Could you clarify what you mean by "slightly changing" the hyperparameters? It would be more helpful if you could quantify the model's performance for different sets of hyperparameter values to give a clearer picture of its robustness.
- In Section 4.1, where the ConvLSTM model’s output is compared to the test data, Figure 5 shows the “total predicted river runoff.” Could you clarify what exactly “total predicted river runoff” means? Does it represent the combined daily predicted runoff from all 97 rivers flowing into the Baltic Sea or for the four individual rivers? I’m assuming it’s the total daily runoff for all these rivers, but confirming this would make things clearer.
- In Section 4.1, could you clarify why the E-HYPE data is labeled as the “original HYPE data” in the Figure 5 caption and the results text, as well as in Figure 6’s text and legend? It would be clearer if it were consistently referred to as “E-HYPE data” throughout.
- Additionally, only a minor point to avoid confusion, it would help to consistently refer to “total predicted river runoff” instead of switching between “predicted river runoff” (as in line 215). “Original river runoff” should also be replaced with “E-HYPE data” for clarity.
- In Section 4.1, it might be helpful to present the model performance results for the “total predicted river runoff” into the Baltic Sea in a table, showing metrics like accuracy, correlation, RMSE, and MAE. If available, showing these metrics individually for the four rivers—Neva, Oder, Umeälven, and Neman—would add useful detail, rather than only displaying residual errors for the daily runoff predictions on the four plots. Additionally, including density plots for the predictions of each of these four rivers could provide a clearer view of the model’s performance for individual rivers.
- In Section 4.1, you make the point that the ConvLSTM model’s performance is already satisfactory, and that the “discrepancies between the actual values and the predictions can partly be attributed to the use of a different atmospheric dataset than the ones originally used to drive the E-HYPE model”. This is a key point, and it would be helpful to draw out this point earlier in the technical section when outlining the runoff data used for training and when describing the input datasets for the atmospheric forcing.
- In Section 4.1 in Figure 5, you describe the right panel (b), showing the distribution of residuals as a density plot with a Gaussian shape—a bell curve centred around zero. You mention that there is no systematic bias, with residuals mostly within a narrow range around zero, though there is a slight positive bias at the peak. Could you explain this slight positive bias? Is it related to the differences in atmospheric datasets used in the ConvLSTM model versus those originally used as input in the E-HYPE model?
- In Figure 6, it might be helpful to use the same y-axis value range for all four plots showing the residual error. This would make it easier to see that Neva has the lowest residual error, with the ConvLSTM model’s total predicted runoff within +/- 2.5%, compared to the other rivers.
- In Figure 6, the prediction errors are larger for the other three rivers compared to the River Neva. Have you considered whether this could be because the River Neva uses observed runoff data, while the other rivers rely on the E-HYPE hindcast simulation data? This could be linked to the issue of using different atmospheric datasets in the ConvLSTM model compared to the datasets originally used in the E-HYPE model.
- In Section 4.1 in the Figure 6 caption, you mention that “the residuals were calculated as the relative difference between the predicted and observed values, normalized by the observed.” However, the total runoff data is based on an E-HYPE hindcast simulation. Referring to “observed runoff data” makes it sound like this is measured runoff data from river gauges. Is this the case for all four rivers, or just for the Neva River? It would be helpful to clarify what the total predicted runoff data for each of the four individual rivers is compared to calculate the residual error.
- In Figure A1, the legend refers to the "hydrological model," but it would be clearer to specify the E-HYPE model and ConvLSTM model. For Neva, the comparison should be with the measured flow data, as it is not based on the E-HYPE hindcast simulation data. Additionally, in Figure A1, you refer to the residuals as the relative difference between the predicted and observed values. However, these are not actually observed values but rather E-HYPE simulated values, except for Neva. It would be helpful to clarify this distinction.
- In Section 4.2, specifically in line 235, you mention that the predicted salinity from the ConvLSTM model matches the "original data" well, capturing short-term fluctuations effectively. It would be helpful to clarify what you mean by "original data"—is this the salinity forced with the E-HYPE runoff, or is it the measured salinity at BY15? In the Figure 7 legend, it would be clearer to use "ConvLSTM model" and "E-HYPE model" instead of "original". Additionally, in the caption, it might be better to avoid "original E-HYPE data" and simply use "E-HYPE data."
- In Section 4.2, you don’t address why the salinity at the surface, and to some extent at the bottom, as computed using the ConvLSTM model and E-HYPE runoff prediction with the MOM5 Ocean model, does not match well with the observed salinity at BY15. Specifically, at the surface it tends to over-predict the high and low salinity cycles. It would be helpful to acknowledge this discrepancy and offer some possible explanation for it.
- In Section 5, you conclude that all results lie within the error margin of the hydrological model itself when compared to observations, with the average error on daily time scales for individual rivers mostly under 10%. It would be helpful to mention this average error of 10% earlier in Section 4.1, specifically around line 255, when introducing supplementary Figure A1. This would provide context for the reader before the conclusion in Section 5.
- In Section 5, you conclude that the ConvLSTM model is significantly less computationally intensive than running comprehensive hydrological models. Could you provide a more detailed quantification of the reduction in computational demand when forecasting with the ConvLSTM model compared to these hydrological models? Have you tested the computing speed against any other traditional hydrological models, or only against the E-HYPE hydrological model, to make this conclusion?
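On the hyperparameter-selection points raised in the list above (epochs, batch size, learning rate): a structured search of the kind suggested can be sketched as follows. Both the candidate value grid and the scoring function here are illustrative stand-ins, not the authors' setup; in practice `val_loss` would train the ConvLSTM and return its validation loss.

```python
import itertools

# Candidate hyperparameter values (illustrative, not from the paper)
grid = {
    "epochs": [100, 200, 400],
    "batch_size": [25, 50, 100],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def val_loss(epochs, batch_size, learning_rate):
    """Stand-in for training the model and returning its validation
    loss; a toy function with its minimum at (400, 50, 1e-3)."""
    return (abs(epochs - 400) / 400
            + abs(batch_size - 50) / 50
            + abs(learning_rate - 1e-3))

# Exhaustive grid search: evaluate every combination, keep the best.
best = min(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=lambda p: val_loss(**p),
)
```

Even a coarse grid like this, reported in the paper, would let readers judge how sensitive the chosen configuration is.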
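On the suggestion above of tabulating per-river performance: a minimal helper for the proposed metrics might look like the following (illustrative sketch; the metric choice follows the review's suggestion, not the manuscript, and the arrays are toy data):

```python
import numpy as np

def runoff_metrics(pred, obs):
    """RMSE, MAE and Pearson correlation for one river's daily
    runoff prediction (illustrative helper, not from the paper)."""
    err = pred - obs
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "corr": float(np.corrcoef(pred, obs)[0, 1]),
    }

# Toy example: a constant offset of 1.0 gives RMSE = MAE = 1.0
# while leaving the correlation at 1.0.
obs = np.array([10.0, 20.0, 30.0, 40.0])
pred = obs + 1.0
m = runoff_metrics(pred, obs)
```

Computing this per river (Neva, Oder, Umeälven, Neman) would give exactly the table the review asks for.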
Technical Corrections:
Other minor suggestions:
- The title could be simplified by changing "leveraging" to "using" and removing "comprehensive" before "discharge forecasting." You could also add "into the Baltic Sea" for more clarity.
- The abstract is currently too broad, and it would benefit from including some specific numerical results to quantify the ConvLSTM model's performance against the E-HYPE data and compared to traditional hydrological models. This would demonstrate the ConvLSTM model's effectiveness at predicting runoff.
- In Equation 1 (line 60), for Xtk, I think the k should be a subscript.
- Overall, the study presents a novel approach to forecasting river runoff using ConvLSTM network models. The ConvLSTM model performs similarly to traditional hydrological models such as the E-HYPE model in capturing runoff patterns but offers faster processing and greater computational efficiency, which makes it a valuable contribution to the field. However, I think some details need further explanation, and in places more clarity is required. Minor revisions would strengthen the paper, but this use of ConvLSTM models to forecast runoff on such a widespread scale as the Baltic Sea catchment is definitely of interest to others in the field, and it is an excellent study.
Citation: https://doi.org/10.5194/egusphere-2024-2685-RC2