the Creative Commons Attribution 4.0 License.
Extended range forecasting of stream water temperature with deep learning models
Abstract. Stream water temperatures influence water quality with effects on aquatic biodiversity, drinking water provision, electricity production, agriculture, and recreation. Therefore, stakeholders would benefit from an operational forecasting service that would support timely action. Deep learning models are well-suited to provide probabilistic forecasts at individual stations of a monitoring network. Here we train and evaluate several state-of-the-art models using 10 years of data from 54 stations across Switzerland. Static catchment features, time of the year, meteorological observations from the past 64 days, and their ensemble forecasts for the following 32 days are included as predictors in the models to estimate daily maximum water temperature over the next 32 days. Results show that the Temporal Fusion Transformer (TFT) model performs best with a Continuous Ranked Probability Score (CRPS) of 0.70 °C averaged over all lead times, stations, and 90 forecasts distributed over 1 year. The TFT is followed by the Recurrent Neural Network Encoder – Decoder with a CRPS of 0.74 °C, and the Neural Hierarchical Interpolation for Time Series with a CRPS of 0.75 °C. These deep learning models outperform other simpler models trained at each station: Random Forest (CRPS = 0.80 °C), Multi-layer Perceptron neural network (CRPS = 0.81 °C), and Autoregressive linear model (CRPS = 0.96 °C). The average CRPS of the TFT degrades from 0.38 °C at a lead time of 1 day to 0.90 °C at a lead time of 32 days, largely driven by the uncertainty of the meteorological ensemble forecasts. In addition, TFT water temperature predictions at new and ungauged stations outperform those from the other models. When analyzing the importance of model inputs, we find a dominant role of observed water temperature and future air temperature, while including precipitation and time of the year further improves predictive skill.
Operational probabilistic forecasts of daily maximum water temperature are generated twice per week with our TFT model and are publicly available at https://www.drought.ch/de/impakt-vorhersagen-malefix/wassertemperatur-prognosen/. Overall, this study provides insights on the extended range predictability of stream water temperature, and on the applicability of deep learning models in hydrology.
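The CRPS averaging described in the abstract (over lead times, stations, and forecast start dates) can be illustrated with a minimal sketch. It assumes forecasts are available as ensemble members and uses the standard empirical estimator CRPS = E|X - y| - 0.5 E|X - X'|; the study's models issue quantile forecasts, so the authors' exact computation may differ. The function name, array layout, and synthetic data below are hypothetical.

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS of an ensemble forecast: E|X - y| - 0.5 * E|X - X'|.

    members: array of shape (..., n_members); obs: array of shape (...).
    Returns an array of shape (...), in the units of the data (here degrees C).
    """
    members = np.asarray(members, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Mean absolute error of the members against the observation
    abs_err = np.mean(np.abs(members - obs[..., None]), axis=-1)
    # Mean absolute difference between all pairs of members (ensemble spread)
    spread = np.mean(
        np.abs(members[..., :, None] - members[..., None, :]), axis=(-2, -1)
    )
    return abs_err - 0.5 * spread

# Synthetic stand-in data (NOT the study's data), with small hypothetical sizes
rng = np.random.default_rng(0)
n_forecasts, n_stations, n_leads, n_members = 6, 4, 32, 10
obs = 15.0 + rng.normal(size=(n_forecasts, n_stations, n_leads))
members = obs[..., None] + rng.normal(size=obs.shape + (n_members,))

crps = crps_ensemble(members, obs)      # shape (n_forecasts, n_stations, n_leads)
crps_per_lead = crps.mean(axis=(0, 1))  # one value per lead time (1..32 days)
crps_overall = crps.mean()              # single summary score
print(crps_per_lead.shape, round(float(crps_overall), 2))
```

Averaging over the forecast and station axes gives one CRPS value per lead time, while averaging over all axes gives a single summary score of the kind reported for each model.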
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-2591', Anonymous Referee #1, 20 Sep 2024
The article evaluates three main models on their ability to predict water temperature at specific river locations in Switzerland. It then compares these models to a set of three simpler, more traditional ML models (RF, ARX, and MLP). The models are evaluated in three distinct settings, namely when trained on data from all stations or only on a subset of stations, and when predicting water temperature at gauged and ungauged stations. The models provide quantile forecasts as their predictions and therefore directly a measure of uncertainty, which is important in real-world applications. In addition to investigating the predictive skill of each model, the article also provides an analysis of feature importance for the best DL model (Temporal Fusion Transformer).
All in all, this work provides a valuable comparison of multiple model architectures for time series forecasting and will probably act as a guidepost for future work.
Main comments:
1) The model description (starting at L.123) is a bit crammed, and one architecture illustration for each of the three deep learning models would go a long way towards making later aspects more understandable. It is not clear that NHITS, RNNED and TFT all use encoders and decoders, nor where exactly they apply the encoder normalisation. This becomes important on L. 212 (p. 7), where you describe setup B, i.e., the models trained with data from 20 fewer stations, and the swap from encoder normalisers to group normalisers. The suggestion is therefore to include (maybe simplified) diagrams of the models' architectures to aid the understanding of the later adaptation to the encoder. Another option would be to detail the encoding process for each model where its architecture is briefly described.
2) The description of the date index (DI) on L.111 leaves open the question of why it includes a shift by one month, such that DI = 0 falls approximately at the end of January instead of at the beginning. A short explanation with a reference would be helpful here.
3) Lastly, Section 2.1 describes the data used for training and mentions on L.97 that "catchment characteristics" are used. However, a list of which characteristics are considered is only given on L.186, in Section 2.3. The suggestion is to also explicitly mention the four static characteristics near L.97.
Citation: https://doi.org/10.5194/egusphere-2024-2591-RC1
AC1: 'Reply on RC1', Ryan Padrón, 17 Oct 2024
RC2: 'Comment on egusphere-2024-2591', Anonymous Referee #2, 27 Sep 2024
The authors evaluated three deep learning models for predicting daily maximum water temperatures at 54 river stations in Switzerland over the next 32 days. While the paper is generally well-structured, there are several issues that need to be addressed.
- Please add an LSTM model as a benchmark.
- Please provide common evaluation metrics such as correlation coefficient, NSE, and MAE. These metrics help readers compare this study's results with other related work.
- Line 25: The website provided by the authors does not seem to be updated in real time; the last update was on August 30th. Please ensure the data is up to date.
- Line 31: The use of "e.g." followed only by references seems unconventional. The authors should add specific examples or restructure the sentence.
- Line 42: Please explain what "right direction" means in this context.
- Please explain the significance of 52 in Equation 1.
- Lines 114-116: Using single grid cell data for some time series variables is inappropriate. For variables like precipitation and air temperature, the authors should calculate catchment averages rather than point values.
- Line 119: The authors use Swiss Federal Office for Meteorology and Climatology data for training, and ECMWF data for the 32-day meteorological forecast as input for water temperature forecasting. These two datasets may be inconsistent in spatial-temporal resolution and data distribution. The authors must validate the effectiveness after switching datasets. Suggestion: Assuming today is August 1, 2024, the authors should use ECMWF data from July 1 to August 1, 2024, for prediction and compare it with actual measurements to verify the model's performance in real forecasting scenarios.
- The authors use 2022 data for testing, but the model can only predict 32 days at a time. Please explain how a full year of predictions is obtained. Is a sliding window method used? If so, please provide details.
- Line 164: Please explain the meaning of "We use 24 forecast creation times every 15 days".
- Line 179: The authors tuned parameters for the three deep learning models but used default parameters for ARX, RF, and MLP, which results in an unfair comparison. It is recommended to optimize parameters for all models to ensure a fair comparison.
- In the appendix, table headers should be placed above the tables.
- The meaning of Table S3 is not clear. Please explain.
- Line 196: Why don't ARX, RF, and MLP use static features? If input data is inconsistent, the comparison loses meaning.
- Figure 2(b) is inappropriate because the x-axis (station ID) has no sequential relationship. It is suggested to use a box plot instead of a line graph.
- Figure 2(a): For lead times 1, 2, 3...32, does the CRPS refer to metrics calculated from predictions for the entire year 2022? Please explain how a full year of predictions and CRPS values are obtained through lead times 1, 2, 3...32.
- Figure 2(b): Different forecast start times will result in different CRPS values. Does this mean the results are calculated from each forecast start date to the end of 2022? Please explain the calculation method in detail.
- Please explain lines 237-239 in detail. What does "omitting the uncertainty stemming from the meteorological forecasts" mean? What specific operation does "when using their observed values instead of their forecasts over the 32 days of the prediction horizon" refer to?
- Line 311: Please explain the meaning of "90 forecast start dates".
- In Section 3.3, RF models can also calculate feature importance. It is suggested that the authors calculate the feature importance of the RF model and compare it with the results of the deep learning models.
Citation: https://doi.org/10.5194/egusphere-2024-2591-RC2
AC2: 'Reply on RC2', Ryan Padrón, 17 Oct 2024
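The common evaluation metrics requested by Referee #2 (correlation coefficient, NSE, and MAE) could be computed along the following lines. This is a minimal sketch, assuming a deterministic summary of each probabilistic forecast (for example the forecast median) is compared with observations; the function names and the small example arrays are hypothetical and not taken from the study.

```python
import numpy as np

def mae(obs, sim):
    """Mean absolute error, in the units of the data."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.mean(np.abs(sim - obs)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def pearson_r(obs, sim):
    """Pearson correlation coefficient between observations and predictions."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.corrcoef(obs, sim)[0, 1])

# Hypothetical daily maximum water temperatures (degrees C), for illustration only
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 17.0]  # e.g. the median of a quantile forecast

print(mae(observed, predicted), nse(observed, predicted), pearson_r(observed, predicted))
```

Unlike the CRPS, these metrics score only a single point prediction, so they ignore the forecast spread; that is one reason they complement rather than replace a probabilistic score.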