the Creative Commons Attribution 4.0 License.
Real-time flood forecasting with Machine Learning using scarce rainfall-runoff data
Abstract. Flooding is the most devastating natural hazard that our society must adapt to worldwide, especially as the severity and the occurrence of flood events intensify with climate change. Several initiatives have joined efforts in monitoring and modelling river hydrodynamics, in order to provide Decision Support System services with accurate flood prediction at extended forecast lead times. This work presents how fully data-driven machine learning models predict discharge with better performance and extended lead-time, with respect to the current empirical Lag and Route model used operationally at the local flood forecasting services for the Garonne River in Toulouse. The database is composed of discharge and rainfall data, upstream of Toulouse, for 36 flood events over the past 15 years (40 k data points). This scarce data set is used to train a Linear Regression, a Gradient Boosting Regressor and a MultiLayer Perceptron in order to forecast the discharge in Toulouse at 6-hour and 8-hour lead times. We showed that the machine learning approach outperforms the empirical Lag and Route for 6-hour lead-time. It also provides a reliable solution for extended lead times and saves the implementation of a new empirical Lag and Route model. It was demonstrated that the scarcity and the heterogeneity of the data heavily weigh on the learning strategy and that the layout of the learning and validation sets should be adapted to the presence of outliers. It was also shown that the addition of rainfall data increases the predictive performance of machine learning models, especially for longer lead times. Different strategies for rainfall data preprocessing were investigated. This study concludes that, with the present test case, time-averaged rain information should be favored over instantaneous or time varying data.
Status: open (extended)
RC1: 'Comment on egusphere-2023-2621', Anonymous Referee #1, 18 Mar 2024
Review of the paper hess-2023-2621: Real-time flood forecasting with Machine Learning using scarce rainfall-runoff data
General comments
The aim of this article is to implement and discuss the application of three data-driven models for flood forecasting on the Garonne River in Toulouse. The three models are a linear model, a Multilayer Perceptron (MLP) and a Gradient Boosting Regressor (GBR). The title of the article mentions the fact that the available data are scarce, but this seems inaccurate to me. The database is derived from meteorological radar data and water level and flow data from several upstream stations, with a 1 h time step, which seems to be a very satisfactory basis for designing such models.
The approach relies on a methodology based on choices that are not always well explained: why radar rather than rain gauge data are used, how the forecast horizons are chosen, why the flow increment rather than the flow itself is calculated, why the rain upstream of Toulouse is neglected, etc. But the methodology used seems rigorous insofar as the validation and test sets are quite different, and hyperparameters are selected by cross-validation. The authors also make no secret of the intense event of 2018 on which the forecasts fail. The description of the database lacks cross-correlations between rainfall and discharge, or better still, between rainfall and discharge increases. This would help identify propagation or response times, and the significance of the rainfall used.
Quality criteria: the Nash and persistence criteria are both used, which is good and rigorous. The EPR focuses on the peak, but it seems to me that measuring only the error on the peak's maximum value would be more selective than the proposed calculation, because it avoids diluting the maximum value among the values of the peak as a whole. The results would, it seems to me, be very different.
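To make the criteria discussed above concrete, here is a minimal Python sketch of a Nash criterion, a persistence criterion, and a peak-only error of the kind suggested in this comment. The function names, the array alignment (`sim[i]` is the forecast valid at the time of `obs[i]`) and the exact formulas are assumptions made for illustration; they are not taken from the manuscript, whose EPR definition may differ.

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash criterion: 1 is a perfect fit, 0 is no better than the observed mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def persistence_criterion(obs, sim, lead):
    """Skill relative to the naive forecast 'Q(t + lead) = Q(t)'.

    Assumes lead >= 1 and that sim[i] is the forecast valid at the time of obs[i].
    1 is perfect, 0 is no better than persistence.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    target = obs[lead:]      # values being forecast
    forecast = sim[lead:]    # model forecasts for those values
    naive = obs[:-lead]      # last observation available at issue time
    return 1.0 - np.sum((target - forecast) ** 2) / np.sum((target - naive) ** 2)

def peak_errors(obs, sim):
    """Relative error on the maximum value and timing offset of the maximum (in steps)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    i_obs = int(np.argmax(obs))
    i_sim = int(np.argmax(sim))
    return (sim[i_sim] - obs[i_obs]) / obs[i_obs], i_sim - i_obs
```

A forecast that merely repeats the last observation scores 0 on the persistence criterion and shows a timing offset equal to the lead time, even when its Nash value looks acceptable, which is why reporting both criteria is informative.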
The main problem with the study is that all the models, whether linear or non-linear, produce virtually the same results, and that the latter are of no operational use, since the actual forecast horizon is closer to the hour than to the envisaged forecast horizon (4, 6, 8h).
Furthermore, the description of the models should state the exact inputs applied to each model. For example, it is clearly stated that upstream flows are applied, but is the flow at Toulouse also entered at time t?
If so, we should try to remove it. Indeed, this information is so important to the model that it runs the risk of ignoring the rainfall, especially if the latter is not very well observed.
My feeling is that the article presents a negative result: the neural networks fail to forecast the flow of the Garonne at Toulouse. In itself, a negative result can be published, but an attempt must then be made to explain it, and it is this last phase of the work that is lacking in the article. The authors do try to indicate that over-fitting is the cause of this lack of performance, but without argument. Moreover, this hypothesis contradicts the proposal to use an LSTM to perform prediction, as the LSTM is far more complex than, say, the MLP.
I would therefore urge the authors to attempt a more precise analysis of the quality of the precipitation applied, in order to find out whether it is the latter that is the cause of the models' inability to make a correct forecast for the 2018 event.
Specific comments
LL 81-82 “Most strategies are based on Neural Network (NN) models such as Multilayer Perceptron (MLP) (Riad et al., 2004; Mosavi et al., 2018; Noymanee and Theeramunkong, 2019), which is a simple version of feed-forward neural networks”. This sentence is not correct: MLPs can also be recurrent.
L 85 “highly correlated networks”. What does “highly correlated networks” mean?
L 107 “and the flow is quasi-linear,”. The flow in itself cannot be linear or non-linear. It is important to pay great attention to the strict meaning of what we write.
LL 89-90 “Recent publications also showed the use of advanced NN models such as Recurrent Neural Networks (RNNs) and more specifically Long Short-Term Memory networks (LSTMs)”. In fact, recurrent networks have been proposed since the 90s, as has the LSTM.
L 148 “It is here assumed that the effect of rainfall between these stations and Toulouse is negligible”: A correlation analysis between flow increase and rainfall between the upstream stations and Toulouse would be a good way of proving the validity of this assumption.
Eq. 2.2. The coherence of equation 2.2. doesn't immediately strike me: why do we have t-n and positive times in the first line, and t+n and negative times in the third? It seems to me that both lines must be written in the same way.
Results analysis
First of all, the criterion tables in Figure 2 show that the results are quite good, except for the July 2018 event. However, when we look at the limnigram in Figure 3, the result is far less satisfactory. All the forecast peaks are about 4 or 5 hours late. This is obviously not satisfactory for a 6-hour forecasting horizon, especially knowing that the event is in the training database. The performance of the different models is not significantly different.
The authors express the view that the poor results could be due to over-fitting. However, to be able to draw any real conclusions, we would need to know the complexity of the MLP: how many layers, how many neurons, how many parameters? This is important if the radar image is to be applied as is.
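As an illustration of the complexity figure requested above, the following Python sketch counts the trainable parameters of a fully connected feed-forward network with one bias per unit. The function name and the layer sizes in the usage note are hypothetical; the manuscript's actual architecture is not known from this comment.

```python
def mlp_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected feed-forward network.

    layer_sizes lists the number of units per layer, input layer first and
    output layer last. Each layer contributes (n_in + 1) * n_out parameters:
    n_in weights plus one bias for each of its n_out units.
    """
    return sum(
        (n_in + 1) * n_out
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )
```

For a hypothetical network with 10 inputs, two hidden layers of 32 neurons and one output, `mlp_parameter_count([10, 32, 32, 1])` gives 1441 parameters. Comparing such a figure with the roughly 40 k data points quoted in the abstract is one way to support or weaken the over-fitting hypothesis.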
We can see that as the forecast horizon increases, the offset of the forecast curve increases accordingly.
The diagnosis that can be made is that the model waits for the observed flow value to increase at its input before passing this increase on to its output. The model therefore relies essentially on its flow inputs and neglects rainfall inputs. This is the main drawback of the feedforward model. This could indicate that the rainfall is not of good quality, or that the rain is falling very close to Toulouse and that the response time is therefore very short for the 2018 event. It would be good to discuss this point in the article. The visualization of the rainfall on the graph does not remove the doubt because, unless I have misread it, we don't know exactly what it represents (the average over all pixels?). If this is the case, it can't help us to answer the previous question.
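The delay described above can be quantified rather than read off the figure. The sketch below, in Python, searches for the shift that best aligns a forecast series with the observations; the function name and the RMSE-based criterion are assumptions chosen for illustration, not the authors' method.

```python
import numpy as np

def best_alignment_lag(obs, sim, max_lag):
    """Return the delay (in time steps) of sim relative to obs that minimizes RMSE.

    A result of k means sim[t + k] matches obs[t] best, i.e. the forecast
    reproduces the observed signal k steps late. Only non-negative delays
    from 0 to max_lag are tested.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    best_lag, best_rmse = 0, np.inf
    for lag in range(max_lag + 1):
        a = obs[: len(obs) - lag]   # observations at time t
        b = sim[lag:]               # forecasts at time t + lag
        rmse = np.sqrt(np.mean((a - b) ** 2))
        if rmse < best_rmse:
            best_lag, best_rmse = lag, rmse
    return best_lag
```

If the delay found this way grows with the forecast horizon and stays close to it, the model is behaving like a persistence forecast driven by its flow inputs, which is the behavior suspected here.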
Moreover, rainfall is only applied to the model via an instantaneous value (of the entire radar image?) or via the average (over 24h -2h). My recommendation is therefore twofold:
- firstly, to calculate the cross-correlation between radar rainfall and flow increase, and secondly, to do the same calculation with rain gauge information (several to introduce spatialization). If rain gauges have a better correlation with flow increase, then we try to use them.
- To study the question of model initialization. This doesn't seem to be decisive, as all the models show the same behavior even though they have different architectures, but it is important information.
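The cross-correlation recommended in the first item can be sketched as follows in Python. This is a minimal illustration that assumes two hourly series of equal length, one rainfall series (radar average or a single rain gauge) and one discharge series; the function name and the alignment convention are hypothetical.

```python
import numpy as np

def lagged_correlation(rain, discharge, max_lag):
    """Pearson correlation between rain[t] and the discharge increase at t + lag.

    The discharge increase is dq[t] = Q[t + 1] - Q[t]. The function returns an
    array whose entry k is the correlation for a lag of k time steps, for
    k = 0..max_lag.
    """
    rain = np.asarray(rain, dtype=float)[:-1]            # trim to the length of dq
    dq = np.diff(np.asarray(discharge, dtype=float))     # discharge increments
    corr = []
    for lag in range(max_lag + 1):
        x = rain[: len(rain) - lag]   # rainfall at time t
        y = dq[lag:]                  # discharge increase at time t + lag
        corr.append(np.corrcoef(x, y)[0, 1])
    return np.array(corr)
```

The lag with the highest correlation gives an estimate of the response time, and running the same calculation on radar and on rain gauge series allows the two rainfall sources to be compared on equal terms.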
Technical corrections
L 152: “As the non-linearity of the flow”. Flow alone is not linear or non-linear, but the relationship between upstream flow and downstream flow can be. Please tell us if this is the relationship we're talking about.
Eq. 2.2. The coherence of equation 2.2. doesn't immediately strike me: why do we have t-n and positive times in the first line, and t+n and negative times in the third? It seems to me that both lines could be written in the same way. This is not a good way of differentiating between calculation with mean value and calculation without mean value.
Table 1: Is Q or deltaQ used for input and output? What is indicated in the table seems to contradict what is written in equation 2.1, where it is deltaQ that is estimated. You also need to be careful with difference calculations: they amplify noise, unlike additions, which average it out.
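The remark on noise can be made explicit with a short calculation, under the assumption (not stated in the manuscript) that the measurement noise is additive, zero-mean, independent from one time step to the next, and of constant variance. The symbols below are introduced for illustration only.

```latex
% Observed discharge = true discharge + independent noise of variance \sigma^2
Q^{\mathrm{obs}}_t = Q_t + \varepsilon_t, \qquad
\mathbb{E}[\varepsilon_t] = 0, \qquad
\operatorname{Var}(\varepsilon_t) = \sigma^2

% Differencing: the noise variance of the increment is doubled
\Delta Q^{\mathrm{obs}}_t = Q^{\mathrm{obs}}_t - Q^{\mathrm{obs}}_{t-1}
\;\Rightarrow\;
\operatorname{Var}\!\left(\varepsilon_t - \varepsilon_{t-1}\right) = 2\sigma^2

% Averaging over n time steps: the noise variance is divided by n
\bar{Q}^{\mathrm{obs}}_t = \frac{1}{n}\sum_{k=0}^{n-1} Q^{\mathrm{obs}}_{t-k}
\;\Rightarrow\;
\operatorname{Var}\!\left(\frac{1}{n}\sum_{k=0}^{n-1}\varepsilon_{t-k}\right) = \frac{\sigma^2}{n}
```

Since the hourly increment of the true discharge is usually much smaller than the discharge itself, the signal-to-noise ratio of the increment deteriorates even more than the factor of two on the variance suggests.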
The use of acronyms such as TDS adds nothing to the reading and forces the reader to go back and look up the definitions on the previous pages. It would be better to spell out all names.
LL 224-225: I do not understand the sentence: “The ML models are trained to predict an ensemble of discharge values in TPN, issued hourly at the targeted forecast lead time, for all events in the DB.” In fact, the test set cannot be the entire database, since the latter contains the learning events.
Figure 3: the date should be July 2018, not 2028.
Citation: https://doi.org/10.5194/egusphere-2023-2621-RC1