Real-time flood forecasting with Machine Learning using scarce rainfall-runoff data
Abstract. Flooding is the most devastating natural hazard that our society must adapt to worldwide, especially as the severity and the occurrence of flood events intensify with climate change. Several initiatives have joined efforts in monitoring and modelling river hydrodynamics in order to provide Decision Support System services with accurate flood predictions at extended forecast lead times. This work shows that fully data-driven machine learning models predict discharge with better performance and at longer lead times than the empirical Lag and Route model currently used operationally by the local flood forecasting service for the Garonne River in Toulouse. The database is composed of discharge and rainfall data, upstream of Toulouse, for 36 flood events over the past 15 years (40k data points). This scarce data set is used to train a Linear Regression, a Gradient Boosting Regressor and a MultiLayer Perceptron to forecast the discharge in Toulouse at 6-hour and 8-hour lead times. We show that the machine learning approach outperforms the empirical Lag and Route model at the 6-hour lead time. It also provides a reliable solution for extended lead times and avoids the need to implement a new empirical Lag and Route model. We demonstrate that the scarcity and the heterogeneity of the data weigh heavily on the learning strategy and that the layout of the learning and validation sets should be adapted to the presence of outliers. We also show that adding rainfall data increases the predictive performance of the machine learning models, especially for longer lead times. Different strategies for rainfall data preprocessing were investigated. This study concludes that, for the present test case, time-averaged rain information should be favored over instantaneous or time-varying data.
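As an illustration of the time-averaged rainfall preprocessing mentioned above, the following minimal sketch pairs the inputs known at time t with the discharge to be forecast at t + lead. It is not the study's actual pipeline: the function name `build_samples`, the choice of features (upstream discharge, current downstream discharge, mean rainfall), the hourly time step and the `rain_window` parameter are all assumptions made for this example.

```python
from statistics import fmean


def build_samples(upstream_q, rain, target_q, lead, rain_window):
    """Pair features known at time t with the downstream discharge at t + lead.

    Hypothetical helper: all series are assumed to be equally long lists
    sampled at the same regular time step (taken to be hourly here).
    """
    if not (len(upstream_q) == len(rain) == len(target_q)):
        raise ValueError("all series must have the same length")

    features, targets = [], []
    # Start once a full rainfall window is available; stop while t + lead exists.
    for t in range(rain_window - 1, len(target_q) - lead):
        # Time-averaged rainfall over the last `rain_window` steps,
        # used in place of the instantaneous value rain[t].
        mean_rain = fmean(rain[t - rain_window + 1 : t + 1])
        features.append([upstream_q[t], target_q[t], mean_rain])
        targets.append(target_q[t + lead])
    return features, targets
```

The resulting feature rows and targets could then be passed to any regressor, such as a linear regression, a gradient boosting regressor or a multilayer perceptron, with `lead` set to 6 or 8 under the hourly assumption.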