Preprints
https://doi.org/10.5194/egusphere-2023-2621
31 Jan 2024

Real-time flood forecasting with Machine Learning using scarce rainfall-runoff data

Théo Defontaine, Sophie Ricci, Corentin J. Lapeyre, Arthur Marchandise, and Etienne Le Pape

Abstract. Flooding is the most devastating natural hazard that our society must adapt to worldwide, especially as flood events become more severe and more frequent with climate change. Several initiatives have joined efforts to monitor and model river hydrodynamics in order to provide Decision Support System services with accurate flood predictions at extended forecast lead times. This work shows how fully data-driven machine learning models predict discharge with better performance and at longer lead times than the empirical Lag and Route model currently used operationally by the local flood forecasting service for the Garonne River in Toulouse. The database is composed of discharge and rainfall data, upstream of Toulouse, for 36 flood events over the past 15 years (40 k data points). This scarce data set is used to train a Linear Regression, a Gradient Boosting Regressor and a MultiLayer Perceptron to forecast the discharge in Toulouse at 6-hour and 8-hour lead times. The machine learning approach outperforms the empirical Lag and Route model at the 6-hour lead time. It also provides a reliable solution for extended lead times and avoids the need to implement a new empirical Lag and Route model. The scarcity and heterogeneity of the data weigh heavily on the learning strategy, and the layout of the learning and validation sets should be adapted to the presence of outliers. Adding rainfall data increases the predictive performance of the machine learning models, especially at longer lead times. Different strategies for rainfall data preprocessing were investigated. For the present test case, the study concludes that time-averaged rain information should be favored over instantaneous or time-varying data.
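As an illustration of the idea of pairing present observations with a future discharge, the following minimal sketch builds training samples for a fixed lead time using time-averaged rainfall. It is not the authors' pipeline: the function name `build_samples`, the hourly time step, the 3-step averaging window and all numerical values are assumptions made for this example only.

```python
from statistics import mean


def build_samples(upstream_q, rain, target_q, lead, rain_window):
    """Pair features observed at step t with the target discharge at t + lead.

    Features are the upstream discharge at t and the rainfall averaged over
    the last `rain_window` steps ending at t (time-averaged rain).
    All names and the feature choice are hypothetical.
    """
    if not (len(upstream_q) == len(rain) == len(target_q)):
        raise ValueError("all series must share the same length")
    features, targets = [], []
    # Start once a full rainfall window exists; stop while t + lead is in range.
    for t in range(rain_window - 1, len(target_q) - lead):
        mean_rain = mean(rain[t - rain_window + 1 : t + 1])
        features.append((upstream_q[t], mean_rain))
        targets.append(target_q[t + lead])
    return features, targets


# Synthetic values for illustration only (not data from the study).
# One step is assumed to be one hour, so lead=6 stands for a 6-hour lead time.
upstream_q = [100, 110, 130, 160, 200, 240, 260, 250, 230, 210, 190, 170]
rain = [0, 2, 4, 6, 4, 2, 0, 0, 0, 0, 0, 0]
target_q = [90, 92, 95, 100, 108, 120, 140, 170, 205, 235, 250, 245]

X, y = build_samples(upstream_q, rain, target_q, lead=6, rain_window=3)
```

Each row of `X` could then be passed to any regressor together with the matching entry of `y`. Changing `lead` from 6 to 8 shifts the targets further into the future and leaves fewer usable samples, which gives a sense of why data scarcity matters more at longer lead times.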

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2023-2621', Anonymous Referee #1, 18 Mar 2024
    • AC2: 'Reply on RC1', Théo Defontaine, 31 Jul 2024
  • RC2: 'Reply on Real-time flood forecasting with Machine Learning using scarce rainfall-runoff data', Anonymous Referee #2, 31 May 2024
    • AC1: 'Reply on RC2', Théo Defontaine, 31 Jul 2024

Short summary
This work presents how machine learning models predict discharge and outperform the empirical model used operationally in Toulouse at a 6 h lead time. The database of 40 k points includes discharge and rainfall data for 36 flood events. The approach also provides a reliable solution for an extended 8 h lead time. The scarcity and heterogeneity of the data, especially in the presence of outliers, weigh heavily on the learning strategy. Rainfall data processing increases predictive performance.