Skilful probabilistic predictions of UK floods months ahead using machine learning models trained on multimodel ensemble climate forecasts
Abstract. Seasonal streamflow forecasts are an important component of flood risk management. Hybrid forecasting methods that predict seasonal streamflow using machine learning (ML) models driven by climate model outputs are currently underexplored, yet have some important advantages over traditional approaches using hydrological models. Here we develop a hybrid subseasonal-to-seasonal streamflow forecasting system to predict the monthly maximum daily streamflow up to four months ahead. We train a random forest ML model on dynamical precipitation and temperature forecasts from a multimodel ensemble of 196 members (eight seasonal climate forecast models) from the Copernicus Climate Change Service (C3S) to produce probabilistic hindcasts for 579 stations across the UK for the period 2004–2016. We show that multi-site ML models trained on pooled catchment data together with static catchment attributes are significantly more skilful than single-site ML models trained on data from each catchment individually. Considering all initialization months, 60 % of stations show positive skill (continuous ranked probability skill score, CRPSS > 0) relative to climatological reference forecasts in the first month after initialization. This falls to 41 % in the second month, 38 % in the third month and 33 % in the fourth month.
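As an illustration of the skill criterion quoted above (CRPSS > 0 relative to a climatological reference), the following minimal Python sketch computes the continuous ranked probability score (CRPS) of an ensemble forecast and the corresponding skill score. It is not taken from the study: the function names `crps_ensemble` and `crpss` are hypothetical, the standard empirical ensemble estimator of the CRPS is assumed, and the climatological reference is assumed to be supplied as an ensemble of historical values.

```python
import numpy as np


def crps_ensemble(members, obs):
    """Empirical CRPS of one ensemble forecast against one observation.

    Assumed estimator: mean|x_i - y| - 0.5 * mean|x_i - x_j|.
    """
    members = np.asarray(members, dtype=float)
    # Mean absolute error of the members against the observation.
    accuracy = np.mean(np.abs(members - obs))
    # Half the mean absolute difference between all pairs of members.
    spread = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return accuracy - spread


def crpss(forecasts, references, observations):
    """Skill score 1 - CRPS_forecast / CRPS_reference, averaged over cases.

    Positive values mean the forecast beats the reference; 1 is perfect.
    """
    crps_fc = np.mean(
        [crps_ensemble(f, o) for f, o in zip(forecasts, observations)]
    )
    crps_ref = np.mean(
        [crps_ensemble(r, o) for r, o in zip(references, observations)]
    )
    return 1.0 - crps_fc / crps_ref


# Illustrative values only (not data from the study).
forecasts = [[1.0, 1.0]]    # hypothetical forecast ensemble for one month
references = [[0.0, 2.0]]   # hypothetical climatological ensemble
observations = [1.0]        # hypothetical observed monthly maximum flow
skill = crpss(forecasts, references, observations)
```

Under this definition, a station counts as showing positive skill at a given lead time when `crpss` returns a value greater than zero; a forecast identical to the reference scores exactly zero.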