Using Random Forests to Predict Extreme Sea-Levels at the Baltic Coast at Weekly Timescales
Abstract. We have designed a machine-learning method to predict the occurrence of daily extreme sea levels at the Baltic Sea coast with lead times of a few days. The method is based on a Random Forest Classifier and uses spatially resolved fields of daily sea level pressure, surface wind, precipitation, and the prefilling state of the Baltic Sea as predictors of daily sea level exceeding the 95 % quantile at each of seven tide-gauge stations representative of the Baltic coast.
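As a minimal illustration of how such a classification target could be constructed, the following Python sketch labels days whose sea level exceeds the 95 % quantile and pairs the predictors of each day with the label a given number of days later. The function names, the synthetic series, the scalar pressure series standing in for a spatial field, and the choice of the inclusive quantile estimator are all assumptions made for illustration; they are not taken from the study.

```python
import statistics


def label_extremes(sea_level, quantile_pct=95):
    """Return (threshold, labels); a label is 1 where sea level exceeds the threshold."""
    # With n=100, quantiles() returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(sea_level, n=100, method="inclusive")
    threshold = cuts[quantile_pct - 1]
    labels = [1 if value > threshold else 0 for value in sea_level]
    return threshold, labels


def pair_with_lead(predictors, labels, lead_days):
    """Pair the predictor on day t with the label on day t + lead_days."""
    if lead_days < 0:
        raise ValueError("lead_days must be non-negative")
    n_pairs = len(labels) - lead_days
    return [(predictors[t], labels[t + lead_days]) for t in range(n_pairs)]


# Synthetic daily series (hypothetical values, not GESLA or ERA5 data).
sea_level = [float(day) for day in range(100)]
pressure = [1013.0 - 0.1 * day for day in range(100)]

threshold, labels = label_extremes(sea_level)
pairs = pair_with_lead(pressure, labels, lead_days=3)
```

In this sketch the last `lead_days` predictor days are dropped because no label exists for them; the resulting pairs are what a classifier would be trained on for that lead time.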
The method is purely data-driven and is trained with sea-level data from the Global Extreme Sea Level Analysis (GESLA) data set and with meteorological data from the ERA5 reanalysis of the European Centre for Medium-Range Weather Forecasts. The method satisfactorily predicts sea-level extremes at lead times of up to 3 days and identifies the relevant predictor regions. The sensitivity, measured as the proportion of correctly predicted extremes, is of the order of 70 %, depending on the station.
The proportion of false warnings, related to the specificity of the predictions, is typically as low as 10 to 20 %. For lead times longer than 3 days, the predictive skill degrades; at 7 days, it is comparable to that of a random prediction. These values are generally higher than those derived from storm-surge reanalysis of dynamical models.
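The two skill measures quoted above can be computed from a binary confusion matrix. The Python sketch below assumes that the proportion of false warnings is the false positive rate (one minus the specificity); the text does not state the exact definition, so this reading, the function name, and the example labels are assumptions for illustration only.

```python
def warning_scores(observed, predicted):
    """Compute sensitivity, specificity and false positive rate for 0/1 labels."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # extremes correctly warned
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # extremes missed
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false warnings
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # quiet days correctly passed

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Assumed definition of "proportion of false warnings".
        "false_positive_rate": 1.0 - specificity,
    }


# Hypothetical observed and predicted labels for ten days.
observed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = warning_scores(observed, predicted)
```

Because extremes above the 95 % quantile are rare by construction, plain accuracy would be dominated by the non-extreme days, which is why sensitivity and specificity are reported separately.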
The importance of each predictor depends on the location of the tide gauge. Usually, the most relevant predictors are sea level pressure, surface wind, and prefilling. Extreme sea levels in the northern Baltic are better predicted by surface pressure and the meridional surface wind component. By contrast, for stations located in the south, the most relevant predictors are surface pressure and the zonal wind component. Precipitation was not a relevant predictor for any of the stations analysed.
The Random Forest classifier does not need to be very complex, and the computing time to issue predictions is typically a few minutes on a personal laptop. The method can therefore be used as a pre-warning system that triggers more sophisticated algorithms to estimate the height of the ensuing extreme sea level, or as a signal to run larger ensembles with physically based numerical models.