the Creative Commons Attribution 4.0 License.
Computing Extreme Storm Surges in Europe Using Neural Networks
Abstract. Because of the computational costs of computing storm surges with hydrodynamic models, projections of changes in extreme storm surges are often based on small ensembles of climate model simulations. This may be resolved by using data-driven storm-surge models instead, which are computationally much cheaper to apply than hydrodynamic models. However, the potential performance of data-driven models at predicting extreme storm surges is unclear because previous studies did not train their models to specifically predict the extremes, which are underrepresented in observations. Here, we investigate the performance of neural networks at predicting extreme storm surges at 9 tide-gauge stations in Europe when trained with a cost-sensitive learning approach based on the density of the observed storm surges. We find that density-based weighting improves both the error and timing of predictions of exceedances of the 99th percentile made with Long Short-Term Memory (LSTM) models, with the optimal degree of weighting depending on the location. At most locations, the performance of the neural networks also improves by exploiting spatiotemporal patterns in the input data with a convolutional LSTM (ConvLSTM) layer. The neural networks generally outperform an existing multi-linear regression model, and at the majority of locations, the performance of especially the ConvLSTM models approximates that of the hydrodynamic Global Tide and Surge Model. While the neural networks still predominantly underestimate the highest extreme storm surges, we conclude that addressing the imbalance in the training data through density-based weighting helps to improve the performance of neural networks at predicting the extremes and is a step towards their use for climate projections.
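The density-based weighting described in the abstract can be sketched as follows. This is a minimal illustration of cost-sensitive sample weighting in the spirit of DenseLoss, not the authors' exact implementation: the Gaussian KDE, the Silverman bandwidth rule, the normalisation steps, and the names `density_weights` and `weighted_mse` are all assumptions for the sketch.

```python
import numpy as np

def density_weights(y, alpha=1.0, eps=1e-6, bandwidth=None):
    """Per-sample weights that up-weight rare target values.

    Estimate the density of the training targets with a Gaussian KDE,
    normalise it to [0, 1], and give low-density (rare, extreme) samples
    weights close to the maximum while common samples are down-weighted
    by a factor controlled by alpha.
    """
    y = np.asarray(y, dtype=float)
    if bandwidth is None:
        # Silverman's rule of thumb
        bandwidth = 1.06 * y.std() * len(y) ** (-1 / 5)
    # Gaussian KDE evaluated at each training target
    diffs = (y[:, None] - y[None, :]) / bandwidth
    dens = np.exp(-0.5 * diffs**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    dens = (dens - dens.min()) / (dens.max() - dens.min())  # normalise to [0, 1]
    w = np.maximum(1.0 - alpha * dens, eps)  # rare values keep high weight
    return w / w.mean()                      # mean-1 weights keep the loss scale stable

def weighted_mse(y_true, y_pred, w):
    """Cost-sensitive MSE with per-sample weights."""
    return np.mean(w * (np.asarray(y_true) - np.asarray(y_pred)) ** 2)
```

In practice such weights would be passed to the training loop of the LSTM (e.g. as per-sample weights on the loss), with alpha controlling how strongly the rare extremes are emphasised.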
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-196', Anonymous Referee #1, 10 Feb 2025
- AC1: 'Reply on RC1', Tim Hermans, 10 Sep 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-196/egusphere-2025-196-AC1-supplement.pdf
- RC2: 'Comment on egusphere-2025-196', Anonymous Referee #2, 06 Aug 2025
This paper presents an investigation into the use of neural networks (NNs) for predicting extreme storm surges. The core contribution is the application and evaluation of a cost-sensitive learning approach (DenseLoss) to specifically improve the prediction of the rare, high-impact events. Two NN architectures (LSTM and ConvLSTM) were compared against both a simpler statistical model (MLR) and a hydrodynamic model (GTSM) across nine European tide-gauge locations. This is a well-written paper on an interesting topic. I believe the manuscript could be strengthened by considering the following points.
- The paper identifies data imbalance as a major issue, but the choice of DenseLoss needs stronger justification. It is essentially a simple re-weighting technique; why was it chosen over more advanced options such as SMOGN? Does up-weighting rare events actually help the model learn their complex, nonlinear physics, or does it merely force better scores on a few outliers at the expense of overall physical consistency?
- The conclusions are based on an experimental setup with several fixed, important parameters—like using just nine tide gauges. Why these, and do they really capture Europe’s varied coastal dynamics? And how did you choose a 5×5° domain and a 24-hour lookback? A sensitivity analysis would show whether your results hold up when these parameters change.
- Based on your results, the NNs still tend to underestimate the very highest extremes (99.9th percentile). Since accurate tail behavior is key for hazard assessment, it would help to investigate why this happens. Is it a smoothing effect in the ERA5 reanalysis, or a limit in the network's ability to extrapolate even with DenseLoss? A brief investigation could really strengthen your conclusions.
- Noting that the best α parameter changes from site to site raises practical challenges: is the model capturing general patterns or simply fitting each location’s unique data distribution? If you must tune α for every gauge, rolling this out to hundreds of sites becomes both computationally heavy and methodologically challenging.
- The comparison between the ConvLSTM and GTSM would be more balanced if both models used the same input cadence. The ConvLSTM is driven by 3-hourly data, whereas GTSM benefits from hourly forcing, which may contribute to its sharper extreme peaks. Ideally, the authors could run GTSM on the same 3-hourly inputs; if that is not feasible, a clearer justification for the differing cadences would be helpful.
- The paper would be improved by a brief discussion of its findings in the context of other advanced architectures. The authors should consider contextualizing their work with respect to models like Graph Neural Networks (GNNs), hierarchical deep neural networks, and Gaussian Process models, which have been successfully applied to similar spatiotemporal problems. This would provide valuable perspective on why LSTM/ConvLSTM were chosen and how they fit within the rapidly evolving field.
Minor Comment:
- Line 487 (Appendix A): "Regularization and normalization help to avoid overfitting..." It should be "Regularization and dropout help to avoid overfitting...". Batch normalization serves a different primary purpose (stabilizing and accelerating training).
Citation: https://doi.org/10.5194/egusphere-2025-196-RC2
- AC2: 'Reply on RC2', Tim Hermans, 10 Sep 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-196/egusphere-2025-196-AC2-supplement.pdf
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
917 | 143 | 34 | 1,094 | 43 | 55
Review of the Manuscript “Computing Extreme Storm Surges in Europe Using Neural Networks”
The manuscript presents an approach to storm surge prediction using deep learning. Several methodological issues must be addressed to improve the manuscript's clarity and rigor. In particular, clarifying the dataset construction, justifying the hyperparameter choices, and improving the performance evaluation would significantly strengthen the manuscript. In its current form, the manuscript is not appropriate for publication.
Introduction:
Methodology – Data Preparation:
Model Training & Hyperparameter Tuning:
Performance Evaluation:
Discussion & Conclusion:
Minor Edits: