Probabilistic flood hazard mapping for dike-breach floods via graph neural networks
Abstract. Flood hazard maps are essential for protection and emergency plans, yet their probabilistic application is constrained by the computational cost of numerical models. Deep learning surrogates can provide orders of magnitude faster predictions, but their use for uncertainty quantification in realistic settings and their ability to incorporate hydraulic structures remain largely unexplored. Studying deep learning surrogates for probabilistic flood mapping is non-trivial because the lack of reference ground-truth data may lead to misleading confidence in predictions. Moreover, hydraulic structures are challenging to include due to their generally one-dimensional nature. In this work, we investigate the use of deep learning surrogates for realistic, large-scale flood simulations in case studies with hydraulic structures, under diverse boundary conditions. To this end, we employ the multi-scale hydraulic graph neural network (mSWE-GNN), which is transferable to different boundary conditions and locations and whose graph-based architecture makes it possible to represent structures such as canals, underpasses, and elevated elements as inputs. To address the lack of reference ground-truth data, we further introduce the average relative mass error (ARME), a mass-conservation-based criterion that helps identify physically plausible simulations. We apply the model to dike ring 41 in the Netherlands, generating probabilistic flood maps that account for uncertainties in breach location and breach outflow hydrographs. The model was trained on 30 simulations, generated with Delft3D, and evaluated against unseen benchmark simulations from the Dutch national flood catalogue, achieving a critical success index (CSI) of 73.6 % while running 10,000 times faster than the numerical simulator. The proposed ARME is negatively correlated with the CSI, with a Pearson correlation coefficient of −0.7, making it a useful indicator of simulation plausibility when evaluating unseen case studies. We obtained probabilistic flood maps by running 10,000 different flooding scenarios on a computational mesh of 180,000 cells in approximately 10 hours, with about half of the simulations classified as plausible based on the mass-conservation check. This framework offers a practical tool for rapid probabilistic flood hazard assessment and a way to prioritize detailed physical simulations, supporting more efficient and robust flood risk management.
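For reference, the CSI reported in the abstract can be computed from binary flooded/non-flooded maps as in the minimal sketch below; the 0.05 m wet/dry threshold and the variable names are illustrative assumptions, not taken from the manuscript.

```python
import numpy as np

def critical_success_index(pred_wet, obs_wet):
    """CSI = hits / (hits + misses + false alarms) for binary flood extents."""
    hits = np.sum(pred_wet & obs_wet)
    misses = np.sum(~pred_wet & obs_wet)
    false_alarms = np.sum(pred_wet & ~obs_wet)
    return hits / (hits + misses + false_alarms)

# Illustrative use: flag cells as wet above an assumed 0.05 m depth threshold,
# then compare the surrogate extent with the reference (numerical) extent.
rng = np.random.default_rng(0)
pred_depth = rng.random(180_000) * 0.2  # placeholder surrogate water depths [m]
obs_depth = rng.random(180_000) * 0.2   # placeholder reference water depths [m]
print(f"CSI = {critical_success_index(pred_depth > 0.05, obs_depth > 0.05):.3f}")
```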
The study employed the mSWE-GNN model developed by the authors to investigate its applicability to large-scale flood simulations that incorporate information on hydraulic structures, and introduced a mass-conservation-based metric to evaluate model performance in the absence of ground-truth hydraulic data. Overall, the study is comprehensive and the findings are meaningful for more informed and efficient flood risk management. However, I have several comments and suggestions, listed below, to improve the current manuscript.
1) Line 8: The acronym “mSWE-GNN” is expanded by the authors as “multi-scale hydraulic graph neural network”. While “SWE” presumably stands for “shallow water equations”, strictly speaking this is not the same as “hydraulic”; please make the acronym and its expansion consistent.
2) Line 16, Figure 11, and Table 2: Pearson’s r is not a good measure of correlation in this case, because it is sensitive to outliers and is not appropriate when the two variables do not show a clear linear relationship. Spearman’s rank correlation coefficient would be a better choice here (a brief illustration is given below).
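To illustrate this suggestion, a minimal sketch comparing the two coefficients on synthetic ARME–CSI pairs; the data below are made up solely to show the SciPy calls involved and are not the manuscript’s values.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the per-simulation ARME and CSI values of Figure 11.
rng = np.random.default_rng(0)
arme = rng.lognormal(mean=-2.0, sigma=0.8, size=50)   # skewed, with outliers
csi = np.clip(0.9 - 0.5 * np.sqrt(arme) + rng.normal(0, 0.05, 50), 0, 1)

pearson_r, pearson_p = stats.pearsonr(arme, csi)       # assumes linearity, outlier-sensitive
spearman_rho, spearman_p = stats.spearmanr(arme, csi)  # rank-based, monotonic association
print(f"Pearson r    = {pearson_r:.2f} (p = {pearson_p:.2g})")
print(f"Spearman rho = {spearman_rho:.2f} (p = {spearman_p:.2g})")
```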
3) Lines 25-28: It should be noted that uncertainty in the model evaluation should not be ignored, given the sampling uncertainty that arises from the limited spatial and temporal coverage of the evaluation data. The authors can refer to the paper below for more information on the limitations of some commonly used evaluation metrics in flood modeling; a minimal bootstrapping sketch follows the reference.
Reference: “Beyond a fixed number: Investigating uncertainty in popular evaluation metrics of ensemble flood modeling using bootstrapping analysis” (https://doi.org/10.1111/jfr3.12982)
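As a minimal illustration of the bootstrapping idea advocated in that paper, the sketch below estimates a percentile confidence interval for the mean per-scenario CSI; the CSI values used here are synthetic placeholders, not results from the manuscript.

```python
import numpy as np

def bootstrap_ci(metric_values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of a per-scenario
    evaluation metric (e.g. CSI), resampling scenarios with replacement."""
    rng = np.random.default_rng(seed)
    values = np.asarray(metric_values)
    boot_means = np.array([
        rng.choice(values, size=values.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Illustrative use with synthetic per-scenario CSI values:
csi_per_scenario = np.clip(np.random.default_rng(1).normal(0.74, 0.08, 30), 0, 1)
print("95% CI for mean CSI:", bootstrap_ci(csi_per_scenario))
```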
4) Lines 30-32: Too many references are cited here. It is suggested to remove some of the older ones.
5) It is suggested to add a list of the acronyms used in the manuscript. The full term of each acronym then only needs to be spelled out the first time it appears, e.g., ARME and CSI.
6) Figure 1: It would be helpful to explain terms such as “Z_ee” in the figure or its caption.
7) Lines 113-114: What does the superscript “p” denote, and how is its value determined?
8) Figure 3: Is it necessary to force the mesh to align with the boundaries of structures and riverbanks?
9) Line 163: The predicted hydraulic variables are u0_hat, not u0; please correct the notation.
10) Figure 5: Please add units for both longitude and latitude.
11) Figure 7(d): What is the definition of the roughness coefficient?
12) Table 1: Are the numbers after “±” standard deviations or standard errors? For the MAE on the validation dataset, how should 1.41 ± 1.72 be interpreted, given that 1.41 − 1.72 < 0 while the MAE cannot be negative?
13) Line 344: Please correct the text “outlier Contrarily”.