Validation of the Open-Source Hydrodynamic Model SFINCS on Historical River Floods at the Global Scale

Sadana, Tarun; Aerts, Jeroen C. J. H.; Eilander, Dirk; Merz, Bruno; de Moel, Hans; Busker, Tim; Bril, Veerle; de Bruijn, Jens

doi:10.5194/egusphere-2025-4387

Preprints

06 Nov 2025

Abstract. We evaluate the performance of the Super-Fast INundation of CoastS (SFINCS) hydrodynamic model for simulating riverine floods, combined with a fully automated, open-source data preprocessing pipeline. To do this, we assessed the simulated extents of 499 historical flood events against satellite-derived flood extents, using the Critical Success Index (CSI) as the performance metric. Using simulated discharges from the Global Flood Awareness System (GloFAS) hydrological model, we found that SFINCS performance improved with upstream basin size, with a global mean CSI of 0.42 for basins with a large upstream area (>1,000 km²) and 0.29 for basins with a small upstream area (<50 km²). Our results illustrate the importance of accurate discharge input for flood hazard simulations: when the (globally simulated) GloFAS data were replaced with observed discharge data for ten events in the US, the CSI improved from 0.39 to 0.67. These results suggest that global hydrological model performance limits the accuracy of the flood hazard simulations. We also found a significant improvement in the CSI (from 0.37 to 0.57) for six US events when replacing our default ~30 m global digital elevation model (DEM; FABDEM) with a higher-resolution ~1 m DEM (3DEP). A sensitivity analysis of the bathymetry calculations revealed that the default 2-year return period discharge estimated from GloFAS is systematically underestimated, likely because annual block maxima are underrepresented, which resulted in underestimated channel dimensions. All of these factors caused a loss of detail that impacted model performance, especially in smaller headwater rivers. We recommend improving the estimation of bathymetry, for instance by employing the "gradually varying solver" method or by using data from the SWOT mission.
Furthermore, incorporating additional validation data, ideally including flood depth measurements, can greatly enhance our understanding of the model performance.
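The CSI used throughout the abstract scores a simulated flood extent against an observed one by counting agreeing and disagreeing flooded cells. A minimal Python sketch, assuming both extents are already boolean arrays on a common grid; the function name is hypothetical and not taken from the authors' pipeline:

```python
import numpy as np


def critical_success_index(simulated, observed):
    """CSI = hits / (hits + misses + false alarms) for two binary flood maps.

    Both inputs are boolean arrays on the same grid (True = flooded).
    Cells that are dry in both maps (correct negatives) do not enter the score.
    """
    sim = np.asarray(simulated, dtype=bool)
    obs = np.asarray(observed, dtype=bool)
    if sim.shape != obs.shape:
        raise ValueError("simulated and observed maps must share the same grid")
    hits = np.count_nonzero(sim & obs)           # flooded in both maps
    false_alarms = np.count_nonzero(sim & ~obs)  # flooded only in the model
    misses = np.count_nonzero(~sim & obs)        # flooded only in the observation
    total = hits + misses + false_alarms
    return float("nan") if total == 0 else hits / total


sim = np.array([[1, 1, 0], [1, 0, 0]], dtype=bool)
obs = np.array([[1, 0, 0], [1, 1, 0]], dtype=bool)
print(critical_success_index(sim, obs))  # 2 hits, 1 false alarm, 1 miss -> 0.5
```

Because cells that are dry in both maps are ignored, the score is not inflated by large dry areas around the floodplain.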

Received: 08 Sep 2025 – Discussion started: 06 Nov 2025

Status: final response (author comments only)

RC1: 'Comment on egusphere-2025-4387', Francesco Dottori, 15 Jan 2026

This paper describes a modelling framework for simulating riverine flooding, focusing in particular on its validation. The authors investigate the sensitivity of the model performance to different geographic conditions, input data and modelling parameters, leveraging a large validation set of flood events at global scale. This allows for drawing general conclusions that in my opinion are interesting for researchers and practitioners working with large-scale flood models.

Overall the paper is clear and well structured, with a detailed analysis of the results and in-depth discussions of several aspects influencing the skill of the model. I would also like to commend the open-source structure of the modelling framework, which gives more relevance to the work.

I have some moderate and minor comments and suggestions that should be addressed to improve the quality of the manuscript before publication.
Main comments:
Have you investigated the performance of the modelling framework in different climate zones? Hydrological models, including GloFAS, are usually less skillful in arid and cold regions, and this can negatively affect results. Given the large number of case studies, I think this factor could be included in the analysis.
Another point is the influence of flood protection structures. To my knowledge, global DEMs are currently not able to capture embankments along rivers, and this limitation can heavily affect flood estimates, as observed by Dottori et al. (2022) in several European river basins. The authors briefly discuss this issue at the end of Section 3.3, but I think it deserves more detail. For instance, is there a difference in performance between more protected regions (e.g. rivers in Europe and the US) and less protected ones (e.g. in Africa and South America)?
Minor comments:
Please add the scale on maps where appropriate (e.g. Figure 6 and 9)
L 47: "While local flood inundation studies have demonstrated promising results...": I would rather say that the use of local-scale flood models is a consolidated approach to map flood hazard and assess flood risk.
L 83-84: "Studies conducting event-based validation have not been global in scale, or they have focused on a limited number of events." I would include in this list the work by Risling et al. (2024), one of the few validation studies involving multiple global flood models. The outcomes of that study (e.g. Figure 2) might also be relevant when discussing the performance of the SFINCS modelling framework.
Section 3.2.3: My understanding is that all simulations described in this section are based on GloFAS discharge, is it correct?
L 777-778: If I remember well, the DEM used in this study was also based on higher-resolution DTMs; can the authors provide more references here?
Section 4: Are you planning to use SFINCS to produce global-scale flood maps, similar to what was done by the JRC (see, for instance, the latest update by Baugh et al., 2024)?
Can you include a link to the Global Flood Database and specify whether flood extent data are freely accessible?
L 1240: "Hence, the results were generated in a timely manner." This is rather vague. Can you provide a broad estimate of running times based on the temporal and spatial extent of the events?
References
Risling, A., Lindersson, S. & Brandimarte, L. A comparison of global flood models using Sentinel-1 and a change detection approach. Nat Hazards 120, 11133–11152 (2024). https://doi.org/10.1007/s11069-024-06629-7
Baugh, C.; Colonese, J.; D'Angelo, C.; Dottori, F.; Neal, J.; Prudhomme, C.; Salamon, P. (2024): Global river flood hazard maps. European Commission, Joint Research Centre (JRC) [Dataset] PID: http://data.europa.eu/89h/jrc-floods-floodmapgl_rp50y-tif

Citation: https://doi.org/10.5194/egusphere-2025-4387-RC1

RC2: 'Comment on egusphere-2025-4387', Anonymous Referee #2, 28 Jan 2026

General comments
I enjoyed reading the article by Sadana et al. which presents a landmark global validation of the open-source SFINCS hydrodynamic model using 499 historical riverine flood events. The study demonstrates that while SFINCS achieves a mean CSI of 0.39 using simulated discharges, performance is highly sensitive to input data quality and basin scale. The fully automated and reproducible, open-source workflow, which couples SFINCS with global hydrological models, represents a valuable contribution to the topic.

Overall the article is well written, using grammar and English wording adequate to the standards of a scientific journal. Models and data used are mostly open source, which is appreciated. Main concerns about this work are detailed below, followed by a list of more specific comments.

The main criticism relates to the appropriateness and coherence of the data used, which lead to results and limitations that are discussed as if they were findings of the work, though many of them could have been foreseen from the start. In fact, some of the choices made suggest that the authors did not fully analyze the range of use of each data source. For example, discharges derived from GloFAS v4 are, according to the published documentation, meaningful for river basins larger than 500 km² (see Grimaldi et al., 2022 and Baugh et al., 2024), due to the relatively coarse resolution of the model and, even more, of the meteorological input data. Here, such data are used to model floods in basins that in over 60% of cases are smaller than 500 km² (according to Table 3), and the authors later report a tendency of GloFAS discharges to underestimate peak flows with selected return periods. This suggests suboptimal knowledge and use of the data. Similarly, I found suboptimal the choice to model inundation at 30 m resolution in relatively small basins and then aggregate the output to 250 m for evaluation against MODIS data. The authors' comment on using higher-resolution data from SAR-based products could have been addressed already at this stage of the work, rather than proposed as an improvement for future work.
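Return-period discharges, such as the default 2-year discharge used for the bathymetry, are estimated from the annual block maxima of a discharge series. The sketch below assumes a Gumbel (EV1) distribution fitted by the method of moments; neither the manuscript abstract nor this comment states which distribution GloFAS or the authors use, and the function names are hypothetical:

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def annual_block_maxima(discharge_by_year):
    """Largest daily discharge of each year (the annual block maximum)."""
    return [max(q) for _, q in sorted(discharge_by_year.items()) if q]


def gumbel_return_level(annual_maxima, return_period=2.0):
    """Return-period discharge from a Gumbel (EV1) fit by the method of moments."""
    if len(annual_maxima) < 2:
        raise ValueError("need at least two annual maxima")
    if return_period <= 1:
        raise ValueError("return period must exceed one year")
    scale = math.sqrt(6.0) * stdev(annual_maxima) / math.pi
    location = mean(annual_maxima) - EULER_GAMMA * scale
    p = 1.0 - 1.0 / return_period  # non-exceedance probability
    return location - scale * math.log(-math.log(p))


q2 = gumbel_return_level([100.0, 150.0, 200.0, 250.0, 300.0])
print(round(q2, 1))  # 187.0
```

Under this distribution the 2-year discharge lies slightly below the mean of the annual maxima, so systematically low simulated peaks translate almost directly into a low 2-year discharge and, in turn, smaller derived channel dimensions.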

Lastly, the research questions to be tested in this work could be made clearer. Some side analyses such as the comparison of GloFAS versus USGS observations are a bit unexpected and distract the reader from the main application shown.

Specific comments
L23: a verb is missing before “replaced” (“is” or “was”).
L162-170: Matching hydrological and hydraulic river networks is far from simple and remains one of the main challenges for operational inundation forecasting at large scales. A matching based only on upstream area is generally not very accurate, particularly 1) when the resolutions of the hydraulic and hydrological model DEMs are substantially different (here ~30 m vs. ~5 km) and 2) when the two underlying DEMs have different sources (here FABDEM and MERIT). I encourage the authors to justify this choice and to provide validation for selected cases (or summary statistics for a larger sample).
L395: “the river roughness value (0.02) derived from widely accepted Manning’s roughness values”. River roughness depends on land use, and given the application to so many different floods and world regions, a single value cannot accurately represent roughness in all contexts. Why did you not link roughness to a land-use map? If the approach relies on a constant roughness everywhere, additional justification would be necessary, ideally with supporting references.
Sect 3.1.1 and Table 3: (following the main comments) Here the authors focused on classes of upstream area at and beyond the limits of applicability of GloFAS streamflows. GloFAS is a global system designed for large riverine floods and cannot capture the flood dynamics in river basins with small upstream areas. Its historical runs have daily resolution and are forced by ERA5 (~30 × 30 km resolution), hence with constant precipitation rates over one day and almost 1,000 km², which in the analysis shown is the threshold of the largest basin class. At such scales, actual floods are typically induced by shorter rainfall extremes at smaller spatial scales. The authors provided some sound justifications (L486-523) for the increasing performance in larger basins, and the choice of upstream-area classes was also driven by the aim of having classes with similar sample sizes. However, GloFAS streamflows are recommended for river basins larger than 500 km² (see Grimaldi et al., 2022 and Baugh et al., 2024). I recommend that the authors revise the performance analysis for the different upstream-area classes by using larger class limits, where the smallest class includes all basins smaller than 500 km², to show that performance deteriorates when using data outside the range of usability of the streamflow input.
L599-608: It seems this issue could be addressed to a good extent by merging the MODIS images with the permanent water layer before comparing them with the SFINCS output (after regridding both maps to a common grid).
Sect 3.2.2: As commented above, the pattern shown in the graph is a result of using modelled discharges from points with upstream area at (and beyond) the lower edge of the range of usability of GloFAS data.
L744: The difference in performance related to the use of different DEMs is impressive and would benefit from further investigation. It would be interesting to know whether it is mainly related to the resolution or to the product. For instance, one could resample the 3DEP DEM to progressively coarser grids from 1 to 30 m and test the differences in the SFINCS output and related performance.
L793: “covering approximately three to 35 events”. These numbers are quite specific. Add supporting references.
Why was the GEB model not used more extensively, given its better performance? Is it available only for India? Please include some additional detail to clarify.
L903: For the Global Flood Monitoring product, I think that more appropriate citations are Salamon et al. (2021) or Wagner et al. (2026).
Again, I think that adding higher-resolution SAR-based inundation maps would greatly benefit the analysis, thanks to resolutions comparable to or higher than that used for the SFINCS simulations, lower sensitivity to cloud cover, and the ability to fill the temporal gaps of days without acquisitions during floods. Indeed, resampling the model output from 30 m to 250 m sounds like a major limitation in the evaluation of the model performance, as also noted in the text (L579 onwards).
A summary figure of the distribution of the 499 events would be welcome.
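The river-network matching questioned in the L162-170 comment can be illustrated with a common upstream-area snapping rule: search a small window around the hydrological-model outlet on the hydraulic-model grid and keep the cell whose upstream area is closest in relative terms. This is a sketch under assumptions: the window radius, the 20% tolerance and the function name are hypothetical, and it is not necessarily the rule the authors implemented. Returning the relative error for every matched point is one simple way to produce the summary statistics the reviewer asks for:

```python
import numpy as np


def snap_by_upstream_area(upstream_area, row, col, target_km2, radius=3, max_rel_error=0.2):
    """Return (row, col, relative error) of the best-matching cell, or None.

    upstream_area: 2D array of upstream area (km²) on the hydraulic-model grid.
    (row, col): first-guess location of the hydrological-model outlet.
    target_km2: upstream area reported by the hydrological model at that outlet.
    """
    if target_km2 <= 0:
        raise ValueError("target upstream area must be positive")
    ua = np.asarray(upstream_area, dtype=float)
    r0, r1 = max(row - radius, 0), min(row + radius + 1, ua.shape[0])
    c0, c1 = max(col - radius, 0), min(col + radius + 1, ua.shape[1])
    rel_error = np.abs(ua[r0:r1, c0:c1] - target_km2) / target_km2
    i, j = np.unravel_index(np.nanargmin(rel_error), rel_error.shape)
    best = float(rel_error[i, j])
    if best > max_rel_error:
        return None  # no acceptable match: flag the point for manual checking
    return r0 + int(i), c0 + int(j), best
```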
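For the roughness comment (L395), a land-use-dependent roughness field can be derived with a lookup table instead of a single constant. In this sketch the class codes loosely follow the ESA WorldCover numbering and the Manning's n values are placeholders chosen for illustration only (an assumption, not recommended values); unmapped classes fall back to the constant 0.02 quoted in the comment:

```python
import numpy as np

# Hypothetical mapping from land-use class code to Manning's n (s m^-1/3).
# Codes loosely follow ESA WorldCover numbering; the n values are placeholders
# for illustration only and should be replaced by a referenced lookup table.
MANNING_BY_CLASS = {
    10: 0.10,   # tree cover (placeholder)
    30: 0.035,  # grassland (placeholder)
    50: 0.15,   # built-up (placeholder)
    80: 0.02,   # permanent water, set to the constant river value in the text
}


def roughness_from_landuse(landuse, table=MANNING_BY_CLASS, default_n=0.02):
    """Map a land-use raster to a Manning's n raster; unknown classes get default_n."""
    classes = np.asarray(landuse)
    n = np.full(classes.shape, default_n, dtype=float)
    for code, value in table.items():
        n[classes == code] = value
    return n
```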
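The re-binning proposed for Sect 3.1.1 and Table 3 amounts to grouping per-event CSI values by upstream-area class. A minimal sketch with a hypothetical function name; the 500 km² edge follows the comment, while the 1,000 km² edge is only an illustrative default, and the edges can be changed to compare against the manuscript's original classes:

```python
import numpy as np


def mean_csi_by_class(upstream_km2, csi, edges=(500.0, 1000.0)):
    """Mean CSI and event count per upstream-area class.

    Classes are [0, edges[0]), [edges[0], edges[1]), ..., [edges[-1], inf).
    """
    area = np.asarray(upstream_km2, dtype=float)
    score = np.asarray(csi, dtype=float)
    class_idx = np.digitize(area, edges)  # 0 for area < edges[0], and so on
    summary = {}
    for k in range(len(edges) + 1):
        if k == 0:
            label = f"< {edges[0]:g} km²"
        elif k == len(edges):
            label = f">= {edges[-1]:g} km²"
        else:
            label = f"{edges[k - 1]:g}-{edges[k]:g} km²"
        in_class = class_idx == k
        mean = float(score[in_class].mean()) if in_class.any() else float("nan")
        summary[label] = (int(in_class.sum()), mean)
    return summary
```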
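The fix suggested for L599-608 can be sketched as a logical OR of the MODIS flood mask and a permanent-water mask before scoring. This assumes the MODIS-based extent flags only flood pixels (excluding normal river water) and that both layers are already boolean arrays on a common grid; the function names and toy arrays are hypothetical:

```python
import numpy as np


def merge_permanent_water(modis_flood, permanent_water):
    """Observed water extent = MODIS flood pixels OR permanent water pixels."""
    flood = np.asarray(modis_flood, dtype=bool)
    water = np.asarray(permanent_water, dtype=bool)
    if flood.shape != water.shape:
        raise ValueError("regrid both layers to a common grid first")
    return flood | water


def csi(sim, obs):
    """Critical Success Index; assumes at least one cell is wet in either map."""
    hits = np.count_nonzero(sim & obs)
    return hits / (hits + np.count_nonzero(sim & ~obs) + np.count_nonzero(~sim & obs))


# Toy 2 x 3 example: column 0 is a permanent river channel that SFINCS keeps wet.
sfincs = np.array([[1, 1, 0], [1, 1, 0]], dtype=bool)
modis = np.array([[0, 1, 0], [0, 1, 0]], dtype=bool)  # flood pixels only
permanent = np.array([[1, 0, 0], [1, 0, 0]], dtype=bool)
print(csi(sfincs, modis))  # 0.5
print(csi(sfincs, merge_permanent_water(modis, permanent)))  # 1.0
```

In the toy case, the wet channel cells stop counting as false alarms once permanent water is part of the observed extent.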
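The DEM experiment proposed at L744 (resampling the 1 m 3DEP DEM to progressively coarser grids up to 30 m) could start from simple block averaging. Block averaging is only one of several possible resampling choices and is an assumption here; the function name and the `dem_1m` variable are hypothetical:

```python
import numpy as np


def coarsen_mean(dem, factor):
    """Block-average a 2D elevation grid by an integer factor.

    Rows/columns that do not fill a complete block at the far edges are dropped.
    Assumes no nodata cells (use np.nanmean on the reshaped array otherwise).
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    a = np.asarray(dem, dtype=float)
    ny = a.shape[0] // factor * factor
    nx = a.shape[1] // factor * factor
    blocks = a[:ny, :nx].reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))


# e.g. for a 1 m tile loaded as dem_1m (hypothetical variable):
# grids = {f: coarsen_mean(dem_1m, f) for f in (1, 2, 5, 10, 30)}
```

Running the same events on each coarsened grid from a single source DEM would separate the effect of resolution from that of the product, because block averaging smooths narrow features such as channels and embankments.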
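The concern about resampling the 30 m output to 250 m (L579 onwards) can be made concrete with a fraction-based aggregation, in which a coarse cell counts as flooded when at least a given share of its fine cells is flooded. The rule, the 0.5 threshold and the function name are assumptions, since the comment does not specify the aggregation method; 250 m is not an integer multiple of 30 m, so a factor of 8 only approximates it at 240 m:

```python
import numpy as np


def aggregate_flood_fraction(flood, factor, threshold=0.5):
    """Aggregate a fine binary flood map to coarse cells.

    Returns (coarse_mask, fraction): fraction is the share of flooded fine cells
    in each factor x factor block; a coarse cell counts as flooded when
    fraction >= threshold. Incomplete edge blocks are dropped.
    """
    f = np.asarray(flood, dtype=bool)
    ny = f.shape[0] // factor * factor
    nx = f.shape[1] // factor * factor
    blocks = f[:ny, :nx].reshape(ny // factor, factor, nx // factor, factor)
    fraction = blocks.mean(axis=(1, 3))
    return fraction >= threshold, fraction


# One 30 m-wide flooded strip inside an 8 x 8 block (~240 m coarse cell).
fine = np.zeros((8, 8), dtype=bool)
fine[:, 3] = True
mask, frac = aggregate_flood_fraction(fine, factor=8)
print(frac[0, 0], mask[0, 0])  # 0.125 False
```

A narrow flooded channel simulated at 30 m therefore disappears entirely at the coarse scale, which is the kind of detail loss the comment refers to.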

References
Baugh, C., Colonese, J., D'Angelo, C., Dottori, F., Neal, J., Prudhomme, C., and Salamon, P.: Global river flood hazard maps, European Commission, Joint Research Centre (JRC) [Dataset], 2024.
Grimaldi, S., Salamon, P., Disperati, J., Zsoter, E., Russo, C., Ramos, A., Carton De Wiart, C., Barnard, C., Hansford, E., Gomes, G., and Prudhomme, C.: GloFAS v4.0 hydrological reanalysis, European Commission, JRC131349, 2022.
Salamon, P., McCormick, N., Reimer, C., Clarke, T., Bauer-Marschallinger, B., Wagner, W., Martinis, S., Chow, C., Böhnke, C., Matgen, P., Chini, M., Hostache, R., Molini, L., Fiori, E., and Walli, A.: The New, Systematic Global Flood Monitoring Product of the Copernicus Emergency Management Service, in: 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, 1053–1056, https://doi.org/10.1109/IGARSS47720.2021.9554214, 2021.

Wagner, W., Bauer-Marschallinger, B., Roth, F., Raiger-Stachl, T., Reimer, C., McCormick, N., Matgen, P., Chini, M., Li, Y., Martinis, S., Wieland, M., Kraft, F., Festa, D., Hassaan, M., Tupas, M. E., Zhao, J., Seewald, M., Riffler, M., Molini, L., Kidd, R., Briese, C., and Salamon, P.: The fully-automatic Sentinel-1 Global Flood Monitoring service: Scientific challenges and future directions, Remote Sensing of Environment, 333, 115108, https://doi.org/10.1016/j.rse.2025.115108, 2026.

Citation: https://doi.org/10.5194/egusphere-2025-4387-RC2

Viewed

Total article views: 830 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
453	356	21	830	21	21

Views and downloads (calculated since 06 Nov 2025)

Month	HTML	PDF	XML	Total
Nov 2025	189	62	9	260
Dec 2025	58	102	4	164
Jan 2026	116	123	6	245
Feb 2026	90	69	2	161

Viewed (geographical distribution)

Total article views: 809 (including HTML, PDF, and XML). Of these, 809 have a defined geographic origin and 0 have an unknown origin.

Latest update: 28 Feb 2026

Short summary

We evaluated a global flood model using satellite data from 499 historical flood events across 96 countries. Our study shows that floods in rivers with larger upstream basins are modelled more accurately, and that using observed river gauge data and high-resolution elevation data can improve results. Our findings highlight the importance of large-scale validation and sensitivity analyses for enhancing future global flood hazard assessments and prediction accuracy.