the Creative Commons Attribution 4.0 License.
Review article: Hydrologically Enhanced Machine Learning Framework for Urban Flood Inundation Mapping Using Multi-Sensor Remote Sensing Data: A Case Study of Mumbai, India
Abstract. Complicated terrain, densely built-up surfaces, and a lack of reliable ground observations make urban flood mapping difficult in rapidly urbanizing coastal megacities. This study proposes a hydrologically enhanced machine learning architecture for automated urban flood inundation mapping that combines multi-sensor satellite data with a scalable decision support system (DSS). Sentinel-1 SAR, Sentinel-2 optical imagery, SRTM digital elevation data, and CHIRPS precipitation data were combined in Google Earth Engine to create a comprehensive predictor stack.
To explicitly model flood propagation controls that most data-driven models tend to omit, two new hydrologic-topographic predictors were created: the Relative Elevation Model (REM) and the River Network Index (RNI), which represent local terrain depressions and hydraulic connectivity. A consensus-based combination of SAR backscatter change, optical water indices, and topographic constraints produced flood labels comprising approximately 2.6 × 10⁵ flood pixels in the Mumbai Metropolitan Region during the 2019 monsoon season. A representative training set was formed using balanced stratified sampling for use in the supervised classification. Random Forest, optimized XGBoost, and ensemble models were created and tested in Python using standard classification metrics. The tuned XGBoost model performed best, with an overall accuracy of 71.7 percent and an area under the receiver operating characteristic curve (AUC) of 0.803, outperforming the Random Forest and ensemble configurations. The improvement in model discrimination was statistically significant at the 95 percent confidence level. Ablation analysis revealed that including REM and RNI increased model discrimination by approximately 5–6 percent in AUC, which demonstrates their importance in urban flood detection. The predicted inundation pattern shows high spatial agreement with known flood-prone regions along the major drainage lines.
The proposed approach is reproducible, scalable, and hydrologically informed, and has high potential for operational flood monitoring and decision support in data-limited tropical cities.
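The balanced stratified sampling step mentioned in the abstract could look roughly like the sketch below. It assumes "balanced" means an equal number of flood and non-flood pixels; the abstract does not state the exact scheme, and the function name `balanced_sample` and its parameters are hypothetical.

```python
import random


def balanced_sample(labels, n_per_class, seed=0):
    """Pick the same number of pixel indices from every class.

    Assumption: 'balanced' means equal counts per class. If a class has
    fewer than n_per_class pixels, every class is capped at the size of
    the smallest class so the sample stays balanced.
    """
    rng = random.Random(seed)

    # Group pixel indices by class label (e.g. 0 = non-flood, 1 = flood).
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    k = min(n_per_class, *(len(v) for v in by_class.values()))

    picked = []
    for y in sorted(by_class):
        picked.extend(rng.sample(by_class[y], k))  # sampling without replacement
    return picked
```

With 90 non-flood and 10 flood pixels and `n_per_class=50`, the sketch returns 10 indices from each class, because the minority class caps the draw.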
Status: open (until 24 Jun 2026)
-
CC1: 'Comment on egusphere-2026-1275', Anupal Baruah, 26 Apr 2026
-
AC1: 'Reply on CC1', Gayatri M Phade, 26 Apr 2026
We sincerely thank the commenter for the insightful and constructive feedback on our manuscript. The points raised regarding the use of SAR data in urban environments and the implications of spatial resolution are highly relevant, and we appreciate the opportunity to clarify these aspects.
-
Regarding the use of SAR data in urban flood mapping, we agree that double-bounce scattering in built-up areas can introduce uncertainties and may lead to misclassification of flooded regions. Despite this limitation, SAR data remain a widely adopted and valuable source for flood mapping due to their all-weather, day-and-night imaging capability, which is particularly critical during flood events characterized by cloud cover.
In our study, we mitigate these limitations through a multi-source data integration framework. Specifically, SAR-derived features are not used in isolation but are combined with hydrologic–topographic indicators such as relative elevation and flow accumulation, as well as additional geospatial predictors. This integration allows the machine learning model to reduce reliance on any single data source and improves robustness against SAR-specific artefacts. Furthermore, the use of statistical descriptors and temporal variability features helps to distinguish true flood signals from urban backscatter effects. We will further clarify this mitigation strategy in the revised manuscript.
-
With respect to the use of 30 m spatial resolution, we acknowledge that this resolution may not fully capture fine-scale urban features such as roads, drainage networks, and individual buildings. However, the choice of 30 m resolution was guided by the need to maintain consistency across multiple datasets (e.g., DEM, precipitation, and derived hydrologic variables) and to ensure computational feasibility for regional-scale analysis.
Our objective is to provide a scalable and generalizable framework for urban flood susceptibility mapping rather than detailed street-level inundation modelling. The machine learning framework leverages terrain and hydrologic context, which remain meaningful at this spatial scale. Nevertheless, we agree that higher-resolution datasets could further enhance the accuracy of flood delineation in dense urban environments. This limitation and its implications will be more explicitly discussed in the revised manuscript, along with suggestions for future work using higher-resolution data.
We thank the commenter again for these valuable suggestions, which will help improve the clarity and robustness of our study.
Citation: https://doi.org/10.5194/egusphere-2026-1275-AC1
-
RC1: 'Comment on egusphere-2026-1275', Anonymous Referee #1, 31 May 2026
The manuscript addresses an important and timely topic in urban flood inundation mapping by integrating multi-sensor remote sensing data with machine learning and hydrologic-topographic predictors. The use of Sentinel-1 SAR, Sentinel-2 optical imagery, SRTM DEM, CHIRPS rainfall, and HydroRIVERS data provides a relevant basis for developing a scalable flood mapping framework. The inclusion of the Relative Elevation Model (REM) and River Network Index (RNI) is a promising attempt to improve the physical interpretability of machine learning-based flood detection. The reported performance of the tuned XGBoost model, with an accuracy of 71.7% and AUC of 0.803, suggests that the proposed framework has potential for operational urban flood monitoring. However, the manuscript requires substantial revision before it can be considered scientifically robust. First, the title describes the paper as a “Review article,” but the manuscript is clearly an original research article involving data processing, model development, model evaluation, and case study application. This should be corrected to avoid confusion.
The novelty of the study also needs to be clarified. The manuscript claims that REM and RNI are new hydrologic-topographic predictors, but it does not sufficiently explain how these indices differ from existing flood conditioning variables such as elevation, slope, distance to river, HAND, drainage proximity, topographic wetness index, or flow accumulation. The authors should clearly state whether REM and RNI are newly developed indices, modified versions of existing indices, or case-specific hydrologic features. A major concern is the formulation of the RNI. The text describes RNI as a measure of hydraulic proximity and connectivity to river networks, but the equation uses cumulative precipitation divided by the elevation difference from the minimum DEM. This formulation does not directly represent distance to river, drainage connectivity, or river network influence. The authors should revise the RNI equation so that it is mathematically consistent with its stated hydrological meaning.
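To make the concern concrete, the RNI formulation as described in this comment can be written as below. The symbols are assumptions chosen here for illustration (P_cum for cumulative precipitation, z for DEM elevation, z_min for the minimum DEM elevation); the manuscript's own notation may differ.

```latex
\mathrm{RNI}(x) \;=\; \frac{P_{\mathrm{cum}}(x)}{z(x) - z_{\min}}
```

No term in this expression depends on where the river network lies, so two pixels with the same rainfall and elevation receive the same value regardless of their distance to a channel. An index meant to express river proximity or connectivity would need at least one term tied to the channel network, for example a distance to the nearest river segment.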
The flood label generation process also requires stronger justification. The manuscript uses a consensus rule based on SAR backscatter ratio, backscatter difference, NDWI, and REM thresholds, but the selected threshold values are not adequately justified. The authors should explain why SAR ratio > 1.25, backscatter difference ≥ 3 dB, NDWI > 0.05, and REM < 5 m were selected. A threshold sensitivity analysis would strengthen the reliability of the generated flood labels. The validation strategy is another important limitation. Since the training and testing labels are generated from remote sensing-based consensus rules, the model may be learning the labeling assumptions rather than being validated against independent flood observations. The authors are encouraged to include independent validation data, such as official flood records, observed flood locations, high-resolution imagery, or historical flood-prone areas in Mumbai. If such data are unavailable, the manuscript should clearly state that the reported accuracy reflects agreement with consensus-generated labels rather than confirmed ground truth.
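A threshold sensitivity analysis of the kind requested could be sketched as follows. The four threshold values are the ones quoted above; everything else is an assumption made for illustration: the manuscript's actual consensus rule is not reproduced here, so the sketch takes a vote count (`min_votes`) as a parameter, and the pixel dictionaries, field names, and scale factors are hypothetical.

```python
# Thresholds quoted in the referee comment.
BASE = {"sar_ratio": 1.25, "sar_diff_db": 3.0, "ndwi": 0.05, "rem_m": 5.0}


def is_flood(pixel, thr=BASE, min_votes=4):
    """Assumed consensus rule: at least min_votes of the four tests pass."""
    votes = [
        pixel["sar_ratio"] > thr["sar_ratio"],       # SAR ratio > 1.25
        pixel["sar_diff_db"] >= thr["sar_diff_db"],  # backscatter difference >= 3 dB
        pixel["ndwi"] > thr["ndwi"],                 # NDWI > 0.05
        pixel["rem_m"] < thr["rem_m"],               # REM < 5 m
    ]
    return sum(votes) >= min_votes


def sensitivity(pixels, scale_factors=(0.8, 1.0, 1.2), min_votes=4):
    """Count flood pixels while scaling one threshold at a time."""
    counts = {}
    for name in BASE:
        for factor in scale_factors:
            thr = dict(BASE)
            thr[name] = BASE[name] * factor
            counts[(name, factor)] = sum(
                is_flood(p, thr, min_votes) for p in pixels
            )
    return counts


# Hypothetical pixels, for illustration only.
PIXELS = [
    {"sar_ratio": 1.6, "sar_diff_db": 4.0, "ndwi": 0.10, "rem_m": 2.0},
    {"sar_ratio": 1.1, "sar_diff_db": 4.0, "ndwi": 0.10, "rem_m": 2.0},
    {"sar_ratio": 1.3, "sar_diff_db": 3.0, "ndwi": 0.06, "rem_m": 4.5},
]
```

Large swings in the flood-pixel count for small changes in one threshold would indicate that the labels, and therefore the reported accuracy, depend strongly on that choice.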
The statistical significance claims should also be improved. The manuscript states that the XGBoost model significantly outperforms other models based on the DeLong test, but the p-values, confidence intervals, and test statistics are not reported. Since the AUC difference between Random Forest and XGBoost is small, these values are necessary to support the claim of statistical significance. There is also an inconsistency in the reported ensemble performance. Table 3 reports the RF-XGB ensemble AUC as 0.794, while the ROC figure appears to show a different ensemble AUC value. The authors should carefully check and correct all performance values in the tables, figures, and discussion.
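The quantities requested here (AUC difference, test statistic, p-value, confidence interval) can be obtained from a DeLong test for two correlated ROC curves. The sketch below is a plain, unoptimized pure-Python version of that test, not the authors' code; the function names and the example inputs in the usage note are hypothetical, and the 95 percent interval uses the normal approximation with 1.96.

```python
import math


def _components(scores, labels):
    """AUC plus DeLong structural components for one model."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]

    def psi(p, q):
        # 1 if the positive outranks the negative, 0.5 on ties, else 0.
        return 1.0 if p > q else 0.5 if p == q else 0.0

    v10 = [sum(psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, q) for p in pos) / len(pos) for q in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    """Sample covariance (denominator n - 1)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test for the AUC difference of two models
    scored on the same samples (labels must be 0/1)."""
    auc_a, v10_a, v01_a = _components(scores_a, labels)
    auc_b, v10_b, v01_b = _components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)  # number of positives, negatives

    var_a = _cov(v10_a, v10_a) / m + _cov(v01_a, v01_a) / n
    var_b = _cov(v10_b, v10_b) / m + _cov(v01_b, v01_b) / n
    cov_ab = _cov(v10_a, v10_b) / m + _cov(v01_a, v01_b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0:
        raise ValueError("zero variance: the two score sets rank identically")

    diff = auc_a - auc_b
    se = math.sqrt(var_diff)
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return {
        "auc_a": auc_a,
        "auc_b": auc_b,
        "diff": diff,
        "z": z,
        "p_value": p_value,
        "ci95": (diff - 1.96 * se, diff + 1.96 * se),
    }
```

Calling `delong_test(y_true, xgb_scores, rf_scores)` on the held-out test pixels would give the values to report alongside each AUC; the nested loops scale with the product of positive and negative counts, so a large test set would need a rank-based implementation instead.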
Finally, the manuscript requires substantial language editing. Several sentences are awkward or unclear, and the citation style is inconsistent between author-year and numbered formats. The figures, especially the workflow and spatial flood maps, should be improved for readability and publication quality. Figure 5 should include clearer legends, units, class definitions, and map elements. Overall, the study has potential, but the current version needs major revision. The authors should strengthen the novelty statement, correct the RNI formulation, justify the flood-label thresholds, improve validation, report full statistical testing results, resolve inconsistencies in model performance, and substantially revise the language and presentation.
Citation: https://doi.org/10.5194/egusphere-2026-1275-RC1
AC2: 'Reply on RC1', Gayatri M Phade, 01 Jun 2026
We thank Referee #1 for the detailed and constructive review of our manuscript. We appreciate the positive assessment of the overall framework and the valuable suggestions for improvement.
We acknowledge the concerns regarding manuscript classification, the novelty and formulation of the hydrologic–topographic predictors, threshold selection, validation strategy, statistical significance testing, and presentation quality. We are currently revising the manuscript and will address each comment in detail in a point-by-point response and revised manuscript.
We particularly appreciate the referee's suggestions regarding clarification of the Relative Elevation Model (REM) and River Network Index (RNI), justification of flood-label thresholds, reporting of DeLong test statistics, and improvement of figures and language. These comments will substantially strengthen the manuscript.
We thank the referee again for the constructive feedback and will carefully incorporate all recommendations in the revised version.
Citation: https://doi.org/10.5194/egusphere-2026-1275-AC2
Data sets
Data for Hydrologic–Topographic Enhanced Machine Learning for Urban Flood Inundation Mapping. Ankush S. Pawar and Gayatri M. Phade. https://doi.org/10.5281/zenodo.18486214
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 266 | 63 | 19 | 348 | 19 | 19 |
1) I am somewhat skeptical about the use of SAR data for urban flood mapping, given the well-known double-bounce scattering effect in built-up areas, which can lead to misclassification of flooded regions. This limitation is already acknowledged in Table 1. In light of this, it would be helpful if the authors could further justify their decision to proceed with SAR data, and clarify how they mitigate or account for these uncertainties in their analysis.
2) Additionally, the use of 30 m spatial resolution may be too coarse for accurately capturing urban flood dynamics, where fine-scale features such as roads, drainage networks, and building footprints play a critical role