the Creative Commons Attribution 4.0 License.
A highly generalizable data-driven model for spatiotemporal urban flood dynamics real-time forecasting based on coupled CNN and ConvLSTM
Abstract. Flooding has become one of the most severe natural hazards in urban areas. Real-time and accurate prediction of flood processes is a crucial approach to mitigate urban flood disasters. Data-driven models based on machine learning methods offer significantly higher computational efficiency than physics-based models and have been widely applied in real-time urban flood simulation. However, most data-driven models target the temporal process of inundation depths at specific sites or the spatial distribution of peak inundation depths, while some models capable of simulating spatiotemporal urban flood inundation often lack spatial generalization capabilities. In this study, we proposed a novel data-driven model to predict the spatiotemporal distribution dynamics of urban inundation depths. The model integrates a ConvLSTM-based component alongside a CNN-based component via a concatenation process, facilitating the extraction of information from both temporal sequences and static geospatial features concurrently. A tiling approach that divides the study area into distinct spatial sub-regions, which serve as independent training samples, was employed during model training to enhance the model’s generalization capability. The proposed model was applied to a flood-prone urban area in Macao and compared with a physics-based model. The results show that: (1) the proposed model effectively captures the inundation processes at specific sites, with NSE >0.80 for the majority of events, as well as RMSE and MAE values <0.20; (2) the proposed data-driven model demonstrates robust generalization performance, with simulated inundation processes closely aligned with the results of the physics-based model in most regions (mean NSE >0.70, RMSE <0.10, MAE <0.10). Notable discrepancies persist only in localized zones of abrupt terrain variations, particularly near building edges.
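As a rough illustration of the tiling idea mentioned in the abstract, the sketch below cuts a gridded time series into non-overlapping square tiles that could each serve as a separate training sample. The function name `tile_grid`, the tile size, the grid dimensions, and the choice to drop incomplete border tiles are all assumptions made for this example; the abstract does not specify how the authors implemented the tiling.

```python
import numpy as np

def tile_grid(grid, tile_size):
    """Split a (T, H, W) array into non-overlapping (T, tile_size, tile_size) tiles.

    Rows and columns that do not fill a whole tile are dropped (an assumption
    made for this sketch, not something stated in the abstract).
    """
    t, h, w = grid.shape
    n_rows, n_cols = h // tile_size, w // tile_size
    tiles = []
    for i in range(n_rows):
        for j in range(n_cols):
            r, c = i * tile_size, j * tile_size
            tiles.append(grid[:, r:r + tile_size, c:c + tile_size])
    # Result shape: (n_rows * n_cols, T, tile_size, tile_size)
    return np.stack(tiles)

# Hypothetical input: 4 time steps of water depth on a 64 x 96 grid.
depth = np.random.default_rng(0).random((4, 64, 96))
tiles = tile_grid(depth, tile_size=32)
print(tiles.shape)
```

Each tile keeps the full time axis, so a static feature raster (for example, elevation) could be tiled with the same indices and paired with the corresponding depth tile.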
Status: open (until 07 Oct 2025)
- RC1: 'Comment on egusphere-2025-3171', Anonymous Referee #1, 19 Sep 2025
- RC2: 'Comment on egusphere-2025-3171', Anonymous Referee #2, 25 Sep 2025
This manuscript develops a deep learning framework that integrates ConvLSTM and CNN to forecast the spatiotemporal dynamics of urban inundation depths, demonstrated on a small, flood-prone pilot area (~4.06 km²) and compared against a physics-based model. The paper is clearly structured and generally easy to follow. However, the contribution would be significantly strengthened by (1) a sound comparison design and a robustness check, (2) a deeper interpretation of results beyond reporting evaluation metrics, and (3) a richer Discussion that articulates mechanisms, limitations, comparative advantages, and generalizability. I recommend substantial revision to incorporate these analyses and clarifications before the work is suitable for publication.
Major comments
1. Study area and generalizability
The pilot area (4.06 km²) is small. Discuss scalability to larger, topographically complex basins and data requirements (computational cost, training data volume, transfer learning). Consider a short experiment or argument on tiling/patching strategies and edge effects.
2. Model comparison
Consider including an ablation study: ConvLSTM-only vs. CNN-only vs. the proposed hybrid, to demonstrate the incremental benefit of CNN–ConvLSTM coupling.
3. Depth of analysis and scientific insight
In Results, please move beyond “figure + short captioned numbers.” For example, for those key evaluation metrics (e.g., NSE, RMSE, MAE), discuss why performance varies across rainfall events. Where performance is very high (e.g., NSE > 0.95) or notably lower (e.g., NSE < 0.80), probe the hypothesized mechanisms and provide in-depth explanations.
Consider adding a brief uncertainty or robustness check (e.g., event-level cross-validation, bootstrapped confidence intervals, or sensitivity analysis).
In 4.1, the authors analyze five “randomly selected” rainfall events. What about the other 7 events? Please provide an overall table (mean/median NSE, RMSE, MAE; ranges) and show the distribution across events.
4. Deepen discussion
Discussing the limitations of deep learning for flood modeling is beneficial, but it is more important for this section to: (1) articulate the paper’s contributions relative to prior DL flood emulation/forecasting work (what’s new about your integration?); (2) position your work against related literature (including ConvLSTM-based flood studies such as Liao et al., 2025, WRR) and highlight similarities/differences and added value; and (3) offer practical implications (e.g., real-time forecasting potential, co-design with drainage management) and clear future directions (e.g., larger regions, real rainfall radar, DEM/land-use multi-scale features).
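The event-level summary and bootstrapped confidence interval suggested in major comment 3 could be sketched roughly as follows. The metric functions use the standard definitions of NSE, RMSE, and MAE; the synthetic depth series, the count of 12 events, and the helper name `bootstrap_ci` are placeholders for illustration only and are not results or code from the manuscript.

```python
import numpy as np

def nse(sim, ref):
    """Nash-Sutcliffe efficiency of sim against ref (1.0 is a perfect match)."""
    sim, ref = np.asarray(sim, float), np.asarray(ref, float)
    return float(1.0 - np.sum((sim - ref) ** 2) / np.sum((ref - ref.mean()) ** 2))

def rmse(sim, ref):
    """Root mean square error."""
    sim, ref = np.asarray(sim, float), np.asarray(ref, float)
    return float(np.sqrt(np.mean((sim - ref) ** 2)))

def mae(sim, ref):
    """Mean absolute error."""
    sim, ref = np.asarray(sim, float), np.asarray(ref, float)
    return float(np.mean(np.abs(sim - ref)))

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-event scores."""
    values = np.asarray(values, float)
    rng = np.random.default_rng(seed)
    # Resample events with replacement and record the mean of each resample.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    return float(np.quantile(means, alpha / 2)), float(np.quantile(means, 1 - alpha / 2))

# Synthetic placeholder data: 12 hypothetical events, 100 time steps each.
rng = np.random.default_rng(1)
ref = rng.random((12, 100))                    # stands in for physics-model depths
sim = ref + rng.normal(0.0, 0.05, ref.shape)   # stands in for data-driven depths

scores = np.array([nse(s, r) for s, r in zip(sim, ref)])
low, high = bootstrap_ci(scores)
print(f"NSE mean={scores.mean():.3f} median={np.median(scores):.3f} "
      f"range=[{scores.min():.3f}, {scores.max():.3f}] 95% CI=[{low:.3f}, {high:.3f}]")
```

Resampling whole events, rather than individual time steps, keeps the temporal correlation within each event intact; with only a dozen events the interval will be wide, which is itself informative.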
Minor comments:
1. Figure 1b. Add elevation units to the legend.
2. In 3.2.1, add citations for CNN and a bit more information about CNN.
3. In 3.3.2, the authors mention the correlation coefficient (CC), but CC is absent from the Results. Either (i) report CC in the text/plots/tables, or (ii) remove it from 3.3.2 and justify its exclusion.
4. Figure 4. The flowchart is not sufficiently informative. Clarify the concatenation between ConvLSTM and CNN components. Also, replace “…” with the complete set of data used to provide a more intuitive understanding for readers.
5. Figure 7. Add column titles such as “Event 1” … “Event 5” for immediate readability. Clarify that “True” = physics-model simulation and state this consistently in the caption and text.
6. Figures 8 & 9. Consider combining them into a single figure: e.g., first row = former Fig. 8, second row = former Fig. 9, to streamline reading. Make titles explicit: indicate these are metrics for inundation water depth (e.g., “NSE of Inundation Depth,” etc.). For the boxplot summaries, also split by locations (LHK, HIS, LPM) and all grids, yielding four boxplots for each metric (NSE, RMSE, MAE).
7. In 4.3 & Figure 10 (bias vs. error). If the text uses “absolute and relative bias,” define them in 3.3.2 and use the same terms in figures. If you actually plot errors, rename to “absolute error” and “relative error.” Standardize relative error bins (e.g., 5%, 10%, …) and include the % unit in axes/legends. Ensure caption and main text use the same terminology.
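One common convention for the two error terms in minor comment 7 is sketched below, with relative error expressed in percent and grouped into fixed bins. The depth threshold `min_depth`, the bin edges, and the sample values are assumptions chosen for illustration; the manuscript may define these quantities differently.

```python
import numpy as np

def absolute_error(sim, ref):
    """Absolute error |sim - ref|, in the same units as the inputs."""
    return np.abs(np.asarray(sim, float) - np.asarray(ref, float))

def relative_error_percent(sim, ref, min_depth=0.01):
    """Relative error in percent; cells with ref below min_depth are set to NaN."""
    sim, ref = np.asarray(sim, float), np.asarray(ref, float)
    out = np.full(ref.shape, np.nan)
    mask = ref >= min_depth  # avoid dividing by (near-)zero reference depths
    out[mask] = 100.0 * np.abs(sim[mask] - ref[mask]) / ref[mask]
    return out

# Hypothetical depths in metres (reference = physics-based model).
ref = np.array([0.00, 0.50, 1.00, 0.20])
sim = np.array([0.10, 0.54, 0.93, 0.23])

abs_err = absolute_error(sim, ref)
rel_err = relative_error_percent(sim, ref)

# Count valid cells per relative-error bin (edges in percent, assumed here).
edges = np.array([0, 5, 10, 15, 20])
counts, _ = np.histogram(rel_err[~np.isnan(rel_err)], bins=edges)
print(abs_err, rel_err, counts)
```

Masking dry or nearly dry reference cells matters because relative error is undefined when the reference depth is zero.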
References to consider
Bian, W., Fang, J., Wang, P., Sun, Q., Fang, J., Kong, F. and Hu, T., 2025. Deep learning surrogate models for spatiotemporal prediction of coastal flooding inundations in Tianjin, China. Journal of Hydrology: Regional Studies, 60, p.102593. https://doi.org/10.1016/j.ejrh.2025.102593
Liao, Y., Wang, Z., Yu, H., Gao, W., Zeng, Z., Li, X. and Lai, C., 2025. Accelerating urban flood inundation simulation under spatio-temporally varying rainstorms using ConvLSTM deep learning model. Water Resources Research, 61(8), p.e2025WR040433. https://doi.org/10.1029/2025WR040433
Malik, H., Feng, J., Shao, P. and Abduljabbar, Z.A., 2024. Improving flood forecasting using time-distributed CNN-LSTM model: a time-distributed spatiotemporal method. Earth Science Informatics, 17(4), pp.3455-3474. https://doi.org/10.1007/s12145-024-01354-y
Citation: https://doi.org/10.5194/egusphere-2025-3171-RC2
Viewed
- HTML: 1,176
- PDF: 96
- XML: 17
- Total: 1,289
- BibTeX: 29
- EndNote: 35
This manuscript establishes a highly generalizable urban flood simulation model by combining ConvLSTM and CNN, which enables fast and accurate simulation of flood processes in urban areas. The model structure is scientifically designed and can effectively learn the features of different types of data. The simulation results are promising, providing valuable insights and practical significance for understanding the spatiotemporal processes of urban flooding.
Although the manuscript is already quite comprehensive at its current stage, there are still several suggestions that could help make it even more polished.
The specific suggestions are as follows:
While the manuscript discusses the inference time of the trained model, it does not mention the time required for training the model.
In “Figure 7. Comparison of water water depth processes in three flood-prone locations”, the word “water” appears twice, which seems to be a typographical error.
The relative error shown in Figure 10 should be expressed with a percentage sign.
The RMSE and MAE values presented in Figure 8 are dimensional quantities, but the specific units are missing.
Section 4.1 analyzes the results of three stations, but the manuscript lacks descriptions, figures, or maps to indicate the locations of these stations.
In Section 3.3.1, it is stated that 80% of the dataset was used for training and 20% for testing. However, the manuscript does not clarify how this split was conducted. Was it an 8:2 split by rainfall events, or based on the constructed dataset as a whole? If it is the latter, there is a risk of data leakage from test events into the training set.
The manuscript does not discuss how the model performs when applied to unseen scenarios. It is suggested to include analysis related to underfitting or overfitting to better illustrate the model’s reliability and limitations.
Since the proposed model functions as a black box, it is recommended to introduce physical constraints into the loss function to enhance physical interpretability.
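The concern about the 80/20 split in Section 3.3.1 can be made concrete with a small sketch of an event-level split, in which every sample from a given rainfall event lands entirely in either the training or the test subset. The function name `split_by_event`, the number of events, and the samples per event are hypothetical; nothing here reflects how the authors actually partitioned their data.

```python
import numpy as np

def split_by_event(event_ids, test_fraction=0.2, seed=0):
    """Return boolean (train, test) masks so each event falls entirely on one side."""
    event_ids = np.asarray(event_ids)
    unique_events = np.unique(event_ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique_events)
    # Hold out whole events, not individual samples.
    n_test = max(1, int(round(test_fraction * len(unique_events))))
    test_mask = np.isin(event_ids, shuffled[:n_test])
    return ~test_mask, test_mask

# Hypothetical dataset: 10 rainfall events, 50 samples (tiles x time windows) each.
event_ids = np.repeat(np.arange(10), 50)
train_mask, test_mask = split_by_event(event_ids)

train_events = set(event_ids[train_mask].tolist())
test_events = set(event_ids[test_mask].tolist())
print(sorted(train_events), sorted(test_events))
```

By contrast, shuffling the constructed samples as a whole would let neighbouring time windows of the same event appear on both sides of the split, which is the leakage risk the comment describes.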