the Creative Commons Attribution 4.0 License.
Multi-Data Source Based Quantifying Urban Flood Severity in Major Chinese Cities (2000–2024) Using a Hybrid Machine-Learning Weighting Framework
Abstract. Urban flooding poses a major challenge to sustainable urban development, yet most existing assessments focus on single cities or river basins and rely on limited historical records. This study integrates multi-source data from 20 Chinese cities over 2000–2024 to develop a comparable long-term assessment of urban flood severity. To address the fragmentation and inconsistency of flood evidence across official records, news reports, and social media, we construct an event-level database and derive a Flood Severity Index (FSI) using an interpretable data-driven weighting and ensemble framework. Robustness is evaluated through repeated resampling and consistency checks across cities and years. The results show that southern cities experience more frequent and severe flooding, whereas northern cities are generally less affected but more vulnerable to abrupt extremes. These findings suggest distinct governance priorities: reducing chronic exposure in southern cities and strengthening preparedness for high-impact shocks in northern cities. The proposed framework is transferable to other regions and provides a basis for future cross-regional flood risk comparison and adaptive urban risk governance.
Status: open (until 12 Jul 2026)
- RC1: 'Comment on egusphere-2026-2501', Anonymous Referee #1, 12 Jun 2026
- RC2: 'Comment on egusphere-2026-2501', Anonymous Referee #2, 18 Jun 2026
General comments:
In this manuscript, Peng et al. tackle the complex challenge of assessing urban flood risks across diverse geographical and developmental landscapes. The authors attempt to bridge the gap between fragmented historical records and modern datasets by integrating official statistics, news reports and social media into an event-level database. While the study’s objective to construct an interpretable, data-driven Flood Severity Index is timely and highly relevant to urban resilience planning, the execution of this framework warrants careful examination. Specifically, a number of methodological uncertainties in the research design and a lack of granular transparency in the methodology undermine the robustness of the study’s otherwise compelling spatial and temporal claims.
The methodology section frequently states what was done without adequately explaining how it was achieved, and a thorough revision is necessary for the study to be replicable. For instance, the authors claim to resolve missing disaster attributes such as economic losses and affected populations by imputing them “using statistical methods”, yet they fail to specify the actual techniques employed, leaving the foundational dataset vulnerable to undisclosed biases. Similarly, the study relies heavily on a Python-based web crawler and NLP to filter 14 years of Sina Weibo posts, but the manuscript omits the specific algorithms or semantic recognition tools used to isolate valid flood events from the noise. Furthermore, while the authors deploy six different ML algorithms to build their hybrid weighting system, the text lacks critical details regarding hyperparameter tuning or dataset splitting beyond standard cross-validation. This vagueness prevents replication and makes it difficult to verify rigorously whether the authors successfully implemented the data-driven framework they claim.
The authors identify the period from 2010-2020 as a “transitional phase of adjustment” in urban flood severity and attribute this shift to changes in infrastructure and flood management. However, the underlying flood database undergoes a major methodological change at approximately the same time. Prior to 2010, the database relies primarily on official records and documentary sources, whereas from 2010 onward it additionally incorporates large volumes of social media data from Sina Weibo. This change in data availability and reporting intensity may substantially alter event detection rates, event characterization, and the online-attention component of the FSI itself. Consequently, the apparent temporal transition identified by the authors may partly reflect an observational artifact rather than a genuine shift in flood severity. The manuscript should explicitly evaluate the sensitivity of the results to this data-source discontinuity and demonstrate that the observed temporal phases remain robust when social-media-derived information is excluded or otherwise standardized across the study period.
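One way to run the requested check is to recompute event counts after dropping events that appear only in social media. The sketch below is a minimal illustration under assumed data: the records, the source labels, and the helper `yearly_counts` are hypothetical and do not come from the manuscript.

```python
from collections import Counter

# Hypothetical event records: (year, set of sources that documented the event).
events = [
    (2008, {"official"}),
    (2009, {"official", "news"}),
    (2011, {"official", "weibo"}),
    (2011, {"weibo"}),
    (2012, {"weibo"}),
    (2012, {"news"}),
]

def yearly_counts(records, exclude_only=None):
    """Count events per year, optionally dropping events seen in only one named source."""
    counts = Counter()
    for year, sources in records:
        if exclude_only is not None and sources == {exclude_only}:
            continue  # event is documented solely by the excluded source
        counts[year] += 1
    return dict(counts)

all_sources = yearly_counts(events)
without_weibo_only = yearly_counts(events, exclude_only="weibo")
print(all_sources)
print(without_weibo_only)
```

If the temporal phases persist in the restricted series, the transition is less likely to be an artifact of the added data source.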
The FSI is constructed using absolute values of affected area, affected population, economic losses, and fatalities. However, the study compares cities that differ substantially in population size, urban extent, and economic activity. Larger cities will naturally tend to report greater numbers of affected people and larger economic losses even when the relative severity of flooding is comparable. The manuscript does not explain whether these indicators were normalized by population, urban area, GDP, or other exposure metrics prior to index construction. Min-max normalization alone does not resolve the issue, as it rescales variables without accounting for differences in underlying exposure. As a result, it is difficult to determine whether the resulting rankings reflect flood severity or simply differences in city size and socioeconomic scale. The authors should justify the use of absolute indicators or evaluate the sensitivity of their results to appropriate normalization procedures.
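The point about min-max normalization can be illustrated with a small sketch. The city names and figures below are invented for illustration only, and dividing by resident population is just one assumed choice of exposure metric.

```python
def min_max(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def rank_order(names, scores):
    """Return names sorted from highest to lowest score."""
    pairs = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
    return [name for name, _ in pairs]

# Hypothetical cities and figures, invented purely for illustration.
cities = ["A", "B", "C"]
affected_people = [500_000, 120_000, 90_000]
population = [20_000_000, 2_000_000, 1_000_000]

# Min-max on absolute counts keeps the ordering of the raw values.
absolute = min_max(affected_people)
# Dividing by exposure first (here: population) can reverse the ordering.
relative = min_max([a / p for a, p in zip(affected_people, population)])

rank_absolute = rank_order(cities, absolute)
rank_relative = rank_order(cities, relative)
print(rank_absolute, rank_relative)
```

In this toy case the largest city ranks first on absolute counts and last on per-capita counts, which is the kind of sensitivity the authors are asked to report.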
Finally, the inclusion of online popularity as one of the five core components of the FSI requires substantially stronger theoretical justification. Unlike affected area, affected population, economic losses, and fatalities, online popularity is not a direct measure of flood impacts but rather a measure of public attention and information dissemination. Incorporating online popularity directly into the severity index risks conflating disaster impacts with reporting behaviour. The authors should clearly justify why public attention is treated as a component of flood severity rather than as an auxiliary explanatory variable and should evaluate how sensitive the resulting severity rankings are to the inclusion or exclusion of this indicator.
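A simple form of this sensitivity check is to compare event rankings with and without the online-popularity indicator using a rank correlation. The sketch below assumes equal weights and indicators already scaled to [0, 1]. All values, key names, and helper functions are hypothetical and are not the authors' weights or data.

```python
def weighted_index(rows, weights):
    """Weighted sum of indicators; weights are renormalized to sum to 1."""
    total = sum(weights.values())
    return [sum(weights[k] / total * row[k] for k in weights) for row in rows]

def ranks(scores):
    """Rank 1 = highest score; this sketch assumes there are no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result

def spearman(a, b):
    """Spearman rank correlation for tie-free score lists."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical events with indicators already scaled to [0, 1].
events = [
    {"area": 0.9, "population": 0.8, "loss": 0.7, "fatalities": 0.6, "online": 0.1},
    {"area": 0.3, "population": 0.2, "loss": 0.3, "fatalities": 0.1, "online": 1.0},
    {"area": 0.5, "population": 0.5, "loss": 0.4, "fatalities": 0.4, "online": 0.0},
    {"area": 0.1, "population": 0.1, "loss": 0.2, "fatalities": 0.0, "online": 0.3},
]
weights = {"area": 0.2, "population": 0.2, "loss": 0.2, "fatalities": 0.2, "online": 0.2}
reduced = {k: v for k, v in weights.items() if k != "online"}

full_scores = weighted_index(events, weights)
reduced_scores = weighted_index(events, reduced)
rho = spearman(full_scores, reduced_scores)
print(ranks(full_scores), ranks(reduced_scores), rho)
```

A correlation well below 1, or changes in the top-ranked events, would indicate that public attention rather than impact is driving part of the index.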
Specific comments:
The authors state that missing values for crucial impact metrics (e.g., economic losses, affected population) were “imputed using statistical methods”. They need to explicitly identify which methods were used (e.g., mean imputation, multiple imputation, KNN), as imputing extreme variables can severely skew the dataset.
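To show why the choice of method matters for heavy-tailed variables, the following sketch compares mean and median imputation on a toy loss series. The numbers are invented and the `impute` helper is hypothetical. The manuscript does not state which method was used.

```python
from statistics import mean, median

def impute(values, strategy):
    """Fill None entries with the mean or median of the observed values."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical economic losses (million CNY); None marks a missing record.
losses = [2, 3, 5, 4, 900, None, None]

by_mean = impute(losses, "mean")
by_median = impute(losses, "median")
print(by_mean[-1], by_median[-1])
```

Here a single extreme event pulls the mean-imputed value far above the typical loss, while the median-imputed value stays close to it. Either choice changes the downstream index, so it needs to be disclosed.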
The authors mention applying NLP techniques for semantic recognition and noise reduction. However, the manuscript omits the specific algorithms, models, or libraries used to achieve this, which again makes the data cleaning process irreproducible.
While six machine learning models are evaluated, the manuscript lacks critical details regarding hyperparameter tuning, optimization strategies, and train/test splitting beyond a standard 10-fold cross-validation.
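As an illustration of the level of detail that would make the evaluation reproducible, the sketch below fixes a random seed, holds out a test set, and partitions the remainder into 10 folds. The held-out fraction, the seed, and the function `split_indices` are assumptions. The manuscript reports only 10-fold cross-validation.

```python
import random

def split_indices(n, test_fraction=0.2, k=10, seed=0):
    """Hold out a test set, then partition the remaining indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed makes the split repeatable
    n_test = int(round(n * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    folds = [train[i::k] for i in range(k)]  # disjoint validation folds
    return train, test, folds

train, test, folds = split_indices(100)
print(len(train), len(test), [len(f) for f in folds])
```

Hyperparameters would then be tuned using the folds only, with the held-out set reserved for a final check. Reporting the seed, the fractions, and the search grid would let readers repeat the procedure.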
In the main text, the FSI is correctly defined as a linear combination of five indicators; however, in the flowchart (Fig. 2), the formula is incorrectly typed as a summation of only three variables with incorrect subscripts. The figure must be corrected to accurately reflect the five variables discussed in the text.
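For reference, a five-term linear combination of the kind described in the text could be written as follows. The symbols are assumptions chosen here for illustration ($A$ affected area, $P$ affected population, $E$ economic losses, $F$ fatalities, $O$ online popularity, $w_k$ the indicator weights) and may differ from the manuscript's notation.

```latex
\mathrm{FSI} = \sum_{k=1}^{5} w_k x_k
             = w_1 A + w_2 P + w_3 E + w_4 F + w_5 O,
\qquad \sum_{k=1}^{5} w_k = 1 .
```

The constraint that the weights sum to one is likewise an assumption, and should be stated explicitly in the figure if it applies.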
There is a typo in the indexing for the AHP index calculation. The summation index is defined as i = 1, but the variables inside the summation use the subscript l. This should be corrected so the subscripts match the index (i.e. using i throughout).
The manuscript provides conflicting descriptions of the role of ML. Section 3.3 defines continuous and categorical FSI variables for supervised learning, while Section 3.5 states that machine learning is not used to predict predefined labels but only to derive SHAP-based weights. The authors should explicitly clarify the target variable(s) used during model training and explain how SHAP values were derived without introducing dependence on the originally constructed FSI.
Citation: https://doi.org/10.5194/egusphere-2026-2501-RC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 126 | 77 | 11 | 214 | 12 | 13 |
The review of this paper is relatively brief as I focus on the main methodological flaws.