the Creative Commons Attribution 4.0 License.
Structured exploration of machine learning model complexity for spatio-temporal forecasting of urban flooding
Abstract. Urban flooding may lead to significant socio-economic impacts and loss of life. To support preventative action, researchers have implemented various modeling techniques to gain insight into urban flood occurrences. Using New York City (NYC) as the study area, data-driven techniques, specifically statistical and neural network models with increasing spatio-temporal complexity, are formulated and tested to assess the relative contribution of different modeling constructs. Zones, based on flood characteristics, are first delineated using the unsupervised machine learning technique of spectral clustering. Then, the models are applied to each cluster, with comprehensive performance evaluation, to understand which algorithmic and structural aspects contribute to the reduction of prediction errors. A chief finding of this study is the emergence of the Graph WaveNet (GWN) as the most effective model, owing to its proficiency in capturing spatio-temporal aspects and its dynamic graph creation. Furthermore, enhancing specific temporal and spatial components within a modeling technique proves beneficial, and the novel adoption of graph-based architectures adds further benefit. Offering a unique exploration of spatio-temporal aspects, and emphasizing the benefits of component enhancement and of graph-based architectures, this paper identifies modification techniques that allow insight into urban flood modeling despite limited data availability.
Status: open (until 10 May 2024)
RC1: 'Comment on egusphere-2024-551', Anonymous Referee #1, 14 Apr 2024
This manuscript attempted to address a crucial topic in urban hydrological modelling: the application of advanced ML algorithms for more accurate urban flood modelling. Although the subject aligns well with HESS's scope, this manuscript requires significant improvements to be considered for publication in HESS, particularly in the following areas:
- The writing is lengthy and lacks focus; much of the text in the methodology re-introduces well-known algorithms. This section could be condensed to allow more space for discussing graph-related algorithms. Currently, it reads more like a degree thesis than a research paper—a point that should be addressed if resubmission is planned.
- While the graph-based algorithm presents an interesting contribution from this work, its configuration is overly simplistic and thus questionable regarding its utility. Specifically, considering only six nodes limits its applicability in real-world settings—why not use the 59 DISs as basic nodes to construct the graph?
- The results presented are somewhat superficial: an essential aspect of such hydrological modelling, namely the performance during peak times and volumes of floods, is missing. Instead, only several R values from different models are provided, which is far from satisfactory.
As such, I suggest rejecting this manuscript in its current form.
Citation: https://doi.org/10.5194/egusphere-2024-551-RC1
AC1: 'Reply on RC1', Candace Agonafir, 22 Apr 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-551/egusphere-2024-551-AC1-supplement.pdf
AC2: 'Reply on AC1', Candace Agonafir, 22 Apr 2024
“The writing is lengthy and lacks focus; much of the text in the methodology re-introduces well-known algorithms. This section could be condensed to allow more space for discussing graph-related algorithms. Currently, it reads more like a degree thesis than a research paper, a point that should be addressed if resubmission is planned.”
Thank you for your comments regarding the length and focus of the methodology section in our manuscript. We agree that this section can be made more concise, and we appreciate the suggestion to condense it. However, we would like to clarify the rationale behind the detailed description of well-known algorithms such as FFN, CNN, and RNN. Our intent was to provide a foundation that would not require reiteration when discussing the innovative aspects of the Graph WaveNet (GWN) model, so that the subsequent sections could focus primarily on the novel graph architecture and its integration with these base models.
“While the graph-based algorithm presents an interesting contribution from this work, its configuration is overly simplistic and thus questionable regarding its utility. Specifically, considering only six nodes limits its applicability in real-world settings. Why not use the 59 DISs as basic nodes to construct the graph?”
Thank you for your comment regarding the configuration of the Graph WaveNet (GWN) model and its applicability. Your comment raises an important point about the scale and complexity of the model used in our research.
In our study, the decision to limit the model to six nodes rather than using the 59 District Indicator Species (DISs) as basic nodes was driven by several critical considerations. First, the GWN model, while complex, is also resource-intensive, requiring significant computational power, memory, and run-time. Implementing a more complex model with 59 nodes would dramatically increase these demands, potentially making the model impractical for the kind of timely and efficient analysis stakeholders aim to achieve.
Furthermore, the occurrence of street flooding, which is the focus of our study, is relatively infrequent. It does not flood every time it rains. Yet, when it does, it is dangerous and costly. This infrequency limits the amount of data available to train a model at the localized level of 59 DISs effectively. Our chosen approach balances the need for analytical depth with the practical aspects of model training and execution.
The main contribution of our work lies in its ability to make reasonably accurate predictions in an area that is underexplored in current literature. We focus on both chronic, pluvial flooding and large urban flooding, which pose significant human risks and economic strains. Our model addresses these issues effectively within the constraints imposed by available data and computational resources.
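The node-count trade-off raised in this exchange can be made concrete with a small sketch. It assumes that the "dynamic graph creation" of the GWN follows the self-adaptive adjacency matrix of the original Graph WaveNet formulation, softmax(ReLU(E1 E2^T)), where E1 and E2 are learned node embeddings. The function names, the embedding size, and the random (untrained) embeddings are illustrative assumptions, not details taken from the manuscript.

```python
import math
import random


def adaptive_adjacency(e1, e2):
    """Row-wise softmax(ReLU(E1 @ E2^T)) for two node-embedding matrices.

    e1 and e2 are lists of lists with shape (n_nodes, dim).
    Returns an n_nodes x n_nodes matrix whose rows each sum to 1.
    """
    adj = []
    for src in e1:
        # ReLU of the dot product between one source and every target embedding.
        scores = [max(0.0, sum(a * b for a, b in zip(src, dst))) for dst in e2]
        # Numerically stable softmax over the row.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        adj.append([x / total for x in exps])
    return adj


def random_embeddings(n_nodes, dim, rng):
    """Hypothetical stand-in for learned embeddings (random, untrained)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_nodes)]


rng = random.Random(0)
EMBED_DIM = 10  # assumed embedding size, not taken from the manuscript

for n_nodes in (6, 59):
    e1 = random_embeddings(n_nodes, EMBED_DIM, rng)
    e2 = random_embeddings(n_nodes, EMBED_DIM, rng)
    adj = adaptive_adjacency(e1, e2)
    print(f"{n_nodes} nodes -> {len(adj) * len(adj[0])} adjacency entries, "
          f"{2 * n_nodes * EMBED_DIM} embedding parameters")
```

Under these assumptions, moving from 6 to 59 nodes grows the adjacency matrix from 36 to 3481 entries. This only illustrates how the graph itself scales; it says nothing about the total run-time or memory of the full model, which also depend on the temporal layers and the amount of training data per node.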
“The results presented are somewhat superficial: an essential aspect of such hydrological modelling—the performance during peak times and volumes of floods—is missing. Instead, only several R values from different models are provided—far from satisfactory.”
Thank you for your feedback regarding the depth of the results section in our study. We acknowledge your concern about the absence of detailed performance metrics during peak times and volumes of floods. Traditional hydrological measurements, such as peak times and volumes, are not available for NYC or for many other metropolitan areas, and this limitation has posed a significant challenge. Our study therefore relies on crowdsourced data and employs data-driven models rather than traditional physics-based models. The available data do not support detailed modeling of peak flood conditions. Instead, our model focuses on the relationship between predictor variables and the response variable, which represents areas affected by flooding, to estimate flooding impacts in urban environments.
We present graph-based architectures as an innovative solution to address these challenges. This approach is particularly suited to urban terrain, where internal drainage systems play a significant role, yet detailed drainage data is often unavailable. Our methodology leverages the graph structure to capture the complex interactions within urban areas that influence flooding. The results, while they may initially seem straightforward, provide foundational insights into the challenges and potential of modeling urban flooding using novel data sources and computational techniques.
A predictive model for an urban area is useful because it can produce estimates from predictor variables and their relationship to a response variable representing flooded areas. The attached PDF shows an output of the GWN, expressed as the percent increase in street flooding (SF) reports (SF complaints on the day / average SF complaints) on a rainy day, June 21, 2019. The maximum hourly rainfall on this day ranged from approximately 8 mm to 10 mm, and the total daily observed amount was roughly 13 mm. Hence, the rainfall was intense but relatively short in duration. On this day, flooding was reported in multiple news outlets.
Zone 1 experiences the greatest rate of increase, and the R² in this zone is 0.59. By contrast, Zone 3, which has the second highest total SF complaints, experiences the lowest rate of increase, and its R² is 0.72. This is useful information. It may suggest that the areas of concern on that day, as expressed by the residents, are not necessarily the areas that experienced the most overall flooding over the study period.
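The ratio described above (SF complaints on the day divided by the average SF complaints) can be sketched in a few lines. The zone names and complaint counts below are hypothetical and are not data from the study. The conversion from the ratio to a percent increase, (ratio - 1) × 100, is our reading of the description rather than a formula stated in the reply.

```python
def relative_increase(day_count, historical_counts):
    """Ratio of one day's SF complaints to the historical daily average."""
    average = sum(historical_counts) / len(historical_counts)
    if average == 0:
        raise ValueError("average complaint count is zero; ratio undefined")
    return day_count / average


# Hypothetical daily SF complaint counts per zone (NOT data from the study).
history = {
    "Zone A": [2, 0, 1, 3, 2],
    "Zone B": [10, 8, 12, 9, 11],
}
# Hypothetical complaint counts on the event day.
event_day = {"Zone A": 8, "Zone B": 15}

for zone, counts in history.items():
    ratio = relative_increase(event_day[zone], counts)
    percent = (ratio - 1) * 100  # assumed definition of "percent increase"
    print(f"{zone}: ratio = {ratio:.2f}, percent increase = {percent:.0f}%")
```

In this made-up example, the zone with fewer complaints overall shows the larger relative increase, which mirrors the kind of contrast between zones drawn in the reply.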
Citation: https://doi.org/10.5194/egusphere-2024-551-AC2
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
190 | 53 | 16 | 259 | 6 | 6 |