the Creative Commons Attribution 4.0 License.
Tracing online flood conversations across borders: A watershed level analysis of geo-social media topics during the 2021 European flood
Abstract. In the face of rapid population growth, urbanisation, and accelerating climate change, the need for rapid and accurate disaster detection has become critical to minimising human and material losses. In this context, geo-social media data has proven to be a sensible data source for tracing disaster-related conversations, especially during flood events. However, current research often neglects the relationship between information from social media posts and their corresponding geographical context. In this paper, we examine the emergence of disaster-related social media topics in relation to hydrological and socio-environmental features at the watershed level during the 2021 Western European flood, while focusing on transboundary river basins. Building upon an advanced machine learning-based topic modelling approach, we show the emergence of flood-related geo-social media topics both in river-basin specific and cross-basin contexts. Our analysis reveals distinct spatio-temporal dynamics in the public discourse, showing that timely topics describing heavy rains or flood damages were closely tied to immediate environmental conditions in upstream areas, while post-disaster topics about helping victims or volunteering were more prevalent in less affected areas located in both upstream and downstream areas. These findings highlight how social media responses to disasters differ spatially across watersheds and underscore the importance of integrating geo-social media analysis into disaster coordination efforts, opening new opportunities for transboundary collaborations and the coordination of emergency response along border-crossing rivers.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-3255', Samar Momin, 28 Jan 2025
General Comments:
The paper analyzes flood-related social media conversations using a watershed-based approach, uncovering spatio-temporal dynamics during the 2021 Western European flood. It highlights the importance of a transboundary river basin perspective for improving disaster management and cooperation across borders.
The manuscript is well-organized, with clear objectives, methods, and results. Its use of geo-social media data combined with watershed characteristics is innovative and supported by a solid methodology.
Strengths:
- Innovative Methodology:
  - The use of BERTopic for semantic classification and the integration of watershed-level characteristics are cutting-edge approaches.
  - The methodology is well-documented and replicable.
- Relevance and Applicability:
  - The paper addresses critical challenges in flood risk management across national borders, providing practical insights for preparedness and response strategies.
- Effective Use of TF-IDF:
  - The application of Term Frequency-Inverse Document Frequency (TF-IDF) effectively extracts meaningful insights from social media data, demonstrating strong analytical rigor.
- Comprehensive Data Analysis:
  - The study incorporates a range of datasets, including precipitation models, flood mapping, and socio-environmental data, ensuring a robust and multi-dimensional analysis.
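For readers unfamiliar with the TF-IDF weighting mentioned under the strengths, the following is a minimal sketch of the classic formulation: term frequency within a post multiplied by the log-scaled inverse document frequency. It is not the manuscript's implementation. BERTopic applies a class-based variant computed per topic, which this sketch does not reproduce, and the function name `tf_idf` and the example posts are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document.

    Classic TF-IDF: (term count / document length) * log(N / document frequency).
    This is an illustrative formulation, not the variant used by BERTopic.
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-tokenised posts (not taken from the study's data).
posts = [
    ["flood", "rain", "meuse"],
    ["flood", "damage", "house"],
    ["flood", "volunteer", "help"],
]
scores = tf_idf(posts)
```

A term that occurs in every post, such as "flood" here, receives a weight of zero, while terms specific to one post receive positive weights. This is why the weighting tends to surface words that distinguish one group of posts from another.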
Specific Comments:
- Terminology Update:
  - Could the authors explain why the references to "Twitter" and "Tweets" are not in line with the platform's updated name, "X"? Using "posts" instead of "Tweets" would align the manuscript with the rebranding.
  - A brief clarification can be added in the methodology section, such as: "In this paper, we refer to user-generated content on the platform formerly known as Twitter as 'X posts.'"
- Choice of Translation Tool:
  - The paper uses the Google Translate API for translating posts into English. However, several alternative tools are available, some of them free and open source (e.g., MarianMT, DeepL). Could the authors justify this choice, explaining why Google Translate was preferred (e.g., for its accuracy or language coverage)?
- Data Biases:
  - Social media data tends to underrepresent remote and less urbanized areas. The paper could elaborate on how this bias may affect the interpretation of dominant topics and the generalizability of the findings.
- Comparison with Traditional Data Sources:
  - While the study effectively demonstrates the value of social media data, including a comparison with traditional sources like official reports or surveys could enhance its relevance and contextualize its strengths and limitations.
- Policy Implications:
  - Expanding on how the identified topics can inform flood preparedness and cross-border coordination efforts would improve the study’s practical relevance, especially for policymakers and emergency responders.
Citation: https://doi.org/10.5194/egusphere-2024-3255-RC1
RC2: 'Comment on egusphere-2024-3255', Anonymous Referee #2, 05 Feb 2025
The authors expand existing methodologies for the analysis of spatiotemporal relationships between flood-related topics in social media posts, flood, and basin characteristics during major flooding events. The proposed analysis is interesting and potentially useful to flood-response authorities. The manuscript is well written.
I have some recommendations to enhance clarity and facilitate reproducibility across other case-study regions, before publication:
1. At the end of Sections 2.2.1 and 2.2.2, I would include a table summarizing meteorological, flood, and watershed data used in the study.
2. Lines 177-180: briefly describe the zonal statistical approach and the coverage fraction method used to assign precipitation values from the raster dataset to the watersheds.
3. Lines 167-170: river catchment areas most affected by flooding are identified as those with more than 100 mm of precipitation. How do the authors account for major effects in downstream locations that were not directly affected by high precipitation?
4. Lines 178-180: knowing the packages used to perform the analysis may not be sufficient to reproduce it. I would invite the authors to share code that streamlines (at least parts of) the methodological steps. That would help local authorities and stakeholders take advantage of geo-social data during flood hazard management. I suggest the authors include a code availability statement in their article. If the authors used code from other sources, then they could include the full list of those sources in that statement.
5. Lines 192-198: briefly explain what “embedding” and “vectorization” mean in the context of natural language processing.
6. Line 202: is the number of topics (30, in the specific case) a parameter of the k-means clustering algorithm? E.g., a predetermined number of clusters that the algorithm is asked to find.
7. Lines 224-227: in those cases where more than one topic was equally dominant in some days, how did the authors decide what topic to retain and what others to discard? How often does this happen? Is there any risk of introducing subjectivity?
8. Fig. 3: I suggest mentioning in the caption the HydroBASIN watershed level considered.
9. Fig. 4: for each daily bar, what are the empty portions on top of the shaded portions? In other words, what are daily percentages calculated on? Is 100% the total amount of Tweets, including those unrelated to flooding?
10. Fig. 7 is very useful to outline the spatial distribution of social-media topics. However, some additional clarifications are necessary. Does each distribution really represent the frequency of dominant topics (as stated in the caption)? Or else, since there is a distribution for all considered topics, what the figure really shows is the frequency (in space) of each topic one at a time? Given that some topics were more frequent than others, is it the case that different distributions are associated with different overall numbers of topic occurrences in space? Also, what about the temporal variability? Does the figure show the cumulative values of topic occurrences at any basin locations, cumulated over time? Or something else? The authors should include a more detailed explanation to clarify these aspects.
11. Fig. 7 is very effective in showing the spatial variability in flood-related topics discussed in social media. However, as it is now, it does not consider how these distributions may change across different macro regions. Given the emphasis that the authors give to the trans-boundary character of their study (e.g., lines 17, 24, 38), I think it would be interesting to include three more figures like Fig. 7 but referred to the individual Escaut, Meuse, and Rhine river basins, to see if any significant differences emerge.
12. Lines 503-504: Under what label were the “Damage”-centered tweets clustered at those iterations where the “Damage” topic did not emerge? Were they clustered with the Rhine and Meuse flood topics? Would the results change remarkably if the “Damage” topic were not considered?
Minor comments:
1. Line 123: correct “extend” to “extent”
2. Figure 1: the part of study area within Belgian boundaries presents an abrupt jump in the gradient of greys representing elevations.
3. Figure 1: the lower bound of the elevation color bar is negative, and equal to -179 m; what is the reference elevation associated with 0 m?
4. Line 101: do not use parentheses inside parentheses for the citations’ years.
5. Lines 194-195: explain what the acronym UMAP stands for, exactly.
6. Caption of Fig. 3 (line 290) says that thick blue is the color used to represent rivers most impacted by precipitation. However, this is not in agreement with the legend, which adopts a light orange instead.
7. Table 1: there are some words (e.g., “maas”, “venlo”, “dinant”, “nrw”, etc.) that are not in English. If they are names of towns, cities, rivers, etc., I would specify that somehow at the bottom of the table. Otherwise, some readers that are not familiar with the local geography might be left wondering whether they are untranslated words.
8. Fig. 7: clarify in the caption what watershed scale was used.
9. Line 453: “show” should be “shows” for third person singular
10. Line 461: correct “use” to “used”
Citation: https://doi.org/10.5194/egusphere-2024-3255-RC2
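As background to recommendation 2 above, a coverage-fraction zonal statistic can be sketched as follows. Each raster cell contributes to a watershed's mean in proportion to the fraction of the cell's area that lies inside the watershed polygon. This is a generic illustration, not the authors' code: the function name `coverage_weighted_mean` and the precipitation values are hypothetical, and the coverage fractions are assumed to have been computed beforehand by a GIS library.

```python
def coverage_weighted_mean(cell_values, coverage_fractions):
    """Mean of raster cell values, weighted by the fraction of each cell
    that lies inside the polygon (0.0 = outside, 1.0 = fully inside).

    Returns None when no cell overlaps the polygon.
    """
    if len(cell_values) != len(coverage_fractions):
        raise ValueError("cell_values and coverage_fractions must have equal length")
    total_weight = sum(coverage_fractions)
    if total_weight == 0:
        return None
    weighted_sum = sum(v * f for v, f in zip(cell_values, coverage_fractions))
    return weighted_sum / total_weight


# Hypothetical precipitation totals (mm) for three cells touching one watershed:
# one fully inside, one half inside, one outside.
precip_mm = [120.0, 80.0, 40.0]
fractions = [1.0, 0.5, 0.0]
watershed_mean = coverage_weighted_mean(precip_mm, fractions)
```

Compared with assigning each cell wholly to the watershed containing its centre, this weighting avoids over- or under-counting cells that straddle a watershed boundary, which matters most for small watersheds and coarse rasters.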