the Creative Commons Attribution 4.0 License.
Content Analysis of Multi-Annual Time Series of Flood-Related Twitter (X) Data
Abstract. Social media can provide insights into natural hazard events and people's emergency responses. In this study, we present a natural language processing analytic framework to extract and categorize information from 43,287 Twitter (X) posts in German since 2014. We implement Bidirectional Encoder Representations from Transformers in combination with unsupervised clustering techniques (BERTopic) to automatically extract social media content, addressing transferability issues that arise from commonly used bag-of-words representations. We analyze the temporal evolution of topic patterns, reflecting behaviors and perceptions of citizens before, during, and after flood events. Topics related to low-impact riverine flooding contain descriptive hazard-related content, while the focus shifts to catastrophic impacts and responsibilities during high-impact events. Our analytical framework enables analysis of the temporal dynamics of citizens’ behaviors and perceptions, which can facilitate lessons-learned analyses and improve risk communication and management.
Status: final response (author comments only)
-
CC1: 'Comment on egusphere-2024-2556', Knut Seip, 02 Sep 2024
Dear Authors
The study was impressive and must have required much work. I was just curious whether you have compared your results to information from Google Trends. I just tried Google Trends on the term "hydrology" in Germany and got a peak in October 2010. Maybe the information in Google Trends will be quite irrelevant?
Best wishes with your studies
Knut L. Seip
Citation: https://doi.org/10.5194/egusphere-2024-2556-CC1
-
AC1: 'Reply on CC1', Nadja Veigel, 03 Sep 2024
Dear Knut L. Seip,
We appreciate you reading the preprint and giving our work such a positive review. Since Google Trends already provides the themes and aggregates them based on search phrases, we haven't directly compared the results to Google Trends. As per your recommendation, I examined the Google Trends data for the specified timeframe and the search phrases (German terms for "flooding": "Hochwasser," "Flut," "Überflutung") that were included in our preprint. The comparison can be seen in the attached file, which is an extended version of Figure 2 from the preprint with Google Trends added in red. Local events are underreported on Google, with the exception of events receiving national media attention, such as those that occurred in 2010 and 2021. Many of the events we found in the Twitter data are not identifiable in the Google Trends data or at the monthly time step at which Google delivers the data.
Kind Regards
Nadja Veigel
-
CC2: 'REGUSPHERE-2024-2556', Knut Seip, 03 Sep 2024
Dear Nadja
I very much appreciate your response, and it was interesting to see the comparison between the two methods. To me, the Google Trends results were surprisingly good (but yours were better). Maybe it has been done before, but I have never seen a comparison of Twitter (X) data and Google Trends data. I have a second question: are there any policy implications of your results? (I once wrote in a policy journal, and they asked me to have a final section: "policy implications". I thought that was a good idea.) Please note, you do not have to bother with responding to this question. I am just curious.
Best wishes Knut
Citation: https://doi.org/10.5194/egusphere-2024-2556-CC2
-
RC1: 'Comment on egusphere-2024-2556', Samar Momin, 04 Oct 2024
This review concerns the article titled "Content Analysis of Multi-Annual Time Series of Flood-Related Twitter (X) Data". It is divided into three parts: general comments, specific comments, and technical comments.
General comments: The title of the article "Content Analysis of Multi-Annual Time Series of Flood-Related Twitter (X) Data" clearly reflects the contents of the paper, and the abstract provides a concise, complete, and unambiguous summary of the work done and the results obtained. Both these sections are pertinent and easy to understand. The manuscript is well-written and well-structured, delivering the idea, methodology, and results clearly and concisely. The figures are descriptive and of high quality, and the tables are informative. It is well-referenced with proper credit attributed to previous and/or related works, and the authors indicate each of their contributions and competing interests. Crediting the use of AI tools such as ChatGPT is fantastic, as we are conducting research in the age of the AI revolution.
The paper presents a comprehensive and innovative approach to using social media data from Twitter (X) to understand human behaviour and perceptions during several types of flooding events in Germany. The study develops an approach using advanced natural language processing (NLP) techniques, leveraging pre-existing and accessible tools, including transformer-based models like SBERT and clustering algorithms such as HDBSCAN, to automatically extract flood-related topics from large social media datasets. Several steps to clean and filter the data are presented. This allows for a nuanced analysis of public response to various flood events. The paper’s relevance is clear, given the increasing reliance on real-time social media data for disaster risk management and the potential to enhance flood preparedness and response strategies. Thus, this manuscript has good scientific significance, scientific quality, and presentation quality.
Specific Comments:
-
Clarification on Data Filtering: The process for removing irrelevant tweets is well-explained. However, more detail on the limitations of this filtering process could be helpful.
-
Interpretation of Topic Groups: The clustering approach is well-explained. However, the specific implications of the topics identified (such as "disaster management" or "fatalities") could be discussed in more detail.
-
Comparisons with Traditional Data Sources: The paper highlights Twitter (X) data as an alternative to traditional flood impact assessments. What would be the difference between the results from social media and conventional data sources?
Technical Comments:
-
Grammar and Style:
- Line 98: "The The Second" should be "The second."
-
Figure Labels and Descriptions: The figures provide valuable visual insights, but some (esp. Fig 3 & 4) would benefit from clearer labels or captions, particularly where technical details like clustering results or topic distributions are involved.
-
In-text Citations Formatting: Ensure that citations within the text follow a consistent format. There are some minor inconsistencies in how sources are referenced throughout the manuscript.
Citation: https://doi.org/10.5194/egusphere-2024-2556-RC1
-
AC2: 'Reply on RC1', Nadja Veigel, 12 Nov 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2556/egusphere-2024-2556-AC2-supplement.pdf
-
RC2: 'Comment on egusphere-2024-2556', Anonymous Referee #2, 05 Oct 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2556/egusphere-2024-2556-RC2-supplement.pdf
-
AC3: 'Reply on RC2', Nadja Veigel, 12 Nov 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2556/egusphere-2024-2556-AC3-supplement.pdf
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
236 | 89 | 124 | 449 | 27 | 3 | 3 |