Content Analysis of Multi-Annual Time Series of Flood-Related Twitter (X) Data

Veigel, Nadja; Kreibich, Heidi; de Bruijn, Jens A.; Aerts, Jeroen C. J. H.; Cominola, Andrea

doi:https://doi.org/10.5194/egusphere-2024-2556

Preprints

30 Aug 2024

Abstract. Social media can provide insights into natural hazard events and people's emergency responses. In this study, we present a natural language processing analytic framework to extract and categorize information from 43,287 Twitter (X) posts in German since 2014. We implement Bidirectional Encoder Representations from Transformers in combination with unsupervised clustering techniques (BERTopic) to automatically extract social media content, addressing transferability issues that arise from commonly used bag-of-word representations. We analyze the temporal evolution of topic patterns, reflecting behaviors and perceptions of citizens before, during, and after flood events. Topics related to low-impact riverine flooding contain descriptive hazard-related content, while the focus shifts to catastrophic impacts and responsibilities during high-impact events. Our analytical framework enables analyzing temporal dynamics of citizens’ behaviors and perceptions, which can facilitate lessons-learned analyses and improve risk communication and management.
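
The temporal analysis described in the abstract can be illustrated with a minimal sketch. It assumes that each post already carries a topic label (for example, one assigned by a topic model such as BERTopic) and computes the share of each topic per month. The posts, dates, and the "water levels" label are made up for illustration, and `monthly_topic_shares` is a hypothetical helper, not the authors' implementation.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical input: (posting date, topic label) pairs. In practice the labels
# would come from a topic model; the values below are invented for illustration.
posts = [
    (date(2021, 7, 13), "water levels"),
    (date(2021, 7, 14), "water levels"),
    (date(2021, 7, 15), "catastrophic impacts"),
    (date(2021, 7, 16), "catastrophic impacts"),
    (date(2021, 7, 20), "responsibilities"),
    (date(2021, 8, 2), "responsibilities"),
    (date(2021, 8, 5), "responsibilities"),
    (date(2021, 8, 9), "catastrophic impacts"),
]


def monthly_topic_shares(labelled_posts):
    """Return {(year, month): {topic: share}}; shares sum to 1 within each month."""
    counts = defaultdict(Counter)
    for day, topic in labelled_posts:
        counts[(day.year, day.month)][topic] += 1

    shares = {}
    for month, topic_counts in sorted(counts.items()):
        total = sum(topic_counts.values())
        shares[month] = {t: n / total for t, n in topic_counts.items()}
    return shares


shares = monthly_topic_shares(posts)
for month, distribution in shares.items():
    print(month, distribution)
```

Plotting such shares over time is one simple way to see a shift from descriptive content to impact- and responsibility-related content around an event.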

Received: 16 Aug 2024 – Discussion started: 30 Aug 2024

Competing interests: One of the co-authors is part of the NHESS editorial board.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Download & links

Preprint (PDF, 1004 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Supplement (470 KB)

Journal article(s) based on this preprint

26 Feb 2025

Content analysis of multi-annual time series of flood-related Twitter (X) data

Nadja Veigel, Heidi Kreibich, Jens A. de Bruijn, Jeroen C. J. H. Aerts, and Andrea Cominola

Nat. Hazards Earth Syst. Sci., 25, 879–891, https://doi.org/10.5194/nhess-25-879-2025, 2025

Interactive discussion

Status: closed

CC1:
'Comment on egusphere-2024-2556', Knut Seip, 02 Sep 2024

Dear Authors
The study was impressive and must have required much work. I was just curious whether you have compared your results to information from Google Trends. I just tried Google Trends on the term "hydrology" in Germany and got a peak in October 2010. Maybe the information in Google Trends will be quite irrelevant?
Best wishes with your studies
Knut L. Seip

Citation: https://doi.org/10.5194/egusphere-2024-2556-CC1
- AC1:
  'Reply on CC1', Nadja Veigel, 03 Sep 2024
  
  Dear Knut L. Seip,
  We appreciate you reading the preprint and giving our work such a positive review. Since Google Trends already provides the themes and aggregates them based on search phrases, we haven't directly compared the results to Google Trends. As per your recommendation, I examined the Google Trends data for the specified timeframe and the search phrases (German terms for "flooding": "Hochwasser," "Flut," "Überflutung") that were included in our preprint. The comparison can be seen in the attached file, which is an extended version of Figure 2 from the preprint with Google Trends added in red. Local events are underreported on Google, with the exception of events receiving national media attention, such as those that occurred in 2010 and 2021. Many of the events we found in the Twitter data are not identifiable in the Google Trends data at the monthly time step at which Google delivers them.
  Kind Regards
  Nadja Veigel
  
  Citation: https://doi.org/10.5194/egusphere-2024-2556-AC1
  - CC2: 'REGUSPHERE-2024-2556', Knut Seip, 03 Sep 2024
    
    Dear Nadja
    
    I very much appreciate your response, and it was interesting to see the comparison between the two methods. To me, the Google Trends results were surprisingly good (but yours were better). Maybe it has been done before, but I have never seen a comparison of Twitter (X) data and Google Trends data. I have a second question: are there any policy implications of your results? (I once wrote in a policy journal, and they asked me to have a final section: "policy implications". I thought that was a good idea.) Please note, you do not have to bother with responding to this question. I am just curious.
    
    Best wishes Knut
    
    Citation: https://doi.org/10.5194/egusphere-2024-2556-CC2
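
The point raised in AC1 about the monthly time step can be illustrated with a small sketch: a short local spike that stands out clearly in a daily series is strongly damped once the series is averaged per month. The counts and dates below are made up, and a plain monthly mean is only a stand-in for temporal aggregation; it is not how Google Trends computes its index.

```python
from collections import defaultdict
from datetime import date, timedelta

# Made-up daily post counts: a flat baseline of 3 posts per day over June and
# July 2016 (an arbitrary period), with one two-day local spike in June.
BASELINE = 3
start = date(2016, 6, 1)
daily = {start + timedelta(days=i): BASELINE for i in range(60)}
daily[date(2016, 6, 10)] = 40
daily[date(2016, 6, 11)] = 25


def monthly_mean(series):
    """Average a {date: count} series per (year, month)."""
    buckets = defaultdict(list)
    for day, count in series.items():
        buckets[(day.year, day.month)].append(count)
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}


monthly = monthly_mean(daily)

# Peak-to-baseline ratio at daily resolution versus month-to-month ratio.
daily_ratio = max(daily.values()) / BASELINE
monthly_ratio = monthly[(2016, 6)] / monthly[(2016, 7)]
print(f"daily peak / baseline: {daily_ratio:.1f}")
print(f"June mean / July mean: {monthly_ratio:.2f}")
```

In this toy series the daily peak is more than ten times the baseline, while the monthly means differ by less than a factor of two, which is consistent with short local events being hard to identify in monthly data.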
RC1:
'Comment on egusphere-2024-2556', Samar Momin, 04 Oct 2024
This review is concerned with the article titled "Content Analysis of Multi-Annual Time Series of Flood-Related Twitter (X) Data". It is divided into three categories, namely, general comments, specific comments and technical comments.
General comments: The title of the article "Content Analysis of Multi-Annual Time Series of Flood-Related Twitter (X) Data" clearly reflects the contents of the paper, and the abstract provides a concise, complete, and unambiguous summary of the work done and the results obtained. Both these sections are pertinent and easy to understand. The manuscript is well-written and well-structured, delivering the idea, methodology, and results clearly and concisely. The figures are descriptive and of high quality, and the tables are informative. It is well-referenced with proper credit attributed to previous and/or related works, and the authors indicate each of their contributions and competing interests. Crediting the use of AI tools such as ChatGPT is fantastic; we are conducting research in the age of the AI revolution. The paper presents a comprehensive and innovative approach to using social media data from Twitter (X) to understand human behaviour and perceptions during several types of flooding events in Germany. The study develops an approach using advanced natural language processing (NLP) techniques, leveraging pre-existing and accessible tools, including transformer-based models like SBERT and clustering algorithms such as HDBSCAN, to automatically extract flood-related topics from large social media datasets. Several steps to clean and filter the data have been presented. This allows for a nuanced analysis of public response to various flood events. The paper’s relevance is clear, given the increasing reliance on real-time social media data for disaster risk management and the potential to enhance flood preparedness and response strategies. Thus, this manuscript has good scientific significance, scientific quality, and presentation quality.

Specific Comments:

Clarification on Data Filtering: The process for removing irrelevant tweets is well-explained. However, more detail on the limitations of this filtering process could be helpful.

Interpretation of Topic Groups: The clustering approach is well-explained. However, the specific implications of the identified topics (such as "disaster management" or "fatalities") could be discussed in more detail.

Comparisons with Traditional Data Sources: The paper highlights Twitter (X) data as an alternative to traditional flood impact assessments. What would be the difference between the results from social media and conventional data sources?

Technical Comments:

Grammar and Style:

Line 98: "The The Second" should be "The second."

Figure Labels and Descriptions: The figures provide valuable visual insights, but some (esp. Fig 3 & 4) would benefit from clearer labels or captions, particularly where technical details like clustering results or topic distributions are involved.

In-text Citations Formatting: Ensure that citations within the text follow a consistent format. There are some minor inconsistencies in how sources are referenced throughout the manuscript.
Citation: https://doi.org/10.5194/egusphere-2024-2556-RC1
- AC2: 'Reply on RC1', Nadja Veigel, 12 Nov 2024
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2556/egusphere-2024-2556-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-2556-AC2
RC2:
'Comment on egusphere-2024-2556', Anonymous Referee #2, 05 Oct 2024

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2556/egusphere-2024-2556-RC2-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2024-2556-RC2
- AC3: 'Reply on RC2', Nadja Veigel, 12 Nov 2024
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2556/egusphere-2024-2556-AC3-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-2556-AC3

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Reconsider after major revisions (further review by editor and referees) (02 Dec 2024) by Vassiliki Kotroni

AR by Nadja Veigel on behalf of the Authors (30 Dec 2024) Author's response Author's tracked changes Manuscript

ED: Publish as is (03 Jan 2025) by Vassiliki Kotroni

AR by Nadja Veigel on behalf of the Authors (07 Jan 2025)

Supplement

https://doi.org/10.5194/egusphere-2024-2556-supplement

Viewed

Total article views: 615 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
285	101	229	615	33	11	14

Views and downloads (calculated since 30 Aug 2024)

Month	HTML	PDF	XML	Total
Aug 2024	41	5	0	46
Sep 2024	100	29	7	136
Oct 2024	60	35	2	97
Nov 2024	31	14	91	136
Dec 2024	15	12	90	117
Jan 2025	25	4	38	67
Feb 2025	13	2	1	16

Viewed (geographical distribution)

Total article views: 659 (including HTML, PDF, and XML) Thereof 659 with geography defined and 0 with unknown origin.

Latest update: 26 Feb 2025

Short summary

This study explores how social media, specifically Twitter (X), can help understand public reactions to floods in Germany from 2014 to 2021. Using large language models, we extract topics and patterns of behavior from flood-related tweets. The findings offer insights to improve communication and disaster management. Topics related to low-impact flooding contain descriptive hazard-related content, while the focus shifts to catastrophic impacts and responsibilities during high-impact events.

