From text to geoinformation – A modular approach for extraction of disaster information from web text data

Rittlinger, Vanessa; Mast, Johannes; Voigt, Stefan; Geiß, Christian; Taubenböck, Hannes

doi:10.5194/egusphere-2026-2361

Preprints

https://doi.org/10.5194/egusphere-2026-2361

05 Jun 2026

Status: this preprint is open for discussion and under review for Natural Hazards and Earth System Sciences (NHESS).

Vanessa Rittlinger, Johannes Mast, Stefan Voigt, Christian Geiß, and Hannes Taubenböck

Abstract. Effective disaster management requires comprehensive information about a given flooding situation. Text data from web news offer potentially large volumes of information for this purpose. However, extracting flood-related information and analyzing it in space and time is demanding because of the immense volume of unstructured text. To address this challenge, we present a modular and scalable method for extracting disaster-relevant information from a large text corpus. The method combines dictionary-based, domain-specific entity extraction, a machine learning model for toponym identification, and hand-crafted rules for entity linking in a single workflow. The extracted information is augmented with geolocations to support spatial analysis. Using the 2021 flooding event in West Germany as a case study, we evaluate the capacity of our approach to extract relevant geospatial information at a variety of spatial granularity levels and in the form of various thematic descriptors. In doing so, we outline the capabilities and limitations of this approach for text extraction and analysis. Furthermore, we demonstrate the potential for systematic utilization of text data for improved situational awareness and for disaster management support.
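The modular structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term dictionary, the gazetteer (which stands in here for both the machine learning toponym model and the geocoding step), the approximate coordinates, and the same-sentence linking rule are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-dictionary mapping flood-related terms to thematic categories.
TERMS = {"flooded": "impact", "destroyed": "impact", "evacuated": "response"}

# Hypothetical gazetteer with approximate coordinates (illustration only).
# It replaces the learned toponym model and the geocoder of a real workflow.
GAZETTEER = {"Ahrweiler": (50.54, 7.10), "Erftstadt": (50.81, 6.77)}


@dataclass
class Record:
    place: str
    lat: float
    lon: float
    term: str
    category: str


def split_sentences(text):
    """Module 0: naive sentence splitting on end-of-sentence punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_terms(sentence):
    """Module 1: dictionary-based extraction of domain terms."""
    tokens = re.findall(r"\w+", sentence.lower())
    return [(t, TERMS[t]) for t in tokens if t in TERMS]


def find_toponyms(sentence):
    """Module 2: toponym identification (gazetteer lookup as a stand-in)."""
    return [n for n in GAZETTEER if re.search(rf"\b{re.escape(n)}\b", sentence)]


def link(text):
    """Module 3: rule-based linking; a term is attached to every toponym
    in the same sentence, and the result is given a geolocation."""
    records = []
    for sentence in split_sentences(text):
        terms = extract_terms(sentence)
        for place in find_toponyms(sentence):
            lat, lon = GAZETTEER[place]
            for term, category in terms:
                records.append(Record(place, lat, lon, term, category))
    return records


text = ("Large parts of Ahrweiler were flooded. "
        "Residents of Erftstadt were evacuated on Friday.")
records = link(text)
for r in records:
    print(r)
```

Because each step is a separate function, any one of them (for example, the gazetteer lookup) could be swapped for a stronger component without touching the others, which is the property the abstract refers to as modularity.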

Received: 24 Apr 2026 – Discussion started: 05 Jun 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 17 Jul 2026)

RC1: 'Comment on egusphere-2026-2361', Anonymous Referee #1, 24 Jun 2026

This paper presents a workflow for extracting disaster-related information from Web text. The authors demonstrate this workflow in a case study on the 2021 flooding event in West Germany. Overall, the paper reads more like a project report and lacks a clear research contribution.
- Web text is known to have various biases. Depending on the disaster, some aspects of the disaster may be reported while some other aspects may be ignored. This study specifically used Web news from the GDELT project, and the extracted information will inherit all these biases. This is not to mention the NLP and other methods used to process the text, which have their own algorithmic biases.
- How long is the time delay of the Web news in GDELT? If the news in GDELT is substantially delayed, then the extracted information is unlikely to be useful for disaster response.
- Related to the previous points, how would the information extracted from news be useful for disaster managers? Can the authors provide some concrete examples in which the information extracted from the news is something unknown to disaster managers?
- This paper proposes a general workflow for extracting multiple types of information from Web text. However, there is no comparison with previous methods. A concrete research paper would focus on extracting one or two types of information (e.g., topics or locations) and compare the proposed new method with existing methods.
- The methods used in the workflow seem to be outdated, such as TF-IDF which is an old information retrieval technique. It is unclear what the methodological innovation of this paper is.
- The performance of the workflow as reported in Figure 5 is low, with an F1-score of about 0.4. It is unclear whether this workflow can extract information accurately.
- This paper is based on a single case study on a flooding event. A stronger paper would have another case study, ideally on another type of disaster, to demonstrate the generalizability of this workflow.

Citation: https://doi.org/10.5194/egusphere-2026-2361-RC1

Viewed

Total article views: 55 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
39	9	7	55	5	3

Views and downloads (calculated since 05 Jun 2026)

Month	HTML	PDF	XML	Total
Jun 2026	32	8	7	47
Jul 2026	7	1	0	8

Viewed (geographical distribution)

Total article views: 50 (including HTML, PDF, and XML), of which 50 have geography defined and 0 are of unknown origin.

Latest update: 13 Jul 2026

Short summary

This paper presents a modular approach for extracting disaster-relevant data from web news using machine learning, rules, and dictionaries. Extracted entities are geolocated and classified into a flood disaster cycle framework, enabling fine-grained spatiotemporal analysis in synergy with geospatial data. The 2021 flooding event in West Germany serves as a case study. While limitations of the rule-based components persist, the approach demonstrates strong potential for disaster management applications.

