https://doi.org/10.5194/egusphere-2026-2361
05 Jun 2026
Status: this preprint is open for discussion and under review for Natural Hazards and Earth System Sciences (NHESS).

From text to geoinformation – A modular approach for extraction of disaster information from web text data

Vanessa Rittlinger, Johannes Mast, Stefan Voigt, Christian Geiß, and Hannes Taubenböck

Abstract. Effective disaster management requires comprehensive information about a given flooding situation. Text data from web news offer potentially large volumes of information for this purpose. However, extracting flood event-related information and analyzing it in space and time is inherently demanding because of the immense volume of unstructured text. To address this challenge, we present a modular and scalable method for extracting disaster-relevant information from a large text corpus. The method combines domain-specific entity extraction with dictionaries, a machine learning model for toponym identification, and hand-crafted rules for entity linking in a modular workflow. The extracted information is augmented with geolocations to support spatial analysis. Using the 2021 flood event in western Germany as a case study, we evaluate the capacity of our approach to extract relevant geospatial information at a variety of spatial granularity levels and in the form of various thematic descriptors. In doing so, we outline the capabilities and limitations of this approach for text extraction and analysis. Furthermore, we demonstrate the potential for systematic use of text data to improve situational awareness and support disaster management.
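
The workflow named in the abstract (dictionary-based entity extraction, toponym identification, rule-based entity linking, geolocation) can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary `DAMAGE_TERMS`, the lookup table `GAZETTEER`, the `Record` type, and the "nearest toponym in the same sentence" linking rule are all assumptions made for illustration. In particular, a plain gazetteer lookup stands in for the machine learning toponym model, and the coordinates are approximate.

```python
import re
from dataclasses import dataclass

# Hypothetical domain dictionary: surface term -> thematic category.
DAMAGE_TERMS = {
    "bridge": "infrastructure",
    "road": "infrastructure",
    "evacuated": "response",
    "power outage": "utilities",
}

# Hypothetical gazetteer standing in for the ML toponym model and the geocoder.
# Coordinates are approximate (lat, lon) values, for illustration only.
GAZETTEER = {
    "Ahrweiler": (50.54, 7.10),
    "Erftstadt": (50.81, 6.77),
}


@dataclass
class Record:
    term: str
    category: str
    toponym: str
    lat: float
    lon: float


def find_spans(sentence, vocabulary):
    """Return sorted (start_offset, key) pairs for whole-word vocabulary hits."""
    hits = []
    for key in vocabulary:
        pattern = r"\b" + re.escape(key) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            hits.append((match.start(), key))
    return sorted(hits)


def extract(text):
    """Run the three modules sentence by sentence and return geolocated records."""
    records = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        terms = find_spans(sentence, DAMAGE_TERMS)   # module 1: dictionary extraction
        places = find_spans(sentence, GAZETTEER)     # module 2: toponym identification
        if not places:
            continue  # nothing to link to, so the entity cannot be geolocated
        for pos, term in terms:
            # Module 3 (assumed rule): link to the toponym closest in characters.
            _, place = min(places, key=lambda p: abs(p[0] - pos))
            lat, lon = GAZETTEER[place]
            records.append(Record(term, DAMAGE_TERMS[term], place, lat, lon))
    return records


text = ("A bridge collapsed in Ahrweiler. "
        "In Erftstadt, residents were evacuated after a power outage.")
records = extract(text)
for record in records:
    print(record)
```

Because each module only exchanges character offsets and keys, any one of them could be replaced (for example, swapping the gazetteer lookup for a trained named entity recognizer) without touching the others, which is the kind of modularity the abstract describes.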

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 17 Jul 2026)

Latest update: 05 Jun 2026
Short summary
This paper presents a modular approach for extracting disaster-relevant data from web news using machine learning, rules, and dictionaries. Extracted entities are geolocated and classified into a flood disaster cycle framework, enabling fine-grained spatiotemporal analysis in combination with geospatial data. The 2021 flood event in western Germany serves as a case study. Although limitations of the rule-based components persist, the approach demonstrates strong potential for disaster management applications.