https://doi.org/10.5194/egusphere-2026-722
12 Feb 2026
Status: this preprint is open for discussion and under review for Natural Hazards and Earth System Sciences (NHESS).

Unravelling information on impactful geo-hydrological hazard events with HazMiner, a multilingual text mining method developed through a global scale coverage application

Bram Valkenborg, Olivier Dewitte, and Benoît Smets

Abstract. The incidence and impacts of geo-hydrological hazards (GH) such as floods, flash floods and landslides are changing globally due to anthropogenic environmental changes and increased exposure driven by population growth. Reliable datasets on GH are essential to deepen our understanding of these hazards and their impacts. However, existing GH datasets contain data gaps that lead to biased interpretations, especially in the Global South, where populations are commonly the most impacted. Text mining offers new opportunities for documenting GH by automatically extracting information from large text corpora. Despite its potential, current methodologies are not adapted to improve documentation in data-scarce contexts. We present HazMiner, a paragraph-based text mining method designed to document the location, timing and impact of GH through large language models across multiple languages and at various scales. Applied here globally to 6,366,905 news articles published from 2017 through 2024 in 58 languages, HazMiner extracted 21,411 flood, 7,659 landslide and 3,606 flash flood events with known location and time information and, in some cases, impact data. Compared to existing hazard datasets, HazMiner significantly improved hazard documentation, reducing the data gaps in many regions, especially in the Global South. The new versatile multilingual method and its dataset advance both text mining and natural hazard research.

Competing interests: At least one of the (co-)authors is a member of the editorial board of Natural Hazards and Earth System Sciences.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 26 Mar 2026)

Latest update: 12 Feb 2026
Short summary
Data gaps in datasets on floods, landslides, and flash floods influence our understanding of these hazards. We present HazMiner, a new method to extract their location, timing, and impacts from online news articles. We applied HazMiner at the global scale to 6,366,905 news articles published from 2017 through 2024. This resulted in the detection of 21,411 floods, 7,659 landslides, and 3,606 flash floods. HazMiner outperforms current hazard datasets, especially in data-scarce regions.