Unravelling information on impactful geo-hydrological hazard events with HazMiner, a multilingual text mining method developed through a global scale coverage application
Abstract. The incidence and impacts from geo-hydrological hazards (GH) such as floods, flash floods and landslides are changing globally due to anthropogenic environmental changes and increased exposure driven by population growth. Reliable datasets on GH are essential to deepen our understanding of these hazards and their impacts. However, existing GH datasets contain data gaps leading to biased interpretations, especially in the Global South where populations are commonly the most impacted. Text mining offers new opportunities in documenting GH by automatically extracting information from large text corpora. Despite its potential, current methodologies are not adapted to improve documentation in data-scared contexts. We present HazMiner, a paragraph-based text mining method designed to document the location, timing and impact of GH through large language models across multiple languages and at various scales. Applied here globally on 6,366,905 news articles published from 2017 through 2024 in 58 languages, HazMiner extracted 21,411 flood, 7,659 landslide and 3,606 flash flood events with known location and time information and, in some cases, impact data. Compared to existing hazard datasets, HazMiner significantly improved hazard documentation, reducing the data gaps in many regions, especially in the Global South. The new versatile multilingual method and its dataset advances both text mining and natural hazard research.
Competing interests: At least one of the (co-)authors is a member of the editorial board of Natural Hazards and Earth System Sciences.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
General Comments
This paper addresses a pressing research gap: the creation of reliable and validated (global) text-based datasets for natural hazard impact and adaptation research. The authors propose and implement an automated pipeline with 14 steps to cluster news paragraphs into temporally and geographically thematic clusters for three types of hazards: floods, flash floods and landslides. The work demonstrates a remarkable effort in coordinating all processing steps for large-scale multilingual text data and analyzing the resulting structures. Importantly, known limitations and biases are transparently and extensively discussed in Section 4. In that regard, it is a valuable contribution with good potential to pave the way for relevant further research in various fields. Still, a few drawbacks must be pointed out and require improvement, or at least a more detailed discussion of consequences and possible solutions.
Specific Comments
Clarifications / corrections needed with some suggestions
Abstract
- l1: Since the number of pages is not critically limited, the abbreviation GH could be avoided throughout the whole text. The full version "geo-hydrological hazards" does not take that much extra space and really facilitates reading.
- l15: What is "data-scared"? If this is a typo for "data-scarce", it should be corrected.
- l18: The 58 languages should be listed in the appendix.
- l20: The work does not really advance the text mining field itself, as no new text mining techniques or findings are presented.
Introduction
- Almost 2.5 pages is quite long for an introduction. It mentions many related works that deserve a dedicated related work section. I suggest splitting this part into two sections, one with a compact overview of the paper and the other for the list of related work, ideally extended to account for the various tasks in the pipeline.
- l63: Text mining methods are not perfect and cannot always extract the right information. It is important to phrase it in a way that makes it clear that these techniques do not always work.
- l65: And disadvantages.
- l74: LLMs do not "understand" texts. I recommend avoiding verbs related to human cognition to refer to how LLMs process language.
- l75-78: LLMs are mentioned before GPT and BERT, which makes it sound as if these two appeared after the popularization of LLMs. I suggest reversing the order of these sentences. First, refer to the advent of Transformer-based architectures like GPT and BERT, then to the popularization of LLMs as a tool for many NLP tasks.
- l82: The "While only a few..." needs to be connected to another sentence.
- l94: It is ambiguous whether "the dataset" refers to GDELT or your dataset.
- l96: The "Followed by..." needs to be connected to another sentence.
Methodology
- It is a matter of personal taste, but since Figure 1 already shows an overview of the pipeline, it would be better if the description of each subtask was already accompanied by its own evaluation. It is hard to read and move on without knowing how well each model performs at each step. Then Section 3 could focus only on describing the resulting dataset.
- l109: What does it mean that HazMiner was "specifically" configured to extract GH events, isn't it its main purpose?
- Figure 2: "hazardous text" sounds like the text itself is hazardous.
- Figure 2: An arrow connecting the output of a step to the input of the other would be useful to avoid the reliance on color, which may not be visible for color-blind readers.
- 2.1 should be named "text selection" or "document selection", as it involves steps that are not about extraction.
- l115: Subtasks rather than subprocesses.
- l116: Scraping news is usually not allowed by media outlet websites. Was it performed in accordance with the terms of use?
- l119: The simple heuristic of using line breaks to identify paragraphs can be problematic, as there is no guarantee that all websites follow this rule, and this information may be lost if the raw texts were already altered by news aggregators. This step should thus also be evaluated.
- l121: It is not really more efficient because it makes the number of instances to process much larger and removes important context needed for disambiguation.
- l122: It's not true that current LLMs are optimized for shorter inputs (unless these are not actually LLMs but text encoders, see comment above). The benefit of shorter inputs is fewer computations, not fewer parameters.
- l150: I don't know how these specific embeddings were optimized, but embeddings are generally trained to capture contextual relations, so it is expected that the representations of all these concepts will be close in the embedding space.
- l161: Publication date is not very reliable as a proxy for the timing of the hazard, because many important events resurface in the news much later, and some reports are delayed.
- l170: It's unlikely that a rule-based approach really solves this problem. There are many news articles about studies that refer to events as a general category, or about events that occur in movies or books, which are not always in future tense.
- l178: This assigns coordinates to all locations mentioned in the text, but how is the event location detected among them? Normally, there is a third step.
- l180: "Hallucination" is a common (although not very suitable) term for wrong content generated by text completers / chatbots. It seems that, here, it refers to ordinary translation errors instead.
- l196: A weighted average may end up at a location that is not even mentioned in the text, possibly even in a different country.
- l220: The sentence "As hazard events..." is not finished.
- l262: It is unclear what "until the number of false positives remained constant" means here.
- l290: The fact that the timing algorithm does not rely on neural network-based models does not mean it works well. It's a rule-based approach that will not always work.
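The weighted-average concern raised above (l196) is easy to demonstrate. Below is a minimal sketch, not the paper's actual aggregation method: the coordinates are approximate and the uniform weighting is a placeholder assumption. Averaging the coordinates of two mentioned cities can place the event somewhere in between, in neither location.

```python
# Minimal sketch (hypothetical weighting, approximate coordinates):
# averaging the coordinates of the locations mentioned in a paragraph
# can yield a point that lies in neither of them.

def weighted_centroid(points):
    """Return the weighted average of (lat, lon, weight) triples."""
    total = sum(w for _, _, w in points)
    lat = sum(la * w for la, _, w in points) / total
    lon = sum(lo * w for _, lo, w in points) / total
    return lat, lon

# A paragraph mentioning Madrid (~40.4 N, 3.7 W) and Rome (~41.9 N, 12.5 E)
# with equal weight averages to roughly (41.15 N, 4.4 E), a point in the
# Mediterranean Sea that is in neither Spain nor Italy.
mentions = [(40.4, -3.7, 1.0), (41.9, 12.5, 1.0)]
centroid = weighted_centroid(mentions)
```

This suggests the pipeline needs a rule for when to fall back to a single dominant location instead of averaging.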
Results
- For classification, traditional metrics like binary precision, recall and F1 scores should be reported in a Table. Only recall (true positive rate) was reported but, in such tasks, high precision is as (or even more) important, as "inventing" events that never happened can be a more serious mistake than missing real events depending on the use case.
- l345: Precision is quite low, though (only around 0.6 based on the confusion matrix). So the performance is mostly coming from identifying the vast majority of unrelated instances.
- l364: How exactly are the errors computed? Are the bounding boxes considered or only the centroid?
- l380: Very often, news reports give preliminary numbers that keep being updated as the event unfolds. How was this handled? If multiple paragraphs from different articles report on the same event, how are their estimates harmonized?
- l393: The mean duration is not of the GH itself but of its news coverage?
- l468: "one death but fewer than 10 fatalities" sounds like a death is not a fatality. Maybe rephrase.
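To make the metrics request above concrete: all three standard binary classification metrics can be derived directly from the confusion-matrix counts and reported side by side. The counts below are invented for illustration only, not taken from the paper; they show how high recall can coexist with low precision.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute binary precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Invented counts: 60 true positives, 40 false positives, 10 false negatives.
# Recall is ~0.86 while precision is only 0.6, i.e. many "invented" events,
# which is why reporting recall alone is not enough.
p, r, f1 = precision_recall_f1(tp=60, fp=40, fn=10)
```

Reporting all three per hazard class in one table would let readers judge the precision/recall trade-off for their own use case.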
Discussion
- l509: with the goal
- l520: The sentence "Mostly because..." should be connected to another sentence.
- l524: The term "only" is not correct here, as it can include events with fewer deaths if they affect enough people.
- l545: The sentence "As studies..." should be connected to another sentence.
- l574: country's
- l643: It's not the language that is non-alphabetical, it's the writing system (see https://en.wikipedia.org/wiki/List_of_writing_systems).
- l671: The sentence "Such as news..." should be connected to another sentence.
- l674: into the (or rather "as input to the"?)
- l716: Switching models is likely not that easy, and each may use different labels, output format, programming libraries or require a different input structure.
- l725: "Subconsciously" is not a suitable term, since the model/study is the agent in this sentence.