Unravelling information on impactful geo-hydrological hazard events with HazMiner, a multilingual text mining method developed through a global scale coverage application
Abstract. The incidence and impacts of geo-hydrological hazards (GH) such as floods, flash floods and landslides are changing globally due to anthropogenic environmental changes and increased exposure driven by population growth. Reliable datasets on GH are essential to deepen our understanding of these hazards and their impacts. However, existing GH datasets contain data gaps that lead to biased interpretations, especially in the Global South, where populations are commonly the most impacted. Text mining offers new opportunities for documenting GH by automatically extracting information from large text corpora. Despite its potential, current methodologies are not adapted to improve documentation in data-scarce contexts. We present HazMiner, a paragraph-based text mining method designed to document the location, timing and impact of GH through large language models across multiple languages and at various scales. Applied here globally to 6,366,905 news articles published from 2017 through 2024 in 58 languages, HazMiner extracted 21,411 flood, 7,659 landslide and 3,606 flash flood events with known location and time information and, in some cases, impact data. Compared to existing hazard datasets, HazMiner significantly improved hazard documentation, reducing the data gaps in many regions, especially in the Global South. The new versatile multilingual method and its dataset advance both text mining and natural hazard research.
Competing interests: At least one of the (co-)authors is a member of the editorial board of Natural Hazards and Earth System Sciences.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.