https://doi.org/10.5194/egusphere-2025-7
21 Jan 2025
Status: this preprint is open for discussion and under review for Natural Hazards and Earth System Sciences (NHESS).

An automated approach for developing geohazard inventories using news: Integrating NLP, machine learning, and mapping

Aydoğan Avcıoğlu, Ogün Demir, and Tolga Görüm

Abstract. Spatiotemporal inventories of natural hazards are essential for building resilient societies, yet restricted access to global inventories hinders the advancement of mitigation strategies. We therefore developed an approach that uses online newspapers to create natural hazard inventories by combining web scraping, natural language processing (NLP), clustering, and geolocation of textual data. We apply the approach to online newspapers published in Türkiye from 1997 to 2023. In the first stage, we retrieved 15,569 news articles related to wildfires, floods, landslides, and sinkholes using our tr-news-scraper tool. We then applied NLP preprocessing to refine the raw newspaper texts, which were subsequently clustered into 4 natural hazard groups, resulting in 3,928 news articles. In the final stage, we developed a method that geolocates the news using the OpenStreetMap (OSM) Nominatim tool, yielding a total of 13,940 natural hazard incidents, because a single article can report multiple incidents across various locations. As a result, we mapped 9,609 floods, 1,834 wildfires, 1,843 landslides, and 654 sinkhole formation incidents from online newspaper sources, whose spatiotemporal distribution is consistent with the existing literature. Our approach thus illustrates the potential of online newspaper articles for developing natural hazard inventories, moving from web sources as text data to maps by leveraging web scraping, NLP, and mapping techniques.
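
The final stage described above, in which one news article can yield several geolocated incidents, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The sample articles, the `GAZETTEER` dictionary with approximate coordinates, and the function names are all hypothetical; the dictionary stands in for live Nominatim lookups so that the example runs offline. The hazard label of each article is taken as given, as if it were the output of the clustering stage.

```python
import re
from urllib.parse import urlencode

# Hypothetical mini-gazetteer (approximate lat, lon) standing in for live
# Nominatim lookups, so that the sketch runs without network access.
GAZETTEER = {
    "rize": (41.02, 40.52),
    "artvin": (41.18, 41.82),
    "konya": (37.87, 32.48),
}


def nominatim_search_url(place, country_code="tr"):
    """Build (but do not send) a Nominatim search URL for a place name."""
    params = {"q": place, "format": "jsonv2",
              "countrycodes": country_code, "limit": 1}
    return "https://nominatim.openstreetmap.org/search?" + urlencode(params)


def extract_places(text):
    """Return gazetteer place names found in the text, in order, no repeats."""
    found = []
    for token in re.findall(r"\w+", text.lower()):
        if token in GAZETTEER and token not in found:
            found.append(token)
    return found


def expand_to_incidents(articles):
    """Emit one incident per (article, place) pair."""
    incidents = []
    for article in articles:
        for place in extract_places(article["text"]):
            lat, lon = GAZETTEER[place]
            incidents.append({
                "article_id": article["id"],
                "hazard": article["hazard"],  # assumed output of clustering
                "place": place,
                "lat": lat,
                "lon": lon,
            })
    return incidents


# Invented sample articles, for illustration only.
articles = [
    {"id": 1, "hazard": "flood",
     "text": "Heavy rain caused flooding in Rize and Artvin."},
    {"id": 2, "hazard": "sinkhole",
     "text": "A new sinkhole opened near Konya; Konya officials closed the road."},
]

incidents = expand_to_incidents(articles)
for incident in incidents:
    print(incident)
```

Here two articles expand into three incidents, which mirrors how 3928 clustered articles can produce 13940 mapped incidents. A real pipeline would send the URL built by `nominatim_search_url` to the geocoder; the public Nominatim service asks clients to identify themselves and to stay at or below one request per second.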

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: open (until 04 Mar 2025)


Model code and software

tr-news-scraper: Scrape Turkish news articles. Ogün Demir and Aydoğan Avcıoğlu. https://github.com/demirogun/tr-news-scraper

Latest update: 21 Jan 2025
Short summary
Here we demonstrate an approach for developing inventories of geolocated geohazard incidents from internet sources. We created a tool that automatically retrieves news, processes it using NLP and machine learning, and maps it using OpenStreetMap. We present spatiotemporal geohazard inventories totalling 13,940 incidents between 1997 and 2023 in Türkiye. Our alternative, easy-to-implement inventory development method aids geohazard management and resilience.