Preprints
https://doi.org/10.5194/egusphere-2025-7
21 Jan 2025

An automated approach for developing geohazard inventories using news: Integrating NLP, machine learning, and mapping

Aydoğan Avcıoğlu, Ogün Demir, and Tolga Görüm

Abstract. Spatiotemporal inventories of natural hazards are essential for building resilient societies, yet restricted access to global inventories hinders the development of mitigation strategies. We therefore developed an approach that uses online newspapers to build natural hazard inventories by combining web scraping, natural language processing (NLP), clustering, and geolocation of textual data. We apply the approach to online newspapers published in Türkiye from 1997 to 2023. In the first stage, we retrieved 15,569 news articles related to wildfires, floods, landslides, and sinkholes using our tr-news-scraper tool. We then applied NLP preprocessing to refine the raw newspaper texts, which were subsequently clustered into four natural hazard groups, resulting in 3,928 news articles. In the final stage, we developed a method that geolocates the news using the OpenStreetMap (OSM) Nominatim tool, yielding a total of 13,940 natural hazard incidents, because a single news article can report multiple incidents across various locations. As a result, we mapped 9,609 floods, 1,834 wildfires, 1,843 landslides, and 654 sinkhole formation incidents from online newspaper sources, with a spatiotemporal distribution consistent with the existing literature. Our approach thus illustrates the potential of online newspaper articles for developing natural hazard inventories, taking web sources from text data to maps by leveraging web scraping, NLP, and mapping techniques.
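The geolocation stage can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the gazetteer-style place matching, and the article dictionary layout are assumptions made for illustration. Only the public Nominatim search endpoint and its `q`, `format`, `countrycodes`, and `limit` parameters are taken from the Nominatim API. The sketch shows how one article that mentions several places can expand into several incidents, which is why the incident count can exceed the article count.

```python
import json
import urllib.parse
import urllib.request

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_query_url(place, country_code="tr"):
    """Build a Nominatim search URL restricted to one country."""
    params = {
        "q": place,
        "format": "jsonv2",
        "countrycodes": country_code,
        "limit": 1,
    }
    return NOMINATIM_SEARCH + "?" + urllib.parse.urlencode(params)


def parse_first_hit(payload):
    """Return (lat, lon) of the first result, or None if there is no match."""
    hits = json.loads(payload)
    if not hits:
        return None
    # Nominatim returns coordinates as strings.
    return float(hits[0]["lat"]), float(hits[0]["lon"])


def geocode(place, user_agent):
    """Query Nominatim once. The public server expects an identifying
    User-Agent and at most one request per second."""
    request = urllib.request.Request(
        build_query_url(place), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_first_hit(response.read().decode("utf-8"))


def expand_incidents(article, gazetteer, geocoder):
    """Hypothetical helper: emit one incident per known place name found
    in the article text (simple substring matching, an assumption)."""
    text = article["text"].casefold()
    incidents = []
    for place in gazetteer:
        if place.casefold() in text:
            coords = geocoder(place)
            if coords is not None:
                incidents.append(
                    {
                        "hazard": article["hazard"],
                        "place": place,
                        "lat": coords[0],
                        "lon": coords[1],
                    }
                )
    return incidents
```

In use, `geocoder` would be something like `lambda p: geocode(p, "my-inventory-script (contact address)")`, with a pause between calls. Passing the geocoder in as an argument keeps the expansion logic testable without network access.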

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2025-7', Anonymous Referee #1, 13 Feb 2025
    • AC1: 'Reply on RC1', Aydogan Avcioglu, 03 Mar 2025
      • RC3: 'Reply on AC1', Anonymous Referee #1, 04 Mar 2025
        • AC3: 'Reply on RC3', Aydogan Avcioglu, 05 Mar 2025
  • RC2: 'Comment on egusphere-2025-7', Anonymous Referee #2, 17 Feb 2025
    • AC2: 'Reply on RC2', Aydogan Avcioglu, 03 Mar 2025


Model code and software

tr-news-scraper: Scrape Turkish news articles. Ogün Demir and Aydoğan Avcıoğlu. https://github.com/demirogun/tr-news-scraper


Viewed

Total article views: 377 (including HTML, PDF, and XML)
  • HTML: 219
  • PDF: 146
  • XML: 12
  • Total: 377
  • Supplement: 46
  • BibTeX: 7
  • EndNote: 11

Viewed (geographical distribution)

Total article views: 374 (including HTML, PDF, and XML) Thereof 374 with geography defined and 0 with unknown origin.
Latest update: 11 May 2025
Short summary
Here we demonstrate an approach for developing inventories of geolocated geohazard incidents from internet sources. We created a tool that automatically retrieves news, processes it using NLP and machine learning, and maps it using OpenStreetMap. We present spatiotemporal geohazard inventories totalling 13,940 incidents between 1997 and 2023 in Türkiye. Our alternative, easy-to-implement inventory development method aids geohazard management and resilience.
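As a rough illustration of the NLP processing step, the sketch below normalizes and tokenizes a Turkish news sentence before any clustering. It is an assumption-laden example, not the preprocessing used in the preprint: the stopword list is a tiny illustrative sample, and the function name is hypothetical. The one Turkish-specific detail it handles is case folding, since the default lowercase mapping does not convert dotted and dotless I as Turkish expects.

```python
import re

# Illustrative sample only; a real pipeline would use a fuller stopword list.
STOPWORDS = {"ve", "bir", "bu", "ile", "için", "de", "da"}


def preprocess(text):
    """Lowercase with Turkish I/İ handling, keep letter-only tokens,
    and drop stopwords and single-character tokens."""
    # Map the Turkish capitals explicitly before the generic lower().
    text = text.replace("İ", "i").replace("I", "ı").lower()
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscore).
    tokens = re.findall(r"[^\W\d_]+", text)
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]
```

The resulting token lists could then be vectorized and passed to whichever clustering method is chosen; that choice is outside the scope of this sketch.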