https://doi.org/10.5194/egusphere-2025-7
21 Jan 2025

An automated approach for developing geohazard inventories using news: Integrating NLP, machine learning, and mapping

Aydoğan Avcıoğlu, Ogün Demir, and Tolga Görüm

Abstract. Spatiotemporal inventories of natural hazards are essential for building resilient societies, yet restricted access to global inventories hinders the advancement of mitigation strategies. We therefore developed an approach that uses online newspapers to build natural hazard inventories by combining web scraping, natural language processing (NLP), clustering, and geolocation of textual data. We apply the approach to online newspapers published in Türkiye from 1997 to 2023. In the first stage, we retrieved 15,569 news articles related to wildfires, floods, landslides, and sinkholes using our tr-news-scraper tool. We then applied NLP preprocessing to refine the raw texts, which were subsequently clustered into four natural hazard groups, resulting in 3,928 news articles. In the final stage, we developed a method that geolocates the news using the OpenStreetMap (OSM) Nominatim tool. Because a single news article can report multiple incidents across various locations, this stage yielded a total of 13,940 natural hazard incidents. In total, we mapped 9,609 floods, 1,834 wildfires, 1,843 landslides, and 654 sinkhole formation incidents, whose spatiotemporal distribution is consistent with the existing literature. These results illustrate the potential of online newspaper articles for developing natural hazard inventories, going from web text to maps by combining web scraping, NLP, and mapping techniques.
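The geolocation stage described in the abstract can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the gazetteer, the function names, and the canned responses are assumptions made for the example, and the coordinates are rough placeholders rather than real lookup results. The only external detail relied on is the public Nominatim search endpoint with its q, format, countrycodes, and limit parameters. The sketch shows how one article that mentions several places yields several incidents.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"

# Hypothetical mini-gazetteer; a real run would need a full list of place names.
GAZETTEER = ("Rize", "Artvin", "Konya")


def find_places(text, gazetteer=GAZETTEER):
    """Return the gazetteer names that occur in the text (case-sensitive)."""
    return [name for name in gazetteer if name in text]


def build_query_url(place):
    """Build a Nominatim search URL restricted to Türkiye (country code 'tr')."""
    params = {"q": place, "format": "jsonv2", "countrycodes": "tr", "limit": 1}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def parse_first_hit(response_body):
    """Return (lat, lon) of the first result, or None if there are no results."""
    hits = json.loads(response_body)
    if not hits:
        return None
    return float(hits[0]["lat"]), float(hits[0]["lon"])


def incidents_from_article(text, hazard, fetch):
    """Create one incident per located place; fetch maps a URL to a response body."""
    incidents = []
    for place in find_places(text):
        coords = parse_first_hit(fetch(build_query_url(place)))
        if coords is not None:
            incidents.append(
                {"hazard": hazard, "place": place, "lat": coords[0], "lon": coords[1]}
            )
    return incidents


# Canned responses so the sketch runs offline; coordinates are placeholders.
CANNED = {
    "Rize": [{"lat": "41.0", "lon": "40.5", "display_name": "Rize"}],
    "Artvin": [{"lat": "41.2", "lon": "41.8", "display_name": "Artvin"}],
}


def fake_fetch(url):
    """Stand-in for an HTTP GET: look up the queried place in CANNED."""
    place = parse_qs(urlsplit(url).query)["q"][0]
    return json.dumps(CANNED.get(place, []))


article = "Flooding hit Rize and Artvin after heavy rain."
incidents = incidents_from_article(article, "flood", fake_fetch)
```

Passing the fetch function as an argument keeps the sketch testable without network access. A real run would replace fake_fetch with an HTTP client that sends an identifying User-Agent header and respects the Nominatim usage policy on request rates.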

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Journal article(s) based on this preprint

21 Jul 2025
An automated approach for developing geohazard inventories using news: integrating natural language processing (NLP), machine learning, and mapping
Aydoğan Avcıoğlu, Ogün Demir, and Tolga Görüm
Nat. Hazards Earth Syst. Sci., 25, 2421–2435, https://doi.org/10.5194/nhess-25-2421-2025, 2025

Interactive discussion

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2025-7', Anonymous Referee #1, 13 Feb 2025
    • AC1: 'Reply on RC1', Aydogan Avcioglu, 03 Mar 2025
      • RC3: 'Reply on AC1', Anonymous Referee #1, 04 Mar 2025
        • AC3: 'Reply on RC3', Aydogan Avcioglu, 05 Mar 2025
  • RC2: 'Comment on egusphere-2025-7', Anonymous Referee #2, 17 Feb 2025
    • AC2: 'Reply on RC2', Aydogan Avcioglu, 03 Mar 2025

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
ED: Reconsider after major revisions (further review by editor and referees) (21 Mar 2025) by Vassiliki Kotroni
AR by Aydogan Avcioglu on behalf of the Authors (24 Mar 2025): Author's response, Author's tracked changes, Manuscript
ED: Referee Nomination & Report Request started (24 Mar 2025) by Vassiliki Kotroni
RR by Anonymous Referee #2 (01 Apr 2025)
RR by Anonymous Referee #3 (20 Apr 2025)
ED: Publish subject to minor revisions (review by editor) (22 Apr 2025) by Vassiliki Kotroni
AR by Aydogan Avcioglu on behalf of the Authors (23 Apr 2025): Author's response, Author's tracked changes, Manuscript
ED: Publish as is (24 Apr 2025) by Vassiliki Kotroni
AR by Aydogan Avcioglu on behalf of the Authors (24 Apr 2025): Manuscript

Model code and software

tr-news-scraper: Scrape Turkish news articles, Ogün Demir and Aydoğan Avcıoğlu, https://github.com/demirogun/tr-news-scraper

Viewed

Total article views: 519 (including HTML, PDF, and XML)
  • HTML: 279
  • PDF: 221
  • XML: 19
  • Total: 519
  • Supplement: 48
  • BibTeX: 13
  • EndNote: 28

Viewed (geographical distribution)

Total article views: 517 (including HTML, PDF, and XML) Thereof 517 with geography defined and 0 with unknown origin.
Latest update: 24 Jul 2025

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary
Here we demonstrate an approach for developing inventories of geolocated geohazard incidents from internet sources. We created a tool that automatically retrieves news, processes it using NLP and machine learning, and maps it using OpenStreetMap. We present spatiotemporal geohazard inventories totalling 13,940 incidents between 1997 and 2023 in Türkiye. Our alternative, easy-to-implement inventory development method aids geohazard management and resilience.
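As a rough illustration of what a spatiotemporal inventory looks like once incidents are geolocated, the Python sketch below tallies incident records by year and hazard type and checks that the per-hazard counts reported in the abstract add up to the stated total. The record layout, the function names, and the sample records are hypothetical; only the four counts and the total come from the text.

```python
from collections import Counter

# Per-hazard incident counts and total as reported in the abstract.
REPORTED_COUNTS = {"flood": 9609, "wildfire": 1834, "landslide": 1843, "sinkhole": 654}
REPORTED_TOTAL = 13940


def tally_by_year_and_hazard(records):
    """Count incidents per (year, hazard) pair."""
    return Counter((record["year"], record["hazard"]) for record in records)


def hazard_shares(counts):
    """Return each hazard's fraction of all incidents."""
    total = sum(counts.values())
    return {hazard: n / total for hazard, n in counts.items()}


# Hypothetical records, made up for the example (not taken from the inventory).
sample_records = [
    {"year": 2021, "hazard": "flood", "place": "Rize"},
    {"year": 2021, "hazard": "flood", "place": "Artvin"},
    {"year": 2021, "hazard": "wildfire", "place": "Antalya"},
    {"year": 2022, "hazard": "sinkhole", "place": "Konya"},
]

tally = tally_by_year_and_hazard(sample_records)
shares = hazard_shares(REPORTED_COUNTS)
counts_are_consistent = sum(REPORTED_COUNTS.values()) == REPORTED_TOTAL
```

Keying the tally on a (year, hazard) pair keeps the temporal and thematic dimensions together; adding a province or coordinate field to the key would extend the same pattern to the spatial dimension.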