https://doi.org/10.5194/egusphere-2025-4891
29 Oct 2025
Status: this preprint is open for discussion and under review for Natural Hazards and Earth System Sciences (NHESS).

Wikimpacts 1.0: A new global climate impact database based on automated information extraction from Wikipedia

Ni Li, Wim Thiery, Shorouq Zahra, Mariana Madruga de Brito, Koffi Worou, Murathan Kurfalı, Seppe Lampe, Paul Muñoz, Clare Flynn, Camila Trigoso, Joakim Nivre, Jakob Zscheischler, and Gabriele Messori

Abstract. Climate extremes such as storms, heatwaves, wildfires, droughts and floods significantly threaten society and ecosystems. However, comprehensive data on the socio-economic impacts of climate extremes remain limited. Here we present Wikimpacts 1.0, a global climate impact database built by extracting information from Wikipedia using natural language processing. Our method identifies relevant articles, extracts the information using GPT4o, post-processes the information and consolidates the database. Impact data are stored at the event, national, and sub-national levels, covering 2,928 events from 1034 to 2024, with 20,186 national and 36,394 sub-national entries. Compared with manually annotated data from 156 events, the database shows low error scores (on a scale from 0 to 1) for event-level information such as timing (0.05), deaths (0.03), and economic damage (0.12), and slightly higher error scores for injuries (0.21), homelessness (0.25), displacement (0.29), and damaged buildings (0.28). Wikimpacts 1.0 provides broader impact coverage on storms than EM-DAT at the sub-national level. When impact values are compared with EM-DAT, 38 of 234 matched events have identical data for deaths, and 7 of 94 for injuries. However, there are notable discrepancies in information on homelessness and damage. Our public database highlights the potential of natural language processing to complement existing impact datasets and to provide robust information on climate impacts.
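
To put the matched-event counts in perspective, the short sketch below converts them into agreement rates. It uses only the counts quoted in the abstract; the helper name `agreement_rate` and the dictionary layout are illustrative assumptions and are not part of the Wikimpacts code base.

```python
# Illustrative only: counts are taken from the abstract, while the
# function and variable names are hypothetical.

def agreement_rate(identical: int, matched: int) -> float:
    """Return the share of matched events with identical values."""
    if matched <= 0:
        raise ValueError("matched must be a positive count")
    return identical / matched


# (identical events, matched events) per impact category
counts = {
    "deaths": (38, 234),
    "injuries": (7, 94),
}

rates = {name: agreement_rate(*pair) for name, pair in counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} of matched events are identical")
```

Under these counts, roughly one in six matched events agrees exactly on deaths and roughly one in thirteen on injuries. Exact equality is a strict criterion, so these rates do not by themselves indicate how large the differences are.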


Status: open (until 10 Dec 2025)

Latest update: 29 Oct 2025
Short summary
Climate extremes threaten society and ecosystems. Understanding their impacts is critical, yet despite open databases like EM-DAT and DesInventar, reliable impact data remain scattered across various text sources. Wikimpacts 1.0, built using GPT4o, provides comprehensive socio-economic impact data on 2,928 events from 1034 to 2024. It offers broader storm coverage and finer spatial resolution of impact data than EM-DAT, showcasing the potential of natural language processing to enhance climate impact datasets.