the Creative Commons Attribution 4.0 License.
Wikimpacts 1.0: A new global climate impact database based on automated information extraction from Wikipedia
Abstract. Climate extremes such as storms, heatwaves, wildfires, droughts and floods pose a significant threat to society and ecosystems. However, comprehensive data on the socio-economic impacts of climate extremes remains limited. Here we present Wikimpacts 1.0, a global climate impact database built by extracting information from Wikipedia using natural language processing. Our method identifies relevant articles, extracts the impact information using GPT-4o, post-processes the extracted information and consolidates it into the database. Impact data is stored at the event, national and sub-national levels, covering 2,928 events from 1034 to 2024, with 20,186 national and 36,394 sub-national entries. Compared with manually annotated data for 156 events, the database shows low error scores (on a scale from 0 to 1) for event-level information such as timing (0.05), deaths (0.03) and economic damage (0.12), and slightly higher error scores for injuries (0.21), homelessness (0.25), displacement (0.29) and damaged buildings (0.28). At the sub-national level, Wikimpacts 1.0 provides broader impact coverage of storms than EM-DAT. When impact values are compared, 38 of 234 matched events have identical death counts, and 7 of 94 have identical injury counts. However, there are notable discrepancies in the information on homelessness and damage. This publicly available database highlights the potential of natural language processing to complement existing impact datasets and to provide robust information on climate impacts.
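The abstract names three storage levels (event, national, sub-national) and reports agreement with EM-DAT as the number of matched events with identical values. The sketch below is a minimal illustration under stated assumptions: the record layout and field names (`Event`, `NationalEntry`, `SubNationalEntry`) are hypothetical and not the Wikimpacts 1.0 schema, and "identical" is assumed to mean exact equality, counted only where both databases report a value. Only the counts 38/234 and 7/94 come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical three-level layout: event -> national -> sub-national.
# Field names are illustrative assumptions, not the Wikimpacts 1.0 schema.
@dataclass
class SubNationalEntry:
    region: str
    deaths: Optional[int] = None

@dataclass
class NationalEntry:
    country: str
    deaths: Optional[int] = None
    regions: list[SubNationalEntry] = field(default_factory=list)

@dataclass
class Event:
    name: str
    start_year: int
    deaths: Optional[int] = None
    countries: list[NationalEntry] = field(default_factory=list)

def identical_share(pairs: list[tuple[Optional[int], Optional[int]]]) -> tuple[int, int]:
    """Count matched events whose two values are exactly equal.

    Assumption: a pair counts as matched only if both databases report a value.
    Returns (n_identical, n_matched).
    """
    matched = [(a, b) for a, b in pairs if a is not None and b is not None]
    n_identical = sum(1 for a, b in matched if a == b)
    return n_identical, len(matched)

# Hypothetical example event with one country and one region.
storm = Event("Example storm", 2020, deaths=12,
              countries=[NationalEntry("Country A", deaths=12,
                                       regions=[SubNationalEntry("Region 1", deaths=7)])])

# Hypothetical (Wikimpacts, EM-DAT) death counts; the pair with None is skipped.
print(identical_share([(12, 12), (5, 6), (None, 3), (40, 40)]))  # (2, 3)

# Shares implied by the counts reported in the abstract.
print(f"{38 / 234:.1%}")  # deaths: 16.2%
print(f"{7 / 94:.1%}")    # injuries: 7.4%
```

Read this way, roughly one in six matched events agree exactly on deaths and fewer than one in ten on injuries, which is consistent with the abstract's note that exact agreement is the exception rather than the rule.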
Status: open (until 10 Dec 2025)