From manual classification to large language models: assessing the quality and consistency of historical convective event records
Abstract. Historical text sources are a central yet methodologically challenging basis for reconstructing convective weather events. This study examines the extent to which historical reports of thunderstorms and hailstorms contain reliable climatological information despite heterogeneous sources, varying levels of detail and linguistic diversity. Starting from a corpus prepared according to the principles of source criticism, qualitative descriptions are converted into structured evidence levels and intensity classes and analysed using statistical methods and a multilingual BERT language model.
The reconstructed time series show a stable seasonal signal with a dominant summer maximum that persists independently of fluctuations in source density and is consistent in both the overall series and a dense observation window. A comparison with modern observational data from the German Weather Service (Deutscher Wetterdienst, DWD) and with independent historical reconstructions shows a high degree of agreement in seasonal patterns despite differences in data collection methods and time periods. Analysis of the intensity classes further shows that historical sources do not primarily document extreme events but instead reflect a physically plausible ranking of event strengths.
The results of the automated classification indicate that the language model reliably reproduces seasonal and intensity-related patterns and implicitly captures source-specific reporting patterns without smoothing them out. Overall, the study shows that AI-supported methods can extract robust climatological information from historical texts, provided the texts are processed with methodological rigour, thus opening up new perspectives for quantitative historical climate research.