Evaluation of nine gridded daily weather reconstructions for the European heatwave summer of 1807
Abstract. Recent research on early instrumental measurements, combined with numerical-statistical techniques, has contributed to global atmospheric reanalyses as well as regional products that cover pre-1850 weather. The advent of machine learning (ML) raises the question of how well we can reconstruct weather from the distant past using both established and emerging approaches. Here, we evaluate how well nine such approaches reproduce the daily weather during Europe's hot summer of 1807. The datasets examined include the Twentieth Century Reanalysis (20CR) and enhanced versions (via additional assimilation, dynamical downscaling), an analog resampling product, as well as ML reconstructions that use neural networks (along with video-inpainting methods or variational auto-encoders). Validation is based on early station measurements, documentary information, statistical diagnostics, and a semi-quantitative assessment of atmospheric flow.
We find that the summer of 1807 can be considered a prototypical pre-industrial heatwave summer, with three extremely hot episodes and maximum temperatures exceeding 30–35 °C in Central Europe. Most approaches achieve mean correlations (of anomalies from the seasonal cycle) above 0.75 for temperature and centered root-mean-square error values below 3 °C, though variability tends to be underestimated. This points to overall robust reconstructions given the distant past and the scarce underlying weather information. Skill scores for almost all reconstructions indicate that they are reliable in discriminating very hot from cooler (high-pressure from lower-pressure) conditions. Improved spatial skill with respect to 20CR for stations in Central and Northeastern Europe can be attributed to the increased influence of newly ingested weather information on the atmospheric reconstructions.
The atmospheric flow-aware approaches reproduce plausible large-scale features such as ridges of high pressure and associated belts of hot air, whereas data-driven ML approaches excel statistically in replicating station variability but often produce less realistic circulation patterns. The analog method yields balanced but less intense reconstructions, and the high-resolution dataset aligns best with heat intensities in the Alpine region.
Such trade-offs leave users to choose between computational efficiency, statistical performance, and physically coherent circulation. Future developments need to address uncertainties in the early measurements. In turn, the analyses also emphasize the value of high-quality early weather records to produce and validate gridded reconstructions.
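For readers unfamiliar with the statistical diagnostics named above, the following minimal Python sketch shows one common way to compute an anomaly correlation, a centered root-mean-square error, and a variability ratio for a single station. It is an illustration only, not the procedure used in this study: the function name `centered_stats`, the way the seasonal cycle (`climatology`) is supplied, and the use of the standard-deviation ratio as a measure of underestimated variability are all assumptions.

```python
import numpy as np

def centered_stats(recon, obs, climatology):
    """Return (anomaly correlation, centered RMSE, std-dev ratio).

    recon, obs  : daily reconstructed and observed values at one station
    climatology : assumed seasonal-cycle value for each day (hypothetical input)
    """
    recon = np.asarray(recon, dtype=float)
    obs = np.asarray(obs, dtype=float)
    clim = np.asarray(climatology, dtype=float)

    # Anomalies from the seasonal cycle
    recon_anom = recon - clim
    obs_anom = obs - clim

    # Centering: remove each series' own mean so a constant bias is ignored
    rc = recon_anom - recon_anom.mean()
    oc = obs_anom - obs_anom.mean()

    # Pearson correlation of the centered anomalies
    corr = float(np.sum(rc * oc) / np.sqrt(np.sum(rc**2) * np.sum(oc**2)))

    # Centered RMSE: error that remains after the mean bias is removed
    crmse = float(np.sqrt(np.mean((rc - oc) ** 2)))

    # Ratio below 1 means the reconstruction underestimates variability
    std_ratio = float(recon_anom.std() / obs_anom.std())

    return corr, crmse, std_ratio
```

A reconstruction that tracks the observed anomalies perfectly in phase but with half their amplitude would return a correlation of 1 together with a standard-deviation ratio of 0.5, which is one way high correlation and underestimated variability can coexist.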