Forecasting European temperature-related mortality in Summer 2024: data-driven vs physics-based forecast approaches

Holmberg, Emma; Olivetti, Leonardo

doi:10.5194/egusphere-2026-1144

Preprints

https://doi.org/10.5194/egusphere-2026-1144

11 Mar 2026

Abstract. Heat has emerged as a major public health concern. Over 62,000 heat-related deaths were estimated to have occurred during the European summer of 2024, exemplifying the pressing need to develop effective early warning systems. Such systems depend critically on the quality of the underlying forecasts, and recent work has focused on developing impact-based forecasts for heat-related mortality, which provide impact-oriented information. To date, heat-related mortality forecasts have been based on the output of numerical weather prediction models, or physics-based forecasts. The field of weather forecasting is undergoing a rapid transformation with the advent of skillful data-driven forecasts. This study compares European temperature-related mortality forecasts for summer 2024 based on physics-based weather forecasts with those based on data-driven weather forecasts. Our results highlight that both the physics-based and data-driven forecasts systematically underestimate temperature-related mortality, more pronouncedly so in the latter. Both types of forecasts appear sensitive to forecast errors at hot temperatures, due to the non-linear relationship between temperature and mortality. Nevertheless, temperature-related mortality forecasts based on data-driven weather forecasts appear to be a promising alternative to traditional physics-based weather forecasts, and targeted improvement of the representation of hot temperatures through bias correction or adjustment of the loss function to give greater weighting to hot temperatures would be beneficial for temperature-related mortality forecasting. We suggest the application of this approach to both data-driven and physics-based forecast ensembles as an important next step in the continued development of informative, impact-oriented forecasts.

Received: 27 Feb 2026 – Discussion started: 11 Mar 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1:
'Comment on egusphere-2026-1144', Anonymous Referee #1, 13 Mar 2026
The manuscript addresses an important and timely topic, namely the development of impact-based forecasts for heat-related mortality. Overall, the manuscript clearly achieves its stated objectives and is well written, with a logical structure that makes the analysis easy to follow. I suggest some minor revisions, mainly related to clarification of certain aspects, as outlined below.
The introduction provides a clear overview of the motivation and research gap, but it may benefit from a brief explanation of what is meant by “data-driven” versus “physics-based” weather forecasts. Since the journal attracts an interdisciplinary audience, including researchers working on climate impacts and hazards, a short clarification of these approaches could improve accessibility for readers who may be less familiar with weather forecasting.

Section 2.1: The source of the reanalysis is not specified when first introduced (line 55), and it only becomes clear later in the paragraph that ERA5 is used. For clarity, the specific dataset (ERA5) could be mentioned at the beginning of the paragraph.

Section 2.1: The authors state that the meteorological data were aggregated to daily resolution. It would be useful to clarify why daily mean temperature was chosen rather than daily maximum temperature (Tmax), which is commonly used in studies examining heat-related health impacts.

Section 2.1: The authors also state:
"We also use 2m temperature from archived forecasts from two different types of weather prediction models, one physical model (IFS HRES cycle 48r1), and one data-driven model (AIFS single v1)." It would be helpful if the authors briefly explained why these specific models were selected.

Section 2.1: The manuscript states that the analysis focuses on a selection of European cities. However, the specific cities included are not clearly listed or illustrated. Providing a table or a map showing their spatial distribution across Europe would improve transparency and help readers understand the geographical area of the analysis.

Section 2.2:
The epidemiological framework relies on exposure–response functions from the multi-city analysis of Masselot et al. (2023), which is based on all-cause mortality data. It would be helpful to state this explicitly, as the estimated impacts therefore correspond to temperature-related mortality across all causes rather than specific cause-of-death categories (e.g., cardiovascular or respiratory mortality, which are the causes most commonly studied in health impact studies). A brief clarification on this point would help.

Section 3: In lines 111–112 the authors state that the continental-scale assessment uses population-weighted averages. However, the population dataset used to compute these weights is not specified. For transparency and reproducibility, the authors could clarify the population data source (dataset and spatial aggregation).
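As a point of reference for the comment above, a population-weighted average can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `population_weighted_mean`, the three cities, their attributable fractions (AF), and their populations are all hypothetical.

```python
def population_weighted_mean(values, populations):
    """Weighted mean of per-city values, using city populations as weights."""
    if len(values) != len(populations):
        raise ValueError("values and populations must have the same length")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    return sum(v * p for v, p in zip(values, populations)) / total_population


# Hypothetical attributable fractions (AF) and populations for three cities.
af_by_city = [0.02, 0.05, 0.01]
population_by_city = [1_000_000, 3_000_000, 500_000]

weighted = population_weighted_mean(af_by_city, population_by_city)
unweighted = sum(af_by_city) / len(af_by_city)
print(f"population-weighted mean AF: {weighted:.4f}")
print(f"unweighted mean AF:          {unweighted:.4f}")
```

Because the largest city dominates the weights, the weighted and unweighted means differ noticeably in this toy case, which is why the choice of population dataset affects the reported continental-scale values.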

Section 3.2: Figure 6 (“QQ plot of population-weighted mean AF forecast bias vs temperature…”) appears to be either mislabeled or incorrectly described. The figure is referred to as a QQ plot in both the caption and the text (lines 151–159), yet the axes represent temperature and AF forecast bias rather than quantiles of two distributions. Consequently, the reference to deviations from a diagonal line is unclear. The authors may wish to clarify or revise the description of the figure.
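For readers less familiar with the term, the conventional construction of a QQ plot can be sketched as below: both axes carry quantiles of two samples evaluated at the same probability levels, and agreement between the distributions shows up as points on the 1:1 diagonal. This is a generic illustration under stated assumptions, not a reconstruction of Figure 6; the helper `qq_pairs` and the synthetic "reference" and "forecast" temperature samples are hypothetical.

```python
import random
from statistics import quantiles


def qq_pairs(sample_x, sample_y, n_quantiles=19):
    """Return (x quantile, y quantile) pairs at matching probability levels."""
    qx = quantiles(sample_x, n=n_quantiles + 1, method="inclusive")
    qy = quantiles(sample_y, n=n_quantiles + 1, method="inclusive")
    return list(zip(qx, qy))


# Synthetic example: a forecast that is uniformly 0.5 degrees colder than
# the reference (both samples are made up for illustration).
rng = random.Random(0)
reference = [rng.gauss(20.0, 4.0) for _ in range(200)]
forecast = [t - 0.5 for t in reference]

pairs = qq_pairs(reference, forecast)
for q_ref, q_fc in pairs[:3]:
    # Points on the diagonal would have q_fc == q_ref; here they sit below it.
    print(f"reference quantile {q_ref:6.2f}  forecast quantile {q_fc:6.2f}")
```

In this construction a diagonal reference line is meaningful because both axes share the same variable and units, which is the property the comment above finds missing when temperature is plotted against AF forecast bias.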

Section 3.2: It could also be informative to distinguish heatwave days from typical summer days and assess forecast performance separately, as this may provide additional insight into how well physics-based and data-driven forecasts capture the attributable fraction of heat-related mortality during extreme heat conditions. I understand that this may not be possible in this type of study, which focuses only on the summer of 2024 with limited data, so this is just a suggestion for future work including multi-year analysis.

Section 4: The continental-scale population-weighted analysis provides a useful summary of forecast performance across Europe. However, this approach may also hide some regional differences in forecast skill. Given that temperature–mortality relationships vary substantially across Europe (Masselot et al., 2023), future work could explore city-level or regional patterns of forecast performance. In addition, examining cause-specific mortality rather than all-cause mortality may help identify where data-driven and physics-based forecasts perform better or worse under different climatic and vulnerability conditions. While this point is briefly mentioned in the discussion, it could benefit from stronger emphasis.
Citation: https://doi.org/10.5194/egusphere-2026-1144-RC1
- AC1: 'Reply on RC1', Emma Holmberg, 16 Jun 2026
  
  Please see the attached file for our response.
  
  Citation: https://doi.org/10.5194/egusphere-2026-1144-AC1
RC2:
'Comment on egusphere-2026-1144', Anonymous Referee #2, 28 Apr 2026
The paper addresses an important current issue regarding the applicability of AI-generated weather forecasts for impact-oriented predictions of mortality.

The analysis is presented in a clear and structured manner throughout, making it easy to follow the line of reasoning.

Overall, the authors achieve the paper’s objectives with their presentation.

Nevertheless, I recommend making a few minor adjustments to the technical and visual presentation, primarily to facilitate the interpretation of the results.
Both the title and the abstract give the impression of a systematic comparison of data-driven and physics-based forecasts. However, the paper compares only two deterministic models for one summer, so the generalisability of the findings is not obvious. In the interest of transparency, it would therefore be advisable to address this limitation already in the abstract.

The authors could state their reasons for choosing these two particular models.

It might be helpful for the reader if the ordering in section 2.1 were changed so that the models used are mentioned before the specifics are explained.

There is no mention of the population data used for the weighting. This should be added to increase transparency. Additionally, a map or a list of the cities used would help the reader to assess the geographical distribution of the data.

Since several results are visually on the edge of significance, a more thorough use of statistical methods could be helpful for the interpretation of the stated results. In particular this holds for the following points:
The means of figures 1c-f are compared (lines 117-118) without stating the uncertainty of the calculated means, which should be accessible from the mentioned bootstrapping procedure. The results of a statistical test on the significance of the difference would then help to interpret the result.

The same is true for the interpretation of figure 7c-d, where the significance of the difference would improve the reported observations in lines 167-170.

The comparison in line 156 mentions a Kolmogorov-Smirnov test whose results should be included for transparency.

The fits in figures A4 and A5 are subject to uncertainties that could be shown visually by plotting the confidence band. Since the results of figure A4 are a central result discussed in section 4 (lines 175-179), the significance of the difference in the fits, especially at high temperatures, is important for the interpretation of the stated conclusion. Testing whether the fitted slope differs significantly from zero would also be informative.
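The kind of uncertainty estimate requested in the points above could be sketched as follows. This is a minimal illustration under stated assumptions, not the bootstrapping procedure used in the manuscript: the function names, the 92 synthetic daily errors, and the choice of a paired percentile bootstrap are all hypothetical. The second helper computes only the two-sample Kolmogorov-Smirnov statistic; a library routine such as `scipy.stats.ks_2samp` would additionally return a p-value.

```python
import random
from bisect import bisect_right
from statistics import mean


def bootstrap_mean_diff_ci(errors_a, errors_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(errors_a - errors_b), resampling days in pairs."""
    if len(errors_a) != len(errors_b):
        raise ValueError("both forecasts must cover the same days")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    # Resample the daily differences with replacement and store each mean.
    boot = sorted(mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_resamples))
    lower = boot[int((alpha / 2) * n_resamples)]
    upper = boot[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the maximum gap is always attained at a data point
        gap = abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        d = max(d, gap)
    return d


# Synthetic daily AF errors for two forecasts (92 made-up values each).
rng = random.Random(1)
errors_physics = [rng.gauss(-0.010, 0.020) for _ in range(92)]
errors_data_driven = [e - 0.005 + rng.gauss(0.0, 0.005) for e in errors_physics]

point, lower, upper = bootstrap_mean_diff_ci(errors_data_driven, errors_physics)
print(f"mean difference {point:+.4f}, 95% CI [{lower:+.4f}, {upper:+.4f}]")
print(f"KS statistic: {ks_statistic(errors_data_driven, errors_physics):.3f}")
```

If the interval excludes zero, the difference in mean bias between the two forecasts would be considered significant at the chosen level. Resampling days in pairs keeps the day-to-day correlation between the two forecasts, but it ignores serial correlation within the season, which a block bootstrap would address.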

Lines 151-159 describe the results of figure 6, which compares the distribution of the daily averaged AF forecast biases with the distribution of daily averaged temperatures in the form of a QQ plot. However, I do not see what information can be obtained from such a comparison, which is also not used in any further discussion. The authors should clarify why these two distributions are expected to be related and what the intended interpretation is, or replace the figure with the ones in figure A4 (see next point).

The central result that the forecasts underestimate AF at hot temperatures (lines 175-179) is visible only in figure A4, which is relegated to the appendix. This should at least be mentioned via a reference and/or by moving the figure to the main part of the paper.

Several plots contain inconsistencies which should be corrected.
The values of the mean bias in figures 1c-f and figures A2a-d are different, however they should be equal. Further, the y axes in figures A2a-d do not show the whole data range which is from -0.4 to 0.4 as shown in figures 1c-f.

The text explanation to figure 6 (line 152) mentions a deviation from the diagonal, however, the plot shows a horizontal line. If this really is a QQ-plot, a diagonal line should be added.

Figure A3 gives mean MAE values, which I assume are the mean values over all temperatures, however, from the shown data it is obvious that these values do not match the mean of the shown data. Is there data missing in the plot? If so, the axes should be changed such that all data is included.

The data in figures A4a-d should be the same as in figures 3c-f. Therefore, the magnitude of the values should match, which is not the case. For example figure A4d has values between -0.1 and 0.25 while figure 3f has values between -0.05 and 0.12.

Finally, the following list contains several purely technical issues.
In lines 93-94 the order should be reversed to "difference between forecast and reference" to match the given formula 4.

Formula 4 uses j as index while the text and the following formula use i. This should be aligned.

Figures 1, 3, A2 and A4 describe the mean of the time series as "mean error", a name which is already used as an alternative to "bias". It should thus be "mean mean error" or, better, "mean bias".

The caption of figure 1 is missing the word "line" in the last sentence.

The caption of figure 3 labels the sub-figures incorrectly: a-d should be c-f.

Figure 5 has different y-scales for the four plots. For a better comparison these should be aligned.

For a better comparison, the histograms in figure A1 should use the same bin width, not the same number of bins, for both forecasts.

The caption to figure A2 mentions a time series, but it is a bias-temperature scatter-plot.
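The histogram point in the list above (figure A1) can be illustrated with a short sketch in which both samples are binned on one shared set of equal-width edges. This is a generic illustration, not the authors' plotting code: the helper names, the 2-degree bin width, and the two small temperature samples are hypothetical.

```python
import math


def shared_bin_edges(samples, bin_width):
    """Equal-width bin edges that cover every value in every sample."""
    lowest = min(min(s) for s in samples)
    highest = max(max(s) for s in samples)
    start = math.floor(lowest / bin_width) * bin_width
    n_bins = max(1, math.ceil((highest - start) / bin_width))
    return [start + i * bin_width for i in range(n_bins + 1)]


def histogram(sample, edges):
    """Count values per bin; the last bin also includes its right edge."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for x in sample:
        index = min(int((x - edges[0]) / width), len(counts) - 1)
        counts[index] += 1
    return counts


# Made-up 2 m temperature samples for two forecasts.
forecast_a = [18.2, 19.5, 21.1, 24.8, 25.3]
forecast_b = [17.6, 20.4, 22.9, 23.0, 27.7]

edges = shared_bin_edges([forecast_a, forecast_b], bin_width=2.0)
counts_a = histogram(forecast_a, edges)
counts_b = histogram(forecast_b, edges)
print(edges)
print(counts_a)
print(counts_b)
```

With shared edges, a bar at a given temperature means the same thing in both panels. Plotting libraries generally accept such an explicit edge list in place of a bin count; in matplotlib, for example, it can be passed as the `bins` argument of `hist`.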
Citation: https://doi.org/10.5194/egusphere-2026-1144-RC2
- AC2: 'Reply on RC2', Emma Holmberg, 16 Jun 2026
  
  Please see the attached file for our response.
  
  Citation: https://doi.org/10.5194/egusphere-2026-1144-AC2

Viewed

Total article views: 1,671 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,021	559	91	1,671	65	62

Views and downloads (calculated since 11 Mar 2026)

Month	HTML	PDF	XML	Total
Mar 2026	847	419	80	1,346
Apr 2026	84	77	3	164
May 2026	59	36	2	97
Jun 2026	10	8	3	21
Jul 2026	21	19	3	43

Viewed (geographical distribution)

Total article views: 1,640 (including HTML, PDF, and XML) Thereof 1,640 with geography defined and 0 with unknown origin.

Latest update: 26 Jul 2026

Short summary

The health burden of extreme heat necessitates the development of effective early warning systems. These systems depend critically on the underlying weather forecasts. We compare forecasts of temperature-related mortality based on data-driven weather forecasts with those based on physics-based weather forecasts for summer 2024 in Europe. We find that forecasts based on data-driven weather forecasts could represent a promising avenue for the development of heat-related health impact forecasts.

