the Creative Commons Attribution 4.0 License.
Understanding European Heatwaves with Variational Autoencoders
Abstract. Understanding the dynamics of heatwaves is critical for accurate climate risk assessment. Traditional definitions, based solely on surface temperature thresholds, often overlook the complex, multivariate nature of heatwaves. This study uses a spatiotemporal Variational Autoencoder (VAE), an unsupervised machine learning method, to identify compact representations of multivariate, year-round heatwave patterns. Focusing on key atmospheric variables (e.g., circulation, humidity, temperature, geopotential height, cloud cover, stream function, and radiation), we extract eleven-day heatwave samples from ERA5 reanalysis data over the North Atlantic, centered on near-surface temperature extremes in Western Europe. The VAE was trained on data from 1941–1990 and evaluated using 2001–2022 samples, and effectively clustered heatwave events by season, revealing known dynamical regimes such as summer blocking highs and winter omega blocks. The VAE model captures the interplay and temporal evolution between different atmospheric variables in their contributions to heatwaves over Western Europe. Notably, recent summer heatwaves form a distinct cluster within the latent space, pointing to a shift in atmospheric dynamics consistent with climate change. Composite anomaly maps further show coherent pre-onset patterns across variables. These results demonstrate the potential of VAEs to uncover meaningful structure in complex heatwave dynamics from data, and promise advances in understanding heatwaves.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-2460', Anonymous Referee #1, 11 Jul 2025
- AC1: 'Reply on RC1', Aytaç Paçal, 05 Sep 2025
- RC2: 'Comment on egusphere-2025-2460', Anonymous Referee #2, 14 Jul 2025
Review of “Understanding European Heatwaves with Variational Autoencoders”
This research analyses heatwaves in western Europe, during the entire year, from a spatio-temporal multi-variate perspective. To this end the authors use ML and DL techniques. They find four clusters of heatwave patterns throughout the entire year, with dynamics consistent with previous literature. Notably, they use ERA5 and extended the variables to characterize heatwaves.
While this avenue of work (Deep Learning for heatwave understanding) is very interesting, both from the methodological and climate-scientific perspective, I have concerns regarding the novelty of the presented research. From the current work it seems that most of the methods are copied one-to-one from Happé et al. (2024), including the heatwave selection method, the VAE, and the GMM clustering, including their respective (hyper)parameters. It needs to be clear throughout the entire manuscript what the novelty of the current work is and what has been reproduced or based on previous studies. Currently, the authors cite Happé et al. (2024) in some places but they do not contextualize their work as an application of the framework developed by Happé et al. (2024). If the authors see their work not as an application but rather as an extension of the framework, additional developments need to be made to the current AI framework. Generally, the Abstract, Introduction, and Discussion & Conclusion need to properly reflect which parts of this research are novel and which follow the framework from Happé et al. (2024). Please find below more detailed comments.
Major points of discussion
- Introduction, L60-70 Here it reads as if this is the first study that uses the framework of VAE+Clustering to characterize climate extremes (especially line 68-70). Since this is not the case, it needs to be framed clearly what the novelty is of this work with respect to previous works, and how this study is either an application or extension of previous works. Please also have a look at:
- Spuler FR, Kretschmer M, Kovalchuk Y, Balmaseda MA, Shepherd TG. Identifying probabilistic weather regimes targeted to a local-scale impact variable. Environmental Data Science. 2024;3:e25. doi:10.1017/eds.2024.29
- Methods 2.1; Why do the authors take this exact grid area? Or the 15d moving window? Crucially, why do the authors take a grid of 0.7 degrees spatial resolution if ERA5 has 0.25? If these parameters are chosen because those were used in Happé et al. (2024), that needs to be stated as such. Happé et al. (2024) worked with 0.7 degrees because it is the native resolution of EC-Earth, and hence appropriate for that study. It is unclear why one would work with that resolution for ERA5, instead of the native 0.25 degrees.
- Heatwave identification – the authors take the “1941-1980 daily” percentile, which will inherently cause more heatwaves in the last 4 decades, as thermodynamics lead to an increase in temperature everywhere. This is important to consider when studying dynamics of heat extremes – how meaningful are the dynamical types that are then found? Furthermore, the test-set also consists of heatwaves from the last two decades – how do the authors deal with this non-stationarity?
- Methods 2.2; Indeed, here the authors mention following the methods proposed by Happé et al. (2024). It would benefit the entire methods section if it were made very clear which parts of the methodology deviate from Happé et al. (2024).
- Methods 2.3; As these methods also follow Happé et al. (2024), it would be transparent to mention something like “following Happé et al. (2024) we use a 3d VAE …”. Then continue explaining where your methods deviate and why the authors made those choices (e.g. improvement of training/framework/…). For example, the use of t-SNE is also done in Happé et al. (2024), yet this is not mentioned in your section (153-160). Additionally, the choice of 100 closest heatwaves to each centroid is also not cited as following Happé et al. (2024) – L161.
- Methods 2.3 the r2 scores; As this section talks about reconstruction errors, I would suggest this section fits better in the Results. Apart from that – are these r2 scores based on a latent dimension size of 128? Is this chosen because of Happé et al. (2024)? Why didn’t the authors take a higher latent space size, since the dimensions went from 2 to 9 variables and from 5 to 11 days? The latent dimension size should be properly justified and tested. Furthermore, I have concerns about these low r2 scores and would be curious to see the reconstructed maps for these variables. What happens if one goes to higher latent dimension sizes? Lastly, table 2 only shows the r2 scores for the test subset – my suggestion would be to also include the scores of the train set, to show how well the authors’ model is able to generalize. I’m especially curious about this last point, as Happé et al. (2024) showed that data augmentation was needed to avoid overfitting.
- Results; I’m curious as to why the authors apply PCA to go down to 50 components in the latent space – why not use PCA directly on the heatwave data? Or why not go down to 50 dimensions in the VAE latent space? What happens to the r2 scores after doing this step?
- Results; I find it interesting that the authors find 4 clusters that correspond with each season. What does this mean for interpretation – did the latent-space clusters actually find dynamically different heatwaves or rather the dynamics of the different seasons? Would it be possible to plot composite maps within a cluster of summer-only and winter-only heatwaves? Perhaps that could show us whether these patterns are indeed found year-round or whether you find the seasonal dynamics. This would also better underpin your speculative (“hints”) conclusion in L383-385. Answering this is not trivial, as dynamics leading to heatwaves in summer (e.g. blocking) do not necessarily lead to warm anomalies in winter. Rather, blocking-like systems cause cold anomalies in winter. I find it therefore interesting that cluster #1 is a blocking pattern in winter, while the authors compare this cluster to the UK High pattern in Happé et al. (2024) and the omega block in Rouges et al. (2023), which occur in summer [L328-241]. This is surprising, as the authors show in Figure 4 that 0 summer heatwaves are part of their cluster #1. Could it be that the fact that the authors find this pattern in winter is merely a result of the non-stationarity of the dataset? Could the authors explain this more?
- In the Discussion the authors state that the VAE/GMM is sensitive to hyperparameters; it would be good to see some of these experiments in this research. Especially the latent dimension size is essential for this research as ensuring that the latent representations are representative of your heatwave samples is not trivial – otherwise the clusters might be meaningless.
- Discussion & Conclusion; Again, it needs to be contextualized which parts of the framework are based on previous work and which parts are novel. Using phrases such as “We confirm the results from Happé et al. (2024), by showing XYZ.” or “As opposed to Happé et al. (2024), we do/find XYZ.” helps guide the reader and highlights the novelty of the authors’ work, e.g. in sentences 356-363, 392-394, and 369-400. It needs to be clear in the conclusion what the main scientific output of your contribution is.
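The non-stationarity concern raised under "Heatwave identification" can be illustrated with a minimal sketch. It is not the authors' selection method: the synthetic temperatures, the linear warming rate, the 90th percentile, and the helper `exceedance_rate` are all assumptions made for illustration, and the 15d moving window is omitted. It only shows that a threshold fixed on 1941-1980 is exceeded more often in later years once a trend is present, and that removing the trend brings the rate back towards its nominal value.

```python
import numpy as np

rng = np.random.default_rng(0)

years = np.arange(1941, 2023)  # 1941..2022
n_days = 365

# Assumption: linear warming of 0.02 K per year on top of stationary,
# unit-variance daily noise (purely synthetic, not ERA5).
trend = 0.02 * (years - years[0])
temps = rng.normal(0.0, 1.0, size=(years.size, n_days)) + trend[:, None]

baseline = (years >= 1941) & (years <= 1980)
recent = years >= 2001


def exceedance_rate(data, period, q=90):
    """Fraction of days in `period` above the per-calendar-day q-th
    percentile computed from the baseline years only."""
    threshold = np.percentile(data[baseline], q, axis=0)
    return float((data[period] > threshold).mean())


raw_baseline = exceedance_rate(temps, baseline)
raw_recent = exceedance_rate(temps, recent)

# Subtracting the (here, known) trend before thresholding leaves only
# the stationary part of the signal.
detrended = temps - trend[:, None]
detrended_recent = exceedance_rate(detrended, recent)

print(f"raw, baseline years:     {raw_baseline:.2f}")
print(f"raw, recent years:       {raw_recent:.2f}")
print(f"detrended, recent years: {detrended_recent:.2f}")
```

In real data the trend is not known and has to be estimated, so the detrending step here is a best case rather than a recipe.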
Minor points
- “This trend is projected to continue even at the lowest projected global warming scenario, and the intensity of extremes will increase proportionally with the amount of warming.” L19-20 Is there a reference for this? My understanding was that this is not necessarily a proportional increase.
- The motivation in the introduction seems to cover all types of extreme events and all over the globe, yet the focus of the manuscript is heatwaves over western Europe only.
- In the introduction the authors motivate that heat extremes cause mortality and increased costs, in summer mostly. Then why does the study focus on year-round heatwaves? I think this is important to motivate, as heatwaves in western Europe don’t cause impacts in winter.
- Methods 2.4; is the model trained using r2? Or MSE? Lines 144-152; would this not fit better in the Results section? It is also mentioned here that it is difficult to capture the local surface conditions because of the coarse spatial resolution, but then why did the authors decide to re-grid from 0.25 to 0.7 degrees in spatial resolution?
- Why do the authors choose MSLP, Z500, and STREAM250? Rather than different levels of Z or stream?
- Figure 4. If I understand correctly these samples are from all year round? Is the t-SNE trained only on train-data or on all samples?
- Section 3.4 can use some more literature comparison, especially when discussing the dynamics leading to heatwaves (the causal pathways).
- Some sentences need rephrasing, for example:
- L195-297 “they show a negative tendency” --> positive? Tendency towards what?
- L342-344 “the key difference in their study …” --> our study? Now it reads as if they (Happé et al.) used 11d multivariate data instead of you.
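As background to the question under Methods 2.4 about r2 versus MSE, the two quantities are directly related: r2 equals one minus the MSE divided by the variance of the target. The sketch below checks this on synthetic arrays. The field shape, the damped "reconstruction", and the function names are hypothetical stand-ins, not the manuscript's data or code.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error over all elements."""
    return float(np.mean((y_true - y_pred) ** 2))


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


rng = np.random.default_rng(1)

# Hypothetical (days, lat, lon) sample for one variable.
field = rng.normal(size=(11, 32, 32))
# Stand-in for a reconstruction that damps the anomalies by half.
recon = 0.5 * field

r2 = r2_score(field, recon)
r2_from_mse = 1.0 - mse(field, recon) / field.var()

print(f"r2 = {r2:.3f}, 1 - MSE/var = {r2_from_mse:.3f}")
```

Because of this identity, a model trained on MSE is implicitly maximizing r2 for a fixed target variance, so computing the same score separately on the train and test subsets is one way to expose a generalization gap.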
Citation: https://doi.org/10.5194/egusphere-2025-2460-RC2
- AC2: 'Reply on RC2', Aytaç Paçal, 05 Sep 2025
We are grateful for the reviewer’s constructive comments and suggestions. We carefully addressed each point in our response letter, with particular attention to clarifying the novelty of our work. Please find our detailed responses in the attached PDF.
Summary:
This work uses a non-linear dimensionality reduction method to study heat wave characteristics in western Europe. More specifically, they train a 3D variational autoencoder to reconstruct 11-day windows of multiple atmospheric variables around historical heat wave onset dates. Afterwards, the trained VAE is used to embed heat waves from a test period temporally after the training period. Then, the embeddings are clustered, and a shift in frequency in these clusters between training and testing is observed.
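The last step of this summary, comparing how often each cluster occurs in the training and test periods, can be sketched as follows. The cluster labels are randomly generated, and the number of clusters (four), the sample sizes, and the helper `cluster_frequencies` are assumptions for illustration. In the actual workflow the labels would come from clustering the VAE embeddings.

```python
import numpy as np


def cluster_frequencies(labels, n_clusters):
    """Relative frequency of each cluster label in `labels`."""
    counts = np.bincount(labels, minlength=n_clusters)
    return counts / counts.sum()


rng = np.random.default_rng(2)
n_clusters = 4

# Hypothetical cluster assignments for the two periods.
train_labels = rng.choice(n_clusters, size=500, p=[0.25, 0.25, 0.25, 0.25])
test_labels = rng.choice(n_clusters, size=200, p=[0.10, 0.20, 0.30, 0.40])

train_freq = cluster_frequencies(train_labels, n_clusters)
test_freq = cluster_frequencies(test_labels, n_clusters)

# Positive values mark clusters that became more frequent in the test period.
shift = test_freq - train_freq

for k in range(n_clusters):
    print(f"cluster {k}: train {train_freq[k]:.2f}, "
          f"test {test_freq[k]:.2f}, shift {shift[k]:+.2f}")
```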
Strengths:
Major comments:
Minor comments: