the Creative Commons Attribution 4.0 License.
Learning predictable and informative dynamical drivers of extreme precipitation using variational autoencoders
Abstract. Large-scale atmospheric dynamics modulate the occurrence of extreme precipitation events and provide sources of predictability of these events on timescales ranging from days to decades. In the midlatitudes, these dynamical drivers are frequently represented as discrete, persistent and recurrent circulation regimes. However, available methods identify circulation regimes which are either predictable but not necessarily informative of the relevant local-scale impact studied, or targeted to a local-scale impact but no longer as predictable. In this paper, we introduce a generative machine learning method based on variational autoencoders for identifying probabilistic circulation regimes targeted to spatial patterns of precipitation. The method, CMM-VAE, combines targeted dimensionality reduction and probabilistic clustering in a coherent statistical model and extends a previous architecture published by the authors to allow for categorical target variables. We investigate the trade-off between regime informativeness of local precipitation extremes and predictability of the regimes at subseasonal lead times. In an application to study drivers of extreme precipitation over Morocco, we find that the targeted CMM-VAE regimes are more informative of the impact variable of interest, compared to two well-established linear approaches, while maintaining the predictability of conventional non-targeted circulation regimes in subseasonal hindcasts, hence resolving the trade-off identified in previous studies. Furthermore, the targeted regimes and their predictability are physically interpretable in terms of known subseasonal teleconnections relevant to the region, the Madden-Julian Oscillation and variability of the stratospheric polar vortex. The proposed method therefore makes it possible to identify predictable, interpretable and locally relevant representations of regional dynamical drivers given a target variable of interest.
These results highlight the potential of the method for a variety of applications, ranging from subseasonal forecasting to attribution and statistical downscaling.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2024-4115', Joshua Dorrington, 24 Feb 2025
- RC2: 'Comment on egusphere-2024-4115', Anonymous Referee #2, 13 Apr 2025
The paper presents a novel generative machine learning-based variational autoencoder method (CMM-VAE) designed to identify atmospheric circulation regimes associated with precipitation patterns over Morocco. The study explores the trade-off between the information that the regimes carry about local precipitation extremes and their subseasonal predictability. The proposed CMM-VAE approach is compared against more traditional linear methods (PCA+k-means and CCA+k-means), demonstrating that the autoencoder-based method retains more information about precipitation while preserving the predictability offered by conventional techniques. Furthermore, the analysis examines connections with established subseasonal teleconnections, such as the MJO and stratospheric vortex variability.
Overall, this is an interesting and well-structured study that illustrates the potential of the CMM-VAE framework in identifying circulation regimes with regional impact – an important goal in regime decomposition research. The paper is clearly written, the figures are of high quality, and the results are compelling. However, I have several specific comments and questions that I recommend the authors address before the paper is suitable for publication.
Title:
The title includes the term “informative”, which in the manuscript is given a technical definition based on entropy. However, the term also carries a more general, informal connotation. I suggest reconsidering the phrasing of the title to avoid ambiguity and ensure the readers do not misinterpret the intended meaning.
Abstract:
L3: Rossby wave packets can also act as drivers of extreme rainfall, but they are not circulation regimes per se. Consider rephrasing or clarifying.
Which large-scale drivers are predictable but not informative?
Is the physical interpretability an inherent outcome of the method, or is it more a feature of this specific application? Could other methods also yield interpretable regimes?
Introduction:
L46: The claim that regional regimes do not modulate local extremes is not fully convincing. Two studies alone may not be sufficient to generalise this conclusion. Is it plausible that appropriate downscaling techniques (e.g., Baker et al., 2018) could reveal such relationships? Please elaborate or qualify the statement.
L75: Please specify which mechanisms, in terms of distinct patterns, are not captured by the regimes.
L90: Clarify the comparative advantage over Spuler et al. (2024a). How does using categorical target variables enable the use of CHIRPS rather than reanalysis precipitation? Also, explain what “categorical variables” means in this context.
Data and Methods:
Fig 2: Please explain the vertical axis in panel b more clearly. What does it represent?
Could Fig 2 also indicate the subregions or district-level divisions referenced in Table 1, especially for the CCA+k-means method?
L154: When citing “the eofs package (Dawson, 2016)”, a brief description would be helpful.
L164: Explain the role of the ridge regularization parameter.
Predictability metrics:
L213: The stated start date of 22/03 should likely read 22/02, as forecasts for April fall outside the defined extended winter period (Nov-March). Please correct if this is a typo.
L225: An ensemble of 11 members is relatively small, potentially affecting probability estimates and skill scores. Have you considered using fair scores to mitigate bias? How might ensemble size impact the results? Which of the findings are robust in light of this limitation?
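As background to the fair-score suggestion in the comment above, the following is a minimal sketch of how an ensemble-size correction can be applied to the Brier score. It assumes binary event forecasts (for example, "regime k occurs") from an m-member ensemble and uses the commonly cited fair Brier score, which subtracts i(m - i) / (m^2 (m - 1)) from the squared error, where i is the number of members forecasting the event. The function names and the toy data are illustrative assumptions and are not taken from the manuscript or its code.

```python
import numpy as np


def brier_score(member_events, outcomes):
    """Standard Brier score using the ensemble fraction as the forecast probability.

    member_events: array of shape (n_forecasts, n_members) with 0/1 entries.
    outcomes: array of shape (n_forecasts,) with 0/1 observed outcomes.
    """
    member_events = np.asarray(member_events, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    prob = member_events.mean(axis=1)
    return float(np.mean((prob - outcomes) ** 2))


def fair_brier_score(member_events, outcomes):
    """Fair Brier score: removes the penalty caused by a finite ensemble size."""
    member_events = np.asarray(member_events, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    m = member_events.shape[1]          # ensemble size (must be >= 2)
    i = member_events.sum(axis=1)       # members forecasting the event
    correction = i * (m - i) / (m ** 2 * (m - 1))
    return float(np.mean((i / m - outcomes) ** 2 - correction))


# Toy data (hypothetical): two forecasts from a 3-member ensemble.
toy_members = [[1, 0, 0], [1, 1, 1]]
toy_outcomes = [1, 1]
bs = brier_score(toy_members, toy_outcomes)
fair_bs = fair_brier_score(toy_members, toy_outcomes)
```

The correction vanishes when all members agree and shrinks as the ensemble grows, so comparing the two scores gives a rough sense of how much of the apparent skill difference is attributable to ensemble size alone.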
Results:
L335: Given the confidence intervals, the difference in BSS drop-off between PCA and CMM-VAE (17 vs 19 days) is likely not statistically significant. The text should acknowledge that their performance in terms of BSS and ROC AUC is generally similar, with only CCA clearly underperforming.
L340: Clarify what is meant by “slightly” – does this imply that differences are statistically insignificant?
L361: Sentence structure appears incomplete; a verb may be missing.
L389: While conditional entropy is clearly lowest for CMM-VAE at longer leads, the mutual information metric shows less pronounced differences, given the uncertainty. Please mention this and consider discussing potential reasons for the divergence between the mutual information and entropy metrics for PCA vs CMM-VAE at long lead times.
Discussion and Conclusions:
Why does the targeted CCA+k-means method underperform relative to the non-targeted PCA+k-means? Please provide a hypothesis or possible explanation.
Could your conclusions be sensitive to the chosen target domain size? A brief discussion on this would be valuable.
L437: While you note that the study disentangles the role of the two drivers influencing Moroccan precipitation, can you also infer any potential amplifying effects when both drivers are active? Even if a detailed quantification is beyond the scope here, some speculative discussion would enhance the conclusions.
Reference:
Baker, L., L. Shaffrey & A. Scaife (2018). Improved seasonal prediction of UK regional precipitation using atmospheric circulation. Int. J. Climatology, 38, e347-e453.
Citation: https://doi.org/10.5194/egusphere-2024-4115-RC2
- AC1: 'Response to reviewer comments', Fiona Spuler, 11 May 2025
Data sets
Data for 'Learning predictable and informative dynamical drivers of extreme precipitation using variational autoencoders' Fiona Spuler https://doi.org/10.5281/zenodo.14534651