Learning predictable and informative dynamical drivers of extreme precipitation using variational autoencoders
Abstract. Large-scale atmospheric dynamics modulate the occurrence of extreme precipitation events and provide sources of predictability for these events on timescales ranging from days to decades. In the midlatitudes, these dynamical drivers are frequently represented as discrete, persistent and recurrent circulation regimes. However, available methods identify circulation regimes that are either predictable but not necessarily informative about the local-scale impact of interest, or targeted to a local-scale impact but less predictable. In this paper, we introduce a generative machine learning method based on variational autoencoders for identifying probabilistic circulation regimes targeted to spatial patterns of precipitation. The method, CMM-VAE, combines targeted dimensionality reduction and probabilistic clustering in a coherent statistical model and extends a previous architecture published by the authors to allow for categorical target variables. We investigate the trade-off between how informative the regimes are about local precipitation extremes and how predictable they are at subseasonal lead times. In an application to drivers of extreme precipitation over Morocco, we find that the targeted CMM-VAE regimes are more informative about the impact variable of interest than those from two well-established linear approaches, while maintaining the predictability of conventional non-targeted circulation regimes in subseasonal hindcasts, hence resolving the trade-off identified in previous studies. Furthermore, the targeted regimes and their predictability are physically interpretable in terms of known subseasonal teleconnections relevant to the region: the Madden-Julian Oscillation and variability of the stratospheric polar vortex. The proposed method therefore makes it possible to identify predictable, interpretable and locally relevant representations of regional dynamical drivers given a target variable of interest.
These results highlight the potential of the method for a variety of applications, ranging from subseasonal forecasting to attribution and statistical downscaling.