Technical note: Expanding ensemble forecasts with generative AI: application to volcanic clouds
Abstract. Ensemble-based modelling of the atmospheric dispersal of volcanic clouds enables more realistic forecasts by explicitly accounting for uncertainties in eruption source parameters, meteorological data, and systematic errors in transport models. Many ensemble applications, including quantification of forecast uncertainties, data assimilation, and probabilistic hazard assessments, require a large number of members to mitigate sampling errors and to properly capture probability distributions. However, running large ensembles with Volcanic Ash Transport and Dispersal (VATD) models can be computationally demanding, even on high-performance computing (HPC) clusters. As a result, operational forecasting is typically restricted to smaller ensembles in order to meet time-to-solution requirements. In contrast, generative AI models can produce large volumes of physically consistent data samples at minimal computational cost. In this work, a convolutional Variational AutoEncoder (VAE) is trained on an ensemble of 256 forecasts simulated with the FALL3D model and subsequently used to generate larger ensembles, effectively augmenting physics-based ensemble modelling capacity. Ensembles with up to 8192 members were generated nearly instantaneously using the trained neural network, with no reliance on HPC resources. The statistical properties of the expanded ensembles are characterised in detail, and the VAE performance is evaluated against a test dataset composed of 2048 numerical simulations. The VAE-generated ensembles closely approximate the actual (target) probability distribution as well as key sample statistics, such as ensemble mean and spread, with minimal degradation in the evaluation metrics. Finally, we discuss possible future applications of this work, including latent space data assimilation via deep learning.
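The ensemble-expansion step described above can be illustrated with a minimal sketch: latent vectors are drawn from a standard normal prior, passed through a decoder, and the resulting fields are summarised by ensemble mean, spread, and an exceedance probability. Everything specific here is an assumption made for illustration only: `toy_decode` is a hypothetical fixed random map standing in for a trained convolutional VAE decoder, and the latent dimension, grid size, and threshold are arbitrary. It is not the architecture or configuration used in this work.

```python
import numpy as np


def expand_ensemble(decode, n_members, latent_dim, rng):
    """Draw latent vectors from a standard normal prior and decode them."""
    z = rng.standard_normal((n_members, latent_dim))
    return decode(z)


def ensemble_stats(fields, threshold):
    """Per-cell ensemble mean, spread (sample std), and exceedance probability."""
    mean = fields.mean(axis=0)
    spread = fields.std(axis=0, ddof=1)
    prob = (fields > threshold).mean(axis=0)
    return mean, spread, prob


# Hypothetical stand-in for a trained decoder (assumption, not the paper's model):
# a fixed random linear map followed by softplus, so outputs are non-negative.
rng = np.random.default_rng(0)
latent_dim, ny, nx = 8, 16, 16
W = 0.5 * rng.standard_normal((latent_dim, ny * nx))
b = 0.1 * rng.standard_normal(ny * nx)


def toy_decode(z):
    x = z @ W + b
    return np.logaddexp(0.0, x).reshape(-1, ny, nx)  # softplus(x)


# Generate an 8192-member ensemble and summarise it.
fields = expand_ensemble(toy_decode, 8192, latent_dim, rng)
mean, spread, prob = ensemble_stats(fields, threshold=1.0)
print(fields.shape, mean.shape, spread.shape, prob.shape)
```

In a real setting the decoder would be the network trained on the physics-based ensemble, and the generated statistics would be compared against those of an independent set of numerical simulations.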