A framework to holistically investigate processes controlling the aerosol lifecycle using explainable AI techniques
Abstract. General circulation models (GCMs) face significant uncertainties in estimating Earth's radiative budget due to aerosol-cloud interactions (ACI). To improve the representation of ACI in GCMs, it is crucial to constrain the processes controlling the aerosol lifecycle and the resulting size distribution. This is challenging because of the number and complexity of competing atmospheric processes, which interact over large spatial and temporal scales and must be disentangled to identify the dominant processes controlling aerosol properties. This study aims to (a) develop a generic explainable AI framework, based on air-mass history, that builds an accurate representation of the processes controlling aerosol properties; (b) use this framework to identify key relationships between aerosol processes and their impacts on observed aerosol number concentrations; and (c) provide robust process-based observational constraints that help isolate GCM structural uncertainties. This is achieved by developing XGBoost regression models that simulate Aitken and accumulation mode number concentrations at receptor surface stations, and by applying TreeSHAP to identify key processes from explanatory variables that describe meteorological and aerosol processes and are collocated with Lagrangian air-mass trajectories. The fidelity of this framework is demonstrated for the Antarctic station Trollhaugen, situated in a pristine region where GCMs exhibit significant biases. Aerosol number concentrations at Trollhaugen are shown to be dominated by marine sources and by transport from the free troposphere. Transport from aloft dominates the Aitken mode aerosol burden during the transitions between summer and winter, whereas in summer local marine sources, transported within the boundary layer, make a larger contribution.
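TreeSHAP attributes each prediction of a tree ensemble such as XGBoost to its input variables using Shapley values, and it computes them efficiently by exploiting the tree structure. The quantity it targets can be illustrated by brute force. The sketch below is not the authors' code and does not implement TreeSHAP. Instead, it enumerates every feature coalition to compute exact interventional Shapley values, one of the formulations TreeSHAP supports. It assumes a hypothetical `toy_model` standing in for a trained regressor, hypothetical trajectory-derived features (`bl_marine_exposure`, `ft_residence_time`, `precip_along_traj`), and a made-up background sample. The property it demonstrates is additivity: the per-variable contributions sum to the prediction minus the mean prediction over the background.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

# Hypothetical trajectory-derived explanatory variables (illustrative only).
FEATURES = ["bl_marine_exposure", "ft_residence_time", "precip_along_traj"]

Model = Callable[[Sequence[float]], float]


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained regressor such as an XGBoost model.

    The marine * free-troposphere term adds an interaction so that the
    attributions are not simply the linear coefficients.
    """
    marine, ft, precip = x
    return 100.0 * marine + 50.0 * ft - 30.0 * precip + 20.0 * marine * ft


def coalition_value(model: Model, x: Sequence[float],
                    background: List[List[float]], subset: Set[int]) -> float:
    """Interventional value v(S): features in S take their values from x,
    the others from each background row; the result is averaged over rows."""
    total = 0.0
    for row in background:
        z = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(z)
    return total / len(background)


def shapley_values(model: Model, x: Sequence[float],
                   background: List[List[float]]) -> List[float]:
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


# Made-up background sample and one observation to explain.
background = [[0.2, 0.5, 0.1], [0.4, 0.3, 0.3], [0.6, 0.1, 0.2]]
x = [0.9, 0.8, 0.0]

phi = shapley_values(toy_model, x, background)
base = sum(toy_model(row) for row in background) / len(background)

for name, contribution in zip(FEATURES, phi):
    print(f"{name}: {contribution:+.2f}")
# Additivity: base value plus all contributions recovers the prediction.
print(f"base {base:.2f} + sum {sum(phi):.2f} = prediction {toy_model(x):.2f}")
```

The enumeration grows as 2^n in the number of explanatory variables. This cost is why a tree-specific algorithm such as TreeSHAP is needed when many meteorological and aerosol variables are collocated with each trajectory.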