An Ensemble Machine Learning Method to Retrieve Aerosol Parameters from Ground-based Sun-sky Photometer Measurements
Abstract. Ground-based Sun-sky photometers have been widely used to measure aerosol optical and microphysical properties, yet the conventional numerical inversion schemes are often computationally expensive. In this study, we developed an explainable Ensemble Machine Learning (EML) model that simultaneously retrieves aerosol single scattering albedo (SSA), scattering asymmetry parameter (g), effective radius (reff), and fine-mode fraction (FMF) from direct and diffuse solar radiation measurements, with feature importance quantified using SHapley Additive exPlanations (SHAP). The EML model was trained and validated on a dataset of 110,000 samples simulated using the T-matrix particle scattering model and the VLIDORT radiative transfer model, encompassing diverse aerosol, atmospheric, and surface conditions. The algorithm demonstrated robustness through ten-fold cross validation, achieving correlation coefficients of 0.94, 0.95, 0.92, and 0.90 for SSA, g, reff, and FMF on the validation set, respectively. SHAP-based feature importance analysis confirmed the physical interpretability of the model, highlighting its effective use of multi-band radiance information and the stronger dependence of SSA retrieval on aerosol optical depth (AOD) relative to g and reff. Retrieval uncertainties estimated from repeated noise perturbation experiments were 0.03 for SSA, 0.02 for g, 0.08 for reff, and 0.09 for FMF. Applied to 132,067 sets of raw photometer measurements, the EML-based retrieval produced forward radiance fitting residuals comparable to those of the AERONET official inversion products. Moreover, compared with numerical algorithms, the EML model eliminates the need for a priori assumptions and smoothness constraints, while improving computational efficiency by more than five orders of magnitude.