https://doi.org/10.5194/egusphere-2025-4936
15 Oct 2025
Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

An Ensemble Machine Learning Method to Retrieve Aerosol Parameters from Ground-based Sun-sky Photometer Measurements

Qiurui Li, Zhongxia Sun, Meijing Liu, Huizheng Che, Yu Zheng, and Jing Li

Abstract. Ground-based Sun-sky photometers have been widely used to measure aerosol optical and microphysical properties, yet the conventional numerical inversion schemes are often computationally expensive. In this study, we developed an explainable Ensemble Machine Learning (EML) model that simultaneously retrieves aerosol single scattering albedo (SSA), scattering asymmetry parameter (g), effective radius (r_eff), and fine-mode fraction (FMF) from direct and diffuse solar radiation measurements, with feature importance quantified using SHapley Additive exPlanations (SHAP). The EML model was trained and validated on a dataset of 110,000 samples simulated using the T-matrix particle scattering model and the VLIDORT radiative transfer model, encompassing diverse aerosol, atmospheric, and surface conditions. The algorithm demonstrated robustness through ten-fold cross validation, achieving correlation coefficients of 0.94, 0.95, 0.92, and 0.90 for SSA, g, r_eff, and FMF on the validation set, respectively. SHAP-based feature importance analysis confirmed the physical interpretability of the model, highlighting its effective use of multi-band radiance information and the stronger dependence of SSA retrieval on aerosol optical depth (AOD) relative to g and r_eff. Retrieval uncertainties estimated from repeated noise perturbation experiments were 0.03 for SSA, 0.02 for g, 0.08 for r_eff, and 0.09 for FMF. Applied to 132,067 sets of raw photometer measurements, the EML-based retrieval produced forward radiance fitting residuals comparable to those of the AERONET official inversion products. Moreover, compared with numerical algorithms, the EML model eliminates the need for a priori assumptions and smoothness constraints, while improving computational efficiency by more than five orders of magnitude.
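
To make two of the evaluation steps named in the abstract more concrete, the sketch below illustrates (a) a ten-fold split of sample indices and (b) a repeated noise perturbation experiment that reports the spread of the retrieved value as its uncertainty. It is not the authors' implementation: `toy_retrieval`, its weights, the 5 % relative Gaussian noise level, and the number of trials are all hypothetical placeholders, since the abstract does not state the noise model or the model internals.

```python
import random
import statistics


def toy_retrieval(radiances):
    """Hypothetical stand-in for a trained retrieval model (NOT the EML model).

    Maps four input radiances to one scalar via an arbitrary weighted sum.
    """
    weights = [0.4, 0.3, 0.2, 0.1]  # illustrative values only
    return sum(w * r for w, r in zip(weights, radiances))


def perturbation_uncertainty(model, inputs, rel_noise=0.05, n_trials=500, seed=0):
    """Estimate retrieval uncertainty by repeatedly perturbing the inputs.

    Each trial multiplies every input by (1 + e), with e drawn from a
    zero-mean Gaussian of standard deviation rel_noise (an assumed noise
    model). The sample standard deviation of the outputs is returned.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_trials):
        noisy = [x * (1.0 + rng.gauss(0.0, rel_noise)) for x in inputs]
        outputs.append(model(noisy))
    return statistics.stdev(outputs)


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross validation.

    Indices are shuffled once, dealt into k folds, and each fold serves
    as the validation set exactly once.
    """
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, folds[i]


if __name__ == "__main__":
    sigma = perturbation_uncertainty(toy_retrieval, [1.0, 1.0, 1.0, 1.0])
    print(f"estimated uncertainty of the toy retrieval: {sigma:.4f}")
    for fold, (train, val) in enumerate(k_fold_indices(25, k=10)):
        print(f"fold {fold}: {len(train)} training, {len(val)} validation samples")
```

In a real workflow the stand-in function would be replaced by the trained model's prediction call, and the perturbation would follow the instrument's documented error characteristics rather than the assumed relative Gaussian noise used here.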

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 20 Nov 2025)

Latest update: 15 Oct 2025
Short summary
We present a fast, interpretable machine learning method to retrieve key aerosol parameters from ground-based Sun-sky photometer measurements. The model is trained on simulated data covering diverse aerosol and atmospheric conditions, which supports its robustness and physical consistency. Applied to real observations, it agrees well with AERONET products and reduces computation time by more than five orders of magnitude, offering a practical tool for monitoring aerosols and their effects on air quality and climate.