Assessment of Aerosol Iron Solubility using Global Dataset, Part II: Machine Learning and Deep Neural Network Coupled with SHapley Additive exPlanation Combined with Independent Component Analysis (SHAP-ICA)
Abstract. The supply of dissolved iron (d-Fe) can enhance marine CO2 fixation. Aerosols are one source of d-Fe to the ocean surface, but aerosol iron solubility (Fesol%) depends on emission sources and atmospheric alteration processes that remain poorly reproduced by global climate and chemical transport models. Although recent machine learning and deep learning models can capture nonlinear relationships in observational datasets, their application to environmental samples is still limited, and approaches for improving their interpretability require further development. This study trained XGBoost and a deep neural network (DNN) on East Asian aerosol data and tested whether the trained models can reproduce Fesol% and d-Fe concentrations in marine aerosols. The effects of individual features on Fesol% and d-Fe were quantified using SHapley Additive exPlanations (SHAP), and independent component analysis (ICA) was applied to the SHAP values to extract independent components representing the dominant processes controlling Fesol%. Both XGBoost and the DNN reproduced East Asian Fesol% well. For marine aerosols, the DNN achieved higher reproducibility than XGBoost, likely because it can learn more complex relationships among features. SHAP indicated that variability in Fesol% and d-Fe is primarily driven by chemical alteration of Fe in mineral dust and anthropogenic aerosols. ICA further suggested that additional processes, including heavy oil combustion, influence a subset of samples. Spatial variations in process contributions were visualized by mapping the influence of each independent component. This DNN-based framework can improve the interpretation of both the current results and future observational datasets.
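The SHAP-ICA idea summarized above (attribute each prediction to its input features, then decompose the attribution matrix into independent components) can be illustrated with a minimal sketch. This is not the study's implementation. It assumes synthetic data with made-up values, and it substitutes a linear regression for XGBoost and the DNN because interventional SHAP values for a linear model have a closed form, which keeps the example free of the xgboost and shap dependencies. All variable names, the number of components, and the per-sample contribution summary at the end are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data (all values are made up): three independent,
# non-Gaussian "process" sources mixed into five observed features.
n_samples, n_sources, n_features = 500, 3, 5
sources = rng.laplace(size=(n_samples, n_sources))
mixing = rng.normal(size=(n_sources, n_features))
X = sources @ mixing + 0.05 * rng.normal(size=(n_samples, n_features))
y = X @ rng.normal(size=n_features) + 0.1 * rng.normal(size=n_samples)

# Stand-in model: linear regression (the study uses XGBoost and a DNN).
model = LinearRegression().fit(X, y)
pred = model.predict(X)

# Closed-form interventional SHAP values for a linear model, using the
# training set as background: phi[i, j] = coef[j] * (X[i, j] - mean(X[:, j])).
base_value = pred.mean()
shap_values = model.coef_ * (X - X.mean(axis=0))

# SHAP additivity: base value plus a sample's attributions equals its prediction.
additive_ok = np.allclose(base_value + shap_values.sum(axis=1), pred)

# ICA on the SHAP matrix (samples x features). The number of components
# is an assumption here; in practice it has to be chosen or tested.
ica = FastICA(n_components=n_sources, random_state=0, max_iter=1000)
ic_scores = ica.fit_transform(shap_values)  # shape (n_samples, n_sources)
ic_loadings = ica.mixing_                   # shape (n_features, n_sources)

# One possible per-sample summary: the part of each sample's total
# attribution carried by each component (score times summed loadings).
ic_contrib = ic_scores * ic_loadings.sum(axis=0)

print("SHAP additivity holds:", additive_ok)
print("IC loadings (features x components):")
print(np.round(ic_loadings, 2))
```

With a tree ensemble or neural network in place of the linear model, the closed-form attribution step would be replaced by an explainer suited to that model, while the ICA step on the resulting samples-by-features attribution matrix would stay the same. If each sample carries coordinates, the columns of `ic_contrib` are the kind of quantity that could be plotted on a map.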