Multi-Machine Learning Approaches to Modeling Small-Scale Source Attribution of Ozone Formation
Abstract. Accurate source apportionment of ozone (O3) precursors is essential for designing science-based O3 control strategies. Traditional approaches rely on complex calculations involving volatile organic compounds (VOCs) and meteorological parameters, which limits their applicability in real-time scenarios. Taking the Shanghai chemical industrial park as a case study, we propose a two-step machine learning (ML) approach that integrates positive matrix factorization (PMF) with other ML methods to quantify the spatiotemporal impacts of VOCs on O3 formation. We analyzed high-frequency data from 12 VOC monitoring stations (2021–2023) using six ML models, of which XGBoost performed best (R² = 0.644) in predicting local VOC emissions. PMF analysis distinguished six VOC sources: petrochemical processes (PP), fuel evaporation (FE), combustion sources (CS), solvent use (SU), polymer fabrication (PF), and vehicle emissions (VE). Combining SHapley Additive exPlanations (SHAP) with ML modeling, we evaluated VOC-O3 relationships and located emission sources. SU and FE were the primary contributors to O3 formation, followed by CS and VE. Temporal analysis revealed seasonal variation, with CS and FE dominant in spring and summer and PF prevailing in autumn. The framework enables rapid source identification and quantification of source contributions, supporting high-resolution O3 source apportionment.
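The two-step idea (factorize VOC measurements into source contributions, then attribute O3 to those sources) can be illustrated with a minimal sketch. It is not the authors' pipeline, and it rests on several simplifying assumptions: plain non-negative matrix factorization with multiplicative updates stands in for PMF (no measurement-uncertainty weighting), a least-squares linear model stands in for XGBoost, and the closed-form Shapley values of a linear model stand in for tree-based SHAP. All data are synthetic, and the function names `nmf` and `linear_shap` are hypothetical.

```python
import numpy as np


def nmf(X, k, n_iter=500, seed=0):
    """Unweighted NMF by multiplicative updates: X ~ G @ F, with G, F >= 0.

    G (samples x k) holds source contributions, F (k x species) source profiles.
    Stand-in for PMF, which additionally weights residuals by uncertainty.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    G = rng.random((n, k)) + 0.1
    F = rng.random((k, m)) + 0.1
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        F *= (G.T @ X) / (G.T @ G @ F + eps)
        G *= (X @ F.T) / (G @ F @ F.T + eps)
    return G, F


def linear_shap(G, y):
    """Fit y ~ b + G @ w by least squares and return per-sample attributions.

    For a linear model, the Shapley value of feature j on sample i is
    w[j] * (G[i, j] - mean(G[:, j])), treating features as independent.
    """
    A = np.column_stack([np.ones(len(G)), G])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    b, w = coef[0], coef[1:]
    phi = (G - G.mean(axis=0)) * w
    return phi, w, b


# Synthetic data only: 200 time steps, 8 VOC species, 3 hypothetical sources.
rng = np.random.default_rng(42)
true_G = rng.random((200, 3))
true_F = rng.random((3, 8))
X = true_G @ true_F

# Step 1: recover source contributions from the VOC matrix.
G, F = nmf(X, k=3)

# Step 2: attribute a synthetic O3 signal to the recovered sources.
o3 = 2.0 * true_G[:, 0] + 0.5 * true_G[:, 1] + rng.normal(0.0, 0.01, 200)
phi, w, b = linear_shap(G, o3)

# Mean absolute attribution per source, a common global importance summary.
importance = np.abs(phi).mean(axis=0)
print("mean |attribution| per source:", importance)
```

Ranking sources by `importance` mirrors, in spirit, how SHAP summaries are used to order source contributions to O3. With real data, the factor count and the regression model would need to be chosen and validated rather than assumed.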