10 Aug 2023
Status: this preprint is open for discussion.

Diagnosing drivers of PM2.5 simulation biases from meteorology, chemical composition, and emission sources using an efficient machine learning method

Shuai Wang, Mengyuan Zhang, Yueqi Gao, Peng Wang, Qingyan Fu, and Hongliang Zhang

Abstract. Chemical transport models (CTMs) are widely used for air pollution modeling, but they suffer from significant biases due to uncertainties in simplified parameterizations, meteorological fields, and emission inventories. Accurate diagnosis of simulation biases is critical for improving models, interpreting results, and managing air quality efficiently, especially for the simulation of fine particulate matter (PM2.5). In this study, an efficient method based on machine learning (ML) was designed to diagnose the drivers of Community Multiscale Air Quality (CMAQ) model biases in simulated PM2.5 concentrations from three perspectives: meteorology, chemical composition, and emission sources. The source-oriented CMAQ model was used to diagnose the influence of different emission sources on PM2.5 biases. The ML models showed good fitting ability, with a small performance gap between training and validation. The CMAQ model underestimated PM2.5 in 2019, with biases of -19.25 to -2.66 μg/m3, especially in winter and spring and during high-PM2.5 events. Among chemical components, secondary organic components contributed the most to the PM2.5 simulation bias across regions and seasons (13.8–22.6 %). Relative humidity, cloud cover, and soil surface moisture were the main meteorological factors contributing to PM2.5 bias in the North China Plain, the Pearl River Delta, and the northwestern region, respectively. Among emission sources, both primary and secondary inorganic components from residential sources showed the largest contributions (12.05 % and 12.78 %, respectively), implying large uncertainties in this sector. ML-based methods provide a valuable complement to traditional mechanism-based methods for model improvement, with high efficiency and low reliance on prior information.
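The bias and contribution figures quoted in the abstract can be illustrated with a minimal sketch. It assumes that bias is defined as simulated minus observed concentration (so negative values mean underestimation) and that a driver's percentage contribution is its share of the total importance score. The abstract does not state which attribution technique the authors use; permutation importance is swapped in here purely as a generic stand-in. The data are synthetic, and all names (`mean_bias`, `permutation_shares`, `predict`, the feature names, the coefficients) are hypothetical.

```python
import random
from statistics import mean


def mean_bias(simulated, observed):
    """Mean bias, assumed here to be simulated minus observed.

    A negative value means the model underestimates on average.
    """
    return mean(s - o for s, o in zip(simulated, observed))


def mse(predict, rows, targets):
    """Mean squared error of a prediction function over a dataset."""
    return mean((predict(r) - t) ** 2 for r, t in zip(rows, targets))


def permutation_shares(predict, rows, targets, names, seed=0):
    """Percentage share of each feature in the total error increase.

    Each feature column is shuffled in turn; the resulting rise in MSE
    is taken as that feature's importance, then normalized to percent.
    """
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    increases = {}
    for j, name in enumerate(names):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        increases[name] = max(mse(predict, shuffled, targets) - baseline, 0.0)
    total = sum(increases.values()) or 1.0  # avoid division by zero
    return {name: 100.0 * inc / total for name, inc in increases.items()}


# Synthetic, made-up data: three hypothetical drivers of the PM2.5 bias.
data_rng = random.Random(42)
names = ["relative_humidity", "cloud_cover", "soil_moisture"]
rows = [tuple(data_rng.random() for _ in names) for _ in range(500)]
targets = [-20.0 * rh + 5.0 * cc + data_rng.gauss(0.0, 0.5) for rh, cc, _ in rows]


def predict(row):
    """Stand-in for a fitted ML model of the bias (not the authors' model)."""
    rh, cc, _ = row
    return -20.0 * rh + 5.0 * cc


shares = permutation_shares(predict, rows, targets, names)
for name, share in shares.items():
    print(f"{name}: {share:.1f} %")
```

In this toy setup the stand-in model ignores `soil_moisture`, so shuffling that column leaves the error unchanged and its share is zero, while the remaining shares sum to 100 %.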

Shuai Wang et al.

Status: open (until 05 Oct 2023)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • CEC1: 'Comment on egusphere-2023-1531', Juan Antonio Añel, 05 Sep 2023
  • RC1: 'Comment on egusphere-2023-1531', Anonymous Referee #1, 06 Sep 2023
  • RC2: 'Comment on egusphere-2023-1531', Anonymous Referee #2, 15 Sep 2023

Model code and software

Machine learning code and training datasets, Shuai Wang


Total article views: 255 (including HTML, PDF, and XML)
  • HTML: 175
  • PDF: 70
  • XML: 10
  • Total: 255
  • Supplement: 24
  • BibTeX: 3
  • EndNote: 3

Viewed (geographical distribution)

Total article views: 240 (including HTML, PDF, and XML) Thereof 240 with geography defined and 0 with unknown origin.
Latest update: 30 Sep 2023
Short summary
Numerical models are widely used for air pollution modeling, but they suffer from significant biases. The machine learning model designed in this study is highly efficient at identifying the drivers of such biases. Meteorology (relative humidity and cloud cover), chemical composition (secondary organic components and dust aerosol), and emission sources (residential activities) are diagnosed as the main drivers of bias in modeling PM2.5, a typical air pollutant. The results will help improve numerical models.