https://doi.org/10.5194/egusphere-2026-474
05 Mar 2026
Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

Improving Imputation of Missing PM2.5 Speciation Data Using PMF-Informed Source–Receptor Relationships

Wubin Zhu, Mingjie Xie, Qili Dai, Xiaohui Bi, Yufen Zhang, and Yinchang Feng

Abstract. Missing values are ubiquitous in atmospheric monitoring due to instrument drift, calibration cycles, operational interruptions, and other random malfunctions. Such gaps can undermine the reliability of subsequent analyses and introduce systematic biases. Conventional imputation methods, such as K-nearest neighbor (KNN), Bayesian principal component analysis (BPCA), and deep learning architectures, rely primarily on statistical correlations, require auxiliary inputs, and offer limited physical interpretability. To address these limitations, we propose a novel source–receptor informed Positive Matrix Factorization Reconstruction (PMFr) method that uses PMF-derived source–receptor relationships, rather than purely statistical interpolation, to impute missing PM2.5 speciation data without requiring auxiliary data. Benchmarking against commonly used imputation techniques (KNN, BPCA, and a deep learning predictive model) demonstrates that PMFr achieves superior accuracy and robustness under all real-world missing-data scenarios, with a mean coefficient of determination (R²) of 0.81, an index of agreement (IoA) of 0.92, and a mean absolute percentage error (MAPE) of 22.8 %. PMFr reduces MAPE by 25.5–29.1 %, particularly for key PM2.5 species, highlighting its potential as a robust tool for recovering reliable data in air quality studies.
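For readers unfamiliar with the three scores quoted in the abstract, the following is a minimal sketch of how they are commonly computed. It assumes R² is defined as 1 − SS_res/SS_tot (some studies report the squared Pearson correlation instead) and that IoA is Willmott's original index of agreement; the abstract does not state which definitions the authors used, and the function names are illustrative only.

```python
def r_squared(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed definition), bounded by 0 and 1."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den


def mape(obs, pred):
    """Mean absolute percentage error in percent; observations must be non-zero."""
    return 100.0 * sum(abs(p - o) / abs(o) for o, p in zip(obs, pred)) / len(obs)


# Hypothetical withheld concentrations and their imputed counterparts.
observed = [2.0, 4.0]
imputed = [1.0, 5.0]
print(r_squared(observed, imputed),
      index_of_agreement(observed, imputed),
      mape(observed, imputed))
```

In an imputation benchmark, `obs` would hold measured values that were deliberately withheld and `pred` the values an imputation method filled in for them.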

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 10 Apr 2026)

Latest update: 05 Mar 2026
Short summary
Missing values are common in air quality measurements and can lead to biased environmental conclusions if not properly addressed. We developed a new method to reconstruct missing data by leveraging the inherent physical relationships between contributing emission sources and measured concentrations. Unlike existing statistical imputation approaches, this method yields more accurate particulate matter speciation data, providing a robust foundation for data-driven atmospheric research.
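The general idea of reconstructing gaps from source contributions and source profiles can be illustrated with a generic masked non-negative matrix factorization. This is not the authors' PMFr algorithm, whose details are not given on this page. The sketch assumes binary weights (1 for observed, 0 for missing), whereas PMF normally weights each value by its measurement uncertainty, and it uses standard multiplicative updates. The function name `masked_nmf_impute` and all parameters are hypothetical.

```python
import random


def masked_nmf_impute(X, k, n_iter=500, seed=0, eps=1e-12):
    """Fill None entries of X (samples x species) from a rank-k non-negative fit.

    Illustrative only: binary weights and multiplicative updates, not PMFr.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    M = [[0.0 if v is None else 1.0 for v in row] for row in X]       # mask
    V = [[0.0 if v is None else float(v) for v in row] for row in X]  # data
    G = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]  # contributions
    F = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]  # profiles

    def fit():
        # Reconstructed concentrations: contributions times profiles.
        return [[sum(G[i][p] * F[p][j] for p in range(k)) for j in range(m)]
                for i in range(n)]

    for _ in range(n_iter):
        R = fit()
        for i in range(n):          # update contributions using observed cells only
            for p in range(k):
                num = sum(M[i][j] * V[i][j] * F[p][j] for j in range(m))
                den = sum(M[i][j] * R[i][j] * F[p][j] for j in range(m))
                G[i][p] *= num / (den + eps)
        R = fit()
        for p in range(k):          # update profiles using observed cells only
            for j in range(m):
                num = sum(M[i][j] * V[i][j] * G[i][p] for i in range(n))
                den = sum(M[i][j] * R[i][j] * G[i][p] for i in range(n))
                F[p][j] *= num / (den + eps)

    R = fit()
    # Keep measured values; use the factor-model reconstruction for the gaps.
    return [[X[i][j] if X[i][j] is not None else R[i][j] for j in range(m)]
            for i in range(n)]


# Toy data: one hypothetical source with profile b and contributions a.
a, b = [1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 3.0]
X = [[ai * bj for bj in b] for ai in a]
X[0][0] = None  # withhold one value
filled = masked_nmf_impute(X, k=1)
print(filled[0][0])
```

Because the missing cell is excluded from the fit, the factors are learned only from measured values, and the gap is then filled from the product of the estimated contribution and profile. Real speciation data would need more factors, uncertainty weighting, and a check on how many factors the data support.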