A Hybrid STL-Ensemble Framework for Multivariate Time-Series Forecasting of Source-Specific PM2.5 Emissions
Abstract. Forecasting the evolution of source-specific particulate emissions is central to modern air quality management. Existing source identification methods yield non-unique and unstable source profiles, which introduces uncertainty into source identification and quantification. In this work, we present a hybrid approach that integrates receptor modeling with supervised machine learning to address this limitation. The model combines statistical decomposition, feature-engineered multivariate learning, and ensemble regression to predict the temporal trajectory of PM2.5 source contributions. Concentrations of elemental and organic species from high-resolution measurement systems were processed through source apportionment to identify the target sources. A time-series pipeline was then developed, comprising temporal imputation, autocorrelation-guided feature engineering, Seasonal-Trend decomposition using LOESS (STL), and multi-output ensemble regression. The proposed method improved predictive performance across diverse emission categories, highlighting the importance of decomposition for interpretability and providing a robust foundation for operational forecasting of air quality dynamics. Compared with forecasting source-specific PM2.5 emissions without STL, the proposed method improves the aggregate R² score from 0.22 to 0.95. The framework can be adapted to other multi-source environmental datasets.
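To make the decomposition step of the pipeline concrete, the following is a minimal illustrative sketch, not the implementation used in this work. It splits a series into additive trend, seasonal, and residual components, but it uses a centered moving average in place of the LOESS smoothing that STL performs. The function name `decompose_additive`, the period of 7, and the synthetic series are all assumptions chosen for illustration.

```python
from statistics import mean


def decompose_additive(series, period):
    """Split a series into trend + seasonal + residual (additive model).

    Simplified stand-in for STL: the trend is a centered moving average,
    not a LOESS fit. Trend and residual are None at the edges, where a
    full window is not available.
    """
    n = len(series)
    if period < 2 or n < 2 * period:
        raise ValueError("need period >= 2 and at least two full periods")

    half = period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: window has period + 1 points, so the two end
            # points get half weight (the classical 2 x m moving average).
            total -= 0.5 * (window[0] + window[-1])
        trend[i] = total / period

    # Average the detrended values at each position within the period.
    phase_means = []
    for phase in range(period):
        values = [series[i] - trend[i]
                  for i in range(half, n - half) if i % period == phase]
        phase_means.append(mean(values))

    # Center the seasonal component so that it sums to zero over a period.
    offset = mean(phase_means)
    seasonal = [phase_means[i % period] - offset for i in range(n)]

    resid = [None if trend[i] is None else series[i] - trend[i] - seasonal[i]
             for i in range(n)]
    return trend, seasonal, resid


# Synthetic example (assumed data): linear trend plus a repeating pattern.
pattern = [3.0, -1.0, -2.0, 0.0, 1.0, -1.0, 0.0]
series = [10.0 + 0.5 * t + pattern[t % 7] for t in range(35)]
trend, seasonal, resid = decompose_additive(series, period=7)
```

In practice, a LOESS-based implementation such as `statsmodels.tsa.seasonal.STL` would replace this stand-in, and the resulting components would feed the feature engineering and multi-output regression stages.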