Machine learning data fusion for high spatio-temporal resolution PM2.5
Abstract. Understanding PM2.5 variability at fine scale is crucial to assess the impact of urban pollution on the population and to inform policy-making. Ground-level in-situ PM2.5 measurements cannot provide gapless spatial coverage, while current satellite retrievals generally cannot offer both high spatial and high temporal resolution, and night-time estimation poses further challenges. This study addresses these difficulties by introducing an innovative deep learning data fusion method that estimates hourly PM2.5 maps at 100 m resolution over urban areas. We combine low-resolution geophysical model data, high-resolution geographical indicators, in-situ PM2.5 measurements from ground stations, and PM2.5 retrieved at satellite overpass times. To treat spatial and temporal correlations in the data simultaneously, we deploy a neural network based on a 3D U-Net. We evaluate the model over the city of Paris, France, for the year 2019. The model is assessed quantitatively against ground station data using leave-one-out cross-validation. Our method outperforms MERRA-2 PM2.5 estimates, achieving R² = 0.51 and RMSE = 6.58 μg/m³ for hourly predictions, R² = 0.65 and RMSE = 4.92 μg/m³ for daily predictions, and R² = 0.87 and RMSE = 2.87 μg/m³ for monthly predictions. The proposed approach and its possible future developments can be highly beneficial for PM2.5 exposure and regulation studies at fine suburban scale.
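The evaluation described above (leave-one-out cross-validation over ground stations, scored at hourly and coarser time scales) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each fold is assumed to hold out one station, R² is taken as the coefficient of determination (the paper may instead use the squared Pearson correlation), and daily scores are computed on 24-hour means. The 3D U-Net is not reproduced; `mean_of_others` is a hypothetical stand-in predictor, and all data are synthetic. Monthly scores would follow the same pattern using calendar-month means.

```python
import numpy as np


def r2_rmse(obs, pred):
    """Return (R^2, RMSE), with R^2 taken as the coefficient of determination."""
    obs = np.asarray(obs, dtype=float).ravel()
    pred = np.asarray(pred, dtype=float).ravel()
    keep = ~(np.isnan(obs) | np.isnan(pred))  # skip hours with missing data
    obs, pred = obs[keep], pred[keep]
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    rmse = float(np.sqrt(np.mean((obs - pred) ** 2)))
    return float(1.0 - ss_res / ss_tot), rmse


def leave_one_station_out(obs, fit_predict):
    """Hold out each station in turn and predict its hourly series.

    obs: array of shape (n_stations, n_hours) of measured PM2.5.
    fit_predict(obs_train, held_out): hypothetical callable that trains on the
    remaining stations and returns an (n_hours,) prediction for `held_out`.
    """
    n_stations = obs.shape[0]
    preds = np.full(obs.shape, np.nan)
    for s in range(n_stations):
        train_idx = [i for i in range(n_stations) if i != s]
        preds[s] = fit_predict(obs[train_idx], s)
    return preds


def daily_means(hourly):
    """Average consecutive 24-hour blocks (assumes the series starts at 00:00)."""
    n_stations, n_hours = hourly.shape
    n_days = n_hours // 24
    blocks = hourly[:, : n_days * 24].reshape(n_stations, n_days, 24)
    return np.nanmean(blocks, axis=2)


def mean_of_others(obs_train, held_out):
    """Hypothetical stand-in model: mean of the training stations at each hour.

    A real model would use predictors at the held-out station's location;
    `held_out` is accepted only to match the fit_predict interface.
    """
    return np.nanmean(obs_train, axis=0)


# Synthetic example: 8 stations, 30 days of hourly data (not real measurements).
rng = np.random.default_rng(0)
n_days = 30
hours = 24 * n_days
diurnal = 5 * np.sin(np.arange(hours) * 2 * np.pi / 24)  # shared daily cycle
synoptic = np.repeat(rng.normal(0, 4, n_days), 24)        # day-to-day variation
obs = 15 + diurnal + synoptic + rng.normal(0, 3, size=(8, hours))

preds = leave_one_station_out(obs, mean_of_others)
print("hourly R2, RMSE:", r2_rmse(obs, preds))
print("daily  R2, RMSE:", r2_rmse(daily_means(obs), daily_means(preds)))
```

Scoring on temporal means rather than raw hours is one plausible reason why aggregated scores can exceed hourly ones: averaging cancels part of the hour-to-hour noise that no model can predict.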