the Creative Commons Attribution 4.0 License.
Machine learning data fusion for high spatio-temporal resolution PM2.5
Abstract. Understanding PM2.5 variability at fine scale is crucial for assessing the impact of urban pollution on the population and for informing policy-making. In-situ PM2.5 measurements at ground level cannot offer gapless spatial coverage, while current satellite retrievals generally cannot offer both high spatial and high temporal resolution, and night-time estimation poses further challenges. This study tackles these difficulties by introducing an innovative deep learning data fusion method to estimate hourly PM2.5 maps at 100 m resolution over urban areas. We combine low-resolution geophysical model data, high-resolution geographical indicators, in-situ PM2.5 measurements from ground stations, and PM2.5 retrieved at satellite overpass. To treat spatial and temporal correlations in our data simultaneously, we deploy a 3D U-Net-based neural network model. To evaluate the model, we select the city of Paris, France, in the year 2019 as our study region and period. Quantitative assessment of the model is carried out using the ground station data with a leave-one-out cross-validation approach. Our method outperforms MERRA-2 PM2.5 estimates, predicting PM2.5 at hourly (R2 = 0.51, RMSE = 6.58 μg/m3), daily (R2 = 0.65, RMSE = 4.92 μg/m3), and monthly (R2 = 0.87, RMSE = 2.87 μg/m3) scales. The proposed approach and its possible future developments can be highly beneficial for PM2.5 exposure and regulation studies at fine suburban scale.
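The evaluation described in the abstract (leave-one-out cross-validation over ground stations, scored at several temporal aggregations) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the synthetic station series, the stand-in predictor `mean_of_others`, and the choice of the coefficient of determination for R2 (the abstract does not say whether R2 is that quantity or a squared correlation). It is not the authors' pipeline.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination (assumed definition of R2)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def block_mean(values, block):
    """Average consecutive non-overlapping blocks, e.g. 24 hourly values -> 1 daily value."""
    return [sum(values[i:i + block]) / block
            for i in range(0, len(values) - block + 1, block)]


def leave_one_out_scores(stations, predict, block=1):
    """Hold out each station in turn, predict it from the others, pool the scores.

    stations: dict mapping station name -> list of hourly observations.
    predict:  callable(train_stations, held_out_name) -> list of hourly predictions.
    block:    aggregation window in hours (1 = hourly, 24 = daily).
    """
    obs_all, pred_all = [], []
    for name, obs in stations.items():
        train = {k: v for k, v in stations.items() if k != name}
        pred = predict(train, name)
        obs_all += block_mean(obs, block)
        pred_all += block_mean(pred, block)
    return r2(obs_all, pred_all), rmse(obs_all, pred_all)


def mean_of_others(train, held_out_name):
    """Hypothetical stand-in predictor: hourly mean of the remaining stations."""
    n_hours = len(next(iter(train.values())))
    return [sum(series[h] for series in train.values()) / len(train)
            for h in range(n_hours)]


# Synthetic two-day hourly series for three made-up stations (not real data).
stations = {
    "A": [10.0 + (h % 24) * 0.5 for h in range(48)],
    "B": [12.0 + (h % 24) * 0.4 + (h % 3) for h in range(48)],
    "C": [9.0 + (h % 24) * 0.6 - (h % 2) for h in range(48)],
}

hourly_r2, hourly_rmse = leave_one_out_scores(stations, mean_of_others, block=1)
daily_r2, daily_rmse = leave_one_out_scores(stations, mean_of_others, block=24)
```

Averaging over equal-length blocks cannot increase the pooled RMSE, which is consistent with the decreasing RMSE from hourly to monthly scales reported above.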
Status: open (extended)
RC1: 'Comment on egusphere-2024-4056', Anonymous Referee #1, 31 Mar 2025
This study integrates multi-source data, including satellite and ground-based station data, to construct a deep learning model for estimating 24-hour high-resolution PM2.5 data. High spatiotemporal resolution PM2.5 mapping is of significant importance for pollution control and decision-making, and this study represents a useful attempt in this field. However, the following issues need to be addressed:
The study aims to estimate 24-hourly PM2.5 maps at 100 m resolution in urban areas. However, as shown in Table A1, most of the input data have resolutions coarser than 100 m, except for OpenStreetMap roads and DEM data, which are not directly related to PM2.5. How do the authors justify that the estimated PM2.5 resolution truly reaches 100 m?
The paper presents a deep learning-based estimation approach, but the description of the methodology remains unclear. First, Lines 148–149 mention that "The output is a 3-dimensional array containing 24 hourly PM2.5 maps," but Lines 159–160 state that "the output layer is a 3D 1x1x1 convolution," which appears contradictory and should be clarified. Second, the construction of the loss function is confusing—it should ideally be constrained by PM2.5 measurements from ground stations and NOODLESALAD PM2.5, but its current formulation appears overly complex and difficult to understand.
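One reading that would reconcile the two quoted statements, offered here only as an assumption and not confirmed by the manuscript, is that a 1x1x1 convolution mixes feature channels at each voxel without changing the temporal or spatial extent. Under that reading, a feature volume with 24 time steps still yields 24 hourly maps. The sketch below uses plain nested lists and a hypothetical helper `conv1x1x1`; the channel count and grid size are made up.

```python
def conv1x1x1(features, weights, bias=0.0):
    """Apply a single-output-channel 1x1x1 convolution.

    features: nested list indexed as [channel][time][row][col].
    weights:  one weight per input channel.
    Returns a nested list indexed as [time][row][col]; time and space are unchanged.
    """
    n_ch = len(features)
    n_t = len(features[0])
    n_h = len(features[0][0])
    n_w = len(features[0][0][0])
    return [[[bias + sum(weights[c] * features[c][t][y][x] for c in range(n_ch))
              for x in range(n_w)]
             for y in range(n_h)]
            for t in range(n_t)]


# Made-up feature volume: 4 channels, 24 hourly steps, a 2 x 3 spatial grid.
features = [[[[float(c + t) for _ in range(3)] for _ in range(2)]
             for t in range(24)]
            for c in range(4)]

out = conv1x1x1(features, weights=[0.25] * 4)
shape = (len(out), len(out[0]), len(out[0][0]))  # (time, rows, cols)
```

The kernel size only describes the neighbourhood each output voxel sees, so the output keeps the 24-step time axis of its input.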
The study aims to estimate 24-hour, 100 m resolution PM2.5 data, but most of the results presented are seasonal or monthly averages. We would like to see 24-hour PM2.5 mapping results. Additionally, the comparison with MERRA-2 focuses mainly on accuracy. Could the authors also better illustrate the spatial distribution and gradient variations of PM2.5, or even show whether the maps capture specific pollution emissions?
The study applies explainable AI techniques to explore the importance of different features, showing that SHAP values identify 2-meter air temperature as the most important feature. However, this analysis could be further improved. First, the underlying reasons for why certain variables are important (or not) are not sufficiently explored. Second, a broader perspective could be considered—how much of the variability in PM2.5 can be explained by meteorological variables overall?
The description of NOODLESALAD PM2.5 and its role in this study is unclear. The authors should provide a more detailed explanation rather than merely citing previous studies.
The results and analysis section could be further improved. First, it is recommended to structure the results into separate subsections rather than mixing everything together. Second, the quality of Figures 3–6 should be improved—currently, the font size is too small, and the figure titles could be removed (since the descriptions are already included in the captions). Lastly, additional results, such as 24-hour high-resolution PM2.5 maps, could enhance the persuasiveness of the study.
The references in the paper are somewhat outdated, with few studies from the last three years included. It is recommended to update and supplement them.
Some minor issues:
(1) Figure 1: Does the figure represent the road network? Please clarify.
(2) Line 134: "3D PM2.5 maps" could be misinterpreted as three-dimensional spatial maps (including altitude). Is this the correct terminology?
(3) Figure 2: The representation is somewhat abstract. It would be better if the inputs and outputs were explicitly illustrated.
(4) Line 279: "consistent with prior findings" should be supported with references.
Citation: https://doi.org/10.5194/egusphere-2024-4056-RC1
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
149 | 32 | 5 | 186 | 5 | 3 |
Viewed (geographical distribution)
Country | # | Views | % |
---|---|---|---|
United States of America | 1 | 60 | 31 |
Finland | 2 | 35 | 18 |
China | 3 | 22 | 11 |
India | 4 | 9 | 4 |
Germany | 5 | 8 | 4 |