the Creative Commons Attribution 4.0 License.
Improving Ground-Level NO2 Estimation in China Using GEMS Measurements and a Nested Machine Learning Model
Abstract. The major bridge linking satellite-derived vertical column densities (VCDs) of nitrogen dioxide (NO2) with ground-level concentration is theoretically the NO2 mixing height (NMH). Various meteorological parameters have been used as a proxy for NMH in existing studies. This study developed a nested machine learning model to convert VCDs of NO2 into ground-level NO2 concentrations across China using Geostationary Environmental Monitoring Spectrometer (GEMS) measurements. This nested model was designed to directly incorporate NMH into the methodological framework and explore its impact on performance. The inner machine learning model predicted the NMH from the meteorological parameters; the predicted NMH was then input into the main machine learning model to predict the ground-level NO2 concentrations from the NO2 VCDs. The inclusion of NMH significantly enhanced the accuracy of estimating ground-level NO2 concentration, reducing bias and improving R² values to 0.93 in 10-fold cross-validation and 0.99 in the fully-trained model. Furthermore, NMH was identified as the second most important predictor variable, following the VCDs of NO2. Subsequently, satellite-derived ground-level NO2 data were analyzed across subregions with varying geolocations and urbanization levels. Highly populated areas typically experienced peak NO2 concentrations during early morning rush hours, whereas areas categorized as lightly populated observed a slight increase in NO2 levels one or two hours later, likely due to regional pollutant dispersion from urban sources. This study underscores the importance of incorporating NMH in estimating ground-level NO2 from satellite column measurements and highlights the significant advantages of geostationary satellites in providing detailed air pollution information at an hourly resolution.
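The nested structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: ordinary least squares stands in for the machine learning regressors, the data are synthetic, and the assumed ground truth (surface NO2 roughly equal to VCD divided by NMH, fitted in log space) as well as every name and coefficient below is hypothetical.

```python
import random

def fit_ols(X, y):
    """Fit y ~ intercept + X by least squares via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    n = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    c = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            for j in range(col, n):
                A[k][j] -= f * A[col][j]
            c[k] -= f * c[col]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (c[i] - tail) / A[i][i]
    return beta

def predict(beta, X):
    """Evaluate the fitted linear model on each row of X."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumption): log NMH depends on temperature and wind,
# and log surface NO2 = log VCD - log NMH + noise.
rng = random.Random(0)
met, log_nmh, log_vcd, log_no2 = [], [], [], []
for _ in range(500):
    temp, wind = rng.uniform(0, 30), rng.uniform(0, 10)
    h = 6.0 + 0.03 * temp + 0.05 * wind + rng.gauss(0, 0.05)
    v = rng.uniform(0, 2)
    met.append((temp, wind))
    log_nmh.append(h)
    log_vcd.append(v)
    log_no2.append(v - h + rng.gauss(0, 0.05))

train, test = slice(0, 400), slice(400, 500)

# Inner model: meteorological parameters -> NMH.
inner = fit_ols(met[train], log_nmh[train])
nmh_hat = predict(inner, met)

# Outer models: VCD only (model 1) versus VCD plus predicted NMH (model 2).
x_without = [(v,) for v in log_vcd]
x_with = [(v, h) for v, h in zip(log_vcd, nmh_hat)]
outer_without = fit_ols(x_without[train], log_no2[train])
outer_with = fit_ols(x_with[train], log_no2[train])

r2_without = r2(log_no2[test], predict(outer_without, x_without[test]))
r2_with = r2(log_no2[test], predict(outer_with, x_with[test]))
print(f"held-out R2 without NMH: {r2_without:.3f}, with NMH: {r2_with:.3f}")
```

In this toy setting the predicted NMH carries information the column density alone lacks, so the held-out R² rises when it is added as a feature. With linear stand-ins the nesting is equivalent to feeding the meteorological variables directly; the benefit of a separate inner model would have to be shown with the nonlinear regressors used in the study.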
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-558', Anonymous Referee #1, 05 Apr 2024
This study leverages advanced satellite measurements and machine learning techniques to estimate ground-level NO2 concentrations in China. The use of the GEMS measurements combined with a nested machine learning model marks an advanced approach to addressing the challenge of translating satellite-derived VCDs of NO2 into actionable ground-level concentration data. Incorporating the NMH into the prediction model not only demonstrates a methodological advancement but also highlights the crucial role of meteorological conditions in the dispersion of atmospheric pollutants. As the study achieves remarkable accuracy and provides comprehensive analyses of NO2 distribution patterns, I recommend the publication of this paper in Atmospheric Chemistry and Physics after minor revisions.
Specific Comments:
1. The planetary boundary layer (PBL), represented as NMH in this study, is identified as a significant factor influencing the conversion of VCDs of NO2 to ground-level concentrations. Given its importance, as illustrated in Figure 5, there should be more discussion of the relationship between the PBL and surface air pollution. I also suggest that the authors acknowledge the previous study investigating the relationships between the PBL and surface pollutants over China, as well as the influencing factors.
2. Section 2.6 is the key section of this paper, since it presents the details of the machine learning model used in this study. While the nested machine learning model demonstrates superior performance in estimating ground-level NO2 concentrations, the methodology section could benefit from a clearer discussion of the advantages of the XGBoost regression model, the feature selection process, and the rationale behind choosing specific meteorological parameters as predictors.
3. The study mentions the challenges posed by cloudy conditions and the lack of nighttime data in interpreting GEMS measurements. While correction factors were applied to mitigate these issues, a more detailed discussion of the limitations and potential biases introduced by these factors would be beneficial. This discussion of limitations could also be included or mentioned in the conclusion section.
Citation: https://doi.org/10.5194/egusphere-2024-558-RC1 -
RC2: 'Comment on egusphere-2024-558', Anonymous Referee #2, 10 Apr 2024
Overview:
This paper introduced a machine learning model to estimate ground-level NO2 concentrations from geostationary satellite-derived NO2 vertical column densities (VCDs). The overall conclusions are that utilizing NO2 mixing height (NMH) can improve the accuracy of ground-level NO2 concentration estimates, and that satellite-derived ground-level NO2 concentration presents a population-based gradient.
Although this manuscript provides a few pieces of information that I believe are suitable for publication, it is riddled with grammar and technical issues and requires major revisions. Extensive simple grammar corrections should not fall to the peer reviewers at this stage, and such issues did make it difficult to understand the authors’ justification behind their conclusions. I also found the present document more like a technical report than a research paper, as plenty of scientific discussion is missing.
Major Comments:
- The weakest point in the manuscript is the discussion of the results. More than two-thirds of the ‘Discussions’ section repeats what has already been presented in the ‘Results’ section. The authors should expand more on the scientific principles underlying the results in the ‘Discussions’ section.
- The title and abstract indicate that this paper aims at improving ground-level NO2 estimation. However, the only figures that present such improvements are Figures 4 and 12. The manuscript also repeatedly discusses the different patterns of ground-level NO2 concentration between highly and lightly populated areas. But how do the improvements differ between these regions (and at different hours of the day)? How do the estimates perform at the grid points where ground-based observations are available?
Minor Comments:
- Line 122: What is the nominal spatial resolution of GEMS NO2 product used in this study?
- Line 124: Please provide some information on how NO2 VCDs are standardized. Line 160 mentioned bi-linear interpolation, but it is for meteorological variables.
- Line 135: … divided the study region into four areas … -> … divided the study area into four categories …
- Line 253: How is the month of the year numbered exactly? If 1 to 12 is used for January to December, then cold months would be around 12 to 2, which may affect SHAP values shown in Figure 6.
- Line 259: Figure 6 indicates that lower T corresponds to lower NO2. How does this relate to ‘worsened’ ground-level NO2 pollution? Also, your ‘air stagnation’ reasoning may be wrong here.
- Line 260: Figure 6 does not indicate this pattern. Please either quantify the impact of RH and dew point explicitly or remove this sentence.
- Line 265: In this and the following sections, are the ground-level NO2 concentrations from ground-based observations or satellite-based estimates? Please clarify.
- Line 266: Since this paragraph is talking about Fig. S1, I would suggest presenting the figure in the main text. Also, as the correction factor is important to the results of this study, how it is calculated should be presented in the main text or as an appendix. Related to the computation of the correction factor, what is the maximum possible value of m? Is it up to 24 (hours of a day)?
- Line 350: Since Fig. S6 is discussed here, consider presenting the figure in the main text.
- Line 425: Are NO2 and NO really in chemical equilibrium in the real atmosphere?
- Line 444: The reasoning given here is too general. Consider adding some details/analysis specific to your results.
- Line 470: The wording and the order of the sentence starting with ‘The average ground-measured NO2 concentrations’ are confusing; please revise.
- Figure 3: How model 1 (i.e., without NMH) differs from model 2 (with NMH) is not clearly shown in the diagram. Please either split the flowcharts or add some description in the caption.
- Figure 4: Please clarify the meaning of each figure element (dots with colors, lines, etc.).
- Figure 7: Does this figure correspond to ground-based observations or satellite-based estimates? Is it an average of 8 AM to 3 PM local time or a daily average? Please clarify. Also, mark the provinces if possible so that readers unfamiliar with China can have a better sense of the regions you are referring to.
- Figures 9 through 12: What are the vertical bars in each plot? Please clarify.
Citation: https://doi.org/10.5194/egusphere-2024-558-RC2
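The month-numbering concern raised in RC2 (Line 253) can be made concrete with a short sketch. If months are coded 1 to 12, December and January sit 11 units apart even though they are adjacent in the seasonal cycle. One common remedy, shown here as an illustration and not as the encoding used in the manuscript, is to map each month onto a circle with sine and cosine components. The function names are hypothetical.

```python
import math

def encode_month(month):
    """Map a month number (1 to 12) to a point on the unit circle."""
    angle = 2.0 * math.pi * (month - 1) / 12.0
    return math.sin(angle), math.cos(angle)

def month_distance(m1, m2):
    """Euclidean distance between two months in the cyclic encoding."""
    return math.dist(encode_month(m1), encode_month(m2))

# Integer coding places December and January far apart ...
print(abs(12 - 1), abs(7 - 1))
# ... while the cyclic encoding treats them like any other adjacent pair.
print(month_distance(12, 1), month_distance(6, 7), month_distance(1, 7))
```

With this encoding, the cold months (December to February) form a contiguous arc, so a model or a SHAP dependence plot no longer sees them at opposite ends of a single numeric axis.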
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
309 | 81 | 18 | 408 | 17 | 7 | 8 |