the Creative Commons Attribution 4.0 License.
Explainable ensemble machine learning revealing enhanced anthropogenic emissions of particulate nitro-aromatic compounds in eastern China
Abstract. Nitro-aromatic compounds (NACs) are important atmospheric pollutants that impact air quality, atmospheric chemistry, and human health. Understanding the relationship between NACs formation and key environmental driving factors is crucial for mitigating their environmental and health impacts. In this work, we combined an ensemble machine learning (EML) model with the SHapley Additive exPlanation (SHAP) and positive matrix factorization (PMF) model to identify the key driving factors for ambient particulate NACs covering primary emissions, secondary formation, and meteorological conditions based on field observations at urban, rural, and mountain sites in eastern China. The EML model effectively reproduced ambient NACs and recognized that anthropogenic emissions (i.e., coal combustion, traffic emission, and biomass burning) were the most important driving factors, with a total contribution of 49.3 %, while significant influences from meteorology (27.4 %) and secondary formation (23.3 %) were also confirmed. Seasonal variation analysis showed that direct emissions presented positive responses to NACs concentrations in spring, summer, and autumn, while temperature had the largest impact in winter. By evaluating NACs formation and loss at various locations in winter, we found that anthropogenic sources played a dominant role in increasing NACs levels at urban and rural sites, while reduced ambient temperature along with secondary formation from gas-phase oxidation was the main reason for relatively high particulate NACs levels at the mountain site. This work provides a reliable modelling method for understanding the dominant sources and influencing factors for atmospheric NACs and highlights the necessity of strengthening emission source controls to mitigate organic aerosol pollution.
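The grouped contributions quoted in the abstract (emissions, meteorology, secondary formation) are the kind of figure one can obtain by summing per-feature Shapley attributions within each group. The sketch below is only an illustration of that idea and is not the authors' code: the model `toy_model`, the three feature names, the sample values, the all-zero baseline, and the helper names `shapley_values` and `group_shares` are all hypothetical, and exact Shapley values are computed by brute force rather than with the SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline input.

    Features missing from a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def group_shares(samples, predict, baseline, groups):
    """Mean |Shapley value| per feature, summed per group, in percent."""
    n = len(baseline)
    mean_abs = [0.0] * n
    for x in samples:
        for i, p in enumerate(shapley_values(predict, x, baseline)):
            mean_abs[i] += abs(p) / len(samples)
    total = sum(mean_abs)
    return {name: 100.0 * sum(mean_abs[i] for i in idx) / total
            for name, idx in groups.items()}


# Hypothetical stand-in model; NOT the EML model described in the abstract.
def toy_model(z):
    coal, temperature, oxidant = z
    return 2.0 * coal - 1.0 * temperature + 0.5 * oxidant * coal


baseline = [0.0, 0.0, 0.0]                      # assumed reference input
samples = [[1.0, 2.0, 1.0], [2.0, 1.0, 0.0], [0.5, 3.0, 2.0]]  # made-up values
groups = {"emissions": [0], "meteorology": [1], "secondary": [2]}

phi = shapley_values(toy_model, samples[0], baseline)
shares = group_shares(samples, toy_model, baseline, groups)
```

By construction the attributions for one sample sum to the difference between the model output at that sample and at the baseline, and the group shares sum to 100 %. A real analysis would depend on the chosen background data and on how correlated features are handled.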
Status: open (until 05 Apr 2025)
RC1: 'Comment on egusphere-2025-165', Anonymous Referee #1, 13 Mar 2025
Overall evaluation:
Li et al. investigate the factors influencing particulate nitro-aromatic compounds (NACs) in eastern China, including meteorological factors and the primary and secondary sources of NACs. The combination of machine learning with the PMF model is a key feature. Also, machine learning has not been applied to study NACs before. So, this paper studies an important component of atmospheric aerosols (i.e., NACs) with an innovative approach. Results are presented in a logical and organized manner with thorough discussion. Conclusions are clear and reasonable. Nevertheless, some places still need clarification, although none of these are major or critical issues. Also, the language could be further improved. Overall, I would recommend a minor revision before this paper could be accepted.
Minor comments:
Line 24-25: The authors state that “temperature had the largest impact in winter”. It is still not clear whether higher temperatures impose a positive or negative impact on NAC abundances in winter. Please briefly elaborate here.
Line 48-49: (1) The term “in-situ” seems unnecessary here. (2) Specifically, only aromatic VOCs can be oxidized to produce NACs. (3) More recent references should also be cited here. A recommended reference is given below.
Men Xia et al., 2023, JGR:A, Observations and Modeling of Gaseous Nitrated Phenols in Urban Beijing: Insights From Seasonal Comparison and Budget Analysis.
Line 55: How could solar radiation inhibit NAC photolysis? In my understanding, solar radiation should enhance NAC photolysis. Also, “NACs photolysis production and loss” lacks clarity. Please double check the expressions here.
Line 56: What does “their” refer to, the abundance of NACs or the influencing factors of NACs? Please clarify. Also, please check for potential misuse or overuse of “it” and “they” in other places.
Line 67: So far, it is inappropriate to judge that machine learning is a more advanced method than PMF or PCA analysis. As an emerging method that is only recently applied in atmospheric chemistry, some scholars also hold a conservative attitude toward the usage of machine learning.
Line 72-73: It is not clear whether Qin et al. and Peng et al. investigated NACs or other compounds.
Line 79: The authors mention “source apportionment”. Does that mean the authors also use methods like PMF or PCA? Please do clarify this key point.
Line 84-85: The combination of PMF and machine learning is a highlight in this paper, which should be emphasized more clearly and thoroughly here, and maybe emphasized again in other places, e.g., the last paragraph in the conclusion part.
Line 97-98: The authors honestly acknowledge that some data has been reported in previous studies, which is of course good manners. Nevertheless, it is more important to emphasize what data is newly reported here, if any.
Line 112: Check for typo of “filed campaigns”.
Line 115: The authors mention SO2, NO2, and O3. Was NO measured? Usually, NO and NO2 are measured together by gas analyzers.
Line 135: Was 2-nitrophenol detected? Why or why not?
Line 142-146: Please state the overall (total) uncertainty of the NACs measurements.
Line 159-161: Although more details could be found in SI, it is still necessary to state other data or parameters input into the PMF. Also, the key message of PMF methods stated in SI should also be briefly summarized and mentioned in the main text.
Line 164: The expression “were considered firstly in this study” sounds misleading. The mentioned machine learning algorithms have already been applied in previous studies.
Line 182: Check for typo of “leaners”.
Line 195: Check the grammar for “for quantify”. Please also carefully check the grammar issues in other places.
Section 2.5, Aerosol surface area density (Sa) prediction: this section needs to be moved to the SI. The prediction of Sa by machine learning is not a major scientific goal of this study.
In Table 1, at least the total NACs concentrations, which are key to this study, should be mentioned. Since the season has been mentioned, the detailed sampling period is less interesting and may be recorded in the SI.
In Figure 2, it is not clear how to read these box plots. Please clearly state what the boxes and data dots mean in this figure. For example, which marks represent the mean and median values, and which marks show the interquartile range?
Line 329: Check for the typo “expect winter”. Check the grammar of “which with a little high contribution”.
Line 345: What do PE and SF mean? To help readers understand the figure clearly, please elaborate here even if they are defined elsewhere.
Citation: https://doi.org/10.5194/egusphere-2025-165-RC1
Viewed
- HTML: 111
- PDF: 26
- XML: 5
- Total: 142
- Supplement: 20
- BibTeX: 5
- EndNote: 9