the Creative Commons Attribution 4.0 License.
Technical note: Reconstructing surface missing aerosol elemental carbon data in long-term series with ensemble learning
Abstract. Ground-based measurements of elemental carbon (EC), classified under thermal-optical methods and considered a surrogate for black carbon, are essential for assessing air quality and evaluating climate impacts. However, data gaps caused by technical challenges impede comprehensive analyses of long-term trends. This study proposes an ensemble learning method to address these challenges. The model uses readily accessible ground observation air pollutant data as proxies for EC-related tracers, along with meteorological parameters, to enhance prediction accuracy. It integrates outputs from Gradient Boosting Regression Trees, eXtreme Gradient Boosting, and Random Forest models, combining them through ridge regression to produce robust predictions. We applied this approach to reconstruct hourly EC concentrations from 2013 to 2023 for four cities in Eastern China, filling 45–79 % of missing data and improving prediction performance by 8–17 % compared to individual models. Over the 11-year period, EC declined overall (-0.20 to -0.14 µg m⁻³ a⁻¹). The decline was steeper from 2013 to 2020 (-0.24 to -0.15 µg m⁻³ a⁻¹), when concentrations fell from 3.26 µg m⁻³ to 1.59 µg m⁻³, and slowed noticeably from 2020 to 2023 (-0.12 to -0.04 µg m⁻³ a⁻¹). Additionally, a fixed emission approximation method based on ensemble learning is proposed to quantitatively analyze the drivers of long-term EC trends. The analysis reveals that anthropogenic emission controls were the predominant contributors, accounting for approximately 92 % of the changes in EC trends from 2013 to 2020. However, their influence weakened after 2020, when they contributed approximately 80 %. These findings highlight that while China's Clean Air Actions implemented since 2013 have significantly reduced black carbon concentrations, sustained and enhanced strategies are still necessary to further mitigate black carbon pollution in the country.
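The combination step described in the abstract, where ridge regression merges the outputs of three base models, can be illustrated with a minimal sketch. This is not the authors' code: the base-model predictions below are synthetic numbers, the function names are hypothetical, and the ridge fit is written out in plain Python only so that the example needs no external libraries. In practice a library routine such as scikit-learn's `Ridge` would likely be used instead.

```python
from typing import List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def fit_ridge_stacker(
    base_preds: Sequence[Sequence[float]],
    observed: Sequence[float],
    alpha: float,
) -> Tuple[float, List[float]]:
    """Fit ridge weights that combine base-model predictions.

    Each row of base_preds holds one sample's predictions from the base
    models. The intercept is left unpenalized by centering the data first.
    """
    n = len(observed)
    p = len(base_preds[0])
    col_means = [sum(row[j] for row in base_preds) / n for j in range(p)]
    y_mean = sum(observed) / n
    xc = [[row[j] - col_means[j] for j in range(p)] for row in base_preds]
    yc = [y - y_mean for y in observed]
    # Normal equations of ridge regression: (X^T X + alpha I) w = X^T y.
    gram = [
        [sum(r[i] * r[j] for r in xc) + (alpha if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]
    rhs = [sum(r[i] * y for r, y in zip(xc, yc)) for i in range(p)]
    weights = solve_linear_system(gram, rhs)
    intercept = y_mean - sum(w * mu for w, mu in zip(weights, col_means))
    return intercept, weights


def predict_stacked(
    intercept: float, weights: Sequence[float], preds_row: Sequence[float]
) -> float:
    """Combine one sample's base-model predictions into a stacked estimate."""
    return intercept + sum(w * v for w, v in zip(weights, preds_row))


# Synthetic stand-ins for GBRT, XGBoost, and RF predictions (one row per hour).
base_preds = [
    [1.0, 1.2, 0.9],
    [2.1, 1.9, 2.2],
    [3.0, 3.3, 2.8],
    [1.5, 1.4, 1.7],
    [2.6, 2.4, 2.5],
]
# Synthetic "observed" EC built as a known linear mix, for checking the fit.
observed = [0.1 + 0.5 * a + 0.3 * b + 0.2 * c for a, b, c in base_preds]

intercept, weights = fit_ridge_stacker(base_preds, observed, alpha=1.0)
stacked = [predict_stacked(intercept, weights, row) for row in base_preds]
```

With `alpha=0.0` the fit reduces to ordinary least squares on the base-model outputs; a positive `alpha` shrinks the weights, which helps when the base predictions are strongly correlated with one another.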
Status: open (until 11 Jan 2025)
RC1: 'Comment on egusphere-2024-2776', Anonymous Referee #2, 20 Dec 2024
Long-term in-situ observations of black carbon aerosols are crucial for studying their environmental and climatic effects. However, in real-world observational studies, there are several inevitable technical challenges, such as data gaps. This paper proposes a machine learning method that elegantly addresses this issue. The method is applied to reconstruct time-series data of elemental carbon (EC) aerosols from four cities in eastern China. The results are also validated by comparing them with other datasets. Furthermore, the paper introduces a novel method for assessing the driving factors of long-term trends in elemental carbon, as well as evaluating the uncertainty associated with this approach. I believe both methods hold significant value for the field of atmospheric monitoring. Overall, the paper is well designed and written. However, I have the following points that the authors should address:
The authors introduce MERRA-2 black carbon column concentration data as one of the predictor variables. They also compare MERRA-2 near-surface black carbon concentrations and find that the MERRA-2 data tends to overestimate the site's elemental carbon data. I suggest that the authors conduct a sensitivity test by training the machine learning model without using MERRA-2 black carbon column concentration as a predictor variable and compare the results with the current ones.
The trend changes in EC aerosols are influenced by both meteorological conditions and emissions. In eastern China, the sources of black carbon generally include vehicle emissions and industrial coal combustion. While the paper quantifies the overall anthropogenic emission trend drivers, there is relatively little information on specific emission sectors, which may be a limitation of the method employed. The paper analyzes the daily variation of EC over the years and suggests that the reduction of motor vehicle emissions may be a major factor driving the decline in EC levels. I suggest that the authors could try to extend this analysis by investigating the trend changes of EC during vehicle emission rush hours or by quantifying the driving factors for these peak periods. This could provide a more detailed understanding of the trend changes.
The authors use the ridge regression algorithm for the multivariate regression analysis but do not employ the traditional multiple linear regression algorithm. I recommend that the authors clarify this choice. Additionally, regarding Equation 1, the expression may cause confusion because GBRTs, XGBoost, and RF are abbreviations for different machine learning algorithms, yet they are presented as variables in the formula. I suggest the authors optimize the notation for clarity.
Line 148 – 149: Appropriate references should be cited to support the use of these pollutants as tracers for source characterization.
Line 239: The phrase "Reconstruction of missing data of EC and trend analysis" should be revised to "Reconstruction of missing data of EC and comparison".
Line 336 – 337: The discussion on the impact of COVID-19 lockdowns on EC trend changes is well noted as a factual observation. Could the authors discuss this impact further or quantify it?
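To illustrate the comment on Equation 1 and on the choice of ridge regression, one possible notation is sketched here. It is only an assumption about what the equation is meant to express, and the symbols are not taken from the manuscript: $\hat{y}_{k,i}$ denotes the prediction of base model $k$ for sample $i$, $\beta_k$ the combination weights, and $\lambda$ the ridge penalty.

```latex
\hat{y}_{\mathrm{EC},i}
  = \beta_0
  + \beta_1\,\hat{y}_{\mathrm{GBRT},i}
  + \beta_2\,\hat{y}_{\mathrm{XGB},i}
  + \beta_3\,\hat{y}_{\mathrm{RF},i},
\qquad
(\beta_0,\dots,\beta_3)
  = \arg\min_{\beta}\;
    \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{k=1}^{3}\beta_k\,\hat{y}_{k,i}\Bigr)^{2}
    + \lambda \sum_{k=1}^{3}\beta_k^{2}
```

Setting $\lambda = 0$ recovers traditional multiple linear regression, so a notation of this kind would also make the difference between the two algorithms explicit.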
Citation: https://doi.org/10.5194/egusphere-2024-2776-RC1