Technical note: Reconstructing surface missing aerosol elemental carbon data in long-term series with ensemble learning

Meng, Qingxiao; Zhang, Yunjiang; Zhong, Sheng; Fang, Jie; Tang, Lili; Rao, Yongcai; Zhou, Minfeng; Qiu, Jian; Xu, Xiaofeng; Petit, Jean-Eudes; Favez, Olivier; Ge, Xinlei

https://doi.org/10.5194/egusphere-2024-2776

25 Nov 2024

Abstract. Ground-based measurements of elemental carbon (EC), which is operationally defined by thermal-optical methods and is often used as a surrogate for black carbon, are essential for assessing air quality and evaluating climate impacts. However, data gaps caused by technical challenges impede comprehensive analyses of long-term trends. This study proposes an ensemble learning method to address this problem. The model uses readily accessible ground-based air pollutant observations as proxies for EC-related tracers, together with meteorological parameters, to enhance prediction accuracy. It integrates the outputs of Gradient Boosting Regression Trees, eXtreme Gradient Boosting, and Random Forest models, combining them through ridge regression to produce robust predictions. We applied this approach to reconstruct hourly EC concentrations from 2013 to 2023 for four cities in eastern China, filling 45–79 % of missing data and improving prediction performance by 8–17 % compared with the individual models. Over the 11-year period, EC exhibited an overall decline (-0.20 to -0.14 µg m^-3 a^-1). The decline was steeper from 2013 to 2020 (-0.24 to -0.15 µg m^-3 a^-1), with EC decreasing from 3.26 µg m^-3 to 1.59 µg m^-3, and then slowed noticeably from 2020 to 2023 (-0.12 to -0.04 µg m^-3 a^-1). Additionally, a fixed emission approximation (FEA) method based on ensemble learning is proposed to quantitatively analyze the drivers of long-term EC trends. The analysis reveals that anthropogenic emission controls were the predominant contributors, accounting for approximately 92 % of the changes in EC trends from 2013 to 2020. However, their influence weakened after 2020, when they contributed approximately 80 %. These findings highlight that while China's Clean Air Actions implemented since 2013 have significantly reduced black carbon concentrations, sustained and enhanced strategies are still necessary to further mitigate black carbon pollution in the country.

Received: 04 Sep 2024 – Discussion started: 25 Nov 2024


Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

15 Jul 2025

Technical note: Reconstructing missing surface aerosol elemental carbon data in long-term series with ensemble learning

Qingxiao Meng, Yunjiang Zhang, Sheng Zhong, Jie Fang, Lili Tang, Yongcai Rao, Minfeng Zhou, Jian Qiu, Xiaofeng Xu, Jean-Eudes Petit, Olivier Favez, and Xinlei Ge

Atmos. Chem. Phys., 25, 7485–7498, https://doi.org/10.5194/acp-25-7485-2025, 2025


Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-2776', Anonymous Referee #2, 20 Dec 2024

Long-term in-situ observations of black carbon aerosols are crucial for studying their environmental and climatic effects. However, in real-world observational studies, there are several inevitable technical challenges, such as data gaps. This paper proposes a machine learning method that elegantly addresses this issue. The method is applied to reconstruct time-series data of elemental carbon (EC) aerosols from four cities in eastern China. The results are also validated by comparing them with other datasets. Furthermore, the paper introduces a novel method for assessing the driving factors of long-term trends in elemental carbon, as well as evaluating the uncertainty associated with this approach. I believe both methods hold significant value for the field of atmospheric monitoring. Overall, the paper is well designed and written. However, I have the following points that the authors should address:

The authors introduce MERRA-2 black carbon column concentration data as one of the predictor variables. They also compare MERRA-2 near-surface black carbon concentrations and find that the MERRA-2 data tends to overestimate the site's elemental carbon data. I suggest that the authors conduct a sensitivity test by training the machine learning model without using MERRA-2 black carbon column concentration as a predictor variable and compare the results with the current ones.
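
The MERRA-2 sensitivity test requested above is a drop-one-predictor comparison: fit the model with and without one predictor and compare skill on held-out data. The sketch below is illustrative only and rests on several assumptions:

- The ensemble is replaced by ordinary least squares to keep the example short.
- fit_predict and ablation are hypothetical helpers.
- The predictor names and the synthetic data are invented.

In practice, fit_predict would wrap the full GBRT/XGBoost/RF ensemble, and the split would follow the manuscript's own validation scheme.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_predict(X_train, y_train, X_test):
    """Stand-in model (assumption): ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def ablation(X, y, names, drop, split=0.7):
    """Held-out R^2 with all predictors and without the predictor named `drop`."""
    n_train = int(split * len(y))  # chronological split, no shuffling
    keep = [k for k, name in enumerate(names) if name != drop]
    full = r2(y[n_train:], fit_predict(X[:n_train], y[:n_train], X[n_train:]))
    reduced = r2(y[n_train:],
                 fit_predict(X[:n_train, keep], y[:n_train], X[n_train:, keep]))
    return full, reduced

# Synthetic example: hypothetical predictors, with "MERRA2_BCC" carrying some signal
rng = np.random.default_rng(0)
n = 500
names = ["CO", "NOx", "MERRA2_BCC", "wind"]
X = rng.normal(size=(n, 4))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * X[:, 2] - 0.4 * X[:, 3] + rng.normal(scale=0.3, size=n)
full, reduced = ablation(X, y, names, drop="MERRA2_BCC")
print(f"R2 with MERRA-2: {full:.3f}, without: {reduced:.3f}")
```

The informative quantity is how much the held-out score drops when the predictor is removed. A negligible drop would suggest that the model does not rely on that predictor.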

The trend changes in EC aerosols are influenced by both meteorological conditions and emissions. In eastern China, the sources of black carbon generally include vehicle emissions and industrial coal combustion. While the paper quantifies the overall anthropogenic drivers of the trend, it provides relatively little information on specific emission sectors, which may be a limitation of the method employed. The paper analyzes the diurnal variation of EC over the years and suggests that the reduction of motor vehicle emissions may be a major factor driving the decline in EC levels. I suggest that the authors extend this analysis by investigating the trend changes of EC during rush hours, when vehicle emissions peak, or by quantifying the driving factors for these peak periods. This could provide a more detailed understanding of the trend changes.
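
The rush-hour analysis suggested above could start from annual means computed separately for rush-hour and off-peak hours. This is a minimal sketch under stated assumptions:

- The rush-hour windows (07:00-09:59 and 17:00-19:59 local time) are assumptions, not values from the manuscript.
- annual_means_by_period is a hypothetical helper.
- The example records are invented.

```python
from collections import defaultdict
from datetime import datetime

# Assumed rush-hour windows in local time; adjust to local traffic patterns.
RUSH_HOURS = {7, 8, 9, 17, 18, 19}

def annual_means_by_period(records):
    """Average hourly EC per year, split into rush-hour and off-peak hours.

    records: iterable of (datetime, ec) pairs; ec is None for missing hours.
    Returns {year: (rush_mean, offpeak_mean)}.
    """
    # Per-year accumulators: [rush_sum, rush_count, off_sum, off_count]
    acc = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for ts, ec in records:
        if ec is None:
            continue  # skip missing hours instead of treating them as zero
        a = acc[ts.year]
        if ts.hour in RUSH_HOURS:
            a[0] += ec
            a[1] += 1
        else:
            a[2] += ec
            a[3] += 1
    out = {}
    for year in sorted(acc):
        rush_sum, rush_n, off_sum, off_n = acc[year]
        out[year] = (rush_sum / rush_n if rush_n else float("nan"),
                     off_sum / off_n if off_n else float("nan"))
    return out

records = [
    (datetime(2013, 1, 1, 8), 4.0), (datetime(2013, 1, 1, 3), 2.0),
    (datetime(2023, 1, 1, 8), 2.0), (datetime(2023, 1, 1, 3), 1.5),
    (datetime(2023, 1, 1, 18), None),
]
print(annual_means_by_period(records))  # {2013: (4.0, 2.0), 2023: (2.0, 1.5)}
```

The resulting annual rush-hour and off-peak series could then be passed separately to a trend estimator.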

The authors use the ridge regression algorithm for the multivariate regression analysis but do not employ the traditional multiple linear regression algorithm. I recommend that the authors clarify this choice. Additionally, regarding Equation 1, the expression may cause confusion because GBRTs, XGBoost, and RF are abbreviations for different machine learning algorithms, yet they are presented as variables in the formula. I suggest the authors optimize the notation for clarity.
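
A common argument for ridge regression over ordinary multiple linear regression as the stacking (meta-learner) step is collinearity. Predictions from GBRT, XGBoost and RF are usually strongly correlated, and the L2 penalty shrinks and stabilizes their combination weights. This sketch does not claim to reproduce the authors' Equation 1, and it rests on these assumptions:

- The data are synthetic.
- The penalty λ = 10 is arbitrary.
- The intercept is omitted for brevity.
- A real stacking step would use out-of-fold base-model predictions.

```python
import numpy as np

def ols_weights(P, y):
    """Unregularized least-squares weights for base-model predictions P (n x k)."""
    coef, *_ = np.linalg.lstsq(P, y, rcond=None)
    return coef

def ridge_weights(P, y, lam):
    """Closed-form ridge weights (P^T P + lam*I)^-1 P^T y, no intercept."""
    k = P.shape[1]
    return np.linalg.solve(P.T @ P + lam * np.eye(k), P.T @ y)

rng = np.random.default_rng(42)
n = 300
y = rng.gamma(shape=2.0, scale=1.0, size=n)      # hypothetical "observed EC"
shared = rng.normal(scale=0.3, size=n)           # error common to all base models
# Three nearly collinear columns standing in for GBRT, XGBoost and RF predictions
P = np.column_stack([y + shared + rng.normal(scale=0.05, size=n) for _ in range(3)])

w_ols = ols_weights(P, y)
w_ridge = ridge_weights(P, y, lam=10.0)
print("OLS weights:  ", np.round(w_ols, 2))
print("Ridge weights:", np.round(w_ridge, 2))
```

For any λ > 0 the ridge weight vector has a smaller norm than the OLS one. With λ = 0 the two solutions coincide. This shrinkage is what makes the combination less sensitive to small differences between highly correlated base models.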

Line 148 – 149: Appropriate references should be cited to support the use of these pollutants as tracers for source characterization.

Line 239: The phrase "Reconstruction of missing data of EC and trend analysis" should be revised to "Reconstruction of missing data of EC and comparison".

Lines 336–337: The discussion of the impact of COVID-19 lockdowns on EC trend changes is well noted as a factual observation. Could the authors discuss this impact further or quantify it?

Citation: https://doi.org/10.5194/egusphere-2024-2776-RC1
- AC1: 'Reply on RC1', Yunjiang Zhang, 06 Apr 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2776/egusphere-2024-2776-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-2776-AC1
RC2:
'Comment on egusphere-2024-2776', Anonymous Referee #3, 29 Jan 2025

The authors adopt an ensemble learning model that integrates three machine learning models, Gradient Boosting Regression Trees (GBT), eXtreme Gradient Boosting (XGB) and Random Forest (RF), coupled with ridge regression to generate robust predictions, in order to fill gaps in the elemental carbon (EC) data from 2013 to 2023 in the Yangtze River Delta, China. The reconstructed EC dataset is validated by intercomparison with other datasets. Lastly, ensemble learning is used to design a fixed emission approximation method to disentangle and quantify the contribution of anthropogenic drivers to EC reduction.
This work is well organized. The authors present sufficient evidence of the robust performance of the ensemble learning method. However, I am sceptical about certain results of this study, particularly those from the fixed emission approximation method. The acceptance of this manuscript is contingent upon the authors thoroughly validating those results. In addition, several parts of this manuscript require improvement. I recommend acceptance after the authors address the comments and concerns detailed below.
General comment:
After reading this manuscript, my initial impression is that the authors have a wide knowledge of machine learning. However, I have the following concerns. As you mention in Sect. 2.4.3 (Line 225), the errors increase when 2018 and 2019 are used as baseline years. 1) I am confused by the reason you provide, namely missing meteorological parameters. As far as I know, ERA5 is a continuously updated dataset and should not have missing values in 2018 and 2019. Please clarify this point. 2) If possible, try to use ground-based measurements of meteorological factors rather than ERA5. 3) Please clarify in Sect. 2.1 how you retrieved the meteorological factors from ERA5 for the four cities. 4) In principle, the choice of the baseline year is critical. The baseline year should be representative of typical conditions. If the selected year is anomalous (e.g. the huge emission reduction in a COVID year), it could lead to an overestimation or underestimation. Could you explain how you chose the baseline year?
Specific comment:
1) Line 27: Rephrase the sentence: "from 2013 to 2020 (-0.24 to -0.15 µg m⁻³ a⁻¹) from 3.26 µg m⁻³ to 1.59 µg m⁻³".
2) When narrating, maintain consistency in sentence tenses. For example, "we evaluated…" in Line 199 and "we propose…" in Line 206.
3) Line 214: "If the FEA method were…". Please double-check the whole text and use the singular and plural correctly.

Citation: https://doi.org/10.5194/egusphere-2024-2776-RC2
- AC2: 'Reply on RC2', Yunjiang Zhang, 06 Apr 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2776/egusphere-2024-2776-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-2776-AC2
RC3:
'Comment on egusphere-2024-2776', Anonymous Referee #4, 08 Feb 2025
The current manuscript aims to address the lack of continuous black carbon (BC) data in four cities by gap-filling the records, ultimately to assess the trends in this pollutant resulting from the mitigation plans enforced in China in the 2013-2023 period. The reconstruction of these measurements is conducted by means of a machine learning (ML) ensemble of techniques validated against the existing data, providing good agreement for all sites and years. Additionally, this manuscript provides a method to estimate weather and emission contributions to the reported concentrations, thereby assessing the effectiveness of the abatement actions based solely on the anthropogenic drivers of BC.
The reviewer agrees to publish this article under minor revisions.
Overall Feedback
The presented manuscript is outstanding regarding the implementation of machine learning in atmospheric aerosol studies while maintaining its final purpose: evaluating the trends of the studied pollutant as a consequence of the implemented abatement plans. This paper consists of three main blocks: i. gap filling of BC time series; ii. differentiation of the anthropogenic and meteorological drivers of BC evolution; iii. trend analysis of the outcomes of i. and ii. to evaluate China's pollution mitigation actions. The manuscript is in general very well written and structured. However, I list below certain aspects that should be addressed:
The BC and EC data used in the EL models are not clearly described, nor is the conversion from one to the other. First, please state for which cities you have EC, for which you have rBC, and for which you have both (Nanjing only, I assume). I see that Figure 1 shows a 1:1 slope for the presented sites and the Nanjing dataset, but this cannot be assumed for every site (Jeong et al., 2004; Rigler et al., 2020). Since you cannot provide a 1:1 EC-BC scatterplot for your other sites, please at least state the risk that EC and BC are not interchangeable at the remaining sites and indicate the possible consequences.

You mention in Sects. 2.2 and 2.3 the limitations of measurements and simulations and the substantial uncertainty these could propagate into the EL model. Did you consider introducing the uncertainties of both measurements and models as predictors in your EL model? If they turned out to be strong predictors, you could narrow down which instrumental errors are most problematic for your data reconstruction, and possibly improve the predictions by filtering them out.

Line 134. Provide some explanation on the advantages of the ridge regression or a reference.

Please provide a list of all the “meteorological and emission indicator variables” (Line 151) that you feed the model with.

The proportion of training data relative to reconstructed data is concerning. Even if you obtain good reconstruction metrics, I am somewhat skeptical that extrapolating the learned relationships to other years does not oversimplify the problem, especially when the years to be reconstructed precede the mitigation policies, as for Xuzhou and Zhenjiang. You could be missing significant drivers of EC that were minimized after the abatement regulations. Please reconsider how such long-term trends are evaluated for the latter two cities if you do not have any measurement or satellite information on the earlier atmospheric composition. Also, in addition to the overall long-term correlation with CO and NOx that you give for the whole period, please provide these correlations for the reconstructed periods only.
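
The subset correlations requested above can be sketched as follows. The sketch assumes an hourly boolean flag marking which EC values were reconstructed; this flag and the helper pearson_by_subset are hypothetical, not part of the manuscript. NaN pairs are dropped before each correlation is computed.

```python
import numpy as np

def pearson_by_subset(ec, proxy, is_reconstructed):
    """Pearson r between EC and a proxy (e.g. CO or NOx) for the whole series,
    observed-only hours and reconstructed-only hours; NaN pairs are dropped."""
    ec = np.asarray(ec, dtype=float)
    proxy = np.asarray(proxy, dtype=float)
    recon = np.asarray(is_reconstructed, dtype=bool)
    valid = ~np.isnan(ec) & ~np.isnan(proxy)

    def r(mask):
        m = valid & mask
        if m.sum() < 3:  # too few pairs for a meaningful correlation
            return float("nan")
        return float(np.corrcoef(ec[m], proxy[m])[0, 1])

    return {
        "all": r(np.ones_like(recon)),
        "observed": r(~recon),
        "reconstructed": r(recon),
    }

# Hypothetical toy series: first three hours observed, last three reconstructed
res = pearson_by_subset([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 11],
                        [False, False, False, True, True, True])
print(res)
```

Reporting the "reconstructed" value next to the "all" value would show whether the proxy relationship inside the gap-filled periods simply mirrors what the model learned from the observed periods.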

Figure 4. Could the comparison between EC (EL predictions), BCC (MERRA-2), and BC (TAP) mislead the interpretation of these plots, since these are not directly interchangeable variables? Please discuss the limits of their comparability.

I see the power of the FEA method to discern between meteorological and anthropogenic drivers, and I consider it a very well-conceived approach. However, I would restrict the years i to be quite close to the years j and k. Training with 2013 and predicting 2022 or 2023 might be unrealistic, since the fixed emission hypothesis is less robust over such long intervals. This is especially concerning when training is performed with reconstructed data that have no measurements to validate those years, as I mentioned two points above. I think being conservative here and acknowledging the limitations of your datasets would make your trend evaluations sturdier, especially since some readers might be rather ML-skeptical.

Please state explicitly that C_MET(i,i) is the self-prediction for year i from the model trained on year i. This can be inferred from the text, but stating it would help readers, since this nomenclature might be new to them.

In the text (lines 222-223), you provide uncertainties for these Ys. Could you please explain how you obtained those uncertainties?

About Figure 3, how do you explain that the uncertainties of your method are positive for almost all cells? If I understood properly, negative uncertainties should be as probable as positive ones, since |ΔANT_i,j| ~ |ΔANT_j,i|.

About Figure 3: the lowest uncertainties are obtained for Xuzhou, which has less measurement availability, whilst Suzhou, with higher coverage, has higher uncertainty. For me, this reinforces the idea that predicting without measurement anchors in the early Xuzhou and Zhenjiang period can lead to an oversimplification of the BC concentrations, which might be convenient for the FEA method. I find it more plausible that the model struggles for the actual measurement-based 2018-2019 baseline periods than that it does not struggle for the predicted 2013-2015 period.

Please state in the Methods section which trend estimator you use (is it Sen's slope?) and provide the significance estimator for your results. Do you use "seasonal" Sen slopes, so that the effect of seasonality is reduced?
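
For reference, the standard Theil-Sen (Sen's slope) estimator and its seasonal variant can be sketched as below. The manuscript's actual choice of estimator is exactly what the referee asks about, so this is a generic illustration, not the authors' procedure. Both functions are hypothetical helpers, and the example values are invented.

- The plain estimator takes the median of all pairwise slopes.
- The seasonal variant forms slopes only between values from the same season (e.g. the same calendar month or the same DJF/JJA block), so that the seasonal cycle does not inflate the pairwise slopes.

Significance is usually assessed with the (seasonal) Mann-Kendall test, which is not shown here.

```python
from itertools import combinations
from statistics import median

def sen_slope(t, y):
    """Theil-Sen estimator: median of all pairwise slopes (y_j - y_i) / (t_j - t_i)."""
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i, j in combinations(range(len(t)), 2)
              if t[j] != t[i]]
    return median(slopes)

def seasonal_sen_slope(years, seasons, y):
    """Seasonal Sen slope: pairwise slopes only between values of the same season,
    pooled across seasons, then the median is taken."""
    slopes = []
    for s in set(seasons):
        idx = [k for k, season in enumerate(seasons) if season == s]
        for a, b in combinations(idx, 2):
            if years[b] != years[a]:
                slopes.append((y[b] - y[a]) / (years[b] - years[a]))
    return median(slopes)

# Invented annual means (µg m^-3) and a two-season example
print(sen_slope([2013, 2014, 2015, 2016], [3.0, 2.8, 2.5, 2.4]))
print(seasonal_sen_slope([2013, 2013, 2014, 2014, 2015, 2015],
                         ["DJF", "JJA"] * 3,
                         [4.0, 2.0, 3.5, 1.9, 3.0, 1.8]))
```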

The discussion on why MERRA-2 is not properly capturing trends is very interesting (lines 244-251). Please, could you further detail which (meteorological/emissions) situations are better/worse captured by MERRA and TAP?

Table S5. Why do you think the Zhenjiang reconstruction is significantly worse than the others?

Please provide a short explanation of the meteorological normalization method by Grange et al., 2018.
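
As background, the resampling idea behind meteorological normalisation in the spirit of Grange et al. (2018) can be sketched as follows. For each observation, the meteorological predictors are replaced by values drawn at random from the whole record, the fitted model predicts, and the predictions are averaged over many draws. The trend term is held fixed throughout. To our understanding, the original method also resamples the time-of-day and seasonal predictors, so met_cols should simply list every column to be resampled. This is a simplified sketch under stated assumptions:

- predict stands in for the fitted random-forest model.
- met_normalise and the toy linear model are hypothetical.

```python
import numpy as np

def met_normalise(predict, X, met_cols, n_samples=200, seed=0):
    """Resampling-based meteorological normalisation (simplified sketch).

    predict   : callable mapping an (n, p) array to n predictions (a fitted model)
    X         : (n, p) predictor matrix, one row per hour
    met_cols  : column indices to resample; other columns (e.g. trend) stay fixed
    n_samples : number of resampling draws averaged per row
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    total = np.zeros(n)
    for _ in range(n_samples):
        Xs = X.copy()
        rows = rng.integers(0, n, size=n)  # random source rows, with replacement
        # Take all resampled columns from the same source row to keep their joint distribution
        Xs[:, met_cols] = X[np.ix_(rows, met_cols)]
        total += predict(Xs)
    return total / n_samples

def toy_model(A):
    """Hypothetical fitted model: column 0 = trend term, column 1 = meteorology."""
    return 2.0 * A[:, 0] + 3.0 * A[:, 1]

rng = np.random.default_rng(1)
n = 100
trend = np.linspace(0.0, 1.0, n)
met = rng.normal(size=n)
X = np.column_stack([trend, met])
ec_norm = met_normalise(toy_model, X, met_cols=[1], n_samples=1000)
```

In the toy case the normalised series tracks the trend term plus a constant meteorological contribution. The hour-to-hour weather signal is averaged out, which is the property the method relies on.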

Figure S5e-h. It seems that around 2013 the emission diel cycles were rather flat, whilst they become more marked in the last years. This could be because: i. meteorological influence was underestimated for those periods; ii. emission patterns/sources changed. Please discuss this variation.

In the last paragraph of your results section you explain the reductions of the anthropogenic emissions in the period of study, which is the objective of the paper. Could you also give some insights into the trend of meteorological impacts on concentrations? Do you consider that the atmospheric influence should be static over the trend, or do you expect steady changes?

Technical changes
Figure 3 is somewhat difficult to read. I understand that the FEA uncertainty shown here is the Y_i of Equations 10 and 11; please indicate this instead of “FEA uncertainty (%)” in the colourbar.

Figure 4: please adjust the transparency or the plotting order of the time series in panels a-d so that one can see at a glance when the observations actually occur.

References
Jeong, C. H., Hopke, P. K., Kim, E., & Lee, D. W. (2004). The comparison between thermal-optical transmittance elemental carbon and Aethalometer black carbon measured at multiple monitoring sites. Atmospheric Environment, 38(31), 5193-5204.
Rigler, M., Drinovec, L., Lavrič, G., Vlachou, A., Prévôt, A. S., Jaffrezo, J. L., ... & Močnik, G. (2020). The new instrument using a TC–BC (total carbon–black carbon) method for the online measurement of carbonaceous aerosols. Atmospheric Measurement Techniques, 13(8), 4333-4351.
Citation: https://doi.org/10.5194/egusphere-2024-2776-RC3
- AC3: 'Reply on RC3', Yunjiang Zhang, 06 Apr 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2776/egusphere-2024-2776-AC3-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-2776-AC3

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-2776', Anonymous Referee #2, 20 Dec 2024

Long-term in-situ observations of black carbon aerosols are crucial for studying their environmental and climatic effects. However, in real-world observational studies, there are several inevitable technical challenges, such as data gaps. This paper proposes a machine learning method that elegantly addresses this issue. The method is applied to reconstruct time-series data of elemental carbon (EC) aerosols from four cities in eastern China. The results are also validated by comparing them with other datasets. Furthermore, the paper introduces a novel method for assessing the driving factors of long-term trends in elemental carbon, as well as evaluating the uncertainty associated with this approach. I believe both methods hold significant value for the field of atmospheric monitoring. Overall, the paper is well designed and written. However, I have the following points that the authors should address:

The authors introduce MERRA-2 black carbon column concentration data as one of the predictor variables. They also compare MERRA-2 near-surface black carbon concentrations and find that the MERRA-2 data tends to overestimate the site's elemental carbon data. I suggest that the authors conduct a sensitivity test by training the machine learning model without using MERRA-2 black carbon column concentration as a predictor variable and compare the results with the current ones.

The trend changes in EC aerosols are influenced by both meteorological conditions and emissions. In eastern China, the sources of black carbon generally include vehicle emissions and industrial coal combustion. While the paper quantifies the overall anthropogenic emission trend drivers, there is relatively little information on specific emission sectors, which may be a limitation of the method employed. The paper analyzes the daily variation of EC over the years and suggests that the reduction of motor vehicle emissions may be a major factor driving the decline in EC levels. I suggest that the authors could try to extend this analysis by investigating the trend changes of EC during vehicle emission rush hours or by quantifying the driving factors for these peak periods. This could provide a more detailed understanding of the trend changes.

The authors use the ridge regression algorithm for the multivariate regression analysis but do not employ the traditional multiple linear regression algorithm. I recommend that the authors clarify this choice. Additionally, regarding Equation 1, the expression may cause confusion because GBRTs, XGBoost, and RF are abbreviations for different machine learning algorithms, yet they are presented as variables in the formula. I suggest the authors optimize the notation for clarity.

Line 148 – 149: Appropriate references should be cited to support the use of these pollutants as tracers for source characterization.

Line 239: The phrase "Reconstruction of missing data of EC and trend analysis" should be revised to "Reconstruction of missing data of EC and comparison".

Line 336 – 337: The discussion on the impact of COVID-19 lockdowns on EC trend changes is well noted as a factual observation. Could the authors further discussion or quantify such impact?

Citation: https://doi.org/10.5194/egusphere-2024-2776-RC1
- AC1: 'Reply on RC1', Yunjiang Zhang, 06 Apr 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2776/egusphere-2024-2776-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-2776-AC1
RC2:
'Comment on egusphere-2024-2776', Anonymous Referee #3, 29 Jan 2025

The authors adopt one ensemble learning model by integrating three Machine Learning models, including Gradient Boosting Regression Trees (GBT), eXtreme Gradient Boosting (XGB) and Random Forest (RF), coupled with ridge regression to generate robust predictions, to fill the gap of the element carbon (EC) data from 2013 to 2023 in Yangtze River Delta, China. The reconstructed EC dataset is valid by the intercomparison of EC with other datasets. Lastly, ensemble learning was used to design a fixed emission approximation method to disentangle and quantify the contribution of anthropogenic drivers to EC reduction.
This work is well organized. The authors present sufficient evidence to prove their robust and good performance in terms of the ensemble learning method. However, I’m sceptical about certain results of this study, particularly on the fixed emission approximation method. The acceptance of this manuscript is contingent upon the authors thoroughly validating those results. In addition, several places in this manuscript require an improvement. I recommend the acceptance after the authors address the comments and concerns detailed below.
General comment:
After reading this manuscript, my initial impression is that the authors have a wide knowledge of Machine Learning. However, I have some concerns as follows: As you mentioned in the 2.4.3 section (Line: 225): the errors increase when 2018 and 2019 are used as baseline years. 1) I am confused by the reason you provided, which is due to the missing meteorological parameters. As far as I know, ERA5 is a continuously updated dataset. It should not have missing values in 2018 and 2019. Please clarify this point. 2) If possible, try to use the ground-based measurements of meteorological factors rather than ERA5; 3) Please clarify how you retrieved the meteorological factors from the ERA5 in four cities in the 2.1 section. 4) In principle, the choice of the baseline year is critical. Basically, the baseline year is representative of typical conditions. If the selected year is an anomaly (e.g. huge emission reduction in COVID year), it could lead to an overestimation/underestimation. Could you explain how you chose the baseline year?
Specific comment:
1）Line 27: Rephase the sentence: from 2013 to 2020 (-0.24 to -0.15 µg m⁻³ a⁻¹) from 3.26 µg m⁻³ to 1.59 µg m⁻³
2) When narrating, maintain consistency in sentence tenses. For example, we evaluated… in Line 199 and we propose… in Line 206
3）Line 214: If the FEA method were…. Please double-check the whole text and use the singular and plural correctly.

Citation: https://doi.org/10.5194/egusphere-2024-2776-RC2
- AC2: 'Reply on RC2', Yunjiang Zhang, 06 Apr 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2776/egusphere-2024-2776-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-2776-AC2
RC3:
'Comment on egusphere-2024-2776', Anonymous Referee #4, 08 Feb 2025
The current manuscript aims to address the lack of continuous data for 4 cities gap-filled black carbon (BC) data to ultimately assess the trends in this pollutant as a result of the mitigation plans enforced in China in the 2013-2023 period. The reconstruction of these measurements is conducted by means of a machine learning (ML) ensemble of techniques validated upon the existent data, providing good agreement for all sites and years. Additionally, this manuscript provides a method to estimate weather and emission contributions to the reported concentrations, hence tackling the assessment of the effectivity of the abatement actions based solely on the anthropic drivers of BC.
The reviewer agrees to publish this article under minor revisions.
Overall Feedback
The presented manuscript is outstanding regarding the implementation of machine learning in atmospheric aerosol studies while maintaining the final purpose of it, evaluating the trends of the studied pollutant as a consequence of the implemented abatement plans. This paper consists on three main blocks: i. Gap filling of BC time series; ii. Differentiation of the anthropogenic and meteorological drivers of BC evolution; iii. Trend analysis of the outcoming i., ii., outcomes to evaluate China’s pollution mitigation actions. The manuscript is in general very well-written and structured. However, I list below certain aspects which should be addressed:
The BC, EC data used in the EL models are not clearly described, neither the conversion from one to the other. Firstly, please, state for which cities you have EC, for which you have rBC, and for which you have both (Nanjing only, I assume). I see how Figure 1 shows a 1:1 slope for the presented sites and the Nanjing dataset, but it can not be like this for every site (Jeong et al., 2004, Rigler et al., 2020). Since you cannot provide a 1:1 EC-BC scatterplot for your other sites, please at least state the risk of EC, BC not being interchangeable in the rest of your sites and indicate possible consequences of that.

You mention in 2.2, 2.3 the limitations of measurements and simulations and the substantial uncertainty these could drag to the EL model. Did you consider introducing uncertainties of both measurements and models as predictors in your EL? In case they became a strong predictor, you could narrow down which instrumental errors are more problematic for your data reconstruction, and maybe you could improve the predictions if filtering them out.

Line 134. Provide some explanation on the advantages of the ridge regression or a reference.

Please provide a list of all the “meteorological and emission indicator variables” (Line 151) that you feed the model with.

The proportion of data trained vs. reconstructed is concerning. Even if you get good reconstructive metrics, I feel a bit skeptical on how extrapolating these predictions learned to other years can be an oversimplification, especially if the years to be reconstructed are anterior to the mitigation policies, as for Xuzhou, Zhenjiang. You could be missing actual significant drivers of EC that were minimized after the abatement regulations. Please, consider evaluating such long-term trends for these two last cities if you don’t have any measurements/satellite information about the previous atmospheric composition. Also, provide the correlation with CO, NOx you gave for the whole period only in the reconstructed periods in addition to the overall long-term correlation.

Figure 4. Could be the comparison between EC (EL predictions), BCC (MERRA-2), and BC (TAP) misleading the interpretation of the plots here since these are not directly exchangeable variables? Please discuss the limits of the comparability.

I see the FEA method power to discern between meteorological and anthropic emissions, I consider this is a very well-conceived approach. However, I would restrict the is to be quite near the js, ks. Training with 2013 and predicting 2022, 2023 might be unrealistic, since the validity of the fixed emission hypothesis is less robust. This is specially concerning when training is performed with the reconstructed data with no measurements to validate these years, as I mentioned two points ago. I think being conservative here and acknowledging the limitations of your datasets would make your trend evaluations more sturdy, especially since some readers might be rather ML-skeptic.

Please indicate explicitly that C_MET(i,i) is the self-prediction for the year i based on the training _i. This can be understood from the text but stating it would help the reader to understand more easily since these nomenclatures might be new for them.

In the text (lines 222-223), you provide uncertainties for these Ys. Can you please explain how did you get those uncertainties?

About Figure 3, how do you explain that the uncertainties of your methods are for almost all cells positive? If I understood properly, the negative uncertainties should be as probable as the positive ones, since |ΔANT_i,j| ~ |ΔANT_j,i|.

About Figure 3, the fact that the lower uncertainties you get are from Xuzhouu, with less measurements availability, whilts Suzhou, with higher coverage has higher uncertainty. This, for me, is reinforcing the idea that predicting over no measurement-anchors in the Xuzhou, Zhenjiang early period can lead to an oversimplification of the BC concentrations which might be comfortable for the FEA method. I find more normal that the model struggles for the actual measurement-based 2018-2019 baseline periods than that it doesn’t for the predicted 2013-2015.

Please provide the trend-estimator method you use in the methods section (is it Senn’s slope) and provide the significance estimator of your results. Do you use “seasonal” Senn slopes, so that their effect is less taken into account?

The discussion on why MERRA-2 is not properly capturing trends is very interesting (lines 244-251). Please, could you further detail which (meteorological/emissions) situations are better/worse captured by MERRA and TAP?

Table S5. Why do you think the Zhenjiang city reconstruction is significantly worse than the others.

Please, provide a short explanation on the meteorological normalization method by Grange et al., 2018.

Figure S5e-h: around 2013, the emission-driven diel profiles were rather flat, whilst they become more marked in recent years. This could be because (i) the meteorological influence was underestimated for those periods, or (ii) emission patterns or sources changed. Please discuss this change.
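A simple way to quantify the flattening or sharpening of the diel cycle is the relative peak-to-trough amplitude of the mean hour-of-day profile for each year; dividing by the profile mean separates a change in shape from the overall decrease in concentration. Comparing this metric for the emission-driven and meteorology-driven components could help weigh hypotheses (i) and (ii). The function name and input layout below are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def relative_diel_amplitude(
    records: Iterable[Tuple[datetime, float]]
) -> Dict[int, float]:
    """(max - min) / mean of the mean hour-of-day profile, per year."""
    by_year_hour: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for ts, value in records:
        by_year_hour[(ts.year, ts.hour)].append(value)
    # Mean value for each hour of the day, grouped by year
    profiles: Dict[int, Dict[int, float]] = defaultdict(dict)
    for (year, hour), values in by_year_hour.items():
        profiles[year][hour] = mean(values)
    out: Dict[int, float] = {}
    for year in sorted(profiles):
        p = list(profiles[year].values())
        level = mean(p)
        out[year] = (max(p) - min(p)) / level if level else float("nan")
    return out
```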

In the last paragraph of your Results section you explain the reductions of anthropogenic emissions over the study period, which is the objective of the paper. Could you also give some insight into the trend of meteorological impacts on concentrations? Do you consider the atmospheric influence to be static over the study period, or do you expect steady changes?

Technical changes
Figure 3 is somewhat difficult to interpret. If I understand correctly, the FEA uncertainty shown here is the Y_i of Eqs. (10) and (11); please indicate this in the colour bar label instead of “FEA uncertainty (%)”.

Figure 4: please adjust the transparency or the plotting order of the time series in panels a-d so that one can see at a glance when observations are actually available.

References
Jeong, C. H., Hopke, P. K., Kim, E., & Lee, D. W. (2004). The comparison between thermal-optical transmittance elemental carbon and Aethalometer black carbon measured at multiple monitoring sites. Atmospheric Environment, 38(31), 5193-5204.
Rigler, M., Drinovec, L., Lavrič, G., Vlachou, A., Prévôt, A. S., Jaffrezo, J. L., ... & Močnik, G. (2020). The new instrument using a TC–BC (total carbon–black carbon) method for the online measurement of carbonaceous aerosols. Atmospheric Measurement Techniques, 13(8), 4333-4351.
Citation: https://doi.org/10.5194/egusphere-2024-2776-RC3
- AC3: 'Reply on RC3', Yunjiang Zhang, 06 Apr 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2776/egusphere-2024-2776-AC3-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-2776-AC3

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Yunjiang Zhang on behalf of the Authors (06 Apr 2025) Author's response Author's tracked changes Manuscript

ED: Publish as is (10 Apr 2025) by Dantong Liu

AR by Yunjiang Zhang on behalf of the Authors (16 Apr 2025) Manuscript

Journal article(s) based on this preprint

15 Jul 2025

Technical note: Reconstructing missing surface aerosol elemental carbon data in long-term series with ensemble learning

Qingxiao Meng, Yunjiang Zhang, Sheng Zhong, Jie Fang, Lili Tang, Yongcai Rao, Minfeng Zhou, Jian Qiu, Xiaofeng Xu, Jean-Eudes Petit, Olivier Favez, and Xinlei Ge

Atmos. Chem. Phys., 25, 7485–7498, https://doi.org/10.5194/acp-25-7485-2025, 2025


Supplement

https://doi.org/10.5194/egusphere-2024-2776-supplement


Viewed

Total article views: 432 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
332	70	30	432	60	22	33


Views and downloads (calculated since 25 Nov 2024)

Month	HTML	PDF	XML	Total
Nov 2024	65	12	4	81
Dec 2024	70	8	2	80
Jan 2025	47	12	5	64
Feb 2025	44	9	5	58
Mar 2025	8	8	0	16
Apr 2025	34	15	6	55
May 2025	22	3	2	27
Jun 2025	30	1	6	37
Jul 2025	12	2	0	14
Aug 2025	0
Sep 2025	0
Oct 2025	0


Viewed (geographical distribution)

Total article views: 417 (including HTML, PDF, and XML) Thereof 417 with geography defined and 0 with unknown origin.


Latest update: 14 Oct 2025

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

We developed a new method to reconstruct missing elemental carbon (EC) data in four Chinese cities from 2013 to 2023. Using machine learning, we accurately filled data gaps and introduced a new approach to analyze EC trends. Our findings reveal a significant decline in EC due to stricter pollution controls, though this slowed after 2020. This study provides a versatile framework for addressing data gaps and supports strategies to reduce urban air pollution and its climate impacts.

