the Creative Commons Attribution 4.0 License.
CLAQC v1.0 – Country Level Air Quality Calculator. An Empirical Modeling Approach
Abstract. The Country Level Air Quality Calculator (CLAQC) is an open-source modeling tool that utilizes national sectoral emissions and weather data to forecast monthly and annual concentrations of fine particulate matter (PM2.5) and tropospheric ozone (O3). CLAQC leverages the recent advancements in the CAMS system, employing CAMS global gridded emissions and CAMS reanalysis pollutant concentrations to improve the accuracy of its predictions. One of the notable strengths of CLAQC is its ability to provide country-specific and sectoral information. We have developed two methodological approaches, namely elastic net modeling and extreme gradient boosting regressor, that can effectively predict annual average concentrations for nearly all countries. Although both methods show good performance for the country's yearly average, the sectoral contributions are not robust enough for the elastic net models. The tool can simulate a vast range of policy scenarios and can be integrated into national policy assessment and optimization frameworks. Finally, we present a method selection framework for each country to optimize performance, and an online tool displaying model results.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-995', Anonymous Referee #1, 18 Jul 2024
General comment:
This article proposes two methodologies for assessing the impact of policy scenarios on monthly and annual pollutant concentrations (PM2.5 and O3) at country level on a global scale. The first approach is based on elastic net (EN) models and the second on machine learning (ML) models (XGBoost). Both methodologies use several input datasets, such as pollutant emissions, weather data, and concentration data, harmonized to a common 0.5° x 0.5° grid from 2003 to 2021. Results show that EN models perform well for annual total exposure, but ML models are better for evaluating the contribution of individual emission sectors.
I would like to congratulate the authors of the paper for their interesting work. The article is very well written and clearly structured. The scientific topic is of great interest and the methodologies proposed in the paper are quite innovative compared with published approaches. However, some clarifications still need to be made in the text, as highlighted in the specific comments concerning the consideration of secondary inorganic aerosols in the models and the composition of the “Other” sector of emissions for PM2.5 estimations. Some sensitivity tests could also be performed to address the impact of these features on the final estimation, as well as the effect of the proportion of the train and test sets when applying the models.
Specific comments:
Page 1, line 33: The following change should be made: “fine particulate matter (particles with a diameter less than 2.5 µm, PM2.5)”
Page 2, line 4: The following change should be made: “chemistry-transport models (CTMs) are tools for calculating the impact of emissions on pollutant concentration levels”
Page 2, lines 7 to 10: The ACT tool (Air Control Toolbox, Colette et al., 2022) should be mentioned here. ACT is a surrogate model to explore mitigation scenarios in air quality forecasts. It is designed for estimation at the European level on a daily basis.
Colette, A., Rouïl, L., Meleux, F., Lemaire, V., and Raux, B.: Air Control Toolbox (ACT_v1.0): a flexible surrogate model to explore mitigation scenarios in air quality forecasts, Geosci. Model Dev., 15, 1441–1465, https://doi.org/10.5194/gmd-15-1441-2022, 2022.
Page 2, line 19: “The latter one is the most detailed, up-to-date reduced form air pollution model”, the following clarification should be made: “on a global scale”.
Page 3, line 9: Please explain further what you mean by “factors” and the impact of the monitoring of pollutant concentrations in ambient air. Perhaps, it should be mentioned that monitoring stations are used for air quality assessment and the lack of measurement points in an area is a strong constraint in that objective.
Page 3, line 13: “Global, gridded reanalysis data combine and harmonize satellite air pollution measurements with ground-level monitors.” The following clarification should be made: CTM estimates are also used.
Page 3, lines 24 and 25: The following clarification should be made: “the need to homogenize different grids in terms of spatial resolution”
Page 4, line 24: Does off-road transportation correspond to shipping and aviation? Could you please clarify?
Page 5, line 4: Are the natural emissions included in the “Other” sector? Could you please clarify.
Page 5, line 19: How did you manage to downscale the concentrations data to 0.5°? Please explain further.
Page 5, lines 20 and 21: I don’t understand why you must change the unit here. Aren't ECMWF data already in µg/m3? Please clarify.
In addition, Figure 2.3 is called in the main text, but the figure numbering is Figure 1. In order to improve reading of the figure, the ticks should be added in the two colorbars to make the correspondence with the tick labels in panels a) and b).
Figure 2 title: The expression “weighted by the population” should be mentioned in the title.
Page 7, line 25 to 30: Not considering secondary aerosols in the model is a choice made for simplicity. Have sensitivity tests been carried out on the impact of this choice on the final estimation?
Page 9, line 13: Have sensitivity tests been carried out on the splitting of the training and test data sets? If not, these tests should be considered.
Page 9, line 13: It is not clear whether the whole period of the dataset (from 2003 to 2021) is used to train the model. Please clarify in the main text which periods the train set and the test set cover. It is mentioned that the perturbations of emissions are applied to the last 5 years of data (page 9, line 27); is the model therefore trained on 2017 to 2021?
Page 10, line 23: If the Other sector includes natural emissions, it could have a significant impact on PM2.5 concentrations (from desert dust and sea salt). That may bias the estimate of the machine learning model. This sector should be considered.
Page 11, line 7: I’m a bit confused with the consideration of the secondary inorganic aerosol’s formation in the model. It is explained in section 3.2: “It is crucial to understand that in situations where secondary reactions substantially affect the overall mass of PM2.5 within a country, our models are designed to omit these precursors from the list of predictors, thereby not reflecting a decrease in PM2.5 levels.” Please clarify if the secondary inorganic aerosols are excluded or not.
Page 11, line 29: Please clarify what you mean by: “We randomly split a gridded data set stratifying by grid cell. Hence, randomization occurs over the temporal dimension.”
Page 11, line 31: Why is the splitting of the training and test data sets different from that of the EN model? As for the EN model, were sensitivity tests on the choice of the training and test sets carried out?
Page 12, line 2: Why not to say “Emission scenarios” instead of “Stylized scenarios” in section 4.1 title?
Page 12, line 5: Why are emission perturbations ranging to +60%, when we would expect policy scenarios to necessarily seek to reduce precursor emissions?
Page 13, line 4: Move Figure 6 to the main text. The panels of this figure are very small, and it is very difficult to read the figure correctly. It would be preferable to prepare one figure per model and per pollutant to make it more readable.
Page 13, line 26: Move figures 7 and 8 to the main text.
Page 13, line 28: Could you explain why models work better for O3 than for PM2.5?
Page 13, line 42: The following clarification should be made: DACCIWA is preferred to CAMS over Africa only. (same page 14, lines 1 and 2).
Page 13, lines 3 and 4: Move figures 9 and 10 to the main text.
Page 14, line 10: What you mean by “measurement error of unknown distribution”? Could you please clarify.
Page 14, line 39: The evolution of the approach by the consideration of an ensemble of the models is a quite good perspective of work to improve the final estimate.
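One possible reading of the split queried at page 11, line 29 ("stratifying by grid cell", so that randomization occurs over the temporal dimension) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_by_cell`, the record layout `(cell_id, month, value)`, and the 20% test fraction are all assumptions made for the example.

```python
import random
from collections import defaultdict

def split_by_cell(records, test_fraction=0.2, seed=0):
    """Split (cell_id, month, value) records so that every grid cell
    appears in both sets; only the months are shuffled within a cell."""
    rng = random.Random(seed)
    by_cell = defaultdict(list)
    for rec in records:
        by_cell[rec[0]].append(rec)
    train, test = [], []
    for cell in sorted(by_cell):
        rows = by_cell[cell][:]
        rng.shuffle(rows)  # randomization over the temporal dimension
        n_test = max(1, round(len(rows) * test_fraction))
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test

# Hypothetical toy data: 3 grid cells x 10 months (values are placeholders).
records = [(cell, month, 0.0) for cell in ("A", "B", "C") for month in range(10)]
train, test = split_by_cell(records)
```

Under this reading each grid cell contributes the same share of its months to the test set, so a sensitivity test on the split would amount to varying `test_fraction` and `seed`.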
Citation: https://doi.org/10.5194/egusphere-2024-995-RC1
AC1: 'Reply on RC1', Stefania Renna, 18 Nov 2024
We sincerely thank the Reviewer for taking the time to thoroughly review our manuscript and provide relevant feedback to improve it further. In the attachment, we address their concerns and suggestions point by point. Please kindly let us know if further improvement is needed.
RC2: 'Comment on egusphere-2024-995', Anonymous Referee #2, 25 Sep 2024
General
The authors developed the tool CLAQC to quickly assess the impact of policy scenarios on air quality using two methods: elastic net modelling (EN) and an extreme gradient boosting regressor (ML). CLAQC can be used to attribute sectoral and country-specific emission changes to changes in PM2.5 and O3 concentrations without great computational burden. It is a useful tool for policy makers and other stakeholders. The authors evaluate the performance of both models at country level and find that the model performance differs depending on the country and the model used, while generally both models are better at predicting O3 than PM2.5.
The paper is excellently written, the language is easy to understand and the paper well structured. The figures were unfortunately of low resolution and should be improved for publication. In several sections a more detailed explanation or discussion on top of the description of results would be useful.
Comments:
(Abbreviations used: PXX-LYY -> page xx, line YY; EQZ -> Equation Z)
P2-L28f Did you mean “secondary O3 formation and secondary PM formation”?
P2-L40ff The 2 sentences starting with “As new data” and “As new and better data” seem repetitive. Please consolidate these sentences.
P5-L7 Why did you choose TerraClimate over ERA5 for the majority of variables used? Is TerraClimate’s rain product more accurate than the one from ERA5? Since you’re aggregating all data to 0.5° x 0.5°, you don’t seem to make use of TerraClimate’s higher horizontal resolution.
P5-L23ff You describe how a reanalysis works again here, although you had already described it in greater detail in the introduction to section 2 (P3-L4ff). This seems repetitive.
P7-L21 Please elaborate on the general purpose of monotonic constraints for the benefit of the reader not too familiar with that kind of modelling.
EQ5 Please define β and β0.
EQ7 Should this be n_test instead of n^test? In P9-L24 "test" is a sub- not a superscript.
EQ8&9 Several parameters are not defined, e.g. λ, β, ɣi, δ, μ, ν, ξ, θ. Are the α and β the same as in EQ5?
EQ8&9 In both equations emissions are used multiple times: with emissions depending on sector & pollutant (β-term), just sector (δ-term) and just pollutant (λ-term). Please clarify what the purpose of the multiple emission terms is. In the P11-L7ff you describe how the terms in EQ8&9 mimic secondary production, transport and dispersion. It would be useful for the reader to also understand what processes or dependencies the multiple emission terms are a proxy for.
EQ9 The emission terms in the O3 equation differ slightly from those for PM2.5. For both pollutants, there are terms depending on sector and pollutant at the same time (β-term) and terms depending on the sector only (δ-term). In EQ9, the term depending on the pollutant only (λ) is for a different pollutant (p3) than the β-term. Why do the β- and λ-terms for PM2.5 (EQ8) depend on the same pollutants, while the β- and λ-terms for O3 (EQ9) do not? Namely, the λ-term depends on one additional pollutant (SO2).
P11-L29 You say that randomisation occurs over the temporal dimension. Does that mean that the concentration fields calculated by the ML model do not depend on the previous time step (month in this case)? Is there an initialisation of the pollutant concentrations or is the assumption essentially that the ML model can estimate the pollutant concentration of the current month based on the emissions and meteorological conditions of the current month only, without knowledge of previous atmospheric conditions and pollutant concentrations?
P13-L12f The road and residential sectors are named as having the greatest impact in DEU, ITA and BRA. Is that referring to Fig 6a? I cannot see that in the figure. Italy seems to only have impact from Agriculture. The resolution of the plot is quite low so it’s hard to see details.
P13-L28ff Please discuss why the models perform so poorly in some countries. Is it due to inconsistencies in either the emission or the concentration data for that country? Are there important mechanisms occurring in these countries that are missed by the models? Are there pollutants missing from the emission data sets that are important in those countries? Does the model perform poorly because of some of the inputs, or is it something in the model that you could change to improve the performance?
P14-L1 In P13-L43f it sounds like DACCIWA is used everywhere where available, so in Africa CAMS is never used, correct? In fig 9c then, are the runs using DACCIWA actually being compared with runs using CAMS in Africa?
P14-L15 Is the CAMS reanalysis you mention here the EAC4 reanalysis you introduced in 2.3? If yes, it is confusing for the reader to refer to the same product with different names. If no, please introduce the CAMS reanalysis.
P14-L36 The sentence starting with “It is a complimentary model” is confusing:
- “A […] model to the model […] community” is repetitive. Maybe “A complimentary tool”?
- Which scenario community? The policy scenario community?
- Did you mean “providing empirically based estimates”?
P14-L39 Unless there is a second paper planned describing the CLAQC tool’s functionality, it would be useful to have a short overview of the kind of scenarios that can be run, i.e. is the 60% perturbation fixed, or can the user have some control over the scenario selection (apart from the country, model, specification, etc. used)?
P14-L44 Link is broken. Is the code embargoed until the paper is published?
Figures
Fig 1 Please include more labels for the colour scale in b).
Fig 6a In the plots for BRA, NGA, SAU and TUR there are line plots instead of filled areas for some of the sectors. Is that a plotting error or does that signify something?
Fig 7 It is difficult to see which countries are below 0.5 with a continuous colour scale. Maybe include a colour break at 0.5?
Fig 8 Similar to Fig 7, it is not possible to see the cut-off of 12 with the colour scale used.
Fig 9 “with” should be abbreviated with “w” or “w/” not “w/t”.
Fig 9 To make the best performing model “group” (EN vs ML) more obvious, you could use one hue per model group, i.e. all EN models in shades of blue and all ML models in shades of red.
There are some small inconsistencies in notation the authors may want to address, e.g.
P7-L24ff PM2.5 not subscripted for some occurrences in this paragraph.
P2-L31 O3 is cursive here but nowhere else.
L39 Remove gap between T and g for Tg.
P10 in description for TMINt and TMAXt, use °C, not degC just as elsewhere in the text
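As background to the request at P7-L21: in gradient boosting, a monotonic constraint generally forces the prediction never to move in the wrong direction when one chosen input increases and all other inputs are held fixed. The sketch below only checks that property on toy functions; it does not fit a model. The predictor names, both toy response functions, and the assumption that the constraint is non-decreasing in emissions are hypothetical and not taken from the manuscript.

```python
def is_non_decreasing(predict, base_inputs, name, values):
    """Return True if predict() never drops as input `name` steps through
    the sorted `values`, with every other input held at base_inputs."""
    previous = None
    for v in sorted(values):
        y = predict({**base_inputs, name: v})
        if previous is not None and y < previous:
            return False
        previous = y
    return True

# Hypothetical toy response surfaces, not fitted CLAQC models.
def constrained(x):
    # Concentration rises with emissions and falls with wind speed.
    return 5.0 + 0.8 * x["road_emissions"] - 0.1 * x["wind_speed"]

def unconstrained(x):
    # Concentration first falls, then rises, as emissions increase.
    return 5.0 + (x["road_emissions"] - 2.0) ** 2

base = {"road_emissions": 1.0, "wind_speed": 3.0}
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
ok_constrained = is_non_decreasing(constrained, base, "road_emissions", grid)
ok_unconstrained = is_non_decreasing(unconstrained, base, "road_emissions", grid)
```

A response like `unconstrained` would imply that raising emissions lowers concentrations over part of the range, which is the kind of behaviour such a constraint is meant to rule out in a policy scenario tool.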
Citation: https://doi.org/10.5194/egusphere-2024-995-RC2
AC2: 'Reply on RC2', Stefania Renna, 18 Nov 2024
We sincerely thank the Reviewer for taking the time to thoroughly review our manuscript and provide relevant feedback to improve it further. In the attachment, we address their concerns and suggestions point by point. Please kindly let us know if further improvement is needed.
Viewed
Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.
- HTML: 222
- PDF: 0
- XML: 0
- Total: 222
- BibTeX: 0
- EndNote: 0