the Creative Commons Attribution 4.0 License.
Air quality model assessment in city plumes of Europe and East Asia
Abstract. An air quality model ensemble is used to represent the current state-of-the-art in atmospheric modeling, composed of two global forecasts and two regional simulations. The model ensemble assessment focuses on both carbonaceous aerosols, i.e. black carbon (BC) and organic aerosol (OA), and five trace gases during two aircraft campaigns of the EMeRGe (Effect of Megacities on the Transport and Transformation of Pollutants on the Regional to Global Scales) project. These campaigns, designed with similar flight plans for Europe and Asia, along with identical instrumentation, provide a unique opportunity to evaluate air quality models with a specific focus on city plumes.
The observed concentration ranges for all pollutants are reproduced by the ensemble in the various environments sampled during the EMeRGe campaigns. The evaluation of the air quality model ensemble reveals differences between the two campaigns, with carbon monoxide (CO) better reproduced in East Asia, while other studied pollutants exhibit a better agreement in Europe. These differences may be associated with the modeling of biomass burning pollution during the EMeRGe Asian campaign. However, the modeled CO generally demonstrates good agreement with observations with a correlation coefficient (R) of ≈ 0.8. For formaldehyde (HCHO), nitrogen dioxide (NO2), ozone (O3) and BC the agreement is moderate (with R ranging from 0.5 to 0.7), while for OA and SO2 the agreement is weak (with R ranging from 0.2 to 0.3).
The modeled wind speed shows very good agreement (R ≈ 0.9). This supports the use of modeled pollutant transport to identify flight legs associated with pollution originating from major population centers targeted among different flight plans. City plumes are identified using a methodology based on numerical tracer experiments, where tracers are emitted from city centers. This approach robustly localizes the different city plumes in both time and space, even after traveling several hundred kilometers. Focusing on city plumes, the fractions of high concentration are overestimated for BC, OA, HCHO, and SO2, which degrades the performance of the ensemble.
This assessment of air quality models with collocated airborne measurements provides a clear insight into the existing limitations in modeling the composition of carbonaceous aerosols and trace gases, especially in city plumes.
Status: closed
RC1: 'Comment on egusphere-2024-516', Anonymous Referee #1, 09 Apr 2024
Deroubaix et al. present an evaluation of an air quality model ensemble using two aircraft campaigns focusing on carbonaceous aerosols and multiple trace gases. They find that each member of the ensemble reproduces carbon monoxide reasonably well, whereas the correlation between observation and the ensemble for organic aerosols is weak. Overall, the performed analysis is, however, superficial and focuses, at least in their discussion, on the Pearson correlation coefficient only. It includes many statements that are either not supported by the analysis or confusing. The abstract includes conclusions which are not discussed anywhere in the manuscript. The manuscript needs extensive revision to ensure a comprehensive analysis in order to meet the quality standards of ACP. Further, the manuscript needs major improvements in its language and presentation. Thus, I cannot support the publication of the manuscript in ACP.
Major comments
Why is the analysis only performed for each individual ensemble member but not for the ensemble mean? The strength of a multi-model ensemble is to provide an estimate of the forecast uncertainty. Therefore, I would expect that the ensemble spread is discussed and evaluated with respect to the observation intercomparison. In the current version of the manuscript, the evaluation only considers a basic model intercomparison.
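To make the request above concrete, the following is a minimal Python sketch of an ensemble mean, an ensemble spread, and the fraction of observations that fall inside the ensemble envelope. It is not taken from the manuscript: the member names (`model_a` to `model_d`), the function names, and all numbers are hypothetical placeholders, and the spread is assumed here to be the population standard deviation across members at each collocated point.

```python
from statistics import mean, pstdev


def ensemble_mean_and_spread(members):
    """Return per-point ensemble mean and spread (population std across members).

    members: dict mapping a model name to a list of values that are already
    collocated with the observations (same length, same ordering).
    """
    series = list(members.values())
    n_points = len(series[0])
    if any(len(s) != n_points for s in series):
        raise ValueError("all members must be sampled at the same points")
    ens_mean = [mean(vals) for vals in zip(*series)]
    ens_spread = [pstdev(vals) for vals in zip(*series)]
    return ens_mean, ens_spread


def fraction_within_envelope(members, obs):
    """Fraction of observations lying between the lowest and highest member."""
    series = list(members.values())
    inside = sum(
        1 for o, vals in zip(obs, zip(*series)) if min(vals) <= o <= max(vals)
    )
    return inside / len(obs)


# Synthetic, illustrative values only (for example CO in ppbv along a flight leg).
members = {
    "model_a": [100.0, 120.0, 150.0, 110.0],
    "model_b": [110.0, 130.0, 140.0, 100.0],
    "model_c": [90.0, 110.0, 160.0, 120.0],
    "model_d": [100.0, 120.0, 150.0, 110.0],
}
obs = [105.0, 135.0, 155.0, 95.0]

ens_mean, ens_spread = ensemble_mean_and_spread(members)
print(ens_mean, ens_spread, fraction_within_envelope(members, obs))
```

A low fraction of observations inside the envelope would suggest that the spread underestimates the actual uncertainty, which is one way the spread could be evaluated against the measurements.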
Some of the trace gases analyzed have a strong diurnal cycle. Even though the authors discuss the impact of averaging the observations by 1, 3, and 10 minutes, the model output frequency provided for at least one model is 6 hours. I suspect that the low model output frequency (6 hours) and the performed interpolation have a much stronger effect on the evaluation. How do you justify that a 6 hour output is sufficient? Why not obtain model output at a higher frequency?
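The concern about output frequency can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the diurnal signal is a synthetic sinusoid peaking at 15:00, the interpolation is assumed to be linear in time (the discussion does not state which method was used), and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def diurnal_signal(hour):
    """Synthetic concentration with a 24 h cycle that peaks at hour 15."""
    return 50.0 + 30.0 * math.sin(2.0 * math.pi * (hour - 9.0) / 24.0)


def interpolate_linear(times, values, t):
    """Linearly interpolate values given at sorted times to the time t."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the available output times")
    i = min(bisect_right(times, t), len(times) - 1)
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)
    return (1.0 - w) * values[i - 1] + w * values[i]


# Model output assumed to be available every 6 hours.
output_hours = [0.0, 6.0, 12.0, 18.0, 24.0]
output_values = [diurnal_signal(h) for h in output_hours]

# Compare the interpolated value with the "true" signal at the afternoon peak.
flight_hour = 15.0
truth = diurnal_signal(flight_hour)
estimate = interpolate_linear(output_hours, output_values, flight_hour)
print(f"truth={truth:.1f}, interpolated={estimate:.1f}")
```

In this synthetic case the interpolated value at the peak is lower than the true value, because the 6-hourly samples bracket the maximum without resolving it. Whether the effect is comparable for the models discussed here would have to be checked with higher-frequency output.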
The authors tend to attribute differences between the models only to the different emissions used (e.g., line 417 or 435), even though the models differ in the gas phase chemical mechanism and the representation of aerosols. Further, these statements are pure guesses and are not supported by any in depth analysis. What are the differences in the emission inventories?
The motivation of the authors to use the selected two flight campaigns is based on the similarity of the flight plans. Later in the manuscript, however, the authors state that the flights differ significantly (different seasons, different time periods spent over the ocean, etc.). This is not consistent.
Why are you only focusing on wind speed? The wind direction is equally important in order to assess transport pattern. How well does each ensemble member perform with respect to wind direction?
Line 9: This statement is confusing. It sounds as if only due to the ensemble, differences in the observed concentrations are revealed.
Line 11: I cannot find any statement about biomass burning in the main manuscript, except a general statement in the introduction.
Figure 1 is unreadable.
Lines 229 to 231: This statement is confusing. I would suggest focusing on the correlation coefficient, root-mean-square error, and the standard deviation to assess the performance of each ensemble member. Here, the use of Taylor diagrams would significantly improve the value and readability of the manuscript.
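The three statistics named above are the ones a Taylor diagram combines, linked by the relation E'² = σ_m² + σ_o² − 2 σ_m σ_o R, where E' is the centered root-mean-square error. The Python sketch below computes them for one model series against observations. It is only an illustration under stated assumptions: the function name `taylor_statistics` and the sample values are hypothetical, and population (1/n) standard deviations are used so that the relation holds exactly.

```python
import math


def taylor_statistics(model, obs):
    """Return R, standard deviations, centered RMSE and mean bias for two series."""
    n = len(obs)
    if n != len(model) or n < 2:
        raise ValueError("model and obs need the same length (at least 2)")
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    dev_m = [m - mean_m for m in model]
    dev_o = [o - mean_o for o in obs]
    std_m = math.sqrt(sum(d * d for d in dev_m) / n)
    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    if std_m == 0.0 or std_o == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    r = sum(a * b for a, b in zip(dev_m, dev_o)) / (n * std_m * std_o)
    # Centered RMSE: the mean bias is removed before differencing.
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(dev_m, dev_o)) / n)
    return {
        "r": r,
        "std_model": std_m,
        "std_obs": std_o,
        "crmse": crmse,
        "bias": mean_m - mean_o,
    }


# Synthetic, illustrative values only.
obs_values = [1.0, 2.0, 4.0, 3.0, 5.0]
model_values = [1.5, 1.8, 3.0, 3.5, 4.0]
stats = taylor_statistics(model_values, obs_values)
print(stats)
```

Because the centered RMSE ignores the mean bias, the bias is returned separately; a Taylor diagram alone would not show it.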
Citation: https://doi.org/10.5194/egusphere-2024-516-RC1
RC2: 'Comment on egusphere-2024-516', Anonymous Referee #2, 30 Apr 2024
This manuscript evaluates the performance of three models in reproducing measurements from two aircraft campaigns conducted in Europe and East Asia. The focus is on carbonaceous aerosols and five trace gases, assessing whether the models can simulate these pollutants well in city plumes. They found good agreement for CO but weak agreement for organic aerosol.
This manuscript is more like a measurement-model comparison report. It does not specify the initial scientific questions it aims to address and lacks in-depth analysis. It needs to be thoroughly revised to align with the standards of ACP. Below, I outline a couple of major concerns.
- The introduction section lacks a comprehensive literature review on the current state of the models used to simulate city plumes. Which models are typically used (why did the authors choose these three models, especially the two global models, for city-plume evaluation)? How effectively have these models reproduced city plumes? What obstacles are encountered (e.g., resolution being too coarse to capture local chemical/physical conditions)? Why is evaluating these models in two distinct regions (Europe and East Asia) beneficial for a better understanding of city plume modeling? Detailed clarifications are needed to help readers grasp the scientific importance of this work. Also, these clarifications must be supported by substantial literature rather than merely instinctive judgments.
- If one of the authors’ goals is to conduct a model intercomparison, why do the authors use different emission inventories for different models (as listed in Table 1)? This setting complicates the comparison, making it hard to determine the source of the differences. I encourage the authors to use consistent emission inventories.
- An introduction detailing the differences in chemical mechanisms and aerosol schemes across the models is necessary, given that this work involves model intercomparison.
- It is necessary for the authors to provide more information about their aircraft campaign measurements: how each species is observed and what the uncertainties or detection limits of each instrument are. These uncertainties should be taken into account in the model-observation comparison.
- The paper is all about presenting simple statistical metrics of model-observation comparison, without an in-depth analysis of why these metrics appear similar or differ across species and regions. Moreover, it is unclear whether many of the metric presentations are helpful (back to my 1st major comment). For example, on Page 9 Line 165, why is CAMchem-CESM2 the only model that fails to reproduce high ozone concentrations? On Page 9 Line 185, how do the authors determine whether the better agreement found in ERA5 in terms of OA is due to better meteorological input, or if it is coincidentally correct for the wrong reasons, considering the uncertainties in OA chemistry? On Page 16 Lines 300-302, how do the authors conclude that fire emissions are the reason why CO is better reproduced in Asia? There are many other similar instances throughout the paper. If the authors do believe these issues are worth discussing, they should provide deeper analysis to support their statements.
- Regarding writing and organization, the sections presenting statistical metrics could be significantly shortened, while the interpretation of these comparisons should be expanded.
Minor comments:
- Page 6, Lines 123-124: what is the spatial representativeness of aircraft observations?
- Figures 3 and 5: a legend is needed.
Citation: https://doi.org/10.5194/egusphere-2024-516-RC2
RC3: 'Comment on egusphere-2024-516', Anonymous Referee #3, 01 May 2024
General Comments
This study compares model predictions from four models with flight data captured over Europe and Asia. A statistical analysis is performed to evaluate model-measurement agreement of BC, OA, CO, HCHO, NO2, O3, SO2, and wind speed. The paper is framed such that the model ensemble and comparison with observations will inform model development. However, the paper presents a simple listing of R-values for model-measurement comparisons, without describing properties of the models or physical systems that they aim to represent. This study will need major revisions in order to be published in ACP. Either much more analysis is required, or the paper must be reframed as a model evaluation paper. In the latter case, the paper may not be best-suited for ACP.
Specific Comments
- In line 80, the authors state that the trace gases were chosen because they are readily observable by satellites. This statement doesn’t seem to fit with the remainder of the paper, since satellite observations are not used. If satellite relevancy is a main driver of this study, then it should be mentioned in the Introduction and Conclusions.
- The authors should consider adding another column to Table 1 + another row for meteorology, so that each of the WRF simulations (FNL and ERA5) have their own columns. Since you describe 4 models in the ensemble, it would be helpful to see 4 models in the table.
- It would be easier to interpret the similarities between aerosols and gases if Figure 2b-c was part of Figure 3.
- Consider replacing Tables 2-3 with heat maps, where the axes remain the same (model vs. variable), but each square of the table is filled with a color that scales with the R-value, e.g., darker color is higher R and lighter color is lower R. This could make it easier to see differences between “good” and “bad” predictions.
- Sections 3.2.1 and 3.2.2 compare R-values (and bias and RMSE to some extent) between the Europe and Asia campaigns, but do not explain why these differences/similarities may have occurred. Could it be caused by similarities/differences in wind speed, topography, emission sources, etc? Does model configuration and design have an impact and why? Section 3.3 also comments on the ability of the models to reproduce observed concentrations but does not give possible explanations for differences. Please expand on these sections.
- The sentence spanning lines 357-359 does not make sense to me, please rephrase.
- Section 4 does not fully explain why the largest megacities use five source points of tracer gas. Does this improve plume detection (like in Figures 7-8) as opposed to only having tracer “released” from one point in the city center?
- Reiterating the same comment as above (comment #5) for Sections 4.2-4.3. Section 4.3 has a small amount of physical reasoning (e.g., increased concentrations in plumes due to higher emissions), but both of these sections could use significantly more physical insight.
- The paper focuses almost entirely on R-values. Why can this be chosen as the only (and “best”) analytic of model accuracy? Consider putting more metrics in the main text.
- Lines 417-418 state “we attribute primarily to inaccurate anthropogenic emissions rather than to the modeled chemistry or the identification of city plumes”. How did you determine this and how can you show this?
- The first sentence of the Conclusions says that this study “contributes to the improvement of air quality modeling”. However, this study applies existing models rather than making model improvements. Please rephrase.
- The last sentence of the abstract is inconsistent with what the paper presents. Because aspects of the models (chemical mechanisms, transport models, gridding, nesting, boundary conditions, etc.) and physical properties of the plumes/emissions/meteorology/chemistry/etc. are not discussed in the paper, the paper does not give a complete assessment of how the models do and do not perform well, or make suggestions on how to improve the models.
Technical Corrections
- In general, much of the phrasing is confusing and the grouping of paragraphs is unorganized. I recommend having a few more people involved in proofreading the manuscript before re-submitting.
- Line 26: “early warning of the health impacts” should be “early warning of health impacts”
- Combine the paragraph starting at line 105 with the paragraph before it.
- Remake Figure 2A so that the red P* labels do not overlap.
- Line 142: define amsl
- Table 2:
  - Missing the red WRFchem ERA5 label on the left side
  - In the caption, you don’t need “(left part)” in the last sentence
  - Replace “- N” with “: N” so that it looks less like a minus sign
- Make it clearer in the paragraph beginning at line 320 that the tracers are “released” in the model, not in actual city plumes.
Citation: https://doi.org/10.5194/egusphere-2024-516-RC3
AC1: 'Comment on egusphere-2024-516', Adrien Deroubaix, 27 Sep 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-516/egusphere-2024-516-AC1-supplement.pdf
Viewed
- HTML: 643
- PDF: 88
- XML: 115
- Total: 846
- BibTeX: 17
- EndNote: 20