Operational and Probabilistic Evaluation of AQMEII-4 Regional Scale Ozone Dry Deposition. Time to Harmonise Our LULC Masks

Kioutsioukis, Ioannis; Hogrefe, Christian; Makar, Paul A.; Alyuz, Ummugulsun; Bash, Jessy O.; Bellasio, Roberto; Bianconi, Roberto; Buttler, Tim; Clifton, Olivia E.; Cheung, Philippe; Hodzic, Alma; Kranenburg, Richard; Lupascu, Aurelia; Momoh, Kester; Perez-Camaño, Juan Luis; Pleim, John; Ryu, Young-Hee; San Jose, Robero; Schwede, Donna; Sokhi, Ranjeet; Galmarini, Stefano

doi:https://doi.org/10.5194/egusphere-2025-1091

Preprints

https://doi.org/10.5194/egusphere-2025-1091

Preprints

19 Mar 2025

| 19 Mar 2025

Operational and Probabilistic Evaluation of AQMEII-4 Regional Scale Ozone Dry Deposition. Time to Harmonise Our LULC Masks

Ioannis Kioutsioukis, Christian Hogrefe, Paul A. Makar, Ummugulsun Alyuz, Jessy O. Bash, Roberto Bellasio, Roberto Bianconi, Tim Buttler, Olivia E. Clifton, Philippe Cheung, Alma Hodzic, Richard Kranenburg, Aurelia Lupascu, Kester Momoh, Juan Luis Perez-Camaño, John Pleim, Young-Hee Ryu, Robero San Jose, Donna Schwede, Ranjeet Sokhi, and Stefano Galmarini

Abstract. We present the collective evaluation of the regional scale models that took part in the fourth edition of the Air Quality Model Evaluation International Initiative (AQMEII). The activity consists of the evaluation and intercomparison of regional scale air quality models run over North American (NA) and European (EU) domains in 2016 (NA) and 2010 (EU). The focus of the paper is ozone deposition. The collective consists in an operational evaluation (Dennis et al., 2010, namely a direct comparison of model-simulated predictions with monitoring data aiming at assessing model performance. Following the AQMEII protocol and Dennis et al. (2010), we also perform a probabilistic evaluation in the form of ensemble analyses and an introductory diagnostic evaluation. The latter, analyses the role of dry deposition in comparison with dynamic and radiative processes and land-use/land-cover types (LULC), in determining surface ozone variability. Important differences are found across deposition results when the same LULC is considered. Models use very different LULC masks, thus introducing an additional level of diversity in the model results. The study stresses that, as for other kinds of prior and problem-defining information (emissions, topography or land-water masks), the choice of a LULC mask should not be at modeller’s discretion. Furthermore, LULC should be considered as variable to be evaluated in any future model intercomparison, unless set as common input information. The differences in LULC selection can have a substantial impact on model results, making the task of evaluating deposition modules across different regional-scale models very difficult.

How to cite. Kioutsioukis, I., Hogrefe, C., Makar, P. A., Alyuz, U., Bash, J. O., Bellasio, R., Bianconi, R., Buttler, T., Clifton, O. E., Cheung, P., Hodzic, A., Kranenburg, R., Lupascu, A., Momoh, K., Perez-Camaño, J. L., Pleim, J., Ryu, Y.-H., San Jose, R., Schwede, D., Sokhi, R., and Galmarini, S.: Operational and Probabilistic Evaluation of AQMEII-4 Regional Scale Ozone Dry Deposition. Time to Harmonise Our LULC Masks, EGUsphere [preprint], https://doi.org/10.5194/egusphere-2025-1091, 2025.

Received: 07 Mar 2025 – Discussion started: 19 Mar 2025

Competing interests: At least one of the (co-)authors is a member of the editorial board of Atmospheric Chemistry and Physics. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 2161 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Preprint (2161 KB)

Supplement (878 KB)

Download & links

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

17 Oct 2025

Operational, diagnostic, and probabilistic evaluation of AQMEII-4 regional-scale ozone dry deposition: time to harmonize our LULC masks

Ioannis Kioutsioukis, Christian Hogrefe, Paul A. Makar, Ummugulsum Alyuz, Jesse O. Bash, Roberto Bellasio, Roberto Bianconi, Tim Butler, Olivia E. Clifton, Philip Cheung, Alma Hodzic, Richard Kranenburg, Aura Lupascu, Kester Momoh, Juan Luis Perez-Camaño, Jonathan Pleim, Young-Hee Ryu, Roberto San Jose, Donna Schwede, Ranjeet Sokhi, and Stefano Galmarini

Atmos. Chem. Phys., 25, 12923–12953, https://doi.org/10.5194/acp-25-12923-2025,https://doi.org/10.5194/acp-25-12923-2025, 2025

Short summary

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1091', Anonymous Referee #1, 10 May 2025

The manuscript provides a comprehensive operational evaluation of 12 regional-scale air quality models over continental North America and Europe participating in the AQMEII4 initiative, with a focused assessment on ozone deposition processes. The comparative approach across differing land-use types and model parameterizations is well-structured and effectively demonstrates the wide variability in model behavior, particularly regarding the contributions of different deposition pathways.
One of the paper’s most valuable contributions is its insight into the limitations of evaluating deposition processes solely at standard ozone monitoring sites. The authors argue that such sights are often poorly suited for deposition analysis and thus recommend the inclusion or designation of monitoring stations specifically targeting high-deposition land-use types. In addition, the paper raises a significant concern about the inconsistent LULC data across models. The call for the harmonization of LULC inputs is compelling and well supported by the evaluation presented.
However, the manuscript would benefit from clarifying some minor methodological choices. For example, a more detailed explanation of the selection criteria for the simulation years, emissions inventories used, resolution, would improve the paper.
Overall, the paper makes a strong case for methodological improvements in model inter-comparison and deposition process evaluation, and its conclusions are well-supported. I support the publication of this paper with minor edits that will further improve the quality and readability of the manuscript.

Specific Comments:
Line 23: The abstract is missing information on why evaluation of ozone deposition is important for models/air quality.
Lines 48 – 52: The introduction would benefit from a brief description of the AQMEII4 activity and importance of deposition evaluation for this study.
Line 71: LULC acronym is introduced here without being defined.
Line 80: It is not clear why the NA and EU models are chosen for the years 2016 and 2010 for the evaluation. The text should make clearer why these years are considered for the analyses.
Table 1: Missing specifics on the model resolutions. NA and EU are described as regional scale models covering general domains but the specifics on model domain and resolution could be including here and within the text as well. Adding the model versions here might be helpful as well since this distinction is mentioned later in the text (i.e., Line 187).
Lines 83-88: It is not clear in the text which emissions are using for anthropogenic, fire, etc. missing relevant citations here.
Line 135: As this section is long, it may be useful to split this section into further subsections that discuss the results from the NA models, EU models, and comparisons.
Lines 206-210: Reference?
Line 250: It might be useful to discuss the seasonal and diurnal cycles underneath a separate subsection for clarity.
Line 320 – 324: The conclusions could be more clearly stated here. This section should be broken up into multiple sentences.
Lines 383 – 386: The phrasing here is unclear and should be restructured accordingly.
Line 688 – 692: Please reword for clarity.

Technical Corrections:
Figure 1: Images are blurry. The color bar is missing units to denote differences between the numerical amounts shown. Regions (i.e., R1, R2, R3) should be defined in the figure caption.
Figure 2: The figure should have a title or color bar label indicating that RMSE is what is being shown. The image resolution is poor.
Figure 3: Same as above, labeling the figure as MB or adding a color bar label.
Figure 4/5: Same as above. Images have poor resolution.
Figure 6: Resolution is low quality. Titles for (a) and (b) would help make this figure more readable.
Figure 7/Figure 8: Image key should be placed outside of the figure for readability.
Figure 14/15: In the caption, the corresponding figure labels for wind speed, PBL height, solar radiation, and deposition velocity are missing.
Figure 17: Figure labels could be larger and /or boldened.

Citation: https://doi.org/10.5194/egusphere-2025-1091-RC1
- AC3: 'Reply on RC1', Stefano Galmarini, 09 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1091/egusphere-2025-1091-AC3-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1091-AC3
RC2:
'Comment on egusphere-2025-1091', Anonymous Referee #2, 02 Jun 2025

“Operational and Probabilistic Evaluation of AQMEII-4 Regional Scale Ozone Dry Deposition. Time to Harmonise Our LULC Masks” by Ioannis Kioutsioukis et al.

General Comments
This manuscript is part of a suite of papers that is being prepared under the umbrella of the AQMEII Phase 4 project. It presents results from a regional-level evaluation of model predictions of both ozone concentration and ozone deposition flux fields for both North America and Europe. This evaluation, which falls under Activity 1 of AQMEI?I4, includes a traditional operational evaluation but also probabilistic and diagnostic evaluations.
This manuscript shows how apparently similar grid-scale behavior between models may hide different emphases in process representation between models. This is quite striking since all of the models considered in the manuscript essentially follow the same conceptual paradigm to represent the process of gas-phase dry deposition. The manuscript also shows how interconnected different processes and model components are and how difficult it is to point to a single model shortcoming as being responsible for model errors. In the case of different versions of WRF-Chem this includes the impact of slightly different implementations of the same algorithm. The manuscript also suggests that compensating errors are present in the model predictions, which consist of some errors that have been documented in the case of dry deposition by other AQMEII4 evaluations but also others that are unknown at the present time. Lastly, it is shown that one apparently straightforward input field needed by all of the models, the classification of land-surface characteristics, is the source of considerable additional model variability.
I found this manuscript to be well-structured and reasonably well-written. It should be suitable for publication in ACP after a number of revisions. In its present form, however, it has a number of "rough edges", including some awkward language and an apparent reordering of figures before submission that led to inconsistencies in referencing them in both the text and in figure captions. To this end I have made a number of specific comments and suggestions below that I believe would improve the final version and that I hope the authors will consider.
One general comment about the figures, especially those for North America. I found it difficult to identify results for individual models. It would be helpful if a variety of line types (e.g., solid, long dash, short dash) and symbol types could be used for these figures in addition to different colors.

Specific Comments
1. In the description of the surface ozone measurement data that were used in this study in Section 2 (lines 100-110), no information is provided about any data filtering that was applied before use, including how roadside monitors were treated. For example, Solazzo et al. (2012b) noted that in the AQMEII-1 operational evaluation they only used ozone measurement data from rural receptors below an altitude of 1000 m with at least 75% annual data availability. The present manuscript also notes better performance statistics in general for the EU case vs. the NA case, but this difference might be influenced by differences in characteristics of the measurement data sets that were used for the evaluation. For example, ozone statistics are typically quite different for urban stations vs. rural stations. Were the urban-rural splits for the EU ozone data set and the NA ozone data set similar?
2. The text introducing Figures 2 to 10 at the beginning of Section 3.1 (lines 136-139) seems inconsistent and out of order. For example, Figures 2-5 show station-level RMSE and MB plots but not Figure 6, Figure 10 does not show box plots, and there is no reference to domain-level monthly and diurnal time series.
3. I found the discussion in lines 182-197 of the similarities and differences between the three WRF-Chem versions that were run for North America a bit hard to follow. First, there is a reference to Figure 5, which shows results for the European simulations. Then the three different versions of WRF-Chem that were used for North American simulations are described, but this is followed by references to "the former" and "the latter", implying that only two model versions are being discussed. Then the discussion turns to "relatively minor differences" and "larger differences" without stating differences in which quantity: model configuration choice or RMSE or MB or something else?
4. The discussion of Figure 4 (lines 198-212) states that model EU3 has the best results while EU1 and EU4 have less good results north of the Mediterranean area. To my eye, though, model EU2 had worse RMSE scores than the other three models over Germany, Poland, and Hungary, and it also has the highest median RMSE value in Fig. 6,b but EU2 is left out of the discussion. Am I misinterpreting these figures?

- very late in the paper another significant difference between NA6/NA8 and NA7 is mentioned (lines 741-743)
5. There are multiple discussions in the manuscript concerning the differences between two of the models used for the North American simulations: GEM-MACH(Base) [or NA3] and GEM-MACH(Ops) [or NA5] (see l. 177-181, l. 222-232, l. 269-278, l. 294-297, l. 411-413, l. 612-619, and l. 745-748). These discussions are somewhat hard to follow because the differences are mentioned incrementally wth wide gaps between mentions: (a) use/non-use of a canopy shading and turbulence scheme (l. 179) and different treatments of area-source emissions injection (l. 181); (b) use/non-use of a vehicle-induced turbulence scheme (l. 271); (c) full feedback vs. no feedback of meteorology on chemistry (l. 412); and (d) different treatments of seasonality of LAI (l. 747). These differences are then summed up on line 616, where it is stated that GEM-MACH(Base) uses "very different physical parameterizations than" GEM-MACH(Ops). This seems like an overstatement; isn't it more of a case that GEM-MACH(Base) uses two additional physical parameterizations compared to GEM-MACH(Ops)? And despite these differences, O3, NO, and NO2 predictions made by these two model versions have very similar scores (e.g., Figs. 2, 3, 6, 7, 9, S1, S2, S3, S5). It thus appears somewhat unbalanced that there is considerable discussion of GEM-MACH(Base) and its forest canopy and vehicle-induced turbulence (VIT) parameterizations, even though GEM-MACH(Ops), which does not employ either the canopy or VIT parameterizations, had comparable scores and was the most frequently included member of high-performing ensembles for NA (Table 2).
6. I found it striking how the observed monthly O3 profile differs in NA in the autumn between regions R1 and R2-R4 (Figure 9). This is not true in EU (Figure 10), where the observed monthly O3 profiles are similar across the four regions (and also with NA R1). The discussion of Figure 9 (lines 337-349) notes this behavior indirectly, but refers to the model overpredictions rather than the sharp reduction in observed concentration. Note too that Figure S3 is not referred to in the manuscript unlike other figures in the Supplement, but there are similarly significant regional differences in monthly NO profiles for the NA1 vs. NA2-NA4 subregions but not the EU1-EU4 subregions. And for the sentence that begins "The same behavior observed" (lines 347-348), should a parenthetical "(not shown)" be appended to this sentence since I don't think there are supporting figures provided.
7. I think this manuscript only presents quantitative analyses of ozone concentrations and ozone dry deposition flux. Can lines 494-495 be reworded to remove the reference in line 494 to "deposition velocity performance for NA3, NA5 here"?
8. In Section 4 the number of possible ensembles in line 472 for NA (254) and in line 498 for EU (18) don't seem correct. Based on the sum of rows of Pascal's triangle, shouldn't these numbers be 255 and 15? Also, in line 499 since we are excluding 4C1 (i.e., 4C2 + 4C3 + 4C4) it should say "Four out of the 11 combinations of ...". And for Table 3 the color scheme appears to be inconsistent with Table 2 -- shouldn't three columns (one 2nd order, one 3rd order, and the 4th order) be colored orange?
9. Would it be possible to recheck Eqns. 2b and 3 in Section 5.1? Just quickly looking at Eqns. 1 to 3 as a whole, my sense was that f/2 should be e/2 in Eqn. 2b and [a] should be [c] in Eqn. 3, but I stand to be corrected. And to help the reader, line 565 could be expanded to state something like "where the shared fraction of variation explained by X1 and X2 is (similarly for e and f)".
10. Section 5 has two parts, a subsection on ozone deposition flux variability and a subsection on ozone concentration variability. However, the text frequently refers just to "ozone flux" and "ozone variability". Using more exact terminology would be helpful to the reader in Sections 5 and 6. A similar comment applies to the use of "deposition" instead of "dry deposition" throughout the manuscript.
11. I have two minor concerns about the discussion of Figure 12 on pages 20-21. First, the comment that "the picture changes completely from NA" seems too strong to me, especially since the discussion that follows is much more nuanced. My sense is that the WRF-Chem results in Figure 12 (EU1, EU2) are in fact broadly similar to those in Figure 11 (NA6-NA8). The one big qualitative difference is the very small contribution from stomatal effective flux in the three winter months for EU1 and EU2 (which is different from NA6). My second minor concern is about terminology. The term "meteorological driver" generally suggests a numerical weather prediction model like WRF whose inputs are fed to a chemical transport model. Wouldn't "driving meteorology" or just "meteorology" be better here? I do wonder too whether the implementation of the seasonal dependence of stomatal conductance for these European simulations isn't also a contributing factor (cf. line 213), especially when compared to the NA6-NA8 results in Figure 11?
12. Following the discussion of Figure 17 in lines 789-800, there is no discussion about the corresponding LULC statistics for Europe that are presented in Figure S10. Moreover, Figure S10 is incomplete -- either the top or bottom panel is missing, and it would also be helpful to have panel labels similar to Figure 17. In addition, the Abstract mentions an "introductory diagnostic evaluation" but Section 5 does not mention this aspect.
13. One important finding noted on page 27 of the Conclusions (line 848) is that the predominant LULC types at ozone receptor locations are {\vf LULC types for which deposition is relatively low}". This is a second factor to explain the similarities in VP of ozone concentration variability shown in Figures 15b and 16, but this finding does not appear to be mentioned in Section 5.2 in the discussion of Figures 17 and S10.
14. One important finding noted by the authors in Section 3.1 (lines 278-294) that in my opinion represents an important conclusion for the entire AQMEII4 project is not mentioned at all in the Conclusion section. This is that the two models whose ozone concentration predictions for North America were found by the operational evaluation to have the highest skill were previously found by Activity 2 of AQMEII4 to have dry deposition modules with larger errors than other models, thus suggesting that their grid-scale performance was almost certainly due to error compensation between now-known errors in their dry deposition module and compensating errors in one or more process representations and that the other models applied for North America likely also had these same unknown errors, which resulted in their less good operational performance for ozone concentration due to their smaller errors for representation of dry deposition.
15. The manuscript title mentions both operational and probabilistic evaluations, the Abstract mentions operational, probabilistic, and diagnostic evaluations, but the Conclusions section only mentions the operational evaluation directly. Perhaps findings for the other two evaluation types could be added in the Conclusions section and "deposition-focused model evaluation" could be referred to in this section as a type of diagnostic evaluation.
16. The References section needs some attention:

- Hogrefe et al. (2023) is not cited in the manuscript but there is a reference;

- Kioutsioukis et al. (2014) is cited three times in the manuscript but there is no matching reference; however, Kioutsioukis and Galmarini (2014) is not cited at all but there is a reference;

- Solazzo et al. (2015) is cited two times in the manuscript but there is no matching reference; however, Solazzo et al. (2013) is not cited at all but there are two references for Solazzo et al. (2013);

- The Makar et al. (2024) reference should be updated to Makar et al. (2025) (https://acp.copernicus.org/articles/25/3049/2025/);

- Would the Campbell et al. (2022) journal publication be a better reference than the Campbell et al. (2021) conference presentation?

[Campbell, P. C., Tang, Y., Lee, P., Baker, B., Tong, D., Saylor, R., Stein, A., Huang, J., Huang, H.-C., Strobach, E., McQueen, J., Pan, L., Stajner, I., Sims, J., Tirado-Delgado, J., Jung, Y., Yang, F., Spero, T. L., and Gilliam, R. C.: Development and evaluation of an advanced National Air Quality Forecasting Capability using the NOAA Global Forecast System version 16, Geosci. Model Dev., 15, 3281–3313, https://doi.org/10.5194/gmd-15-3281-2022, 2022.]
17. The "Contributions" section (lines 880-886) also needs some attention. First, in what ways did KM and JP contribute? Second, should "WRF-Chem(IASS)" be "WRF-Chem(RIFS)" and should YHC be YHR? Perhaps you could also refer here to "WRF/CMAQ-M3Dry" and "WRF/CMAQ-STAGE" to be consistent with Table 1.

Technical and Editorial Corrections/Suggestions
p. 1, l. 27 \ \ Perhaps "The collective evaluation begins with an operational evaluation, namely a direct comparison of model-simulated predictions with monitoring data aiming at assessing model performance (Dennis et al., 2010)."
p. 1, l. 37 \ \ "... as a variable to be ..."
p. 2, l. 50, 52 \ \ Perhaps "dry deposition process modelling" and "standalone dry deposition modules" -- Galmarini et al. (2021) mentions wet deposition as a focus of AQMEII4 but that doesn't seem relevant for this manuscript.
p. 2, l. 63 \ \ Perhaps "The operational evaluation also provides important context ..."
p. 2, l. 72 \ \ Perhaps "... of modelled ozone dry deposition fluxes and velocities can be found ..."
p. 3, l. 108 \ \ Perhaps "For the European case the monitoring network databases employed included:"
p. 4, l. 117 \ \ Perhaps "... and the yearly average measured ozone at these sites"
p. 4, l. 123 \ \ Re "greater activity density" - I am not sure what is meant here by activity density
p. 4, l. 127 \ \ "... less detail, and ..."
p. 5, l. 147 \ \ Change "from the Figure 2-5" to "from Figure 2"
p. 5, l. 158 \ \ Delete "(see also Figure 3" -- superfluous?
p. 5, l. 168-173 \ \ Very long sentence?
p. 7, l. 224 \ \ Wang et al. (2025)?
p. 7, l. 233 \ \ Normalized mean bias?
p. 7, l. 235 \ \ Instead of "notice", would "note" be more appropriate? Same comment for lines 582, 646, 726, and 801.
p. 8, l. 238 \ \ Perhaps "... have ozone bias values closest to zero, followed ..."
p. 8, l. 240 \ \ From inspection of Fig. S1 I see three models with medium NRMSE values and only one with a high value.
p. 8, l. 250 \ \ Perhaps "... observed and modelled seasonal and diurnal cycles for North America for ozone, NO and NO2"
p. 9, l. 277 \ \ Perhaps "transported upwards away from the model surface"
p. 10, l. 314-315 \ \ The fact that models with smallest NO and NO2 biases do quite well for NO and NO2 shouldn't be surprising.
p. 11, l. 334 \ \ "... confidence in mobile and stack emissions, which ..."
p. 11, l. 338 \ \ I think reference here should be to Figures 9 and 10 rather than Figures 5 and 6.
p. 11, l. 340 \ \ "regions"
p. 11, l. 351-352 \ \ It is probably obvious from manuscript context but why not insert the word "dry" here, as in "ozone dry deposition fluxes"?
p. 11, l. 355 \ \ Perhaps "is not only due to these resistances but also"
p. 12, l. 368 \ \ "in which"
p. 12, l. 374-375 \ \ How many EU grid cells for "Deciduous Broadleaf Forest", and which continent for "Mixed Forest" and "Urban" values and what about the other continent?
p. 12, l. 380 \ \ Are 6130 cells or 6108 cells really "very few"? Perhaps "relatively few".
p. 13, l. 404-406 \ \ Double negative: keep "not" and change "neither ... nor" to "either ... or"
p. 13, l. 417 \ \ It is not clear to me what the "rule of model differences" is.
p. 14, l. 426-427 \ \ "but sometimes there is contributing seasonality in non-stomatal flux" -- awkward wording
p. 18, l. 561 \ \ "in this case ozone deposition flux"
p. 19, l. 590 \ \ Perhaps "... show that different models have very different ..."
p. 19, l. 596 \ \ Perhaps "... ozone flux variability is more equally distributed across ..."
p. 20, l. 619 \ \ Is "vehicles on highways" strictly true? From a quick perusal of Makar et al. (2021) it appears that the parameterization is based on VKT on roadways in general, not just highways.
p. 20, l. 629 \ \ "WRF-Chem"
p. 21, l. 638 \ \ Change "S8-S10" to "S7-S9"
p. 21, l. 645 \ \ Change "9" to "13"
p. 21, l. 649-650 \ \ Change "8" to "12"
p. 21, l. 650, 788 \ \ Perhaps change "back-to-back to" to "side-by-side with"
p. 21, l. 653 \ \ Perhaps "... stomatal flux, though disagreeing on the exact ..."
p. 24, l. 731 \ \ "solar radiation"
p. 24, l. 736 \ \ "wind speed"
p. 24, l. 737 \ \ "... though contributing on average 30% of the resolved varibility"?
p. 24, l. 738-739 \ \ Why this particular order of models: NA2, NA6, NA7, NA4, NA1?
p. 25, l. 764-765 \ \ "emissions variability" would be better
p. 25, l. 762, 767 \ \ "Interesting is the ..." -- awkward wording
p. 25, l. 778-779 \ \ Change "12" to "16" and "11b" to "15b"
p. 25, l. 782-783 \ \ "NA1, NA2, NA3, and NA5" would be more consistent with rest of manuscript. Should NA8 be included in this list?
p. 25, l. 784-785 \ \ "... contributor to ozone concentration variability at receptor locations"
p. 25, l. 787 \ \ Change "12b" to "15b"
p. 26, l. 802-804 \ \ To support this statement, could the following addition be made: "..., in conditions of uniform LU characteristics and dominance of urban and Planted/Cultivated LULC types, as shown in Figures 15b and 16 the models tend to produce comparable results in terms of contributors to ozone variability"
p. 26, l. 817-818 \ \ "... performance of dry deposition schemes ...", "Ozone dry deposition, in particular, ..."
p. 27, l. 821 \ \ One EU model has NRMSE for NO2 of 35% (Fig. S1)
p. 27, l. 834 \ \ "over North America"
p. 32, l. 988 \ \ For Hogrefe et al. (2025) reference, change "2005" to "2025".
p. 34, l. 1056 \ \ Separate these two references.
p. 38, l. 1131 \ \ Should "orange" be "yellow"?
p. 50, l. 1192 \ \ "columns"
p. 52, l. 1210 \ \ "Same as Fig. 11 but at the locations of ..."?
p. 52, l. 1196-1197 \ \ Is this sentence necessary?
p. 55, l. 1226 \ \ Should be "Same as Fig. 14 but at ..."
Figs. 1-4 captions \ \ Add units of MB and RMSE to captions.
Figs. 7-8 captions \ \ Should note that time units are local time.
Fig. 9 vs. Fig. 10 \ \ Labels of former are smaller and harder to read
Fig. S1 caption \\\ "Figure S1: Soccer plot diagrams of O3, NO2, NO2 and NO." would better reflect the figure layout.
Fig. S3 \ \ There seem to be two NO panels for the NA case.
Figs. S3 and S4 \ \ Could the line thicknesses in the legends be increased so that the line color is easier to see?
Figs. S5 and S6 \ \ What are units of O3FLX? Perhaps "Monthly ozone dry deposition flux". The resolution of these figures is too low -- it is very difficult to see the line colors of the legend.

Citation: https://doi.org/10.5194/egusphere-2025-1091-RC2
- AC2: 'Reply on RC2', Stefano Galmarini, 09 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1091/egusphere-2025-1091-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1091-AC2
RC3:
'Comment on egusphere-2025-1091', Anonymous Referee #3, 12 Jun 2025

This manuscript describes an evaluation of the ozone dry deposition schemes used in regional-scale air quality models. This activity was conducted by an international team as a component of the AQMEII project. The participants ran 12 regional models on North American and European domains for several different years. They present the results of model comparisons with ozone observations and present insights into the underlying physical processes and how they are represented in the models, including multiple model configurations. The models were characterized by their performance and the differences attributed to specific processes such as dry deposition modules, meteorological drivers, and model configurations. The findings are the basis for recommendations to harmonize LULC data and establish LULC-specific monitoring sites to improve air quality models. These are important points and the manuscript is appropriate for this journal. I recommend it be published after considering the following comments:
This authors indicate that the paper is focused on an evaluation of model estimates of ozone deposition but the main activity is comparing ozone concentrations (not deposition) and considering the various controlling processes. However, the paper is part of the broader AQMEII effort that includes companion papers that cover related model components, such as Clifton et al. 2023 that already describe model evaluations with direct measurements of ozone deposition and other papers on other trace gases, etc. so this manuscript ties this to model ozone concentration estimates. It would be helpful if this were explained early in the manuscript along with a summary of the results of the other papers and how it relates to this manuscript.
The paper mentions differences in gas-phase mechanisms but it is not clear how they interact with and influence deposition processes. A more in-depth discussion on the importance of these chemical scheme differences would be useful.
A major finding of this manuscript is that LULC is important. The authors point out the model implications of this (that LULC data should be accurate and consistent for all models) but they do not discuss the implications regarding the importance of different LULC (e.g., urban green spaces) for air pollution control. Recent papers suggest that urban green spaces may not be an efficient abatement measure for air pollution (e.g, Venter et al. 2024, doi.org/10.1073/pnas.230620012). The authors should consider whether their results provide any insights on this.
The manuscript emphasizes the importance of having the correct LULC but doesn’t consider whether any of the current LULC schemes are adequate for characterizing ozone deposition. For example, are the ozone uptake capabilities of all evergreen needleleaf trees the same? If there is significant variability within a given LULC type, do these LULC schemes need to be modified to represent these differences?
The authors focus on evergreen needleleaf forests and briefly present results for other LULC types shown in the supplement. It would be useful to have more discussion of these other LULC types to show their differences/similarities. Even though there are fewer representative sites, it could still show the importance of differences in LULC types such as the range of ozone uptake capabilities.
I recognize that different ozone units (ppb, ug/m3) are typically used in Europe and North America but for this exercise it would be better to be consistent and just use one. At least explain the rational if you don’t want to do this.
Why were those specific years chosen and why are they different in NA and Europe?
The criteria for "optimal" ensembles are based on minimizing RMSE, which does not capture all aspects of model skill, especially the ability to reproduce the maximum values that are a concern for air quality managers. There should be some discussion of the implications of this.
Some figures, such as Figure 6, are difficult to see. Others, especially those displaying multiple model results (e.g., Figure 11), are challenging to interpret due to the amount of information. I appreciate the attempt to get all the information in one figure but perhaps clearer differentiation could enhance readability.
Why do forest canopy shading effects increase NOx? (see Line 270)

Citation: https://doi.org/10.5194/egusphere-2025-1091-RC3
- AC1: 'Reply on RC3', Stefano Galmarini, 09 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1091/egusphere-2025-1091-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1091-AC1
RC4:
'Comment on egusphere-2025-1091', Anonymous Referee #4, 16 Jun 2025

This manuscript presents a comprehensive operational and probabilistic evaluation of regional-scale air quality models participating in the AQMEII Phase 4 project. This work studies ozone dry deposition while focusing on the continents of North America and Europe. This study analyzes model ensemble performance, depositional flux pathways, and highlights significant variability stemming from inconsistencies in LULC masks. The authors emphasize the need to harmonize LULC inputs for comparability and accuracy in model intercomparisons. This manuscript is well structured, and it offers operational evaluation using techniques like RMSE, MB, ensemble skill assessment, and pathway-level flux decomposition by variation partitioning.

This is a well-written, impactful, and methodologically robust work, and it presents novel insights into the effect of land surface characterization on ozone deposition. This work can be accepted with minor revision, clarifying several points mentioned in the detailed comments below:

Detailed comments

While the manuscript presents a detailed and robust analysis highlighting the inconsistencies in the LULC masks and their implications for ozone dry deposition modeling, it has become extremely long and dense. Several sections, particularly operational evaluation, are notably verbose, with extensive model-by-model commentary that could be streamlined. I recommend that authors condense or relocate some of these detailed discussions to the supplements, especially where the text reiterates statistical patterns already evident in the figures. Additionally, the manuscript would benefit from clearer synthesis or summary paragraphs at the end of each major section to reinforce the key takeaways and guide the reader through the analysis.

Line 83-88: What are the emission inventories used for lightning NOx, forest fires, biogenic emissions, and other natural sources? Please provide a brief discussion on the internal model processing of these emissions.

Line 89-90: Please provide a reasoning behind choosing the spatial and temporal domains for this study. What are the spatial and temporal resolutions used in the model runs?

Line 136-139: Ozone values in the figures are reported in ppb over North America, while in µg/m3 over Europe. The use of these two different units for ozone concentration complicates the direct comparison. While the authors attempted to align the color scales, I recommend standardizing the units across both regions for the convenience of readers and easier comparisons.

Line 161: Figure 4 presents results for Europe, which is incorrectly referred to as showing results for North America. I recommend that authors carefully review the entire manuscript to identify and correct such figure and section reference inconsistencies. While minor, these errors can significantly impact the clarity and interpretation of the results and may confuse readers.

Line 190-194: The authors hypothesize that the relatively minor difference between WRF-Chem (UPM) and WRF-Chem (UCAR) is primarily due to the difference in gas-phase chemistry mechanisms. However, this claim is made without presenting supporting analysis or citing a reference that explicitly evaluated the gas-phase chemical mechanisms used in these two configurations. I recommend that the authors provide a reference to a past work supporting this hypothesis or rephrase this discussion as a hypothesis.

Line 308-311: What are the factors driving the underestimation of wintertime NOx?

Line 390-393: Are there any implications of combining LCAN and soil for models that distinguish these two terms?

Line 529-534: The authors identify factors that are expected to be relevant in the determination of ozone concentration variability at the surface but fail to discuss the methodology used in identifying these factors. I recommend that authors briefly discuss the methods used to identify these factors, either in the main text or the supplement.

Citation: https://doi.org/10.5194/egusphere-2025-1091-RC4
- AC4: 'Reply on RC4', Stefano Galmarini, 09 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1091/egusphere-2025-1091-AC4-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1091-AC4

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1091', Anonymous Referee #1, 10 May 2025

The manuscript provides a comprehensive operational evaluation of 12 regional-scale air quality models over continental North America and Europe participating in the AQMEII4 initiative, with a focused assessment on ozone deposition processes. The comparative approach across differing land-use types and model parameterizations is well-structured and effectively demonstrates the wide variability in model behavior, particularly regarding the contributions of different deposition pathways.
One of the paper’s most valuable contributions is its insight into the limitations of evaluating deposition processes solely at standard ozone monitoring sites. The authors argue that such sights are often poorly suited for deposition analysis and thus recommend the inclusion or designation of monitoring stations specifically targeting high-deposition land-use types. In addition, the paper raises a significant concern about the inconsistent LULC data across models. The call for the harmonization of LULC inputs is compelling and well supported by the evaluation presented.
However, the manuscript would benefit from clarifying some minor methodological choices. For example, a more detailed explanation of the selection criteria for the simulation years, emissions inventories used, resolution, would improve the paper.
Overall, the paper makes a strong case for methodological improvements in model inter-comparison and deposition process evaluation, and its conclusions are well-supported. I support the publication of this paper with minor edits that will further improve the quality and readability of the manuscript.

Specific Comments:
Line 23: The abstract is missing information on why evaluation of ozone deposition is important for models/air quality.
Lines 48 – 52: The introduction would benefit from a brief description of the AQMEII4 activity and importance of deposition evaluation for this study.
Line 71: LULC acronym is introduced here without being defined.
Line 80: It is not clear why the NA and EU models are chosen for the years 2016 and 2010 for the evaluation. The text should make clearer why these years are considered for the analyses.
Table 1: Missing specifics on the model resolutions. NA and EU are described as regional scale models covering general domains but the specifics on model domain and resolution could be including here and within the text as well. Adding the model versions here might be helpful as well since this distinction is mentioned later in the text (i.e., Line 187).
Lines 83-88: It is not clear in the text which emissions are using for anthropogenic, fire, etc. missing relevant citations here.
Line 135: As this section is long, it may be useful to split this section into further subsections that discuss the results from the NA models, EU models, and comparisons.
Lines 206-210: Reference?
Line 250: It might be useful to discuss the seasonal and diurnal cycles underneath a separate subsection for clarity.
Line 320 – 324: The conclusions could be more clearly stated here. This section should be broken up into multiple sentences.
Lines 383 – 386: The phrasing here is unclear and should be restructured accordingly.
Line 688 – 692: Please reword for clarity.

Technical Corrections:
Figure 1: Images are blurry. The color bar is missing units to denote differences between the numerical amounts shown. Regions (i.e., R1, R2, R3) should be defined in the figure caption.
Figure 2: The figure should have a title or color bar label indicating that RMSE is what is being shown. The image resolution is poor.
Figure 3: Same as above, labeling the figure as MB or adding a color bar label.
Figure 4/5: Same as above. Images have poor resolution.
Figure 6: Resolution is low quality. Titles for (a) and (b) would help make this figure more readable.
Figure 7/Figure 8: Image key should be placed outside of the figure for readability.
Figure 14/15: In the caption, the corresponding figure labels for wind speed, PBL height, solar radiation, and deposition velocity are missing.
Figure 17: Figure labels could be larger and /or boldened.

Citation: https://doi.org/10.5194/egusphere-2025-1091-RC1
- AC3: 'Reply on RC1', Stefano Galmarini, 09 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1091/egusphere-2025-1091-AC3-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1091-AC3
RC2:
'Comment on egusphere-2025-1091', Anonymous Referee #2, 02 Jun 2025

“Operational and Probabilistic Evaluation of AQMEII-4 Regional Scale Ozone Dry Deposition. Time to Harmonise Our LULC Masks” by Ioannis Kioutsioukis et al.

General Comments
This manuscript is part of a suite of papers that is being prepared under the umbrella of the AQMEII Phase 4 project. It presents results from a regional-level evaluation of model predictions of both ozone concentration and ozone deposition flux fields for both North America and Europe. This evaluation, which falls under Activity 1 of AQMEI?I4, includes a traditional operational evaluation but also probabilistic and diagnostic evaluations.
This manuscript shows how apparently similar grid-scale behavior between models may hide different emphases in process representation between models. This is quite striking since all of the models considered in the manuscript essentially follow the same conceptual paradigm to represent the process of gas-phase dry deposition. The manuscript also shows how interconnected different processes and model components are and how difficult it is to point to a single model shortcoming as being responsible for model errors. In the case of different versions of WRF-Chem this includes the impact of slightly different implementations of the same algorithm. The manuscript also suggests that compensating errors are present in the model predictions, which consist of some errors that have been documented in the case of dry deposition by other AQMEII4 evaluations but also others that are unknown at the present time. Lastly, it is shown that one apparently straightforward input field needed by all of the models, the classification of land-surface characteristics, is the source of considerable additional model variability.
I found this manuscript to be well-structured and reasonably well-written. It should be suitable for publication in ACP after a number of revisions. In its present form, however, it has a number of "rough edges", including some awkward language and an apparent reordering of figures before submission that led to inconsistencies in referencing them in both the text and in figure captions. To this end I have made a number of specific comments and suggestions below that I believe would improve the final version and that I hope the authors will consider.
One general comment about the figures, especially those for North America. I found it difficult to identify results for individual models. It would be helpful if a variety of line types (e.g., solid, long dash, short dash) and symbol types could be used for these figures in addition to different colors.

Specific Comments
1. In the description of the surface ozone measurement data that were used in this study in Section 2 (lines 100-110), no information is provided about any data filtering that was applied before use, including how roadside monitors were treated. For example, Solazzo et al. (2012b) noted that in the AQMEII-1 operational evaluation they only used ozone measurement data from rural receptors below an altitude of 1000 m with at least 75% annual data availability. The present manuscript also notes better performance statistics in general for the EU case vs. the NA case, but this difference might be influenced by differences in characteristics of the measurement data sets that were used for the evaluation. For example, ozone statistics are typically quite different for urban stations vs. rural stations. Were the urban-rural splits for the EU ozone data set and the NA ozone data set similar?
2. The text introducing Figures 2 to 10 at the beginning of Section 3.1 (lines 136-139) seems inconsistent and out of order. For example, Figures 2-5 show station-level RMSE and MB plots but not Figure 6, Figure 10 does not show box plots, and there is no reference to domain-level monthly and diurnal time series.
3. I found the discussion in lines 182-197 of the similarities and differences between the three WRF-Chem versions that were run for North America a bit hard to follow. First, there is a reference to Figure 5, which shows results for the European simulations. Then the three different versions of WRF-Chem that were used for North American simulations are described, but this is followed by references to "the former" and "the latter", implying that only two model versions are being discussed. Then the discussion turns to "relatively minor differences" and "larger differences" without stating differences in which quantity: model configuration choice or RMSE or MB or something else?
4. The discussion of Figure 4 (lines 198-212) states that model EU3 has the best results while EU1 and EU4 have less good results north of the Mediterranean area. To my eye, though, model EU2 had worse RMSE scores than the other three models over Germany, Poland, and Hungary, and it also has the highest median RMSE value in Fig. 6,b but EU2 is left out of the discussion. Am I misinterpreting these figures?

- very late in the paper another significant difference between NA6/NA8 and NA7 is mentioned (lines 741-743)
5. There are multiple discussions in the manuscript concerning the differences between two of the models used for the North American simulations: GEM-MACH(Base) [or NA3] and GEM-MACH(Ops) [or NA5] (see l. 177-181, l. 222-232, l. 269-278, l. 294-297, l. 411-413, l. 612-619, and l. 745-748). These discussions are somewhat hard to follow because the differences are mentioned incrementally wth wide gaps between mentions: (a) use/non-use of a canopy shading and turbulence scheme (l. 179) and different treatments of area-source emissions injection (l. 181); (b) use/non-use of a vehicle-induced turbulence scheme (l. 271); (c) full feedback vs. no feedback of meteorology on chemistry (l. 412); and (d) different treatments of seasonality of LAI (l. 747). These differences are then summed up on line 616, where it is stated that GEM-MACH(Base) uses "very different physical parameterizations than" GEM-MACH(Ops). This seems like an overstatement; isn't it more of a case that GEM-MACH(Base) uses two additional physical parameterizations compared to GEM-MACH(Ops)? And despite these differences, O3, NO, and NO2 predictions made by these two model versions have very similar scores (e.g., Figs. 2, 3, 6, 7, 9, S1, S2, S3, S5). It thus appears somewhat unbalanced that there is considerable discussion of GEM-MACH(Base) and its forest canopy and vehicle-induced turbulence (VIT) parameterizations, even though GEM-MACH(Ops), which does not employ either the canopy or VIT parameterizations, had comparable scores and was the most frequently included member of high-performing ensembles for NA (Table 2).
6. I found it striking how the observed monthly O3 profile differs in NA in the autumn between regions R1 and R2-R4 (Figure 9). This is not true in EU (Figure 10), where the observed monthly O3 profiles are similar across the four regions (and also with NA R1). The discussion of Figure 9 (lines 337-349) notes this behavior indirectly, but refers to the model overpredictions rather than the sharp reduction in observed concentration. Note too that Figure S3 is not referred to in the manuscript unlike other figures in the Supplement, but there are similarly significant regional differences in monthly NO profiles for the NA1 vs. NA2-NA4 subregions but not the EU1-EU4 subregions. And for the sentence that begins "The same behavior observed" (lines 347-348), should a parenthetical "(not shown)" be appended to this sentence since I don't think there are supporting figures provided.
7. I think this manuscript only presents quantitative analyses of ozone concentrations and ozone dry deposition flux. Can lines 494-495 be reworded to remove the reference in line 494 to "deposition velocity performance for NA3, NA5 here"?
8. In Section 4 the number of possible ensembles in line 472 for NA (254) and in line 498 for EU (18) don't seem correct. Based on the sum of rows of Pascal's triangle, shouldn't these numbers be 255 and 15? Also, in line 499 since we are excluding 4C1 (i.e., 4C2 + 4C3 + 4C4) it should say "Four out of the 11 combinations of ...". And for Table 3 the color scheme appears to be inconsistent with Table 2 -- shouldn't three columns (one 2nd order, one 3rd order, and the 4th order) be colored orange?
9. Would it be possible to recheck Eqns. 2b and 3 in Section 5.1? Just quickly looking at Eqns. 1 to 3 as a whole, my sense was that f/2 should be e/2 in Eqn. 2b and [a] should be [c] in Eqn. 3, but I stand to be corrected. And to help the reader, line 565 could be expanded to state something like "where the shared fraction of variation explained by X1 and X2 is (similarly for e and f)".
10. Section 5 has two parts, a subsection on ozone deposition flux variability and a subsection on ozone concentration variability. However, the text frequently refers just to "ozone flux" and "ozone variability". Using more exact terminology would be helpful to the reader in Sections 5 and 6. A similar comment applies to the use of "deposition" instead of "dry deposition" throughout the manuscript.
11. I have two minor concerns about the discussion of Figure 12 on pages 20-21. First, the comment that "the picture changes completely from NA" seems too strong to me, especially since the discussion that follows is much more nuanced. My sense is that the WRF-Chem results in Figure 12 (EU1, EU2) are in fact broadly similar to those in Figure 11 (NA6-NA8). The one big qualitative difference is the very small contribution from stomatal effective flux in the three winter months for EU1 and EU2 (which is different from NA6). My second minor concern is about terminology. The term "meteorological driver" generally suggests a numerical weather prediction model like WRF whose inputs are fed to a chemical transport model. Wouldn't "driving meteorology" or just "meteorology" be better here? I do wonder too whether the implementation of the seasonal dependence of stomatal conductance for these European simulations isn't also a contributing factor (cf. line 213), especially when compared to the NA6-NA8 results in Figure 11?
12. Following the discussion of Figure 17 in lines 789-800, there is no discussion about the corresponding LULC statistics for Europe that are presented in Figure S10. Moreover, Figure S10 is incomplete -- either the top or bottom panel is missing, and it would also be helpful to have panel labels similar to Figure 17. In addition, the Abstract mentions an "introductory diagnostic evaluation" but Section 5 does not mention this aspect.
13. One important finding noted on page 27 of the Conclusions (line 848) is that the predominant LULC types at ozone receptor locations are {\vf LULC types for which deposition is relatively low}". This is a second factor to explain the similarities in VP of ozone concentration variability shown in Figures 15b and 16, but this finding does not appear to be mentioned in Section 5.2 in the discussion of Figures 17 and S10.
14. One important finding noted by the authors in Section 3.1 (lines 278-294) that in my opinion represents an important conclusion for the entire AQMEII4 project is not mentioned at all in the Conclusion section. This is that the two models whose ozone concentration predictions for North America were found by the operational evaluation to have the highest skill were previously found by Activity 2 of AQMEII4 to have dry deposition modules with larger errors than other models, thus suggesting that their grid-scale performance was almost certainly due to error compensation between now-known errors in their dry deposition module and compensating errors in one or more process representations and that the other models applied for North America likely also had these same unknown errors, which resulted in their less good operational performance for ozone concentration due to their smaller errors for representation of dry deposition.
15. The manuscript title mentions both operational and probabilistic evaluations, the Abstract mentions operational, probabilistic, and diagnostic evaluations, but the Conclusions section only mentions the operational evaluation directly. Perhaps findings for the other two evaluation types could be added in the Conclusions section and "deposition-focused model evaluation" could be referred to in this section as a type of diagnostic evaluation.
16. The References section needs some attention:

- Hogrefe et al. (2023) is not cited in the manuscript but there is a reference;

- Kioutsioukis et al. (2014) is cited three times in the manuscript but there is no matching reference; however, Kioutsioukis and Galmarini (2014) is not cited at all but there is a reference;

- Solazzo et al. (2015) is cited two times in the manuscript but there is no matching reference; however, Solazzo et al. (2013) is not cited at all but there are two references for Solazzo et al. (2013);

- The Makar et al. (2024) reference should be updated to Makar et al. (2025) (https://acp.copernicus.org/articles/25/3049/2025/);

- Would the Campbell et al. (2022) journal publication be a better reference than the Campbell et al. (2021) conference presentation?

[Campbell, P. C., Tang, Y., Lee, P., Baker, B., Tong, D., Saylor, R., Stein, A., Huang, J., Huang, H.-C., Strobach, E., McQueen, J., Pan, L., Stajner, I., Sims, J., Tirado-Delgado, J., Jung, Y., Yang, F., Spero, T. L., and Gilliam, R. C.: Development and evaluation of an advanced National Air Quality Forecasting Capability using the NOAA Global Forecast System version 16, Geosci. Model Dev., 15, 3281–3313, https://doi.org/10.5194/gmd-15-3281-2022, 2022.]
17. The "Contributions" section (lines 880-886) also needs some attention. First, in what ways did KM and JP contribute? Second, should "WRF-Chem(IASS)" be "WRF-Chem(RIFS)" and should YHC be YHR? Perhaps you could also refer here to "WRF/CMAQ-M3Dry" and "WRF/CMAQ-STAGE" to be consistent with Table 1.

Technical and Editorial Corrections/Suggestions
p. 1, l. 27 \ \ Perhaps "The collective evaluation begins with an operational evaluation, namely a direct comparison of model-simulated predictions with monitoring data aiming at assessing model performance (Dennis et al., 2010)."
p. 1, l. 37 \ \ "... as a variable to be ..."
p. 2, l. 50, 52 \ \ Perhaps "dry deposition process modelling" and "standalone dry deposition modules" -- Galmarini et al. (2021) mentions wet deposition as a focus of AQMEII4 but that doesn't seem relevant for this manuscript.
p. 2, l. 63 \ \ Perhaps "The operational evaluation also provides important context ..."
p. 2, l. 72 \ \ Perhaps "... of modelled ozone dry deposition fluxes and velocities can be found ..."
p. 3, l. 108 \ \ Perhaps "For the European case the monitoring network databases employed included:"
p. 4, l. 117 \ \ Perhaps "... and the yearly average measured ozone at these sites"
p. 4, l. 123 \ \ Re "greater activity density" - I am not sure what is meant here by activity density
p. 4, l. 127 \ \ "... less detail, and ..."
p. 5, l. 147 \ \ Change "from the Figure 2-5" to "from Figure 2"
p. 5, l. 158 \ \ Delete "(see also Figure 3" -- superfluous?
p. 5, l. 168-173 \ \ Very long sentence?
p. 7, l. 224 \ \ Wang et al. (2025)?
p. 7, l. 233 \ \ Normalized mean bias?
p. 7, l. 235 \ \ Instead of "notice", would "note" be more appropriate? Same comment for lines 582, 646, 726, and 801.
p. 8, l. 238 \ \ Perhaps "... have ozone bias values closest to zero, followed ..."
p. 8, l. 240 \ \ From inspection of Fig. S1 I see three models with medium NRMSE values and only one with a high value.
p. 8, l. 250 \ \ Perhaps "... observed and modelled seasonal and diurnal cycles for North America for ozone, NO and NO2"
p. 9, l. 277 \ \ Perhaps "transported upwards away from the model surface"
p. 10, l. 314-315 \ \ The fact that models with smallest NO and NO2 biases do quite well for NO and NO2 shouldn't be surprising.
p. 11, l. 334 \ \ "... confidence in mobile and stack emissions, which ..."
p. 11, l. 338 \ \ I think reference here should be to Figures 9 and 10 rather than Figures 5 and 6.
p. 11, l. 340 \ \ "regions"
p. 11, l. 351-352 \ \ It is probably obvious from manuscript context but why not insert the word "dry" here, as in "ozone dry deposition fluxes"?
p. 11, l. 355 \ \ Perhaps "is not only due to these resistances but also"
p. 12, l. 368 \ \ "in which"
p. 12, l. 374-375 \ \ How many EU grid cells for "Deciduous Broadleaf Forest", and which continent for "Mixed Forest" and "Urban" values and what about the other continent?
p. 12, l. 380 \ \ Are 6130 cells or 6108 cells really "very few"? Perhaps "relatively few".
p. 13, l. 404-406 \ \ Double negative: keep "not" and change "neither ... nor" to "either ... or"
p. 13, l. 417 \ \ It is not clear to me what the "rule of model differences" is.
p. 14, l. 426-427 \ \ "but sometimes there is contributing seasonality in non-stomatal flux" -- awkward wording
p. 18, l. 561 \ \ "in this case ozone deposition flux"
p. 19, l. 590 \ \ Perhaps "... show that different models have very different ..."
p. 19, l. 596 \ \ Perhaps "... ozone flux variability is more equally distributed across ..."
p. 20, l. 619 \ \ Is "vehicles on highways" strictly true? From a quick perusal of Makar et al. (2021) it appears that the parameterization is based on VKT on roadways in general, not just highways.
p. 20, l. 629 \ \ "WRF-Chem"
p. 21, l. 638 \ \ Change "S8-S10" to "S7-S9"
p. 21, l. 645 \ \ Change "9" to "13"
p. 21, l. 649-650 \ \ Change "8" to "12"
p. 21, l. 650, 788 \ \ Perhaps change "back-to-back to" to "side-by-side with"
p. 21, l. 653 \ \ Perhaps "... stomatal flux, though disagreeing on the exact ..."
p. 24, l. 731 \ \ "solar radiation"
p. 24, l. 736 \ \ "wind speed"
p. 24, l. 737 \ \ "... though contributing on average 30% of the resolved varibility"?
p. 24, l. 738-739 \ \ Why this particular order of models: NA2, NA6, NA7, NA4, NA1?
p. 25, l. 764-765 \ \ "emissions variability" would be better
p. 25, l. 762, 767 \ \ "Interesting is the ..." -- awkward wording
p. 25, l. 778-779 \ \ Change "12" to "16" and "11b" to "15b"
p. 25, l. 782-783 \ \ "NA1, NA2, NA3, and NA5" would be more consistent with rest of manuscript. Should NA8 be included in this list?
p. 25, l. 784-785 \ \ "... contributor to ozone concentration variability at receptor locations"
p. 25, l. 787 \ \ Change "12b" to "15b"
p. 26, l. 802-804 \ \ To support this statement, could the following addition be made: "..., in conditions of uniform LU characteristics and dominance of urban and Planted/Cultivated LULC types, as shown in Figures 15b and 16 the models tend to produce comparable results in terms of contributors to ozone variability"
p. 26, l. 817-818 \ \ "... performance of dry deposition schemes ...", "Ozone dry deposition, in particular, ..."
p. 27, l. 821 \ \ One EU model has NRMSE for NO2 of 35% (Fig. S1)
p. 27, l. 834 \ \ "over North America"
p. 32, l. 988 \ \ For Hogrefe et al. (2025) reference, change "2005" to "2025".
p. 34, l. 1056 \ \ Separate these two references.
p. 38, l. 1131 \ \ Should "orange" be "yellow"?
p. 50, l. 1192 \ \ "columns"
p. 52, l. 1210 \ \ "Same as Fig. 11 but at the locations of ..."?
p. 52, l. 1196-1197 \ \ Is this sentence necessary?
p. 55, l. 1226 \ \ Should be "Same as Fig. 14 but at ..."
Figs. 1-4 captions \ \ Add units of MB and RMSE to captions.
Figs. 7-8 captions \ \ Should note that time units are local time.
Fig. 9 vs. Fig. 10 \ \ Labels of former are smaller and harder to read
Fig. S1 caption \\\ "Figure S1: Soccer plot diagrams of O3, NO2, NO2 and NO." would better reflect the figure layout.
Fig. S3 \ \ There seem to be two NO panels for the NA case.
Figs. S3 and S4 \ \ Could the line thicknesses in the legends be increased so that the line color is easier to see?
Figs. S5 and S6 \ \ What are units of O3FLX? Perhaps "Monthly ozone dry deposition flux". The resolution of these figures is too low -- it is very difficult to see the line colors of the legend.

Citation: https://doi.org/10.5194/egusphere-2025-1091-RC2
- AC2: 'Reply on RC2', Stefano Galmarini, 09 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1091/egusphere-2025-1091-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1091-AC2
RC3:
'Comment on egusphere-2025-1091', Anonymous Referee #3, 12 Jun 2025

This manuscript describes an evaluation of the ozone dry deposition schemes used in regional-scale air quality models. This activity was conducted by an international team as a component of the AQMEII project. The participants ran 12 regional models on North American and European domains for several different years. They present the results of model comparisons with ozone observations and present insights into the underlying physical processes and how they are represented in the models, including multiple model configurations. The models were characterized by their performance and the differences attributed to specific processes such as dry deposition modules, meteorological drivers, and model configurations. The findings are the basis for recommendations to harmonize LULC data and establish LULC-specific monitoring sites to improve air quality models. These are important points and the manuscript is appropriate for this journal. I recommend it be published after considering the following comments:
This authors indicate that the paper is focused on an evaluation of model estimates of ozone deposition but the main activity is comparing ozone concentrations (not deposition) and considering the various controlling processes. However, the paper is part of the broader AQMEII effort that includes companion papers that cover related model components, such as Clifton et al. 2023 that already describe model evaluations with direct measurements of ozone deposition and other papers on other trace gases, etc. so this manuscript ties this to model ozone concentration estimates. It would be helpful if this were explained early in the manuscript along with a summary of the results of the other papers and how it relates to this manuscript.
The paper mentions differences in gas-phase mechanisms but it is not clear how they interact with and influence deposition processes. A more in-depth discussion on the importance of these chemical scheme differences would be useful.
A major finding of this manuscript is that LULC is important. The authors point out the model implications of this (that LULC data should be accurate and consistent for all models) but they do not discuss the implications regarding the importance of different LULC (e.g., urban green spaces) for air pollution control. Recent papers suggest that urban green spaces may not be an efficient abatement measure for air pollution (e.g, Venter et al. 2024, doi.org/10.1073/pnas.230620012). The authors should consider whether their results provide any insights on this.
The manuscript emphasizes the importance of having the correct LULC but doesn’t consider whether any of the current LULC schemes are adequate for characterizing ozone deposition. For example, are the ozone uptake capabilities of all evergreen needleleaf trees the same? If there is significant variability within a given LULC type, do these LULC schemes need to be modified to represent these differences?
The authors focus on evergreen needleleaf forests and briefly present results for other LULC types shown in the supplement. It would be useful to have more discussion of these other LULC types to show their differences/similarities. Even though there are fewer representative sites, it could still show the importance of differences in LULC types such as the range of ozone uptake capabilities.
I recognize that different ozone units (ppb, ug/m3) are typically used in Europe and North America but for this exercise it would be better to be consistent and just use one. At least explain the rational if you don’t want to do this.
Why were those specific years chosen and why are they different in NA and Europe?
The criteria for "optimal" ensembles are based on minimizing RMSE, which does not capture all aspects of model skill, especially the ability to reproduce the maximum values that are a concern for air quality managers. There should be some discussion of the implications of this.
Some figures, such as Figure 6, are difficult to see. Others, especially those displaying multiple model results (e.g., Figure 11), are challenging to interpret due to the amount of information. I appreciate the attempt to get all the information in one figure but perhaps clearer differentiation could enhance readability.
Why do forest canopy shading effects increase NOx? (see Line 270)

Citation: https://doi.org/10.5194/egusphere-2025-1091-RC3
- AC1: 'Reply on RC3', Stefano Galmarini, 09 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1091/egusphere-2025-1091-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1091-AC1
RC4:
'Comment on egusphere-2025-1091', Anonymous Referee #4, 16 Jun 2025

This manuscript presents a comprehensive operational and probabilistic evaluation of regional-scale air quality models participating in the AQMEII Phase 4 project. This work studies ozone dry deposition while focusing on the continents of North America and Europe. This study analyzes model ensemble performance, depositional flux pathways, and highlights significant variability stemming from inconsistencies in LULC masks. The authors emphasize the need to harmonize LULC inputs for comparability and accuracy in model intercomparisons. This manuscript is well structured, and it offers operational evaluation using techniques like RMSE, MB, ensemble skill assessment, and pathway-level flux decomposition by variation partitioning.

This is a well-written, impactful, and methodologically robust work, and it presents novel insights into the effect of land surface characterization on ozone deposition. This work can be accepted with minor revision, clarifying several points mentioned in the detailed comments below:

Detailed comments

While the manuscript presents a detailed and robust analysis highlighting the inconsistencies in the LULC masks and their implications for ozone dry deposition modeling, it has become extremely long and dense. Several sections, particularly operational evaluation, are notably verbose, with extensive model-by-model commentary that could be streamlined. I recommend that authors condense or relocate some of these detailed discussions to the supplements, especially where the text reiterates statistical patterns already evident in the figures. Additionally, the manuscript would benefit from clearer synthesis or summary paragraphs at the end of each major section to reinforce the key takeaways and guide the reader through the analysis.

Line 83-88: What are the emission inventories used for lightning NOx, forest fires, biogenic emissions, and other natural sources? Please provide a brief discussion on the internal model processing of these emissions.

Line 89-90: Please provide a reasoning behind choosing the spatial and temporal domains for this study. What are the spatial and temporal resolutions used in the model runs?

Line 136-139: Ozone values in the figures are reported in ppb over North America, while in µg/m3 over Europe. The use of these two different units for ozone concentration complicates the direct comparison. While the authors attempted to align the color scales, I recommend standardizing the units across both regions for the convenience of readers and easier comparisons.

Line 161: Figure 4 presents results for Europe, which is incorrectly referred to as showing results for North America. I recommend that authors carefully review the entire manuscript to identify and correct such figure and section reference inconsistencies. While minor, these errors can significantly impact the clarity and interpretation of the results and may confuse readers.

Line 190-194: The authors hypothesize that the relatively minor difference between WRF-Chem (UPM) and WRF-Chem (UCAR) is primarily due to the difference in gas-phase chemistry mechanisms. However, this claim is made without presenting supporting analysis or citing a reference that explicitly evaluated the gas-phase chemical mechanisms used in these two configurations. I recommend that the authors provide a reference to a past work supporting this hypothesis or rephrase this discussion as a hypothesis.

Line 308-311: What are the factors driving the underestimation of wintertime NOx?

Line 390-393: Are there any implications of combining LCAN and soil for models that distinguish these two terms?

Line 529-534: The authors identify factors that are expected to be relevant in the determination of ozone concentration variability at the surface but fail to discuss the methodology used in identifying these factors. I recommend that authors briefly discuss the methods used to identify these factors, either in the main text or the supplement.

Citation: https://doi.org/10.5194/egusphere-2025-1091-RC4
- AC4: 'Reply on RC4', Stefano Galmarini, 09 Jul 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1091/egusphere-2025-1091-AC4-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1091-AC4

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Stefano Galmarini on behalf of the Authors (09 Jul 2025) Author's response Author's tracked changes Manuscript

ED: Publish as is (28 Jul 2025) by Joshua Fu

AR by Stefano Galmarini on behalf of the Authors (30 Jul 2025) Manuscript

Journal article(s) based on this preprint

17 Oct 2025

Operational, diagnostic, and probabilistic evaluation of AQMEII-4 regional-scale ozone dry deposition: time to harmonize our LULC masks

Atmos. Chem. Phys., 25, 12923–12953, https://doi.org/10.5194/acp-25-12923-2025,https://doi.org/10.5194/acp-25-12923-2025, 2025

Short summary

Supplement

https://doi.org/10.5194/egusphere-2025-1091-supplement

Viewed

Total article views: 973 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
879	64	30	973	29	20	34

HTML: 879
PDF: 64
XML: 30
Total: 973
Supplement: 29
BibTeX: 20
EndNote: 34

Views and downloads (calculated since 19 Mar 2025)

Month	HTML	PDF	XML	Total
Mar 2025	56	13	4	73
Apr 2025	40	8	1	49
May 2025	44	12	3	59
Jun 2025	90	7	8	105
Jul 2025	57	16	12	85
Aug 2025	123	2	0	125
Sep 2025	442	3	2	447
Oct 2025	27	3	0	30

Cumulative views and downloads (calculated since 19 Mar 2025)

Month	HTML	PDF	XML	Total
Mar 2025	56	13	4	73
Apr 2025	40	8	1	49
May 2025	44	12	3	59
Jun 2025	90	7	8	105
Jul 2025	57	16	12	85
Aug 2025	123	2	0	125
Sep 2025	442	3	2	447
Oct 2025	27	3	0	30

Viewed (geographical distribution)

Total article views: 921 (including HTML, PDF, and XML) Thereof 921 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 21 Oct 2025

Download

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Preprint (2161 KB)
Metadata XML

Short summary

Deposition is a key in air quality modelling. An evaluation of the AQMEII4 models is performed prior to analysing the different deposition schemes in relation to the LULC used. Such analysis is unprecedented. Among the results, LULC masks have to be harmonised and up-to-date information used in place of outdated and too course masks. Alternatively LULC masks should be evaluated and intercom pared when multiple model results are analysed.


Total:	0
HTML:	0
PDF:	0
XML:	0