This work is distributed under the Creative Commons Attribution 4.0 License.
Identifying Drivers of Surface Ozone Bias in Global Chemical Reanalysis with Explainable Machine Learning
Abstract. This study employs an explainable machine learning (ML) framework to examine the regional dependencies of surface ozone biases and their underlying drivers in global chemical reanalysis. Surface ozone observations from the Tropospheric Ozone Assessment Report (TOAR) network and chemical reanalysis outputs from the multi-model multi-constituent chemical (MOMO-Chem) data assimilation (DA) system for the period 2005–2020 were utilized for ML training. A regression tree-based randomized ensemble ML approach successfully reproduced the spatiotemporal patterns of ozone bias in the chemical reanalysis relative to TOAR observations across North America, Europe, and East Asia. The global distributions of ozone bias predicted by ML revealed systematic patterns influenced by meteorological conditions, geographic features, anthropogenic activities, and biogenic emissions. The primary drivers identified include temperature, surface pressure, carbon monoxide (CO), formaldehyde (CH2O), and nitrogen oxides (NOx) reservoirs such as nitric acid (HNO3) and peroxyacetyl nitrate (PAN). The ML framework provided a detailed quantification of the magnitude and variability of these drivers, delivering bias-corrected ozone estimates suitable for human health and environmental impact assessments. The findings provide valuable insights that can inform advancements in chemical transport modeling, DA, and observational system design, thereby improving surface ozone reanalysis. However, the complex interplay among numerous parameters highlights the need for rigorous validation of identified drivers against established scientific knowledge to attain a comprehensive understanding at the process level. Further advancements in ML interpretability are essential to achieve reliable, actionable outcomes and to lead to an improved reanalysis framework for more effectively mitigating air pollution and its impacts.
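To make the abstract's setup concrete, here is a minimal sketch of the kind of tree-ensemble bias emulator described (this is not the authors' MOMO-Chem/TOAR pipeline; the four predictors and the synthetic bias signal are purely illustrative stand-ins for variables such as temperature, surface pressure, CO, and HNO3):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Synthetic predictors standing in for T, Psfc, CO, HNO3
X = rng.normal(size=(n, 4))
# Synthetic "reanalysis minus observation" ozone bias with a
# temperature-driven signal plus noise
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
r2 = rf.score(X_te, y_te)  # out-of-sample skill of the bias emulator
```

The fitted forest can then be interrogated with importance and attribution methods, which is the "explainable" part of the framework the paper builds on.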
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-3753', Anonymous Referee #1, 11 Feb 2025
Miyazaki et al. applied a tree-based explainable machine learning framework to investigate the drivers of bias in surface ozone estimates from a chemical reanalysis data product. The proposed approach allows for a quantitative analysis of the drivers of surface ozone bias, which provides valuable insights for future development and improvement of data assimilation systems and chemical transport models. The paper is well written and is suitable for publication after revision, provided that the following comments are addressed.
General comments:
- The method section in the current version of the manuscript can be improved. For example, it is sometimes confusing for readers to understand what the ML model is trying to predict. If the ground truth used to train the ML model is TOAR observations aggregated to the TCR-2 grid, then how is the global training and evaluation conducted?
- It is mentioned around line 160 that the spatial resolution of 1.125 deg x 1.125 deg can limit the representativeness of gridded data sets. While I appreciate the discussion on the limitation, I am wondering if it is possible that the coarse resolution could be one of the uncertainty drivers? It would be great to see more details on this, for example, how many sites are in urban/rural regions and if there is any imbalance between urban/rural grid boxes. It is also not mentioned how urban and rural regions are defined.
- A number of important figures mentioned in the discussion are not shown. It would become easier for readers to follow the discussion, if these figures are provided in supplementary materials and referenced in the main text.
Specific comments:
Line 80: (Watson et al., 2019)
Line 99-100: Incomplete sentence
Table 1: Are the meteorological variables from surface only and essentially identical to ERA-Interim data (i.e., they are not optimized in TCR-2, right?). Also, are all the chemical variables (concentrations and emissions) optimized in the data assimilation system, or only a number of the chemical variables are optimized?
Line 170: Does this mean you would need to provide both the mean and quantile values from the ground truth while training the QRF?
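On the quantile regression forest (QRF) question above: in a standard QRF the model is trained on the target values alone, and quantiles are read off the ensemble at prediction time rather than supplied as extra ground truth. A minimal sketch of one common approximation (true QRF in the sense of Meinshausen 2006 uses leaf-level sample weights; taking quantiles of per-tree predictions, as below, is a simpler stand-in):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(1000, 3))
y = X[:, 0] + rng.normal(scale=0.3, size=1000)

# Train on the mean targets only -- no quantile labels required
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

# Quantiles come from the spread of the ensemble at prediction time
per_tree = np.stack([t.predict(X[:5]) for t in rf.estimators_])  # (trees, samples)
q10, q50, q90 = np.percentile(per_tree, [10, 50, 90], axis=0)
```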
Line 190: I thought one of the advantages of Permutation Importance is that you can calculate PI using already trained models and avoiding any re-training?
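The reviewer's point is correct for the standard formulation: permutation importance (PI) is computed on an already-fitted model by shuffling one input column at a time and measuring the score drop, with no retraining. A minimal sketch with scikit-learn (synthetic data; feature 1 carries the only signal):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(2)
X = rng.normal(size=(800, 3))
y = 3.0 * X[:, 1] + rng.normal(scale=0.2, size=800)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# PI reuses the fitted model; each repeat shuffles one column and
# records the resulting drop in score
result = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
top_feature = result.importances_mean.argmax()
```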
Section 2.3.1: The purposes of the two emulator runs are still not very clearly written in the current version. Are the emulator runs referring to first training using global TCR-2 data and then training using regional TOAR data? Also for the Emu_toar run, do you still use global MDA8 in the evaluation, because the goal is to test the generalizability of the model?
Line 241: Incomplete sentence
Figures 1 and 3: The caption says blue and red lines represent observed (actual) and ML-predicted values. But the legends indicate the opposite. My guess is orange lines are actual (i.e., legends are correct)?
Figure 4: Are the results from the independent out-of-training samples? For the bottom row, the full time series of actual and ML-predicted surface ozone biases seem to match pretty well. However, why are the correlation coefficients so low? Also, why do the North American results contain two predicted biases with zero values?
Figure 8: Do negative contributions to ozone bias mean that over some regions the corresponding parameters help reduce bias, or that these parameters lead to negative bias?
Line 468: is also a critical factor
Line 490: It is noteworthy
Figure 12: What are the dominant contributing factors for each color?
Citation: https://doi.org/10.5194/egusphere-2024-3753-RC1
RC2: 'Comment on egusphere-2024-3753', Anonymous Referee #2, 16 Feb 2025
The authors use an RF algorithm to (1) emulate predicted concentrations of surface ozone from a leading tropospheric chemical reanalysis product, and (2) predict the bias of the reanalysis product relative to global surface observations. The authors use explainable AI techniques to understand drivers of bias in ozone reanalysis. These results offer a useful perspective on O3 chemical transport model predictions and data assimilation output, and I recommend publication after the following comments are addressed.
Major points:
- My biggest concern by far in this work is spatial extrapolation: the TOAR surface data used in training are clustered in a few regions (North America, Europe, and East Asia) while bias is predicted globally. The authors are aware of this, but spatial cross-validation is a more direct way of quantifying the issue and is not done in this work. I am most concerned about (1) oceans, (2) boreal regions, and (3) the tropics, where training data are limited. Consider withholding some of the few training sites in these regions and measuring how well the RF predicts bias there (the clustering maps in Figure 12 might be a reasonable way to do cross-validation). The authors could also use more recent observations in China and India for independent evaluation. Do we really have enough data to use RF, a highly data-dependent algorithm, for extrapolation to these regions?
- In cases of highly imbalanced training sets, where some regions are far overrepresented, methods like SMOTE or weighted training are sometimes employed to ensure that the RF is penalized more heavily for bad predictions at some sites. Did the authors consider using such approaches?
- Explainable AI methods are vulnerable to collinearity in the inputs, as the authors are aware, and of the algorithms used SHAP (TreeExplainer) is most robust to this problem. I would like to see more comparison between SHAP and the other methods. For example, in Figure 6, what does SHAP suggest are the top contributors to ozone bias in these regions? In the literature, for SHAP regional attribution some authors use separate RFs trained on training sets focused on particular regions.
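The spatial cross-validation the reviewer requests can be sketched as follows: hold out entire regions at a time (here, synthetic group labels standing in for TOAR clusters such as North America / Europe / East Asia) rather than random samples, so the score reflects extrapolation skill:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 4))
y = X[:, 0] + rng.normal(scale=0.3, size=600)
regions = rng.integers(0, 3, size=600)   # 3 synthetic "regions"

# Each fold withholds one whole region, so evaluation is always
# on sites the forest has never seen spatially
rf = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(rf, X, y, cv=GroupKFold(n_splits=3), groups=regions)
```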
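The weighted-training alternative to SMOTE mentioned above can be sketched with inverse-frequency sample weights, so that errors on the underrepresented region cost as much in aggregate as errors on the dominant one (synthetic 9:1 regional imbalance):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
region = np.array([0] * 900 + [1] * 100)       # 9:1 imbalance
X = rng.normal(size=(1000, 3))
y = X[:, 0] + rng.normal(scale=0.3, size=1000)

# Inverse-frequency weights: weight for group c is N / (K * n_c),
# so each group contributes equally to the loss in total
counts = np.bincount(region)
weights = (len(region) / (len(counts) * counts))[region]

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y, sample_weight=weights)
```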
Minor points:
- Figure 3: It is not surprising that RF has trouble predicting the distributional tails; this has long been observed in the literature (and is to be expected given it is an ensemble algorithm). Consider commenting on the limitation of this method for e.g. improving models such that they give better predictions of NAAQS ozone exceedances (e.g. MDA8).
- Given the tropical Pacific pattern in RMSE (Figure 2) I am curious about the role of ENSO in driving RF error. Is lightning NOx a problem here?
- Could you clarify if TOAR surface sites are averaged to the grid of the TCR-2 output? In places with many monitors within a single grid cell this could lead to sample bias where e.g. urban areas are even more disproportionately represented.
- Figure 5: consider using same colorbar for observations and for predictions.
- Figure 11: consider also showing uncertainty as a percentage of predicted bias.
- Throughout, increase the font size of figures. They can be quite hard to read.
- Some typos throughout. Here are a couple: Line 83: “the simulation of simulate” should read “the simulation of”. Line 241: Missing unit after “exceeded 30” (I think it should be percent).
Citation: https://doi.org/10.5194/egusphere-2024-3753-RC2
Data sets
TROPESS chemical reanalysis product, TCR-2 data, K. Miyazaki et al., https://doi.org/10.25966/9qgv-fe81
Model code and software
Machine learning code, James Montgomery, https://github.com/JPLMLIA/SUDSAQ
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
161 | 67 | 7 | 235 | 8 | 6
Viewed (geographical distribution)
Country | # | Views | % |
---|---|---|---|
United States of America | 1 | 127 | 54 |
China | 2 | 15 | 6 |
Germany | 3 | 10 | 4 |
France | 4 | 10 | 4 |
Japan | 5 | 9 | 3 |