A machine learning approach to downscale EMEP4UK: analysis of UK ozone variability and trends
Abstract. High-resolution modelling of surface ozone is an essential step in the quantification of the impacts on health and ecosystems from historic and future concentrations. It also provides a principled way in which to extend analysis beyond measurement locations. Often, such modelling uses relatively coarse resolution chemistry transport models (CTMs), which exhibit biases when compared to measurements. EMEP4UK is a CTM that is used extensively to inform UK air quality policy, including the effects on ozone from mitigation of its precursors. Our evaluation of EMEP4UK for the years 2001–2018 finds a high bias in reproducing daily maximum 8-hr average ozone (MDA8), due in part to the coarse spatial resolution. We present a machine learning downscaling methodology to downscale EMEP4UK ozone output from a 5 × 5 km to 1 × 1 km resolution using a gradient boosted tree. By addressing the high bias present in EMEP4UK, the downscaled surface better represents the measured data, with a 128 % improvement in R2 and 37 % reduction in RMSE. Our analysis of the downscaled surface shows a decreasing trend in annual and March–August mean MDA8 ozone for all regions of the UK between 2001–2018, differing from increasing measurement trends in some regions. We find the proportion of the UK which fails the government objective to have at most 10 exceedances of 100 µg/m3 per annum is 27 % (2014–2018 average), compared to 99 % from the unadjusted EMEP4UK model. A statistically significant trend in this proportion of −2.19 %/year is found from the downscaled surface only, highlighting the importance of bias correction in the assessment of policy metrics. Finally, we use the downscaling approach to examine the sensitivity of UK surface ozone to reductions in UK terrestrial NOx (i.e., NO + NO2) emissions on a 1 × 1 km surface. 
Moderate NOx emission reductions with respect to present day (20 % or 40 %) increase both average and high-level ozone concentrations in large portions of the UK, whereas larger NOx reductions (80 %) cause a similarly widespread decrease in high-level ozone. In all three scenarios, heavily urbanised areas (i.e., major cities) are the most affected by increasing concentrations of ozone, emphasising the broader air quality challenges of NOx control.
Lily Gouldsbrough et al.
Status: open (until 27 Jun 2023)
- RC1: 'Comment on egusphere-2023-632', Anonymous Referee #1, 31 May 2023
Review of "A machine learning approach to downscale EMEP4UK: analysis of UK ozone variability and trends" by Gouldsbrough et al.
The manuscript by Gouldsbrough et al. discusses the application of machine learning to downscale ozone results from the EMEP model over the UK. The authors use results from the EMEP4UK model along with gradient boosting to develop 1 × 1 km fields of ozone, and then use the downscaled model to examine the response of ozone to emissions reductions.
While the paper describes an interesting potential approach to using machine learning (ML) to develop finer-scale fields than are typically produced by chemical transport models (CTMs), the current version raises a number of concerns.
I will start by noting that the paper is well written, and I have few concerns there. Further, I like the idea of using ML to blend observations, CTM results, and other data, as the authors have done here.
In terms of limitations, probably the biggest is the thought process on how to use the EMEP4UK model to look at the ozone response after fusing with observations. As noted, the EMEP4UK model is biased, and it may be even more biased than indicated: bias should be examined on a location-by-location basis. The reason bias is SO important is that if a method is then used to remove that bias, the system can be artificially shifted into a different ozone-NOx-VOC response regime. For example, suppose the model predicts peak ozone levels with a low bias. This would indicate that it is more radical-limited than it might be in actuality. Thus, the modelled response to VOC controls would be enhanced, and the response to NOx controls potentially reduced, if not of the wrong sign. In this application, the authors have not actually shown that the model response to controls is correct. Without some assurance that the model correctly captures that response, using it as a component of the gradient boosting (GB) is very concerning.
I was intrigued by Fig. 3. As I read it, the SHAP value for the EMEP4UK feature is typically negative. Doesn't this mean that, most of the time, the GB model responds negatively to the EMEP4UK input? Does this not imply that the GB model response is in the opposite direction? (This could be explained more precisely.) Even if it only means that the response is muted, what is the physical/chemical reasoning for this in terms of actually believing the model response to controls?
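The distinction between a feature's SHAP values being negative and the model's *response* to that feature being negative can be checked directly with a partial-dependence sweep. The sketch below is my own illustration, not the authors' setup: it uses synthetic data and scikit-learn's `GradientBoostingRegressor`, with a crude mean-ablation contribution as a stand-in for a SHAP value. It shows that below-average CTM inputs can yield negative per-sample contributions even when the response direction (prediction as a function of the CTM feature, all else fixed) is positive.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins (NOT the paper's data): a high-biased "CTM ozone"
# feature plus a meteorological covariate; observed ozone follows the CTM
# with a damped slope, mimicking bias correction.
n = 2000
ctm = rng.uniform(20, 120, n)          # biased-high CTM MDA8 ozone (ug/m3)
met = rng.normal(0, 1, n)              # e.g. a temperature anomaly
obs = 0.6 * ctm + 5.0 * met + rng.normal(0, 3, n)

X = np.column_stack([ctm, met])
model = GradientBoostingRegressor(random_state=0).fit(X, obs)

# Manual partial dependence on the CTM feature: sweep it over a grid while
# keeping the other feature at its observed values, then average predictions.
grid = np.linspace(ctm.min(), ctm.max(), 25)
pd_curve = []
for g in grid:
    Xg = X.copy()
    Xg[:, 0] = g
    pd_curve.append(model.predict(Xg).mean())
pd_curve = np.asarray(pd_curve)

# Per-sample "contribution" proxy: prediction minus the prediction with the
# CTM feature ablated to its mean (a crude stand-in for a SHAP value).
X_base = X.copy()
X_base[:, 0] = ctm.mean()
contrib = model.predict(X) - model.predict(X_base)

# Below-average CTM inputs get negative contributions (as on a SHAP plot)...
print("negative contributions present:", bool((contrib < 0).any()))
# ...yet the response direction is positive: the curve rises with the input.
print("response is increasing:", bool(pd_curve[-1] > pd_curve[0]))
```

The point is that a predominance of negative SHAP values for a feature reflects where the samples sit relative to the base value, not a negative response; the slope of the dependence curve is what carries the direction, which is why a more precise explanation in the text would help.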
This brings up a second question: why not include emissions in the GB model? This could help assure that the "package" (i.e., the GB model) captures the response over time, assuming the underlying CTM captures that response correctly.
The model evaluation is not well described. Is the R2 given for how well the model captures daily peak ozone at each site for each year (i.e., over all sites and all days)? It seems so, and this should be stated explicitly; it could also plausibly be a spatial metric. The table caption should be very precise about what is shown. I will also note that the authors have NOT verified the model accuracy; they have estimated it. They should describe what they mean by verification and what metrics and cut-offs are used. As noted by Oreskes et al. (Science, ~1993), environmental models cannot be verified; I think in this case the authors mean evaluated. I was also intrigued by why the 70/30 split results are poorer than the 10-fold CV results. Finally, the correlations seem a bit low for an ML model: more discussion is needed.
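On the 70/30 versus 10-fold point: the two estimators can legitimately disagree, because a single holdout split is a higher-variance estimate than the mean over ten folds. The toy comparison below (synthetic data and scikit-learn; my assumption, not the authors' pipeline) shows how both numbers are obtained and why the CV result should be reported with its spread.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split

rng = np.random.default_rng(1)

# Synthetic regression problem standing in for the downscaling task.
n = 1500
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(0, 1, n)

model = GradientBoostingRegressor(random_state=0)

# Single 70/30 holdout: one R2 number, sensitive to the particular split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
r2_holdout = model.fit(X_tr, y_tr).score(X_te, y_te)

# 10-fold CV: ten out-of-fold R2 values; report the mean AND the spread.
cv_scores = cross_val_score(
    model, X, y,
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
    scoring="r2",
)

print(f"70/30 holdout R2: {r2_holdout:.3f}")
print(f"10-fold CV R2:    {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

A further design point worth raising: if site-level generalisation is the question, folds should be grouped by monitoring site (e.g. scikit-learn's `GroupKFold`) rather than by random rows, since random splits leak spatial information between train and test and inflate R2.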
In the end, I am not sure the authors can overcome the main concern: using the model to assess the response to controls when the approach has not been evaluated for that purpose and may actually introduce a bias in the response, potentially even of the wrong sign. This needs to be further assessed before publication.