This work is distributed under the Creative Commons Attribution 4.0 License.
A machine learning approach to downscale EMEP4UK: analysis of UK ozone variability and trends
Abstract. High-resolution modelling of surface ozone is an essential step in the quantification of the impacts on health and ecosystems from historic and future concentrations. It also provides a principled way in which to extend analysis beyond measurement locations. Often, such modelling uses relatively coarse resolution chemistry transport models (CTMs), which exhibit biases when compared to measurements. EMEP4UK is a CTM that is used extensively to inform UK air quality policy, including the effects on ozone from mitigation of its precursors. Our evaluation of EMEP4UK for the years 2001–2018 finds a high bias in reproducing daily maximum 8-hr average ozone (MDA8), due in part to the coarse spatial resolution. We present a machine learning downscaling methodology to downscale EMEP4UK ozone output from a 5 × 5 km to 1 × 1 km resolution using a gradient boosted tree. By addressing the high bias present in EMEP4UK, the downscaled surface better represents the measured data, with a 128 % improvement in R2 and 37 % reduction in RMSE. Our analysis of the downscaled surface shows a decreasing trend in annual and March–August mean MDA8 ozone for all regions of the UK over 2001–2018, differing from increasing measurement trends in some regions. We find the proportion of the UK which fails the government objective to have at most 10 exceedances of 100 µg/m3 per annum is 27 % (2014–2018 average), compared to 99 % from the unadjusted EMEP4UK model. A statistically significant trend in this proportion of −2.19 %/year is found from the downscaled surface only, highlighting the importance of bias correction in the assessment of policy metrics. Finally, we use the downscaling approach to examine the sensitivity of UK surface ozone to reductions in UK terrestrial NOx (i.e., NO + NO2) emissions on a 1 × 1 km surface. Moderate NOx emission reductions with respect to present day (20 % or 40 %) increase both average and high-level ozone concentrations in large portions of the UK, whereas larger NOx reductions (80 %) cause a similarly widespread decrease in high-level ozone. In all three scenarios, very urban areas (i.e., major cities) are the most affected by increasing concentrations of ozone, emphasising the broader air quality challenges of NOx control.
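The downscaling core described in the abstract is, in outline, a supervised regression: a gradient boosted tree learns the mapping from the coarse EMEP4UK MDA8 field (plus fine-scale covariates) to MDA8 measured at AURN sites, and is then applied at every 1 × 1 km cell. A minimal sketch of that idea follows; it is not the authors' code, and the file and column names are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

# Hypothetical training table: one row per site-day, with observed MDA8
# ozone plus candidate predictors (including the coarse EMEP4UK field).
train = pd.read_csv("aurn_site_days.csv")
features = ["emep4uk_mda8", "temperature", "wind_speed",
            "distance_to_road", "elevation", "day_of_year"]

# Fit the gradient boosted tree on observed MDA8 at the monitoring sites.
model = HistGradientBoostingRegressor(max_iter=500, learning_rate=0.05)
model.fit(train[features], train["observed_mda8"])

# Apply to every 1 x 1 km cell (same predictor columns, one row per cell-day).
grid = pd.read_csv("grid_1km_days.csv")  # hypothetical gridded predictors
grid["downscaled_mda8"] = model.predict(grid[features])
```

Note that the EMEP4UK field presumably enters as one predictor among several, which is how a single tree ensemble can both sharpen the field to 1 × 1 km and correct its high bias against the measurements.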
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-632', Anonymous Referee #1, 31 May 2023
Review of “A machine learning approach to downscale EMEP4UK: analysis of UK ozone variability and trends” by Gouldsbrough et al.
The manuscript by Gouldsbrough et al. discusses the application of machine learning to downscale ozone results from the EMEP model over the UK. In it, the authors use results from the EMEP4UK model along with Gradient Boosting to develop 1 × 1 km fields of ozone, and their response to emissions. They then use the downscaled model to look at the response of ozone to emission reductions.
While the paper describes an interesting potential approach to using machine learning (ML) to develop finer scale fields than are typically produced from chemical transport models (CTMs), the current paper has a number of concerns.
I will start out by noting that the paper is pretty well written, and I have few concerns there. Further, I like the idea of using ML to blend observations, CTM results and other data, as they have done here.
In terms of limitations, probably the big one is their thought process on how to use the EMEP4UK model to look at the ozone response after fusing with observations. As noted, the EMEP4UK model is biased, and may be even more biased than indicated, in that you should look at bias on a location-by-location basis. The reason bias is SO important is that if you now use a method to remove that bias, you can artificially shift into a different ozone-NOx-VOC response regime. For example, let’s say the model is predicting peak ozone levels with a low bias. This would indicate that it is more radical-limited than it might be in actuality. Thus, the response to VOC controls would be enhanced, and the response to NOx controls potentially reduced, if not of the wrong sign. In this application, they have not actually shown that the model response to controls is correct. Without some assurance that the model is correctly capturing the response, using the model as a component of the GB is very concerning.
I was intrigued by Fig. 3. As I read it, the SHAP value is typically negative for the EMEP4UK model feature. Doesn’t this mean that most of the time, the GB model responds negatively? Does this not mean that the GB model response should be in the opposite direction? (It might be more precisely explained.) Even if it only means that the response is muted, what is the physical/chemical reasoning for that, in terms of actually believing the model response to controls?
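A minimal sketch of the diagnostic behind this question, reusing the hypothetical model and training frame from the sketch after the abstract: a column of mostly negative SHAP values for the EMEP4UK feature means that feature usually pulls predictions below the model's baseline, which is what a correction of a high-biased input looks like, rather than a response of reversed sign.

```python
import numpy as np
import shap  # reusing `model`, `train`, `features` from the earlier sketch

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(train[features])  # (n_rows, n_features)

emep_shap = shap_values[:, features.index("emep4uk_mda8")]

# Mostly negative values: the EMEP4UK input usually pulls the prediction
# *below* the model's baseline (expected value), i.e. a high-bias correction,
# not a prediction that moves opposite to the EMEP4UK field.
print("fraction negative:", np.mean(emep_shap < 0))
print("mean SHAP value:  ", emep_shap.mean())
```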
This brings up the second potential question: why not include emissions in the GB model? This can help assure that the “package” (i.e., the GB model) is capturing the response over time, assuming that it correctly captures the response.
Their model evaluation is not well described. Are the R2 values given for how well the model captures peak daily ozone at each site for each year? (Is it for all sites, all days?) It seems so, and this should be emphasized. It seemed like it could also be for a different metric spatially. The table caption should be very precise as to what is shown. Also, I will note, they have NOT verified the model accuracy; they have estimated it. Indeed, they should describe what they mean by verification and what metrics and cut-offs are used to verify the model. As noted by Oreskes et al. (Science, 1994), environmental models cannot be verified. I think in this case, they mean evaluated. I was also intrigued by why their 70/30 results are less good than the 10-fold CV. Also, it seemed for an ML model, the correlations might be a bit low: more discussion is needed.
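A minimal sketch of the two evaluation schemes whose scores the reviewer contrasts, continuing the hypothetical site-day frame from the sketch after the abstract. A single 70/30 holdout is one random partition, while 10-fold CV averages over ten, so some gap between the two scores is expected:

```python
from sklearn.model_selection import cross_val_score, train_test_split

X, y = train[features], train["observed_mda8"]  # from the earlier sketch

# Single 70/30 random split of site-days ...
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_r2 = HistGradientBoostingRegressor().fit(X_tr, y_tr).score(X_te, y_te)

# ... versus 10-fold cross-validation, which averages over ten such splits
# and is therefore less sensitive to one unlucky partition.
cv_r2 = cross_val_score(HistGradientBoostingRegressor(), X, y,
                        cv=10, scoring="r2")
print(f"70/30 holdout R2: {holdout_r2:.3f}")
print(f"10-fold CV R2:   {cv_r2.mean():.3f} +/- {cv_r2.std():.3f}")
```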
In the end, I am not sure they can overcome the main concern, i.e., using the model to assess the response to controls when the approach used is not evaluated and may actually introduce a bias in the response, potentially even of the wrong sign. This needs to be further assessed before publication.
Citation: https://doi.org/10.5194/egusphere-2023-632-RC1
AC1: 'Response to reviewer comments', Lily Gouldsbrough, 21 Sep 2023
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-632/egusphere-2023-632-AC1-supplement.pdf
RC2: 'Comment on egusphere-2023-632', Anonymous Referee #2, 25 Jun 2023
The manuscript by Gouldsbrough et al. makes use of surface ozone simulations from the EMEP4UK model and uses a machine learning approach to downscale these data to 1 km × 1 km, thus providing a higher resolution dataset for assessing O3 air quality while also removing a positive bias in the model. This is achieved by exploiting surface ozone data from the AURN. Overall, the manuscript is well written and provides an interesting insight into how to fuse model data and observations together to provide a supposedly more robust product for air quality assessments. My comments/suggestions are as follows:
Major Comments:
- In the machine learning approach, you use surface O3 observations, but could you also exploit information on NO2 and aerosols from the AURN? O3 is influenced by concentrations of both, so would having surface data to constrain the biases you have in precursor pollutants help with the O3 downscaling? Also, would other meteorological variables (e.g. model cloud cover and photolysis) and surface type variables (e.g. vegetation cover and roughness – for deposition) help constrain the O3 downscaling?
- I’m not an expert in machine learning, but as the AURN O3 data is used in the downscaling approach, would an independent data set be more appropriate for assessing its skill? From what I understand, you use the “training data” from the AURN data (i.e. a sub-selection) and then apply the method to all the data to generate the final product. However, if you then use the same data to evaluate the product, I’m not sure this can be deemed “independent” or a suitable evaluation of the product. Would it be worthwhile comparing the final product and EMEP4UK with an independent O3 observational data set, e.g. ozonesondes, to see if this downscaling approach truly works? Or are there separate EMEP sites not included in the AURN data? (A site-held-out evaluation of the kind this implies is sketched after this list.)
- The average and trend values are calculated for EMEP4UK, the downscaled product and the observations on a regional basis. However, I’m concerned that you are potentially computing regional domain statistics for the modelled data and then comparing them with observations despite a substantial representation bias (i.e. many more model pixels than surface observational sites). Therefore, in your supporting information document, I think you need to check whether the model/downscaled metrics are sensitive to sampling the model/downscaled data only at the observation locations (a sketch of this check follows the list).
- Section 4.1: I do not fully understand the benefit of this section. Do five years really provide you with enough information on the inter-annual variability? You look at trends over the full time period, so you might as well look at the full data set. Or at least include the first 5-year period and compare that with the 2014-2018 period. That would let you assess inter-annual variability at the start and end of the record. Then use the trend analysis to look at long-term changes (i.e. rates) with time.
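On the independence question in the second major comment, a standard check is to hold out whole monitoring sites rather than random site-days. A minimal sketch with scikit-learn's GroupKFold, reusing the hypothetical frame from the sketch after the abstract (the site_id column is an assumption):

```python
from sklearn.model_selection import GroupKFold, cross_val_score

# Hold out whole sites: no monitoring site contributes to both training and
# evaluation, so the score reflects skill at genuinely unseen locations.
site_r2 = cross_val_score(HistGradientBoostingRegressor(),
                          train[features], train["observed_mda8"],
                          groups=train["site_id"],  # hypothetical column
                          cv=GroupKFold(n_splits=10), scoring="r2")
print(f"site-held-out R2: {site_r2.mean():.3f}")
```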
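And for the representation-bias question in the third major comment, the check the reviewer asks for amounts to recomputing the regional statistics from only the grid pixels that contain an AURN site. A minimal sketch, with the file, variable and coordinate names all assumptions:

```python
import pandas as pd
import xarray as xr

field = xr.open_dataset("downscaled_mda8_1km.nc")["mda8"]  # hypothetical file
sites = pd.read_csv("aurn_site_locations.csv")             # site_id, x, y

# Nearest-pixel extraction at each monitoring site (vectorised pointwise
# indexing), then compare the site-sampled mean with the all-pixel mean.
at_sites = field.sel(x=xr.DataArray(sites["x"]),
                     y=xr.DataArray(sites["y"]), method="nearest")
print("all-pixel regional mean:", float(field.mean()))
print("site-pixel mean:        ", float(at_sites.mean()))
```

A large gap between the two means would indicate that regional model statistics are not directly comparable with station averages.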
Minor Comments:
Line 74: Delete “ozone” at the end of the line.
Line 100: Add a comma after “In Section 4”.
Line 141: Should that be 2001-2017?
Line 153: Would it be better to compare with NAEI emission data for e.g. 2019 rather than 2020, as the latter was influenced by COVID-19 and probably overstates the true emission decrease?
Line 175: Distance to road acts as a proxy for NO2 concentration. Is this relationship linear or treated as non-linear in your work? I suspect there would be a sharp drop-off in NO2 concentration with distance.
Line 178: Is this definition of London based on a subjective choice?
Line 185: Would it not make sense to get data for Northern Ireland as well since you have got extra data sets for the other nations of the UK?
Line 188: Is 3 years a long enough record for a surface site to be included in the analysis? Since you are using this data to produce a data set for trend analysis, I would expect 5 years at least, but 10 years would be better.
Lines 208-209: What quantitative approach do you use to implement Steps 3 and 4? I.e., what criterion is used to determine that there is no improved predictive skill?
Line 239: What do you mean by “long-tail”? Not normally distributed?
Line 401: Defra 2021b reference style needs changing.
Figure 9 and Lines 495-497: I do not see the correlations between the temperature anomaly and the number of days where data > 100 µg/m3. Please quantify this relationship. Also, you are comparing the number of days per heatwave with the temperature anomaly. How long were the heatwaves? Are the sample sizes sufficient in time to get a relationship between ΔT and the number of days?
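A minimal sketch of the quantification requested here, assuming a hypothetical table with one row per heatwave event:

```python
import pandas as pd
from scipy.stats import pearsonr, spearmanr

# Hypothetical table: one row per heatwave event, with its mean temperature
# anomaly and its count of days with MDA8 > 100 ug/m3.
heatwaves = pd.read_csv("heatwave_events.csv")

r, p = pearsonr(heatwaves["temp_anomaly"], heatwaves["n_days_over_100"])
rho, p_s = spearmanr(heatwaves["temp_anomaly"], heatwaves["n_days_over_100"])
print(f"n = {len(heatwaves)} events; Pearson r = {r:.2f} (p = {p:.3f}); "
      f"Spearman rho = {rho:.2f} (p = {p_s:.3f})")
```

Reporting n alongside the correlation and p-value would directly address the sample-size concern.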
Figure A3 and similar: Instead of plotting all the data points for each year, would it be better to show the percentile values (e.g. 10, 25, 50, 75 and 90 %), as this might be clearer and tell you more about the distribution for that year? With all data points plotted, some of the detail is difficult to resolve with the naked eye.
Figure A5: The AURN data tends to be more variable per year than the model data and downscaled data. Why is this, and would it have an important impact on your downscaling approach? From what I can see, the downscaled data is struggling to capture the full variability in the observations, especially for London.
Citation: https://doi.org/10.5194/egusphere-2023-632-RC2
AC1: 'Response to reviewer comments', Lily Gouldsbrough, 21 Sep 2023
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-632/egusphere-2023-632-AC1-supplement.pdf
Lily Gouldsbrough, Emma Eastoe, Paul J. Young, and Massimo Vieno