the Creative Commons Attribution 4.0 License.
Estimating surface sulfur dioxide concentrations from satellite data: Using chemical transport models vs. machine learning
Abstract. Sulfur dioxide (SO2) is an important air pollutant that contributes to negative health effects, acid rain, and aerosol formation and growth. SO2 has been measured using ground-based air quality monitoring networks, but the routine monitoring sites are predominantly placed in urban areas, leaving large gaps in the network in less populated locations. Previous studies have used chemical transport models (CTMs) or machine learning techniques to estimate surface SO2 concentrations from satellite vertical column densities, but no direct comparisons between the methods have been made. In this study, we estimated surface SO2 concentrations using Ozone Monitoring Instrument (OMI) retrievals over eastern China from 2015–2018 utilizing GEOS-Chem simulations and an extreme gradient boosting machine learning model. Compared to the in situ measurements, the SO2 concentrations estimated from the CTM method had similar spatial distributions (r = 0.58) and intra- and interannual variations but were underestimated (slope = 0.24) with a relative percent error of ~75 % and had worsening performance over time. The machine learning method produced more accurate spatial distributions (r = 0.77) and temporal variations, a smaller discrepancy and bias (~30 %; slope = 0.69) and relatively stable performance over time. The machine learning method performed better than the GEOS-Chem method on smaller datasets and timescales with shorter temporal averaging periods. Ultimately, both methods were useful for estimating surface SO2 concentrations since the CTM-based method does not rely on in situ monitoring and produced more reasonable spatial distributions than the machine learning method over areas without surface monitoring data.
Status: open (until 06 Jun 2025)
RC1: 'Comment on egusphere-2025-1735', Anonymous Referee #1, 16 May 2025
This study examines the ability of two methods to translate satellite column measurements of SO2 into surface concentrations. The authors focus their study over eastern China where there are substantial point sources of SO2. They compare and contrast the abilities of these two methods – one involving the GEOS-Chem model and the other a machine learning model – to reproduce in situ surface measurements of SO2 across their study region. They find that the machine learning model is generally better at reproducing observed spatial and temporal variation, although the GEOS-Chem model approach also did a good job. They highlight that the GEOS-Chem model approach is typically better in regions where in situ data are absent.
The following comments are intended to improve the value of the study to a wider readership.
The authors mention methodological uncertainties throughout the paper but this reviewer didn’t see any summation of these uncertainties reported alongside the surface SO2 concentration estimates. This kind of information would help any potential user to assess the usefulness of the reported data. This reviewer recommends that the authors explicitly state the origin and magnitude of each source of uncertainty. For example, using one year of model output to interpret multiple years of satellite data introduces an uncertainty that the authors have reported.
Line 127: increasing the time steps by 50%. Please assure this reader that this adjustment does not violate the CFL condition.
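As a rough illustration of the check requested here, the sketch below computes a one-dimensional Courant number, C = u Δt / Δx, for a simple explicit advection step. The wind speed, time step, and grid spacing are hypothetical placeholders, not values from the manuscript, and the stability limit of the advection scheme actually used in GEOS-Chem may differ from the C ≤ 1 threshold assumed here.

```python
def courant_number(wind_speed_m_s, dt_s, dx_m):
    """Return the 1-D Courant number C = u * dt / dx."""
    if dx_m <= 0 or dt_s <= 0:
        raise ValueError("dt_s and dx_m must be positive")
    return abs(wind_speed_m_s) * dt_s / dx_m


def satisfies_cfl(wind_speed_m_s, dt_s, dx_m, c_max=1.0):
    """Check C <= c_max; c_max = 1.0 is an assumed limit for a simple explicit scheme."""
    return courant_number(wind_speed_m_s, dt_s, dx_m) <= c_max


# Hypothetical numbers for illustration only (not from the manuscript).
base_dt_s = 400.0           # assumed baseline transport time step
longer_dt_s = base_dt_s * 1.5  # time step increased by 50 %
dx_m = 25_000.0             # assumed grid spacing
u_max_m_s = 30.0            # assumed maximum wind speed

for dt in (base_dt_s, longer_dt_s):
    c = courant_number(u_max_m_s, dt, dx_m)
    print(f"dt = {dt:.0f} s, Courant number = {c:.2f}, CFL ok: {satisfies_cfl(u_max_m_s, dt, dx_m)}")
```

Reporting the largest Courant number encountered in the nested domain, before and after the time-step change, would be one way to answer this comment.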
In the description of the GEOS-Chem technique, this reviewer was curious about SO2 retrievals with little or no sensitivity to the surface, perhaps due to elevated aerosols over industrialised regions. In those cases, perhaps the retrievals are removed from further analysis, but the authors might also be misattributing an SO2 column with elevated values in the free troposphere to changes at the surface. This might help to explain the results shown in Figure 3. This reviewer also wonders whether it might explain why boundary layer height is the single most important predictor for the machine learning model (Figure 5). At least some discussion is needed about this point.
Line 140: Please explain to the reader why 40 km was chosen. Do alternative values significantly change the results?
Line 174: Is it normal practice to use so much data for training? Later in this paragraph the authors mention the resulting machine learning model overfitting the data. Have the authors considered using fewer data to train and more data to test the resulting model?
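A minimal sketch of the kind of sensitivity test suggested here: a random holdout split whose test fraction can be varied, so that model skill on the held-out data can be compared across splits. The function name, sample count, and fractions are hypothetical and are not taken from the manuscript; the authors' actual splitting procedure may differ (for example, splitting by monitoring site rather than by sample).

```python
import random


def holdout_split(n_samples, test_fraction, seed=0):
    """Shuffle sample indices and return (train_indices, test_indices)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# Hypothetical test fractions; refit the model for each and compare test-set skill.
for fraction in (0.1, 0.3, 0.5):
    train_idx, test_idx = holdout_split(1000, fraction)
    print(f"test fraction {fraction:.1f}: {len(train_idx)} train, {len(test_idx)} test")
```

A widening gap between training and test performance as the training share grows would point to the overfitting mentioned later in the paragraph.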
Line 420: This reviewer may have missed this point in the manuscript, but I didn’t see any evidence that the GEOS-Chem approach reproduced the temporal variation observed in the CNEMC in situ data. Figure 7 shows a muted seasonal cycle with a correlation coefficient typically less than 0.4, so the model captures at most 20% of the observed variation.
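The arithmetic behind this point can be sketched as follows, assuming a simple linear relationship in which the fraction of variance explained equals the square of the Pearson correlation coefficient. The function name is illustrative only.

```python
def explained_variance_fraction(r):
    """Fraction of variance explained under a linear fit: r squared."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return r * r


# Correlation values quoted in the abstract and in this comment.
for label, r in (("upper bound for Figure 7", 0.4), ("CTM spatial", 0.58), ("ML spatial", 0.77)):
    print(f"{label}: r = {r:.2f}, r^2 = {explained_variance_fraction(r):.2f}")
```

Under this assumption, r = 0.4 corresponds to about 16 % of the variance, which is consistent with the "at most 20%" bound stated above.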
Minor points
Line 108: regridding does not result in good data quality.
Line 120: The current version of GEOS-Chem bears little resemblance to the model described by Bey et al., 2001. This reviewer strongly suggests using a more recent reference.
Line 125: When stating horizontal resolution, this reviewer suggests labelling which value is latitude and which is longitude.
Line 136: difference in (horizontal) resolution…
Citation: https://doi.org/10.5194/egusphere-2025-1735-RC1