Estimating surface sulfur dioxide concentrations from satellite data: Using chemical transport models vs. machine learning

Watson, Zachary; Li, Can; Liu, Fei; Freeman, Sean W.; Zhang, Huanxin; Wang, Jun; Lee, Shan-Hu

doi:10.5194/egusphere-2025-1735

Preprints

https://doi.org/10.5194/egusphere-2025-1735

25 Apr 2025

Abstract. Sulfur dioxide (SO₂) is an important air pollutant that contributes to negative health effects, acid rain, and aerosol formation and growth. SO₂ has been measured using ground-based air quality monitoring networks, but the routine monitoring sites are predominantly placed in urban areas, leaving large gaps in the network in less populated locations. Previous studies have used chemical transport models (CTMs) or machine learning techniques to estimate surface SO₂ concentrations from satellite vertical column densities, but no direct comparisons between the methods have been made. In this study, we estimated surface SO₂ concentrations using Ozone Monitoring Instrument (OMI) retrievals over eastern China from 2015–2018 utilizing GEOS-Chem simulations and an extreme gradient boosting machine learning model. Compared to the in situ measurements, the SO₂ concentrations estimated from the CTM method had similar spatial distributions (r = 0.58) and intra- and interannual variations but were underestimated (slope = 0.24) with a relative percent error of ~75 % and had worsening performance over time. The machine learning method produced more accurate spatial distributions (r = 0.77) and temporal variations, a smaller discrepancy and bias (~30 %; slope = 0.69) and relatively stable performance over time. The machine learning method performed better than the GEOS-Chem method on smaller datasets and timescales with shorter temporal averaging periods. Ultimately, both methods were useful for estimating surface SO₂ concentrations since the CTM-based method does not rely on in situ monitoring and produced more reasonable spatial distributions than the machine learning method over areas without surface monitoring data.
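The comparison statistics quoted in the abstract (correlation r, regression slope, relative percent error) can be illustrated with a minimal sketch. The function name `evaluate`, the choice to regress estimates on observations, and the definition of relative error as the mean absolute difference divided by the observation are assumptions made here for illustration; the preprint's exact definitions are not given on this page.

```python
import math


def evaluate(estimated, observed):
    """Compare estimated surface concentrations with in situ observations.

    Returns (pearson_r, slope, mean_relative_percent_error).
    The slope is from an ordinary least-squares fit of estimate vs. observation
    (an assumed convention, not confirmed by the source).
    """
    n = len(observed)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((o - mean_obs) ** 2 for o in observed)
    syy = sum((e - mean_est) ** 2 for e in estimated)
    sxy = sum((o - mean_obs) * (e - mean_est) for o, e in zip(observed, estimated))

    pearson_r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx

    # Assumed definition: mean of |estimate - observation| / observation, in percent.
    rel_err = 100.0 * sum(abs(e - o) / o for o, e in zip(observed, estimated)) / n
    return pearson_r, slope, rel_err


# Hypothetical values: estimates that are exactly half of each observation.
r, slope, err = evaluate([5.0, 10.0, 15.0, 20.0], [10.0, 20.0, 30.0, 40.0])
print(r, slope, err)
```

In this made-up example the estimates track the observations perfectly (r = 1) yet are biased low (slope = 0.5), which is the same kind of pattern the abstract describes for the CTM-based estimates: reasonable correlation combined with a slope well below one.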

Received: 11 Apr 2025 – Discussion started: 25 Apr 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 2448 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Supplement (2333 KB)

Journal article(s) based on this preprint

23 Oct 2025

Estimating surface sulfur dioxide concentrations from satellite data over eastern China: Using chemical transport models vs. machine learning

Zachary Watson, Can Li, Fei Liu, Sean W. Freeman, Huanxin Zhang, Jun Wang, and Shan-Hu Lee

Atmos. Chem. Phys., 25, 13527–13545, https://doi.org/10.5194/acp-25-13527-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1735', Anonymous Referee #1, 16 May 2025

This study examines the ability of two methods to translate satellite column measurements of SO2 into surface concentrations. The authors focus their study over eastern China where there are substantial point sources of SO2. They compare and contrast the abilities of these two methods – one involving the GEOS-Chem model and the other a machine learning model – to reproduce in situ surface measurements of SO2 across their study region. They find that the machine learning model is generally better at reproducing observed spatial and temporal variation, although the GEOS-Chem model approach also did a good job. They highlight that the GEOS-Chem model approach is typically better in regions where in situ data are absent.
The following comments are intended to improve the value of the study to a wider readership.
The authors mention methodological uncertainties throughout the paper but this reviewer didn’t see any summation of these uncertainties reported alongside the surface SO2 concentration estimates. This kind of information would help any potential user to assess the usefulness of the reported data. This reviewer recommends that the authors explicitly state the origin and magnitude of each source of uncertainty. For example, using one year of model output to interpret multiple years of satellite data introduces an uncertainty that the authors have reported.
Line 127: increasing the time steps by 50%. Please assure this reader that this adjustment does not violate the CFL condition.
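The referee's concern can be made concrete with the standard one-dimensional Courant number, given here as a general sketch. The symbols u (wind speed), Δt (transport time step), Δx (grid spacing) and C_max (the stability limit of the advection scheme) are notation introduced for illustration; the preprint's actual time steps and the limit that applies to its advection scheme are not stated on this page.

```latex
C = \frac{u\,\Delta t}{\Delta x} \le C_{\max},
\qquad
\Delta t' = 1.5\,\Delta t
\;\Longrightarrow\;
C' = \frac{u\,(1.5\,\Delta t)}{\Delta x} = 1.5\,C .
```

At fixed wind speed and grid spacing, a 50% longer time step therefore raises the Courant number by the same 50%, so the adjustment is safe only if 1.5 C still stays within the limit of the scheme in use.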
In the description of the GEOS-Chem technique, this reviewer was curious about SO2 retrievals with little or no sensitivity to the surface, perhaps due to elevated aerosols over industrialised regions. In those cases, perhaps the retrievals are removed from further analysis but the authors might also be misallocating an SO2 column with elevated values in the free troposphere to changes in the surface. This might help to explain the results shown in Figure 3. This reviewer is also wondering whether it might also explain why boundary layer height is the single most important predictor for the machine learning model (Figure 5). At least some discussion is needed about this point.
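The misallocation described in this comment can be illustrated with a column-to-surface scaling that is common in the literature: the satellite column is multiplied by the modeled ratio of surface concentration to column amount. This is a sketch under that assumption, not necessarily the preprint's exact procedure; the function name `surface_from_column`, the `min_column` threshold, and all numbers are hypothetical.

```python
def surface_from_column(sat_column, model_surface, model_column, min_column=1e-6):
    """Scale satellite columns by a modeled surface-to-column ratio, cell by cell.

    sat_column    -- satellite vertical column densities (e.g. Dobson units)
    model_surface -- modeled surface concentrations (e.g. micrograms per m^3)
    model_column  -- modeled columns in the same units as sat_column
    Cells whose modeled column is at or below min_column yield None,
    because the ratio is then undefined or unstable.
    """
    if not (len(sat_column) == len(model_surface) == len(model_column)):
        raise ValueError("all inputs must have the same length")

    estimates = []
    for omega_sat, s_mod, omega_mod in zip(sat_column, model_surface, model_column):
        if omega_mod <= min_column:
            estimates.append(None)
        else:
            # Surface estimate = satellite column * (model surface / model column).
            estimates.append(omega_sat * s_mod / omega_mod)
    return estimates


# Hypothetical grid cells: the third has no usable modeled column.
print(surface_from_column([0.5, 1.0, 0.3], [10.0, 8.0, 2.0], [0.25, 2.0, 0.0]))
```

Under this formulation the estimate inherits the model's vertical profile: if the real atmosphere holds more SO2 in the free troposphere than the model assumes, the ratio assigns too much of the observed column to the surface.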
Line 140: Please explain to the reader why 40 km was chosen. Do alternative values significantly change the results?
Line 174: Is it normal practice to use so much data for training? Later in this paragraph the authors mention the resulting machine learning model overfitting the data. Have the authors considered using fewer data to train and more data to test the resulting model?
Line 420: This reviewer may have missed this point in the manuscript, but I didn’t see any evidence that the GEOS-Chem approach reproduced temporal distribution observed by the CNEMC in situ data. Figure 7 shows a muted seasonal cycle with a correlation coefficient of typically less than 0.4 so the model only captures at most 20% of the observed variation.
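The arithmetic behind the "at most 20%" figure follows from the coefficient of determination of a simple linear fit, shown here as a short worked step (the symbol r is the correlation coefficient quoted by the referee):

```latex
r < 0.4 \;\Longrightarrow\; r^{2} < 0.4^{2} = 0.16 ,
```

so a linear relationship with these correlations explains less than about 16% of the observed variance, consistent with the referee's rounded upper bound.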
Minor points
Line 108: regridding does not result in good data quality.
Line 120: The current version of GEOS-Chem bears little resemblance to the model described by Bey et al., 2001. Strongly suggest using a more up-to-date reference.
Line 125: When stating horizontal resolution, this reviewer suggests you label which of the values represents latitude and longitude.
Line 136: difference in (horizontal) resolution…

Citation: https://doi.org/10.5194/egusphere-2025-1735-RC1
- AC1: 'Authors' Reply to RC1', Zachary Watson, 15 Aug 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1735/egusphere-2025-1735-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1735-AC1
RC2:
'Comment on egusphere-2025-1735', Anonymous Referee #2, 26 May 2025

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1735/egusphere-2025-1735-RC2-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2025-1735-RC2
- AC2: 'Authors' Reply to RC2', Zachary Watson, 15 Aug 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1735/egusphere-2025-1735-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1735-AC2
- AC3: 'Authors' Reply to RC2', Zachary Watson, 15 Aug 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1735/egusphere-2025-1735-AC3-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-1735-AC3

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Zachary Watson on behalf of the Authors (15 Aug 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (03 Sep 2025) by Farahnaz Khosrawi

RR by Anonymous Referee #2 (15 Sep 2025)

ED: Publish subject to technical corrections (16 Sep 2025) by Farahnaz Khosrawi

AR by Zachary Watson on behalf of the Authors (16 Sep 2025) Author's response Manuscript

Supplement

https://doi.org/10.5194/egusphere-2025-1735-supplement

Viewed

Total article views: 3,045 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
2,367	586	92	3,045	227	109	158

Views and downloads (calculated since 25 Apr 2025)

Month	HTML	PDF	XML	Total
Apr 2025	146	26	10	182
May 2025	182	32	12	226
Jun 2025	104	18	6	128
Jul 2025	50	40	4	94
Aug 2025	300	78	2	380
Sep 2025	1,116	112	16	1,244
Oct 2025	46	44	2	92
Nov 2025	50	28	8	86
Dec 2025	74	82	6	162
Jan 2026	74	34	8	116
Feb 2026	102	24	4	130
Mar 2026	50	28	6	84
Apr 2026	38	19	1	58
May 2026	21	9	3	33
Jun 2026	6	6	2	14
Jul 2026	8	6	2	16

Latest update: 25 Jul 2026

Short summary

Air pollutants like sulfur dioxide cause direct impacts on human health and the environment. Our work estimated surface concentrations from satellite data using atmospheric models and machine learning compared to an air quality monitoring network. We found that both methods can accurately determine the locations and changes in sulfur dioxide, but the machine learning method had better accuracy. Both methods are useful for monitoring air quality in locations without ground-based measurements.

