A machine-learning reference dataset for SO₂ plumes observed by TROPOMI: uncertainties and emission estimates

Finch, Douglas P.; Palmer, Paul I.

doi:10.5194/egusphere-2025-5900

Preprints


15 Jan 2026


Abstract. Sulphur dioxide (SO₂) is a major atmospheric pollutant from fossil fuel combustion, metal smelting, and volcanic degassing, impacting human health, acid deposition, and climate forcing. Existing emission inventories are often temporally lagged and spatially coarse, failing to capture high-intensity, sporadic events. To address this, we present a novel, near real-time approach using a U-Net image segmentation model to automatically isolate SO₂ plumes from over 31,000 TROPOMI satellite swaths (Jan 2019–Dec 2024). The model successfully identified 53,993 individual plumes. The highest annual detection rate in 2019 was attributed to massive stratospheric SO₂ injections from the Raikoke and Ulawun volcanic eruptions. Clustering analysis confirmed plume origins around expected volcanic and industrial hotspots (e.g., Iztaccíhuatl, Norilsk), with volcanic sources dominating the top ten clusters. We derived rapid, physics-informed emission rate estimates for each plume, finding a median rate of 14,629 kg hr⁻¹. The detection threshold for this approach, which we estimate to be ~524 kg hr⁻¹, is four orders of magnitude larger than typical fluxes in the EDGAR inventory, demonstrating the utility of the plume database for detecting extreme, high-intensity events. However, the algorithm struggles to detect sources in high-background regions like China, where high SO₂ saturation likely prevents individual plume isolation. This study demonstrates machine learning as a powerful tool for transforming atmospheric monitoring, providing the high-cadence, fine-grained quantification of SO₂ emissions crucial for validating global inventories and ensuring effective environmental management.

Received: 27 Nov 2025 – Discussion started: 15 Jan 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.


Status: final response (author comments only)

CC1: 'Comment on egusphere-2025-5900', Pascal Hedelt, 16 Jan 2026

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2025-5900/egusphere-2025-5900-CC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2025-5900-CC1
- AC1: 'Reply on CC1', Douglas Finch, 20 Jan 2026
  Thank you for providing comments on our paper. We will address the comments in full alongside the comments from the other reviewers when we receive them. However, there are a number of points we would like to address now:
  Dataset availability: We agree that published data, particularly a reference dataset, should be available without having to request the data from the authors, and that was our intention once our study had been accepted for publication. To address the immediate request, we have uploaded the data to the following repository: https://zenodo.org/records/18302024 . These data are open access.
  
  The appropriateness of the manuscript title: We disagree that the title is misleading. The dataset is a reference to what can be detected using machine learning on the TROPOMI data. We will clarify our methods in the updated manuscript to expand on error characterization and model training. Given these revisions and the inclusion of open-access data, we believe the current manuscript title remains appropriate.
  
  Including the TROPOMI SO2 detection flag in training the ML model: The purpose of the ML model is to create a system that does not rely on prior knowledge of emission sources. As described in the ATBD, the detection flag uses proximity to known sources to attribute an emission label. While this is very useful in many cases, training the model using these data may lead to the model not capturing any new or unknown sources of SO2. We acknowledge that the detection flag may help the ML algorithm determine whether some detections are true or false (particularly with low SZA), but we believe it would also introduce errors relating to plumes detected in regions away from known sources. As this method for creating the training dataset relies on the judgment of the authors, including more variables (e.g. SZA, albedo and cloud cover) would not necessarily result in a more accurate model, as the initial plume judgement may be incorrect. However, an investigation into this may be of interest in any future studies. We will clarify our method for creating the training dataset in the updated manuscript.
  
  Missing references: Thank you for highlighting these references; we will include them in the updated manuscript.
  
  Citation: https://doi.org/10.5194/egusphere-2025-5900-AC1
CC2: 'Comment on egusphere-2025-5900', Alexander Ukhov, 23 Jan 2026

I enjoyed reading this manuscript. The construction of a large SO2 plume database using a U‑Net segmentation approach has value for the community.
A minor concern is the reliance on a relatively small set of manually drawn plume masks (1000 plumes) for training. Manual plume labeling is very subjective.

One possible workaround is to generate plume masks using a Lagrangian dispersion model (e.g., FLEXPART-WRF driven by WRF winds).
We used FLEXPART-WRF in our recent work (Ukhov et al., 2025, JGR Atmospheres, https://doi.org/10.1029/2025JD043334) for SO2 point sources and emphasized that dense source clusters and incomplete background removal can bias plume-based top-down estimates, which is relevant to your discussion of regions where individual plumes cannot be isolated.
Question: how does your post-processing handle overlapping or merged plumes from closely spaced sources? For example, how do you avoid double-counting in the emission-rate estimate?

Citation: https://doi.org/10.5194/egusphere-2025-5900-CC2
RC1: 'Comment on egusphere-2025-5900', Anonymous Referee #1, 29 Jan 2026

General comments:
The manuscript entitled "A machine-learning reference dataset for SO₂ plumes observed by TROPOMI: uncertainties and emission estimates", co-authored by Douglas P. Finch and Paul I. Palmer, presents a machine-learning method aimed at the quantification of SO₂ plumes from point sources. The authors consider this method a demonstration of the transformative role of machine learning for validation of emission inventories and for effective environmental management.
My impression is that what this study really demonstrates is a novel and efficient algorithm for a first-guess detection of SO₂ plumes (less than 15 minutes for the entire globe, according to the authors' estimate!). However, it does not demonstrate a reliable quantification of emissions. In this sense, I think the title of the manuscript should highlight the method ("machine learning for detection") and not the product ("reference dataset").
Despite this limitation, the approach to detect plumes globally and efficiently is an important contribution as it may provide a quick screening and identification of plumes. This would be particularly important for the current and upcoming geo-stationary satellite missions (GEMS, TEMPO, Sentinel-4), as indicated by the authors. However, the paper only shows its application to TROPOMI, and it is very likely that the application of the method to other missions will require a lot of mission-specific adaptations.
I find the method for estimating emission strength too simplified to provide reliable results, and therefore a meaningful comparison with emission inventories is not possible. Moreover, there are evident limitations regarding source attribution, especially for volcanoes, and a disregard for recent literature on the "classical" methods for SO₂ emission quantification.
Although the manuscript is easy to follow, the figures need more details and there are some errors, as indicated in the "technical corrections".
As a structural change, if these comments are not addressed, it would be better to skip the emission quantification and comparison with inventory, and concentrate on the plume-identification method.
Specific comments:
SC1 - In my opinion, the manuscript's main methodological innovation is the implementation of the U-Net architecture for plume segmentation. As such, it would be very informative to provide more details of how the architecture was conceived, e.g. what guided the choice of hyperparameters, such as the number of blocks in relation to memory constraints, the type of activation or normalization, etc.
SC2 - It is understood that the authors trained their model only using images that show the presence of a plume. Why not also use images without a plume for the training?
SC3 - The size of the images used for the segmentation model is such that it would correspond to straight plume paths of <90 km, for which plume ages may be 1-25 hours, for typical ranges of transport speed. Continuous sources will most likely produce longer plumes, and this may result in a tendency of the method to split big plumes into smaller ones. The size of the images also makes the method very sensitive to "polluted" scenes, i.e. scenes with noisy background, which may be dominant for unsteady plumes, changing winds, or big emission events. What would be the computational cost or other penalty for extending the size of the input images?
SC4 - The emission estimation is based on fitting an ellipse to the identified plume, dividing the mass of the masked plume by the major axis of the fitted ellipse and multiplying this ratio by the wind speed (here taken as the ERA5 10-m wind fields provided with the L2 TROPOMI data products). The metric used to validate the approach is the ratio of the plume area to the fitted area. I find the method too simplistic, or erroneous, for several reasons:
- SC4-a - Quantifying the emission rate as the total mass in the area of the plume divided by the plume length and multiplied by the plume mean velocity is not erroneous by itself and it is used by other established methods. However, the way that the variables are estimated in this study may only result in reasonable emission estimates for "well-behaved" plumes that result from steady emission at stable altitude and with stable winds. In reality, weak plumes (e.g. from industrial sources or passive volcanic degassing at low altitude) may be severely distorted by interaction with the ground, and large plumes may be too big to fit the assumptions behind this method. Variable source and wind conditions would result in plumes with heterogeneous shapes. Still, the area of these plumes may be similar to the area of a fitted ellipse, without this meaning that the mass-to-length ratios are similar. I think it would be better to divide the measured mass above background inside the masked plume by an "equivalent plume length". This plume length could be estimated by estimating a mean mass density (kg/m2) inside the fitted ellipse, so that the product of this mass density and the area of the ellipse is equal to the observed mass of the plume. From this relation you could obtain the equivalent plume length (mass divided by pi and by a factor for the eccentricity obtained from the fitting).
- SC4-b - The wind speed at 10 m would not be representative for most volcanoes, and providing statistics of this wind field is not informative because the winds at that level will most likely not be correlated with the winds at plume level.
I understand that implementing corrections for varying plume altitude (important to correct column densities and for the choice of wind speed) may be too time-consuming, going against the advantages of this method. Therefore, I conclude that the method, as presented here, may be good at identifying potential plumes, but not at quantifying the emission.
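The estimate described in SC4 amounts to Q = (M / L) × U, with M the mass of the masked plume, L the major axis of the fitted ellipse and U the wind speed. The following is a minimal sketch of that arithmetic only, not the authors' implementation. The function names, parameter names and example values are hypothetical, and the wind is assumed to be supplied as eastward (u) and northward (v) components in m/s.

```python
import math


def wind_speed(u_ms: float, v_ms: float) -> float:
    """Horizontal wind speed (m/s) from eastward and northward components."""
    return math.hypot(u_ms, v_ms)


def emission_rate_kg_per_hr(plume_mass_kg: float, major_axis_m: float,
                            u_ms: float, v_ms: float) -> float:
    """Mass divided by plume length, multiplied by wind speed, in kg/hr.

    Assumes a steady source and a single representative wind, which is
    exactly the simplification questioned in SC4-a and SC4-b.
    """
    if major_axis_m <= 0:
        raise ValueError("major_axis_m must be positive")
    rate_kg_per_s = plume_mass_kg / major_axis_m * wind_speed(u_ms, v_ms)
    return rate_kg_per_s * 3600.0  # seconds per hour


# Hypothetical plume: 50 t of SO2, 60 km long, wind components (3, 4) m/s.
example_rate = emission_rate_kg_per_hr(50_000.0, 60_000.0, 3.0, 4.0)
```

Because the result scales linearly with U, any bias in the chosen wind level (10 m versus plume altitude) carries over directly into the emission estimate.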
SC5 - Source attribution is also a problem with this method. Several sources, especially volcanic ones, are evidently wrong, e.g. Iztaccíhuatl -> Popocatépetl, Cerro Bravo -> Nevado del Ruiz, Nyiragongo -> Nyiragongo/Nyamuragira, Chimborazo -> Sangay, Ampato -> Sabancaya. I think it is correct to keep this wrong attribution as a consequence of the potential pitfall of the method, but a column with the most likely source should also be provided. The pitfall possibly originates from big column densities being discarded in the L2 data products of TROPOMI, or from unsteady emission.
SC6 - Given that the manuscript introduces a novel algorithm, it seems important to present it in relation to other established methods for plume identification and quantification. There is a clear lack of relevant references, even to the documents describing the TROPOMI products, and to the several approaches used for proper quantification of SO₂ emissions (wind rotation, divergence, delta-M, back-trajectory, disk method, etc.).
SC7 - All figures presenting maps should include latitude, longitude, time, scale of column density, and name of source.
Technical corrections:
L19 - SO₂ lifetime is too dependent on environmental conditions to provide a single, representative estimate.
L34 - Provide full name and reference for the EDGAR inventory.
L53 - Reference to Carn et al., 2017 seems misplaced here (Arellano et al., 2021?)
L57 - Missing reference for the Network for Observation of Volcanic and Atmospheric Change (Galle, Arellano, ...)
L79 - What were the filters used for VCD, cloud cover, SZA or number of pixels at the edge of the swath?
L106-107 - Sentence not justified (see above).
L170 - Make sure that pixel size was homogeneous, since the resolution of TROPOMI pixels changed during the period of study. Were these values resampled before their use for training and testing of the algorithm?
L184 - Check reference to "?".
L187 - Add year to reference to "Ester et al.".
Fig9 - More correct to refer the activity to the Svartsengi volcanic system.
L252 - The definition of the coefficient of variation is wrong. If this definition was used, then the conclusions are also wrong.
L256 - I recommend to compare with the CAMS-GLOB-VOLC dataset, which is based on satellite- and ground-based observations. However, the entire section may just as well be entirely discarded, considering that the emission estimates are too unreliable for a proper comparison.
The reviewer thanks the Editor of AMT for the opportunity to review this manuscript.
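For reference alongside the comment on L252, the coefficient of variation is conventionally defined as the standard deviation divided by the mean. The sketch below shows that standard definition only; it does not reproduce the definition used in the manuscript, which is not quoted here. The function name and sample values are hypothetical, and the population standard deviation is an assumption (a sample standard deviation would give a slightly larger value).

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (dimensionless)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    # Population standard deviation; an assumption made for this sketch.
    return statistics.pstdev(values) / abs(mean)


# Hypothetical emission rates (kg/hr) for one source.
rates = [8.0, 10.0, 10.0, 12.0]
cv = coefficient_of_variation(rates)
```

The ratio is independent of units, so rescaling all rates by a constant leaves it unchanged.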

Citation: https://doi.org/10.5194/egusphere-2025-5900-RC1
RC2: 'Comment on egusphere-2025-5900', Anonymous Referee #3, 05 Feb 2026
The authors use a U-Net image segmentation model to detect SO2 plumes and measure their emissions from TROPOMI data. The detection part is well-done and creates a valuable database of detected SO2 plumes. The emission calculations, while promising, could benefit from some refinement. I suggest reducing the focus on emissions unless the wind-related concerns are thoroughly addressed. The manuscript is on the right track for publication after the revisions below.
Model Performance: In line 104, the authors say the model's precision and recall are 65.7% and 74%. Does this mean 35% of predictions are incorrect and 26% of cases are missed? I assume this is decent for a U-Net model, but more explanation would help readers who are not familiar with U-Net understand the model's performance. Also, how does this accuracy compare to other plume detection techniques, including the authors' previous work? Providing precision and recall for volcano sources alone would be useful, as performance is likely better for larger sources. This would show that some errors come from the signal-to-noise level of SO2 observations.
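The interpretation asked about here follows from the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN), so 1 − precision is the fraction of detections that are false and 1 − recall is the fraction of true cases missed. A minimal sketch follows, using hypothetical counts chosen only to reproduce values close to those quoted; whether the manuscript's metrics are computed per pixel or per plume is not stated in this comment.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of predicted positives that are correct."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of actual positives that are detected."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts, not taken from the manuscript.
tp, fp, fn = 740, 386, 260

p = precision(tp, fp)       # close to 0.657
r = recall(tp, fn)          # 0.74
false_discovery = 1.0 - p   # share of detections that are wrong
miss_rate = 1.0 - r         # share of true cases that are missed
```

With a precision of 65.7%, the false share is 34.3% rather than exactly 35%, and a recall of 74% corresponds to a miss rate of 26%.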

Training Data: The training truth is based on manually selected grid cells. While this makes sense due to lack of better options, manual labeling introduces uncertainties. Discussing these uncertainties would be helpful.

Emission Calculation: Volcanic SO2 emissions have been estimated by Carn et al. and Vitali et al. More details can be found here: https://so2.gsfc.nasa.gov/. How is your method different? My main concern is the use of 10-meter U and V wind fields. As pointed out by the authors themselves, near-surface winds are not suitable for volcano plumes. While concerns about computation cost are valid, using winds at the correct height is essential for emission estimates. I recommend a sensitivity study to show the impact is limited, at least for one case. Otherwise, the emission estimates are less reliable. Comparing volcanic emissions with estimates by Carn et al. would also be strongly recommended.

Data Availability: I suggest the authors make the plume database publicly available to support further research.
Citation: https://doi.org/10.5194/egusphere-2025-5900-RC2
RC3: 'Comment on egusphere-2025-5900', Anonymous Referee #2, 09 Feb 2026

Review of Finch et al., "A machine-learning reference dataset for SO2 plumes observed by TROPOMI: uncertainties and emission estimates", submitted to Atmospheric Measurement Techniques.

Finch et al. presented an SO2 data set based on machine learning identification of plumes and emission estimation from TROPOMI. It is an interesting study, and I believe the work could be published after revision. The paper is well written and structured. Overall, the figures are of sufficient quality.
My main reservation is that the SO2 fluxes derived are not evaluated against other available estimates. Over recent years, several studies have reported SO2 top-down estimates using TROPOMI, but the main author does not cite those papers. In particular, Fioletov et al. 2023 (https://doi.org/10.5194/essd-15-75-2023) provides SO2 emission estimates and I would like to see a comparison between the emissions from this work and the results from Fioletov. This can be done for several representative SO2 sources (anthropogenic and volcanic) with stable emissions. This should come with a discussion of the pros and cons of the presented method. Apart from that, I agree with all the comments raised by Pascal Hedelt (some are repeated below) and the author should address them in the replies and revised manuscript.
Introduction
Line 33: “… with recent years showing lower values”. Please add a reference.
Line 57: NOVAC. Please add a reference to Galle et al. 2010 https://doi.org/10.1029/2009JD011823

Methodology: section 2.1
-Which SO2 product version was used? Recently, the TROPOMI SO2 product has switched to the COBRA algorithm, which is more sensitive to weak SO2 emissions. Did the author use these data? If not, why not?
-In line with P. Hedelt's comment, more information must be given on which SO2 column product was used (the main VCD and/or the 1, 7, 15 km VCD products) and what the impact of this choice is on the final result.
-A reference to the main papers, ATBD and product readme file should be added.
-The quality flag >0.5 applies to the main VCD product, which assumes an SO2 profile from pollution. This flag removes many of the cloudy pixels, which are still very useful for volcanic events where the SO2 plume lies over clouds (in this case, the 1, 7, 15 km VCD products are more appropriate). This is not assessed or discussed in the paper.
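The effect of the quality filter raised in the last point can be illustrated with a small sketch. It assumes per-pixel column values and quality values are available as equal-length sequences; the names `vcd` and `qa_value` and the example numbers are hypothetical, and only the >0.5 threshold comes from the comment above.

```python
import math


def apply_qa_filter(vcd, qa_value, threshold=0.5):
    """Replace pixels whose quality value is not above the threshold with NaN."""
    if len(vcd) != len(qa_value):
        raise ValueError("vcd and qa_value must have the same length")
    return [v if q > threshold else math.nan for v, q in zip(vcd, qa_value)]


def retained_fraction(qa_value, threshold=0.5):
    """Fraction of pixels that pass the quality filter."""
    if not qa_value:
        raise ValueError("qa_value must not be empty")
    return sum(q > threshold for q in qa_value) / len(qa_value)


# Hypothetical pixels: two are flagged as low quality (e.g. cloudy scenes).
columns = [1.0, 2.0, 3.0, 4.0]
quality = [0.4, 0.5, 0.9, 1.0]
filtered = apply_qa_filter(columns, quality)
kept = retained_fraction(quality)
```

Reporting the retained fraction per plume would show how much of a volcanic plume over cloud is discarded by the filter.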
Methodology: section 2.2
-line 100: in line with P. Hedelt, the 'manual creation of a precise plume mask' deserves a thorough description.
-It is not clear to me what the added value of the proposed plume detection is compared to the detection flag. The selective detection of SO2 from a hyperspectral instrument like TROPOMI is relatively straightforward, and because the SO2 background is negligible it is easy to identify the plumes. The proposed method would perform better for species like CH4 for which the background level is significant.

Methodology: section 2.3
-Line 35: typo: “To estimate the the emission”
-It would be informative to compare the ellipse main axis direction with the wind direction used to estimate the SO2 emissions. Do they compare well?
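One way to make the comparison suggested above is to compute the angle between the fitted major axis and the wind vector. The sketch below assumes the ellipse orientation is given in degrees counter-clockwise from east and the wind as eastward (u) and northward (v) components; these conventions and all names are hypothetical and are not taken from the manuscript. Because an ellipse axis has no head or tail, the difference is folded into the range 0 to 90 degrees.

```python
import math


def wind_angle_deg(u_ms: float, v_ms: float) -> float:
    """Direction the wind blows toward, in degrees counter-clockwise from east."""
    return math.degrees(math.atan2(v_ms, u_ms)) % 360.0


def axis_wind_misalignment_deg(ellipse_angle_deg: float,
                               u_ms: float, v_ms: float) -> float:
    """Smallest angle (0 to 90 degrees) between the major axis and the wind."""
    # The axis is undirected, so angles are compared modulo 180 degrees.
    diff = (ellipse_angle_deg - wind_angle_deg(u_ms, v_ms)) % 180.0
    return min(diff, 180.0 - diff)


# Hypothetical plume aligned south-west to north-east, wind toward north-east.
aligned = axis_wind_misalignment_deg(45.0, 1.0, 1.0)
# Same plume with a wind blowing toward the north.
offset = axis_wind_misalignment_deg(45.0, 0.0, 1.0)
```

A distribution of this misalignment over all plumes, clustered near zero, would support the use of the chosen wind field; a broad distribution would point to a mismatch between the 10-m wind and the transport at plume level.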

Section 3
-line 183: typo – a question mark appears in (Vernier et al., 2024;?).
-l184: about Peak I, if it is related to Norilsk, I don’t see why it appears only for this year.
-l195: about the high background SO2 concentrations. Over China, the SO2 levels are quite low. Indeed, there have been many regulations on SO2 emissions in China, and I think this statement is not true. Later, the main author argues on the high SO2 levels in China based on EDGAR inventory but it is not clear to me if EDGAR is up-to-date regarding the SO2 emissions level over China or not.
-section 3.5: the presentation of the emission database is minimal. It consists mainly of Figure 12 which is not very informative. I would like to see at least maps of emissions (global, regional, per emission type, etc). The comparison with EDGAR is also weak in my opinion. The author mainly describes why it is presumably not possible to compare. As a reader, it is not clear to me what this section is about.
Conclusions
The last sentence about the usefulness of the approach for the VAACs is doubtful. A simple VCD threshold mask (or detection flag) is enough to isolate the plumes.
Data availability
The dataset should be available as supplement or in a data repository
Acknowledgements
Please provide information on the ESA project supporting this work.

Citation: https://doi.org/10.5194/egusphere-2025-5900-RC3


Short summary

We have developed a machine learning tool to find emission plumes of sulphur dioxide (SO₂) observed in satellite data. SO₂ is an atmospheric pollutant from fuel combustion, metal smelting, and volcanic degassing, impacting health, acid deposition, and climate forcing. Over 6 years we find over 50,000 plumes, most of which are clustered around known sources (e.g. volcanoes or industrial hotspots). We show how this tool can be used to give near real-time estimates of emissions across the globe.

