the Creative Commons Attribution 4.0 License.
A machine-learning reference dataset for SO2 plumes observed by TROPOMI: uncertainties and emission estimates
Abstract. Sulphur dioxide (SO2) is a major atmospheric pollutant from fossil fuel combustion, metal smelting, and volcanic degassing, impacting human health, acid deposition, and climate forcing. Existing emission inventories are often temporally lagged and spatially coarse, failing to capture high-intensity, sporadic events. To address this, we present a novel, near real-time approach using a U-Net image segmentation model to automatically isolate SO2 plumes from over 31,000 TROPOMI satellite swaths (Jan 2019–Dec 2024). The model identified 53,993 individual plumes. The highest annual detection rate, in 2019, was attributed to massive stratospheric SO2 injections from the Raikoke and Ulawun volcanic eruptions. Clustering analysis confirmed plume origins around expected volcanic and industrial hotspots (e.g., Iztaccíhuatl, Norilsk), with volcanic sources dominating the top ten clusters. We derived rapid, physics-informed emission rate estimates for each plume, finding a median rate of 14,629 kg hr-1. The detection threshold for this approach, which we estimate to be ~524 kg hr-1, is four orders of magnitude larger than typical fluxes in the EDGAR inventory, demonstrating the utility of the plume database for detecting extreme, high-intensity events. However, the algorithm struggles to detect sources in high-background regions like China, where high SO2 saturation likely prevents individual plume isolation. This study demonstrates machine learning as a powerful tool for transforming atmospheric monitoring, providing the high-cadence, fine-grained quantification of SO2 emissions crucial for validating global inventories and ensuring effective environmental management.
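The abstract mentions a clustering analysis of plume origins. The clustering algorithm is not described on this page; Referee #1's request for the year of "Ester et al." suggests density-based clustering (DBSCAN), but that is an assumption. The sketch below is a minimal, dependency-free DBSCAN on (lat, lon) plume origins using great-circle distances; `eps_km` and `min_samples` are hypothetical parameters, not the authors' settings.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance (km) between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def dbscan(points: List[Tuple[float, float]], eps_km: float, min_samples: int) -> List[int]:
    """Label each plume origin with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"
    # Neighbourhoods include the point itself (distance 0).
    neighbours = [
        [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]
        for i in range(n)
    ]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # expand only from core points
    return [int(label) for label in labels]
```

This brute-force version is O(n²) in the number of plumes; for ~54,000 origins an indexed implementation such as scikit-learn's `DBSCAN` with `metric="haversine"` (coordinates in radians, `eps` in radians) would be more practical.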
Status: open (until 19 Feb 2026)
CC1: 'Comment on egusphere-2025-5900', Pascal Hedelt, 16 Jan 2026
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2025-5900/egusphere-2025-5900-CC1-supplement.pdf
Citation: https://doi.org/10.5194/egusphere-2025-5900-CC1
AC1: 'Reply on CC1', Douglas Finch, 20 Jan 2026
Thank you for providing comments on our paper. We will address the comments in full alongside the comments from the other reviewers when we receive them. However, there are a number of points we would like to address now:
- Dataset availability: We agree that published data, particularly a reference dataset, should be available without having to request them from the authors, and that was our intention once our study had been accepted for publication. To address the immediate request, we have uploaded the data, which are open access, to the following repository: https://zenodo.org/records/18302024
- The appropriateness of the manuscript title: We disagree that the title is misleading. The dataset is a reference to what can be detected using machine learning on the TROPOMI data. We will clarify our methods in the updated manuscript to expand on error characterization and model training. Given these revisions and the inclusion of open-access data, we believe the current manuscript title remains appropriate.
- Including the TROPOMI SO2 detection flag in training the ML model: The purpose of the ML model is to create a system that does not rely on prior knowledge of emission sources. As described in the ATBD, the detection flag uses proximity to known sources to attribute an emission label. While this is very useful in many cases, training the model on these data may prevent it from capturing new or unknown sources of SO2. We acknowledge that the detection flag may help the ML algorithm determine whether some detections are true or false (particularly with low SZA), but we believe it would also introduce errors for plumes detected in regions away from known sources. As this method for creating the training dataset relies on the judgment of the authors, including more variables (e.g. SZA, albedo & cloud cover) would not necessarily result in a more accurate model, as the initial plume judgment may be incorrect. However, this may be worth investigating in future studies. We will clarify our method for creating the training dataset in the updated manuscript.
- Missing references: Thank you for highlighting these references, we will include them in the updated manuscript.
Citation: https://doi.org/10.5194/egusphere-2025-5900-AC1
CC2: 'Comment on egusphere-2025-5900', Alexander Ukhov, 23 Jan 2026
I enjoyed reading this manuscript. The construction of a large SO2 plume database using a U-Net segmentation approach has value for the community.
A minor concern is the reliance on a relatively small set of manually drawn plume masks (1000 plumes) for training. Manual plume labeling is very subjective.
One possible workaround is to generate plume masks using a Lagrangian dispersion model (e.g., FLEXPART-WRF driven by WRF winds). We used FLEXPART-WRF in our recent work (Ukhov et al., 2025, JGR Atmospheres, https://doi.org/10.1029/2025JD043334) for SO2 point sources and emphasized that dense source clusters and incomplete background removal can bias plume-based top-down estimates, which is relevant to your discussion of regions where individual plumes cannot be isolated.
Question: how does your post-processing handle overlapping or merged plumes from closely spaced sources? For example, how do you avoid double-counting in the emission-rate estimate?
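The question above concerns how a segmentation mask is split into individual plumes. The post-processing used in the manuscript is not described on this page; one common approach, assumed here, is connected-component labelling. Because every masked pixel receives exactly one label, per-plume masses cannot double-count a pixel, but two touching plumes from neighbouring sources collapse into a single object, which is the merging problem raised. A minimal sketch with 8-connectivity (the function name is hypothetical):

```python
from collections import deque

import numpy as np


def label_components(mask):
    """8-connected component labelling of a boolean plume mask.

    Returns (labels, n): labels is 0 for background and 1..n for plumes.
    """
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    rows, cols = mask.shape
    n = 0
    for r0, c0 in zip(*np.nonzero(mask)):  # row-major scan of masked pixels
        if labels[r0, c0]:
            continue  # already part of an earlier plume
        n += 1
        labels[r0, c0] = n
        queue = deque([(r0, c0)])
        while queue:  # breadth-first flood fill of this plume
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and mask[rr, cc] and not labels[rr, cc]):
                        labels[rr, cc] = n
                        queue.append((rr, cc))
    return labels, n
```

SciPy offers the same operation as `scipy.ndimage.label(mask, structure=np.ones((3, 3)))`. Separating plumes that genuinely touch would need extra information, such as source locations or wind direction.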
Citation: https://doi.org/10.5194/egusphere-2025-5900-CC2
RC1: 'Comment on egusphere-2025-5900', Anonymous Referee #1, 29 Jan 2026
General comments:
The manuscript entitled "A machine-learning reference dataset for SO2 plumes observed by TROPOMI: uncertainties and emission estimates", co-authored by Douglas P. Finch and Paul I. Palmer, presents a machine-learning method aimed at the quantification of SO2 plumes from point sources. The authors consider this method a demonstration of the transformative role of machine learning for the validation of emission inventories and for effective environmental management.
My impression is that what this study really demonstrates is a novel and efficient algorithm for first-guess detection of SO2 plumes (less than 15 minutes for the entire globe, according to the authors' estimate). However, it does not demonstrate reliable quantification of emissions. In this sense, I think the title of the manuscript should highlight the method ("machine learning for detection") rather than the product ("reference dataset").
Despite this limitation, the approach to detect plumes globally and efficiently is an important contribution, as it may provide quick screening and identification of plumes. This would be particularly important for the current and upcoming geostationary satellite missions (GEMS, TEMPO, Sentinel-4), as indicated by the authors. However, the paper only shows its application to TROPOMI, and applying the method to other missions will very likely require substantial mission-specific adaptations.
I find the method for estimating emission strength too simplified to provide reliable results, so a meaningful comparison with emission inventories is not possible. Moreover, there are evident limitations regarding source attribution, especially for volcanoes, and recent literature on the "classical" methods for SO2 emission quantification is disregarded.
Although the manuscript is easy to follow, the figures need more details and there are some errors, as indicated in the "technical corrections".
As a structural change, if these comments are not addressed, it would be better to skip the emission quantification and comparison with inventory, and concentrate on the plume-identification method.
Specific comments:
SC1 - In my opinion, the manuscript's main methodological innovation is the implementation of the U-Net architecture for plume segmentation. As such, it would be very informative to provide more details on how the architecture was conceived, e.g. what guided the choice of hyperparameters, such as the number of blocks in relation to memory constraints, the type of activation or normalization, etc.
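SC1 asks how the number of U-Net blocks relates to memory. The manuscript's architecture is not given in this discussion, so the sketch below assumes a common Ronneberger-style layout: 2x2 pooling that halves the image side at each of `depth` levels, channels doubling per level, two convolutions per level and float32 activations. It estimates only the encoder feature-map memory for one forward pass; the image size and base filter count in the tests are hypothetical.

```python
from typing import List, Tuple


def unet_levels(image_size: int, depth: int, base_filters: int) -> List[Tuple[int, int]]:
    """(spatial side, channels) of each encoder level for a square input,
    assuming 2x2 pooling and channel doubling at every level."""
    if image_size % (2 ** depth) != 0:
        raise ValueError("image_size must be divisible by 2**depth")
    return [(image_size // 2 ** k, base_filters * 2 ** k) for k in range(depth + 1)]


def encoder_activation_mib(image_size: int, depth: int, base_filters: int,
                           convs_per_level: int = 2, bytes_per_value: int = 4,
                           batch_size: int = 1) -> float:
    """Rough memory (MiB) of encoder feature maps for one forward pass.

    Decoder activations, weights, gradients and optimizer state are ignored.
    """
    values = 0
    for side, channels in unet_levels(image_size, depth, base_filters):
        values += convs_per_level * side * side * channels
    return values * bytes_per_value * batch_size / 2 ** 20
```

Under these assumptions the input side must be divisible by 2**depth, and doubling the side of a square input quadruples activation memory, which also bears on the image-size question in SC3.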
SC2 - It is understood that the authors trained their model only on images that show the presence of a plume. Why not also use images without a plume for training?
SC3 - The size of the images used for the segmentation model corresponds to straight plume paths of <90 km, for which plume ages may be 1-25 hours for typical ranges of transport speed. Continuous sources will most likely produce longer plumes, and this may cause the method to split big plumes into smaller ones. The image size also makes the method very sensitive to "polluted" scenes, i.e. scenes with a noisy background, which may be dominant for unsteady plumes, changing winds, or big emission events. What would be the computational cost or other penalty of extending the size of the input images?
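As a quick check of the plume-age range in SC3: the age of air at the end of a straight path of length L advected at speed u is t = L/u. The quoted 1-25 h range corresponds to transport speeds of roughly 25 m s⁻¹ down to 1 m s⁻¹ (speeds inferred from the quoted numbers, not stated in the comment):

```latex
t = \frac{L}{u}, \qquad
\frac{90\ \mathrm{km}}{25\ \mathrm{m\,s^{-1}}} = 3.6\times10^{3}\ \mathrm{s} = 1\ \mathrm{h}, \qquad
\frac{90\ \mathrm{km}}{1\ \mathrm{m\,s^{-1}}} = 9.0\times10^{4}\ \mathrm{s} = 25\ \mathrm{h}.
```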
SC4 - The emission estimation is based on fitting an ellipse to the identified plume, dividing the mass of the masked plume by the major axis of the fitted ellipse and multiplying this ratio by the wind speed (here taken from the ERA5 10-m wind fields provided with the L2 TROPOMI data products). The metric used to validate the approach is the ratio of the plume area to the fitted area. I find the method too simplistic, or erroneous, for several reasons:
- SC4-a - Quantifying the emission rate as the total mass in the area of the plume divided by the plume length and multiplied by the plume mean velocity is not erroneous in itself, and it is used by other established methods. However, the way the variables are estimated in this study may only give reasonable emission estimates for "well-behaved" plumes that result from steady emission at a stable altitude and with stable winds. In reality, weak plumes (e.g. from industrial sources or passive volcanic degassing at low altitude) may be severely distorted by interaction with the ground, and large plumes may be too big to satisfy the assumptions behind this method. Variable source and wind conditions would result in plumes with heterogeneous shapes. The area of such plumes may still be similar to the area of a fitted ellipse without their mass-to-length ratios being similar. I think it would be better to divide the measured mass above background inside the masked plume by an "equivalent plume length". This length could be obtained by estimating a mean mass density (kg/m2) inside the fitted ellipse, such that the product of this mass density and the area of the ellipse equals the observed mass of the plume. From this relation you could obtain the equivalent plume length (mass divided by pi and by a factor for the eccentricity obtained from the fitting).
- SC4-b - The wind speed at 10-m would not be representative for most volcanoes, and providing statistics of this wind field is not informative because the winds at that level will most likely not be correlated with the winds at plume level.
I understand that implementing corrections for varying plume altitude (important for correcting column densities and for choosing the wind speed) may be too time-consuming, going against the advantages of this method. Therefore, I conclude that the method, as presented here, may be good at identifying potential plumes, but not at quantifying the emission.
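The estimate described in SC4 has the form Q = U·M/L, with M the SO2 mass above background in the mask, L the major axis of the fitted ellipse and U the wind speed. The minimal NumPy sketch below assumes equal square pixels, a moment-based ellipse fit (the manuscript's fitting routine is not specified here), column densities in mol m⁻² and a single representative wind speed; the 5 m s⁻¹ value is a hypothetical placeholder for the mean 10-m wind. `equivalent_length_m` is one possible reading of the SC4-a suggestion, using the fitted axis ratio b/a = sqrt(1 - e²) as the eccentricity factor. All function names are hypothetical.

```python
import numpy as np

SO2_MOLAR_MASS_KG_PER_MOL = 0.064066  # 64.066 g mol^-1


def ellipse_axes_m(mask, pixel_size_m):
    """Full major and minor axis lengths (m) of the ellipse with the same
    second moments as a boolean mask, assuming equal square pixels."""
    rows, cols = np.nonzero(mask)
    xy = np.column_stack([cols, rows]).astype(float) * pixel_size_m
    eigvals = np.linalg.eigvalsh(np.cov(xy, rowvar=False, bias=True))  # ascending
    # A uniformly filled ellipse has variance (semi-axis)**2 / 4 along each
    # principal axis, so the full axis length is 4 * sqrt(variance).
    minor, major = 4.0 * np.sqrt(np.clip(eigvals, 0.0, None))
    return float(major), float(minor)


def plume_mass_kg(vcd_mol_m2, mask, pixel_size_m, background_mol_m2=0.0):
    """SO2 mass above background inside the mask (kg)."""
    excess = np.clip(vcd_mol_m2[mask] - background_mol_m2, 0.0, None)
    return float(excess.sum() * pixel_size_m ** 2 * SO2_MOLAR_MASS_KG_PER_MOL)


def emission_rate_kg_hr(mass_kg, length_m, wind_speed_m_s):
    """Q = U * M / L, converted from kg s^-1 to kg hr^-1."""
    return mass_kg / length_m * wind_speed_m_s * 3600.0


def equivalent_length_m(mass_kg, mean_density_kg_m2, axis_ratio):
    """Scale an ellipse with the fitted axis ratio b/a until
    mean_density * area equals the plume mass; return its full major axis (m)."""
    # mass = rho * pi * a * b = rho * pi * a**2 * (b / a)
    semi_major = np.sqrt(mass_kg / (np.pi * mean_density_kg_m2 * axis_ratio))
    return float(2.0 * semi_major)


# Example: synthetic elliptical plume, 1 km pixels, uniform 2e-4 mol m^-2 excess.
yy, xx = np.mgrid[-30:31, -30:31]
mask = (xx / 20.0) ** 2 + (yy / 8.0) ** 2 <= 1.0
vcd = np.full(mask.shape, 2e-4)
major, minor = ellipse_axes_m(mask, 1000.0)
mass = plume_mass_kg(vcd, mask, 1000.0)
q_10m = emission_rate_kg_hr(mass, major, 5.0)
```

Because Q is linear in U, replacing the 10-m wind with the wind at plume height (the concern in SC4-b and in Referee #3's comment) rescales each estimate by the ratio of the two speeds, so a single-case sensitivity test is cheap.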
SC5 - Source attribution is also a problem with this method. Several sources, especially volcanic ones, are evidently wrong, e.g. Iztaccíhuatl -> Popocatépetl, Cerro Bravo -> Nevado del Ruiz, Nyiragongo -> Nyiragongo/Nyamuragira, Chimborazo -> Sangay, Ampato -> Sabancaya. I think it is correct to keep these wrong attributions as evidence of a potential pitfall of the method, but a column with the most likely source should also be provided. The pitfall possibly originates from large column densities being discarded in the L2 data products of TROPOMI, or from unsteady emission.
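If plumes or clusters are attributed to the nearest catalogued source (an assumption; the attribution rule is not stated on this page), a plume origin displaced a few kilometres downwind of Popocatépetl can end up nearer to Iztaccíhuatl, reproducing the first mis-attribution listed in SC5. The sketch below shows nearest-source attribution alongside a list of all candidates within a radius, which could populate the suggested "most likely source" column. The volcano coordinates are approximate, rounded summit positions for illustration only; the function names and radius are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_source(lat, lon, sources):
    """(name, distance_km) of the closest entry in sources {name: (lat, lon)}."""
    name = min(sources, key=lambda k: haversine_km(lat, lon, *sources[k]))
    return name, haversine_km(lat, lon, *sources[name])


def candidate_sources(lat, lon, sources, radius_km):
    """All sources within radius_km of the plume origin, nearest first."""
    hits = sorted((haversine_km(lat, lon, *xy), k) for k, xy in sources.items())
    return [k for d, k in hits if d <= radius_km]


# Approximate coordinates, for illustration only.
SOURCES = {"Popocatépetl": (19.02, -98.62), "Iztaccíhuatl": (19.18, -98.64)}
```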
SC6 - Given that the manuscript introduces a novel algorithm, it seems important to present it in relation to other established methods for plume identification and quantification. There is a clear lack of relevant references, even to the documents describing the TROPOMI products, and to the several approaches used for proper quantification of SO2 emissions (wind rotation, divergence, delta-M, back-trajectory, disk method, etc.).
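As an illustration of one of the classical approaches listed in SC6 (not of the manuscript's own algorithm), the flux-divergence method estimates emissions as the divergence of the horizontal flux V·c plus a first-order loss term c/τ. The minimal sketch below assumes a regular grid with constant spacing, winds already representative of plume height, and an optional lifetime τ; the function name is hypothetical.

```python
import numpy as np


def flux_divergence_emissions(vcd, u, v, dx_m, dy_m, lifetime_s=None):
    """Emission field E = d(u c)/dx + d(v c)/dy (+ c / tau) on a regular grid.

    vcd: column density (e.g. kg m^-2), rows along y and columns along x.
    u, v: wind components (m s^-1) on the same grid.
    Returns E in the column-density unit per second (e.g. kg m^-2 s^-1).
    """
    fx, fy = u * vcd, v * vcd  # horizontal flux components
    dfx_dx = np.gradient(fx, dx_m, axis=1)  # centred differences, one-sided at edges
    dfy_dy = np.gradient(fy, dy_m, axis=0)
    emissions = dfx_dx + dfy_dy
    if lifetime_s is not None:
        emissions = emissions + vcd / lifetime_s  # first-order chemical loss
    return emissions
```

In practice the divergence map is usually averaged over many overpasses before sources are identified, since single-scene divergence is noisy.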
SC7 - All figures presenting maps should include latitude, longitude, time, the column density scale, and the name of the source.
Technical corrections:
L19 - SO2 lifetime is too dependent on environmental conditions to provide a single, representative estimate.
L34 - Provide full name and reference for the EDGAR inventory.
L53 - Reference to Carn et al., 2017 seems misplaced here (Arellano et al., 2021?)
L57 - Missing reference for the Network for Observation of Volcanic and Atmospheric Change (Galle, Arellano, ...)
L79 - What were the filters used for VCD, cloud cover, SZA or number of pixels at the edge of the swath?
L106-107 - Sentence not justified (see above).
L170 - Make sure that pixel size was homogeneous, since the resolution of TROPOMI pixels changed during the period of study. Were these values resampled before their use for training and testing of the algorithm?
L184 - Check reference to "?".
L187 - Add year to reference to "Ester et al.".
Fig. 9 - It would be more correct to attribute the activity to the Svartsengi volcanic system.
L252 - The definition of the coefficient of variation is wrong. If this definition was used, then the conclusions are also wrong.
L256 - I recommend comparing with the CAMS-GLOB-VOLC dataset, which is based on satellite- and ground-based observations. However, the entire section might as well be discarded, considering that the emission estimates are too unreliable for a proper comparison.
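Two of the technical corrections above concern standard quantities. For the L252 comment, the usual coefficient of variation is the standard deviation divided by the mean (dimensionless); whether a sample (ddof=1) or population (ddof=0) standard deviation is used should be stated. For the L170 comment, one simple way to put pixels of differing size onto a common grid before training is to average them into regular latitude/longitude bins ("drop-in-the-box" regridding). Both helpers are minimal sketches with hypothetical names, and binning is only one of several possible resampling choices.

```python
import numpy as np


def coefficient_of_variation(values, ddof=1):
    """CV = standard deviation / mean (dimensionless)."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return float(x.std(ddof=ddof) / mean)


def bin_to_grid(lat, lon, values, lat_edges, lon_edges):
    """Average irregular pixel values onto a regular lat/lon grid.

    Cells that receive no pixels are NaN.
    """
    bins = [lat_edges, lon_edges]
    sums, _, _ = np.histogram2d(lat, lon, bins=bins, weights=values)
    counts, _, _ = np.histogram2d(lat, lon, bins=bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # 0 / 0 gives NaN for empty cells
```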
The reviewer thanks the Editor of AMT for the opportunity to review this manuscript.
Citation: https://doi.org/10.5194/egusphere-2025-5900-RC1
RC2: 'Comment on egusphere-2025-5900', Anonymous Referee #3, 05 Feb 2026
The authors use a U-Net image segmentation model to detect SO2 plumes and measure their emissions from TROPOMI data. The detection part is well-done and creates a valuable database of detected SO2 plumes. The emission calculations, while promising, could benefit from some refinement. I suggest reducing the focus on emissions unless the wind-related concerns are thoroughly addressed. The manuscript is on the right track for publication after the revisions below.
- Model Performance: In line 104, the authors say the model's precision and recall are 65.7% and 74%. Does this mean about 34% of predictions are incorrect and 26% of cases are missed? I assume this is decent for a U-Net model, but more explanation would help readers who are not familiar with U-Net understand the model's performance. Also, how does this accuracy compare to other plume detection techniques, including the authors' previous work? Providing precision and recall for volcano sources alone would be useful, as performance is likely better for larger sources. This would show that some errors come from the signal-to-noise level of SO2 observations.
- Training Data: The training truth is based on manually selected grid cells. While this makes sense due to lack of better options, manual labeling introduces uncertainties. Discussing these uncertainties would be helpful.
- Emission Calculation: Volcanic SO2 emissions have been estimated by Carn et al. and Vitali et al. More details can be found here: https://so2.gsfc.nasa.gov/. How is your method different? My main concern is the use of 10-meter U and V wind fields. As pointed out by the authors themselves, near-surface winds are not suitable for volcano plumes. While concerns about computation cost are valid, using winds at the correct height is essential for emission estimates. I recommend a sensitivity study to show the impact is limited, at least for one case. Otherwise, the emission estimates are less reliable. Comparing volcanic emissions with estimates by Carn et al. would also be strongly recommended.
- Data Availability: I suggest the authors make the plume database publicly available to support further research.
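To make the performance question concrete: precision = TP/(TP+FP) and recall = TP/(TP+FN). A precision of 65.7% therefore means about 34% of the predicted positives are false positives, and a recall of 74% means 26% of the labelled positives are missed. Whether the manuscript scores pixels or whole plumes is not stated here; the sketch below assumes pixel-wise scoring of boolean masks (function name hypothetical).

```python
import numpy as np


def precision_recall(pred_mask, true_mask):
    """Pixel-wise precision and recall for boolean segmentation masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = np.logical_and(pred, true).sum()   # plume predicted and labelled
    fp = np.logical_and(pred, ~true).sum()  # plume predicted, not labelled
    fn = np.logical_and(~pred, true).sum()  # labelled plume missed
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return float(precision), float(recall)
```

Computing these per source type (e.g. volcanic vs. anthropogenic masks) would give the breakdown requested above.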
Citation: https://doi.org/10.5194/egusphere-2025-5900-RC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 172 | 77 | 24 | 273 | 11 | 10 |