This work is distributed under the Creative Commons Attribution 4.0 License.
Using a deep neural network to detect methane point sources and quantify emissions from PRISMA hyperspectral satellite images
Abstract. Anthropogenic emissions of methane (CH4) have made a considerable contribution to changes in the Earth’s radiative budget since pre-industrial times, because large amounts of methane are emitted by human activities and methane has a high global warming potential. The majority of anthropogenic fossil methane emissions to the atmosphere originate from a large number of small (point) sources, so detecting and accurately, rapidly quantifying such emissions is vital to enable emission reductions that help mitigate future climate change. A number of satellite instruments measure radiation at methane-absorbing wavelengths with sufficiently high spatial resolution to detect highly spatially localised methane 'point sources' (areas on the order of km2). Searching for methane plumes in methane-sensitive satellite images using classical methods, such as thresholding and clustering, can be useful, but it is time-consuming and often inaccurate. Here, we develop a deep neural network to identify and quantify methane point source emissions from hyperspectral imagery from the PRecursore IperSpettrale della Missione Applicativa (PRISMA) satellite with 30-m spatial resolution. PRISMA’s moderately high spectral and spatial resolution, considerable global coverage, and freely accessible data make it a good candidate for methane plume detection. The neural network was trained on synthetic methane plumes generated with the Large Eddy Simulation extension of the Weather Research and Forecasting model (WRF-LES) and embedded into PRISMA images. The network located plumes with an F1-score, precision, and recall of 0.95, 0.96, and 0.92, respectively, and quantified emission rates with a mean error of 24 %. It was furthermore able to locate several plumes in real-world images. We have thus demonstrated that our method can be effective in locating and quantifying methane point source emissions in near real time from 30-m resolution satellite data, which can aid in mitigating future climate change.
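For reference, the detection scores quoted in the abstract (precision, recall, F1-score; the F1-score is also queried in the discussion below) follow the standard definitions for binary classification. A minimal sketch in Python; the counts are illustrative, not the paper's actual confusion matrix:

```python
def detection_scores(tp: int, fp: int, fn: int) -> dict:
    """Standard binary-detection metrics from a confusion matrix.
    tp: plume scenes correctly flagged; fp: plume-free scenes flagged;
    fn: plume scenes missed."""
    precision = tp / (tp + fp)  # fraction of flagged scenes that contain a plume
    recall = tp / (tp + fn)     # fraction of real plumes that were flagged
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"precision": precision, "recall": recall, "f1": f1}

# Illustrative counts only:
print(detection_scores(tp=92, fp=4, fn=8))
# {'precision': 0.958..., 'recall': 0.92, 'f1': 0.938...}
```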
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
- Preprint (4294 KB)
- Supplement (1064 KB)
- Metadata XML
- BibTeX
- EndNote
- Final revised paper
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2022-924', Anonymous Referee #1, 23 Nov 2022
Joyce et al. present a computer vision framework for detecting, masking, and quantifying methane point source plumes in PRISMA satellite data. Their method is successful in application to both synthetic (training/validation) and real satellite observations and shows great promise for application to increasingly large satellite methane datasets to rapidly identify and quantify methane point sources. The paper is creative, well-written, and a valuable contribution to the growing body of literature on space-based monitoring of methane emissions. It is well-suited for publication in AMT. I recommend accepting the paper for publication after the authors address the comments below and those of the other referee(s). My main suggestion is to better balance the discussion of strengths and weaknesses of the machine learning approach compared to classical physics-based methods for plume detection/quantification.
Comments
- L. 21: 1 km2 seems quite large for point sources except landfills. Point sources are usually on the order of m2. Are you referring to the area of a detected plume, or just trying to include a range of source types?
- L. 23: Classical methods are certainly time-consuming, but are they inaccurate? I don’t think so, e.g., see Sherwin et al. (2022): https://eartharxiv.org/repository/view/3465/
- L. 46: One of Daniel Cusworth’s papers would be an appropriate reference for the temporal emission variability of point sources. E.g., consider Cusworth et al. (2021): https://pubs.acs.org/doi/10.1021/acs.estlett.1c00173
- L. 67: “prone to errors owing to the substantial human intervention required”. This strikes me as a bit backwards; I would expect human intervention to produce the highest quality plume detection/delineation, just as human labeling of (for example) photos of cats and dogs produces the most accurate results. The problem is that human intervention is costly.
- L. 73-74: Varon et al. (2018) reported 15-65% additional error from uncertain wind speed.
- L. 198-201: If I understand correctly, the conversion of methane concentration to change in radiance does not account for the plume vertical distribution – i.e., the plume is first vertically integrated and then a single pressure/temperature value is used for the radiative transfer calculation. Do you expect this simplification to have a negligible effect?
- L. 219-226: I found this section hard to understand. Can you explain more clearly how a classical threshold helps with / is applied in the training procedure? Is it simply to create the ground truth plume masks for the automated plume masking task?
- L. 253-255: Suggest identifying which CNNs are encoder CNNs earlier, because it wasn’t clear to me that you are referring to the CNN for binary detection and the emission rate estimator (if I’m not mistaken). And the U-Net also involves an encoder branch, so this feels a bit ambiguous.
- Fig. 2 & Fig. 5: how does the logical output of the binary plume detection model get appended to the data cube that is passed to the emission rate estimator? Is it just a uniform channel of 1’s or 0’s?
- Section 2.5.1: It’s not clear to me what purpose this 1x1 convolution serves, can you rephrase? (A generic illustration of a 1x1 convolution follows these comments.)
- L. 297: Shouldn’t this be the “encoder” part of the model, not decoder? It’s encoding the input data to smaller dimensionality before applying the dense layer.
- L. 333: Can you provide some discussion of why the model might tend to overestimate concentration? Could be 1-2 sentences.
- Table 1 & general: I found myself wondering at several points whether “binary classification” refers only to the yes/no plume detection test, and/or to the U-net binary segmentation task (binary classification per pixel). Please clarify this distinction.
- Fig. 8: It’s interesting that your network overestimates concentrations but underestimates emissions. If the no-plume images are really to blame for this, then they seem to have a strong effect. This could be checked by retraining the final network without no-plume images. Whether or not you do that, I feel this finding deserves a bit more discussion.
- Fig. 10: Are the retrieval fields in the left column of the figure from your network or a physics-based retrieval?
- L. 381: Are you saying that your network found 14/21 plumes? Or was that the result of a classical detection scheme?
- L. 390-391: “only one quarter of the images were deemed suitable to be analysed via clustering algorithms”. On what basis?
- L. 391 & elsewhere: “clustering algorithms”. It’s not clear to me what you mean by this. Classical thresholding is clear, but what clustering techniques are you referring to?
- L. 393: “accurately locate” I thought those statistics were for the binary detection, but this wording would suggest they were for the U-Net segmentation. See my comment above on “Table 1 & general”. Can you clarify?
- L. 396: “which is considerably lower than that obtained from classical methods”. Is that true? 25% mean absolute error seems similar to what's reported in the Sherwin et al. (2022) controlled release study, for which participants used classical methods. Furthermore, the classical methods appear to have near-zero bias (Varon et al., 2018; Sherwin et al., 2022), despite significant error spread, whereas your method has 17% bias. And your 40% interquartile range also seems not so different from a ~50% error standard deviation.
- Adding to my previous comment: How does the error standard deviation (spread) of your method compare with the ~50% classical error? Or put the other way, how does your 25% mean absolute relative error compare to the same quantity for the classical methods? My impression is that your ML network is much more efficient than classical methods, but likely less accurate -- and it’s not clear to me that it’s more precise (except in application to multiple nearby point sources, where, as Jongaramrungruang et al. 2019 point out, using a wind-direction-independent method leads to error cancellation in the emission sum). A more careful comparison of the merits of this work compared to previous methods would be helpful. (The statistics in question are sketched after these comments.)
- L. 399-410: “where the error in emission rate was greater than 50%”. But as you say, Jongaramrungruang et al. did much better than that. In addition to the possible reasons you give, could it be because they did plume detection/quantification on methane retrieval fields, whereas your model mainly relies on multi-channel radiance data? I.e., they combined physics and ML methods by processing the methane retrieval before applying their network, whereas your network is fully machine-learned. That would be an interesting finding, if true.
- General comment #2: Just a perspective to consider: I feel that the primary strength of your methodology is its potential for rapid application to increasingly large satellite datasets for methane, rather than achieving the most accurate and precise point source masks and emission rate estimates. I would expect careful human analysis to be generally superior in that respect (accuracy + precision), but perhaps not by much, and certainly highly inefficient compared to your work; clearly there aren’t enough human analysts to carefully process all the data from PRISMA, EMIT, EnMAP, Sentinel-2, Landsat, etc. That your method automates plume detection/quantification with performance comparable to human analysis, even with some low bias, is a major accomplishment.
- L. 430: Again, it’s not clear to me that your method is more successful than classical approaches in quantifying emission rates, given the combination of low bias and prediction spread you find. If I'm wrong, please just clarify the comparisons throughout the manuscript.
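On the Fig. 2 & Fig. 5 question above, one plausible interpretation of appending the detector's logical output to the data cube is broadcasting the scalar flag into a uniform extra channel. A minimal sketch under that assumption; this is illustrative, not the authors' code:

```python
import numpy as np

def append_detection_flag(cube: np.ndarray, plume_detected: bool) -> np.ndarray:
    """Append the binary detector's yes/no output to a
    (height, width, channels) cube as one uniform channel of 1s or 0s."""
    h, w, _ = cube.shape
    flag = np.full((h, w, 1), float(plume_detected), dtype=cube.dtype)
    return np.concatenate([cube, flag], axis=-1)

cube = np.random.rand(128, 128, 4).astype(np.float32)  # toy radiance cube
print(append_detection_flag(cube, True).shape)          # (128, 128, 5)
```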
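On the Section 2.5.1 question, a 1x1 convolution is a learned linear map applied independently to the channel vector at every pixel, typically used to change channel dimensionality without touching spatial structure. A generic illustration, not the authors' architecture:

```python
import numpy as np

def conv1x1(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """1x1 convolution: the same channel-mixing matrix applied at every
    pixel. x: (height, width, c_in); weights: (c_in, c_out)."""
    return np.einsum("hwc,co->hwo", x, weights)

x = np.random.rand(128, 128, 64)  # 64-channel feature map
w = np.random.rand(64, 8)         # learned channel-mixing weights
print(conv1x1(x, w).shape)        # (128, 128, 8): channels reduced, grid unchanged
```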
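And on the error-comparison point, the statistics being contrasted (bias, mean absolute relative error, spread) can be made concrete. A sketch on synthetic numbers only; the 17 % bias and 25 % spread below are placeholders echoing the figures discussed above, not results from the paper:

```python
import numpy as np

def emission_error_stats(estimated: np.ndarray, true: np.ndarray) -> dict:
    """Relative-error statistics of the kind compared in the comments above."""
    rel_err = (estimated - true) / true
    q75, q25 = np.percentile(rel_err, [75, 25])
    return {
        "bias": float(rel_err.mean()),                      # systematic offset
        "mean_abs_rel_err": float(np.abs(rel_err).mean()),  # typical error size
        "std": float(rel_err.std()),                        # spread of errors
        "iqr": float(q75 - q25),                            # interquartile range
    }

rng = np.random.default_rng(0)
true = rng.uniform(1.0, 10.0, 100)               # synthetic "true" rates (t/h)
est = true * (1 + rng.normal(-0.17, 0.25, 100))  # low-biased, noisy estimates
print(emission_error_stats(est, true))
```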
Technical corrections
- L. 284: section 2.4.1 --> section 2.5.1
- L. 296: section 2.4.1 --> section 2.5.1
- L. 369: section heading duplicated
References
Cusworth et al. (2021) https://pubs.acs.org/doi/10.1021/acs.estlett.1c00173
Jongaramrungruang et al. (2019) https://amt.copernicus.org/articles/12/6667/2019/
Sherwin et al. (2022) https://eartharxiv.org/repository/view/3465/
Citation: https://doi.org/10.5194/egusphere-2022-924-RC1
- AC1: 'Reply on RC1', Peter Joyce, 28 Mar 2023
RC2: 'Comment on egusphere-2022-924', Luis Guanter, 02 Dec 2022
The manuscript by Joyce et al. presents an AI-based framework to detect and quantify methane emissions with the PRISMA satellite mission. This data processing formalism is proposed as an alternative to the methods currently used with PRISMA for the same purpose. These typically consist of (i) a data-driven retrieval to derive methane concentration enhancement maps, (ii) plume detection through visual inspection, and (iii) emission rate quantification using IME-based methods. The proposed AI-based formalism helps to circumvent and automate some of those steps, which can be very useful considering the recent increase in the volume of available spaceborne imaging spectroscopy data with the advent of EnMAP and EMIT.
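For context on step (iii): IME-based quantification relates the plume's total excess methane mass (the integrated mass enhancement) to a source rate via an effective wind speed and a plume length scale, Q = U_eff x IME / L. A minimal sketch following the general form of Varon et al. (2018); the inputs are illustrative and this is not the operational PRISMA processing chain:

```python
import numpy as np

def ime_emission_rate(delta_omega: np.ndarray, pixel_area: float,
                      u_eff: float) -> float:
    """IME-based estimate Q = U_eff * IME / L (after Varon et al., 2018).
    delta_omega: per-pixel methane mass enhancement inside the plume
    mask (kg m-2); pixel_area in m2; u_eff in m s-1. Returns kg s-1."""
    ime = delta_omega.sum() * pixel_area                  # total excess mass (kg)
    plume_scale = np.sqrt(delta_omega.size * pixel_area)  # L = sqrt(mask area)
    return u_eff * ime / plume_scale

# Illustrative: 200 masked 30 m PRISMA pixels at 0.002 kg/m2, 2 m/s wind
mask = np.full(200, 0.002)
q = ime_emission_rate(mask, pixel_area=30 * 30, u_eff=2.0)
print(f"{q * 3.6:.1f} t/h")  # kg/s -> t/h; ~6.1 t/h here
```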
I think the topic fits well in AMTD and is definitely of interest to the growing community dealing with high resolution methane mapping. The manuscript is well written and presented. I generally recommend publication, but I would like to request the authors to address the points below in their revision of the manuscript.
1) Testing on real data: I understand that the proposed methodology is expected to be globally applicable. PRISMA scenes from a wide range of site conditions are actually used for algorithm training (Table S1). However, only results from real plumes in Turkmenistan are presented. Turkmenistan is considered an optimal study region for plume detection, since surfaces are typically bright and homogeneous, and plumes are large. For the readers to get a better impression of the method’s performance, it would be great to see how it works at other sites. In particular, the authors could use the PRISMA scenes and plume detections in the Permian Basin and the Shanxi region reported by Guanter et al. (2021), to which we gave the authors access. Why were results from those sites not included?
2) Overall presentation of results: I feel that a stronger effort could be made in the analysis and presentation of results from real data. For example, by providing more information on the comparison between the proposed AI-based method and the existing “clustering and thresholding” methods. One could show the potential and limitations of each method, or where the AI-based method does not outperform the supervised method. Also, it would be useful to see more concentration enhancement maps, especially for the plumes which were not detected by the method. Detecting 14 out of 21 plumes with flux rates >1 t/h in Turkmenistan doesn’t sound that impressive, and it would be good if this could be discussed further.
Other points:
L21: “order of km2” the sources or the plumes? I guess the latter?
L29: what is an F1-score?
L20: (line numbering restarted in p4): regarding PRISMA CO2 retrievals, this https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2020AV000350 could be cited
L45: Section 2.2: could you introduce here what this retrieval step is needed for? Not clear to me until much later in the document
L57: among data-driven retrieval methods, I think matched-filter retrievals are used in more studies so far than the PCA-based retrievals. Any reason why you chose the latter?
L74 (p7): I think the per-column processing is actually more important because of striping (column-wise changes in the instrument’s radiometric response)
L74 (p11, line numbering restarted again): There is no Sec 2.4.1.
Fig. 6: 3 of the plumes / emission rates selected for this figure are actually huge. Consider using more typical cases of 1-3 t/h?
The authors might like to discuss this recent preprint on the same topic https://res.cloudinary.com/diywkbi34/image/upload/v1669115401/Marketing/COP27/Kayrros%20Science_%20Detecting%20Methane%20Plumes%20using%20PRISMA:%20Deep%20Learning%20Model%20and%20Data%20Augmentation.pdf
Citation: https://doi.org/10.5194/egusphere-2022-924-RC2
- AC2: 'Reply on RC2', Peter Joyce, 28 Mar 2023
Journal article(s) based on this preprint
Joyce, P., et al.: Using a deep neural network to detect methane point sources and quantify emissions from PRISMA hyperspectral satellite images, Atmos. Meas. Tech., https://doi.org/10.5194/amt-16-2627-2023, 2023.
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 485 | 418 | 19 | 922 | 63 | 10 | 15 |
Cited
3 citations as recorded by crossref.
- Monitoring and regression analysis of landfill surface temperatures using remote sensing and image processing techniques, K. Sharma et al., https://doi.org/10.1080/01431161.2024.2372081
- Using a deep neural network to detect methane point sources and quantify emissions from PRISMA hyperspectral satellite images, P. Joyce et al., https://doi.org/10.5194/amt-16-2627-2023
- Semantic segmentation of methane plumes with hyperspectral machine learning models, V. Růžička et al., https://doi.org/10.1038/s41598-023-44918-6
Authors
Peter Joyce, Cristina Ruiz Villena, Yahui Huang, Alex Webb, Manuel Gloor, Fabien Hubert Wagner, Martyn P. Chipperfield, Rocío Barrio Guilló, and Chris Wilson