the Creative Commons Attribution 4.0 License.
Single-blind test of nine methane-sensing satellite systems from three continents
Evan Sherwin
Sahar El Abbadi
Philippine Burdeau
Zhan Zhang
Zhenlin Chen
Jeffrey Rutherford
Yuanlei Chen
Adam Brandt
Abstract. Satellite-based remote sensing enables detection and mitigation of large point sources of climate-warming methane. These satellites will have the greatest impact if stakeholders have a clear-eyed assessment of their capabilities. We performed a single-blind test of nine methane-sensing satellites from three continents and five countries, including both commercial and government satellites. Over two months, we conducted 82 controlled methane releases during satellite overpasses. Six teams analyzed the resulting data, producing 134 estimates of methane emissions. Of these, 80 (60 %) were correctly identified, with 46 true positive detections (34 %) and 34 true negative non-detections (25 %). There were 41 false negatives and no false positives, i.e., no instances in which teams incorrectly claimed methane was present. All eight satellites that were given a nonzero emission detected methane at least once, including the first single-blind evaluation of the EnMAP, Gaofen 5, and Ziyuan 1 systems. In percent terms, quantification error across all satellites and teams is similar to aircraft-based methane remote sensing systems, with 55 % of mean estimates falling within ±50 % of the metered value. Although teams correctly detected emissions as low as 0.03 metric tons of methane per hour, it is unclear whether detection performance in this test is representative of real-world field performance. Full retrieval fields submitted by all teams suggest that in some cases it may be difficult to distinguish true emissions from background artifacts without a known source location. Cloud interference is significant and appears to vary across teams and satellites. This work confirms the basic efficacy of the tested satellite systems in detecting and quantifying methane, providing additional insight into detection limits and informing experimental design for future satellite-focused controlled methane release testing campaigns.
Evan Sherwin et al.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2023-1541', Anonymous Referee #1, 19 Aug 2023
Sherwin et al. evaluated the performance of satellite techniques for detecting and quantifying methane emissions through a single-blind test. The test is well designed and carried out, providing timely and objective information critical for stakeholders and potential users. The technical complications (e.g., known vs. unknown location, clouds) are also well discussed. I appreciate that the authors documented the study from coordination to implementation in great detail. I'd recommend publication of this manuscript after the following comments are addressed.
Main comments
An important conclusion is that "quantification performance of satellite techniques approaches aircraft accuracy". But this statement is not elaborated; only a brief comparison with values from previous studies is made. In those previous tests, the tested fluxes may have had a very different distribution than in this study, and I wonder how a different distribution of tested fluxes may affect the conclusion. This key finding would be better established if the analysis were done more carefully. For example, the concern mentioned above could be addressed with an evaluation of quantification accuracy for subsets with a similar distribution of metered flux. Moreover, this comparison of quantification performance is not the full picture and may mislead readers. Detection performance (detection limit) of satellite and aircraft technologies should also be compared, in addition to quantification performance.
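The flux-matched subset comparison suggested above could be sketched roughly as follows. This is a minimal illustration only: the (metered, estimated) pairs are hypothetical placeholders, not data from either test campaign, and the greedy matching tolerance is an arbitrary choice.

```python
# Sketch: compare quantification accuracy between two campaigns only on
# subsets with a similar metered-flux distribution.
# All numbers are hypothetical, for illustration.

# (metered t/h, estimated t/h) pairs -- hypothetical stand-ins.
satellite = [(0.4, 0.5), (1.2, 1.0), (2.0, 2.6), (3.5, 3.1), (5.0, 4.2)]
aircraft = [(0.1, 0.12), (0.4, 0.33), (1.1, 1.3), (2.2, 2.0), (4.8, 5.5)]

def within_50pct(pairs):
    """Fraction of estimates falling within +/-50% of the metered value."""
    hits = [abs(est - met) <= 0.5 * met for met, est in pairs]
    return sum(hits) / len(hits)

def matched_subsets(a, b, tol=0.5):
    """Greedily pair releases from the two campaigns whose metered fluxes
    differ by less than `tol` t/h, so both subsets share a similar
    flux distribution."""
    sub_a, sub_b, used = [], [], set()
    for met_a, est_a in a:
        for j, (met_b, est_b) in enumerate(b):
            if j not in used and abs(met_a - met_b) < tol:
                sub_a.append((met_a, est_a))
                sub_b.append((met_b, est_b))
                used.add(j)
                break
    return sub_a, sub_b

sat_m, air_m = matched_subsets(satellite, aircraft)
print(f"matched pairs: {len(sat_m)}")
print(f"satellite within +/-50%: {within_50pct(sat_m):.0%}")
print(f"aircraft  within +/-50%: {within_50pct(air_m):.0%}")
```

With matched subsets, the within-±50 % statistic for each campaign is computed over comparable flux ranges, removing the confounding effect of one campaign having been tested mostly at larger (easier-to-quantify) releases.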
On page 15, the smallest detected emission for each satellite is reported as a metric of detection performance. This information may be misleading. For example, both ZY1 and GF5 were tested only once; unlike the other missions, they were not tested with a range of smaller fluxes. So the "smallest detected emissions" from these missions should be interpreted differently. In addition, I wonder if it is possible to perform a more rigorous analysis of the "observed detection limit". This should be possible for missions that were tested with a range of fluxes, and a comparison with theoretical detection limits (e.g., as reported in the Jacob et al. 2022 ACP review) should bring additional insight.
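One conventional way to formalize an "observed detection limit", in the spirit of this comment, is to fit a probability-of-detection (POD) curve over true positives and false negatives and read off the flux at a chosen POD level. The sketch below uses hypothetical detection outcomes and a minimal pure-Python logistic fit in log-flux; it is not the authors' method.

```python
# Sketch: fit P(detect) = 1/(1+exp(-(a + b*ln(flux)))) to
# (metered flux, detected?) outcomes by gradient ascent on the
# log-likelihood, then invert for the flux at a target POD.
import math

# (metered flux in t/h, 1 = detected, 0 = missed) -- hypothetical.
trials = [(0.03, 0), (0.1, 0), (0.2, 0), (0.4, 1), (0.5, 0),
          (0.8, 1), (1.0, 1), (1.5, 1), (2.0, 1), (3.0, 1)]

def fit_logistic(data, steps=20000, lr=0.1):
    """Maximum-likelihood logistic fit in log-flux via gradient ascent."""
    a, b = 0.0, 1.0
    for _ in range(steps):
        ga = gb = 0.0
        for flux, y in data:
            x = math.log(flux)
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += y - p          # d(logL)/da
            gb += (y - p) * x    # d(logL)/db
        a += lr * ga / len(data)
        b += lr * gb / len(data)
    return a, b

def flux_at_pod(a, b, pod):
    """Invert the fitted curve: flux where P(detect) = pod."""
    return math.exp((math.log(pod / (1.0 - pod)) - a) / b)

a, b = fit_logistic(trials)
print(f"50% POD near {flux_at_pod(a, b, 0.5):.2f} t/h")
print(f"90% POD near {flux_at_pod(a, b, 0.9):.2f} t/h")
```

The 90 % POD flux is a natural candidate for the "detection limit" to compare against the theoretical values in Jacob et al. (2022), though it is only well constrained for missions tested across a range of fluxes spanning the transition from misses to detections.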
Minor comments
Abstract: "...in which teams incorrectly claimed methane was present": as written, it is unclear whether this clause defines false positives only or both false positives and false negatives.
The authors report values as mean [min, max], but it is never explicitly defined whether [min, max] represents ±SD, the interquartile range, or a 95 % confidence interval. This should be specified at the first appearance.
Figure 3: What does the * sign beside the Gaofen-5 flux mean?
Figure 4: It is mentioned on page 13 that Maxar data are excluded from the main results. To be consistent with this, the authors may want to add regression lines and statistics with Maxar excluded in Figure 4.
Page 12: The paragraph starting with "Wind can vary substantially ...": when wind information is revealed to the teams, are they informed of the distribution of the wind, or only the mean wind for the overpass? Is it possible that the distribution contains additional information that could help the teams further improve their estimates?
Page 14: "However, Orbio Earth successfully detected all Sentinel-2 releases above 0.010 t/h...": this statement is misleading, as all Sentinel-2 detections are above 1 t/h.
Citation: https://doi.org/10.5194/egusphere-2023-1541-RC1
RC2: 'Comment on egusphere-2023-1541', Anonymous Referee #2, 31 Aug 2023
The paper by Sherwin et al. evaluates satellites' performance in detecting and quantifying methane emissions from a fixed location. This work makes a novel contribution to the literature as more satellites are being tasked with monitoring methane emissions. Kudos to the team - this is a very complicated field collaboration. I recommend accepting this paper with the minor revisions listed below:
1. The paper refers to and cites many oil and gas methane studies. Assuming that one of the major uses of satellites is to monitor methane emissions from oil and gas activities, could the authors add more context around how satellites can be deployed in the ever-changing regulatory space? For example, could these satellites be used to monitor 'large releases' as defined by the new GHGRP rule?
2. I'm a bit surprised by the comparison between aerial technology and satellite performance (page 12, second-to-last paragraph). Were the emission rates tested for aerial technologies much lower than those for satellites?
3. Should the success rate in generating usable data points be considered another metric in evaluating satellite performance? For example, when an aerial technology is deployed, we expect to receive usable data from its flyover. However, that does not seem to be the case for satellites, where failures can result from uncontrollable factors such as cloud coverage. While not specific to any satellite, having a sense of the time period needed for a satellite to produce usable data would be helpful for planning their deployment for continuous monitoring.
4. If these satellites were being tested at active oil and gas facilities, how would the testing setup be different?
Citation: https://doi.org/10.5194/egusphere-2023-1541-RC2
RC3: 'Comment on egusphere-2023-1541', Anonymous Referee #3, 17 Sep 2023
The manuscript by Sherwin et al. evaluates and documents in detail the capability to detect and measure methane point source emissions from point source satellite imagers that are currently in operation and have sufficient sensitivity to methane to detect emissions below 1.5 t/h. The information gathered in the document is highly important to clearly and transparently demonstrate the capability and limitations of these satellites and to guide stakeholders in assessing the reliability of these measurements. The manuscript is well written, and the experimental procedure is well detailed. I congratulate the authors and collaborators for the excellent work done here, and I would recommend the publication of this manuscript once the points and comments below are considered and corrected:
Major comments:
- Either in Table 1 or in Section S2, the spatial resolution (pixel size) of each satellite should be indicated, as it is an essential parameter for understanding the detection and attribution capability of emissions from space. Furthermore, this parameter is mentioned at the beginning of the discussion, but readers do not have this information in the manuscript.
- Page 15, section "Qualitatively assessing detection performance in the field", first sentence "The smallest emission detected by each team gives a sense of the minimum detection capabilities of each instrument" => I think stating this without nuance is risky, especially for satellites that were only able to observe one emission during the experiment. The values given for each satellite are indeed relatively consistent with the literature, but in some cases this leads to contradictions and can cause misunderstandings. For example, at the instrument level, EnMAP and PRISMA are very similar (with slight differences described in Roger et al. https://eartharxiv.org/repository/view/5235/), but the indicative detection limit estimated here is twice as high for EnMAP as for PRISMA. The same happens with GF5, which is also similar to EnMAP and PRISMA, but GF5 has better spectral resolution at the same spatial resolution, so we would expect a better detection capability than the other two satellites (this reasoning is explained in Jacob et al., 2022 https://acp.copernicus.org/articles/22/9617/2022/acp-22-9617-2022.html). I suggest rephrasing the sentence to say that the range of flux rates detected in this experiment gives an indication of the capabilities and, for satellites not able to see the smallest emissions, of the limitations of each instrument. However, setting a detection limit for each of them requires a larger number of observations, ranging from true positives (when the satellite can see the emission) to false negatives (when the emission exists but the satellite cannot see it).
- Discussion, beginning of the second paragraph: I would say that the high detection limit of Landsat and Sentinel-2 is related more to their low spectral resolution (bandwidth) than to the swath. WV3 also has a relatively low spectral resolution compared to the hyperspectral satellites (EnMAP, PRISMA, GF5, ZY1, HJ2B, and also GHGSat), but this is compensated by its very high spatial resolution. Indeed, spectral resolution is an essential parameter in methane detection capability (Jacob et al., 2022) that is not considered in this paper.
- Discussion, second paragraph, sentence "Of these, only PRISMA was given smaller emissions, with three of four teams correctly detecting 0.413 [0.410, 0.417] t/h, the smallest emission given to PRISMA.": Again, I think this sentence as written is risky because it can easily be misinterpreted as implying that PRISMA has the best detection capability among the four hyperspectral satellites, when EnMAP, GF5, and ZY1 had only one detection opportunity and no chance to test their ability with smaller fluxes. I propose changing this sentence to "Of these, only PRISMA has had the opportunity to be tested with emission fluxes below 1 t/h, correctly detecting 0.413 [0.410, 0.417] t/h, the smallest emission given to PRISMA".
- Considering that one of the major elements in the manuscript is methane (the second most important greenhouse gas, whose anthropogenic emissions should be avoided due to their impact on global warming), for transparency I would appreciate a section where the authors state the total amount of methane released during the experiment. This can be addressed with a simple sentence in, for example, the experimental design section, or with a separate section in the SI. For clarity, it would also be useful to compare that total to a well-documented emission event (i.e., it equals x % of what was emitted in that event) or to an estimate for a region or sector, to give readers perspective.
- Section S.4.6.1: I think that adding the wind speed data from the reanalysis product that each group used for the initial estimate indicated in each image would help a lot in the interpretation of the results.
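The total-release transparency check suggested above is a one-line computation once release rates and durations are tabulated. In the sketch below the rates and durations are hypothetical placeholders, and the comparison baseline is the Aliso Canyon blowout, for which published estimates put the total near 100,000 metric tons of CH4.

```python
# Sketch: total methane released over the campaign, expressed relative
# to a well-documented emission event. Rates and durations are
# hypothetical; the Aliso Canyon figure is an approximate published total.
releases = [
    # (release rate in t/h, duration in hours) -- hypothetical
    (0.5, 2.0),
    (1.2, 1.5),
    (3.0, 1.0),
]

total_t = sum(rate * hours for rate, hours in releases)
aliso_canyon_t = 100_000  # approximate published total, t CH4

print(f"total released: {total_t:.1f} t CH4")
print(f"= {100 * total_t / aliso_canyon_t:.4f}% of the Aliso Canyon event")
```

Reporting both the absolute total and such a ratio would let readers weigh the experiment's own climate footprint against the mitigation it enables.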
Minor comments:
- Table 1, coverage of HJ2: in the paragraph just before the table, the authors say that it is not clear whether HJ2 is targeted or global, but in the table it is classified as global. If the text is correct, perhaps the table should read "no data" or similar?
- In Table 1, the revisit time that the authors indicate for WV3 and EnMAP is actually the repetition cycle. For PRISMA, they provide the revisit time but do not specify the repetition cycle. For consistency, I suggest indicating in the table the revisit time (WV3=1 day, PRISMA=7 days, and EnMAP=4 days) and in the annotations the repetition cycle (best resolution by looking at nadir). For PRISMA, the repetition cycle is 29 days https://www.eoportal.org/satellite-missions/prisma-hyperspectral#launch
- Page 7, last paragraph, when the authors say "or the precise location of ground-based equipment.", I would suggest, for clarity, adding "within the provided location coordinates" or similar as, in the first paragraph of the section, the authors are saying that "Participating teams were aware of the precise location coordinates of the test".
- Section "First-time single-blind detections from Chinese and European satellites" I suggest changing the title to "First-time single-blind detections from some of the satellites" or similar, as it may suggest that it is the first single-blind detection test from all European satellites taking part.
- Page 10 section "First-time single-blind detections from Chinese and European satellites" end of the paragraph: EnMAP has also been used and evaluated for methane detection in Roger et al. 2023 (still in preprint) https://eartharxiv.org/repository/view/5235/ which I think should be taken into account in the references.
- Figure 3, EnMAP/NJU window => I think that, for consistency, it makes more sense to show the background Google Earth map with nothing overlaid, since the authors already show the retrieval of the image "with nothing" in Section 4 of the SI along with the rest of the retrievals, although this is not critical.
- The figure caption of Figure 3 does not explain what the * next to the 1.3 t/h Gaofen 5 value means.
- In Figure 3, Gaofen 5 and Ziyuan 1 should go without a hyphen (-) for consistency with the rest of the text. Similarly, both satellites are presented as Gaofen 5 and Gaofen5-02 and Ziyuan 1 and Ziyuan 1-02 inconsistently throughout the text.
- Bibliographic references should be corrected and adapted to a single format. Some of the references are listed twice in the bibliography, others are not updated, and many have errors:
- References 2 and 44 are the same, but 44 is not updated, referring to the preprint of the paper.
- Reference 4: the correct link is this: https://amt.copernicus.org/articles/15/1657/2022/amt-15-1657-2022.html (no longer in discussion)
- References 15 and 51 are the same.
- The link in reference 57 does not work, but I would say it is the same as in reference 21
- References 32 and 35 are the same.
- Reference 45 is not updated. The revised and published paper is this: https://www.sciencedirect.com/science/article/abs/pii/S0034425721003916
- Reference 70, the link does not work.
- Section S2. Participating satellites: in the description of all satellites (except ZY1), the spectral resolutions (Bandwidth) and spatial resolutions (pixel size) are missing, which are important parameters that significantly determine the sensitivity of the satellite to methane.
- Section S.2.6. PRISMA: "operating with a 7-day maximum revisit frequency." => operating with a 7-day maximum revisit frequency and 29-day nadir revisit frequency.
- Section S.4.6.1: I assume that the estimated flux for each group in each of the images (in white in the figure with the masked plume) corresponds to stage 1, which is why the Maxar PRISMA estimates are not shown. Either way, I think this should be indicated at the beginning of the section or in the figure captions, along with the reason why the Maxar PRISMA data are missing.
- Page 6, last paragraph, and page 23, last paragraph: the ordinal suffixes "th" and "nd" are missing from "November 15" and "November 22" (for consistency with the rest of the dates).
Citation: https://doi.org/10.5194/egusphere-2023-1541-RC3
Viewed
Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.
- HTML: 272
- PDF: 0
- XML: 0
- Total: 272
- BibTeX: 2
- EndNote: 1