Probabilities of Detection of Methane Plumes by Remote Sensing and Implications for Inferred Emissions Distributions

Manninen, Ethan; Chulakadabba, Apisada; Sargent, Maryann; Zhang, Zhan; Kamdar, Harshil; Warren, Jack; Roche, Sébastien; Chan Miller, Christopher; Kyzivat, Ethan; Benmergui, Joshua; Pittman, Jasna; Walker, Eleanor; Bushey, Jacob; Samra, Jenna; Hawthorne, Jacob; Luo, Bingkun; Nasr, Maya; Sun, Kang; Franklin, Jonathan; Liu, Xiong; Chen, Jia; Wofsy, Steven

doi:10.5194/egusphere-2026-115

Preprints

https://doi.org/10.5194/egusphere-2026-115

Preprints

03 Feb 2026

| 03 Feb 2026

Probabilities of Detection of Methane Plumes by Remote Sensing and Implications for Inferred Emissions Distributions

Ethan Manninen, Apisada Chulakadabba, Maryann Sargent, Zhan Zhang, Harshil Kamdar, Jack Warren, Sébastien Roche, Christopher Chan Miller, Ethan Kyzivat, Joshua Benmergui, Jasna Pittman, Eleanor Walker, Jacob Bushey, Jenna Samra, Jacob Hawthorne, Bingkun Luo, Maya Nasr, Kang Sun, Jonathan Franklin, Xiong Liu, Jia Chen, and Steven Wofsy

Abstract. Strategies for mitigating methane emissions rely on understanding the underlying drivers of methane losses to the atmosphere. Observations of methane plumes emerging from point sources, combined with correct statistical interpretation, can provide key information. In this work, we examine a critical parameter, the probability of detection of a plume. For a given observing system, probability of detection is affected by the properties of the sensor, plume detection algorithm, observing conditions, and emission rate of the source. We parameterize relevant aspects of remotely sensed scenes containing plumes using a nondimensional observability parameter that predicts probability of detection. Our probability of detection model is trained using simulated plumes to capture natural variability in different meteorological conditions, and validated with data from controlled release experiments. We model probability of detection for two airborne imaging spectrometer systems, MethaneAIR and Insight M LeakSurveyorTM, and one high resolution satellite system, MethaneSAT. Monte Carlo simulations of emissions distributions implied by data from the extensive 2023 MAIRX campaign of MethaneAIR demonstrate the importance of an accurate probability of detection model, due to the heavy tailed emission distribution found in most oil and gas basins.

How to cite. Manninen, E., Chulakadabba, A., Sargent, M., Zhang, Z., Kamdar, H., Warren, J., Roche, S., Chan Miller, C., Kyzivat, E., Benmergui, J., Pittman, J., Walker, E., Bushey, J., Samra, J., Hawthorne, J., Luo, B., Nasr, M., Sun, K., Franklin, J., Liu, X., Chen, J., and Wofsy, S.: Probabilities of Detection of Methane Plumes by Remote Sensing and Implications for Inferred Emissions Distributions, EGUsphere [preprint], https://doi.org/10.5194/egusphere-2026-115, 2026.

Received: 09 Jan 2026 – Discussion started: 03 Feb 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Status: final response (author comments only)

RC1: 'Comment on egusphere-2026-115', Anonymous Referee #1, 25 Feb 2026

This manuscript presents a method for modeling the probability of detection (P_d) of methane plumes from airborne and satellite remote sensing systems, and demonstrates its application to emissions distribution analysis of the Permian Basin. The topic is timely and relevant, but the manuscript has three major issues that should be addressed before publication: (1) structural and narrative weaknesses across the introduction, methods, and conclusion; (2) insufficient methodological detail to allow reproducibility; (3) the validation of the P_d model and the treatment of uncertainty would benefit from further development.

Major Comment 1
The paper would benefit from a clearer narrative structure. The introduction would be strengthened by background explanations of what probability of detection means, what threshold refers to, why P_d depends on emission rate, and why the research question is scientifically important. Much of Section 2 contains content (previous P_d studies, controlled release background, the concept of dispersed emissions, and their limitations) that would be more appropriate in the introduction. The introduction also has two logical gaps: the connection between characterizing emission rate contributions and the need for P_d is never explained, and the emission rate dependence of P_d is omitted from the third sentence in Introduction despite being central to the research question. Section 2 would be clearer if it focused on describing the methods and data used: what data were collected, what they look like, their uncertainties, and what steps were taken to address those uncertainties. The conclusion would also benefit from summarizing the key findings and answering the research question posed in the introduction, rather than focusing primarily on future implications.
Minor Comments:
Abstract, Sentence 1
The term "methane losses" often refers to the removal of methane from the atmosphere, for example through oxidation by OH radicals, which is not discussed anywhere in this manuscript. If the authors mean methane leakage or emissions to the atmosphere, the wording should be revised accordingly.
Introduction, Paragraph 1
P_d is introduced without definition. What is probability of detection and what does it mean physically?
The connection between the second and third sentences is unclear. The reader is told that characterizing contributions of different emission rates is important, but it is not explained why this requires knowledge of P_d.
The third sentence describes P_d as a function of sensor properties, observing conditions, and detection algorithms, but omits its dependence on emission rate, which is central to the research question that follows.
Introduction, Paragraph 2
"We develop a generalized approach to account for the factors that affect P_d." What does "generalized" mean here, and generalized compared to what?
"We utilize data from controlled release experiments, extended with image processing techniques, and supplemented with simulated plumes. Our approach enables us to model systems lacking available controlled release data such as MethaneSAT." The first sentence implies controlled release data is the primary dataset, while the second claims the approach works without it. These two sentences appear contradictory and could be clarified.
Introduction, Paragraph 3
The third paragraph discusses previous studies and their limitations, but would read more naturally before the research question in Paragraph 2.
The term "threshold" is used without definition. What does it mean for an observing system to have a higher or lower threshold?
The third paragraph mentions that simulated plumes have been used in previous studies (Rouet-Leduc and Hulbert (2024); Roger et al. (2025)), but does not state what is novel about this paper's approach. It would be helpful to explicitly state the novelty.
The concept of dispersed emissions is central to the analysis, as total emissions are defined as the sum of plume emissions and dispersed emissions, yet it is introduced for the first time in Section 2.7. This framework would be clearer if established in the Introduction.
P_d is never connected to its actual application in the paper as a post-detection correction tool used in the Monte Carlo simulation to correct for missed detections. This connection would help the reader understand why modeling P_d matters for the emissions distribution analysis.
Section 2.1
The content of Section 2.1, explaining what a useful P_d model should do and why intercomparison across systems requires accounting for sensor sensitivity, algorithm skill, and observing conditions, reads more as background material. The discussion of Conrad et al. (2023) and Bruno et al. (2024) similarly fits more naturally in a prior work review in the Introduction.
Conclusion
The conclusion would be strengthened by stating how the proposed P_d model compares to previous approaches, and what effect accounting for P_d had on the emissions distribution analysis.
The conclusion would also benefit from addressing the research question posed in the introduction: how can observations from different remote sensing systems be integrated to assess the emission-rate-dependent distribution of point sources?

Major Comment 2
Several sections would benefit from additional methodological detail to support reproducibility. The distinctions between MAIR, MAIR-E, and MAIRX campaigns are not explained well: which campaigns correspond to controlled release experiments, which data are used for training versus validation versus Monte Carlo analysis, and why they are treated separately. The controlled release experiment description leaves unanswered who releases the methane, and how the emission rate is controlled. The WRF-LES simulation setup would benefit from additional details including model resolution, domain, simulation time, inputs, and how emission rates are varied. The connection between the binary detection outcomes from Section 2.5 and the logistic regression training in Section 2.6 could be made more explicit. The Monte Carlo simulation in Section 2.7 would also benefit from additional details including the number of iterations, what is calculated in each iteration, and how results are aggregated.
Minor Comments:
Section 2.2
It is not clarified whether MAIR and MAIR-E correspond to the controlled release experiments described in Section 2.3. The relationship between campaign names and their purpose would be helpful to state explicitly.
The methodological differences between MethaneAIR, MethaneSAT, and LeakSurveyor are not described beyond pixel dimensions. What are the methods used for detecting XCH4, what are the measurement uncertainties, and what are the sampling frequencies?
Section 2.3
Several basic questions about the controlled release experiment are not addressed: who releases the methane, how is the emission rate controlled, how are the release and detection teams kept separate, and how are overpasses coordinated with the release timing?
It is unclear exactly what controlled release data this paper uses. Are both the 2021 and 2022 campaigns used? How many flights were conducted, how many releases were created and captured, and what range of conditions were tested? A summary table would be helpful.
"Controlled release experiments were attempted for MethaneSAT prior to its loss, but coordinating overpasses with release durations failed on available trials." This sentence raises questions, and could either be removed or expanded with references and details.
Section 2.4
The WRF-LES simulation setup would benefit from additional details: what is the model resolution and domain size, what is the simulation time period, how are emission rates varied, and what range of emission rates was tested?
It is not clear how the approximately 80,000 training scenes were generated. Were wind speed, cloud fraction, and aerosol optical depth varied? Was only one source assumed per pixel?
The process of creating the validation dataset from approximately 40 controlled release overpasses is not explained. Were model winds constrained by observed winds from the controlled release site?
"To reduce computation times, we used an idealized version of WRF-LES, which uses a time invariant upwind boundary condition." References or discussion of uncertainties associated with this idealization would be helpful.
Section 2.5
It is not clearly stated which algorithm is used by which observing system. If MethaneAIR used the divergence integral algorithm and MethaneSAT used the wavelet algorithm, this should be explicitly stated, as it affects the interpretation of the results throughout the paper.
The divergence integral and wavelet algorithms are not described in sufficient detail. What are the underlying assumptions, inputs, and known limitations of each method?
"Here, plume detection is fully automated; there is no manual quality assurance step to identify false positives, which are not a focus of this study." Since emission locations and rates are known in the WRF-LES simulations, false positives could in principle be identified. It would be helpful to clarify whether this was assessed.
Section 2.6
It is not explicitly stated what the training labels for the logistic regression are. We assume that the binary detection outcomes from Section 2.5 serve as labels and g as the input feature. Could the authors confirm this and make it explicit in the text?
Section 2.7
MAIRX appears without explanation of how it differs from MAIR and MAIR-E. It would be helpful to clarify why MAIR and MAIR-E are used for validation while MAIRX is used for the Monte Carlo analysis.
The Monte Carlo simulation would benefit from additional details: the number of iterations is not stated, it is unclear what quantity is calculated in each iteration, and it is not explained how results are aggregated.
Dispersed emissions are fixed at the average estimate across all valid scenes without justification. It would be worth discussing whether this choice affects the uncertainty in the emissions distribution estimate.
It is not described how the effect of P_d is isolated, that is, how the corrected and uncorrected emissions distributions are compared.

Major Comment 3
The paper would benefit from a more thorough validation of the P_d model and a more complete treatment of uncertainty. Model performance in Section 3.1 is described only qualitatively and validation is shown only for MethaneAIR. In Section 3.1, the explanation for the poorer performance of the divergence integral algorithm relies on algorithmic details that were not introduced in Section 2.5, making the discussion difficult to follow. In Section 3.2, the interpretation of Figures 4A and 4B is unclear because it is not established in Section 2 how both detection algorithms are applied to each observing system. In Section 3.3, the discrepancy with Warren et al. (2024) is discussed but it is not clear what the key methodological difference is between the two approaches and which is more appropriate. The steady state assumption is identified as a limitation without quantifying its impact on the results.
Minor Comments:
Section 3.1
No quantitative metrics such as RMSE, R2, or bias are reported. Figure 3 provides only a slope, which is insufficient to assess model fit. It would be helpful to include quantitative validation metrics for all systems and algorithms.
Validation is shown only for MethaneAIR. It is unclear whether the model was validated for MethaneSAT, and given that MethaneSAT P_d predictions are used in the Monte Carlo analysis, this would be an important addition.
The manuscript attributes the poorer performance of the divergence integral algorithm to its sensitivity to missing data, explaining that it requires intact boxes of XCH4 pixels while the wavelet does not. However, since neither algorithm is described in Section 2.5, I cannot follow this reasoning. It would be helpful to describe both algorithms in sufficient detail in Section 2.5 so that this discussion is interpretable.
Section 3.2, Figure 4
Figures 4A and 4B are difficult to interpret because it is not clearly established in Section 2 how both the divergence integral and wavelet algorithms are applied to MethaneAIR and MethaneSAT. It would be helpful to clarify which algorithm is used by which system, and whether the P_d curves for MethaneAIR and MethaneSAT are estimated using simulated training data while LeakSurveyor P_d is calculated directly from controlled release data.
The Insight M P_d curve is derived from approximately 100 controlled release scenes while the other algorithms use approximately 80,000 simulated scenes. This difference in training data size makes a direct comparison of algorithm skill difficult to interpret, and it would be worth discussing whether the higher uncertainty of the Insight M curve affects the conclusions drawn about its relative performance.
Section 3.3
The discrepancy with Warren et al. (2024) is attributed to two differences: the inclusion of wavelet-detected plumes and the absence of persistence weighting. However, it is not clear which of these is the more important factor, nor whether the steady state assumption used in this manuscript is more or less appropriate than the persistence weighting approach of Warren et al. (2024). Some discussion of the relative merits of the two approaches would help the reader assess the reliability of the results.
The impact of the steady state assumption on the results is not quantified, despite being acknowledged as a limitation. Given that this assumption underpins the entire emissions distribution analysis, some assessment of its impact would strengthen the paper.
The confidence intervals for the hypothetical lower sensitivity satellite are described as "much wider" without numerical values. Reporting confidence intervals across all three systems would help the reader assess the practical significance of the P_d correction.

Citation: https://doi.org/10.5194/egusphere-2026-115-RC1
RC2:
'Comment on egusphere-2026-115', Anonymous Referee #2, 10 Mar 2026
Manninen et al. present an analysis of probability of detection (POD) functions for three systems remotely measuring methane emissions from air (MethaneAIR and Insight-M LeakSurveyor) and space (MethaneSAT). This work is relevant to the readership of AMT – indeed there are plenty of similar studies in the literature – but requires major revisions to (1) enable reproduction of the study and (2) justify this work as an improvement over existing methods.
Major comments:
The method seems to lean heavily on simulated plumes. The authors justify this need (e.g., controlled releases are challenging and expensive) but do not fully detail the methods to create the plumes nor justify to the reader that these simulated plumes are realistic. For example, some simulated plumes were generated “using image processing methods” without further description. As written, the study is not reproducible.

There were controlled release experiments but also myriad simulation data (e.g., lines 98 and 110 suggest up to 142,000 simulated images). It is unclear what data exactly are being used in their POD fitting
Were all the data used in the fitting algorithm?

How do fitted POD models vary if considering just one of the simulated datasets at a time (i.e., what is the sensitivity to the fitting data)?

Are the controlled release data (the manuscript does not indicate the quantity) sufficient to not be overpowered by the huge simulation datasets when fitting?

Could the “skillfulness” of Insight-M’s plume detection algorithms be due to bias in the data included in the fit (I don’t believe it uses any simulation data)?

The method disregards published approaches in fitting POD models on the argument of “simplicity”. Specifically, they use the model of Bruno et al. that forces a dimensionless “observability” coupled with a logistic link function. This dimensionless metric assumes that, with all else constant, POD is constant in wind-normalized emission rate. This might be justified for Gaussian-like plumes, but it is known to be erroneous for real-world plumes. Conrad et al. and later Thorpe et al. (DOI: 10.1016/j.rse.2024.114435), who used more complex approaches, both explicitly showed that “observability” is not simply inversely proportional to wind. Moreover, these papers show that model quality varies greatly across different functional forms. How do the authors justify using only the simple model presented? Have they assessed the performance of other fits? Other fits that could fit the underlying data better may yield different inferences/conclusions.

The authors simulate a “hypothetical satellite with lower XCH4 sensitivity [than MethaneSat]”. How much lower? Is this intended to simulate a real-world instrument? This seems arbitrary.

The emission distributions “with Pd weighting” is not described. What does this mean and how is it calculated?

The notion that controlled release testing can be “dispensed with” (line 246) because of the authors’ approach is editorializing and inappropriate. They have not proven that this method can be used in place of actual controlled release data; especially since the quality of the simulated data has not been addressed.

The authors finish by noting that “MethaneAIR and MethaneSAT are capable of detecting small enough plumes to effectively probe the heavy tailed emission rate distributions found in the Permian, as well as other North American basins”. Without noting that there are many other instruments that outperform (the now offline) MethaneSAT, this reads as editorializing – can’t Insight-M and also Bridger Photonics, PRISMA, EnMAP, EMIT, Carbon Mapper’s Tanager, GHGSat, etc. also do this? Moreover, I don’t believe that the authors have assessed other North American basins.

Figure A1 is not discussed in the main text and only cursorily discussed in the appendix (it isn’t clear how we should interpret ROC or how this was derived). Is it relevant to the paper? Moreover, (see line 264), Conrad et al.’s method does explicitly consider aircraft altitude and (see line 137) Thorpe et al.’s method inherently considers flight altitude in their gas concentration noise parameterization.

Minor comments:
Line 37: The study of POD is in service of many objectives, not just “plume counts”.

Line 40: XCH4 is introduced without definition.

Line 42: “algorithms and 3)”

Line 70: replace “MethaneSAT is” with “MethaneSAT was”.

Figure 1: Rendering issue in legend.

Line 123: A nonzero false positive rate implies a nonzero POD at an emission rate of zero. What then are the implications of not quality controlling these data?

Line 146: If a scene is mostly clean, the 10% cutoff on pixel’s could significantly bias the estimate of gas concentration noise. Have the authors assessed sensitivity to this cutoff?

Line 192: I’m surprised that the POD between an aircraft and satellite is relatively small even at low wind speeds. Can you describe how this can be true?

Line 204-205: This is editorializing. Why is this better than the aforementioned methods already in the literature (see also comment at line 264)?

Line 212: Should this be Table 2?

Line 220: Presumably, the emission rate of an observed plume depends on the resolution of the instrument. A low-spatial resolution satellite could effectively aggregate many small sources within a single pixel. Do the authors have a comment on this?

Table A1: Include the functional forms of the fitted model.

Throughout: I believe Insight-M was recently acquired by Zeitview. Consider noting this.
Citation: https://doi.org/10.5194/egusphere-2026-115-RC2

Viewed

Total article views: 408 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
238	150	20	408	15	30

HTML: 238
PDF: 150
XML: 20
Total: 408
BibTeX: 15
EndNote: 30

Views and downloads (calculated since 03 Feb 2026)

Month	HTML	PDF	XML	Total
Feb 2026	183	123	18	324
Mar 2026	55	27	2	84

Cumulative views and downloads (calculated since 03 Feb 2026)

Month	HTML	PDF	XML	Total
Feb 2026	183	123	18	324
Mar 2026	55	27	2	84

Viewed (geographical distribution)

Total article views: 417 (including HTML, PDF, and XML) Thereof 417 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 18 Mar 2026

Short summary

In this work, my coauthors and I interpret methane emissions observed by remote sensing systems as plumes, or point sources. We model the probability that a general imaging spectrometer will detect a plume, and apply this framework to multiple remote sensing systems. We show how two recent systems- MethanAIR and MethaneSAT- provide enough sensitivity to facility scale point sources to effectively characterize most plume emissions in the Permian Basin.


Total:	0
HTML:	0
PDF:	0
XML:	0