This work is distributed under the Creative Commons Attribution 4.0 License.
Probabilities of Detection of Methane Plumes by Remote Sensing and Implications for Inferred Emissions Distributions
Abstract. Strategies for mitigating methane emissions rely on understanding the underlying drivers of methane losses to the atmosphere. Observations of methane plumes emerging from point sources, combined with correct statistical interpretation, can provide key information. In this work, we examine a critical parameter, the probability of detection of a plume. For a given observing system, probability of detection is affected by the properties of the sensor, plume detection algorithm, observing conditions, and emission rate of the source. We parameterize relevant aspects of remotely sensed scenes containing plumes using a nondimensional observability parameter that predicts probability of detection. Our probability of detection model is trained using simulated plumes to capture natural variability in different meteorological conditions, and validated with data from controlled release experiments. We model probability of detection for two airborne imaging spectrometer systems, MethaneAIR and Insight M LeakSurveyor™, and one high-resolution satellite system, MethaneSAT. Monte Carlo simulations of emissions distributions implied by data from the extensive 2023 MAIRX campaign of MethaneAIR demonstrate the importance of an accurate probability of detection model, due to the heavy-tailed emission distributions found in most oil and gas basins.
Status: open (until 19 Mar 2026)
- RC1: 'Comment on egusphere-2026-115', Anonymous Referee #1, 25 Feb 2026
This manuscript presents a method for modeling the probability of detection (P_d) of methane plumes from airborne and satellite remote sensing systems, and demonstrates its application to an emissions distribution analysis of the Permian Basin. The topic is timely and relevant, but the manuscript has three major issues that should be addressed before publication: (1) structural and narrative weaknesses across the introduction, methods, and conclusion; (2) insufficient methodological detail to allow reproducibility; and (3) underdeveloped validation of the P_d model and treatment of uncertainty.
Major Comment 1
The paper would benefit from a clearer narrative structure. The introduction would be strengthened by background explanations of what probability of detection means, what "threshold" refers to, why P_d depends on emission rate, and why the research question is scientifically important. Much of Section 2 contains content (previous P_d studies, controlled release background, the concept of dispersed emissions, and their limitations) that would be more appropriate in the introduction. The introduction also has two logical gaps: the connection between characterizing emission rate contributions and the need for P_d is never explained, and the emission rate dependence of P_d is omitted from the third sentence of the Introduction despite being central to the research question. Section 2 would be clearer if it focused on describing the methods and data used: what data were collected, what they look like, their uncertainties, and what steps were taken to address those uncertainties. The conclusion would also benefit from summarizing the key findings and answering the research question posed in the introduction, rather than focusing primarily on future implications.
Minor Comments:
Abstract, Sentence 1
The term "methane losses" often refers to the removal of methane from the atmosphere, for example through oxidation by OH radicals, which is not discussed anywhere in this manuscript. If the authors mean methane leakage or emissions to the atmosphere, the wording should be revised accordingly.
Introduction, Paragraph 1
P_d is introduced without definition. What is probability of detection and what does it mean physically?
The connection between the second and third sentences is unclear. The reader is told that characterizing contributions of different emission rates is important, but it is not explained why this requires knowledge of P_d.
The third sentence describes P_d as a function of sensor properties, observing conditions, and detection algorithms, but omits its dependence on emission rate, which is central to the research question that follows.
Introduction, Paragraph 2
"We develop a generalized approach to account for the factors that affect P_d." What does "generalized" mean here, and generalized compared to what?
"We utilize data from controlled release experiments, extended with image processing techniques, and supplemented with simulated plumes. Our approach enables us to model systems lacking available controlled release data such as MethaneSAT." The first sentence implies controlled release data is the primary dataset, while the second claims the approach works without it. These two sentences appear contradictory and could be clarified.
Introduction, Paragraph 3
The third paragraph discusses previous studies and their limitations, but would read more naturally before the research question in Paragraph 2.
The term "threshold" is used without definition. What does it mean for an observing system to have a higher or lower threshold?
The third paragraph mentions that simulated plumes have been used in previous studies (Rouet-Leduc and Hulbert, 2024; Roger et al., 2025), but does not state what is novel about this paper's approach. It would be helpful to state the novelty explicitly.
The concept of dispersed emissions is central to the analysis, as total emissions are defined as the sum of plume emissions and dispersed emissions, yet it is introduced for the first time in Section 2.7. This framework would be clearer if established in the Introduction.
P_d is never connected to its actual application in the paper as a post-detection correction tool used in the Monte Carlo simulation to correct for missed detections. This connection would help the reader understand why modeling P_d matters for the emissions distribution analysis.
Section 2.1
The content of Section 2.1, explaining what a useful P_d model should do and why intercomparison across systems requires accounting for sensor sensitivity, algorithm skill, and observing conditions, reads more as background material. The discussion of Conrad et al. (2023) and Bruno et al. (2024) similarly fits more naturally in a prior work review in the Introduction.
Conclusion
The conclusion would be strengthened by stating how the proposed P_d model compares to previous approaches, and what effect accounting for P_d had on the emissions distribution analysis.
The conclusion would also benefit from addressing the research question posed in the introduction: how can observations from different remote sensing systems be integrated to assess the emission-rate-dependent distribution of point sources?
Major Comment 2
Several sections would benefit from additional methodological detail to support reproducibility. The distinctions between MAIR, MAIR-E, and MAIRX campaigns are not explained well: which campaigns correspond to controlled release experiments, which data are used for training versus validation versus Monte Carlo analysis, and why they are treated separately. The controlled release experiment description leaves unanswered who releases the methane, and how the emission rate is controlled. The WRF-LES simulation setup would benefit from additional details including model resolution, domain, simulation time, inputs, and how emission rates are varied. The connection between the binary detection outcomes from Section 2.5 and the logistic regression training in Section 2.6 could be made more explicit. The Monte Carlo simulation in Section 2.7 would also benefit from additional details including the number of iterations, what is calculated in each iteration, and how results are aggregated.
Minor Comments:
Section 2.2
It is not clarified whether MAIR and MAIR-E correspond to the controlled release experiments described in Section 2.3. The relationship between campaign names and their purpose would be helpful to state explicitly.
The methodological differences between MethaneAIR, MethaneSAT, and LeakSurveyor are not described beyond pixel dimensions. What methods are used to retrieve XCH4, what are the measurement uncertainties, and what are the sampling frequencies?
Section 2.3
Several basic questions about the controlled release experiment are not addressed: who releases the methane, how is the emission rate controlled, how are the release and detection teams kept separate, and how are overpasses coordinated with the release timing?
It is unclear exactly what controlled release data this paper uses. Are both the 2021 and 2022 campaigns used? How many flights were conducted, how many releases were created and captured, and what range of conditions were tested? A summary table would be helpful.
"Controlled release experiments were attempted for MethaneSAT prior to its loss, but coordinating overpasses with release durations failed on available trials." This sentence raises questions, and could either be removed or expanded with references and details.
Section 2.4
The WRF-LES simulation setup would benefit from additional details: what is the model resolution and domain size, what is the simulation time period, how are emission rates varied, and what range of emission rates was tested?
It is not clear how the approximately 80,000 training scenes were generated. Were wind speed, cloud fraction, and aerosol optical depth varied? Was only one source assumed per pixel?
The process of creating the validation dataset from approximately 40 controlled release overpasses is not explained. Were model winds constrained by observed winds from the controlled release site?
"To reduce computation times, we used an idealized version of WRF-LES, which uses a time invariant upwind boundary condition." References or discussion of uncertainties associated with this idealization would be helpful.
Section 2.5
It is not clearly stated which algorithm is used by which observing system. If MethaneAIR used the divergence integral algorithm and MethaneSAT used the wavelet algorithm, this should be explicitly stated, as it affects the interpretation of the results throughout the paper.
The divergence integral and wavelet algorithms are not described in sufficient detail. What are the underlying assumptions, inputs, and known limitations of each method?
"Here, plume detection is fully automated; there is no manual quality assurance step to identify false positives, which are not a focus of this study." Since emission locations and rates are known in the WRF-LES simulations, false positives could in principle be identified. It would be helpful to clarify whether this was assessed.
Section 2.6
It is not explicitly stated what the training labels for the logistic regression are. We assume that the binary detection outcomes from Section 2.5 serve as labels and g as the input feature. Could the authors confirm this and make it explicit in the text?
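To make the interpretation in this comment concrete, the assumed training setup could be sketched as follows. This is a reviewer's illustration, not the authors' code: the array names `g` (observability parameter per scene) and `y` (binary detection outcome from Section 2.5) are hypothetical, and the data here are synthetic placeholders.

```python
# Sketch (assumed, not the authors' implementation): logistic regression of
# the binary detection outcome on the observability parameter g.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
g = rng.uniform(0.0, 5.0, size=1000)                 # observability parameter per scene (synthetic)
true_pd = 1.0 / (1.0 + np.exp(-2.0 * (g - 2.0)))     # assumed latent P_d curve for the placeholder data
y = rng.random(1000) < true_pd                       # binary detection label (as we read Sect. 2.5)

model = LogisticRegression()
model.fit(g.reshape(-1, 1), y)                       # g is the sole input feature

# Fitted P_d at a few values of g; should rise monotonically with g.
p_d = model.predict_proba(np.array([[1.0], [2.0], [3.0]]))[:, 1]
print(p_d)
```

If this is indeed the setup, a sentence stating it in Section 2.6 would remove the ambiguity.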
Section 2.7
MAIRX appears without explanation of how it differs from MAIR and MAIR-E. It would be helpful to clarify why MAIR and MAIR-E are used for validation while MAIRX is used for the Monte Carlo analysis.
The Monte Carlo simulation would benefit from additional details: the number of iterations is not stated, it is unclear what quantity is calculated in each iteration, and it is not explained how results are aggregated.
Dispersed emissions are fixed at the average estimate across all valid scenes without justification. It would be worth discussing whether this choice affects the uncertainty in the emissions distribution estimate.
It is not described how the effect of P_d is isolated, that is, how the corrected and uncorrected emissions distributions are compared.
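For concreteness, one standard way to isolate the P_d effect, which may or may not match the authors' procedure, is inverse-probability weighting of detected plumes: each detected plume of rate q is weighted by 1/P_d(q) so that it also stands in for similar plumes that were missed. All names and parameter values below are hypothetical.

```python
# Sketch (assumed, not the authors' method): compare emissions totals with
# and without a P_d correction via Horvitz-Thompson style weighting.
import numpy as np

def p_d(q, q50=100.0, k=0.03):
    """Assumed logistic P_d as a function of emission rate q (kg/h)."""
    return 1.0 / (1.0 + np.exp(-k * (q - q50)))

rng = np.random.default_rng(1)
q_true = rng.lognormal(mean=4.5, sigma=1.0, size=5000)  # heavy-tailed source rates (synthetic)
detected = rng.random(q_true.size) < p_d(q_true)        # stochastic detection outcome
q_det = q_true[detected]

uncorrected = q_det.sum()                # ignores missed detections
corrected = (q_det / p_d(q_det)).sum()   # each detection weighted by 1/P_d
print(uncorrected, corrected, q_true.sum())
```

Under this weighting the corrected total is an unbiased estimate of the true total, while the uncorrected total is biased low; stating whether the manuscript's comparison follows this or another scheme would resolve the question.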
Major Comment 3
The paper would benefit from a more thorough validation of the P_d model and a more complete treatment of uncertainty. Model performance in Section 3.1 is described only qualitatively and validation is shown only for MethaneAIR. In Section 3.1, the explanation for the poorer performance of the divergence integral algorithm relies on algorithmic details that were not introduced in Section 2.5, making the discussion difficult to follow. In Section 3.2, the interpretation of Figures 4A and 4B is unclear because it is not established in Section 2 how both detection algorithms are applied to each observing system. In Section 3.3, the discrepancy with Warren et al. (2024) is discussed but it is not clear what the key methodological difference is between the two approaches and which is more appropriate. The steady state assumption is identified as a limitation without quantifying its impact on the results.
Minor Comments:
Section 3.1
No quantitative metrics such as RMSE, R², or bias are reported. Figure 3 provides only a slope, which is insufficient to assess model fit. It would be helpful to include quantitative validation metrics for all systems and algorithms.
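The metrics requested here are straightforward to compute once empirical detection frequencies are binned against the modeled P_d curve; a minimal sketch, with placeholder bin values rather than the manuscript's data, is:

```python
# Sketch with hypothetical numbers: RMSE, mean bias, and R^2 between binned
# empirical detection frequencies and the modeled P_d at the bin centers.
import numpy as np

empirical = np.array([0.05, 0.20, 0.55, 0.80, 0.95])  # placeholder empirical bin means
modeled   = np.array([0.08, 0.25, 0.50, 0.78, 0.97])  # placeholder modeled P_d values

resid = modeled - empirical
rmse = np.sqrt(np.mean(resid ** 2))                   # root-mean-square error
bias = resid.mean()                                   # mean bias (modeled minus empirical)
r2 = 1.0 - np.sum(resid ** 2) / np.sum((empirical - empirical.mean()) ** 2)
print(rmse, bias, r2)
```

Reporting these three numbers per system and per algorithm would make the validation in Section 3.1 quantitative rather than qualitative.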
Validation is shown only for MethaneAIR. It is unclear whether the model was validated for MethaneSAT, and given that MethaneSAT P_d predictions are used in the Monte Carlo analysis, this would be an important addition.
The manuscript attributes the poorer performance of the divergence integral algorithm to its sensitivity to missing data, explaining that it requires intact boxes of XCH4 pixels while the wavelet does not. However, since neither algorithm is described in Section 2.5, I cannot follow this reasoning. It would be helpful to describe both algorithms in sufficient detail in Section 2.5 so that this discussion is interpretable.
Section 3.2, Figure 4
Figures 4A and 4B are difficult to interpret because it is not clearly established in Section 2 how both the divergence integral and wavelet algorithms are applied to MethaneAIR and MethaneSAT. It would be helpful to clarify which algorithm is used by which system, and whether the P_d curves for MethaneAIR and MethaneSAT are estimated using simulated training data while LeakSurveyor P_d is calculated directly from controlled release data.
The Insight M P_d curve is derived from approximately 100 controlled release scenes while the other algorithms use approximately 80,000 simulated scenes. This difference in training data size makes a direct comparison of algorithm skill difficult to interpret, and it would be worth discussing whether the higher uncertainty of the Insight M curve affects the conclusions drawn about its relative performance.
Section 3.3
The discrepancy with Warren et al. (2024) is attributed to two differences: the inclusion of wavelet-detected plumes and the absence of persistence weighting. However, it is not clear which of these is the more important factor, nor whether the steady state assumption used in this manuscript is more or less appropriate than the persistence weighting approach of Warren et al. (2024). Some discussion of the relative merits of the two approaches would help the reader assess the reliability of the results.
The impact of the steady state assumption on the results is not quantified, despite being acknowledged as a limitation. Given that this assumption underpins the entire emissions distribution analysis, some assessment of its impact would strengthen the paper.
The confidence intervals for the hypothetical lower sensitivity satellite are described as "much wider" without numerical values. Reporting confidence intervals across all three systems would help the reader assess the practical significance of the P_d correction.