the Creative Commons Attribution 4.0 License.
Probabilities of Detection of Methane Plumes by Remote Sensing and Implications for Inferred Emissions Distributions
Abstract. Strategies for mitigating methane emissions rely on understanding the underlying drivers of methane losses to the atmosphere. Observations of methane plumes emerging from point sources, combined with correct statistical interpretation, can provide key information. In this work, we examine a critical parameter, the probability of detection of a plume. For a given observing system, probability of detection is affected by the properties of the sensor, plume detection algorithm, observing conditions, and emission rate of the source. We parameterize relevant aspects of remotely sensed scenes containing plumes using a nondimensional observability parameter that predicts probability of detection. Our probability of detection model is trained using simulated plumes to capture natural variability in different meteorological conditions, and validated with data from controlled release experiments. We model probability of detection for two airborne imaging spectrometer systems, MethaneAIR and Insight M LeakSurveyor™, and one high resolution satellite system, MethaneSAT. Monte Carlo simulations of emissions distributions implied by data from the extensive 2023 MAIRX campaign of MethaneAIR demonstrate the importance of an accurate probability of detection model, due to the heavy tailed emission distribution found in most oil and gas basins.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2026-115', Anonymous Referee #1, 25 Feb 2026
- RC2: 'Comment on egusphere-2026-115', Anonymous Referee #2, 10 Mar 2026
Manninen et al. present an analysis of probability of detection (POD) functions for three systems remotely measuring methane emissions from air (MethaneAIR and Insight-M LeakSurveyor) and space (MethaneSAT). This work is relevant to the readership of AMT – indeed there are plenty of similar studies in the literature – but requires major revisions to (1) enable reproduction of the study and (2) justify this work as an improvement over existing methods.
Major comments:
- The method seems to lean heavily on simulated plumes. The authors justify this need (e.g., controlled releases are challenging and expensive) but do not fully detail the methods to create the plumes nor justify to the reader that these simulated plumes are realistic. For example, some simulated plumes were generated “using image processing methods” without further description. As written, the study is not reproducible.
- There were controlled release experiments but also myriad simulation data (e.g., lines 98 and 110 suggest up to 142,000 simulated images). It is unclear exactly which data are being used in the POD fitting:
- Were all the data used in the fitting algorithm?
- How do fitted POD models vary if considering just one of the simulated datasets at a time (i.e., what is the sensitivity to the fitting data)?
- Are the controlled release data (the manuscript does not indicate the quantity) sufficient to not be overpowered by the huge simulation datasets when fitting?
- Could the “skillfulness” of Insight-M’s plume detection algorithms be due to bias in the data included in the fit (I don’t believe it uses any simulation data)?
- The method disregards published approaches in fitting POD models on the argument of “simplicity”. Specifically, they use the model of Bruno et al. that forces a dimensionless “observability” coupled with a logistic link function. This dimensionless metric assumes that, with all else constant, POD is constant in wind-normalized emission rate. This might be justified for Gaussian-like plumes, but it is known to be erroneous for real-world plumes. Conrad et al. and later Thorpe et al. (DOI: 10.1016/j.rse.2024.114435), who used more complex approaches, both explicitly showed that “observability” is not simply inversely proportional to wind. Moreover, these papers show that model quality varies greatly across different functional forms. How do the authors justify using only the simple model presented? Have they assessed the performance of other fits? Other fits that could fit the underlying data better may yield different inferences/conclusions.
- The authors simulate a “hypothetical satellite with lower XCH4 sensitivity [than MethaneSAT]”. How much lower? Is this intended to simulate a real-world instrument? This seems arbitrary.
- The emission distributions “with Pd weighting” are not described. What does this mean and how are they calculated?
- The notion that controlled release testing can be “dispensed with” (line 246) because of the authors’ approach is editorializing and inappropriate. They have not proven that this method can be used in place of actual controlled release data; especially since the quality of the simulated data has not been addressed.
- The authors finish by noting that “MethaneAIR and MethaneSAT are capable of detecting small enough plumes to effectively probe the heavy tailed emission rate distributions found in the Permian, as well as other North American basins”. Without noting that there are many other instruments that outperform (the now offline) MethaneSAT, this reads as editorializing – can’t Insight-M and also Bridger Photonics, PRISMA, EnMAP, EMIT, Carbon Mapper’s Tanager, GHGSat, etc. also do this? Moreover, I don’t believe that the authors have assessed other North American basins.
- Figure A1 is not discussed in the main text and only cursorily discussed in the appendix (it isn’t clear how we should interpret ROC or how this was derived). Is it relevant to the paper? Moreover, (see line 264), Conrad et al.’s method does explicitly consider aircraft altitude and (see line 137) Thorpe et al.’s method inherently considers flight altitude in their gas concentration noise parameterization.
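As an illustration of the wind-normalization point in the comment on the Bruno et al. model above, the following minimal Python sketch assumes a hypothetical observability g = Q / (scale * u**alpha) and a logistic link on log g. The function names, the `scale` constant, the exponent `alpha`, and the coefficients `b0` and `b1` are illustrative assumptions, not the manuscript's fitted model or its definition of observability. With alpha = 1, POD is unchanged when emission rate and wind speed are scaled together; a wind exponent different from 1 is one simple way to represent observability that is not simply inversely proportional to wind.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic (inverse-logit) link."""
    return 1.0 / (1.0 + math.exp(-x))


def pod_wind_exponent(q_kg_h: float, wind_m_s: float, alpha: float = 1.0,
                      scale: float = 50.0, b0: float = 0.0,
                      b1: float = 2.0) -> float:
    """Hypothetical POD: logistic in log(g), with g = Q / (scale * u**alpha).

    alpha = 1 gives a purely wind-normalized emission rate. All constants
    are made-up illustration values, not fitted coefficients.
    """
    g = q_kg_h / (scale * wind_m_s ** alpha)
    return logistic(b0 + b1 * math.log(g))


def pod_wind_normalized(q_kg_h: float, wind_m_s: float) -> float:
    """Special case alpha = 1: POD depends only on the ratio Q / u."""
    return pod_wind_exponent(q_kg_h, wind_m_s, alpha=1.0)


if __name__ == "__main__":
    # Doubling both emission rate and wind speed leaves Q / u unchanged.
    print(pod_wind_normalized(200.0, 2.0), pod_wind_normalized(400.0, 4.0))
    # With alpha != 1 the same two scenes no longer share a POD.
    print(pod_wind_exponent(200.0, 2.0, alpha=0.5),
          pod_wind_exponent(400.0, 4.0, alpha=0.5))
```

Comparing candidate functional forms on held-out scenes, rather than fixing alpha = 1 in advance, would be one way to address the question of whether other fits describe the underlying data better.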
Minor comments:
- Line 37: The study of POD is in service of many objectives, not just “plume counts”.
- Line 40: XCH4 is introduced without definition.
- Line 42: “algorithms and 3)”
- Line 70: replace “MethaneSAT is” with “MethaneSAT was”.
- Figure 1: Rendering issue in legend.
- Line 123: A nonzero false positive rate implies a nonzero POD at an emission rate of zero. What then are the implications of not quality controlling these data?
- Line 146: If a scene is mostly clean, the 10% cutoff on pixels could significantly bias the estimate of gas concentration noise. Have the authors assessed sensitivity to this cutoff?
- Line 192: I’m surprised that the difference in POD between an aircraft and satellite is relatively small even at low wind speeds. Can you describe how this can be true?
- Line 204-205: This is editorializing. Why is this better than the aforementioned methods already in the literature (see also comment at line 264)?
- Line 212: Should this be Table 2?
- Line 220: Presumably, the emission rate of an observed plume depends on the resolution of the instrument. A low-spatial resolution satellite could effectively aggregate many small sources within a single pixel. Do the authors have a comment on this?
- Table A1: Include the functional forms of the fitted model.
- Throughout: I believe Insight-M was recently acquired by Zeitview. Consider noting this.
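The implication raised in the Line 123 comment above can be sketched numerically. The sketch below assumes, for illustration only, that false alarms occur independently of the true plume signal at a fixed per-scene rate `fpr`, so that a scene is flagged when either a true detection or a false alarm occurs. The logistic coefficients, the observability values, and the `fpr` value are hypothetical and are not taken from the manuscript.

```python
import math


def pod_true(g: float, b0: float = 0.0, b1: float = 2.0) -> float:
    """Hypothetical plume-only POD: logistic in log(g); zero when g is zero."""
    if g <= 0.0:
        return 0.0  # no emission, so there is no true plume to detect
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * math.log(g))))


def pod_observed(g: float, fpr: float) -> float:
    """Probability a scene is flagged: true detection OR independent false alarm."""
    return 1.0 - (1.0 - fpr) * (1.0 - pod_true(g))


if __name__ == "__main__":
    # Columns: observability, plume-only POD, flagged fraction with false alarms.
    for g in (0.0, 0.5, 1.0, 4.0):
        print(g, round(pod_true(g), 3), round(pod_observed(g, fpr=0.05), 3))
```

Under these assumptions the flagged fraction at zero observability equals `fpr` rather than zero, so a fit to detection outcomes that have not been quality controlled may absorb that floor into the low end of the POD curve.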
Citation: https://doi.org/10.5194/egusphere-2026-115-RC2
- AC1: 'Comment on egusphere-2026-115', Ethan Manninen, 26 May 2026
Dear Editor,
We appreciate the comments by the reviewers.
Referee one provided comments on narrative, structure, and reproducibility. We extensively restructured the manuscript, improved the clarity of the methods, and improved the focus on the effects of probability of detection on emissions distributions.
Referee two provided comments on the suitability of P_d model training with simulated plumes and on reproducibility. We added further justification for training with simulated plumes.
Detailed responses to both reviewers follow:
---------- Response to Reviewer One ----------
This manuscript presents a method for modeling the probability of detection (P_d) of methane plumes from airborne and satellite remote sensing systems, and demonstrates its application to emissions distribution analysis of the Permian Basin. The topic is timely and relevant, but the manuscript has three major issues that should be addressed before publication: (1) structural and narrative weaknesses across the introduction, methods, and conclusion; (2) insufficient methodological detail to allow reproducibility; (3) the validation of the P_d model and the treatment of uncertainty would benefit from further development.
Major Comment 1
The paper would benefit from a clearer narrative structure. The introduction would be strengthened by background explanations of what probability of detection means, what threshold refers to, why P_d depends on emission rate, and why the research question is scientifically important.
a) Much of Section 2 contains content (previous P_d studies, controlled release background, the concept of dispersed emissions, and their limitations) that would be more appropriate in the introduction.
Response: We added a description of the division of total emissions into dispersed emissions and discrete emissions in the introduction's first paragraph ("The instantaneous total CH$_4$ emissions of a region can be separated..."
line 23 revised manuscript).
We created a new paragraph in the introduction describing how P_d has been modeled in the past: the choice of functional form, and the procurement of training data, e.g. from controlled release experiments (line 58 revised manuscript).
We moved the pieces of the paragraph about the P_d functional form used by Conrad et al. (line 43 submitted manuscript) to this paragraph (line 60 revised manuscript).
We moved the introduction of controlled release studies to a new paragraph (line 53 revised manuscript).
b) The introduction also has two logical gaps: the connection between characterizing emission rate contributions and the need for P_d is never explained, and the emission rate dependence of P_d is omitted from the third sentence in Introduction despite being central to the research question.
Response: We inserted a new second paragraph ("When an XCH4 image overlaps a CH4 plume..." line 29 revised manuscript) describing the importance of emissions distributions to CH4 emissions mitigation efforts, and the relevance of P_d models to emissions distributions (line 34 revised manuscript).
This paragraph fills the logical gap of "the emission rate dependence of P_d" ("Higher emission rates accumulate CH4 faster...", line 29 revised manuscript).
This paragraph also details the relationship between P_d and emissions distributions to fill the logical gap between "characterizing emission rate contributions" and the "need for P_d" ("Without consideration of P_d..." in line 33 revised manuscript).
c) Section 2 would be clearer if it focused on describing the methods and data used: what data were collected, what they look like, their uncertainties, and what steps were taken to address those uncertainties.
Response: We added more detail explaining controlled release experiments (line 109) and simulations (line 130).
To improve the focus of the methods, we moved the Pd desiderata and motivations to the introduction.
d)
The conclusion would also benefit from summarizing the key findings and answering the research question posed in the introduction, rather than focusing primarily on future implications.
Response: We summarized the key findings with respect to emissions distributions and Pd, and improved focus by removing "Missions that seek to constrain ...."
Minor Comments:
Abstract, Sentence 1
- The term "methane losses" often refers to the removal of methane from the atmosphere, for example through oxidation by OH radicals, which is not discussed anywhere in this manuscript. If the authors mean methane leakage or emissions to the atmosphere, the wording should be revised accordingly.
Response: In line 1, we replaced "methane losses" with "methane emissions" to remove ambiguity with chemical loss processes.
Introduction, Paragraph 1
- P_d is introduced without definition. What is probability of detection and what does it mean physically?
Response: We added a sentence introducing P_d and explaining that plumes are not always able to be isolated from background variability: "When an XCH4 scene overlaps a CH4 plume..." (line 29 revised manuscript)
- The connection between the second and third sentences is unclear. The reader is told that characterizing contributions of different emission rates is important, but it is not explained why this requires knowledge of P_d.
Response: We added sentences to explain the relationship between P_d and emissions distributions: "Without consideration of P_d..." (line 33 revised manuscript) and "More broadly..." (line 34 revised manuscript)
- The third sentence describes P_d as a function of sensor properties, observing conditions, and detection algorithms, but omits its dependence on emission rate, which is central to the research question that follows.
Response: We added "Higher emission rates accumulate CH$_4$ faster, resulting in plumes with higher contrast over larger areas, which are easier to discern in XCH$_4$ images."
(line 31)
Introduction, Paragraph 2
- "We develop a generalized approach to account for the factors that affect P_d." What does "generalized" mean here, and generalized compared to what?
Response: We replaced "We develop a generalized approach..." (line 20, submitted manuscript) with "We develop an approach to model Pd that does not require ..." (line 79 revised manuscript).
- "We utilize data from controlled release experiments, extended with image processing techniques, and supplemented with simulated plumes. Our approach enables us to model systems lacking available controlled release data such as MethaneSAT." The first sentence implies controlled release data is the primary dataset, while the second claims the approach works without it. These two sentences appear contradictory and could be clarified.
Response: We explained in more detail the use of simulated plumes (training Pd models) and controlled release plumes with image processing (validation) ("In the following sections, we explain how we fit our $P_d$ model on simulated data, and validated the model on image processing enhanced controlled release scenes." line 125).
We added a new Table 1 to summarize this information.
Introduction, Paragraph 3
- The third paragraph discusses previous studies and their limitations, but would read more naturally before the research question in Paragraph 2.
Response: We added two new paragraphs explaining previous methods for 1) P_d functional forms (line 58), and 2) procuring model training/fitting data (line 53, 70 revised manuscript). Some of the material for these paragraphs was originally in paragraph 3 of the introduction and in Section 2; some is new.
- The term "threshold" is used without definition. What does it mean for an observing system to have a higher or lower threshold?
Response: We replaced "threshold" with "detection threshold", which is now introduced: "Some analyses of plume observations rely on a detection threshold..."
(line 41 revised manuscript) (formerly in Section 2).
- The third paragraph mentions that simulated plumes have been used in previous studies (Rouet-Leduc and Hulbert (2024); Roger et al. (2025)), but does not state what is novel about this paper's approach. It would be helpful to explicitly state the novelty.
Response: We added a sentence explaining some of the differences between our study and the Roger et al. study: "Roger et al. (preprint) use a different set of simulated plumes..." (line 77)
- The concept of dispersed emissions is central to the analysis, as total emissions are defined as the sum of plume emissions and dispersed emissions, yet it is introduced for the first time in Section 2.7. This framework would be clearer if established in the Introduction.
Response: We added a sentence to the first paragraph ("... total emissions can be separated into two components..." line 23 revised manuscript) to explain how total emissions can be separated into dispersed (area) emissions and point (discrete, plume) emissions.
- P_d is never connected to its actual application in the paper as a post-detection correction tool used in the Monte Carlo simulation to correct for missed detections. This connection would help the reader understand why modeling P_d matters for the emissions distribution analysis.
Response: In the new second paragraph, we explain how emissions distributions can be corrected via scaling by P_d ("More broadly..." line 33 revised manuscript)
Section 2.1
- The content of Section 2.1, explaining what a useful P_d model should do and why intercomparison across systems requires accounting for sensor sensitivity, algorithm skill, and observing conditions, reads more as background material. The discussion of Conrad et al. (2023) and Bruno et al.
(2024) similarly fits more naturally in a prior work review in the Introduction.
Response: We moved the Pd desiderata section and the prior work on Pd to the introduction.
Conclusion
- The conclusion would be strengthened by stating how the proposed P_d model compares to previous approaches, and what effect accounting for P_d had on the emissions distribution analysis.
Response: We added an explanation in the conclusion of how this Pd method differs from prior Pd methods (line 370).
We further explained the effect of modeled Pd on emissions distributions (line 375).
- The conclusion would also benefit from addressing the research question posed in the introduction: how can observations from different remote sensing systems be integrated to assess the emission-rate-dependent distribution of point sources?
Response: We removed "how can observations from different remote sensing systems be integrated ..." as the driving science question of the work. The focus is now more on the effect of Pd on emissions distributions.
Major Comment 2
Several sections would benefit from additional methodological detail to support reproducibility.
a) The distinctions between MAIR, MAIR-E, and MAIRX campaigns are not explained well: which campaigns correspond to controlled release experiments, which data are used for training versus validation versus Monte Carlo analysis, and why they are treated separately.
Response: We clarified with "MethaneAIR flew the MAIR (summer 2021) and MAIR-E (summer 2022) research campaigns to develop emissions quantification algorithms. The MAIRX campaign quantified emissions from basins across North America in May-October 2023." (line 90)
and "MethaneAIR flew controlled release experiments in 2021 in the Permian Basin, Texas, USA (MAIR) and in 2022 in central Arizona, USA (MAIR-E) (\cite{chulakadabbaMethanePointSource2023,elabbadiTechnologicalMaturityAircraftBased2024})."
(line 112)
b) The controlled release experiment description leaves unanswered who releases the methane, and how the emission rate is controlled.
Response: To increase clarity, we added: "For both controlled release experiments, CH4 was released..." (line 113)
and included a reference to further details on the release methods contained in Chulakadabba et al. 2023 and El Abbadi et al. 2024.
We acknowledge uncertainty in the metered rate, which is not accounted for in our validation set: "There is uncertainty in the metered release rate..." (line 110).
c) The WRF-LES simulation setup would benefit from additional details including model resolution, domain, simulation time, inputs, and how emission rates are varied.
Response: We added detail on the WRF-LES setup: "We created 16 WRF-LES simulations, one for each 1 m/s increment..." (line 134), and several sentences thereafter.
d) The connection between the binary detection outcomes from Section 2.5 and the logistic regression training in Section 2.6 could be made more explicit. The Monte Carlo simulation in Section 2.7 would also benefit from additional details including the number of iterations, what is calculated in each iteration, and how results are aggregated.
Response: We added more detail on the number of iterations and on each iteration's calculation and aggregation in the Monte Carlo, beginning with: "For 1,000 simulations, we resampled with replacement ..." (line 252)
Minor Comments:
Section 2.2
- It is not clarified whether MAIR and MAIR-E correspond to the controlled release experiments described in Section 2.3. The relationship between campaign names and their purpose would be helpful to state explicitly.
Response: We clarified with "MethaneAIR flew the MAIR (summer 2021) and MAIR-E (summer 2022) research campaigns to develop emissions quantification algorithms. The MAIRX campaign quantified emissions from basins across North America in May-October 2023."
(line 90)
and "MethaneAIR flew controlled release experiments in 2021 in the Permian Basin, Texas, USA (MAIR) and in 2022 in central Arizona, USA (MAIR-E) (\cite{chulakadabbaMethanePointSource2023,elabbadiTechnologicalMaturityAircraftBased2024})." (line 90)
- The methodological differences between MethaneAIR, MethaneSAT, and LeakSurveyor are not described beyond pixel dimensions. What are the methods used for detecting XCH4, what are the measurement uncertainties, and what are the sampling frequencies?
Response: We added:
"XCH4 is inverted from retrieved backscattered solar spectra following the XCO2 proxy method described by \cite{chanmillerMethaneRetrievalMethaneAIR2024}, with the prior XCO2 derived from profiles generated using the GINPUT algorithm (\cite{laughnerNewAlgorithmGenerate2023})." (line 95)
"MethaneAIR XCH$_4$ retrievals have been validated to ~1% precision against ground based measurements Chan Miller et al. 2024." (line 97)
"The LeakSurveyor retrieval algorithms, XCH4 images, and XCH4 sensitivities are all proprietary." (line 104)
Section 2.3
- Several basic questions about the controlled release experiment are not addressed: who releases the methane, how is the emission rate controlled, how are the release and detection teams kept separate, and how are overpasses coordinated with the release timing?
Response: To increase clarity, we added: "For both controlled release experiments, CH4 was released..." (line 113)
and included a reference to further details on the release methods contained in Chulakadabba et al. 2023 and El Abbadi et al. 2024.
We acknowledge uncertainty in the metered rate, which is not accounted for in our validation set: "There is uncertainty in the metered release rate..." (line 110).
- It is unclear exactly what controlled release data this paper uses. Are both the 2021 and 2022 campaigns used? How many flights were conducted, how many releases were created and captured, and what range of conditions were tested?
A summary table would be helpful.
Response: We added "Details of each controlled release overpass can be found in Tables S2-S6 in \cite{chulakadabbaMethanePointSource2023}." (line 116)
- "Controlled release experiments were attempted for MethaneSAT prior to its loss, but coordinating overpasses with release durations failed on available trials." This sentence raises questions, and could either be removed or expanded with references and details.
Response: We replaced this sentence with "There is no controlled release data available for MethaneSAT." (line 128)
Section 2.4
- The WRF-LES simulation setup would benefit from additional details: what is the model resolution and domain size, what is the simulation time period, how are emission rates varied, and what range of emission rates was tested?
Response: We added detail on the WRF-LES setup: "We created 16 WRF-LES simulations, one for each 1 m/s increment..." (line 134), and several sentences thereafter.
- It is not clear how the approximately 80,000 training scenes were generated. Were wind speed, cloud fraction, and aerosol optical depth varied? Was only one source assumed per pixel?
Response:
We added: "We did not vary cloud fraction or any other source of missing data in the simulated scenes" (line 145).
We added: "Varying gas concentration noise would vary aerosol optical depth" (line 145).
We added: "Each simulated scene contained only one source" (line 146).
- The process of creating the validation dataset from approximately 40 controlled release overpasses is not explained. Were model winds constrained by observed winds from the controlled release site?
Response:
"We scaled the XCH$_4$ values only within the plume..." (line 150)
"Thus, these enhanced controlled release scenes have varying emission rate, pixel area, and gas concentration noise, but the same set of wind speeds as the original controlled release experiment."
line 154
- "To reduce computation times, we used an idealized version of WRF-LES, which uses a time invariant upwind boundary condition." References or discussion of uncertainties associated with this idealization would be helpful.
Response: We added "Idealized WRF-LES has been used to accurately quantify emission rates from plume images, indicating that it is a realistic representation of plumes in the atmosphere (\cite{varonQuantifyingMethanePoint2018, chulakadabbaMethanePointSource2023})." (line 131)
Section 2.5
a) It is not clearly stated which algorithm is used by which observing system. If MethaneAIR used the divergence integral algorithm and MethaneSAT used the wavelet algorithm, this should be explicitly stated, as it affects the interpretation of the results throughout the paper.
Response: We added: "Both the divergence integral and wavelet are applied to both MethaneAIR and MethaneSAT observations." (line 167)
We also added further detail in the description of the methods of the emissions distribution: "In this analysis, we used the probability of either method detecting the plume = 1 - (1-Pd_div.int)(1-Pd_wavelet)" (line 261)
b) The divergence integral and wavelet algorithms are not described in sufficient detail. What are the underlying assumptions, inputs, and known limitations of each method?
Response: We discuss in more detail the thresholding and growing box method used for the divergence integral, starting with "First, we threshold..." (line 159).
We added more detail about the wavelet method, and added a new reference to the paper in review describing the development of the wavelet plume detection methodology: "Further details on the wavelet algorithm..." (line 167)
c) "Here, plume detection is fully automated; there is no manual quality assurance step to identify false positives, which are not a focus of this study." Since emission locations and rates are known in the WRF-LES simulations, false positives could in principle be identified.
It would be helpful to clarify whether this was assessed.
Response: False positives in the WRF-LES simulation could be assessed; however, a full study of false positive plume detections is under development.
We also added a discussion of false positives, QA/QC, and effects on emission rate distributions, beginning with "When generating the training data set, plume detection is fully automated..." (line 170)
Section 2.6
a) It is not explicitly stated what the training labels for the logistic regression are. We assume that the binary detection outcomes from Section 2.5 serve as labels and g as the input feature. Could the authors confirm this and make it explicit in the text?
Response: We replaced "Then, we fitted a logistic regression on the simulated scenes." with "Then, we fitted a logistic regression on the binary successful/failed detection ..." (line 233)
Section 2.7
a) MAIRX appears without explanation of how it differs from MAIR and MAIR-E. It would be helpful to clarify why MAIR and MAIR-E are used for validation while MAIRX is used for the Monte Carlo analysis.
Response: We clarified with "MethaneAIR flew the MAIR (summer 2021) and MAIR-E (summer 2022) research campaigns to develop emissions quantification algorithms. The MAIRX campaign quantified emissions from basins across North America in May-October 2023." (line 90)
and "MethaneAIR flew controlled release experiments in 2021 in the Permian Basin, Texas, USA (MAIR) and in 2022 in central Arizona, USA (MAIR-E) (Chulakadabba et al 2023)." (line 90)
and "Targets for the MAIRX campaign are shown in Figure 1." (line 93)
b) The Monte Carlo simulation would benefit from additional details: the number of iterations is not stated, it is unclear what quantity is calculated in each iteration, and it is not explained how results are aggregated.
Response: We added more detail on the number of iterations and on each iteration's calculation and aggregation in the Monte Carlo, beginning with: "For 1,000 simulations, we resampled with replacement ..."
line 252
c) Dispersed emissions are fixed at the average estimate across all valid scenes without justification. It would be worth discussing whether this choice affects the uncertainty in the emissions distribution estimate.
Response: In the newly added details of the Monte Carlo, we added: "We included a Gaussian error term to represent uncertainty in the area emissions, with zero mean ..." (line 254)
d) It is not described how the effect of P_d is isolated, that is, how the corrected and uncorrected emissions distributions are compared.
Response: "Additionally, we calculated Pd using our fitted models, gas concentration noises and wind speeds from MAIRX scene." (line 257)
"This enabled an emissions distribution with a Pd correction term, as described in the introduction." (line 257)
Major Comment 3
The paper would benefit from a more thorough validation of the Pd model and a more complete treatment of uncertainty.
a) Model performance in Section 3.1 is described only qualitatively and validation is shown only for MethaneAIR.
Response: We added "We did not validate Pd model performance for either Insight M or MethaneSAT..." (line 274)
b) In Section 3.1, the explanation for the poorer performance of the divergence integral algorithm relies on algorithmic details that were not introduced in Section 2.5, making the discussion difficult to follow.
Response: We discuss in more detail the thresholding and growing box method used for the divergence integral, starting with "First, we threshold..." (line 159).
We added more detail about the wavelet method, and added a new reference to the paper in review describing the development of the wavelet plume detection methodology: "Further details on the wavelet algorithm..."
line 167
c) In Section 3.2, the interpretation of Figures 4A and 4B is unclear because it is not established in Section 2 how both detection algorithms are applied to each observing system.
Response: We clarified that both algorithms are applied to both MethaneAIR and MethaneSAT. We added: "Both the divergence integral and wavelet are applied to both MethaneAIR and MethaneSAT observations." (line 167)
Insight M uses proprietary algorithms.
d) In Section 3.3, the discrepancy with Warren et al. (2024) is discussed but it is not clear what the key methodological difference is between the two approaches and which is more appropriate. The steady state assumption is identified as a limitation without quantifying its impact on the results.
Response: We added a new Figure 7, showing uncertainties in steady state plume emissions distributions for N. American oil and gas basins.
Qualitatively, these seem to indicate that the frequency of rare high impact events is a key driver of uncertainty (paragraphs beginning at line 355).
We quantitatively estimate uncertainties associated with steady state analyses of plume observations in a separate work currently in progress.
Minor Comments:
Section 3.1
- No quantitative metrics such as RMSE, R2, or bias are reported. Figure 3 provides only a slope, which is insufficient to assess model fit. It would be helpful to include quantitative validation metrics for all systems and algorithms.
Response: We moved the reporting of R2 from the figure caption to the main body for readability (line 269).
We note that we are unable to validate using controlled release for MethaneSAT and Insight M: "We did not validate Pd model performance for either Insight M or MethaneSAT..." (line 274)
- Validation is shown only for MethaneAIR.
It is unclear whether the model was validated for MethaneSAT, and given that MethaneSAT P_d predictions are used in the Monte Carlo analysis, this would be an important addition.
Response: We note that we are unable to validate using controlled release for MethaneSAT and Insight M: "We did not validate Pd model performance for either Insight M or MethaneSAT..." line 274
- The manuscript attributes the poorer performance of the divergence integral algorithm to its sensitivity to missing data, explaining that it requires intact boxes of XCH4 pixels while the wavelet does not. However, since neither algorithm is described in Section 2.5, I cannot follow this reasoning. It would be helpful to describe both algorithms in sufficient detail in Section 2.5 so that this discussion is interpretable.
Response: We discuss in more detail the thresholding and growing box method used for divergence integral, starting with "First, we threshold..." line 159. We added more detail about the wavelet method, and added a new reference to the paper in review describing the development of the wavelet plume detection methodology. "Further details on the wavelet algorithm.." line 167
Section 3.2, Figure 4
- Figures 4A and 4B are difficult to interpret because it is not clearly established in Section 2 how both the divergence integral and wavelet algorithms are applied to MethaneAIR and MethaneSAT. It would be helpful to clarify which algorithm is used by which system, and whether the P_d curves for MethaneAIR and MethaneSAT are estimated using simulated training data while LeakSurveyor P_d is calculated directly from controlled release data.
Response: We added: "Both the divergence integral and wavelet are applied to both MethaneAIR and MethaneSAT observations."
line 167. We also added further detail in the description of the methods of the emissions distribution: "In this analysis, we used the probability of either method detecting the plume = 1 - (1-Pd_div.int)(1-Pd_wavelet)" line 261
With respect to Insight M, we qualified our previous statement about the comparative skill of Insight M and added: "However, because the Insight M $P_d$ was only fit on limited controlled release scenes, and the images are proprietary, a complete analysis is impossible" line 296
- The Insight M P_d curve is derived from approximately 100 controlled release scenes while the other algorithms use approximately 80,000 simulated scenes. This difference in training data size makes a direct comparison of algorithm skill difficult to interpret, and it would be worth discussing whether the higher uncertainty of the Insight M curve affects the conclusions drawn about its relative performance.
Response: With respect to Insight M, we qualified our previous statement about the comparative skill of Insight M and added: "However, because the Insight M $P_d$ was only fit on limited controlled release scenes, and the images are proprietary, it is not possible to say for certain." line 296
Section 3.3
- The discrepancy with Warren et al. (2024) is attributed to two differences: the inclusion of wavelet-detected plumes and the absence of persistence weighting. However, it is not clear which of these is the more important factor, nor whether the steady state assumption used in this manuscript is more or less appropriate than the persistence weighting approach of Warren et al. (2024).
Some discussion of the relative merits of the two approaches would help the reader assess the reliability of the results.
Response: We add discussion of the impact of uncertainty from rare events. We suggest that these dominate other forms of uncertainty, i.e., due to area emissions estimates and
We will quantitatively treat the effect of persistence weighting in the aforementioned work in progress study on the steady state and uncertainty due to rare events.
- The impact of the steady state assumption on the results is not quantified, despite being acknowledged as a limitation. Given that this assumption underpins the entire emissions distribution analysis, some assessment of its impact would strengthen the paper.
Response: We added a new figure 7 that suggests that uncertainty from rare events is extremely significant. We added discussion to this effect starting at: "The Monte Carlo uncertainties in Figure \ref{emiss_dist_all_basins} indicate that the total emissions are sensitive to small shifts in the steady state prevalence of high emission rate events." line 355. We will quantitatively treat the effect of persistence weighting in the aforementioned work in progress study on the steady state and uncertainty due to rare events.
- The confidence intervals for the hypothetical lower sensitivity satellite are described as "much wider" without numerical values. Reporting confidence intervals across all three systems would help the reader assess the practical significance of the P_d correction.
Response: We add inline reporting of the CI values from what is now table 3, on the relative contributions of different emission rates, here: "(0\% - 63\% of total emissions from $>$500 \unit{kg/hr}..." line 329
---------- Response to Reviewer Two ----------
Major comments:
1 The method seems to lean heavily on simulated plumes.
The authors justify this need (e.g., controlled releases are challenging and expensive) but do not fully detail the methods to create the plumes nor justify to the reader that these simulated plumes are realistic. For example, some simulated plumes were generated “using image processing methods” without further description. As written, the study is not reproducible.
Response: As justification that the simulated plumes are realistic representations of plumes in the atmosphere, we note prior successful works that quantify emission rates from plume images: "Idealized WRF-LES has been used to accurately quantify emission rates from plume images, indicating that it is a realistic representation of plumes in the atmosphere (\cite{varonQuantifyingMethanePoint2018, chulakadabbaMethanePointSource2023})." line 130
and "The Weather Research Forecasting, Large Eddy Simulation (WRF-LES) model effectively captures the stochastic behavior of plumes due to boundary layer turbulence \cite{gaudetExplorationImpactNearby2017, varonQuantifyingMethanePoint2018,chulakadabbaMethanePointSource2023}." line 74
2 a) There were controlled release experiments but also myriad simulation data (e.g., lines 98 and 110 suggest up to 142,000 simulated images). It is unclear what data exactly are being used in their POD fitting
Response: We explained in more detail the use of simulated plumes (training Pd models) and controlled release plumes with image processing (validation) ("In the following sections, we explain how we fit our $P_d$ model on simulated data, and validated the model on image processing enhanced controlled release scenes."
line 125). We added a new Table 1 to summarize this information.
b) Were all the data used in the fitting algorithm?
Response: Only simulated scenes were used to train the Pd models. In the manuscript, we explained in more detail the use of simulated plumes (training Pd models) and controlled release plumes with image processing (validation) ("In the following sections, we explain how we fit our $P_d$ model on simulated data, and validated the model on image processing enhanced controlled release scenes." line 125). We added a new Table 1 to summarize this information.
c) How do fitted POD models vary if considering just one of the simulated datasets at a time (i.e., what is the sensitivity to the fitting data)?
Response: Only simulated scenes were used to train the Pd models. In the manuscript, we explained in more detail the use of simulated plumes (training Pd models) and controlled release plumes with image processing (validation) ("In the following sections, we explain how we fit our $P_d$ model on simulated data, and validated the model on image processing enhanced controlled release scenes." line 125). We added a new Table 1 to summarize this information.
d) Are the controlled release data (the manuscript does not indicate the quantity) sufficient to not be overpowered by the huge simulation datasets when fitting?
Response: We added more clarity indicating the number of controlled release overpasses in new table 1.
e) Could the “skillfulness” of Insight-M’s plume detection algorithms be due to bias in the data included in the fit (I don’t believe it uses any simulation data)?
Response: We qualified our previous statement about the comparative skill of Insight M and added: "However, because the Insight M $P_d$ was only fit on limited controlled release scenes, and the images are proprietary, it is not possible to say for certain." line 296
3 a) The method disregards published approaches in fitting POD models on the argument of “simplicity”.
Specifically, they use the model of Bruno et al. that forces a dimensionless “observability” coupled with a logistic link function. This dimensionless metric assumes that, with all else constant, POD is constant in wind-normalized emission rate. This might be justified for Gaussian-like plumes, but it is known to be erroneous for real-world plumes.
Response: We added a paragraph of more context justifying our nondimensional form of Pd model: "We used a single, nondimensional form for two reasons: to enhance generalizability to systems without controlled release data, and to avoid overfitting..." line 194. We acknowledged the lower limit of the wind speed scaling: "We expect the eddy scale wind speed to be the boundary of viability for the observability parameter." line 187
b) Conrad et al. and later Thorpe et al. (DOI: 10.1016/j.rse.2024.114435), who used more complex approaches, both explicitly showed that “observability” is not simply inversely proportional to wind. Moreover, these papers show that model quality varies greatly across different functional forms.
Response: We added a paragraph of more context justifying our nondimensional form of Pd model: "We used a single, nondimensional form for two reasons: to enhance generalizability to systems without controlled release data, and to avoid overfitting..." line 194. We acknowledged the lower limit of the wind speed scaling: "We expect the eddy scale wind speed to be the boundary of viability for the observability parameter." line 187
c) How do the authors justify using only the simple model presented? Have they assessed the performance of other fits? Other fits that could fit the underlying data better may yield different inferences/conclusions.
Response: We added a paragraph of more context justifying our nondimensional form of Pd model: "We used a single, nondimensional form for two reasons: to enhance generalizability to systems without controlled release data, and to avoid overfitting..."
line 194. We acknowledged the lower limit of the wind speed scaling: "We expect the eddy scale wind speed to be the boundary of viability for the observability parameter." line 187
4 a) The authors simulate a “hypothetical satellite with lower XCH4 sensitivity [than MethaneSat]”. How much lower? Is this intended to simulate a real-world instrument? This seems arbitrary.
Response: We further explain the purpose of the hypothetical satellite: "This is an exercise..." line 322. We added characteristic gas concentration noises: "We The lower XCH$_4$ sensitivity is reflected in a higher characteristic gas concentration noise..." line 320. We added explanation of the choice to vary gas concentration noise, to demonstrate the emissions distribution recovered by a variety of satellites that drop off in their detection capability in the heavy tail of emission rates.
5 a) The emission distributions “with Pd weighting” is not described. What does this mean and how is it calculated?
Response: We added a definition of Pd weighting to the introduction (More broadly, emission rate distributions can be adjusted by scaling..., line 33).
6 a) The notion that controlled release testing can be “dispensed with” (line 246) because of the authors’ approach is editorializing and inappropriate. They have not proven that this method can be used in place of actual controlled release data; especially since the quality of the simulated data has not been addressed.
Response: We qualified the statement with "Controlled release data should still be considered a gold standard for estimating Pd."
(line 369). We go on to more concretely describe the value of the ability to use our method for planned systems (controlled release does not yet exist) and other systems that do not have controlled release data. We refer to prior work that shows that WRF-LES plumes are good enough representations of plumes: "The Weather Research Forecasting, Large Eddy Simulation (WRF-LES) model effectively captures the stochastic behavior of plumes due to boundary layer turbulence \cite{gaudetExplorationImpactNearby2017, varonQuantifyingMethanePoint2018,chulakadabbaMethanePointSource2023}." line 74
7 a) The authors finish by noting that “MethaneAIR and MethaneSAT are capable of detecting small enough plumes to effectively probe the heavy tailed emission rate distributions found in the Permian, as well as other North American basins”. Without noting that there are many other instruments that outperform (the now offline) MethaneSAT, this reads as editorializing – can’t Insight-M and also Bridger Photonics, PRISMA, EnMAP, EMIT, Carbon Mapper’s Tanager, GHGSat, etc. also do this? Moreover, I don’t believe that the authors have assessed other North American basins.
Response: We removed "MethaneAIR and MethaneSAT are capable of detecting small enough..." We added analysis of other N. American basins in a new Figure 7.
8 a) Figure A1 is not discussed in the main text and only cursorily discussed in the appendix (it isn’t clear how we should interpret ROC or how this was derived). Is it relevant to the paper? Moreover, (see line 264), Conrad et al.’s method does explicitly consider aircraft altitude and (see line 137) Thorpe et al.’s method inherently considers flight altitude in their gas concentration noise parameterization.
Response: We moved the ROC curve to the main text. We further explained the interpretation of ROC curves: "ROC curves are a commonly used tool for evaluating models of binary outcomes.
Predictive skill of a model is captured by area of the ROC curve in sensitivity/specificity space above the 1:1 line, which represents an indiscriminate model." line 206. For clarity, we reproduced the fitted model equation from Conrad et al. that does not contain altitude or gas concentration noise "..." (line 211) (new equation 3).
Minor comments:
b) Line 37: The study of POD is in service of many objectives, not just “plume counts”.
Response: We added a broader statement, "As such, the $P_d$ of a CH$_4$ plume by different imaging spectrometers in different observing conditions is a key piece of information needed for interpretation of plume observations by different observing systems..." (line 36), to indicate that P_d is useful for more than just emissions distributions.
c) Line 40: XCH4 is introduced without definition.
Response: To the introduction, we added a definition for XCH4: "Observations from CH4 imaging spectrometers take the form of images of column averaged CH4 concentrations (XCH4)" (line 17).
d) Line 42: “algorithms and 3)”
Response: We added "and" to this sentence, now line 40.
e) Line 70: replace “MethaneSAT is” with “MethaneSAT was”.
Response: We replaced all references to MethaneSAT in the present tense with past tense.
f) Figure 1: Rendering issue in legend.
Response: We fixed the legend in the updated figure 1.
g) Line 123: A nonzero false positive rate implies a nonzero POD at an emission rate of zero. What then are the implications of not quality controlling these data?
Response: We added a discussion of false positives, QA/QC, and effects on emission rate distributions ("When generating the training data set, plume detection is fully automated..." line 170). Pd is now specifically defined in the introduction to only account for true positives and false negatives ("Here, Pd only accounts for..."
line 30). We note that plume observations from field MethaneAIR campaigns are manually quality controlled for false positives.
h) Line 146: If a scene is mostly clean, the 10% cutoff on pixels could significantly bias the estimate of gas concentration noise. Have the authors assessed sensitivity to this cutoff?
Response: We analyzed a clean controlled release scene and found low sensitivity to the XCH4 cutoff (New Figure A1). To the methods section discussion of gas concentration noise, we added "For a MAIR-E controlled release scene in Arizona with no proximal emissions, gas concentration noise shows low sensitivity to the choice of XCH$_4$ quantile (Figure A1)." (line 225).
i) Line 192: I’m surprised that the POD between an aircraft and satellite is relatively small even at low wind speeds. Can you describe how this can be true?
Response: We added "For our Pd model, the coarser spatial resolution of MethaneSAT is compensated by higher sensitivity to XCH4..." (line 284) to discuss the similarity between MSAT and MAIR Pd.
j) Line 204-205: This is editorializing. Why is this better than the aforementioned methods already in the literature (see also comment at line 264)?
Response: We added a paragraph of more context justifying our nondimensional form of Pd model: "We used a single, nondimensional form for two reasons: to enhance generalizability to systems without controlled release data, and to avoid overfitting..." line 194. We acknowledged the lower limit of the wind speed scaling: "We expect the eddy scale wind speed to be the boundary of viability for the observability parameter." line 187
k) Line 212: Should this be Table 2?
Response: We corrected the reference from table A1 to table 3.
l) Line 220: Presumably, the emission rate of an observed plume depends on the resolution of the instrument. A low-spatial resolution satellite could effectively aggregate many small sources within a single pixel.
Do the authors have a comment on this?
Response: At "Plumes are emissions that can be associated with emitters at various spatial scales (\cite{pandeyRelatingMultiScalePlume2024})", we now specify "When discussing emissions distributions in this study, we consider facility scale plumes and emissions, following other studies (\cite{cusworthIntermittencyLargeMethane2021,sherwinUSOilGas2024})" line 24
m) Table A1: Include the functional forms of the fitted model.
Response: For clarity, we reproduced the fitted model equation from Conrad et al. that does not contain altitude or gas concentration noise (new equation 3) (line 213).
n) Throughout: I believe Insight-M was recently acquired by Zeitview. Consider noting this.
Response: We added "Insight M was acquired by Zeitview..." (line 107)
Citation: https://doi.org/10.5194/egusphere-2026-115-AC1
This manuscript presents a method for modeling the probability of detection (P_d) of methane plumes from airborne and satellite remote sensing systems, and demonstrates its application to emissions distribution analysis of the Permian Basin. The topic is timely and relevant, but the manuscript has three major issues that should be addressed before publication: (1) structural and narrative weaknesses across the introduction, methods, and conclusion; (2) insufficient methodological detail to allow reproducibility; (3) the validation of the P_d model and the treatment of uncertainty would benefit from further development.
Major Comment 1
The paper would benefit from a clearer narrative structure. The introduction would be strengthened by background explanations of what probability of detection means, what threshold refers to, why P_d depends on emission rate, and why the research question is scientifically important. Much of Section 2 contains content (previous P_d studies, controlled release background, the concept of dispersed emissions, and their limitations) that would be more appropriate in the introduction. The introduction also has two logical gaps: the connection between characterizing emission rate contributions and the need for P_d is never explained, and the emission rate dependence of P_d is omitted from the third sentence of the Introduction despite being central to the research question. Section 2 would be clearer if it focused on describing the methods and data used: what data were collected, what they look like, their uncertainties, and what steps were taken to address those uncertainties. The conclusion would also benefit from summarizing the key findings and answering the research question posed in the introduction, rather than focusing primarily on future implications.
Minor Comments:
Abstract, Sentence 1
The term "methane losses" often refers to the removal of methane from the atmosphere, for example through oxidation by OH radicals, which is not discussed anywhere in this manuscript. If the authors mean methane leakage or emissions to the atmosphere, the wording should be revised accordingly.
Introduction, Paragraph 1
P_d is introduced without definition. What is probability of detection and what does it mean physically?
The connection between the second and third sentences is unclear. The reader is told that characterizing contributions of different emission rates is important, but it is not explained why this requires knowledge of P_d.
The third sentence describes P_d as a function of sensor properties, observing conditions, and detection algorithms, but omits its dependence on emission rate, which is central to the research question that follows.
Introduction, Paragraph 2
"We develop a generalized approach to account for the factors that affect P_d." What does "generalized" mean here, and generalized compared to what?
"We utilize data from controlled release experiments, extended with image processing techniques, and supplemented with simulated plumes. Our approach enables us to model systems lacking available controlled release data such as MethaneSAT." The first sentence implies controlled release data is the primary dataset, while the second claims the approach works without it. These two sentences appear contradictory and could be clarified.
Introduction, Paragraph 3
The third paragraph discusses previous studies and their limitations, but would read more naturally before the research question in Paragraph 2.
The term "threshold" is used without definition. What does it mean for an observing system to have a higher or lower threshold?
The third paragraph mentions that simulated plumes have been used in previous studies (Rouet-Leduc and Hulbert (2024); Roger et al. (2025)), but does not state what is novel about this paper's approach. It would be helpful to explicitly state the novelty.
The concept of dispersed emissions is central to the analysis, as total emissions are defined as the sum of plume emissions and dispersed emissions, yet it is introduced for the first time in Section 2.7. This framework would be clearer if established in the Introduction.
P_d is never connected to its actual application in the paper as a post-detection correction tool used in the Monte Carlo simulation to correct for missed detections. This connection would help the reader understand why modeling P_d matters for the emissions distribution analysis.
Section 2.1
The content of Section 2.1, explaining what a useful P_d model should do and why intercomparison across systems requires accounting for sensor sensitivity, algorithm skill, and observing conditions, reads more as background material. The discussion of Conrad et al. (2023) and Bruno et al. (2024) similarly fits more naturally in a prior work review in the Introduction.
Conclusion
The conclusion would be strengthened by stating how the proposed P_d model compares to previous approaches, and what effect accounting for P_d had on the emissions distribution analysis.
The conclusion would also benefit from addressing the research question posed in the introduction: how can observations from different remote sensing systems be integrated to assess the emission-rate-dependent distribution of point sources?
Major Comment 2
Several sections would benefit from additional methodological detail to support reproducibility. The distinctions between MAIR, MAIR-E, and MAIRX campaigns are not explained well: which campaigns correspond to controlled release experiments, which data are used for training versus validation versus Monte Carlo analysis, and why they are treated separately. The controlled release experiment description leaves unanswered who releases the methane, and how the emission rate is controlled. The WRF-LES simulation setup would benefit from additional details including model resolution, domain, simulation time, inputs, and how emission rates are varied. The connection between the binary detection outcomes from Section 2.5 and the logistic regression training in Section 2.6 could be made more explicit. The Monte Carlo simulation in Section 2.7 would also benefit from additional details including the number of iterations, what is calculated in each iteration, and how results are aggregated.
Minor Comments:
Section 2.2
It is not clarified whether MAIR and MAIR-E correspond to the controlled release experiments described in Section 2.3. The relationship between campaign names and their purpose would be helpful to state explicitly.
The methodological differences between MethaneAIR, MethaneSAT, and LeakSurveyor are not described beyond pixel dimensions. What are the methods used for detecting XCH4, what are the measurement uncertainties, and what are the sampling frequencies?
Section 2.3
Several basic questions about the controlled release experiment are not addressed: who releases the methane, how is the emission rate controlled, how are the release and detection teams kept separate, and how are overpasses coordinated with the release timing?
It is unclear exactly what controlled release data this paper uses. Are both the 2021 and 2022 campaigns used? How many flights were conducted, how many releases were created and captured, and what range of conditions were tested? A summary table would be helpful.
"Controlled release experiments were attempted for MethaneSAT prior to its loss, but coordinating overpasses with release durations failed on available trials." This sentence raises questions, and could either be removed or expanded with references and details.
Section 2.4
The WRF-LES simulation setup would benefit from additional details: what is the model resolution and domain size, what is the simulation time period, how are emission rates varied, and what range of emission rates was tested?
It is not clear how the approximately 80,000 training scenes were generated. Were wind speed, cloud fraction, and aerosol optical depth varied? Was only one source assumed per pixel?
The process of creating the validation dataset from approximately 40 controlled release overpasses is not explained. Were model winds constrained by observed winds from the controlled release site?
"To reduce computation times, we used an idealized version of WRF-LES, which uses a time invariant upwind boundary condition." References or discussion of uncertainties associated with this idealization would be helpful.
Section 2.5
It is not clearly stated which algorithm is used by which observing system. If MethaneAIR used the divergence integral algorithm and MethaneSAT used the wavelet algorithm, this should be explicitly stated, as it affects the interpretation of the results throughout the paper.
The divergence integral and wavelet algorithms are not described in sufficient detail. What are the underlying assumptions, inputs, and known limitations of each method?
"Here, plume detection is fully automated; there is no manual quality assurance step to identify false positives, which are not a focus of this study." Since emission locations and rates are known in the WRF-LES simulations, false positives could in principle be identified. It would be helpful to clarify whether this was assessed.
Section 2.6
It is not explicitly stated what the training labels for the logistic regression are. We assume that the binary detection outcomes from Section 2.5 serve as labels and g as the input feature. Could the authors confirm this and make it explicit in the text?
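For illustration only, the setup assumed in this comment can be sketched as follows. This is not the authors' implementation: the synthetic scenes, the coefficient values, the function names, and the use of g itself (rather than some transform of g) as the feature are all assumptions. Each scene contributes one observability value g and one binary label (1 if the detection algorithm flagged the plume, 0 otherwise), and a two-parameter logistic curve is fitted by Newton's method.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(g, detected, n_iter=25):
    """Fit P_d(g) = sigmoid(a + b * g) by Newton's method.

    g        : observability value per scene (input feature)
    detected : 1 if the plume was detected in that scene, else 0 (label)
    """
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        h_aa = h_ab = h_bb = 0.0
        for x, y in zip(g, detected):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            grad_a += y - p          # gradient of the log-likelihood
            grad_b += (y - p) * x
            h_aa += w                # negative Hessian entries
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 1e-12:  # ill-conditioned, e.g. perfectly separable labels
            break
        a += (h_bb * grad_a - h_ab * grad_b) / det
        b += (h_aa * grad_b - h_ab * grad_a) / det
    return a, b


def predict_pd(g, a, b):
    """Modeled probability of detection at observability g."""
    return sigmoid(a + b * g)


# Hypothetical synthetic scenes: detection becomes more likely as g grows.
rng = random.Random(0)
g_train = [rng.uniform(0.0, 4.0) for _ in range(500)]
y_train = [1 if rng.random() < sigmoid(-3.0 + 2.0 * x) else 0 for x in g_train]

a_hat, b_hat = fit_logistic(g_train, y_train)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}")
print(f"P_d(g=0.5) = {predict_pd(0.5, a_hat, b_hat):.2f}")
print(f"P_d(g=3.0) = {predict_pd(3.0, a_hat, b_hat):.2f}")
```

If the manuscript's labels or feature differ from this reading, stating the difference explicitly in Section 2.6 would resolve the ambiguity.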
Section 2.7
MAIRX appears without explanation of how it differs from MAIR and MAIR-E. It would be helpful to clarify why MAIR and MAIR-E are used for validation while MAIRX is used for the Monte Carlo analysis.
The Monte Carlo simulation would benefit from additional details: the number of iterations is not stated, it is unclear what quantity is calculated in each iteration, and it is not explained how results are aggregated.
Dispersed emissions are fixed at the average estimate across all valid scenes without justification. It would be worth discussing whether this choice affects the uncertainty in the emissions distribution estimate.
It is not described how the effect of P_d is isolated, that is, how the corrected and uncorrected emissions distributions are compared.
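One way the requested comparison could be made explicit is sketched below. The combined detection probability follows the expression quoted in the author response, 1 - (1-Pd_div.int)(1-Pd_wavelet), whose product form treats the two algorithms as independent. Everything else is an assumption for illustration: the inverse-probability weighting (each detected plume counted 1/P_d times) is a common correction but is not confirmed to be the manuscript's scaling, and the function names, the `pd_floor` guard, and the example plumes are hypothetical.

```python
def combined_pd(pd_div_int, pd_wavelet):
    """Probability that at least one of the two algorithms detects the plume.

    The product form treats the two detection outcomes as independent.
    """
    return 1.0 - (1.0 - pd_div_int) * (1.0 - pd_wavelet)


def plume_totals(plumes, pd_floor=0.05):
    """Return (uncorrected, corrected) plume emission totals in kg/hr.

    plumes   : iterable of (emission_rate, pd_div_int, pd_wavelet)
    pd_floor : hypothetical lower bound on P_d that keeps weights finite
    """
    uncorrected = 0.0
    corrected = 0.0
    for rate, pd_a, pd_b in plumes:
        pd = max(combined_pd(pd_a, pd_b), pd_floor)
        uncorrected += rate
        corrected += rate / pd  # each detection stands in for 1 / P_d plumes
    return uncorrected, corrected


# Hypothetical detected plumes: (kg/hr, P_d divergence integral, P_d wavelet)
example_plumes = [(1000.0, 0.90, 0.95), (200.0, 0.50, 0.60), (50.0, 0.10, 0.20)]
raw_total, weighted_total = plume_totals(example_plumes)
print(f"uncorrected: {raw_total:.0f} kg/hr, corrected: {weighted_total:.0f} kg/hr")
```

Reporting both totals per emission rate bin, from the same set of detected plumes, would isolate the effect of the P_d term.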
Major Comment 3
The paper would benefit from a more thorough validation of the P_d model and a more complete treatment of uncertainty. Model performance in Section 3.1 is described only qualitatively and validation is shown only for MethaneAIR. In Section 3.1, the explanation for the poorer performance of the divergence integral algorithm relies on algorithmic details that were not introduced in Section 2.5, making the discussion difficult to follow. In Section 3.2, the interpretation of Figures 4A and 4B is unclear because it is not established in Section 2 how both detection algorithms are applied to each observing system. In Section 3.3, the discrepancy with Warren et al. (2024) is discussed but it is not clear what the key methodological difference is between the two approaches and which is more appropriate. The steady state assumption is identified as a limitation without quantifying its impact on the results.
Minor Comments:
Section 3.1
No quantitative metrics such as RMSE, R2, or bias are reported. Figure 3 provides only a slope, which is insufficient to assess model fit. It would be helpful to include quantitative validation metrics for all systems and algorithms.
Validation is shown only for MethaneAIR. It is unclear whether the model was validated for MethaneSAT, and given that MethaneSAT P_d predictions are used in the Monte Carlo analysis, this would be an important addition.
The manuscript attributes the poorer performance of the divergence integral algorithm to its sensitivity to missing data, explaining that it requires intact boxes of XCH4 pixels while the wavelet does not. However, since neither algorithm is described in Section 2.5, I cannot follow this reasoning. It would be helpful to describe both algorithms in sufficient detail in Section 2.5 so that this discussion is interpretable.
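As a minimal sketch of the metrics requested for Section 3.1, the snippet below compares modeled P_d against observed detection fractions in matching bins. The bin values are hypothetical placeholders, not data from the manuscript, and the function name is an assumption. R2 is computed here as the coefficient of determination relative to the 1:1 line, which can differ from the squared correlation of a fitted regression line.

```python
import math


def validation_metrics(predicted, observed):
    """Bias, RMSE, and R2 (relative to the 1:1 line) for paired values."""
    n = len(predicted)
    residuals = [p - o for p, o in zip(predicted, observed)]
    bias = sum(residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"bias": bias, "rmse": rmse, "r2": r2}


# Hypothetical binned values: modeled P_d vs. observed detection fraction.
modeled_pd = [0.10, 0.40, 0.70, 0.90]
observed_fraction = [0.00, 0.50, 0.75, 1.00]

metrics = validation_metrics(modeled_pd, observed_fraction)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting the same three numbers for each system and algorithm would make the fits directly comparable.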
Section 3.2, Figure 4
Figures 4A and 4B are difficult to interpret because it is not clearly established in Section 2 how both the divergence integral and wavelet algorithms are applied to MethaneAIR and MethaneSAT. It would be helpful to clarify which algorithm is used by which system, and whether the P_d curves for MethaneAIR and MethaneSAT are estimated using simulated training data while LeakSurveyor P_d is calculated directly from controlled release data.
The Insight M P_d curve is derived from approximately 100 controlled release scenes while the other algorithms use approximately 80,000 simulated scenes. This difference in training data size makes a direct comparison of algorithm skill difficult to interpret, and it would be worth discussing whether the higher uncertainty of the Insight M curve affects the conclusions drawn about its relative performance.
Section 3.3
The discrepancy with Warren et al. (2024) is attributed to two differences: the inclusion of wavelet-detected plumes and the absence of persistence weighting. However, it is not clear which of these is the more important factor, nor whether the steady state assumption used in this manuscript is more or less appropriate than the persistence weighting approach of Warren et al. (2024). Some discussion of the relative merits of the two approaches would help the reader assess the reliability of the results.
The impact of the steady state assumption on the results is not quantified, despite being acknowledged as a limitation. Given that this assumption underpins the entire emissions distribution analysis, some assessment of its impact would strengthen the paper.
The confidence intervals for the hypothetical lower sensitivity satellite are described as "much wider" without numerical values. Reporting confidence intervals across all three systems would help the reader assess the practical significance of the P_d correction.