U-Plume: Automated algorithm for plume detection and source quantification by satellite point-source imagers
Jack Bruno
Dylan Jervis
Daniel Varon
Daniel Jacob
Abstract. Current methods for detecting atmospheric plumes and inferring point source rates from high-resolution satellite imagery are labor intensive and do not scale to the growing satellite datasets available for methane point sources. Here we present a two-step algorithm called U-Plume for automated detection and quantification of point sources from satellite imagery. The first step delivers plume detection and delineation (masking) with a machine learning U-Net architecture for image segmentation. The second step quantifies the point source rate from the masked plume using wind speed information and either a convolutional neural network (CNN) or a physics-based integrated mass enhancement (IME) method. The algorithm can process 62 128×128-pixel images per second on a single core. We train the algorithm with large-eddy simulations of methane plumes superimposed on noisy and variable methane background scenes from the GHGSat-C1 satellite instrument. We introduce the concept of point source observability, O_ps = Q/(U W ΔB), as a single dimensionless number to predict plume detectability and source rate quantification error for an instrument as a function of source rate Q, wind speed U, instrument pixel size W, and instrument-dependent background noise ΔB. We show that O_ps is a powerful diagnostic of the ability of an imaging instrument to observe point sources of a given magnitude under given conditions. U-Plume successfully detects and masks plumes from sources as small as 100 kg h−1 over surfaces with low background noise and succeeds for larger point sources over surfaces with substantial background noise. We find that the IME method for source quantification is unbiased over the full range of source rates, while the CNN method is biased toward the mean of its training range. The total error in source rate quantification is dominated by wind speed at low wind speeds and by the masking algorithm at high wind speeds. Wind speeds of 2–4 m s−1 are optimal for detection and quantification of point sources from satellite data.
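To make the observability metric concrete, here is a minimal sketch (not the authors' code) that evaluates O_ps = Q/(U W ΔB) together with the standard integrated-mass-enhancement source rate estimate from the point-source literature. The unit conventions (ΔB as a column mass noise in kg m−2, Q supplied in kg h−1) and the IME form Q = U_eff · IME / L are assumptions for illustration and may differ in detail from the paper's implementation.

```python
# Illustrative sketch (not the authors' code) of the two quantities discussed above.
import numpy as np

def observability(Q_kg_per_h, U_m_per_s, W_m, dB_kg_per_m2):
    """Dimensionless point-source observability O_ps = Q / (U * W * dB)."""
    Q_kg_per_s = Q_kg_per_h / 3600.0
    return Q_kg_per_s / (U_m_per_s * W_m * dB_kg_per_m2)

def ime_source_rate(delta_omega_kg_per_m2, pixel_area_m2, U_eff_m_per_s, L_m):
    """IME source-rate estimate Q = U_eff * IME / L (kg/s).

    IME is the sum of masked plume column enhancements times the pixel area;
    the effective wind speed U_eff and plume length scale L are method choices.
    """
    ime_kg = float(np.sum(delta_omega_kg_per_m2)) * pixel_area_m2
    return U_eff_m_per_s * ime_kg / L_m

# Example with hypothetical numbers: a 500 kg/h source at 3 m/s wind,
# 25 m pixels, and an assumed background noise of 1e-4 kg/m2.
print(f"O_ps = {observability(500.0, 3.0, 25.0, 1e-4):.1f}")
```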
Status: open (extended)
RC1: 'Comment on egusphere-2023-1343', Anonymous Referee #1, 08 Sep 2023
The manuscript presents a novel method to infer CH4 plume point source rates using the U-Net architecture for image segmentation followed by either a convolutional neural network or integrated mass enhancement to estimate the point source rate. They find the approach to be successful across a range of source rates and background noises and suggest a general functional relationship of the point source observability based on the source rate, wind speed, pixel size, and background noise. However, the manuscript lacks some important details about the methodology, specifically data normalization, model training, and model evaluation, which inhibit assessment and reproducibility of the work. Additionally, some claims are more broad than the presented results support; these claims should either be rephrased with more balanced language, or further work should be conducted to better substantiate the claims. Detailed comments are provided below.
1. 6870 scenes were used for training. For ML, this is considered a very small data set size, and small data sets often lead to shortcomings in the trained models when compared to identical models trained on larger data sets drawn from the same distribution. How is the model performance affected when adjusting the data set size by a factor of 2 in each direction?

2. It is stated that 90% of the images were used for training and 10% for testing. In ML applications, a validation set is used to monitor for overfitting during training. Was a validation set used during training? If so, please state that along with associated information (e.g., what fraction of the data set was used for validation, what type of cross validation was used, and any early stopping criteria contingent on the validation loss). If not, please state that and justify why it was not used.
If instead what the authors refer to as the testing set is actually the validation set, please correct the language accordingly, and please introduce a testing set to statistically evaluate the generalization of the model beyond the training and validation data.
3. Related to the above, what loss functions are minimized over which data set (training vs. validation), and what learning rate policies and stopping criteria (if any) were used when training the ML models? How many epochs were used to train each model?
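As a concrete illustration of the validation-set and stopping-criterion questions in comments 2–3, here is a minimal, self-contained sketch of one possible protocol. The 80/10/10 split, the Keras-style early stopping, and the stand-in model and synthetic data are assumptions for illustration, not details taken from the manuscript.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Synthetic stand-in scenes and plume masks; the real data are LES plumes on GHGSat-C1 backgrounds.
rng = np.random.default_rng(0)
scenes = rng.normal(size=(200, 128, 128, 1)).astype("float32")
masks = (rng.random(size=(200, 128, 128, 1)) > 0.99).astype("float32")

# 80/10/10 train/validation/test split (the fractions are an assumption).
X_trainval, X_test, y_trainval, y_test = train_test_split(scenes, masks, test_size=0.10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=1/9, random_state=0)

# Stand-in segmentation model; the paper's U-Net would replace this.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu", input_shape=(128, 128, 1)),
    tf.keras.layers.Conv2D(1, 1, padding="same", activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Early stopping monitors the validation loss and restores the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=50, batch_size=16, callbacks=[early_stop], verbose=0)

# The held-out test set is evaluated once, after training is complete.
print("test loss:", model.evaluate(X_test, y_test, verbose=0))
```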
4. Please add details on data normalization for the inputs/outputs of the U-Net and CNN models used in this investigation. This information is necessary to include in the manuscript as it is crucial for both assessment and reproducibility.
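For reference, one plausible normalization scheme of the kind this comment asks the authors to document is per-scene standardization of the methane column field; this is a hypothetical example, not the manuscript's procedure.

```python
# Hypothetical per-scene normalization, shown only to illustrate the kind of
# detail requested; the manuscript's actual scheme may differ.
import numpy as np

def normalize_scene(scene):
    """Z-score a 2-D methane column field using its own mean and standard deviation."""
    mu = np.nanmean(scene)
    sigma = np.nanstd(scene)
    return (scene - mu) / max(float(sigma), 1e-12)
```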
5. Please explicitly define the neural network architectures used in this work. The U-Net architecture is clearly defined in the source reference, so it is only necessary to state deviations from that architecture, if any. The CNN architecture is poorly described here, providing no details on the number of layers, convolutional feature maps or kernel sizes, pooling sizes, the number of nodes in the fully-connected layers, or activation functions. This information must be included in the manuscript for completeness.
6. Related to the above, how was the CNN architecture determined for this problem? Simple grid search, Bayesian optimization, or something else? Please add details about this in the manuscript.
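The sketch below shows the level of architectural detail comments 5–6 are asking for: a small CNN regressing a scalar source rate from an image. The layer counts, kernel sizes, activations, and two-channel input (scene plus plume mask) are illustrative assumptions, not the architecture used in the manuscript.

```python
# Hypothetical CNN source-rate regressor, for illustration only.
import tensorflow as tf

def build_source_rate_cnn(input_shape=(128, 128, 2)):
    """Small CNN mapping (scene, mask) channels to a scalar source rate."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),  # regression output, e.g. source rate in kg/h
    ])
```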
7. Lines 157-158: Which specific Intel Core i7 CPU was used for this quoted benchmark? The clock speed of i7 CPUs spans a factor of ~4 depending on architecture and TDP, ranging from under 1.1 GHz to over 4 GHz, let alone other variables such as cache sizes. Additionally, please specify whether file I/O was included in this benchmark as well as whether the data were all loaded into RAM at once or batches were loaded on the fly, as there may be significant differences in performance for these scenarios.
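A minimal sketch of the benchmarking detail requested here, timing file I/O and inference separately; load_scene and model are hypothetical placeholders supplied by the caller, not objects from the manuscript.

```python
import time
import numpy as np

def benchmark(paths, model, load_scene):
    """Report I/O throughput and inference throughput separately (images per second)."""
    t0 = time.perf_counter()
    batch = np.stack([load_scene(p) for p in paths])  # file I/O and decoding only
    t1 = time.perf_counter()
    model.predict(batch)                              # inference only, data already in RAM
    t2 = time.perf_counter()
    n = len(paths)
    print(f"I/O: {n / (t1 - t0):.1f} img/s, inference: {n / (t2 - t1):.1f} img/s")
```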
8. This is more a general comment on the testing of the presented models. The test set solely consists of synthetic images produced in the same manner as the training data, that is, the test set does not include any real images where the predicted source rate could be compared with an existing data product. For completeness, the authors should perform some comparisons using real GHGSat-C1 scenes which contain an obvious CH4 plume that has had its source rate estimated by one or more existing methods and statistically summarize the differences between U-Plume and those methods. Ideally, many such cases should be included if feasible to better illustrate any trends in biases/deviations between this U-Plume approach and more traditional methods.
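The statistical summary requested in this comment could be as simple as the following sketch; the array names are hypothetical paired estimates for the same real scenes.

```python
# Sketch of a bias/RMSE comparison between U-Plume and an existing retrieval.
import numpy as np

def compare_estimates(q_uplume, q_reference):
    """Mean bias, RMSE, and mean relative difference between paired source-rate estimates (kg/h)."""
    q_uplume = np.asarray(q_uplume, dtype=float)
    q_reference = np.asarray(q_reference, dtype=float)
    diff = q_uplume - q_reference
    bias = diff.mean()
    rmse = np.sqrt((diff ** 2).mean())
    rel = (diff / q_reference).mean()
    return bias, rmse, rel
```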
9. Lines 230-235: There are contradictory statements. deltaB < 5% is described as both "high background noise" and "low background noise", but only one of these can be true. Please correct this typo so that readers may understand what the authors consider to be low/high background noise.
10. Lines 239-242: Can you comment more on these false positives? How are they distributed with respect to the variables considered in this study? Were any other approaches considered to address them? Are the estimated source rates from these false positives generally small and could be filtered that way rather than based on number of pixels in the mask?
Using the 5-pixel-mask filtering loses 1.6% of the true positive detections; can you comment more on this? Are these generally small source rates at low wind speeds, cases with high background noise, or are they more uniformly distributed throughout the domain of interest?
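For clarity, the mask-size filter discussed in this comment amounts to the following check; the function name and threshold argument are illustrative, with 5 pixels taken from the comment above.

```python
# Minimal sketch of a mask-size filter: discard detections whose binary plume
# mask covers fewer than min_pixels pixels.
import numpy as np

def passes_mask_size_filter(mask, min_pixels=5):
    """Return True if the binary plume mask contains at least min_pixels nonzero pixels."""
    return int(np.count_nonzero(mask)) >= min_pixels
```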
11. Line 262: It says that "Detection probability is 10% for O_ps = 0.2". In Figure 4, O_ps has a minimum value of 0.5, at which it has a detection probability of 0%. O_ps = 2 looks to be closer to the 10% detection probability mentioned. Please correct this typo.
12. Lines 291-298: It is mentioned that the CNN method is biased towards the mean, that is, it overestimates small sources and underestimates large sources. What is the distribution of source rates in the training set used in this investigation? If that distribution is biased towards the mean, then that could also explain the CNN's reported behavior. If that is the case, the authors should either address this bias in the data set to improve model performance at the extrema (whether in the data set itself, or in the loss function used to train the model) or use more balanced language when discussing this limitation.
This bias towards the mean can also be a consequence of how the CNN was trained (though the manuscript lacks sufficient detail on how these models were trained to determine the likelihood of this being the case - see comments above).
While the CNN is likely to still perform poorly when extrapolating regardless of data set, the authors have not conclusively ruled out bias in the training data set or particular training methodology as the reason for the CNN's worse performance vs. the IME method over the domain that the CNN was trained on. Furthermore, it should be mentioned that expanding the training data set down to 100 kg/h source rates would likely enable the CNN to more accurately recover those scenarios.
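One way to act on this comment is to inspect the training-set source-rate histogram and, if it is concentrated near the mean, reweight samples so the extremes contribute more to the regression loss. The sketch below is a hypothetical illustration of that idea; the variable names are not from the manuscript.

```python
# Inverse-density sample weights for a source-rate regression training set.
import numpy as np

def inverse_density_weights(source_rates_kg_per_h, n_bins=20):
    """Per-sample weights inversely proportional to the source-rate histogram density."""
    rates = np.asarray(source_rates_kg_per_h, dtype=float)
    counts, edges = np.histogram(rates, bins=n_bins)
    bin_idx = np.clip(np.digitize(rates, edges[1:-1]), 0, n_bins - 1)
    weights = 1.0 / np.maximum(counts[bin_idx], 1)
    return weights / weights.mean()  # normalized so the average weight is 1
```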
13. Lines 380-381: What CPU was used for this quoted benchmark? Please be specific. (See also comment #7 above.)
14. Lines 385-387: It states that "Evaluation with an independent dataset ...", but based on the described methodology earlier in the manuscript, it is misleading to describe the test set as "an independent dataset" given that it is drawn from the same distribution as the training data. It is only independent in the sense that it was not part of the training process, but the manuscript has not conclusively demonstrated that the model generalizes to independent data (real measurements). Please use more balanced language here, or perform the tests suggested above in comment #8 to better substantiate this claim.
15. Figure 7: The orange line looks to be biased by the outliers at O_ps ~ 2, as the line is above the vast majority of the data for O_ps < 10. Given that O_ps > 30 is omitted from the fit due to non-linearity (the error bottoms out around 10%, as mentioned), the authors may wish to consider also omitting O_ps < 3 from the fit due to non-linearity.
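The restricted fit suggested here could look like the following sketch, fitting only the roughly linear portion of the error-vs-O_ps relation; the 3–30 window, the log-space form, and the variable names are assumptions for illustration.

```python
# Sketch of a fit restricted to the quasi-linear range of O_ps.
import numpy as np

def restricted_fit(o_ps, rel_error, lo=3.0, hi=30.0):
    """Least-squares line through log10(error) vs log10(O_ps) within [lo, hi]."""
    o_ps = np.asarray(o_ps, dtype=float)
    rel_error = np.asarray(rel_error, dtype=float)
    keep = (o_ps >= lo) & (o_ps <= hi)
    slope, intercept = np.polyfit(np.log10(o_ps[keep]), np.log10(rel_error[keep]), deg=1)
    return slope, intercept
```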
16. Figure 8: In the left and middle panels, some of the plotted lines are covered by the legend. Please relocate the legend in these panels to avoid this behavior. In the left panel, it looks like it could be placed in the center-left, while the middle panel could relocate the legend to the top-left or bottom-right of the plot.
Citation: https://doi.org/10.5194/egusphere-2023-1343-RC1