This work is distributed under the Creative Commons Attribution 4.0 License.
Flight Contrail Segmentation via Augmented Transfer Learning with Novel SR Loss Function in Hough Space
Abstract. Air transport poses significant environmental challenges, particularly regarding the role of flight contrails in climate change due to their potential global warming impact. Traditional computer vision techniques struggle under varying remote sensing image conditions, and conventional machine learning approaches using convolutional neural networks are limited by the scarcity of hand-labeled contrail datasets. To address these issues, we employ few-shot transfer learning to introduce an innovative approach for accurate contrail segmentation with minimal labeled data. Our methodology leverages backbone segmentation models pre-trained on extensive image datasets and fine-tuned using an augmented contrail-specific dataset. We also introduce a novel loss function, termed SR Loss, which enhances contrail line detection by transforming the image space into Hough space. This transformation results in a significant performance improvement over generic image segmentation loss functions. Our approach offers a robust solution to the challenges posed by limited labeled data and significantly advances the state of contrail detection models.
This preprint has been withdrawn.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-2189', Anonymous Referee #1, 02 Nov 2023
General comments
This study presents a new loss function that tries to leverage the linear structure of young contrails in order to improve the training of deep learning models for satellite-based contrail detection.
Unfortunately, I will have to recommend that this paper be rejected. The new loss function idea is intuitive and seems worthy of a publication in the future, but the current state of the paper is, in my opinion, not of high enough quality to become publishable after one or more review rounds.
One of the most prominent issues I have with the paper is the seeming lack of awareness of other work done in the field. A major motivator for the paper is a “lack of large hand-labeled datasets” for contrail detection on satellite images. I strongly disagree with that sentiment, as there are multiple efforts (McCloskey et al. 2021, Ng et al. 2023) that have provided open-source datasets of contrails labeled on satellite images. Furthermore, the authors present several aspects of their methodology as “novel/innovative” (e.g. image augmentation, pre-training) whereas these techniques are a mainstay in deep learning and have been used by previous approaches for contrail detection as well (Kulik 2019, Meijer et al. 2022). Moreover, the literature on deep learning and novel loss functions for specific applications is vast, and no comparison of the introduced loss function to other existing, perhaps similar alternatives (e.g. Deep Hough Transforms) is made.
Another problem is the qualitative nature of the analysis and discussion in this paper. Very few numbers are produced regarding the performance of the detection model and the effect of various changes to the loss function. Instead, the analysis is mostly qualitative (“this looks better than that”) and therefore subjective. Furthermore, there seems to be a consistent discrepancy between the actual results and how the authors present them. I don’t see how statements like “This new loss function has demonstrated superior performance” are warranted given the paper’s contents. I recommend that the authors provide more objective reflections of the data that allow the reader to draw their own conclusions.
There are small mistakes, inaccuracies and (what I believe to be) confusing statements throughout the paper. I have tried to point these out in the specific comments below.
Overall, I suggest that the authors proceed as follows to improve this work and publish it eventually:
- Review the literature on contrail detection (and perhaps also scour the deep learning literature to compare the novel loss function with existing approaches in other application domains).
- Utilize existing labeled contrail datasets to assess the effect on quantitative performance that the SR loss function has (and perhaps some of the data augmentation techniques as well).
Specific and technical comments
My apologies for the confusing way of referring to certain parts of the manuscript: no line numbers were available.
Abstract: The abstract can be improved as follows. It is not very clear what the link is between sentence 1 and 2. The abstract suddenly jumps from “aviation climate impact” to “computer vision”. I would recommend first stating the relevance of computer vision to the topic at hand: techniques to automatically gather satellite-based data on contrails. Also, the statement “scarcity of hand-labeled contrail dataset” is questionable, see e.g. McCloskey et al. (2021) and Ng et al. (2023) who introduce large hand-labeled datasets of contrails.
Introduction: The first sentence does not flow well: I recommend rewriting it. The second sentence contains “They are emit” which I think should be “They emit”. Next, the second sentence states that aircraft emit nitrous oxide (which is N₂O), which is incorrect I believe: perhaps the authors mean “oxides of nitrogen”.
The third sentence “unlike other types of transports, aircraft-generated contrails…” seems a bit odd as well: perhaps it is better to contrast this with other aviation emissions? Also, it states “could cause impact the climate” which I think should be either “could impact the climate” or “could cause climate impact”. The last sentence uses “are” instead of “is”.
I think this section (introduction) could use a rewrite as well. It does not capture the idea that we have several models that we can use to estimate the climate impact of contrails, but that these model outputs are associated with great amounts of uncertainty. Gathering more observational data, by e.g. satellites, could be a way to reduce these uncertainties.
Computer vision approaches:
- “Contrail detection traditionally utilizes computer vision tasks, given that it involves identifying linear features”. This sentence does not really make sense to me. Indeed, automated contrail detection on satellite imagery is best done using computer vision techniques. However, it is not the idea that contrails tend to be linear (at least when they’re young) that makes it so that we use computer vision to automatically detect them. It is however a key property that motivated some of the older approaches like Mannstein et al. (1999).
- “Hough transform” I would say “the Hough transform” or “Hough transforms”
- “methods have been extender” I believe this is a typo and should be “extended”
- “tests for contrails in an image”. This sounds a bit odd, and I think the authors can be more specific by specifically mentioning that this is the algorithm by Mannstein et al. (1999).
- “Zhang et al. (2017) combined visual contrail detection” what is meant by “visual” contrail detection? The reader could interpret this as contrail detection using visible satellite imagery, which I do not think is the case here.
- “With advanced image processing…” I have two problems with this sentence:
- “With advanced image processing”: firstly, Minnis et al. use an extension of Mannstein’s algorithm, and given that this paper was published in 2013 (when the first highly effective deep learning algorithms for computer vision were available) I think that the use of the word “advanced” here is unwarranted.
- The use of “With advanced image processing” and then at the end of the sentence again “a blend of … imagery processing techniques” seems tautological, and I would recommend choosing one or the other.
- “The study utilized …”
- If the authors want to cite a paper fundamental to the brightness temperature difference technique, I believe Inoue (1985) is more appropriate than Ackerman (1996).
- I would augment this sentence with an explanation of what these techniques actually were used for (retrieve contrail optical depth etc.)
Machine learning approaches:
- Firstly, I would simply call this section “Deep learning approaches” as this is more specific and still covers all the papers cited in this section.
- “auto-encoder based convolutional neural network” I would argue that Kulik (2019) used a convolutional neural network and not an auto-encoder of any kind.
- “However, despite its success… the model could not determine their exact location due to the simplicity of the employed machine learning model” What is meant by “their exact location”?
- The model from Kulik finds the pixels in a satellite image that it estimates to be part of contrails, therefore it does in fact locate them in the horizontal sense. Do the authors perhaps mean to distinguish between segmentation (i.e. find which pixels are part of contrails) and object detection (find the pixels that are part of each contrail) here?
- “Simplicity of the employed machine learning model” How is the model employed by Kulik (2019) simple? It is actually incredibly similar to the one (the U-net) used in this study.
- “A parallel approach…” how is this approach parallel to that of Kulik (2019)? Furthermore, detecting contrails in ground-based camera images is a completely different task than detecting contrails in satellite imagery, so I do not see the relevance of mentioning the Saddiqui (2020) thesis here.
- “has provided a modest set” what makes this dataset modest?
- “landsat” I suggest capitalizing Landsat and being specific in that these are Landsat-8 imagery
- “Another recent study by…”
- “in a more comprehensive effort” why is this more comprehensive than McCloskey et al. (2021)?
- “contrails over the United States” The dataset created by Ng et al. (2023) covers not only the United States but also other parts of the GOES-16 full disk imagery including the Southern Hemisphere.
- “the GOES-16 satellite imagery” I would leave out “the”
- “In the course of assembling this dataset, a convolutional…” I believe this was not the case, there was no convolutional neural network involved in creating the dataset of hand-labeled contrails.
- “The research shows promising results…”
- The authors can be more precise here by stating that the model presented in Ng et al. (2023) outperforms single-frame models
- “high performance hardwares” I would say “hardware” which is already plural I believe
- I don’t see how the need for high performance hardware (which is not a very precise term) is a detriment to the paper’s results.
- This section could also cite Meijer et al. (2022): who use a different neural network than Kulik (2019) and apply this to a large dataset.
Research gaps
- “It has been found that traditional computer vision based…” It would be appropriate to back up this statement with a citation.
- “The complexity of satellite images captured under varying conditions” This is a rather vague sentence: what are the authors trying to say here?
- “Earlier machine learning approaches…” Again, I would be more specific here and cite the appropriate references. And why do such methods require larger amounts of data than more recent methods?
- “In particular, there are no adequate loss functions optimized for linear features…” What about papers like Shit et al. (2021)? I would be more specific here and say that there exist such approaches but no study has investigated their usefulness for contrail detection.
- “This shortcoming makes detection…”
- Why does this lead to more difficult detection at lower resolutions? And what are lower resolutions in this case? 1 km, 2 km or more? Doesn’t lower resolution make detection of an object harder in general, no matter its similarity to a straight line?
- How would adequate loss functions resolve the issue of more difficult detection of multiple contrails that are close to each other? And what is meant with “close to each other”? I imagine that the relative orientation of the contrails in a cluster is relevant here.
Contributions of this study
- “This paper aims to present a machine learning model… with augmented satellite images to improve the training efficiency”
- Augmented in what way?
- What is meant by “Training efficiency”?
- “This approach enables the effective training … with a minimal dataset on standard computer hardware”
- What is meant by a “minimal” dataset?
- “Firstly we complement … with data augmentation methods …” Almost all (if not all) previous deep learning models for contrail detection use data augmentation (e.g. Kulik (2019), Meijer et al. (2022), Ng et al. (2023)) as this is a standard approach in deep learning to regularize models.
- “SR loss” The abbreviation has not yet been defined (here or anywhere else in the paper)
- “We also offer open access to … imagery data, …” Most satellite image data is publicly available, so I would remove that from this sentence.
- “we perform thorough evaluations of the performances of our models openly”
- I would remove “thorough” as this is subjective
- “openly” what is an “open evaluation” of a model? Does this imply that other previous papers have not been “open” in their evaluation?
GOES data
- Please mention that GOES-16 ABI (Advanced Baseline Imager) data is used, and provide a citation to a relevant paper. This helps the GOES-R team to see what their data is used for!
- “Contrails and cirrus clouds share atmospheric similarities” What is meant by “atmospheric similarities”? Perhaps a sentence like “Contrails and cirrus clouds feature similar microphysics” would be more appropriate
- “pre-process” should be “pre-processing”
- “the difference” instead of just “difference”
- “difference between 12.3 and 10.35 is obtained” Does this mean that you subtract the image corresponding to the 10.35 µm band from the 12.3 µm band? This is exactly opposite to the typical convention for the BTD. Thin cirrus tends to absorb more radiation near 12.3 µm than near 10.35 µm. Therefore, less radiance is measured in the 12.3 µm band than in the 10.35 µm band, such that subtracting 12.3 µm from 10.35 µm leads to thin cirrus and contrails having positive BTD (see the convention written out after this list).
- “After days and regions are identified with contrail occurrence” Please expand on how this is done
- I do not see the added value of mentioning the software packages used to download (“goes2go”) and process (“netCDF4”) the satellite data. Given that GOES-16 ABI data is primarily disseminated in netCDF files, I think it is rather trivial that one uses the netCDF library to handle that data.
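To make the sign convention in the BTD comment above concrete, the standard split-window brightness temperature difference can be written out as follows. This is the conventional form from the literature (e.g. Inoue, 1985), not an equation taken from the manuscript under review:

```latex
% Standard BTD convention for thin cirrus and contrail detection:
% subtract the 12.3 um brightness temperature from the 10.35 um one.
\[
  \mathrm{BTD} = T_B(10.35\,\mu\mathrm{m}) - T_B(12.3\,\mu\mathrm{m})
\]
% Thin cirrus and contrails absorb more strongly near 12.3 um, so
% T_B(12.3 um) < T_B(10.35 um) and the BTD is positive for these features.
```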
Figure 1
- The convention is to plot a satellite image with lighter colors corresponding to higher values. As such, contrails typically appear as darker features on infrared imagery. Not following this convention may confuse readers.
- I would provide the time and date of the GOES-16 ABI image.
- The BTD defined here (channel 13 – channel 15) is opposite to that described in the text.
Contrail labeling
- “contrails are first traced with paths” I do not believe the average reader will be familiar enough with GIMP to know what a “path” is. Could you please provide more detail on whether this is a polygon-shape, or a line?
- “the mask image is generated with strokes of approximately two pixels on all paths”
- Contrails vary in width, and based on the above statement this is not accounted for in the labeling process. This may lead to biases in the training process (depending on the loss function used), and this should therefore be acknowledged.
- “In total, around 30 images at two different locations … are selected and …” Why only at two different locations, and why specifically San Francisco (note the erroneous use of a hyphen in the paper) and Florida? This obviously has impacts on the model’s generalization to other regions within the GOES-16 ABI domain.
- Given that San Francisco is quite far away from the GOES-16 sub-satellite point (and thus the satellite viewing zenith angle is quite large), the images are likely to be distorted. Is a correction (e.g. a map projection) applied before the labeling and/or training process?
- “which are not use” “use” should be “used” I presume
- Regarding the labeling procedure: nothing is mentioned on how the labelers identify contrails. What kind of decision-making process was used to distinguish between contrails and natural cirrus and/or background features? See for example Meijer et al. (2022) for such a description.
Figure 2
- I suggest to add the date and time of the GOES-16 image.
U-Net
- What is meant by “generation research”?
- “High accuracy and efficiency” as measured by or compared to?
- You could cite both Kulik (2019) and Meijer et al. (2022) here as both use a U-Net model as well.
- “the traditional structure of a CNN”: what is a traditional structure here?
- “This process expects the abstracted semantics to be captured” Perhaps better said using “the deeper layers capture higher-level image features”.
- “using transposed convolutions” The U-Net as proposed by Ronneberger et al. (2015) does not use transposed convolutions, I believe.
- “and thus lead to” “lead” should be “leads” I believe
- “for classification” Isn’t the task at hand image segmentation?
Figure 3:
- I think the use of a visible ground-based image of contrails is a bit misleading, and I would recommend the authors to use a satellite image instead.
- “Activation” is only placed at the end of the network, although there are likely many more activation functions present throughout the neural network. Perhaps this could be clarified in the figure caption.
- The various colors and boxes are not explained. I think this would be helpful (perhaps adding a legend would resolve this).
ResNet
I do not think this section is necessary at all. Residual blocks are a mainstay in deep learning and the authors can mention their usage but an explanation of how they work and why they are useful does not fit in a paper that focuses on contrail detection, in my opinion. The authors can also cite previous contrail detection papers here, as most deep learning approaches utilize residual blocks.
Combining U-Net and ResNet
Again, this section could be omitted. However, I would include an explanation of exactly which ResNet model is used (ResNet18 or 52 etc).
Conventional loss functions
- “The choice of loss function is critical part…”
- “is a” instead of “is”
- There are machine learning approaches that do not require a loss function. Nearest neighbors is an example. You can of course interpret the distance metric used as a “loss function” but I believe that this statement is far too general.
- “Two conventional loss functions…”
- Why not use the cross-entropy loss function?
- “are highly imbalanced in our study” I would remove “our study” and simply state this is a general issue with satellite-based contrail detection, and back it up with some numbers regarding the occurrence rate of contrails (see for example Meijer et al. 2022, Ng et al. 2023)
- “these loss functions do not heavily penalize the prediction the majority class”
- What is meant by “heavily penalize”?
- “the prediction the majority class” should be “the prediction of the majority class”
- “Focal loss” would replace with “The focal loss”
- “misclassified and hard classifications”
- What makes a classification “hard”? And how is classification relevant in this image segmentation task? (I presume the authors refer to the fact that image segmentation comes down to a classification problem for each pixel, but that is not clear from the text at all.)
- “misclassified” would say “misclassified pixels”
- “it applies a specific modulating factor…”
- None of the symbols p, α, and γ are defined.
- “otherwise” given that y is undefined, it is not clear what other values y can take.
- “Where γ = 1” I believe this should be γ = 0, and γ = 1 does not make the focal loss equal to the cross-entropy loss, γ = 0 does, right? (The standard form is written out after this list.)
- “It is based on the Dice coefficient…” The equation that follows is not for the Dice coefficient, but for the Dice loss (which I find a bit confusing)
- “where p represents the probability of prediction for a pixel belonging to the target class (contrail or non-contrail).” Perhaps it is better to state “target class (contrail)” instead, as the reader might now be confused about what the target class is.
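For reference, the standard definitions the comments above refer to are written out below. This is a sketch assuming the paper follows the usual focal loss formulation of Lin et al. (2017) and the common Dice loss; the notation is not copied from the manuscript itself:

```latex
% Focal loss: p is the predicted probability, y the binary label,
% alpha_t a class-balancing weight, gamma the focusing parameter.
\[
  p_t =
  \begin{cases}
    p & \text{if } y = 1,\\
    1 - p & \text{otherwise,}
  \end{cases}
  \qquad
  \mathrm{FL}(p_t) = -\alpha_t \,(1 - p_t)^{\gamma} \log(p_t)
\]
% Setting gamma = 0 (not gamma = 1) reduces the focal loss to the
% (alpha-weighted) cross-entropy loss.

% Dice loss over pixels i, with predictions p_i and labels g_i in {0, 1}:
\[
  \mathcal{L}_{\mathrm{Dice}}
  = 1 - \frac{2\sum_i p_i\, g_i}{\sum_i p_i + \sum_i g_i}
\]
```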
SR Loss at Hough space to improve contrail segmentation
- “via convolutions” How are convolutions relevant in the evaluation of the previously introduced loss functions?
- “And they …”
- This sentence could best be merged with the previous one, I believe
- “explicitly considers” should be “explicitly consider”
- “Predicted contrail formations” Should this instead read “contrail detections”? I don’t see how contrail formation is relevant to the discussion at hand.
Hough space and transformation
- “The Hough transformation first converts the common linear representation of” Do you mean “representation of a line”? It may also be helpful here to provide definitions of the symbols used, especially since at least one of them was also used in the previous section for a seemingly different purpose. (The conventional parameterization is written out after this list.)
- “We denote this polar coordinate system” Would the word “refer” here be better than “denote”?
- “where each point … pixel space.” Should “correspond” be “corresponds” ?
- “Only lines that are close to a sufficient amount of masked pixels are selected” What is a sufficient amount and how is that threshold determined?
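For readers unfamiliar with the transform, the conventional polar parameterization of a line that underlies the comments above is the following (standard background, assuming the paper uses this common form):

```latex
% A line in pixel space (x, y) is parameterized by the distance rho from
% the origin to the line and the angle theta of the line's normal:
\[
  \rho = x \cos\theta + y \sin\theta
\]
% Each pixel (x, y) maps to a sinusoid in (rho, theta) space; pixels lying
% on a common line yield sinusoids intersecting in one accumulator cell.
```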
Combining Dice loss at original and Hough spaces
- “Aforementioned dice loss at the pixel space” Perhaps “in the pixel space” would be better here?
- “similarity at the Hough space” Maybe it is better to say “in the Hough space”.
- What is the motivation for also using a Dice loss in the Hough space? Moreover, with the way the transformation to Hough space is being performed (either a particular cell is occupied or not) does this not lead to under-representation of very long contrails (which would not be weighted accordingly) in your loss function?
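To make the object of the question above concrete, a combined loss of the kind described, Dice in pixel space plus Dice in Hough space, might look like the minimal sketch below. This is an illustrative reconstruction, not the authors' implementation: the `hough_transform` callable, the binary occupancy behavior, and the equal weighting of the two terms are all assumptions.

```python
import torch

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss over all elements; pred and target hold values in [0, 1]."""
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def sr_style_loss(pred_mask, true_mask, hough_transform):
    """Hypothetical combined loss: Dice in pixel space plus Dice in Hough space.

    `hough_transform` must be differentiable and map a mask to an accumulator
    over (rho, theta) bins. If a bin only records occupancy (0 or 1) rather
    than accumulated votes, a long and a short contrail on the same line
    contribute identically -- the under-representation issue raised above.
    """
    pixel_term = dice_loss(pred_mask, true_mask)
    hough_term = dice_loss(hough_transform(pred_mask), hough_transform(true_mask))
    return pixel_term + hough_term  # equal weighting is an assumption
```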
Figure 7
- I would replace the fully connected neural network diagram with one of a CNN.
- I would replace “predict” with “predicted”
Image augmentation
I think that this section can be made much more terse: image augmentation is extremely common in deep learning and it has been applied in previous studies that look at contrail detection. I would remove figures 8, 9, and 10 or move them towards the supplementary materials.
- “One way in” would sound better as “one way of”
- “is to train neural network model” I would opt for the plural “models”
- “Large, high-quality datasets are not always available” The validity of this statement is questionable given the work done by McCloskey et al. (2021) and Ng et al. (2023).
- “Image augmentation provides an efficient way to generate training data using a small labeled dataset.” I would use different language here with more nuance like “image augmentation can mitigate overfitting”.
- “Essentially, this can generate several order of magnitude more images for model training based on a limited number of manually labelled images” Is this really true? Do the “augmented” images contain as much new information in them as “true” additional images? If so, I would provide some citations to back up this claim.
- “We apply … which includes” These 2 sentences are more or less a repetition of the previous 2.
- “Perspective of the contrails” what is the perspective of a contrail?
- “lighting variations”: given that many data augmentation techniques have been designed with visible imagery in mind, I wonder how applicable these techniques are to infrared imagery.
- “During training … we apply a sequence … contrail mask”. Could you specify the probabilities used for the various transformations?
- “robust to varying image quality and contrails” Firstly, how does image quality come into play here? Secondly, what is “robust to varying contrails”?
- I think more detail should be given for how the viewing angle variation is implemented, as that is potentially very interesting.
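As an illustration of the kind of parameter disclosure requested above, the pipeline could be stated explicitly, for instance with the albumentations library. The transforms and probabilities below are hypothetical placeholders, not the values used in the paper:

```python
import albumentations as A

# Hypothetical augmentation pipeline; the mask receives the same geometric
# transforms as the image. Note that photometric transforms such as
# RandomBrightnessContrast are questionable for thermal-infrared input,
# as discussed in the comments above.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.Rotate(limit=90, p=0.5),
    A.Perspective(scale=(0.05, 0.1), p=0.3),  # stand-in for viewing angle variation
    A.RandomBrightnessContrast(p=0.3),
])

# Usage: out = augment(image=image, mask=mask); out["image"], out["mask"]
```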
Transfer learning based on pre-trained models
Again, I would make this section far shorter. Pre-trained models have been used by among others Kulik (2019) and Meijer et al. (2022). It is very common to do so. I would only mention the following two points (which are currently missing)
- The exact dataset that has been used for pre-training
- How a pre-trained model (which I presume was trained on some kind of natural image dataset with 3 channels: RGB) was modified to accept 1 input channel (which I interpret as being the case for the satellite images in this study)
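On the second point, one standard way to adapt an RGB-pre-trained encoder to single-channel input is shown below, using the segmentation_models_pytorch library as an example. Whether the authors did something equivalent is not stated in the manuscript; the encoder choice here is a placeholder:

```python
import segmentation_models_pytorch as smp

# U-Net with a ResNet encoder pre-trained on ImageNet (3-channel RGB images).
# Setting in_channels=1 makes the library adapt the first convolution so the
# model accepts single-channel input (e.g. a BTD image).
model = smp.Unet(
    encoder_name="resnet34",    # which ResNet variant the paper used is unspecified
    encoder_weights="imagenet",
    in_channels=1,
    classes=1,                  # single-channel contrail mask
)
```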
Evaluating the model with unseen GOES data
- “Unlike common image objective identification tasks” What are image objective identification tasks?
- “easily computed ground truth” How does one compute a ground truth?
- “Persistent contrails can disperse into cirrus clouds, which become indistinguishable from clouds in many cases”. This sentence sounds a bit odd. I would replace it with something like “Persistent contrails spread as they age and may become indistinguishable from natural cirrus clouds”.
- “Thus, the performance evaluation is mostly based on the visual inspection of the results.” I disagree with the line of reasoning that leads up to the decision to evaluate the performance at a visual level. Other deep learning based studies on contrail detection have performed quantitative performance evaluation, and there is obvious correlation between the metrics used and “qualitative performance”. If there wasn’t, such metrics (think of the precision and recall) would not be used as frequently as they are.
- In this section, the terminology “clear” and “unclear” contrail is used multiple times. What exactly is meant by these terms?
- “The third row”. Firstly, is the data corruption here an image artifact (it looks rather strange for an image artifact to me, and more like a saturation issue) or something that has been added to the image by the authors? And why is the image resolution worse for this case? Because it is at a higher viewing zenith angle?
Performance under different loss functions
- “given the higher computational time required” Does this imply that the use of the SR loss function increases the computational time by a factor of two? (Since 4000 steps are used now rather than 8000). I would also simply use the same number of steps for each loss function as to do a fairer comparison.
- What is meant by “steps” here? Are these epochs or iterations (i.e. gradient updates)?
- Could you specify the optimizer used and its hyperparameters?
- “are seem with” seem should be seen, I believe?
- “clean background” what does it mean for a background to be clean?
- “As it can focus on forming masks for longer lines of contrails…” Referring back to one of my earlier comments regarding the way in which the Dice loss is applied in Hough space: doesn’t that loss function fail to distinguish between a longer and a shorter contrail as long as they have the same (ρ, θ) coordinate?
- “In later discussion section” I would rephrase this as “Later in the discussion section”
Evaluating the model with other image sources
- “The resulting contrail detection model has demonstrated an ability …” I think this statement should be more nuanced, given that no quantitative evidence (or even qualitative) has been offered to back it up.
- “The contrail detection model can be directly applied to different types of image sources without additional training”
- I would rephrase “different types of image sources” as “images from different sources”
- I think there are many models that can readily be applied to different types of images, what we are really interested in here is how well they perform!
- “MeteoSat” I believe the “s” need not be capitalized. And please cite the appropriate source for this data, and mention that the image is captured by the SEVIRI instrument.
- “Which shares similar image properties …” What exactly are these similar image properties?
- “The second image is a color photograph from the NASA Terra satellite” Same comment as for Meteosat regarding the citation, and I would add that the image was captured by the MODIS instrument.
- “Where the model has proven capable of managing a broad dynamic range of color inputs.” This statement sounds a bit grandiose in the context of what actually has been shown here, in my opinion. I would state something like “We see that for this example some line-shaped structures, likely to be contrails, are indeed detected by the algorithm.”
- “NOAA Suomi-NPP” same comments as for the previous two satellites.
- “contrails from ships”. The appropriate term here is “ship tracks”, I believe.
- “It is clear that the model maintains consistent performance” Why is this clear? Is this quantified among the various image sources? The algorithm seems to detect line-like structures in all images, yes, but an algorithm for edge detection would as well.
- Also, why is this Google Street View image so oddly colored?
- For all satellite images shown, I would add what the image bands that have been used are, as well as the date and time of the images.
Data and few-shot learning
- “leveraging few-shot learning” where exactly is this few-shot learning approach discussed in the paper?
- “Unlike traditional end-to-end training…” This paragraph describes exactly what Kulik (2019) and Meijer et al. (2022) have done as well, albeit with a slightly larger dataset.
- “enhancing the model’s robustness and adaptability” These terms have not been defined in the paper: what is meant by them?
- “The model exhibits exceptional performance” Do the discussion or results presented by this paper really warrant such a statement?
- “With the introduction of more diverse image sources, we anticipate further improvements” How would the introduction of more diverse image sources lead to improvement? Wouldn’t it be better to focus resources on one particular image source and increase performance there? I think this statement needs to be rephrased.
- “For further refinement, we recommend compiling a dataset of carefully…” One could already do this by combining the datasets from McCloskey et al. (2021) and Ng et al. (2023).
Implementation
This section can be replaced with a short statement in the appropriate “data availability” section, I believe.
Loss functions
- “to improve the model training” As far as I can see in the results, the performance metrics do not indicate such an improvement. Perhaps the discussion needs to be more specific here.
- “We can observe that … outperforms” Again, this is hard to justify given the absence of any quantitative improvements shown by the model trained with the SR loss function, in my opinion.
- “A relatively fast implementation” What is this relative to? Isn’t this the first time the SR loss function is discussed and implemented?
- “Which allows it to be computed on GPU…” I believe it is not often that the loss function is a computational bottleneck for deep learning models, as long as it is differentiable.
- “Is already better” Again, I do not believe there is enough quantitative or qualitative argumentation to back up such a statement.
Accuracy metrics
- “with regardless” I think “with” should be omitted
- “This is due to the fact that human-labeled contrail images can not be complete”
- The way this sentence is written now, it seems to imply that there is some physical limit to how well a certain image can be labeled, which I do not think is the case.
- Furthermore, labels will always be imperfect, and that certainly has not hindered the computer vision field from using metrics like the IoU and F1 score effectively.
- “Secondly, the contrail masks hardly ever cover the contrail”. I see this is a self-imposed limitation due to the choice of labeling procedure, and it is something that I find a weak motivation to not rely on quantitative metrics in evaluating the performance of a contrail detection model.
Conclusions
- “Flight contrails” I would use “aircraft contrails” instead.
- “using by fine-tune a pre-train” should be “by fine-tuning a pre-trained”
- “carefully labeled images” What makes these image so carefully labeled? This is orthogonal to what has just been claimed about the masks not fully covering the actual contrails!
- “diverse transformations” what makes these transformations “diverse”?
- “demonstrates strong performance” I’m not sure if this statement is warranted.
- “demonstrated superior performance” I’m not sure if this statement is warranted.
- “Providing an innovative solution to the lack of large hand-labeled datasets” I do not believe this is true given earlier work done, see earlier comments.
Additional references
Inoue, Toshiro. "On the temperature and effective emissivity determination of semi-transparent cirrus clouds by bi-spectral measurements in the 10μm window region." Journal of the Meteorological Society of Japan. Ser. II 63.1 (1985): 88-99.
Shit, Suprosanna, et al. "clDice-a novel topology-preserving loss function for tubular structure segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
Meijer, Vincent R., et al. "Contrail coverage over the United States before and during the COVID-19 pandemic." Environmental Research Letters 17.3 (2022): 034039.
Citation: https://doi.org/10.5194/egusphere-2023-2189-RC1
AC1: 'Reply on RC1', Junzi Sun, 19 Dec 2023
We would like to thank the reviewer for the comprehensive review of our paper. The following replies contain our responses to the major comments from the review. Other minor comments related to language and writing pointed out by the reviewer will be fixed in the revised text.
1. Response to comment on lack of awareness of other work in field
We thank the reviewer for pointing out the importance of recognizing existing work in contrail detection. We discussed McCloskey et al. (2021) and Ng et al. (2023) in this paper, but we will provide more insights in our updated literature review regarding the methodologies. Our focus is on the limited availability of large, hand-labeled datasets. The revised manuscript will reflect a more in-depth analysis of the current state of contrail detection research.
2. Response to comment on methodology not novel
We appreciate the reviewer's insights on our methodology. However, we believe there are several aspects which may have been overlooked regarding the novelty of our approach:
1. SR Loss Function in Hough Space: Our paper introduces a new loss function, SR Loss, that works in both the original and transformed Hough spaces. This is a significant change from traditional loss functions, like Dice and Focal losses, which do not consider the inherent linear shape of contrails in this way.
2. Augmented Transfer Learning: While transfer learning and image augmentation are common, our specific implementation and combination of these techniques are tailored for the unique challenges of contrail detection in satellite imagery. This specific application represents an innovative approach in this domain.
3. Few-Shot Learning with Limited Data: Our methodology emphasizes training contrail detection models with minimal labeled data, combining few-shot learning with augmented transfer learning, a relevant approach given the challenges in obtaining large, high-quality datasets for remote sensing imagery.
We will clarify these points in our revised manuscript to better highlight the novelty of our approach.
3. Response to comment on the lack of quantitative analysis
The critique about the qualitative nature of our analysis is well-taken. We will conduct a more rigorous quantitative analysis in our revised paper. This will include detailed performance metrics of our detection model and the impact of various adjustments to the loss function. We aim to present our findings in a way that allows for objective interpretation and comparison, correcting any perceived overstatements about the performance of our new loss function.
However, we also want to point out that the existing metrics for image detection are not fully applicable to contrail detection research. We want to look for alternatives based on flight trajectory approaches, but these may be beyond the scope of this paper.
4. Response to comments on technical mistakes and confusing statements
We are committed to addressing each specific comment provided by the reviewer to correct any mistakes and clarify confusing statements. Our revision will go through the paper carefully to ensure accuracy and clarity throughout.
We are fully thankful for and intend to follow the reviewer’s valuable suggestions for improving our work. This includes an extensive review of both the contrail detection and deep learning literature to better compare our novel loss function with existing approaches. Additionally, we will use existing labeled contrail datasets to assess the quantitative performance of our SR loss function and data augmentation techniques.
Finally, we believe the revisions will greatly enhance the quality and impact of our paper, addressing the concerns raised by the reviewer and making a valuable contribution to the field.
Citation: https://doi.org/10.5194/egusphere-2023-2189-AC1
RC2: 'Comment on egusphere-2023-2189', Anonymous Referee #2, 10 Nov 2023
In some conditions, air traffic generates contrails. These clouds of anthropogenic origin exhibit a significant radiative forcing, and improved methods for classifying contrails in satellite imagery are required to better characterise their present climate impact and to determine how that impact can be decreased by adjusting flight routes and switching to alternative fuels. This manuscript addresses this classification problem with machine learning. Although at least one novel approach is introduced, unfortunately the overall judgement is that the manuscript does not meet the requirements to be considered for publication. That is, the suggested action is to reject the manuscript.
The main shortcoming is the lack of quantitative results backing up the claims that the techniques introduced actually lead to clear improvements. The claims are primarily subjective opinions. Publicly available datasets of satellite images with contrail labelling exist. In fact, the authors cite two such datasets (McCloskey et al., 2021; Ng et al., 2023). However, these datasets are not used for evaluating the suggested contrail detection model; instead, only four images from different sources are. The paper would benefit from such an evaluation, both concerning a possible improvement in the classification skill and in supporting that the data augmentation used for the few-shot learning strategy is sufficient.
The scope of the paper needs to be clarified. The training is solely done on infrared geostationary data, and interpreting such data appears to be the scope through most of the manuscript. However, towards the end, also photos from ground level are considered. In the Conclusion, it is said, “... to detecting flight contrails in remote sensing imagery data”. This statement includes ground-based observations, as well as including measurements across the electromagnetic spectrum.
Contrails are manually identified in a set of images. Nothing is said about who has done this labelling nor the expertise of that person. That is, the quality of the labelled dataset is hard to judge. In addition, nothing is said on the process followed to select images with contrails, only that "days and regions are identified with contrail occurrence". Further, the dataset size is small, with a total of 30 images, with 20 for training. It is argued that the data augmentation applied compensates for this fact, and it is agreed that rotation, scaling, etc., are valid ways to augment the effective size of the dataset. However, the parameters of the data augmentation transforms used are not stated. On the other hand, it is doubtful if the contrast and brightness changes are valid when the observations from the thermal infrared range are used. There is no variation in “luminance” for these observations. There seems to be confusion with visible and near-IR measurements.
The language and structure of the paper need to be revisited. Poor phrasings, incorrect spellings and grammatical errors induce an insufficient reading flow. Moreover, the arrangement of some paragraphs and sections is non-standard, leading the reader to question the origin of specific information or statements. For example, in Sect. 5.2 an example of a pre-trained model is given, while the actual used one is given first in Sect. 7.2 (i.e. as part of the discussion, with no reference to it in Sect. 5.2). Although the authors selected to include a completely new acronym (SR) in the title, its meaning is not explained.
Concerning the network and training approach used, the authors should offer technical clarifications. For example, it is unclear how the network trained with SR loss predicts in the untransformed and transformed space as opposed to the networks trained with Focal or Dice losses alone: does it have two network heads or does it use only one head? If the latter is true, is any information given to the network when predicting in Hough space or untransformed? That is, what is fed to the network, and how does it work at test time? From the manuscript, the understanding is that the network takes the brightness temperature difference as input alone. In addition, the authors mention that the networks are pre-trained with ImageNet, that is, RGB images (3 channels), and present more RGB images in Fig. 14. Does the network also support RGB channels, which are not captured with the near-IR training data, or has any pre-processing applied to the RGB images?
Some more specific comments are given below, with no ambition to be complete and language issues ignored completely.
- The description of the libraries used could be moved into an appendix, and the descriptions of deep learning tools, such as residual blocks or data augmentation, could be more concise or use fewer figures. The authors should instead emphasise what exact network they used, for example, if their network matches the original U-Net with residual connections, and if yes, how these residual connections are built in the network; it is difficult to judge from Figs. 4 and 5.
- Sect. 1.2, "the details of the models are not made available by the paper". It is unclear what the authors want to convey with this statement.
- Sect. 1.3, "no adequate loss functions optimized for linear features" is not convincing enough. The deep learning literature is broad, and likely similar approaches have been developed for detecting linear features. This statement could benefit of showing that such research has been done; a quick search with "hough transform" for recent years reveals many papers making use of this transform and deep learning.
- Sect. 3.4: it is suggested to introduce the notation first and then focus on the loss functions. Is the meaning of p in eq. (3) the same as in eq. (4), explained below eq. (4)? What values can g in eq. (4) take, ±1 or 0 and 1?
- Sect. 6.1, "corrupted data occurs in the image (bottom right)". It is not explained how this is handled in the training nor if the corrupted pixels are inherent in the data; with the small dataset (30 images), this can be inspected manually.
- Sect. 6.2, the explanation of the training is incomplete: only the number of steps is given (and it is unclear what is meant by 'steps'), but no other details, such as learning rates or whether, for example, any optimizers or mini-batches were employed for the gradient updates.
- Sect. 7.1, it is not explained how the training-test split was done.
- Sect. 7.3, "However, we can observe that the model trained with 4000 steps using SR Loss is already better than the other two models trained with 8000 steps". I understand this statement as saying that the SR Loss is more effective, but the SR Loss uses two spaces for the optimization problem (is that, then, 2 spaces x 4000 steps = 8000 effective steps?); I think this statement should be reviewed.
- Figure 1 would require a colour bar describing the BTD values.
- Figure 12 caption states that the models are trained with 8000 steps, but the last column indicates 4000 steps.
Citation: https://doi.org/10.5194/egusphere-2023-2189-RC2
AC2: 'Reply on RC2', Junzi Sun, 19 Dec 2023
We would like to thank the reviewer for the in-depth review of our paper. The following replies contain our responses to the major comments from the review. Other minor comments related to language and writing pointed out by the reviewer will be fixed in the revised text.
1. Response to comment on quantitative results and dataset utilization
We acknowledge the reviewer's concern regarding the lack of quantitative results and limited dataset usage. In our revised manuscript, we will incorporate a detailed quantitative analysis using the datasets cited (McCloskey et al., 2021; Ng et al., 2023). This will allow us to comprehensively evaluate our model's performance and validate our claims with objective data. We understand the importance of leveraging these datasets to demonstrate clear improvements in classification skill and the effectiveness of our few-shot learning strategy.
2. Response to comment on the scope clarifications
We appreciate the feedback on the need for clearer scope delineation. We will revise the manuscript to consistently focus on few-shot learning with the proposed new loss functions to facilitate contrail detection in segmentation tasks.
3. Response to comment on dataset quality and data augmentation
The reviewer’s point about the quality of the labeled dataset and the details of data augmentation is well-taken. In the revised manuscript, we will provide clear information about the expertise of the individuals who labeled the contrails and elaborate on the selection process for the images. Additionally, we will specify the parameters used in our data augmentation techniques, addressing concerns about the validity of contrast and brightness changes in thermal infrared observations. Currently, these parameters are all available in the open-source code, but they will be elaborated in the paper as well.
4. Response to comment on language and structure
The revised manuscript will undergo more proofreading to correct grammatical errors and improve phrasing. We will reorganize the content for better logical flow, ensuring that all sections and paragraphs present information coherently. The meaning of the newly introduced acronym (SR) will be explicitly explained early in the manuscript.
5. Response to comment on technical clarifications of network and training approach
Currently, many of these nuances are in the source code we shared openly together with the paper. In the revised paper, we will also clarify in the text how the network trained with the SR loss makes predictions in both the untransformed and transformed spaces. Additionally, we will conduct more tests to address the use of pre-training with ImageNet and its compatibility with our near-IR training data, including any preprocessing applied to RGB images.
Citation: https://doi.org/10.5194/egusphere-2023-2189-AC2
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2023-2189', Anonymous Referee #1, 02 Nov 2023
General comments
This study presents a new loss function that tries to leverage the linear structure of young contrails in order to improve the training of deep learning models for satellite-based contrail detection.
Unfortunately, I will have to recommend this paper to be rejected. The new loss function idea is intuitive and seems worthy of a publication in the future, but the current state of the paper is in my opinion not of high enough quality to become publishable after one or more review rounds.
One of the most prominent issues I have with the paper is the seeming lack of awareness of other work done in the field. A major motivator for the paper is a “lack of large hand-labeled datasets” for contrail detection on satellite images. I strongly disagree with that sentiment, as there are multiple efforts (McCloskey et al. 2021, Ng et al. 2023) that have provided open-source datasets of contrails labeled on satellite images. Furthermore, the authors present several aspects of their methodology as “novel/innovative” (e.g. image augmentation, pre-training) whereas these techniques are a mainstay in deep learning and have been used by previous approaches for contrail detection as well (Kulik 2019, Meijer et al. 2022). Moreover, the literature on deep learning and novel loss functions for specific applications is vast, and no comparison of the introduced loss function to other existing, perhaps similar alternatives (e.g. Deep Hough Transforms) is made.
Another problem is the qualitative nature of the analysis and discussion in this paper. Very few numbers are produced regarding the performance of the detection model and the effect of various changes to the loss function. Instead, the analysis is mostly qualitative (“this looks better than that”) and therefore subjective. Furthermore, there seems to be a consistent discrepancy between the actual results and how the authors present them. I don’t see how statements like “This new loss function has demonstrated superior performance” are warranted given the paper’s contents. I recommend the authors to provide more objective reflections of the data that allow the reader to draw their own conclusions.
There are small mistakes, inaccuracies and (what I believe to be) confusing statements throughout the paper. I have tried to point these out in the specific comments below.
Overall, I suggest that the authors proceed as follows to improve this work and publish it eventually:
- Review the literature on contrail detection (and perhaps also scour the deep learning literature to compare the novel loss function with existing approaches in other application domains).
- Utilize existing labeled contrail datasets to assess the effect on quantitative performance that the SR loss function has (and perhaps some of the data augmentation techniques as well).
Specific and technical comments
My apologies for the confusing way of referring to certain parts of the manuscript: no line numbers were available.
Abstract: The abstract can be improved as follows. It is not very clear what the link is between sentence 1 and 2. The abstract suddenly jumps from “aviation climate impact” to “computer vision”. I would recommend to first state the relevance of computer vision to the topic at hand: techniques to automatically gather satellite-based data on contrails. Also, the statement “scarcity of hand-labeled contrail dataset” is questionable, see e.g. McCloskey et al. (2021) and Ng et al. (2023) who introduce large hand-labeled datasets of contrails.
Introduction: The first sentence does not flow well: I recommend rewriting it. The second sentence contains “They are emit” which I think should be “They emit”. Next, the second sentence states that aircraft emit nitrous oxide (which is ), which is incorrect I believe: perhaps the authors mean “oxides of nitrogen”.
The third sentence “unlike other types of transports, aircraft-generated contrails…” seems a bit odd as well: perhaps it is better to contrast this with other aviation emissions? Also, it states “could cause impact the climate” which I think should be either “could impact the climate” or “could cause climate impact”. The last sentence uses “are” instead of “is”.
I think this section (introduction) could use a rewrite as well. It does not capture the idea that we have several models that we can use to estimate the climate impact of contrails, but that these model outputs are associated with great amounts of uncertainty. Gathering more observational data, by e.g. satellites, could be a way to reduce these uncertainties.
Computer vision approaches:
- “Contrail detection traditionally utilizes computer vision tasks, given that it involves identifying linear features”. This sentence does not really make sense to me. Indeed, automated contrail detection on satellite imagery is best done using computer vision techniques. However, it is not the idea that contrails tend to be linear (at least when they’re young) that makes it so that we use computer vision to automatically detect them. It is however a key property that motivated some of the older approaches like Mannstein et al. (1999).
- “Hough transform” I would say “the Hough transform” or “Hough transforms”
- “methods have been extender” I believe this is a typo and should be “extended”
- “tests for contrails in an image”. This sounds a bit odd, and I think the authors can be more specific by specifically mentioning that this is the algorithm by Mannstein et al. (1999).
- “Zhang et al. (2017) combined visual contrail detection” what is meant with “visual” contrail detection? The reader could interpret this as contrail detection using visible satellite imagery, which I do not think is the case here.
- “With advanced image processing…” I have two problems with this sentence:
- “With advanced image processing”: firstly, Minnis et al. use an extension of Mannstein’s algorithm, and given that this paper was published in 2013 (when the first highly effective deep learning algorithms for computer vision were available) I think that the use of the word “advanced” here is unwarranted.
- The use of “With advanced image processing” and then at the end of the sentence again “a blend of … imagery processing techniques” seems tautological, and I would recommend choosing one or the other.
- “The study utilized …”
- If the authors want to cite a paper fundamental to the brightness temperature difference technique, I believe Inoue (1985) is more appropriate than Ackerman (1996).
- I would augment this sentence with an explanation of what these techniques actually were used for (retrieve contrail optical depth etc.)
Machine learning approaches:
- Firstly, I would simply call this section “Deep learning approaches” as this is more specific and still covers all the papers cited in this section.
- “auto-encoder based convolutional neural network” I would argue that Kulik (2019) used a convolutional neural network and not an auto-encoder of any kind.
- “However, despite its success… the model could not determine their exact location due to the simplicity of the employed machine learning model” What is meant with “their exact location”?
- The model from Kulik finds the pixels in a satellite image that it estimates to be part of contrails, therefore it does in fact locate them in the horizontal sense. Do the authors perhaps mean to distinguish between segmentation (i.e. find which pixels are part of contrails) and object detection (find the pixels that are part of each contrail) here?
- “Simplicity of the employed machine learning model” How is the model employed by Kulik (2019) simple? It is actually incredibly similar to the one (the U-net) used in this study.
- “A parallel approach…” how is this approach parallel to that of Kulik (2019)? Furthermore, detecting contrails in ground-based camera images is a completely different task than detecting contrails in satellite imagery, so I do not see the relevance of mentioning the Saddiqui (2020) thesis here.
- “has provided a modest set” what makes this dataset modest?
- “landsat” I suggest capitalizing Landsat and being specific in that these are Landsat-8 imagery
- “Another recent study by…”
- “in a more comprehensive effort” why is this more comprehensive than McCloskey et al. (2021) ?
- “contrails over the United States” The dataset created by Ng et al. (2023) covers not only the United States but also other parts of the GOES-16 full disk imagery including the Southern Hemisphere.
- “the GOES-16 satellite imagery” I would leave out “the”
- “In the course of assembling this dataset, a convolutional…” I believe this was not the case, there was no convolutional neural network involved in creating the dataset of hand-labeled contrails.
- “The research shows promising results…”
- The authors can be more precise here by stating that the model presented in Ng et al. (2023) outperforms single-frame models
- “high performance hardwares” I would say “hardware” which is already plural I believe
- I don’t see how the need for high performance hardware (which is not a very precise term) is a detriment to the paper’s results.
- This section could also cite Meijer et al. (2022): who use a different neural network than Kulik (2019) and apply this to a large dataset.
Research gaps
- “It has been found that traditional computer vision based…” It would be appropriate to back up this statement with a citation.
- “The complexity of satellite images captured under varying conditions” This is a rather vague sentence: what are the authors trying to say here?
- “Earlier machine learning approaches…” Again, would be more specific here and cite the appropriate references. And why do such methods require larger amounts of data than more recent methods?
- “In particular, there are no adequate loss functions optimized for linear features…” What about papers like Shit et al. (2021)? I would be more specific here and say that there exist such approaches but no study has investigated their usefulness for contrail detection.
- “This shortcoming makes detection…”
- Why does this lead to more difficult detection at lower resolutions? And what are lower resolutions in this case? 1 km, 2 km or more? Doesn’t lower resolution make detection of an object harder in general, no matter its similarity to a straight line?
- How would adequate loss functions resolve the issue of more difficult detection of multiple contrails that are close to each other? And what is meant with “close to each other”? I imagine that the relative orientation of the contrails in a cluster is relevant here.
Contributions of this study
- “This paper aims to present a machine learning model… with augmented satellite images to improve the training efficiency”
- Augmented in what way?
- What is meant with “Training efficiency”?
- “This approach enables the effective training … with a minimal dataset on standard computer hardware”
- What is meant with a “minimal” dataset?
- “Firstly we complement … with data augmentation methods …” Almost all (if not all) previous deep learning models for contrail detection use data augmentation (e.g. Kulik (2019), Meijer et al. (2022), Ng et al. (2023)) as this is a standard approach in deep learning to regularize models.
- “SR loss” The abbreviation has not yet been defined (here or anywhere else in the paper)
- “We also offer open access to … imagery data, …” Most satellite image data is publicly available, so I would remove that from this sentence.
- “we perform thorough evaluations of the performances of our models openly”
- I would remove “thorough” as this is subjective
- “openly” what is an “open evaluation” of a model? Does this imply that other previous papers have not been “open” in their evaluation?
GOES data
- Please mention that GOES-16 ABI (Advanced Baseline Imager) data is used, and provide a citation to a relevant paper. This helps the GOES-R team to see what their data is used for!
- “Contrails and cirrus clouds share atmospheric similarities” What is meant with “atmospheric similarities”? Perhaps a sentence like “Contrails and cirrus clouds feature similar microphysics” would be more appropriate
- “pre-process” should be “pre-processing”
- “the difference” instead of just “difference”
- “difference between 12.3 and 10.35 is obtained” Does this mean that you subtract the image corresponding to the 10.35 μm band from the 12.3 μm band? This is exactly opposite to the typical convention for the BTD. Thin cirrus tends to absorb more radiation near 12.3 μm than near 10.35 μm. Therefore, less radiance is measured in the 12.3 μm band than in the 10.35 μm band, such that subtracting 12.3 μm from 10.35 μm leads to thin cirrus and contrails having positive BTD (see the sketch at the end of this section).
- “After days and regions are identified with contrail occurrence” Please expand on how this is done
- I do not see the added value of mentioning the software packages used to download (“goes2go”) and process (“netCDF4”) the satellite data. Given that GOES-16 ABI data is primarily disseminated in netCDF files, I think it is rather trivial that one uses the netCDF library to handle that data.
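For concreteness regarding the sign convention discussed above, here is a minimal sketch of the conventional BTD computation. It assumes an ABI L2 multi-band cloud and moisture imagery file with brightness-temperature variables CMI_C13 and CMI_C15; the file name is a placeholder and the variable names are assumptions, not taken from the paper.

```python
import xarray as xr

# Open a GOES-16 ABI L2 multi-band file (path is a placeholder).
ds = xr.open_dataset("OR_ABI-L2-MCMIPC_G16_example.nc")

# Brightness temperatures [K]: channel 13 (~10.35 um) and channel 15 (~12.3 um).
bt_c13 = ds["CMI_C13"]
bt_c15 = ds["CMI_C15"]

# Conventional split-window BTD: 10.35 um minus 12.3 um, so that thin cirrus
# and contrails come out with positive values.
btd = bt_c13 - bt_c15
```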
Figure 1
- The convention is to plot a satellite image with lighter colors corresponding to higher values. As such, contrails typically appear as darker features on infrared imagery. Not following this convention may confuse readers.
- I would provide the time and date of the GOES-16 ABI image.
- The BTD defined here (channel 13 – channel 15) is opposite to that described in the text.
Contrail labeling
- “contrails are first traced with paths” I do not believe the average reader will be familiar enough with GIMP to know what a “path” is. Could you please provide more detail on whether this is a polygon shape or a line?
- “the mask image is generated with strokes of approximately two pixels on all paths”
- Contrails vary in width, and based on the above statement this is not accounted for in the labeling process. This may lead to biases in the training process (depending on the loss function used), and this should therefore be acknowledged.
- “In total, around 30 images at two different locations … are selected and …” Why only at two different locations, and why specifically San Francisco (note the erroneous use of a hyphen in the paper) and Florida? This obviously has impacts on the model’s generalization to other regions within the GOES-16 ABI domain.
- Given that San Francisco is quite far away from the GOES-16 sub-satellite point (and thus the satellite viewing zenith angle is quite large), the images are likely to be distorted. Is a correction (e.g. a map projection) applied before the labeling and/or training process?
- “which are not use” “use” should be “used” I presume
- Regarding the labeling procedure: nothing is mentioned on how the labelers identify contrails? What kind of decision making process was used to distinguish between contrails and natural cirrus and/or background features? See for example Meijer et al. (2022) for such a description.
Figure 2
- I suggest to add the date and time of the GOES-16 image.
U-Net
- What is meant with “generation research”?
- “High accuracy and efficiency” as measured by or compared to?
- You could cite both Kulik (2019) and Meijer et al. (2022) here as both use a U-Net model as well.
- “the traditional structure of a CNN”: what is a traditional structure here?
- “This process expects the abstracted semantics to be captured” Perhaps better said using “the deeper layers capture higher-level image features”.
- “using transposed convolutions” The U-Net as proposed by Ronneberger et al. (2015) does not use transposed convolutions, I believe.
- “and thus lead to” “lead” should be “leads” I believe
- “for classification” Isn’t the task at hand image segmentation?
Figure 3:
- I think the use of a visible ground-based image of contrails is a bit misleading, and I would recommend the authors to use a satellite image instead.
- “Activation” is only placed at the end of the network, although there are likely many more activation functions present throughout the neural network. Perhaps this could be clarified in the figure caption.
- The various colors and boxes are not explained. I think this would be helpful (perhaps adding a legend would resolve this).
ResNet
I do not think this section is necessary at all. Residual blocks are a mainstay in deep learning and the authors can mention their usage but an explanation of how they work and why they are useful does not fit in a paper that focuses on contrail detection, in my opinion. The authors can also cite previous contrail detection papers here, as most deep learning approaches utilize residual blocks.
Combining U-Net and ResNet
Again, this section could be omitted. However, I would include an explanation of exactly which ResNet model is used (ResNet18 or ResNet50, etc.).
Conventional loss functions
- “The choice of loss function is critical part…”
- “is a” instead of “is”
- There are machine learning approaches that do not require a loss function. Nearest neighbors is an example. You can of course interpret the distance metric used as a “loss function” but I believe that this statement is far too general.
- “Two conventional loss functions…”
- Why not use the cross-entropy loss function?
- “are highly imbalanced in our study” I would remove “our study” and simply state this is a general issue with satellite-based contrail detection, and back it up with some numbers regarding the occurrence rate of contrails (see for example Meijer et al. 2022, Ng et al. 2023)
- “these loss functions do not heavily penalize the prediction the majority class”
- What is “heavily penalize”?
- “the prediction the majority class” should be “the prediction of the majority class”
- “Focal loss” would replace with “The focal loss”
- “misclassified and hard classifications”
- What makes a classification “hard”? And how is classification relevant in this image segmentation task? (I presume the authors refer to the fact that image segmentation comes down to a classification problem for each pixel, but that is not clear from the text at all.)
- “misclassified” would say “misclassified pixels”
- “it applies a specific modulating factor…”
- None of the symbols p, α, and γ are defined.
- “otherwise” given that g is undefined, it is not clear what other values g can take.
- “Where …” I believe this should be γ = 0; the value stated does not make the focal loss equal to the cross-entropy loss, γ = 0 does, right? (See the sketch after this list.)
- “It is based on the Dice coefficient…” The equation that follows is not for the Dice coefficient, but for the Dice loss (which I find a bit confusing)
- “where p represents the probability of prediction for a pixel belonging to the target class (contrail or non-contrail).” Perhaps it is better to state “target class (contrail)” instead, as the reader might now be confused about what the target class is.
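To make the two conventional loss functions and the γ = 0 reduction concrete, here are minimal sketches of the binary focal loss (Lin et al., 2017) and the soft Dice loss; the symbol names follow the standard formulations, not necessarily the paper's notation.

```python
import torch

def focal_loss(p, g, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss (Lin et al., 2017); with gamma = 0 and alpha = 1
    this reduces exactly to the binary cross-entropy loss."""
    p = p.clamp(eps, 1 - eps)
    p_t = torch.where(g == 1, p, 1 - p)  # probability of the true class
    a_t = torch.where(g == 1, torch.full_like(p, alpha), torch.full_like(p, 1 - alpha))
    return (-a_t * (1 - p_t) ** gamma * p_t.log()).mean()

def dice_loss(p, g, smooth=1.0):
    """Soft Dice loss: one minus the (smoothed) Dice coefficient, which is
    the quantity the equation in the paper actually defines."""
    inter = (p * g).sum()
    dice_coeff = (2 * inter + smooth) / (p.sum() + g.sum() + smooth)
    return 1 - dice_coeff
```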
SR Loss at Hough space to improve contrail segmentation
- “via convolutions” How are convolutions relevant in the evaluation of the previously introduced loss functions?
- “And they …”
- This sentence could best be merged with the previous one, I believe
- “explicitly considers” should be “explicitly consider”
- “Predicted contrail formations” Should this instead read “contrail detections”? I don’t see how contrail formation is relevant to the discussion at hand.
Hough space and transformation
- “The Hough transformation first converts the common linear representation of” Do you mean “representation of a line”? It may also be helpful here to provide definitions of the symbols used, especially since some of the same notation was used in the previous section for a seemingly different purpose.
- “We denote this polar coordinate system” Would the word “refer” here be better than “denote”?
- “where each point … pixel space.” Should “correspond” be “corresponds” ?
- “Only lines that are close to a sufficient amount of masked pixels are selected” What is a sufficient amount and how is that threshold determined?
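For readers unfamiliar with the (ρ, θ) parameterization, a minimal sketch of the standard (non-differentiable) Hough line transform using scikit-image; the vote threshold below stands in for the unspecified “sufficient amount of masked pixels”:

```python
import numpy as np
from skimage.transform import hough_line, hough_line_peaks

# Toy binary contrail mask with one diagonal line.
mask = np.zeros((64, 64), dtype=bool)
mask[np.arange(64), np.arange(64)] = True

# Each masked pixel votes for all (rho, theta) pairs satisfying
# rho = x*cos(theta) + y*sin(theta).
accumulator, thetas, rhos = hough_line(mask)

# Keep only lines supported by enough masked pixels; the choice of this
# threshold is exactly the unspecified selection criterion questioned above.
votes, line_thetas, line_rhos = hough_line_peaks(
    accumulator, thetas, rhos, threshold=0.5 * accumulator.max()
)
```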
Combining Dice loss at original and Hough spaces
- “Aforementioned dice loss at the pixel space” Perhaps “in the pixel space” would be better here?
- “similarity at the Hough space” Maybe it is better to say “in the Hough space”.
- What is the motivation for also using a Dice loss in the Hough space? Moreover, with the way the transformation to Hough space is being performed (either a particular cell is occupied or not) does this not lead to under-representation of very long contrails (which would not be weighted accordingly) in your loss function?
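Since the manuscript does not fully specify how the Hough-space Dice term is computed, the following is only a hypothetical sketch of one differentiable way to combine Dice losses in pixel and Hough space: a fixed 0/1 pixel-to-line incidence matrix (an assumption, not the authors' implementation) projects the soft predictions onto a discretized (ρ, θ) grid. Normalizing each line's votes by its length also illustrates the concern above, since a long and a short contrail on the same line would then contribute similarly.

```python
import torch

def sr_style_loss(p, g, line_map, w=0.5, smooth=1.0):
    """Hypothetical combined loss: Dice in pixel space plus Dice in a
    Hough-like space. `line_map` is a (num_lines x num_pixels) 0/1 matrix
    whose row k marks the pixels lying on line k of a (rho, theta) grid;
    the projection is linear, so the Hough-space term stays differentiable."""
    def dice(a, b):
        return 1 - (2 * (a * b).sum() + smooth) / (a.sum() + b.sum() + smooth)

    lengths = line_map.sum(dim=1).clamp(min=1.0)
    h_p = (line_map @ p.flatten()) / lengths  # soft, length-normalized votes
    h_g = (line_map @ g.flatten()) / lengths

    return w * dice(p, g) + (1 - w) * dice(h_p, h_g)
```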
Figure 7
- I would replace the fully connected neural network diagram with one of a CNN.
- I would replace “predict” with “predicted”
Image augmentation
I think that this section can be made much more terse: image augmentation is extremely common in deep learning and it has been applied in previous studies that look at contrail detection. I would remove figures 8, 9, and 10 or move them towards the supplementary materials.
- “One way in” would sound better as “one way of”
- “is to train neural network model” I would opt for the plural “models”
- “Large, high-quality datasets are not always available” The validity of this statement is questionable given the work done by McCloskey et al. (2021) and Ng et al. (2023).
- “Image augmentation provides an efficient way to generate training data using a small labeled dataset.” I would use different language here with more nuance like “image augmentation can mitigate overfitting”.
- “Essentially, this can generate several order of magnitude more images for model training based on a limited number of manually labelled images” Is this really true? Do the “augmented” images contain as much new information in them as “true” additional images? If so, I would provide some citations to back up this claim.
- “We apply … which includes” These 2 sentences are more or less a repetition of the previous 2.
- “Perspective of the contrails” what is the perspective of a contrail?
- “lighting variations”: given that many data augmentation techniques have been designed with visible imagery in mind, I wonder how applicable these techniques are to infrared imagery.
- “During training … we apply a sequence … contrail mask”. Could you specify the probabilities used for the various transformations?
- “robust to varying image quality and contrails” Firstly, how does image quality come into play here? Secondly, what is “robust to varying contrails”?
- I think more detail should be given for how the viewing angle variation is implemented, as that is potentially very interesting.
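As an illustration of such a pipeline (not the authors' actual configuration: the transforms and probabilities below are assumptions), a sketch using the albumentations library, which applies the same geometric transform to the image and its contrail mask:

```python
import numpy as np
import albumentations as A

augment = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.Rotate(limit=90, p=0.5),
    A.RandomScale(scale_limit=0.2, p=0.5),
    # Photometric jitter; as noted elsewhere in this discussion, of
    # questionable validity for thermal infrared imagery.
    A.RandomBrightnessContrast(p=0.3),
])

image = np.random.rand(256, 256).astype("float32")  # stand-in BTD image
mask = np.zeros((256, 256), dtype="uint8")          # stand-in contrail mask
out = augment(image=image, mask=mask)
aug_image, aug_mask = out["image"], out["mask"]
```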
Transfer learning based on pre-trained models
Again, I would make this section far shorter. Pre-trained models have been used by among others Kulik (2019) and Meijer et al. (2022). It is very common to do so. I would only mention the following two points (which are currently missing)
- The exact dataset that has been used for pre-training
- How a pre-trained model (which I presume was trained on some kind of natural image dataset with 3 channels: RGB) was modified to accept 1 input channel (which I interpret as being the case for the satellite images in this study)
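Both points could be documented in a few lines. For example, a hypothetical sketch using the segmentation_models_pytorch library (the library choice and the ResNet variant are assumptions; the paper does not state them here), where in_channels=1 adapts the ImageNet-pre-trained RGB stem to single-channel input:

```python
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet34",     # exact ResNet depth is an assumption
    encoder_weights="imagenet",  # pre-training dataset to be stated explicitly
    in_channels=1,               # single-channel (e.g. BTD) satellite input
    classes=1,                   # binary contrail segmentation
)
```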
Evaluating the model with unseen GOES data
- “Unlike common image objective identification tasks” What are image objective identification tasks?
- “easily computed ground truth” How does one compute a ground truth?
- “Persistent contrails can disperse into cirrus clouds, which become indistinguishable from clouds in many cases”. This sentence sounds a bit odd. I would replace it with something like “Persistent contrails spread as they age and may become indistinguishable from natural cirrus clouds”.
- “Thus, the performance evaluation is mostly based on the visual inspection of the results.” I disagree with the line of reasoning that leads up to the decision to evaluate the performance at a visual level. Other deep learning based studies on contrail detection have performed quantitative performance evaluation, and there is obvious correlation between the metrics used and “qualitative performance”. If there wasn’t, such metrics (think of the precision and recall) would not be used as frequently as they are.
- In this section, the terminology “clear” and “unclear” contrail is used multiple times. What exactly is meant with these terms?
- “The third row”. Firstly, is the data corruption here an image artifact (it looks rather strange for an image artifact to me, and more like a saturation issue) or something that has been added to the image by the authors? And why is the image resolution worse for this case? Because it is at a higher viewing zenith angle?
Performance under different loss functions
- “given the higher computational time required” Does this imply that the use of the SR loss function increases the computational time by a factor of two? (Since 4000 steps are used now rather than 8000). I would also simply use the same number of steps for each loss function as to do a fairer comparison.
- What is meant with “steps” here? Are these epochs or iterations (i.e. gradient updates)?
- Could you specify the optimizer used and its hyperparameters?
- “are seem with” seem should be seen, I believe?
- “clean background” what does it mean for a background to be clean?
- “As it can focus on forming masks for longer lines of contrails…” Referring back to one of my earlier comments regarding the way in which the Dice loss is applied in Hough space: doesn’t that loss function fail to distinguish between a longer and a shorter contrail as long as they have the same (ρ, θ) coordinate?
- “In later discussion section” I would rephrase this as “Later in the discussion section”
Evaluating the model with other image sources
- “The resulting contrail detection model has demonstrated an ability …” I think this statement should be more nuanced, given that no quantitative evidence (or even qualitative) has been offered to back it up.
- “The contrail detection model can be directly applied to different types of image sources without additional training”
- I would rephrase “different types of image sources” as “images from different sources”
- I think there are many models that can readily be applied to different types of images, what we are really interested in here is how well they perform!
- “MeteoSat” I believe the “s” need not be capitalized. And please cite the appropriate source for this data, and mention that the image is captured by the SEVIRI instrument.
- “Which shares similar image properties …” What exactly are these similar image properties?
- “The second image is a color photograph from the NASA Terra satellite” Same comment as for Meteosat regarding the citation, and I would add that the image was captured by the MODIS instrument.
- “Where the model has proven capable of managing a broad dynamic range of color inputs.” This statement sounds a bit grandiose in the context of what actually has been shown here, in my opinion. I would state something like “We see that for this example some line-shaped structures, likely to be contrails, are indeed detected by the algorithm.”
- “NOAA Suomi-NPP” same comments as for the previous two satellites.
- “contrails from ships”. The appropriate term here is “ship tracks”, I believe.
- “It is clear that the model maintains consistent performance” Why is this clear? Is this quantified among the various image sources? The algorithm seems to detect line-like structures in all images, yes, but an algorithm for edge detection would as well.
- Also, why is this Google Street View image so oddly colored?
- For all satellite images shown, I would add what the image bands that have been used are, as well as the date and time of the images.
Data and few-shot learning
- “leveraging few-shot learning” where exactly is this few-shot learning approach discussed in the paper?
- “Unlike traditional end-to-end training…” This paragraph describes exactly what Kulik (2019) and Meijer et al. (2022) have done as well, albeit with a slightly larger dataset.
- “enhancing the model’s robustness and adaptability” These terms have not been defined in the paper: what is meant by them?
- “The model exhibits exceptional performance” Do the discussion or results presented by this paper really warrant such a statement?
- “With the introduction of more diverse image sources, we anticipate further improvements” How would the introduction of more diverse image sources lead to improvement? Wouldn’t it be better to focus resources on one particular image source and increase performance there? I think this statement needs to be rephrased.
- “For further refinement, we recommend compiling a dataset of carefully…” One could already do this by combining the datasets from McCloskey et al. (2021) and Ng et al. (2023).
Implementation
This section can be replaced with a short statement in the appropriate “data availability” section, I believe.
Loss functions
- “to improve the model training” As far as I can see in the results, the performance metrics do not indicate such an improvement. Perhaps the discussion needs to be more specific here.
- “We can observe that … outperforms” Again, this is hard to justify given the absence of any quantitative improvements shown by the model trained with the SR loss function, in my opinion.
- “A relatively fast implementation” What is this relative to? Isn’t this the first time the SR loss function is discussed and implemented?
- “Which allows it to be computed on GPU…” I believe it is not often that the loss function is a computational bottleneck for deep learning models, as long as it is differentiable.
- “Is already better” Again, I do not believe there is enough quantitative or qualitative argumentation to back up such a statement.
Accuracy metrics
- “with regardless” I think “with” should be omitted
- “This is due to the fact that human-labeled contrail images can not be complete”
- The way this sentence is written now, it seems to imply that there is some physical limit to how well a certain image can be labeled, which I do not think is the case.
- Furthermore, labels will always be imperfect, and that certainly has not hindered the computer vision field from using metrics like the IoU and F1 score effectively.
- “Secondly, the contrail masks hardly ever cover the contrail”. I see this as a self-imposed limitation due to the choice of labeling procedure, and I find it a weak motivation for not relying on quantitative metrics in evaluating the performance of a contrail detection model.
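Such quantitative metrics remain straightforward to compute even with imperfect labels; a minimal sketch for binary contrail masks:

```python
import numpy as np

def mask_metrics(pred, truth, eps=1e-9):
    """IoU, precision, recall, and F1 between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return {
        "iou": tp / (tp + fp + fn + eps),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall + eps),
    }
```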
Conclusions
- “Flight contrails” I would use “aircraft contrails” instead.
- “using by fine-tune a pre-train” should be “by fine-tuning a pre-trained”
- “carefully labeled images” What makes these image so carefully labeled? This is orthogonal to what has just been claimed about the masks not fully covering the actual contrails!
- “diverse transformations” what makes these transformations “diverse”?
- “demonstrates strong performance” I’m not sure if this statement is warranted.
- “demonstrated superior performance” I’m not sure if this statement is warranted
- “Providing an innovative solution to the lack of large hand-labeled datasets” I do not believe this is true given earlier work done, see earlier comments.
Additional references
Inoue, Toshiro. "On the temperature and effective emissivity determination of semi-transparent cirrus clouds by bi-spectral measurements in the 10μm window region." Journal of the Meteorological Society of Japan. Ser. II 63.1 (1985): 88-99.
Shit, Suprosanna, et al. "clDice-a novel topology-preserving loss function for tubular structure segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
Meijer, Vincent R., et al. "Contrail coverage over the United States before and during the COVID-19 pandemic." Environmental Research Letters 17.3 (2022): 034039.
Citation: https://doi.org/10.5194/egusphere-2023-2189-RC1
AC1: 'Reply on RC1', Junzi Sun, 19 Dec 2023
We would like to thank the reviewer for the comprehensive review of our paper. The following replies contain our responses to the major comments from the review. Other minor comments related to language and writing pointed out by the reviewer will be fixed in the revised text.
1. Response to comment on lack of awareness of other work in field
We thank the reviewer for pointing out the importance of recognizing existing work in contrail detection. We discussed McCloskey et al. (2021) and Ng et al. (2023) in this paper, and we will provide more insight into their methodologies in our updated literature review. Our scope is focused on the limited availability of large, hand-labeled datasets. The revised manuscript will reflect a more in-depth analysis of the current state of contrail detection research.
2. Response to comment on methodology not novel
We appreciate the reviewer's insights on our methodology. However, we believe there are several aspects which may have been overlooked regarding the novelty of our approach:
1. SR Loss Function in Hough Space: Our paper introduces a new loss function, SR Loss, that works in both the original and transformed Hough spaces. This is a significant change from traditional loss functions, like Dice and Focal losses, which do not consider the inherent linear shape of contrails in this way.
2. Augmented Transfer Learning: While transfer learning and image augmentation are common, our specific implementation and combination of these techniques are tailored for the unique challenges of contrail detection in satellite imagery. This specific application represents an innovative approach in this domain.
3. Few-Shot Learning with Limited Data: Our methodology emphasizes training contrail detection models with minimal labeled data, combining few-shot learning with augmented transfer learning, a relevant approach given the challenges in obtaining large, high-quality datasets for remote sensing imagery.
We will clarify these points in our revised manuscript to better highlight the novelty of our approach.
3. Response to comment on the lack of quantitative analysis
The critique about the qualitative nature of our analysis is well-taken. We will conduct a more rigorous quantitative analysis in our revised paper. This will include detailed performance metrics of our detection model and the impact of various adjustments to the loss function. We aim to present our findings in a way that allows for objective interpretation and comparison, correcting any perceived overstatements about the performance of our new loss function.
However, we also want to point out that the existing metrics for image detection are not fully applicable to contrail detection research. We want to look for alternatives based on flight trajectories, but these may be beyond the scope of this paper.
4. Response to comments on technical mistakes and confusing statements
We are committed to addressing each specific comment provided by the reviewer to correct any mistakes and clarify confusing statements. Our revision will go through the paper carefully to ensure accuracy and clarity throughout.
We are fully thankful for, and intend to follow, the reviewer’s valuable suggestions for improving our work. This includes an extensive review of both the contrail detection and deep learning literature to better compare our novel loss function with existing approaches. Additionally, we will use existing labeled contrail datasets to assess the quantitative performance of our SR loss function and data augmentation techniques.
Finally, we believe the revisions will greatly enhance the quality and impact of our paper, addressing the concerns raised by the reviewer and making a valuable contribution to the field.
Citation: https://doi.org/10.5194/egusphere-2023-2189-AC1
RC2: 'Comment on egusphere-2023-2189', Anonymous Referee #2, 10 Nov 2023
Under some conditions, air traffic generates contrails. These clouds of anthropogenic origin exhibit a significant radiative forcing, and improved methods for classifying contrails in satellite imagery are required for better characterising their present climate impact and for assessing how that impact could be decreased by adjusting flight routes and switching to alternative fuels. This manuscript addresses this classification problem with machine learning. Although at least one novel approach is introduced, unfortunately the overall judgement is that the manuscript does not meet the requirements to be considered for publication. That is, the suggested action is to reject the manuscript.
The main shortcoming is the lack of quantitative results backing up the claims, i.e. proving that the techniques introduced actually lead to clear improvements. The claims are primarily subjective opinions. Publicly available datasets of satellite images with contrail labelling exist. In fact, the authors cite two such datasets (McCloskey et al., 2021; Ng et al., 2023). However, these datasets are not considered for evaluating the model suggested for contrail detection; rather, only four images from different sources are used. The paper would benefit from such an evaluation, both concerning a possible improvement in the classification skill and to support that the data augmentation used for the few-shot learning strategy is sufficient.
The scope of the paper needs to be clarified. The training is solely done on infrared geostationary data, and interpreting such data appears to be the scope through most of the manuscript. However, towards the end, photos from ground level are also considered. In the Conclusion, it is said, “... to detecting flight contrails in remote sensing imagery data”. This statement includes ground-based observations, as well as measurements across the electromagnetic spectrum.
Contrails are manually identified in a set of images. Nothing is said about who has done this labelling nor about the expertise of that person. That is, the quality of the labelled dataset is hard to judge. In addition, nothing is said about the process followed to select images with contrails, only that "days and regions are identified with contrail occurrence". Further, the dataset size is small, with a total of 30 images, with 20 for training. It is argued that the data augmentation applied compensates for this fact, and it is agreed that rotation, scaling, etc., are valid ways to augment the effective size of the dataset. However, the parameters of the data augmentation transforms used are not stated. On the other hand, it is doubtful whether the contrast and brightness changes are valid when observations from the thermal infrared range are used. There is no variation in “luminance” for these observations. There seems to be confusion with visible and near-IR measurements.
The language and structure of the paper need to be revisited. Poor phrasings, incorrect spellings and grammatical errors induce an insufficient reading flow. Moreover, the arrangement of some paragraphs and sections is non-standard, leading the reader to question the origin of specific information or statements. For example, in Sect. 5.2 an example of a pre-trained model is given, while the actual used one is given first in Sect. 7.2 (i.e. as part of the discussion, with no reference to it in Sect. 5.2). Although the authors selected to include a completely new acronym (SR) in the title, its meaning is not explained.
Concerning the network and training approach used, the authors should offer technical clarifications. For example, it is unclear how the network trained with SR loss predicts in the untransformed and transformed space as opposed to the networks trained with Focal or Dice losses alone: does it have two network heads or does it use only one head? If the latter is true, is any information given to the network when predicting in Hough space or untransformed? That is, what is fed to the network, and how does it work at test time? From the manuscript, the understanding is that the network takes the brightness temperature difference as input alone. In addition, the authors mention that the networks are pre-trained with ImageNet, that is, RGB images (3 channels), and present more RGB images in Fig. 14. Does the network also support RGB channels, which are not captured with the near-IR training data, or has any pre-processing been applied to the RGB images?
Some more specific comments are given below, with no ambition to be complete and language issues ignored completely.
- The description of the libraries used could be moved into an appendix, and the descriptions of deep learning tools, such as residual blocks or data augmentation, could be more concise or use fewer figures. The authors should instead emphasise what exact network they used, for example, if their network matches the original U-Net with residual connections, and if yes, how these residual connections are built in the network; it is difficult to judge from Figs. 4 and 5.
- Sect. 1.2, "the details of the models are not made available by the paper". It is unclear what the authors want to convey with this statement.
- Sect. 1.3, "no adequate loss functions optimized for linear features" is not convincing enough. The deep learning literature is broad, and likely similar approaches have been developed for detecting linear features. This statement could benefit of showing that such research has been done; a quick search with "hough transform" for recent years reveals many papers making use of this transform and deep learning.
- Sect. 3.4, It is suggested to introduce the notation first and then focus on the loss functions. Is the meaning of p in eq. (3) the same as in eq. (4), explained below eq. (4)? What values can g in eq. (4) take, \pm 1 or 0 and 1?
- Sect. 6.1, "corrupted data occurs in the image (bottom right)". It is not explained how this is handled in the training nor if the corrupted pixels are inherent in the data; with the small dataset (30 images), this can be inspected manually.
- Sect. 6.2, the explanation of the training is incomplete: only the number of steps is given (and it is unclear what is meant with 'steps'), but no other details, such as learning rates, or whether, for example, any optimizer or mini-batches were employed for the gradient updates.
- Sect. 7.1, it is not explained how the training-test split was done.
- Sect. 7.3, "However, we can observe that the model trained with 4000 steps using SR Loss is already better than the other two models trained with 8000 steps". I understand this statement as that SR Loss is more effective, but the SR Loss is using two spaces for the optimization problem (is that, then, 2 spaces x 4000 steps = 8000 effective steps?),; I think this statement should be reviewed.
- Figure 1 would require a colour bar describing the BTD values.
- Figure 12 caption states that the models are trained with 8000 steps, but the last column indicates 4000 steps.
Citation: https://doi.org/10.5194/egusphere-2023-2189-RC2
AC2: 'Reply on RC2', Junzi Sun, 19 Dec 2023
We would like to thank the reviewer for the in-depth review of our paper. The following replies contain our responses to the major comments from the review. Other minor comments related to language and writing pointed out by the reviewer will be fixed in the revised text.
1. Response to comment on quantitative results and dataset utilization
We acknowledge the reviewer's concern regarding the lack of quantitative results and limited dataset usage. In our revised manuscript, we will incorporate a detailed quantitative analysis using the datasets cited (McCloskey et al., 2021; Ng et al., 2023). This will allow us to comprehensively evaluate our model's performance and validate our claims with objective data. We understand the importance of leveraging these datasets to demonstrate clear improvements in classification skill and the effectiveness of our few-shot learning strategy.
2. Response to comment on the scope clarifications
We appreciate the feedback on the need for clearer scope delineation. We will revise the manuscript to consistently focus on few-shot learning with the proposed new loss function to facilitate contrail detection in segmentation tasks.
3. Response to comment on dataset quality and data augmentation
The reviewer’s point about the quality of the labeled dataset and the details of data augmentation is well-taken. In the revised manuscript, we will provide clear information about the expertise of the individuals who labeled the contrails and elaborate on the selection process for the images. Additionally, we will specify the parameters used in our data augmentation techniques, addressing concerns about the validity of contrast and brightness changes in thermal infrared observations. Currently, these parameters are all available in the open-source code, but they will be elaborated on in the paper as well.
4. Response to comment on language and structure
The revised manuscript will undergo more proofreading to correct grammatical errors and improve phrasing. We will reorganize the content for better logical flow, ensuring that all sections and paragraphs present information coherently. The meaning of the newly introduced acronym (SR) will be explicitly explained early in the manuscript.
5. Response to comment on technical clarifications of network and training approach
Currently, many of these nuances are in the source code we shared openly together with the paper. In the revised paper, we will also clarify in the text how the network trained with the SR loss operates in both the untransformed and transformed spaces. Additionally, we will conduct more tests that address the use of pre-training with ImageNet and its compatibility with our near-IR training data, including any preprocessing applied to RGB images.
Citation: https://doi.org/10.5194/egusphere-2023-2189-AC2
Model code and software
Source code: Junzi Sun, https://github.com/junzis/contrail-net