This work is distributed under the Creative Commons Attribution 4.0 License.
The Polar Radiant Energy in the Far Infrared Experiment (PREFIRE) principal component-based cloud mask: A simulation experiment
Abstract.
We describe a cloud mask simulation experiment developed for the PREFIRE mission. The basis of the cloud mask is a principal component (PC) methodology (PC-MSK) adapted from the algorithm heritage of the upcoming Far-infrared Outgoing Radiation Understanding and Monitoring (FORUM) mission. Simulated clear-sky and cloudy-sky PREFIRE radiances are calculated from the Goddard Earth Observing System (GEOS) meteorological fields and include a variety of complex cloud configurations. The simulation experiment is based on local training that is adjusted along segments of simulated orbits that mimic actual PREFIRE orbits. A numerically stable method of separating clear sky from cloudy sky is achieved using Otsu’s binary classification method and requires no a priori thresholding estimate for multimodal histograms. Comparisons are made against a machine-learning cloud mask (ML-MSK) developed for the PREFIRE mission. The global hit rate of PC-MSK (92.6 %) compares favorably to the hit rate of ML-MSK (95.3 %). The Arctic hit rate of PC-MSK (86.7 %) compares favorably to the hit rate of ML-MSK (89.4 %) and both cloud masks are shown to meet mission requirements for PREFIRE cloud detection. The simulation experiment demonstrates the potential for accurate cloud masking with PREFIRE despite a low number of information-containing PCs compared to those obtained from hyperspectral infrared sounders. We conclude with a discussion about clear-sky and cloudy-sky training sets that are suitable for an operational version of PC-MSK and their development during the post-launch checkout time period.
Status: closed (peer review stopped)
RC1: 'Comment on egusphere-2023-2463', Anonymous Referee #1, 31 Jan 2024
The authors present a cloud detection algorithm newly adapted to PREFIRE spectral radiometer measurements. They provide a detailed discussion of their method and a comparison to a preexisting PREFIRE cloud detection algorithm, providing algorithm documentation for a forthcoming data product of the upcoming satellite mission. However, I found the testing methodology used in the study to be lacking.
The training dataset is a single day of data, where each local scene may only be visited once. It seems to me that the training matrices would then be overfit to the meteorological conditions at the time of training and generalize poorly to other days with other conditions. However, since the authors train and test on the same single day of synthetic observations, they have no way of knowing how PC-MSK might generalize to an out-of-training sample.
This problem of potential overfitting is made worse by the authors' choice to utilize the best-performing of 25 combinations of training matrices. If I understand correctly, the model performance on the testing data is explicitly used as part of the training process (lines 263-269), which seems to me to guarantee overfitting. If the best-performing random draw is chosen, then I am not sure how repeated evaluation on the same data represents an independent check on the algorithm.
While the authors state in the text that the 'brute-force stochastic sampling' algorithm step is problematic, they still treat the PC-MSK performance metrics obtained using this step as comparable to the ML-MSK performance metrics on the same data. It seems to me that PC-MSK performance would be "best-case scenario" performance, while ML-MSK performance would be objective since ML-MSK has not been trained on the GEOS FP-IT data. The authors should ensure separation between training and testing data to improve the evaluation of algorithm performance.
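As a concrete illustration of the separation being requested, the hedged sketch below holds out entire orbital segments (or days) so that no footprint used to build the training matrices is also used for scoring. The function and variable names are hypothetical and not part of the PC-MSK code.

```python
import numpy as np

def split_by_segment(segment_ids, test_fraction=0.3, seed=0):
    """Hold out whole orbital segments so that training and testing footprints
    never come from the same segment (illustrative only).

    segment_ids : array of per-footprint segment labels (e.g., orbit-segment index)
    Returns boolean masks over footprints: (train, test).
    """
    rng = np.random.default_rng(seed)
    segments = np.unique(segment_ids)
    rng.shuffle(segments)
    n_test = max(1, int(test_fraction * len(segments)))
    held_out = segments[:n_test]
    is_test = np.isin(segment_ids, held_out)
    return ~is_test, is_test
```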
Beyond this, I think the paper could benefit from a big-picture schematic of the overall algorithm flow and more detail on how local training regions are divided. Some general questions with regard to the definition of local training regions: how many segments of 2000 footprints are there? How long does it take to return to the same orbital segment? Do the spatial extents of orbital segments overlap? Are they the same for both TIRS instruments? These questions about training regions could benefit from, e.g., a map plotting the boundaries of each training region.
Minor comments:
* Line 110: "PCRTM v2.1 is sufficiently accurate with a root-mean square error (RMSE) of 0.67K" -- what is this error defined with respect to? Is this quantity a brightness temperature? What spectral resolution is it defined for?
* Line 121: "and an additional broadband channel that extends from 0 to 53 μm." What is zero wavelength?
* Line 241: since it is a key innovation of the present study, the paper could benefit from a brief presentation of Otsu's method.
* Figure 1: What is alog(cod) and why is it defined even for clear skies if it's a cloud property?
* Figure 5 title: What is "75.2 to 83.2"?
* Figure 10: What are the units of the numbers in the confusion matrix? Percent of samples?
Typos:
* Line 116: "PCRTM output at 0.5 cm–1 sampling" -- PCRTM outputs at 0.5 cm-1 sampling
* Line 322: Why is a period in this line colored red?
Citation: https://doi.org/10.5194/egusphere-2023-2463-RC1
AC1: 'Reply on RC1', Brian Kahn, 22 Apr 2024
Thanks to the three reviewers for the insightful and helpful comments. Below we generally describe how we will respond with a revised version of the manuscript.
The three reviewers had similar comments about three different issues touched on in the paper. Thus, please find below our initial response on these three points. In the response to reviews for the revised version, we will include a more detailed set of responses for each reviewer separately.
(1) Using the best of 25 random draws to quantify performance with the simulated data set isn't well described, explained, and justified.
(2) Other aspects of the methodology need further description.
(3) The manuscript would greatly benefit from some type of schematic.
Our initial response to (1). We wholeheartedly agree that the approach needs more clarity and justification. After consideration of the reviewers’ comments, we believe that the wide range in the performance of PC-MSK among the 25 random draws in the high latitudes is a result of underfitting rather than overfitting. The PC-MSK training matrices of either clear-sky or cloudy-sky, or perhaps both, may not represent the true local scene variance that occurs within orbital segments of 2000 TIRS footprints. Thus, the clear and cloudy sky training matrices are not sufficiently realistic for some orbital segments and cannot capture all patterns in the spectra. It is natural to expect that some of the 25 random draws will perform better than the other random draws. The spread in the performance gets larger with increasing latitude away from the tropics as the MIR/FIR spectrum increases in complexity.
The information content of the simulated PREFIRE radiances is relatively low and is not remotely comparable to that of other PCA-based retrieval algorithms that leverage hyperspectral radiance spectra (e.g., AERI, IASI, and AIRS). The number of information-bearing PCs in hyperspectral radiance data sets ranges between 100 and 300, while for the TIRS radiances in the simulated PREFIRE experiment it ranges between 3 and 7 according to our calculations in Section 3.2. (Note: these estimates could be adjusted based on the availability of observed TIRS radiances after launch.) We examined PC-MSK to see whether noise was captured in the higher-order PCs, and it is not. Therefore, we do not believe we are encountering an overfitting problem.
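For context on how such a PC count can be estimated, the sketch below shows one common criterion: eigenvalues of the noise-normalized radiance covariance compared against the unit noise floor, alongside a cumulative explained-variance cutoff. This is an illustration under assumed conventions, not the specific calculation of Section 3.2, and the array names are hypothetical.

```python
import numpy as np

def count_information_bearing_pcs(radiances, noise_std, var_threshold=0.999):
    """Estimate how many leading PCs carry signal rather than noise.

    radiances : (n_spectra, n_channels) simulated TIRS radiances
    noise_std : (n_channels,) instrument noise used to normalize channels
    Illustrative criterion only; the manuscript's Section 3.2 may differ.
    """
    # Noise-normalize so that pure noise contributes roughly 1 to each eigenvalue
    x = (radiances - radiances.mean(axis=0)) / noise_std
    # Eigenvalues of the covariance via singular values of the centered data
    s = np.linalg.svd(x, compute_uv=False)
    eigvals = s**2 / (x.shape[0] - 1)
    # Option 1: PCs whose eigenvalue exceeds the unit noise floor
    n_above_noise = int(np.sum(eigvals > 1.0))
    # Option 2: PCs needed to explain a target fraction of total variance
    frac = np.cumsum(eigvals) / eigvals.sum()
    n_variance = int(np.searchsorted(frac, var_threshold) + 1)
    return n_above_noise, n_variance
```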
This underfitting may not matter as much in the Tropics, where the PC-MSK performance is very high across the 25 random draws. We believe this is because clear-sky radiance variability is primarily from water vapor variance in very moist atmospheres rather than from temperature variance. In the middle and high latitudes, temperature is highly variable in the horizontal and vertical directions, with frequent inversions over land surfaces. Furthermore, the clear-sky radiance Jacobians in the FIR channels are larger when water vapor amounts are low, exhibiting a much wider range of radiance variability in the FIR. Additionally, cloud height, cloud overlap, microphysical, and thermodynamic phase variances and covariances are more complex, with frequently occurring clouds located below the bases of inversions that are difficult to detect. In the Northern Hemisphere extratropics, there is a large diversity of surface types and topography that adds degrees of difficulty to cloud detection. Cloud state sampling biases are also important in the high latitudes; for instance, water vapor can be higher and temperature can be lower in cloud fields compared to clear skies. Thus, the separation of clear-sky and cloudy-sky radiance contributions may be problematic for some clouds, particularly broken or optically thin clouds for which both contributions are significant.
Thus, we may ask: why does the ML-MSK perform so well if it uses the same simulated data sets as PC-MSK? Bertossa et al. (2023) use a neural network (NN) algorithm with hidden nodes that allow the information to be extracted in a nonlinear way, and the algorithm learns the nonlinear behavior through a loss function minimization. PC-MSK is linear by construction and does not have the ability of a NN to learn complex behaviors. To account for the problem of underfitting, which is most acute in the high latitudes, the brute-force stochastic approach of 25 random draws (5 clear-sky times 5 cloudy-sky) is an attempt to sample more of the local variability and determine whether the performance can be improved. We find that, by using the ML-MSK as a "truth" data set, we can improve the statistics considerably in some scenes, particularly in the Arctic region that is the primary mission target of PREFIRE. We agree with reviewer #1 that the best performing of the 25 PC-MSK random draws can be thought of as a "best case scenario".
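For clarity, a minimal sketch of the brute-force step just described: draw 5 candidate clear-sky and 5 candidate cloudy-sky training matrices from the local pools, build a mask for each of the 25 combinations, and keep the one that agrees best with a reference mask (ML-MSK in this experiment). The helper build_mask and the other names are hypothetical stand-ins, not the actual PC-MSK implementation.

```python
import itertools
import numpy as np

def best_of_25(segment_radiances, clear_pool, cloudy_pool, reference_mask,
               build_mask, n_draws=5, n_train=200, seed=0):
    """Brute-force selection over 5 clear x 5 cloudy random training draws.

    clear_pool, cloudy_pool : (n, n_channels) arrays of candidate training spectra
    build_mask(segment_radiances, clear_train, cloudy_train) -> boolean cloud mask,
    a stand-in for the similarity-index plus Otsu pipeline.
    """
    rng = np.random.default_rng(seed)
    size_clear = min(n_train, len(clear_pool))
    size_cloudy = min(n_train, len(cloudy_pool))
    clear_draws = [clear_pool[rng.choice(len(clear_pool), size_clear, replace=False)]
                   for _ in range(n_draws)]
    cloudy_draws = [cloudy_pool[rng.choice(len(cloudy_pool), size_cloudy, replace=False)]
                    for _ in range(n_draws)]

    best_mask, best_hit_rate = None, -1.0
    for clear_train, cloudy_train in itertools.product(clear_draws, cloudy_draws):
        mask = build_mask(segment_radiances, clear_train, cloudy_train)
        hit_rate = np.mean(mask == reference_mask)  # agreement with the "truth" mask
        if hit_rate > best_hit_rate:
            best_mask, best_hit_rate = mask, hit_rate
    return best_mask, best_hit_rate
```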
Our initial response to (2). It is true that Otsu’s method is unsupervised. This method is used only with respect to the similarity index difference (SID) histograms, for instance figure 6. All Otsu’s method does is determine the breakpoint between two dominant peaks in the SID histograms (clear vs. cloud). To make the histograms of SID, we need clear-sky and cloudy-sky training matrices that use radiances drawn from the orbital segments of 2000 TIRS footprints. These matrices are contained in Eqns. (1) and (2) for the calculation of clear and cloudy sky SI. We will improve the clarity of our discussion of each step in the algorithm.
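To make these two ingredients concrete, here is a self-contained sketch: a PCA-reconstruction-based similarity index computed against the clear-sky and cloudy-sky training matrices (a stand-in for Eqns. 1 and 2, which may differ in detail), and Otsu's method applied to the resulting SID values to locate the breakpoint between the two histogram peaks without any a priori threshold.

```python
import numpy as np

def similarity_index(spectra, training_matrix, n_pcs):
    """How well each spectrum is reconstructed by the leading PCs of a training set.

    Values near 1 indicate spectra that resemble the training class.
    (Illustrative stand-in for Eqns. 1-2 of the manuscript.)
    """
    mean = training_matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(training_matrix - mean, full_matrices=False)
    basis = vt[:n_pcs]                               # leading principal components
    anom = spectra - mean
    recon = anom @ basis.T @ basis                   # projection onto the PC subspace
    resid = np.linalg.norm(anom - recon, axis=1)
    return 1.0 - resid / (np.linalg.norm(anom, axis=1) + 1e-12)

def otsu_threshold(values, n_bins=128):
    """Otsu's method: pick the histogram breakpoint that maximizes the
    between-class variance of the two resulting populations."""
    counts, edges = np.histogram(values, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(counts)                           # class 0 population (below split)
    w1 = counts.sum() - w0                           # class 1 population (above split)
    cum_weighted = np.cumsum(counts * centers)
    m0 = cum_weighted / np.maximum(w0, 1)
    m1 = (cum_weighted[-1] - cum_weighted) / np.maximum(w1, 1)
    between_var = w0 * w1 * (m0 - m1) ** 2
    return centers[np.argmax(between_var[:-1])]      # exclude the degenerate last bin

# Cloud mask for one orbital segment (hypothetical variable names):
# sid = similarity_index(seg, clear_train, k) - similarity_index(seg, cloudy_train, k)
# cloudy = sid < otsu_threshold(sid)
```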
We also agree with reviewer #3 that we could include information on the spread of the performance of PC-MSK for the 25 random draws. We will add some performance metrics for this. Furthermore, we plan on using VIIRS and CrIS/ATMS cloud retrievals from the JPSS satellites that are matched in space to the PREFIRE orbits to do a more careful independent validation of PC-MSK and ML-MSK. We also plan on using these coincident observations to test out a more deliberate training data set (see Section 5). Since both the PC-MSK and the ML-MSK are based on simulated data sets, these activities will not proceed until after launch.
Our initial response to (3). It is a great idea to include a schematic of some type that shows how the PC-MSK algorithm will work operationally. We furthermore agree that the operational implementation of this algorithm is not clear. As a baseline operational algorithm, our first step was to use one random draw and produce PC-MSK along the orbit. We did not make that clear, but our intention was to make this a starting point for post-launch changes. As TIRS data become available, our second step is to understand more deeply the performance of PC-MSK based on the training data it is fed. By running 25 random draws, we can narrow down the particular scene types, cloud types, and other characteristics that improve or degrade the performance. This will help inform a more deliberate sampling strategy than a random draw. Our third step recognizes that there is no guarantee that ML-MSK will work in all conditions or for all scene types, and that it may have its own challenges. We will do detailed comparisons between PC-MSK and ML-MSK at the footprint level and determine what is working, and not working, for both masks.
Citation: https://doi.org/10.5194/egusphere-2023-2463-AC1
RC2: 'Comment on egusphere-2023-2463', Anonymous Referee #2, 01 Feb 2024
The authors present a method intended to be trained on synthetic data to classify cloudy versus clear-sky scenes using TIR channels, which is a very challenging task. This could be a major breakthrough in terms of methodology, with many benefits for pre-launch and post-launch algorithm evaluation. However, I found that the lack of a good explanation of the method renders the conclusions questionable.
1. I have some trouble understanding the use of the synthetic dataset in this algorithm. To my knowledge, Otsu's method is an unsupervised method that requires no training, and it has been applied here to separate clear and cloudy scenes. So why would we need a training dataset? I also have trouble understanding the "best-of-25" rule for selecting the best PC-MSK result against the COD-MSK "truth", which is not available in operational use. In all, I think a big picture is missing from the description of the algorithm, which makes it very hard to understand what is actually being done and, more importantly, how other people can make use of the methodology.
2. It is hard to understand the scope of this algorithm from this paper. The use of training and validation datasets from the same 2000 footprints near Greenland gives me the impression that this algorithm is trained and applied regionally in the same area, while Section 4.2 is about the global application of this algorithm, which indicates it is supposed to be applied globally. I believe a flowchart or schematic would greatly help to explain the scope of the algorithm, how it is trained and applied globally and locally, what the training and validation datasets are, how overfitting is avoided, etc.
Minor point:
It would be great to go into more detail on the physics of the cloud/clear-sky simulations and their relation to instrument channel selection. For example, in Fig. 4 it is almost impossible to find a difference between a clear-sky simulation and a cloudy-sky simulation of the same scene. I believe this is not hard to do from a model perspective, since you only need to run the model twice, with and without cloud, and a better presentation showing the relative difference between a cloudy and a clear-sky case could greatly help people understand where the sensitivity comes from. On the physics side, how are the 23 channels chosen from the 54 in a TIRS simulation? I believe a plot similar to Fig. 3 of M19, with shaded areas marking the locations of these 23 channels, would be beneficial.
Citation: https://doi.org/10.5194/egusphere-2023-2463-RC2
AC2: 'Reply on RC2', Brian Kahn, 22 Apr 2024
Thanks to the three reviewers for the insightful and helpful comments. Below we generally describe how we will respond with a revised version of the manuscript.
The three reviewers had similar comments about three different issues touched on in the paper. Thus, please find below our initial response on these three points. In the response to reviews for the revised version, we will include a more detailed set of responses for each reviewer separately.
(1) Using the best of 25 random draws to quantify performance with the simulated data set isn't well described, explained, and justified.
(2) Other aspects of the methodology need further description.
(3) The manuscript would greatly benefit from some type of schematic.
Our initial response to (1). We wholeheartedly agree that the approach needs more clarity and justification. After consideration of the reviewers’ comments, we believe that the wide range in the performance of PC-MSK among the 25 random draws in the high latitudes is a result of underfitting rather than overfitting. The PC-MSK training matrices of either clear-sky or cloudy-sky, or perhaps both, may not represent the true local scene variance that occurs within orbital segments of 2000 TIRS footprints. Thus, the clear and cloudy sky training matrices are not sufficiently realistic for some orbital segments and cannot capture all patterns in the spectra. It is natural to expect that some of the 25 random draws will perform better than the other random draws. The spread in the performance gets larger with increasing latitude away from the tropics as the MIR/FIR spectrum increases in complexity.
The information content of the simulated PREFIRE radiances is relatively low and is not remotely comparable to that of other PCA-based retrieval algorithms that leverage hyperspectral radiance spectra (e.g., AERI, IASI, and AIRS). The number of information-bearing PCs in hyperspectral radiance data sets ranges between 100 and 300, while for the TIRS radiances in the simulated PREFIRE experiment it ranges between 3 and 7 according to our calculations in Section 3.2. (Note: these estimates could be adjusted based on the availability of observed TIRS radiances after launch.) We examined PC-MSK to see whether noise was captured in the higher-order PCs, and it is not. Therefore, we do not believe we are encountering an overfitting problem.
This underfitting may not matter as much in the Tropics, where the PC-MSK performance is very high across the 25 random draws. We believe this is because clear-sky radiance variability is primarily from water vapor variance in very moist atmospheres rather than from temperature variance. In the middle and high latitudes, temperature is highly variable in the horizontal and vertical directions, with frequent inversions over land surfaces. Furthermore, the clear-sky radiance Jacobians in the FIR channels are larger when water vapor amounts are low, exhibiting a much wider range of radiance variability in the FIR. Additionally, cloud height, cloud overlap, microphysical, and thermodynamic phase variances and covariances are more complex, with frequently occurring clouds located below the bases of inversions that are difficult to detect. In the Northern Hemisphere extratropics, there is a large diversity of surface types and topography that adds degrees of difficulty to cloud detection. Cloud state sampling biases are also important in the high latitudes; for instance, water vapor can be higher and temperature can be lower in cloud fields compared to clear skies. Thus, the separation of clear-sky and cloudy-sky radiance contributions may be problematic for some clouds, particularly broken or optically thin clouds for which both contributions are significant.
Thus, we may ask: why does the ML-MSK perform so well if it uses the same simulated data sets as PC-MSK? Bertossa et al. (2023) use a neural network (NN) algorithm with hidden nodes that allow the information to be extracted in a nonlinear way, and the algorithm learns the nonlinear behavior through a loss function minimization. PC-MSK is linear by construction and does not have the ability of a NN to learn complex behaviors. To account for the problem of underfitting, which is most acute in the high latitudes, the brute-force stochastic approach of 25 random draws (5 clear-sky times 5 cloudy-sky) is an attempt to sample more of the local variability and determine whether the performance can be improved. We find that, by using the ML-MSK as a "truth" data set, we can improve the statistics considerably in some scenes, particularly in the Arctic region that is the primary mission target of PREFIRE. We agree with reviewer #1 that the best performing of the 25 PC-MSK random draws can be thought of as a "best case scenario".
Our initial response to (2). It is true that Otsu’s method is unsupervised. This method is used only with respect to the similarity index difference (SID) histograms, for instance figure 6. All Otsu’s method does is determine the breakpoint between two dominant peaks in the SID histograms (clear vs. cloud). To make the histograms of SID, we need clear-sky and cloudy-sky training matrices that use radiances drawn from the orbital segments of 2000 TIRS footprints. These matrices are contained in Eqns. (1) and (2) for the calculation of clear and cloudy sky SI. We will improve the clarity of our discussion of each step in the algorithm.
We also agree with reviewer #3 that we could include information on the spread of the performance of PC-MSK for the 25 random draws. We will add some performance metrics for this. Furthermore, we plan on using VIIRS and CrIS/ATMS cloud retrievals from the JPSS satellites that are matched in space to the PREFIRE orbits to do a more careful independent validation of PC-MSK and ML-MSK. We also plan on using these coincident observations to test out a more deliberate training data set (see Section 5). Since both the PC-MSK and the ML-MSK are based on simulated data sets, these activities will not proceed until after launch.
Our initial response to (3). It is a great idea to include a schematic of some type that shows how the PC-MSK algorithm will work operationally. We furthermore agree that the operational implementation of this algorithm is not clear. As a baseline operational algorithm, our first step was to use one random draw and produce PC-MSK along the orbit. We did not make that clear, but our intention was to make this a starting point for post-launch changes. As TIRS data become available, our second step is to understand more deeply the performance of PC-MSK based on the training data it is fed. By running 25 random draws, we can narrow down the particular scene types, cloud types, and other characteristics that improve or degrade the performance. This will help inform a more deliberate sampling strategy than a random draw. Our third step recognizes that there is no guarantee that ML-MSK will work in all conditions or for all scene types, and that it may have its own challenges. We will do detailed comparisons between PC-MSK and ML-MSK at the footprint level and determine what is working, and not working, for both masks.
Citation: https://doi.org/10.5194/egusphere-2023-2463-AC2
RC3: 'Comment on egusphere-2023-2463', Anonymous Referee #3, 07 Feb 2024
Review of Kahn et al., The Polar Radiant Energy in the Far Infrared Experiment (PREFIRE) principal component-based cloud mask: A simulation experiment
The paper provides a good overview of the PCA-based cloud masking approach, which is one of the methods planned for both the PREFIRE and FORUM missions. The biggest concern with the methodology presented in the paper is that it may not be representative of the planned operational implementation. What is presented in the paper are results using the "best of the 25 combinations", which the authors note is an "obvious problem" (line 271). While it is reasonable to present a paper for which "The primary purpose of this pre-launch algorithm is to describe a flexible tool capable of using any type of training set" (line 273), the "brute force approach" used here means that the numerical results are unlikely to be indicative of the operational performance.
In order for the paper to be useful for the interested reader, the authors need to motivate why the results presented might be representative of, or informative regarding, the operational implementation, for example by providing information on the range of results obtained with the 25 combinations. In addition, the discussions of the planned operational implementation should be in a single place: i.e., the multiple training data set options for the operational implementation (lines 269-275, in a section titled "Pre-launch algorithm") and Section 5 should be integrated and expanded, and, as the other reviewers have noted, a "big-picture schematic of the overall algorithm flow and more detail" should be provided. In particular, the authors claim that "All training methods will be compared and their quantitative performance will be assessed by surface and scene type during post-launch check-out before the data release, with the focus on the mission requirement for clear-sky detection" (lines 349-351). This statement comes immediately after they have stated that training spectra will come from simulated and observed TIRS spectra and have listed methods (i)-(iic), or a hybrid combination of them, for constructing such sets, and immediately before a set of three potential testing approaches is listed. A much clearer statement of how, where, and why performance is tested, as part of the "big picture", would make the paper much more readable.
Finally, there are operational datasets available in the 8.5-14 µm spectral region, which is where the major clear- and cloudy-sky variability occurs (e.g., Fig. 4). Such operational datasets could have been used to evaluate how much additional "real world" variability there may be compared to that observed in the synthetic data sets.
Minor comments:
Figure 4 – x-axis is labelled as Wavenumber (microns) which should be Wavelength (as in Fig. 5)
T_i is used to label the number of spectra. This should be changed, since this variable then appears in equation (4), where a summation is indexed over i, but T_i does not depend on the index.
A sensor zenith angle of zero degrees is used, but a swath of 8 cross-track footprints is shown. Does TIRS have a cross-track swath, and if so, what errors are introduced by assuming a zero-degree sensor zenith angle?
Citation: https://doi.org/10.5194/egusphere-2023-2463-RC3
AC3: 'Reply on RC3', Brian Kahn, 22 Apr 2024
Thanks to the three reviewers for the insightful and helpful comments. Below we generally describe how we will respond with a revised version of the manuscript.
The three reviewers had similar comments about three different issues touched on in the paper. Thus, please find below our initial response on these three points. In the response to reviews for the revised version, we will include a more detailed set of responses for each reviewer separately.
(1) Using the best of 25 random draws to quantify performance with the simulated data set isn't well described, explained, and justified.
(2) Other aspects of the methodology need further description.
(3) The manuscript would greatly benefit from some type of schematic.
Our initial response to (1). We wholeheartedly agree that the approach needs more clarity and justification. After consideration of the reviewers’ comments, we believe that the wide range in the performance of PC-MSK among the 25 random draws in the high latitudes is a result of underfitting rather than overfitting. The PC-MSK training matrices of either clear-sky or cloudy-sky, or perhaps both, may not represent the true local scene variance that occurs within orbital segments of 2000 TIRS footprints. Thus, the clear and cloudy sky training matrices are not sufficiently realistic for some orbital segments and cannot capture all patterns in the spectra. It is natural to expect that some of the 25 random draws will perform better than the other random draws. The spread in the performance gets larger with increasing latitude away from the tropics as the MIR/FIR spectrum increases in complexity.
The information content of the simulated PREFIRE radiances is relatively low and is not remotely comparable to that of other PCA-based retrieval algorithms that leverage hyperspectral radiance spectra (e.g., AERI, IASI, and AIRS). The number of information-bearing PCs in hyperspectral radiance data sets ranges between 100 and 300, while for the TIRS radiances in the simulated PREFIRE experiment it ranges between 3 and 7 according to our calculations in Section 3.2. (Note: these estimates could be adjusted based on the availability of observed TIRS radiances after launch.) We examined PC-MSK to see whether noise was captured in the higher-order PCs, and it is not. Therefore, we do not believe we are encountering an overfitting problem.
This underfitting may not matter as much in the Tropics, where the PC-MSK performance is very high across the 25 random draws. We believe this is because clear-sky radiance variability is primarily from water vapor variance in very moist atmospheres rather than from temperature variance. In the middle and high latitudes, temperature is highly variable in the horizontal and vertical directions, with frequent inversions over land surfaces. Furthermore, the clear-sky radiance Jacobians in the FIR channels are larger when water vapor amounts are low, exhibiting a much wider range of radiance variability in the FIR. Additionally, cloud height, cloud overlap, microphysical, and thermodynamic phase variances and covariances are more complex, with frequently occurring clouds located below the bases of inversions that are difficult to detect. In the Northern Hemisphere extratropics, there is a large diversity of surface types and topography that adds degrees of difficulty to cloud detection. Cloud state sampling biases are also important in the high latitudes; for instance, water vapor can be higher and temperature can be lower in cloud fields compared to clear skies. Thus, the separation of clear-sky and cloudy-sky radiance contributions may be problematic for some clouds, particularly broken or optically thin clouds for which both contributions are significant.
Thus, we may ask: why does the ML-MSK perform so well if it uses the same simulated data sets as PC-MSK? Bertossa et al. (2023) use a neural network (NN) algorithm with hidden nodes that allow the information to be extracted in a nonlinear way, and the algorithm learns the nonlinear behavior through a loss function minimization. PC-MSK is linear by construction and does not have the ability of a NN to learn complex behaviors. To account for the problem of underfitting, which is most acute in the high latitudes, the brute-force stochastic approach of 25 random draws (5 clear-sky times 5 cloudy-sky) is an attempt to sample more of the local variability and determine whether the performance can be improved. We find that, by using the ML-MSK as a "truth" data set, we can improve the statistics considerably in some scenes, particularly in the Arctic region that is the primary mission target of PREFIRE. We agree with reviewer #1 that the best performing of the 25 PC-MSK random draws can be thought of as a "best case scenario".
Our initial response to (2). It is true that Otsu’s method is unsupervised. This method is used only with respect to the similarity index difference (SID) histograms, for instance figure 6. All Otsu’s method does is determine the breakpoint between two dominant peaks in the SID histograms (clear vs. cloud). To make the histograms of SID, we need clear-sky and cloudy-sky training matrices that use radiances drawn from the orbital segments of 2000 TIRS footprints. These matrices are contained in Eqns. (1) and (2) for the calculation of clear and cloudy sky SI. We will improve the clarity of our discussion of each step in the algorithm.
We also agree with reviewer #3 that we could include information on the spread of the performance of PC-MSK for the 25 random draws. We will add some performance metrics for this. Furthermore, we plan on using VIIRS and CrIS/ATMS cloud retrievals from the JPSS satellites that are matched in space to the PREFIRE orbits to do a more careful independent validation of PC-MSK and ML-MSK. We also plan on using these coincident observations to test out a more deliberate training data set (see Section 5). Since both the PC-MSK and the ML-MSK are based on simulated data sets, these activities will not proceed until after launch.
Our initial response to (3). It is a great idea to include a schematic of some type that shows how the PC-MSK algorithm will work operationally. We furthermore agree that the operational implementation of this algorithm is not clear. As a baseline operational algorithm, our first step was to use one random draw and produce PC-MSK along the orbit. We did not make that clear, but our intention was to make this a starting point for post-launch changes. As TIRS data become available, our second step is to understand more deeply the performance of PC-MSK based on the training data it is fed. By running 25 random draws, we can narrow down the particular scene types, cloud types, and other characteristics that improve or degrade the performance. This will help inform a more deliberate sampling strategy than a random draw. Our third step recognizes that there is no guarantee that ML-MSK will work in all conditions or for all scene types, and that it may have its own challenges. We will do detailed comparisons between PC-MSK and ML-MSK at the footprint level and determine what is working, and not working, for both masks.
Citation: https://doi.org/10.5194/egusphere-2023-2463-AC3
-
AC3: 'Reply on RC3', Brian Kahn, 22 Apr 2024
Status: closed (peer review stopped)
-
RC1: 'Comment on egusphere-2023-2463', Anonymous Referee #1, 31 Jan 2024
The authors present a cloud detection algorithm newly adapted to PREFIRE spectral radiometer measurements. The authors provide detailed discussion of their method and comparison to a preexisting PREFIRE cloud detection algorithm, making for algorithm documentation of a soon-to-be data product for the upcoming satellite mission. However, I found the testing methodology used in the study to be lacking.
The training dataset is a single day of data, where each local scene may only be visited once. It seems to me that the training matrices would then be overfit to the meteorological conditions at the time of training and generalize poorly to other days with other conditions. However, since the authors train and test on the same single day of synthetic observations, they have no way of knowing how PC-MSK might generalize to an out-of-training sample.
This problem of potential overfitting is made worse by the authors' choice to utilize best-out-of-25 performing training matrices. If I understand correctly, the model performance on the testing data is explicitly used as a part the training process (lines 263-269), which seems to me to guarantee overfitting. If the best-performing random draw is chosen, then I am not sure how repeating evaluation on the same data represents an independent check on the algorithm.
While the authors state in the text that the 'brute-force stochastic sampling' algorithm step is problematic, they still treat the PC-MSK performance metrics obtained using this step as comparable to the ML-MSK performance metrics on the same data. It seems to me that PC-MSK performance would be "best-case scenario" performance, while ML-MSK performance would be objective since ML-MSK has not been trained on the GEOS FP-IT data. The authors should ensure separation between training and testing data to improve the evaluation of algorithm performance.
Beyond this, I think the paper could benefit from a big-picture schematic of the overall algorithm flow and more detail on how local training regions are divided. Some general questions with regards to the definition of local training regions: how many segments of 2000 footprints are there? How long does it take to return to the same orbital segment? Do the spatial extents of orbital segments overlap? Are they the same for both TIRS instruments? These question of training regions could benefit from e.g. a map plotting the boundaries of each training region.
Minor comments:
* Line 110: "PCRTM v2.1 is sufficiently accurate with a root-mean square error (RMSE) of 0.67K" -- what is this error defined with respect to? Is this quantity a brightness temperature? What spectral resolution is it defined for?
* Line 121: "and an additional broadband channel that extends from 0 to 53 μm." What is zero wavelength?
* Line 241: since it is a key innovation of the present study, the paper could benefit from a brief presentation of Otsu's method.
* Figure 1: What is alog(cod) and why is it defined even for clear skies if it's a cloud property?
* Figure 5 title: What is "75.2 to 83.2"?
* Figure 10: What are units of the numbers in the confusion matrix? Percent of samples?Typos:
* Line 116: "PCRTM output at 0.5 cm–1 sampling" -- PCRTM outputs at 0.5 cm-1 sampling
* Line 322: Why is a period in this line colored red?Citation: https://doi.org/10.5194/egusphere-2023-2463-RC1 -
AC1: 'Reply on RC1', Brian Kahn, 22 Apr 2024
Thanks to the three reviewers for the insightful and helpful comments. Below we generally describe how we will respond with a revised version of the manuscript.
The three reviewers had similar comments about three different issues touched on in the paper. Thus, please find below our initial response on these three points. In the response to reviews for the revised version, we will include a more detailed set of responses for each reviewer separately.
- Using the best of 25 random draws to quantify performance with the simulated data set isn’t well described, explained, and justified.
- Other aspects of the methodology need further description.
- The manuscript would greatly benefit from some type of schematic.
Our initial response to (1). We wholeheartedly agree that the approach needs more clarity and justification. After consideration of the reviewers’ comments, we believe that the wide range in the performance of PC-MSK among the 25 random draws in the high latitudes is a result of underfitting rather than overfitting. The PC-MSK training matrices of either clear-sky or cloudy-sky, or perhaps both, may not represent the true local scene variance that occurs within orbital segments of 2000 TIRS footprints. Thus, the clear and cloudy sky training matrices are not sufficiently realistic for some orbital segments and cannot capture all patterns in the spectra. It is natural to expect that some of the 25 random draws will perform better than the other random draws. The spread in the performance gets larger with increasing latitude away from the tropics as the MIR/FIR spectrum increases in complexity.
The information contained in the simulated PREFIRE radiances is relatively low, nor is it remotely comparable to other PCA-based retrieval algorithms that leverage hyperspectral radiance spectra (e.g., AERI, IASI, and AIRS). The number of information-bearing PCs in hyperspectral radiance data sets range between 100-300, while the TIRS radiances in the simulated PREFIRE experiment range between 3-7 according to our calculations in Section 3.2. (Note: these estimates could be adjusted based on the availability of observed TIRS radiances after launch.) We examined PC-MSK to see if noise was captured in the higher order PCs, and they are not. Therefore, we do not believe we are encountering an overfitting problem.
This may not matter as much in the Tropics – where the PC-MSK performance is very high across the 25 random draws. We believe this is because clear-sky radiance variability is primarily from water vapor variance in very moist atmospheres rather than from temperature variance. In the middle and high latitudes, temperature is highly variable in the horizontal and vertical directions, with frequent inversions over land surfaces. Furthermore, the clear-sky radiance Jacobians in the FIR channels are larger when water vapor amounts are low, exhibiting a much wider range of radiance variability in the FIR. Additionally, cloud height, cloud overlap, microphysical, and thermodynamic phase variances and covariances are more complex, with frequently occurring clouds located below bases of inversions that are difficult to detect. In the Northern Hemisphere extratropics, there is a large diversity of surface types and topography that add degrees of difficulty to cloud detection. Cloud state sampling biases are also important in high latitudes – for instance water vapor can be higher and temperature can be lower in cloud fields compared to clear skies – thus separation of clear and cloudy sky radiance contributions may be problematic in some clouds, particularly broken or optically thin clouds that have clear and cloudy radiance contributions that are both significant.
Thus, we may ask– why does the ML-MSK perform so well if it is using the same simulated data sets as PC-MSK? Bertossa et al. (2023) use a neural network (NN) algorithm with hidden nodes that allow the information to be gleaned in a nonlinear way, and the algorithm learns the nonlinear behavior based on a loss function minimization. The PC-MSK by construction is linear and does not have an ability to learn complex behaviors like a NN. To account for the problem of underfitting that is most acute in the high latitudes, the brute force stochastic approach of 25 random draws (5 clear-sky times 5 cloudy-sky) is an attempt to sample more of the local variability and determine if the performance can be improved. We find that, by using the ML-MSK as a “truth” data set, we can improve the statistics considerably in some scenes, particularly in the Arctic region that is the primary mission target of PREFIRE. We agree with reviewer #1 that the best performing of the 25 PC-MSK random draws can be thought of as a “best case scenario”.
Our initial response to (2). It is true that Otsu’s method is unsupervised. This method is used only with respect to the similarity index difference (SID) histograms, for instance figure 6. All Otsu’s method does is determine the breakpoint between two dominant peaks in the SID histograms (clear vs. cloud). To make the histograms of SID, we need clear-sky and cloudy-sky training matrices that use radiances drawn from the orbital segments of 2000 TIRS footprints. These matrices are contained in Eqns. (1) and (2) for the calculation of clear and cloudy sky SI. We will improve the clarity of our discussion of each step in the algorithm.
We also agree with reviewer #3 that we could include information on the spread of the performance of PC-MSK for the 25 random draws. We will add some performance metrics for this. Furthermore, we plan on using VIIRS and CrIS/ATMS cloud retrievals from the JPSS satellites that are matched in space to the PREFIRE orbits to do a more careful independent validation of PC-MSK and ML-MSK. We also plan on using these coincident observations to test out a more deliberate training data set (see Section 5). Since both the PC-MSK and the ML-MSK are based on simulated data sets, these activities will not proceed until after launch.
Our initial response to (3). It is a great idea to include a schematic of some type that shows how the PC-MSK algorithm will work operationally. We furthermore agree that the operational implementation of this algorithm is not clear. As a baseline operational algorithm, our first step was to use one random draw and produce PC-MSK along the orbit. We did not make that clear. But our intention was to make this a starting point for post-launch changes. As TIRS data becomes available, our second step is to understand more deeply the performance of PC-MSK based on the training data it is fed. By running 25 random draws, we can narrow down particular scene types, cloud types, and other characteristics that will help improve or degrade the performance. This will help inform a more deliberate sampling strategy than a random draw. Our third step is that there is no guarantee that ML-MSK will work in all conditions, for all scene types, and may have its own challenges. We will do detailed comparisons between PC-MSK and ML-MSK at the footprint level and determine what is working – and not working – for both masks.
Citation: https://doi.org/10.5194/egusphere-2023-2463-AC1
-
AC1: 'Reply on RC1', Brian Kahn, 22 Apr 2024
-
RC2: 'Comment on egusphere-2023-2463', Anonymous Referee #2, 01 Feb 2024
The authors presented a method intended to be trained by use of synthetic data to classify clouds from clear sky scenes using TIR channels, which is a very challenging task. This could be a major breakthrough in terms of methodology with many benefits in pre-launch and post launch algorithm evaluation. However, I found the lacking of good explanation of the method renders the conclusion questionable.
1. I have some trouble to understand the use of synthetic dataset in this algorithm. To my knowledge Otsu's method is an unsupervised learning method which requires no training and it has been applied here as the separation method of clear and cloud scenes. So why would we need a training dataset then? Also I have trouble understanding the "best-of-25" rule for selecting the best PC-MSK result against the COD-MSK "truth", which is not available in operational use. In all, I think a big picture is missing from the description of the algorithm which makes it very hard to understand what is actually being done and more importantly, how can other people make use of the methodology.
2. It is hard to understand the scope of this algorithm from this paper. The use of training and validation dataset from the same 2000 footprints near Greenland gives me impression that this algorithm is trained and applied regionally in the same area while Section 4.2 is about the global application of this algorithm, which indicates it is supposed to be applied globally. I believe a flowchart or schematic chart would greatly help to explain the scope of the algorithm, how it was trained and applied globally and locally, what are the training and validation dataset, how to avoid overfitting etc.
Minor point:
It would be great to go into more details on the physics of cloud/clear sky simulations and its relation to instrument channel selection. For example, in Fig. 4 it is almost impossible to find a difference between a clear sky simulation and a cloud sky simulation of the same scene. I believe this is not hard to do from a model perspective since you only need to run the model twice with and without cloud and a better presentation showing relative difference between a cloudy and clear sky case can greatly help people understand where the sensitivity might come from. On the physics side, how are the 23 channels from 54 being chosen from a TIRS simulation? I believe a plot similar to Fig. 3 of M19 with shaded area marking the location of these 23 channels would be beneficial.
Citation: https://doi.org/10.5194/egusphere-2023-2463-RC2 -
AC2: 'Reply on RC2', Brian Kahn, 22 Apr 2024
Thanks to the three reviewers for the insightful and helpful comments. Below we generally describe how we will respond with a revised version of the manuscript.
The three reviewers had similar comments about three different issues touched on in the paper. Thus, please find below our initial response on these three points. In the response to reviews for the revised version, we will include a more detailed set of responses for each reviewer separately.
- Using the best of 25 random draws to quantify performance with the simulated data set isn’t well described, explained, and justified.
- Other aspects of the methodology need further description.
- The manuscript would greatly benefit from some type of schematic.
Our initial response to (1). We wholeheartedly agree that the approach needs more clarity and justification. After consideration of the reviewers’ comments, we believe that the wide range in the performance of PC-MSK among the 25 random draws in the high latitudes is a result of underfitting rather than overfitting. The PC-MSK training matrices of either clear-sky or cloudy-sky, or perhaps both, may not represent the true local scene variance that occurs within orbital segments of 2000 TIRS footprints. Thus, the clear and cloudy sky training matrices are not sufficiently realistic for some orbital segments and cannot capture all patterns in the spectra. It is natural to expect that some of the 25 random draws will perform better than the other random draws. The spread in the performance gets larger with increasing latitude away from the tropics as the MIR/FIR spectrum increases in complexity.
The information contained in the simulated PREFIRE radiances is relatively low, nor is it remotely comparable to other PCA-based retrieval algorithms that leverage hyperspectral radiance spectra (e.g., AERI, IASI, and AIRS). The number of information-bearing PCs in hyperspectral radiance data sets range between 100-300, while the TIRS radiances in the simulated PREFIRE experiment range between 3-7 according to our calculations in Section 3.2. (Note: these estimates could be adjusted based on the availability of observed TIRS radiances after launch.) We examined PC-MSK to see if noise was captured in the higher order PCs, and they are not. Therefore, we do not believe we are encountering an overfitting problem.
This may not matter as much in the Tropics – where the PC-MSK performance is very high across the 25 random draws. We believe this is because clear-sky radiance variability is primarily from water vapor variance in very moist atmospheres rather than from temperature variance. In the middle and high latitudes, temperature is highly variable in the horizontal and vertical directions, with frequent inversions over land surfaces. Furthermore, the clear-sky radiance Jacobians in the FIR channels are larger when water vapor amounts are low, exhibiting a much wider range of radiance variability in the FIR. Additionally, cloud height, cloud overlap, microphysical, and thermodynamic phase variances and covariances are more complex, with frequently occurring clouds located below bases of inversions that are difficult to detect. In the Northern Hemisphere extratropics, there is a large diversity of surface types and topography that add degrees of difficulty to cloud detection. Cloud state sampling biases are also important in high latitudes – for instance water vapor can be higher and temperature can be lower in cloud fields compared to clear skies – thus separation of clear and cloudy sky radiance contributions may be problematic in some clouds, particularly broken or optically thin clouds that have clear and cloudy radiance contributions that are both significant.
Thus, we may ask– why does the ML-MSK perform so well if it is using the same simulated data sets as PC-MSK? Bertossa et al. (2023) use a neural network (NN) algorithm with hidden nodes that allow the information to be gleaned in a nonlinear way, and the algorithm learns the nonlinear behavior based on a loss function minimization. The PC-MSK by construction is linear and does not have an ability to learn complex behaviors like a NN. To account for the problem of underfitting that is most acute in the high latitudes, the brute force stochastic approach of 25 random draws (5 clear-sky times 5 cloudy-sky) is an attempt to sample more of the local variability and determine if the performance can be improved. We find that, by using the ML-MSK as a “truth” data set, we can improve the statistics considerably in some scenes, particularly in the Arctic region that is the primary mission target of PREFIRE. We agree with reviewer #1 that the best performing of the 25 PC-MSK random draws can be thought of as a “best case scenario”.
Our initial response to (2). It is true that Otsu’s method is unsupervised. This method is used only with respect to the similarity index difference (SID) histograms, for instance figure 6. All Otsu’s method does is determine the breakpoint between two dominant peaks in the SID histograms (clear vs. cloud). To make the histograms of SID, we need clear-sky and cloudy-sky training matrices that use radiances drawn from the orbital segments of 2000 TIRS footprints. These matrices are contained in Eqns. (1) and (2) for the calculation of clear and cloudy sky SI. We will improve the clarity of our discussion of each step in the algorithm.
We also agree with reviewer #3 that we could include information on the spread of the performance of PC-MSK for the 25 random draws. We will add some performance metrics for this. Furthermore, we plan on using VIIRS and CrIS/ATMS cloud retrievals from the JPSS satellites that are matched in space to the PREFIRE orbits to do a more careful independent validation of PC-MSK and ML-MSK. We also plan on using these coincident observations to test out a more deliberate training data set (see Section 5). Since both the PC-MSK and the ML-MSK are based on simulated data sets, these activities will not proceed until after launch.
Our initial response to (3). It is a great idea to include a schematic of some type that shows how the PC-MSK algorithm will work operationally. We furthermore agree that the operational implementation of this algorithm is not clear. As a baseline operational algorithm, our first step was to use one random draw and produce PC-MSK along the orbit. We did not make that clear. But our intention was to make this a starting point for post-launch changes. As TIRS data becomes available, our second step is to understand more deeply the performance of PC-MSK based on the training data it is fed. By running 25 random draws, we can narrow down particular scene types, cloud types, and other characteristics that will help improve or degrade the performance. This will help inform a more deliberate sampling strategy than a random draw. Our third step is that there is no guarantee that ML-MSK will work in all conditions, for all scene types, and may have its own challenges. We will do detailed comparisons between PC-MSK and ML-MSK at the footprint level and determine what is working – and not working – for both masks.
Citation: https://doi.org/10.5194/egusphere-2023-2463-AC2
-
AC2: 'Reply on RC2', Brian Kahn, 22 Apr 2024
-
RC3: 'Comment on egusphere-2023-2463', Anonymous Referee #3, 07 Feb 2024
Review of Kahn et al., The Polar Radiant Energy in the Far Infrared Experiment (PREFIRE) principal component-based cloud mask: A simulation experiment
The paper provides a good overview of the PCA based cloud masking approach which is one of the methods planned for both the PREFIRE and FORUM missions. The biggest concern with the methodology presented in the paper is that it may not be representative of the planned operational implementation. What is presented in the paper are results using the “best of the 25 combinations” which the authors note is an “obvious problem” (line 271). While it is reasonable to present a paper for which “The primary purpose of this pre-launch algorithm is to describe a flexible tool capable of using any type of training set. (line 273).” the “brute force approach” used here means that the numerical results are unlikely to be indicative of the operational performance.
In order for the paper to be useful for the interested reader, the authors need to motivate why the results presented might be representative of, or informative regarding, the operational implementation, for example by providing information on the range of results obtained with the 25 combinations. In addition the discussions of planned operational implementation should be in a single place: i.e. the multiple training data set options for the operational implementation (lines 269-275 in a section titled “Pre-launch algorithm”) and section 5 should be integrated and expanded and as the other reviewers have noted a “big-picture schematic of the overall algorithm flow and more detail” should be provided. In particular the authors claim that “All training methods will be compared and their quantitative performance will be assessed by surface and scene type during post-launch check-out before the data release, with the focus on the mission requirement for clear-sky detection. (lines 349-351).” This statement comes immediately after they have stated that training spectra will come from simulated and observed TIRS spectra and listed methods (i)-(iic) or a hybrid combination of them for constructing such sets and immediately before a set of three potential testing approaches are listed. A much clearer statement of how performance is tested, where and why as part of the “big-picture” would make the paper much more readable.
Finally, there are operational datasets available in the 8.5-14 µm spectral region that is where the major clear and cloudy sky variability is (e.g. Fig. 4). Such operational datasets could have been used to evaluate what level of additional “real world” variability there may be as compared to that which is observed in the synthetic data sets.
Minor comments:
Figure 4 – x-axis is labelled as Wavenumber (microns) which should be Wavelength (as in Fig. 5)
T_i is used to label the number of spectra. This should be changed since this variable is then part of an equation (4) where a summation is indexed over I, but T_i does not depend on the index.
A sensor zenith angle of zero degrees is used, but a swath of 8 cross-track footprints is shown. Does TIRS have a cross-track swath, and if so, what errors are introduced by assuming a zero-degree sensor zenith angle?
Citation: https://doi.org/10.5194/egusphere-2023-2463-RC3
-
AC3: 'Reply on RC3', Brian Kahn, 22 Apr 2024
Thanks to the three reviewers for the insightful and helpful comments. Below we describe in general terms how we will respond in a revised version of the manuscript.
The three reviewers had similar comments about three issues touched on in the paper, so please find below our initial responses on these three points. In the response to reviews for the revised version, we will include a more detailed set of responses for each reviewer separately.
- (1) Using the best of 25 random draws to quantify performance with the simulated data set is not well described, explained, or justified.
- (2) Other aspects of the methodology need further description.
- (3) The manuscript would greatly benefit from some type of schematic.
Our initial response to (1). We wholeheartedly agree that the approach needs more clarity and justification. After considering the reviewers' comments, we believe that the wide range in the performance of PC-MSK among the 25 random draws in the high latitudes is a result of underfitting rather than overfitting. The PC-MSK training matrices for clear sky or cloudy sky, or perhaps both, may not represent the true local scene variance that occurs within orbital segments of 2000 TIRS footprints. Thus, the clear-sky and cloudy-sky training matrices are not sufficiently realistic for some orbital segments and cannot capture all patterns in the spectra, and it is natural to expect that some of the 25 random draws will perform better than others. The spread in performance grows with increasing latitude away from the tropics as the MIR/FIR spectrum increases in complexity.
The information content of the simulated PREFIRE radiances is relatively low and is not remotely comparable to that of other PCA-based retrieval algorithms that leverage hyperspectral radiance spectra (e.g., AERI, IASI, and AIRS). The number of information-bearing PCs in hyperspectral radiance data sets ranges between 100 and 300, while for the TIRS radiances in the simulated PREFIRE experiment it ranges between 3 and 7 according to our calculations in Section 3.2. (Note: these estimates could be adjusted based on the availability of observed TIRS radiances after launch.) We examined PC-MSK to see if noise was captured in the higher-order PCs, and it is not. Therefore, we do not believe we are encountering an overfitting problem.
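For illustration, a minimal sketch of one way to estimate the number of information-bearing PCs is given below: it counts the leading PCs needed before the residual variance of the noise-normalized spectra approaches the instrument noise floor. The noise normalization, the residual-variance stopping rule, and the `tol` parameter are assumptions for this example and are not necessarily the criterion used in Section 3.2.

```python
import numpy as np

def n_information_bearing_pcs(spectra, noise_std, tol=1.05):
    """Count leading PCs needed before the residual variance of the
    noise-normalized spectra falls to roughly the noise floor.

    spectra   : (n_footprints, n_channels) simulated radiances
    noise_std : per-channel noise standard deviation (e.g., NEDR)
    tol       : stop once the residual is within tol x the noise floor
    """
    # Center and noise-normalize so pure noise has unit variance per channel
    x = (spectra - spectra.mean(axis=0)) / noise_std

    # Singular values give the variance carried by each PC
    _, s, _ = np.linalg.svd(x, full_matrices=False)
    var = s**2 / x.shape[0]

    # Residual variance after keeping the first k PCs, compared with the
    # expected noise contribution of n_channels (unit variance per channel)
    resid = var.sum() - np.cumsum(var)
    noise_floor = x.shape[1]
    keep = np.flatnonzero(resid <= tol * noise_floor)
    return int(keep[0]) + 1 if keep.size else len(var)
```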
Low information content may not matter as much in the tropics, where the PC-MSK performance is very high across the 25 random draws. We believe this is because clear-sky radiance variability there is driven primarily by water vapor variance in very moist atmospheres rather than by temperature variance. In the middle and high latitudes, temperature is highly variable in both the horizontal and vertical directions, with frequent inversions over land surfaces. Furthermore, the clear-sky radiance Jacobians in the FIR channels are larger when water vapor amounts are low, so the FIR exhibits a much wider range of radiance variability. Additionally, the variances and covariances of cloud height, cloud overlap, microphysics, and thermodynamic phase are more complex, with frequently occurring clouds located below the bases of inversions that are difficult to detect. In the Northern Hemisphere extratropics, a large diversity of surface types and topography adds further difficulty to cloud detection. Cloud-state sampling biases are also important in the high latitudes; for instance, water vapor can be higher and temperature can be lower in cloud fields than in clear skies. As a result, separating clear-sky and cloudy-sky radiance contributions may be problematic for some clouds, particularly broken or optically thin clouds for which both contributions are significant.
Thus, we may ask: why does ML-MSK perform so well if it uses the same simulated data sets as PC-MSK? Bertossa et al. (2023) use a neural network (NN) algorithm whose hidden nodes allow information to be gleaned in a nonlinear way, and the algorithm learns this nonlinear behavior through loss-function minimization. PC-MSK, by construction, is linear and does not have the ability to learn complex behaviors like a NN. To account for the underfitting problem that is most acute in the high latitudes, the brute-force stochastic approach of 25 random draws (5 clear-sky times 5 cloudy-sky) is an attempt to sample more of the local variability and determine whether performance can be improved. We find that, by using ML-MSK as a "truth" data set, we can improve the statistics considerably in some scenes, particularly in the Arctic region that is the primary mission target of PREFIRE. We agree with reviewer #1 that the best performing of the 25 PC-MSK random draws can be thought of as a "best case scenario".
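A minimal sketch of this brute-force step, under stated assumptions, is shown below. The helper `run_pc_msk`, the draw size `n_train`, and the use of a simple footprint hit rate against the ML-MSK labels are illustrative stand-ins, not the exact implementation in the paper.

```python
import itertools
import numpy as np

def best_of_25(segment_radiances, clear_pool, cloudy_pool, ml_msk_labels,
               run_pc_msk, n_train=500, seed=0):
    """Brute-force sampling sketch: 5 clear-sky x 5 cloudy-sky random draws
    of training spectra; keep the combination whose mask agrees best with
    the ML-MSK labels for this orbital segment."""
    rng = np.random.default_rng(seed)
    # Each draw selects n_train spectra (rows) from the candidate pools
    clear_draws = [rng.choice(clear_pool, size=n_train, replace=False, axis=0)
                   for _ in range(5)]
    cloudy_draws = [rng.choice(cloudy_pool, size=n_train, replace=False, axis=0)
                    for _ in range(5)]

    best_score, best_mask = -1.0, None
    for clr, cld in itertools.product(clear_draws, cloudy_draws):
        # run_pc_msk is a stand-in for Eqns. (1)-(2), the SID histogram,
        # and Otsu's breakpoint; it returns a 0/1 clear/cloudy mask
        mask = run_pc_msk(segment_radiances, clr, cld)
        score = np.mean(mask == ml_msk_labels)  # hit rate vs. ML-MSK "truth"
        if score > best_score:
            best_score, best_mask = score, mask
    return best_mask, best_score
```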
Our initial response to (2). It is true that Otsu's method is unsupervised. The method is used only with respect to the similarity index difference (SID) histograms, for instance Figure 6. All Otsu's method does is determine the breakpoint between the two dominant peaks in the SID histograms (clear vs. cloud). To build the SID histograms, we need clear-sky and cloudy-sky training matrices constructed from radiances drawn from the orbital segments of 2000 TIRS footprints. These matrices enter Eqns. (1) and (2) for the calculation of the clear-sky and cloudy-sky SI. We will improve the clarity of our discussion of each step in the algorithm.
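For reference, a compact sketch of Otsu's breakpoint selection on a 1-D SID histogram is given below; the bin count and the clear/cloud sign convention are assumptions for this illustration.

```python
import numpy as np

def otsu_breakpoint(sid_values, n_bins=128):
    """Otsu's method on a 1-D histogram of similarity index differences
    (SID): return the breakpoint that maximizes the between-class variance,
    with no a priori threshold estimate required."""
    counts, edges = np.histogram(sid_values, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = counts / counts.sum()

    best_t, best_var = centers[0], -np.inf
    for k in range(1, n_bins):
        w0, w1 = p[:k].sum(), p[k:].sum()       # class weights
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (p[:k] * centers[:k]).sum() / w0  # class means
        mu1 = (p[k:] * centers[k:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2    # between-class variance
        if between > best_var:
            best_var, best_t = between, centers[k]
    return best_t

# Footprints whose SID falls on one side of the breakpoint are labeled
# clear and the others cloudy (the sign convention depends on how SID
# is defined from the clear-sky and cloudy-sky SI).
```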
We also agree with reviewer #3 that we could include information on the spread of the performance of PC-MSK across the 25 random draws, and we will add performance metrics for this. Furthermore, we plan to use VIIRS and CrIS/ATMS cloud retrievals from the JPSS satellites, matched in space to the PREFIRE orbits, to perform a more careful independent validation of PC-MSK and ML-MSK. We also plan to use these coincident observations to test a more deliberate training data set (see Section 5). Since both PC-MSK and ML-MSK are currently based on simulated data sets, these activities will not proceed until after launch.
Our initial response to (3). It is a great idea to include a schematic that shows how the PC-MSK algorithm will work operationally, and we agree that the operational implementation of this algorithm is not clear. As a baseline operational algorithm, our first step was to use one random draw and produce PC-MSK along the orbit; we did not make that clear, but our intention was for this to be a starting point for post-launch changes. As TIRS data become available, our second step is to understand more deeply how the performance of PC-MSK depends on the training data it is fed. By running 25 random draws, we can narrow down the particular scene types, cloud types, and other characteristics that improve or degrade performance, which will inform a more deliberate sampling strategy than a random draw. Third, there is no guarantee that ML-MSK will work in all conditions and for all scene types, and it may have its own challenges. We will do detailed comparisons between PC-MSK and ML-MSK at the footprint level and determine what is working, and not working, for both masks.
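To make the baseline concrete, the sketch below applies PC-MSK segment by segment with a single random clear-sky draw and a single cloudy-sky draw per orbital segment of 2000 footprints; `run_pc_msk` and the draw size `n_train` are the same illustrative stand-ins as above, not the operational code.

```python
import numpy as np

def baseline_pc_msk(orbit_radiances, clear_pool, cloudy_pool, run_pc_msk,
                    segment_len=2000, n_train=500, seed=0):
    """Baseline flow sketch: one random clear-sky and one cloudy-sky training
    draw per orbital segment of ~2000 TIRS footprints, then PC-MSK applied
    segment by segment along the orbit."""
    rng = np.random.default_rng(seed)
    masks = []
    for start in range(0, len(orbit_radiances), segment_len):
        seg = orbit_radiances[start:start + segment_len]
        clr = rng.choice(clear_pool, size=n_train, replace=False, axis=0)
        cld = rng.choice(cloudy_pool, size=n_train, replace=False, axis=0)
        masks.append(run_pc_msk(seg, clr, cld))  # stand-in, as above
    return np.concatenate(masks)
```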
Citation: https://doi.org/10.5194/egusphere-2023-2463-AC3