the Creative Commons Attribution 4.0 License.
CloudViT: classifying cloud types in global satellite data and in kilometre-resolution simulations using vision transformers
Abstract. Clouds constitute, through their interactions with incoming solar radiation and outgoing terrestrial radiation, a fundamental element of the Earth’s climate system. Different cloud types show a wide variety in cloud microphysical or optical properties, phase, vertical extent or temperature among others, and thus disparate radiative effects. Both in observational and model datasets, classifying cloud types is also of large importance since different cloud types respond differently to current and future anthropogenic climate change. Cloud types have traditionally been defined using a simplified partition of the space determined by spatially aggregated values e.g. of the cloud top pressure and the cloud optical thickness. In this study, we present a method called CloudViT (Cloud Vision Transformer) building upon spatial extracts of cloud properties from the MODIS instrument to derive cloud types, leveraging spatial features and patterns with a vision transformer model. The classification model is based on global surface observations of cloud types. The method is then evaluated through the distributions of cloud type properties and the corresponding spatial patterns of cloud type occurrences for a global cloud type dataset produced over a year-long period. Subsequently, a first application of the cloud type classification method to climate model data is presented. This application additionally provides insights into how global storm-resolving models are representing clouds as these models are increasingly being used to perform simulations. The global cloud type dataset and the method code constituting CloudViT are available from Zenodo (Lenhardt et al., 2024b).
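The "simplified partition" mentioned in the abstract can be illustrated with a short sketch. The abstract does not name a specific scheme or give thresholds, so the nine-class layout, the bin edges (440 and 680 hPa for cloud top pressure, 3.6 and 23 for cloud optical thickness) and the function name `classify_cloud` below are assumptions, modelled on the commonly quoted ISCCP-style partition; they are not taken from the CloudViT paper.

```python
# Assumed ISCCP-style bin edges; illustrative only, not taken from the paper.
CTP_EDGES_HPA = (440.0, 680.0)  # high: < 440, middle: 440 to < 680, low: >= 680
COT_EDGES = (3.6, 23.0)         # thin: < 3.6, medium: 3.6 to < 23, thick: >= 23

# Lookup from (height level, optical thickness class) to a cloud type name.
CLOUD_TYPES = {
    ("high", "thin"): "cirrus",
    ("high", "medium"): "cirrostratus",
    ("high", "thick"): "deep convection",
    ("middle", "thin"): "altocumulus",
    ("middle", "medium"): "altostratus",
    ("middle", "thick"): "nimbostratus",
    ("low", "thin"): "cumulus",
    ("low", "medium"): "stratocumulus",
    ("low", "thick"): "stratus",
}


def classify_cloud(ctp_hpa, cot):
    """Map aggregated cloud top pressure (hPa) and optical thickness to a type."""
    if ctp_hpa < CTP_EDGES_HPA[0]:
        level = "high"
    elif ctp_hpa < CTP_EDGES_HPA[1]:
        level = "middle"
    else:
        level = "low"

    if cot < COT_EDGES[0]:
        thickness = "thin"
    elif cot < COT_EDGES[1]:
        thickness = "medium"
    else:
        thickness = "thick"

    return CLOUD_TYPES[(level, thickness)]


if __name__ == "__main__":
    # Hypothetical scene-averaged values, chosen only to exercise the lookup.
    for ctp, cot in [(300.0, 1.0), (500.0, 10.0), (800.0, 30.0)]:
        print(ctp, cot, classify_cloud(ctp, cot))
```

Such a lookup uses only two spatially aggregated numbers per scene, which is the limitation the abstract contrasts with a model that also uses spatial features and patterns.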
Status: final response (author comments only)
CC1: 'Comment on egusphere-2024-2724', Chen Zhou, 30 Oct 2024
This paper presents CloudViT, a novel cloud classification method based on Vision Transformers (ViTs) and cloud properties derived from MODIS satellite data. The authors aim to classify cloud types across global datasets using spatial patterns of cloud properties such as cloud top height (CTH), cloud optical thickness (COT), and cloud water path (CWP). The method is evaluated on co-located ground-based observations and satellite data, producing accurate classifications of different cloud types. The approach is further tested with applications to General Circulation Models (GCMs), notably ICON-Sapphire, showcasing CloudViT's ability to generalize cloud type retrievals at kilometer-scale resolution.
CloudViT leverages self-supervised learning for pretraining and contrastive learning to overcome the limited number of labeled cloud observations. The method is robust, showing competitive performance when compared to traditional methods and CNN-based approaches, and effectively captures global cloud distributions, including complex cloud types like cumuliform and stratiform clouds. I think the paper is suitable for acceptance with minor revisions.
Minor Comments:
L142: Change "retrieved" to the verb form "retrieve."
L177: Replace "requires" with "require" to agree with the plural subject.
L209: In the sentence "this type of model, alongside CNNs, are," replace "are" with the singular verb "is" to agree with the subject "this type of model."
L323: Change "cardinal" to "cardinality" to correctly refer to the size or number of elements in a set.
L587-L593: I believe it would be beneficial to discuss the limitations, such as the following:
Since MODIS data is collected through near-nadir scanning, observations in high-latitude regions become oblique, leading to distortions and errors in cloud property retrievals, such as cloud top height and optical thickness. This could potentially affect the model’s performance in polar regions.
Citation: https://doi.org/10.5194/egusphere-2024-2724-CC1
RC1: 'Comment on egusphere-2024-2724', Anonymous Referee #1, 17 Nov 2024
My previous comment appears as "CC", so I re-posted my comment as "RC" here.
Citation: https://doi.org/10.5194/egusphere-2024-2724-RC1
RC2: 'Comment on egusphere-2024-2724', Anonymous Referee #2, 20 Dec 2024
The paper shows results of using a ViT model that is pretrained on MODIS data to classify cloud scenes into 4 or 10 cloud types as defined by the WMO. The authors go into great detail at times on model training choices etc. The paper is, however, light on physics. The classification performance of the models shown in the paper is honestly quite poor. An accuracy of 0.46 and an F1 score of 0.43 for the best model on test data cannot be treated as state-of-the-art. Note that the statistics for the training data are not that much better either. There is something not quite right about this paper, be it the choice of training data, the training procedure, or something else, because ViT models are quite capable, as the authors wrote in the intro, yet the resulting performance is so poor. We urge the authors to investigate this glaring mismatch and improve the model's performance. Otherwise, results from application of such a model are highly unreliable, which defeats the purpose. I therefore suggest a major revision.
Technically, I do not see anything wrong with the general approach in terms of engineering. The authors described how they approached the problem, and given enough data and a sound approach, the models used in this paper should perform well.
The authors also did not do a thorough job of reviewing the literature on cloud type classification using machine learning/deep learning. Their introduction to the subject seems a bit vague. I suggest the authors pay more attention to the actual physics instead of details of engineering and implementation because this is not an applied machine learning journal.
Minor comments:
Lines 34-35: references are needed. This sentence is also a bit disconnected from the previous ones.
Lines 39-41: references are needed.
Lines 46-47: please rewrite this sentence because it is confusing to read.
Line 51: the classification is not done pixel-wise as far as I'm aware.
Lines 62-68: incomplete review of cloud type classification.
Line 82: 'robust retrievals': this is usually not considered a retrieval, since in the cloud remote sensing community 'retrieval' has a specific meaning.
Line 103: I'm dubious about the point that such classification provides 'a high level of precision'.
Line 112: again, I'm not sure why 'retrievals' is used here. It reads oddly.
Figure 1: should at least contain panels that show the actual number of training data.
Lines 177-178: not clear what this sentence means.
Lines 178-179: present the actual number of training data samples to give readers a clear idea.
Line 190: 128 x 128 pixels are not the same as 128 km x 128 km.
Line 197: sloppy language use. Suggest to change.
Line 199: is 4&10 much more detailed than 9 types?
Lines 215-217: what? This sentence is quite confusing. Please rewrite.
Line 243: please use terms consistently. Do not use different terms to refer to the same thing.
Anything after Table 2 and Figure 5 is not worth discussing too much because the performance is just not acceptable. The authors need to dig deeper into their data, approach, or something else to find ways to improve the results before application.
Citation: https://doi.org/10.5194/egusphere-2024-2724-RC2
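For readers weighing the accuracy (0.46) and F1 score (0.43) quoted in RC2, the sketch below shows how both metrics follow from a confusion matrix. It is a minimal illustration: the page does not state which F1 averaging the paper uses, so macro averaging is an assumption here, and the function name `accuracy_and_macro_f1` and the example matrix are hypothetical, not data from the paper.

```python
def accuracy_and_macro_f1(confusion):
    """Return (accuracy, macro-averaged F1) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))

    f1_scores = []
    for k in range(n_classes):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n_classes)) - tp  # predicted k, true other
        fn = sum(confusion[k]) - tp                               # true k, predicted other
        denom = 2 * tp + fp + fn
        # Per-class F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return correct / total, sum(f1_scores) / n_classes


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix, for illustration only.
    example = [[5, 1],
               [2, 2]]
    acc, f1 = accuracy_and_macro_f1(example)
    print(f"accuracy={acc:.3f}, macro F1={f1:.3f}")
```

As a point of reference, uniform random guessing over balanced classes gives an expected accuracy of 1/4 for 4 classes and 1/10 for 10 classes; the class balance of the actual test set is not given on this page.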
Data sets
Datasets for the article "CloudViT: classifying cloud types in global satellite data and in kilometre-resolution simulations using vision transformers." J. Lenhardt et al. https://doi.org/10.5281/zenodo.12731288
Model code and software
Model code for the article "CloudViT: classifying cloud types in global satellite data and in kilometre-resolution simulations using vision transformers." J. Lenhardt et al. https://doi.org/10.5281/zenodo.12731288
Interactive computing environment
Notebooks for the article "CloudViT: classifying cloud types in global satellite data and in kilometre-resolution simulations using vision transformers." J. Lenhardt et al. https://doi.org/10.5281/zenodo.12731288
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
282 | 68 | 161 | 511 | 9 | 7 |
Viewed (geographical distribution)
Country | # | Views | % |
---|---|---|---|
United States of America | 1 | 145 | 28 |
Germany | 2 | 46 | 9 |
France | 3 | 34 | 6 |
Sweden | 4 | 28 | 5 |
Romania | 5 | 25 | 5 |