CloudViT: classifying cloud types in global satellite data and in kilometre-resolution simulations using vision transformers

Lenhardt, Julien; Quaas, Johannes; Sejdinovic, Dino; Klocke, Daniel

doi:https://doi.org/10.5194/egusphere-2024-2724

Preprints

02 Oct 2024

Abstract. Clouds constitute, through their interactions with incoming solar radiation and outgoing terrestrial radiation, a fundamental element of the Earth’s climate system. Different cloud types vary widely in microphysical and optical properties, phase, vertical extent, and temperature, among others, and thus have disparate radiative effects. In both observational and model datasets, classifying cloud types is also of great importance, since different cloud types respond differently to current and future anthropogenic climate change. Cloud types have traditionally been defined using a simplified partition of the space spanned by spatially aggregated values, e.g. of the cloud top pressure and the cloud optical thickness. In this study, we present a method called CloudViT (Cloud Vision Transformer), which builds upon spatial extracts of cloud properties from the MODIS instrument to derive cloud types, leveraging spatial features and patterns with a vision transformer model. The classification model is based on global surface observations of cloud types. The method is then evaluated through the distributions of cloud type properties and the corresponding spatial patterns of cloud type occurrences for a global cloud type dataset produced over a year-long period. Subsequently, a first application of the cloud type classification method to climate model data is presented. This application additionally provides insights into how global storm-resolving models represent clouds, as these models are increasingly being used to perform simulations. The global cloud type dataset and the method code constituting CloudViT are available from Zenodo (Lenhardt et al., 2024b).
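
The traditional partition mentioned in the abstract can be illustrated with a minimal sketch. It assumes the widely used ISCCP-style 3 x 3 grid in cloud top pressure and cloud optical thickness; the bin edges (440 and 680 hPa; 3.6 and 23) and the function name `classify_ctp_cot` are assumptions made for illustration and are not taken from the preprint. This is the threshold-based baseline, not the CloudViT method itself.

```python
import numpy as np

# ISCCP-style bin edges (assumed for illustration; not taken from the preprint).
CTP_EDGES_HPA = (440.0, 680.0)  # separates high / middle / low cloud tops
COT_EDGES = (3.6, 23.0)         # separates thin / medium / thick clouds

# Rows: height level (high, middle, low).
# Columns: optical thickness (thin, medium, thick).
CLOUD_TYPES = np.array([
    ["cirrus", "cirrostratus", "deep convection"],
    ["altocumulus", "altostratus", "nimbostratus"],
    ["cumulus", "stratocumulus", "stratus"],
])


def classify_ctp_cot(ctp_hpa, cot):
    """Map aggregated cloud top pressure (hPa) and optical thickness to a type name."""
    ctp_hpa = np.asarray(ctp_hpa, dtype=float)
    cot = np.asarray(cot, dtype=float)
    # np.digitize returns 0, 1 or 2 depending on which bin each value falls into.
    level = np.digitize(ctp_hpa, CTP_EDGES_HPA)
    thickness = np.digitize(cot, COT_EDGES)
    return CLOUD_TYPES[level, thickness]


# Three hypothetical scenes: a thin high cloud, a medium low cloud, a thick mid-level cloud.
print(classify_ctp_cot([300.0, 800.0, 550.0], [1.0, 10.0, 50.0]))
# -> cirrus, stratocumulus, nimbostratus
```

Such a lookup uses only two aggregated numbers per scene, which is the limitation that motivates exploiting spatial features and patterns instead.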

Received: 30 Aug 2024 – Discussion started: 02 Oct 2024

Competing interests: Some authors are members of the editorial board of journal ACP.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

CC1:
'Comment on egusphere-2024-2724', Chen Zhou, 30 Oct 2024

This paper presents CloudViT, a novel cloud classification method based on Vision Transformers (ViTs) and cloud properties derived from MODIS satellite data. The authors aim to classify cloud types across global datasets using spatial patterns of cloud properties such as cloud top height (CTH), cloud optical thickness (COT), and cloud water path (CWP). The method is evaluated on co-located ground-based observations and satellite data, producing accurate classifications of different cloud types. The approach is further tested with applications to General Circulation Models (GCMs), notably ICON-Sapphire, showcasing CloudViT's ability to generalize cloud type retrievals at kilometer-scale resolution.
CloudViT leverages self-supervised learning for pretraining and contrastive learning to overcome the limited number of labeled cloud observations. The method is robust, showing competitive performance when compared to traditional methods and CNN-based approaches, and effectively captures global cloud distributions, including complex cloud types like cumuliform and stratiform clouds. I think the paper is suitable for acceptance with minor revisions.

Minor Comments:
L142: Change "retrieved" to the verb form "retrieve."
L177: Replace "requires" with "require" to agree with the plural subject.
L209: In the sentence "this type of model, alongside CNNs, are," replace "are" with the singular verb "is" to agree with the subject "this type of model."
L323: Change "cardinal" to "cardinality" to correctly refer to the size or number of elements in a set.
L587-L593: I believe it would be beneficial to discuss the limitations, such as the following:
Since MODIS data is collected through near-nadir scanning, observations in high-latitude regions become oblique, leading to distortions and errors in cloud property retrievals, such as cloud top height and optical thickness. This could potentially affect the model’s performance in polar regions.

Citation: https://doi.org/10.5194/egusphere-2024-2724-CC1
- AC1: 'Reply on RC1', Julien Lenhardt, 25 Feb 2025
  
  Please find our response to the referees in the supplement.
  
  Citation: https://doi.org/10.5194/egusphere-2024-2724-AC1
RC1:
'Comment on egusphere-2024-2724', Anonymous Referee #1, 17 Nov 2024

My previous comment appears as "CC", so I re-posted my comment as "RC" here.
Overview:
This paper presents CloudViT, a novel cloud classification method based on Vision Transformers (ViTs) and cloud properties derived from MODIS satellite data. The authors aim to classify cloud types across global datasets using spatial patterns of cloud properties such as cloud top height (CTH), cloud optical thickness (COT), and cloud water path (CWP). The method is evaluated on co-located ground-based observations and satellite data, producing accurate classifications of different cloud types. The approach is further tested with applications to General Circulation Models (GCMs), notably ICON-Sapphire, showcasing CloudViT's ability to generalize cloud type retrievals at kilometer-scale resolution.
CloudViT leverages self-supervised learning for pretraining and contrastive learning to overcome the limited number of labeled cloud observations. The method is robust, showing competitive performance when compared to traditional methods and CNN-based approaches, and effectively captures global cloud distributions, including complex cloud types like cumuliform and stratiform clouds. I think the paper is suitable for acceptance with minor revisions.

Minor Comments:
L142: Change "retrieved" to the verb form "retrieve."
L177: Replace "requires" with "require" to agree with the plural subject.
L209: In the sentence "this type of model, alongside CNNs, are," replace "are" with the singular verb "is" to agree with the subject "this type of model."
L323: Change "cardinal" to "cardinality" to correctly refer to the size or number of elements in a set.
L587-L593: I believe it would be beneficial to discuss the limitations, such as the following:
Since MODIS data is collected through near-nadir scanning, observations in high-latitude regions become oblique, leading to distortions and errors in cloud property retrievals, such as cloud top height and optical thickness. This could potentially affect the model’s performance in polar regions.

Citation: https://doi.org/10.5194/egusphere-2024-2724-RC1
- AC1: 'Reply on RC1', Julien Lenhardt, 25 Feb 2025
  
  Please find our response to the referees in the supplement.
  
  Citation: https://doi.org/10.5194/egusphere-2024-2724-AC1
RC2:
'Comment on egusphere-2024-2724', Anonymous Referee #2, 20 Dec 2024

The paper shows results of using a ViT model that is pretrained on MODIS data to classify cloud scenes into 4/10 cloud types as defined by WMO. The authors go into great detail at times on model training choices etc. The paper is, however, light on physics. The classification performance of the models shown in the paper is honestly quite poor. Accuracy of 0.46 and F1 score of 0.43 for the best model on test data cannot be treated as state-of-the-art. Note that the statistics for the training data are not that much better either. There is something not quite right about this paper, be it the choice of training data, the training procedure, or something else, because ViT models are quite capable, as the authors wrote in the intro, yet the resulting performance is so poor. We urge the authors to investigate this glaring mismatch and improve the model's performance. Otherwise, results from application of such a model are highly unreliable, which defeats the purpose. I therefore suggest a major revision.
Technically, I do not see anything wrong with the general approach in terms of engineering. The authors described how they approached the problem, and given enough data and if the approach is sound, the models used in this paper should give us highly performing models.
The authors also did not do a thorough job of reviewing the literature on cloud type classification using machine learning/deep learning. Their introduction to the subject seems a bit vague. I suggest the authors pay more attention to the actual physics instead of details of engineering and implementation because this is not an applied machine learning journal.
Minor comments:
Lines 34-35: references are needed. This sentence is also a bit disconnected from previous ones.
Lines 39-41: references are needed.
Lines 46-47: please rewrite this sentence because it is confusing to read.
Line 51: the classification is not done pixel-wise as far as I'm aware.
Lines 62-68: incomplete review of cloud type classification.
Line 82: 'robust retrievals': this is usually not considered a retrieval since in cloud remote sensing community retrievals have specific meaning.
Line 103: I'm dubious on the point that such classification provides 'a high level of precision'.
Line 112: again, I'm not sure why 'retrievals' are used here. It reads off.
Figure 1: should at least contain panels that show the actual number of training data.
Lines 177-178: not clear what this sentence means.
Lines 178-179: present the actual number of training data samples to give readers a clear idea.
Line 190: 128 x 128 pixels are not the same as 128 km x 128 km.
Line 197: sloppy language use. Suggest to change.
Line 199: is 4&10 much more detailed than 9 types?
Lines 215-217: what? This sentence is quite confusing. Please rewrite.
Line 243: please use terms consistently. Do not use different terms to refer to the same thing.
Anything after Table 2 and Figure 5 is not worth discussing too much because the performance is just not acceptable. The authors need to dig deeper into their data, approach, or something else to find ways to improve the results before application.
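
For reference, the two metrics quoted in this comment, accuracy and F1 score, can be computed from a confusion matrix as in the minimal sketch below. The macro-averaged F1 is an assumption, since the averaging convention used in the preprint is not stated on this page; the function name `accuracy_and_macro_f1` and the toy labels are hypothetical and are not the authors' evaluation code or data.

```python
import numpy as np


def accuracy_and_macro_f1(y_true, y_pred, n_classes):
    """Return (accuracy, macro-averaged F1) for integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)

    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class but belonging to another
    fn = cm.sum(axis=1) - tp  # belonging to the class but predicted as another

    # Per-class F1 = 2TP / (2TP + FP + FN); set to 0 when the class never occurs.
    denom = 2 * tp + fp + fn
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)

    accuracy = tp.sum() / cm.sum()
    return accuracy, f1.mean()


# Toy example with three classes (labels are made up).
acc, macro_f1 = accuracy_and_macro_f1([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3)
print(f"accuracy={acc:.3f}, macro F1={macro_f1:.3f}")
```

For balanced classes, uniform random guessing gives an expected accuracy of 1/4 with four classes and 1/10 with ten, which is one baseline against which reported scores can be read.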

Citation: https://doi.org/10.5194/egusphere-2024-2724-RC2
- AC1: 'Reply on RC1', Julien Lenhardt, 25 Feb 2025
  
  Please find our response to the referees in the supplement.
  
  Citation: https://doi.org/10.5194/egusphere-2024-2724-AC1

Data sets

Datasets for the article "CloudViT: classifying cloud types in global satellite data and in kilometre-resolution simulations using vision transformers." J. Lenhardt et al. https://doi.org/10.5281/zenodo.12731288

Model code and software

Model code for the article "CloudViT: classifying cloud types in global satellite data and in kilometre-resolution simulations using vision transformers." J. Lenhardt et al. https://doi.org/10.5281/zenodo.12731288

Interactive computing environment

Notebooks for the article "CloudViT: classifying cloud types in global satellite data and in kilometre-resolution simulations using vision transformers." J. Lenhardt et al. https://doi.org/10.5281/zenodo.12731288

Viewed

Total article views: 1,529 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
923	194	412	1,529	20	40

Views and downloads (calculated since 02 Oct 2024)

Month	HTML	PDF	XML	Total
Oct 2024	117	29	30	176
Nov 2024	86	14	49	149
Dec 2024	61	19	53	133
Jan 2025	29	7	40	76
Feb 2025	34	7	40	81
Mar 2025	24	12	42	78
Apr 2025	21	9	46	76
May 2025	20	8	90	118
Jun 2025	29	37	20	86
Jul 2025	18	13	0	31
Aug 2025	96	20	1	117
Sep 2025	369	15	1	385
Oct 2025	19	4	0	23

Viewed (geographical distribution)

Total article views: 1,512 (including HTML, PDF, and XML), thereof 1,512 with geography defined and 0 with unknown origin.

Latest update: 14 Oct 2025

Short summary

Clouds come in various shapes and sizes and constitute a fundamental element of the Earth’s climate system. Different cloud types show variable impacts on climate change. We present a new cloud type classification method called CloudViT, which uses machine learning and relies on spatial patterns of cloud properties obtained from satellite data. This can help in understanding the effects of different cloud types on climate change.

