A comparative analysis of deep learning models for classifying shallow mesoscale cloud patterns in satellite images

Granberg, Anna; Lundholm, Vilma; Khalaj, Pouria; Thomas, Manu Anna; Ding, Yifan; Jönsson, Daniel; Devasthale, Abhay

doi:10.5194/egusphere-2026-915

Preprints

https://doi.org/10.5194/egusphere-2026-915


09 Apr 2026


Abstract. Representing clouds in climate models is challenging, not least because of their heterogeneous spatial structures and dynamic behavior. This study explores the potential of advanced machine learning (ML) techniques to identify and categorize mesoscale low-level cloud structures in satellite imagery, with particular emphasis on the patterns frequently observed over the trade wind regions of the South Atlantic Ocean.

Rectified Level 1.5 satellite images from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) for the year 2021 are used for the analysis. To assess the potential gains in classification accuracy when labeled data are limited, several deep learning approaches are evaluated. The analysis considers a custom-built convolutional neural network, a pre-trained 50-layer residual neural network adapted through transfer learning using EuroSat, and a self-supervised vision transformer framework known as DINOv2 (self-distillation with no labels version 2). The embeddings, i.e. the feature representations yielded by DINOv2, are used in two separate approaches, one based on manually labeled data and the other using the k-means clustering algorithm.

The results show that combining the DINOv2 model with a multilayer perceptron and training on labeled data achieves the highest cloud pattern classification accuracy among the evaluated ML approaches.

Received: 17 Feb 2026 – Discussion started: 09 Apr 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.


Status: final response (author comments only)

RC1: 'Comment on egusphere-2026-915', Anonymous Referee #1, 04 Jun 2026

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-915/egusphere-2026-915-RC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2026-915-RC1
RC2: 'Comment on egusphere-2026-915', Anonymous Referee #2, 27 Jun 2026

This manuscript compares CNN, EuroSat transfer-learning ResNet50, supervised DINOv2+MLP, and DINOv2 feature-based clustering approaches for classifying shallow mesoscale cloud patterns in SEVIRI imagery. The research question is clear, the dataset and benchmark design are useful, and the results consistently suggest that DINOv2 features are more robust under domain shift. The manuscript is generally well structured and the figures and tables are comprehensive. However, several aspects require clarification, especially data splitting, annotation reliability, model comparability, and reproducibility. I recommend minor revision.
Major Comments

Clarify whether all models were evaluated on exactly the same test set.

Sections 4.2 and 6.1 state that all models use a unified 15% hold-out test set, but the class counts in Tables 5/6 differ from those in Table 7. For example, Closed Cell has a count of 71 for CNN/ResNet50 but 98 for DINOv2+MLP. This affects direct comparison of Top-1/Top-2 and per-class metrics. The authors should explain this discrepancy or re-report all supervised model results on an identical test set.
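
One way to make a shared test set verifiable is to fix the split once and publish a fingerprint of it. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `stratified_holdout` and `split_fingerprint`, the seed, the image IDs, and the class names are all hypothetical. It draws a stratified 15% hold-out deterministically and hashes the sorted test IDs, so each model's evaluation script can check that it scored the same images.

```python
import hashlib
import random
from collections import Counter, defaultdict


def stratified_holdout(samples, test_fraction=0.15, seed=0):
    """Split (sample_id, label) pairs into train and test, stratified by label."""
    by_class = defaultdict(list)
    for sample_id, label in samples:
        by_class[label].append(sample_id)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        ids = sorted(by_class[label])  # sort first so input order does not matter
        rng.shuffle(ids)
        n_test = max(1, round(len(ids) * test_fraction))
        test += [(i, label) for i in ids[:n_test]]
        train += [(i, label) for i in ids[n_test:]]
    return train, test


def split_fingerprint(test):
    """SHA-256 over the sorted test IDs; identical sets give identical digests."""
    joined = "\n".join(sorted(sample_id for sample_id, _ in test))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical toy data: 200 image IDs, 120 in "class_a" and 80 in "class_b".
samples = [
    (f"img_{k:04d}", "class_a" if k % 5 < 3 else "class_b") for k in range(200)
]
train, test = stratified_holdout(samples, seed=42)
class_counts = Counter(label for _, label in test)
fingerprint = split_fingerprint(test)
```

If every results table reported `class_counts` and `fingerprint` alongside its metrics, a mismatch such as differing per-class counts between tables would surface immediately.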

Provide more detail on the manual annotation process and label consistency.

The dataset was annotated by a meteorologist, and the Discussion notes that some annotations are incomplete or inconsistent. The authors should describe the annotation criteria, how ambiguous cases were handled, whether any quality control or second review was performed, and whether inter-annotator agreement was assessed. If only one annotator was used, this should be explicitly acknowledged as a limitation.

Describe the benchmark domain shift more concretely.

The manuscript states that benchmark images are larger and differ in annotation conditions, causing domain shift. Please provide more information on the benchmark sampling strategy, date/month distribution, bounding-box size distribution, class distribution, and how these differ from the original dataset. If possible, a simple stratified analysis by bounding-box size or month would strengthen the claim.

Add uncertainty estimates or significance testing for model comparisons.

On the benchmark set, DINOv2+MLP achieves a Top-1 accuracy of 0.61, compared with 0.56 for both CNN and ResNet50. The direction is clear, but the margin is modest. Bootstrap confidence intervals, McNemar tests, or at least macro/weighted averages would make the comparison more convincing.
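
The two tests named above can be sketched in a few lines when both models are scored on the same images. The example below is a minimal illustration using only the Python standard library. The per-image correctness flags are synthetic and are not the manuscript's predictions, and the function names, resample count, and seed are assumptions. It computes a paired percentile-bootstrap interval for the accuracy difference and a two-sided exact McNemar test on the discordant pairs.

```python
import math
import random


def paired_bootstrap_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(A) - accuracy(B) on the same images."""
    n = len(correct_a)
    assert n == len(correct_b), "both models must be scored on the same images"
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample image indices with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    low = diffs[int((alpha / 2) * n_resamples)]
    high = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact (binomial) McNemar test on the discordant pairs."""
    b = sum(1 for a_ok, b_ok in zip(correct_a, correct_b) if a_ok and not b_ok)
    c = sum(1 for a_ok, b_ok in zip(correct_a, correct_b) if b_ok and not a_ok)
    n = b + c
    if n == 0:
        return b, c, 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Synthetic flags (1 = correct) for 100 images: 50 both right, 11 only A,
# 6 only B, 33 both wrong. These numbers are invented for illustration.
correct_a = [1] * 50 + [1] * 11 + [0] * 6 + [0] * 33
correct_b = [1] * 50 + [0] * 11 + [1] * 6 + [0] * 33
ci_low, ci_high = paired_bootstrap_ci(correct_a, correct_b, seed=1)
b, c, p_value = mcnemar_exact(correct_a, correct_b)
```

Both procedures need paired predictions, which is one more reason the supervised models should share an identical test set.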

Improve reproducibility details for DINOv2 and clustering.

Please specify the exact DINOv2 version/model size, embedding dimension, input normalization, k-means initialization settings, number of runs, random seeds, and whether PCA or feature standardization was used. Also, Tables 11/12 report cluster-label prediction performance, which is not equivalent to semantic cloud-class accuracy; this distinction should be made clearer.
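
The distinction between cluster-label accuracy and semantic class accuracy can be shown with a small example. The sketch below uses invented cluster IDs and class names, with a hypothetical `majority_vote_mapping` helper. A classifier that reproduces the k-means cluster IDs perfectly still scores lower against the annotated classes once each cluster is mapped to its most common class. In practice the mapping would be derived on a labeled subset separate from the evaluation set; it is computed on the same toy data here only for brevity.

```python
from collections import Counter, defaultdict


def majority_vote_mapping(cluster_ids, class_labels):
    """Map each cluster to the most common annotated class among its members."""
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster][label] += 1
    return {
        cluster: counts.most_common(1)[0][0] for cluster, counts in members.items()
    }


def accuracy(predicted, reference):
    """Fraction of positions where the prediction equals the reference."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Invented toy data: 12 images, 3 k-means clusters, 3 annotated classes.
kmeans_clusters = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
predicted_clusters = list(kmeans_clusters)  # classifier reproduces the clusters
class_labels = ["a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c", "a"]

cluster_to_class = majority_vote_mapping(kmeans_clusters, class_labels)
predicted_classes = [cluster_to_class[c] for c in predicted_clusters]

cluster_accuracy = accuracy(predicted_clusters, kmeans_clusters)
semantic_accuracy = accuracy(predicted_classes, class_labels)
```

Here `cluster_accuracy` measures agreement with the clustering itself, while `semantic_accuracy` is bounded by how pure each cluster is with respect to the annotated classes.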

Strengthen the code and data availability statement.

The current statement says that the code will be made public after the AI4PEX project and can meanwhile be provided upon request. At minimum, the authors should provide model configurations, training/test split files, random seeds, annotation files, or a reproducibility package. If full release is not yet possible, the reason and expected timeline should be stated.

Minor Comments

The abstract would benefit from including the main benchmark result, such as the Top-1/Top-2 accuracy of DINOv2+MLP.

Tables 13/14 should also report macro-F1 or balanced accuracy, since per-class performance varies substantially.

Figure 10 is dense, and some Wasserstein/Silhouette labels are difficult to read. Please enlarge the font or move some panels to supplementary material.

Please standardize terminology, such as “data set” vs. “dataset,” “DINOv2+MLP” vs. “supervised DINOv2+MLP,” and “Custom CNN” vs. “CNN.”

Several minor language issues should be corrected, for example “Each metric capture” should be “Each metric captures,” and “class class” is repeated in the caption of Table 14.

The Discussion would benefit from a clearer distinction between transfer learning from general remote-sensing imagery and transfer learning from cloud/atmospheric imagery.

Please define how Top-2 accuracy is computed, especially in cases where class probabilities are close or tied.

The conclusion that cloud patterns “can be effectively classified” should be softened, since the best benchmark Top-1 accuracy is only 0.61. A more cautious phrasing would be “moderately classified under domain shift, with DINOv2 showing the strongest robustness.”
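
The metric requests in the minor comments (Top-2 accuracy with an explicit tie rule, macro-F1, and balanced accuracy) can be written out directly. The following is a minimal sketch rather than the authors' evaluation code. The tie rule (the lower class index wins) is an assumption chosen only to make the ranking deterministic, and the probabilities are invented.

```python
def rank_classes(probs):
    """Class indices by descending probability; ties go to the lower index."""
    return sorted(range(len(probs)), key=lambda j: (-probs[j], j))


def top_k_accuracy(prob_rows, true_indices, k=2):
    """Share of samples whose true class is among the k highest-ranked classes."""
    hits = sum(t in rank_classes(p)[:k] for p, t in zip(prob_rows, true_indices))
    return hits / len(true_indices)


def macro_f1_and_balanced_accuracy(y_true, y_pred, n_classes):
    """Unweighted means of per-class F1 and per-class recall."""
    f1_scores, recalls = [], []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    return sum(f1_scores) / n_classes, sum(recalls) / n_classes


# Invented class probabilities for six images and three classes.
prob_rows = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],  # tie between classes 0 and 1
    [0.2, 0.4, 0.4],  # tie between classes 1 and 2
    [0.6, 0.3, 0.1],
    [0.1, 0.7, 0.2],
    [0.3, 0.3, 0.4],
]
y_true = [0, 1, 2, 2, 1, 2]
y_pred = [rank_classes(p)[0] for p in prob_rows]

top1 = top_k_accuracy(prob_rows, y_true, k=1)
top2 = top_k_accuracy(prob_rows, y_true, k=2)
macro_f1, balanced_acc = macro_f1_and_balanced_accuracy(y_true, y_pred, n_classes=3)
```

Whatever tie rule is used, stating it removes ambiguity for samples like the second and third rows, where the Top-1 prediction depends entirely on the rule.
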
Citation: https://doi.org/10.5194/egusphere-2026-915-RC2


Viewed

Total article views: 400 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
233	146	21	400	16	23


Views and downloads (calculated since 09 Apr 2026)

Month	HTML	PDF	XML	Total
Apr 2026	163	86	13	262
May 2026	50	39	2	91
Jun 2026	11	6	2	19
Jul 2026	9	15	4	28

Cumulative views and downloads (calculated since 09 Apr 2026)

Month	HTML	PDF	XML	Total
Apr 2026	163	86	13	262
May 2026	213	125	15	353
Jun 2026	224	131	17	372
Jul 2026	233	146	21	400

Viewed (geographical distribution)

Total article views: 383 (including HTML, PDF, and XML), of which 383 have a defined geography and 0 are of unknown origin.

Latest update: 22 Jul 2026

Short summary

This study explores the potential of machine learning models to classify mesoscale low-level cloud patterns frequently observed in the trade wind regions of the Atlantic Ocean. These clouds significantly influence the Earth's climate. This study is the first of its kind to establish a framework for classifying these shallow clouds, which are currently parameterized in global climate models, and offers a framework that can be integrated into such models to reduce uncertainties in climate projections.

