https://doi.org/10.5194/egusphere-2024-3160
13 Nov 2024
Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

Exploring the effect of training set size and number of categories on ice crystal classification through a contrastive semi-supervised learning algorithm

Yunpei Chu, Huiying Zhang, Xia Li, and Jan Henneberger

Abstract. The shapes of ice crystals play an important role in global precipitation formation and the radiation budget. Classifying ice crystal shapes can improve our understanding of in-cloud conditions and these processes. However, existing classification methods either rely on features such as the aspect ratio of ice crystals and the environmental temperature, which make classification performance highly unstable, or employ supervised machine learning algorithms that depend heavily on human labeling. This poses significant challenges, including human subjectivity in classification and the substantial labor cost of manual labeling. In addition, previous deep learning algorithms for ice crystal classification are often trained and evaluated on datasets of varying sizes and with different classification schemes, each with distinct criteria and a different number of categories, which makes a fair comparison of algorithm performance difficult. To overcome these limitations, a contrastive semi-supervised learning (CSSL) algorithm for the classification of ice crystals is proposed. The algorithm consists of an upstream unsupervised learning network, tasked with extracting meaningful representations from a large number of unlabeled ice crystal images, and a downstream supervised network that is fine-tuned on a small labeled subset of the entire dataset to perform the classification task. To determine the minimal number of ice crystal images that require human labeling while balancing algorithm performance against manual labeling effort, the algorithm is trained and evaluated on datasets with varying sizes and numbers of categories. The ice crystal data used in this study were collected during the NASCENT campaign at Ny-Ålesund and the CLOUDLAB project on the Swiss plateau, using a holographic imager mounted on a tethered balloon system. In general, the CSSL algorithm performs better than a purely supervised algorithm in classifying 19 categories. Approximately 154 hours of manual labeling can be avoided by using just 11 % (2048 images) of the training set for fine-tuning, at a cost of only 3.8 % in overall precision compared with a fully supervised model trained on the entire dataset. In the 4-category classification task, the CSSL algorithm also outperforms the purely supervised algorithm. The algorithm fine-tuned on 2048 images (25 % of the entire 4-category dataset) achieves an overall accuracy of 89.6 %, which is comparable to that of the purely supervised algorithm trained on 8192 images (91.0 %). Moreover, when tested on the unseen CLOUDLAB dataset, the CSSL algorithm exhibits significantly stronger generalization capabilities than the supervised approach, with an average improvement of 2.19 % in accuracy. These results highlight the strength and practical effectiveness of CSSL in comparison to purely supervised methods, and the potential of the CSSL algorithm to perform well on datasets collected under different conditions.
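The abstract does not state which contrastive objective the upstream network optimizes. As a rough illustration of what "contrastive" pre-training on unlabeled images can look like, the sketch below assumes a SimCLR-style NT-Xent loss; this choice, the function name `nt_xent_loss`, the temperature of 0.5, and the random stand-in embeddings are all assumptions made for illustration, not details taken from the preprint. The loss pulls together the embeddings of two augmented views of the same image and pushes apart the embeddings of different images.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent contrastive loss (assumed objective, SimCLR-style).

    z_a[i] and z_b[i] are embeddings of two augmented views of image i.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)             # shape (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm, so dot product = cosine
    sim = (z @ z.T) / temperature                      # pairwise similarities, shape (2n, 2n)
    np.fill_diagonal(sim, -np.inf)                     # a view is never its own negative or positive

    # Row i (view a of image i) has its positive at row i + n (view b), and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())


# Hypothetical stand-in embeddings: 8 "images", 16-dimensional features.
rng = np.random.default_rng(0)
views_a = rng.normal(size=(8, 16))
views_b_aligned = views_a + 0.01 * rng.normal(size=(8, 16))  # views that agree
views_b_random = rng.normal(size=(8, 16))                    # views that do not

loss_aligned = nt_xent_loss(views_a, views_b_aligned)
loss_random = nt_xent_loss(views_a, views_b_random)
print(f"aligned views: {loss_aligned:.3f}, unrelated views: {loss_random:.3f}")
```

In a pipeline of the kind the abstract describes, an encoder trained with such an objective on unlabeled images would then be fine-tuned with a classification head on the small labeled subset; that second stage is not shown here.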

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: open (until 19 Dec 2024)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2024-3160', Louis Jaffeux, 03 Dec 2024

Viewed

Total article views: 142 (including HTML, PDF, and XML)
  • HTML: 110
  • PDF: 27
  • XML: 5
  • Total: 142
  • BibTeX: 2
  • EndNote: 1
Latest update: 13 Dec 2024
Short summary
Our study improves ice crystal shape classification, which is key for understanding weather and climate. By adding unsupervised pre-training before supervised classification, our algorithm reduces manual labeling effort while maintaining high accuracy. It outperforms fully supervised models across datasets of varying sizes and numbers of categories and shows strong generalization ability, which makes the method adaptable to datasets collected in different environments.