https://doi.org/10.5194/egusphere-2026-1512
11 May 2026
Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

CloudMViT: Cloud Classification Using Ground-Based Remote Sensing Imagery and a Lightweight Hybrid Architecture

Wei Xu, Ningning Wu, and Lin Feng

Abstract. Ground-based remote sensing cloud imagery can be used to analyze regional trends in cloud type, which in turn supports prediction of future water resource supply capacity. However, existing cloud classification methods based on such imagery often suffer from limited recognition accuracy because they extract fine-grained features poorly, and their large parameter counts hinder deployment on embedded terminals. To address these issues, this study proposes CloudMViT, a lightweight hybrid network architecture that fuses a dual-pooling channel attention module with cross-scale self-attention, enhancing both local and global feature representation of cloud images while optimizing computational efficiency. Specifically, the model suppresses sky background interference and strengthens cloud edge features via the dual-pooling channel attention module, which combines global average pooling (GAP) and global max pooling (GMP); captures detailed cross-channel features (e.g., cirrus fibril structures and stratocumulus shadows) using depthwise separable convolution and a decoupling mechanism; and further reduces model parameters with CloudGhost, a cascade compression technique that eliminates linearly redundant features. Experiments on the World Meteorological Organization (WMO)-compliant HBMCD (10 standard cloud genera) and GCD (7 sky conditions) datasets demonstrate that CloudMViT achieves classification accuracies of 98.81 % and 95.13 %, respectively, significantly outperforming lightweight models such as MobileViT and EfficientNet. Ablation experiments validate the effectiveness of the dual-pooling channel attention module (improving accuracy by 5.31 %) and the CloudGhost module (increasing inference speed by 50 %). When deployed on the RK3588 embedded platform, the INT8-quantized CloudMViT enables real-time inference, maintaining an accuracy of 94.79 % with a memory footprint of only 0.47 MB. The proposed cloud classification method and hardware acceleration strategy provide a feasible solution for the development of portable ground-based cloud observation and classification devices.
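
The dual-pooling idea named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the GAP and GMP descriptors are combined, so the version below assumes a common CBAM-style design: both descriptors pass through one shared two-layer bottleneck, the results are summed, and a sigmoid produces one gate per channel. The function names, the reduction ratio r, and the random weights are all hypothetical and are not taken from CloudMViT. The sketch uses plain Python lists so that it runs without third-party libraries.

```python
import math
import random


def global_avg_pool(fmap):
    """Mean of each channel; fmap is a list of C channels, each an H x W grid."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def global_max_pool(fmap):
    """Maximum of each channel."""
    return [max(max(row) for row in ch) for ch in fmap]


def dense(vec, weights):
    """Bias-free linear layer; weights has shape (out_features, in_features)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def relu(vec):
    return [max(0.0, v) for v in vec]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dual_pool_channel_attention(fmap, w1, w2):
    """Assumed CBAM-style gating: sigmoid(MLP(GAP(x)) + MLP(GMP(x))) per channel."""
    avg_desc = global_avg_pool(fmap)
    max_desc = global_max_pool(fmap)
    # The same bottleneck weights (w1, w2) are shared by both descriptors.
    avg_out = dense(relu(dense(avg_desc, w1)), w2)
    max_out = dense(relu(dense(max_desc, w1)), w2)
    gates = [sigmoid(a + m) for a, m in zip(avg_out, max_out)]
    # Rescale every pixel of a channel by that channel's gate.
    scaled = [[[g * px for px in row] for row in ch] for g, ch in zip(gates, fmap)]
    return scaled, gates


# Hypothetical toy input: 4 channels of 3 x 3 pixels, reduction ratio r = 2.
rng = random.Random(0)
C, H, W, r = 4, 3, 3, 2
fmap = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[rng.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]

scaled, gates = dual_pool_channel_attention(fmap, w1, w2)
print([round(g, 3) for g in gates])
```

Under this assumed design, the average-pooled descriptor summarizes each channel's overall response while the max-pooled descriptor retains its strongest local response. That is one plausible reading of how pairing the two could help weight cloud edges against a uniform sky background.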

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 16 Jun 2026)

Latest update: 11 May 2026
Short summary
Clouds shape Earth’s climate and water supply. Classifying them from ground-based images helps track regional weather. Existing models are either inaccurate or too large for portable devices. We present a lightweight model, CloudMViT, using dual-pooling channel attention and cross-scale self-attention for cloud classification. Experiments show higher accuracy and real-time, low-memory performance on embedded hardware. This work supports portable cloud observation for climate monitoring.