the Creative Commons Attribution 4.0 License.
Leveraging Machine Learning to Enhance Aerosol Classification using Single-Particle Mass Spectrometry
Abstract. Advancing automated classification of atmospheric aerosols from Single-Particle Mass Spectrometry (SPMS) data remains challenging due to overlapping chemical signatures and limited labeled data. Semi-supervised learning approaches offer potential solutions by leveraging unlabeled data to enhance classification accuracy. Four models were compared: a supervised Support Vector Machine (SVM), a self-training SVM, a stacked autoencoder classifier, and a stacked autoencoder trained with a temporal ensembling mean teacher framework. All models achieved robust performance with overall accuracies of 90.0–91.1 %, representing improvements over previous work on the same dataset (87 %) and competitive performance with current methods. Notably, the models effectively classified aerosols with limited representation in the dataset – soot (0.77 % of spectra, F1-scores: 0.93–0.97) and hazelnut pollen (0.98 % of spectra, F1-scores: 0.97–1.00) – highlighting their ability to capture distinct chemical signatures even with fewer than 200 training samples per class. While challenges persist in classifying certain species, particularly feldspars due to overlapping spectral features and class imbalances, this study demonstrates the significant potential of semi-supervised learning and advanced machine learning architectures in improving aerosol classification, with implications for atmospheric and climate research.
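The self-training idea named in the abstract (fit on labeled spectra, pseudo-label the unlabeled spectra the model is confident about, refit) can be sketched as follows. This is an illustration only, not the authors' implementation: a simple nearest-centroid classifier stands in for the SVM, and the confidence measure, the `threshold` value, the function names, and the two-dimensional toy "spectra" are all assumptions made for the example.

```python
import math
from collections import defaultdict


def centroids(X, y):
    """Fit step of the stand-in classifier: mean feature vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_with_confidence(cents, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {c: math.exp(-math.dist(x, m)) for c, m in cents.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Iteratively move confidently predicted unlabeled samples into the training set.

    `threshold` and `max_rounds` are hypothetical settings for this sketch.
    Returns the final centroids and the samples that were never pseudo-labeled.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            label, conf = predict_with_confidence(cents, x)
            if conf >= threshold:
                # Confident prediction: treat it as a (pseudo-)label.
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing new was learned, stop early
            break
    return centroids(X_lab, y_lab), pool


# Toy data: made-up 2-D features, with class names borrowed from the abstract.
X_lab = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
y_lab = ["soot", "soot", "hazelnut pollen", "hazelnut pollen"]
X_unlab = [[1.0, 0.0], [9.0, 10.0], [5.0, 5.5]]

model, leftover = self_train(X_lab, y_lab, X_unlab)
print("never pseudo-labeled:", leftover)
```

In this toy run the two unlabeled points close to a class centroid are absorbed into the training set, while the ambiguous point midway between the classes stays unlabeled. The threshold therefore controls how much label noise self-training admits, which matters for classes with overlapping signatures.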
Status: open (until 21 Oct 2025)
RC1: 'Comment on egusphere-2025-3616', Anonymous Referee #1, 29 Sep 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-3616/egusphere-2025-3616-RC1-supplement.pdf
Citation: https://doi.org/10.5194/egusphere-2025-3616-RC1
RC2: 'Comment on egusphere-2025-3616', Anonymous Referee #2, 06 Oct 2025
This paper investigates the performance of semi-supervised learning approaches for the automated classification of atmospheric aerosols from SPMS data. By leveraging unlabeled data, semi-supervised learning can enhance a model's generalization performance and mitigate the risk of overfitting. The study demonstrates the significant potential of semi-supervised learning and advanced machine learning architectures for improving aerosol classification. However, the new methods have not been tested on field data, which limits their potential implications. The manuscript can be recommended for publication after the following comments are addressed.
Specific comments:
Lines 118-120: The authors stated that a supervised learning approach cannot identify aerosol types absent from the training data. How did the semi-supervised learning method resolve this problem? It would be helpful to show the performance of both methods when aerosol types are absent from the training data.
Lines 142-150: It is unclear how the PALMS data were collected. Were these data obtained during chamber experiments? Details on the experimental procedures should be given. The model performance should also be tested on field data, which are more complex.
Lines 158-160: Were these unlabeled mass spectra collected for a mixture of different aerosol types? Did these data include inorganic aerosols?
It would be helpful if the source code were made publicly available.
Citation: https://doi.org/10.5194/egusphere-2025-3616-RC2
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
1,676 | 26 | 5 | 1,707 | 21 | 29 |