Leveraging Machine Learning to Enhance Aerosol Classification using Single-Particle Mass Spectrometry
Abstract. Automated classification of atmospheric aerosols from Single-Particle Mass Spectrometry (SPMS) data remains challenging because chemical signatures overlap and labeled data are limited. Semi-supervised learning offers a potential solution by using unlabeled data to improve classification accuracy. Four models were compared: a supervised Support Vector Machine (SVM), a self-training SVM, a stacked autoencoder classifier, and a stacked autoencoder trained with a temporal ensembling mean teacher framework. All models reached overall accuracies of 90.0–91.1 %, an improvement over previous work on the same dataset (87 %) and competitive with current methods. Notably, the models effectively classified aerosols with limited representation in the dataset, namely soot (0.77 % of spectra, F1-scores: 0.93–0.97) and hazelnut pollen (0.98 % of spectra, F1-scores: 0.97–1.00), which indicates that they capture distinct chemical signatures even with fewer than 200 training samples per class. Challenges persist in classifying certain species, particularly feldspars, owing to overlapping spectral features and class imbalance. Nevertheless, this study demonstrates the potential of semi-supervised learning and advanced machine learning architectures for improving aerosol classification, with implications for atmospheric and climate research.
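Because classes such as soot make up well under 1 % of the spectra, overall accuracy alone can hide poor performance on rare classes, which is why per-class F1-scores are reported alongside it. The following minimal sketch illustrates the distinction. It is not the evaluation code used in this study: the function names, the toy labels, and the "mineral" majority class are hypothetical and chosen only for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label that occurs.

    F1 = 2*TP / (2*TP + FP + FN); a class with no TP, FP, or FN scores 0.0.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Hypothetical toy data (not from the study): 98 majority-class spectra
# and 2 rare-class spectra.
y_true = ["mineral"] * 98 + ["soot"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["mineral"] * 100

acc = accuracy(y_true, y_pred)
f1 = per_class_f1(y_true, y_pred)
print(f"accuracy = {acc:.2f}")
for label, score in sorted(f1.items()):
    print(f"F1[{label}] = {score:.3f}")
```

In this toy case the accuracy is high even though the rare class is never detected, so its F1-score is zero. High F1-scores on the rare classes therefore carry information that the overall accuracy does not.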