the Creative Commons Attribution 4.0 License.
Unsupervised Classification of Absorbing Aerosols with the SP2 via a Variational Autoencoder (VAE)
Abstract. The Single Particle Soot Photometer (SP2) detects refractory aerosol particle mass on a single-particle basis via laser-induced incandescence (L-II). While the SP2 has traditionally been used to quantify black carbon aerosol mass in the atmosphere, the instrument is increasingly being used to detect and quantify other types of absorbing aerosols, such as mineral dust or anthropogenically sourced iron oxide aerosols. Quantifying the mass loadings and emission sources of absorbing aerosols in the atmosphere is important for understanding their role in the climate cycle. Supervised machine learning algorithms have shown potential to classify different types of aerosols from L-II signals, but these methods are sensitive to instrument configuration and require training datasets generated from laboratory samples, which do not generalize well to ambient atmospheric aerosols. Here we explore the effectiveness of an unsupervised deep learning method, a variational autoencoder (VAE), applied directly to L-II signals from the SP2 in order to classify different types of absorbing aerosols. The VAE compresses L-II signals into a bottleneck latent representation and reconstructs an output as similar as possible to the input signal, thereby reducing dimensionality. We apply this approach to a dataset composed of laboratory samples of materials that show detectable incandescence in the SP2, including fullerene soot (as a proxy for black carbon), coated fullerene soot, coal fly ash, mineral dust, volcanic ash, hematite, and magnetite. We explore optimal latent representations of L-II signals to maximize separability of different aerosol classes by varying the size of the latent representation, and find that a latent representation of size 3 allows us to capture the majority of the information in the L-II signals relevant for identifying different types of absorbing aerosols.
We demonstrate that unsupervised machine learning is a promising method for identifying distinct populations of aerosols detected by the SP2.
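The compress-and-reconstruct step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' architecture: the layer sizes, the `tanh` activations, the squared-error reconstruction term, and every function name here are assumptions, and synthetic Gaussian-shaped pulses stand in for real L-II signals. The weights are random and untrained, so the code only shows how a signal is mapped to a latent vector of a chosen size and how the two loss terms are formed.

```python
import numpy as np

def init_params(rng, n_samples, hidden, latent_dim):
    """Random (untrained) weights for a one-hidden-layer encoder and decoder."""
    def dense(n_in, n_out):
        return rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)
    return {
        "enc": dense(n_samples, hidden),
        "mu": dense(hidden, latent_dim),
        "logvar": dense(hidden, latent_dim),
        "dec1": dense(latent_dim, hidden),
        "dec2": dense(hidden, n_samples),
    }

def encode(params, x):
    """Map each signal to the mean and log-variance of its latent distribution."""
    h = np.tanh(x @ params["enc"][0] + params["enc"][1])
    mu = h @ params["mu"][0] + params["mu"][1]
    logvar = h @ params["logvar"][0] + params["logvar"][1]
    return mu, logvar

def reparameterize(rng, mu, logvar):
    """Sample z = mu + sigma * eps with eps drawn from a standard normal."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(params, z):
    """Reconstruct a signal from its latent vector."""
    h = np.tanh(z @ params["dec1"][0] + params["dec1"][1])
    return h @ params["dec2"][0] + params["dec2"][1]

def vae_loss(params, x, rng):
    """Mean over the batch of squared reconstruction error plus KL divergence."""
    mu, logvar = encode(params, x)
    z = reparameterize(rng, mu, logvar)
    x_hat = decode(params, z)
    recon = np.sum((x - x_hat) ** 2, axis=1)
    # KL divergence between N(mu, sigma^2) and the standard normal prior
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    return np.mean(recon + kl), z, kl

rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 100)
# Synthetic stand-ins for incandescence pulses: 8 Gaussian-shaped peaks
centers = rng.uniform(-0.3, 0.3, size=(8, 1))
widths = rng.uniform(0.1, 0.3, size=(8, 1))
signals = np.exp(-0.5 * ((t - centers) / widths) ** 2)

params = init_params(rng, n_samples=100, hidden=32, latent_dim=3)
loss, z, kl = vae_loss(params, signals, rng)
print(z.shape)  # one 3-component latent vector per signal
```

Changing `latent_dim` is the knob that corresponds to varying the size of the latent representation; in a real study the weights would be trained by minimizing `vae_loss` over the dataset before the latent vectors are inspected.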
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-3210', Anonymous Referee #1, 09 Sep 2025
- AC2: 'Reply on RC1', Kara Lamb, 14 Nov 2025
- RC2: 'Comment on egusphere-2025-3210', Anonymous Referee #2, 18 Sep 2025
This study presents a new method for classifying aerosol particles based on Laser-Induced Incandescence (L-II) signals using unsupervised machine learning. The authors applied a variational autoencoder (VAE) to analyze L-II signals and compress them into a lower-dimensional latent space. This approach is an improvement because it removes the need for manual feature engineering. The paper is well-structured and easy to follow. The introduction provides sufficient background on the SP2 and the limitations of previous classification methods. The methods section clearly describes the dataset, data preprocessing, and the VAE model. Despite these strengths, there are several issues that must be addressed before publication.
General comments:
- Physical Interpretation of Latent Space: While the VAE approach is a powerful tool for classifying aerosol types, the discussion on the physical interpretation of the latent space (e.g., z1−z4) feels underdeveloped. The connection between the latent representation and blackbody temperature (Figure 3) is fascinating and a key finding. What specific features of the L-II signals (e.g., peak sharpness, symmetry, or decay rate) are being captured by these latent variables? A more thorough discussion linking the distributions in Figures 5 and 6 to the microphysical properties of the different aerosol types (BC, FeOx, dust) would significantly strengthen the paper's scientific contribution.
- Outlier Detection and Ambient Data: The claim that outlier detection can be useful for characterizing aerosols from various sources (Lines 293-301) seems a strong assertion. While this is a promising application, the current study, which uses laboratory-generated data, does not provide sufficient evidence to support this claim for ambient atmospheric observations. To make this point more convincing, the authors would need to analyze real atmospheric data. I recommend toning down this claim to a more cautious statement about the potential for this method to be applied to ambient data in future work.
- Figure Readability: Overall, the font size for text within the figures, including labels and legends, is too small and difficult to read. The marker sizes in the legends are also unclear, making it hard to distinguish between different aerosol types. I would recommend that the authors increase the font size and marker size to improve the readability of all figures.
Specific comments:
- Lines 29-30: Please re-check the citation for Moteki and Kondo (2010). This paper does not focus on a field study of rBC. The citation should be removed or replaced with a more relevant reference.
- Line 78: The chemical formula of Iron (IV) should be corrected to Iron (II, III).
- Line 131: The text should be corrected from Ch. 0 to Ch. 1.
- Figure 2: Please correct “Fe203” and “Fe3O4” to “Fe2O3” and “Fe3O4”. Additionally, please correct “Schwartz et. al 2006.”
- Figure 2: In the center upper panels, “Class 1” and “Class 2” are not defined. Could you clarify what these classes represent?
- Line 216: Please correct the spelling of “FeOx”. I would recommend a thorough check of the entire manuscript.
- Figure 4: Why are “z1” and “z3” denoted in x labels and “z2” and “z4” denoted in y labels? It seems that they should be “time” and “signal amplitude.”
- Figure 4: The "Noise" signals are not explicitly defined. It would be helpful to provide a clear definition of a "Noise" signal in this context.
- Lines 242–244 and Figure 5: The authors state that there is “significant overlap” between FS and FS+glyc. However, the distributions for Ch 1 (z4 vs z3) appear to be different.
- Line 278: It seems that “outlier increases” should be corrected to “outlier decreases.”
- Figure 7: The scatter plots of z4 and z3 in the left two panels appear to be almost identical. It would be more efficient to include only one panel. Additionally, please clarify what the dashed lines indicate and confirm that they correspond to the selected outlier markers.
- Figure 7: The Channel 3 signals are difficult for an unfamiliar reader to interpret. It would be helpful if the authors included an example of a normal Channel 3 signal, as was done for Channels 0 and 1 in Figure 2.
- Lines 281–285: Please provide a physical explanation for why these specific outliers occurred. For example, was the Fe3O4 outlier due to the detection of multiple particles?
Citation: https://doi.org/10.5194/egusphere-2025-3210-RC2
- AC1: 'Reply on RC2', Kara Lamb, 14 Nov 2025
Data sets
Laser-Induced Incandescent Signals for Laboratory Samples of Absorbing Aerosols Detected by the Single Particle Soot Photometer Kara Lamb https://doi.org/10.5281/zenodo.15800436
Interactive computing environment
SP2-Aerosol-Classification Aaryan Doshi and Kara Lamb https://github.com/adoshi25/SP2-Aerosol-Classification
Viewed
- HTML: 1,652
- PDF: 36
- XML: 14
- Total: 1,702
- BibTeX: 32
- EndNote: 36
The paper by Doshi and Lamb introduces an unsupervised machine learning approach to better understand the structure of absorbing aerosols using L-II signals from the SP2. Using a variational autoencoder (VAE), the authors are able to extract a compressed latent feature vector of the L-II signals and use it for outlier detection and enhanced identification of distinct aerosol populations (even outperforming previous tests using significant feature engineering). The paper is generally well written and I appreciate its conciseness. Before fully recommending the paper for publication, I have a handful of questions and comments I’d like to see addressed regarding the physical interpretation of the latent features, the dimensionality reduction methodology, the outlier detection approach, and generalizability. Further, several of the figures should be updated to match the specifications set forth by EGU (i.e., larger text and labels throughout and improved color choices for visibility) to improve general readability.
General Comments:
Specific Comments: