the Creative Commons Attribution 4.0 License.
Application of fuzzy c-means clustering for analysis of chemical ionization mass spectra: insights into the gas-phase chemistry of NO3-initiated oxidation of isoprene
Abstract. Oxidation of volatile organic compounds (VOCs) can lead to the formation of secondary organic aerosol (SOA), a significant component of atmospheric fine particles, which can affect air quality, human health, and climate. However, the current understanding of the SOA formation mechanism is still incomplete, owing not only to the complexity of the chemistry but also to analytical challenges in detecting and quantifying SOA precursors. Recent instrumental advances, especially the development of high-resolution time-of-flight chemical ionization mass spectrometry (CIMS), have greatly enhanced the capability to detect low- and extremely low-volatility organic molecules (L/ELVOCs). Although the detection and characterization of low-volatility vapors have largely improved our understanding of SOA formation, analyzing and interpreting complex mass spectrometric data remains a challenging task. This necessitates the use of dimension-reduction techniques to simplify mass spectrometric data in order to extract chemical and kinetic information on the investigated system. Here we present an approach that uses fuzzy c-means clustering (FCM) to analyze CIMS data from chamber experiments investigating the gas-phase chemistry of the nitrate-radical-initiated oxidation of isoprene.
The performance of FCM was evaluated and validated. By applying FCM, various oxidation products were classified into different groups according to their chemical and kinetic properties, and the common patterns of their time series were identified, which gave insights into the chemistry of the investigated system. The chemical properties are characterized by elemental ratios and average carbon oxidation state, and the kinetic behaviors are parameterized with generation number and effective rate coefficient (describing the average reactivity of a species) using the gamma kinetic parameterization model. In addition, the fuzziness of the FCM algorithm makes it possible to separate isomers or the different chemical processes in which a species is involved, which could be useful for mechanism development. Overall, FCM is a well-suited technique for simplifying complex mass spectrometric data, and the chemical and kinetic properties derived from clustering can be used to understand the reaction system of interest.
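To make the clustering step concrete, the following is a minimal sketch of the textbook fuzzy c-means iteration (alternating updates of cluster centers and membership degrees), not the authors' implementation; this page does not state which software they used. The function name `fuzzy_c_means`, its parameters, the Euclidean distance, and the stopping rule are assumptions chosen for illustration. Each input row stands for one species' (normalized) time series, and the fuzzifier `m` controls how soft the memberships are.

```python
import random


def fuzzy_c_means(data, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric rows.

    Returns (centers, memberships), where memberships[i][k] is the degree
    to which row i belongs to cluster k; each row of memberships sums to 1.
    Hypothetical helper for illustration only.
    """
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])

    # Start from random memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    exponent = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the data weighted by membership ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * data[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update from relative Euclidean distances to the centers.
        new_u = []
        for i in range(n):
            dist = [
                sum((data[i][d] - centers[k][d]) ** 2 for d in range(dim)) ** 0.5
                for k in range(c)
            ]
            if any(dk == 0.0 for dk in dist):
                # The row coincides with one or more centers: share full
                # membership among exactly those centers.
                hits = [1.0 if dk == 0.0 else 0.0 for dk in dist]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append(
                    [1.0 / sum((dist[k] / dist[j]) ** exponent for j in range(c))
                     for k in range(c)]
                )

        # Stop when no membership changed by more than tol.
        change = max(abs(new_u[i][k] - u[i][k])
                     for i in range(n) for k in range(c))
        u = new_u
        if change < tol:
            break
    return centers, u
```

Calling `fuzzy_c_means(series, c=2)` on a list of time series returns the cluster centers (the common time-series patterns) and, for each species, its membership in every cluster. A species with comparable membership in two clusters is the kind of case the abstract refers to when it mentions isomers or multiple chemical processes.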
-
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2023-1896', Anonymous Referee #1, 16 Nov 2023
Wu and colleagues present the novel application of a data treatment technique (fuzzy c-means clustering) that reduces the complexity of a data set for the interpretation of atmospheric mass spectra. I consider the exploration of novel techniques for extracting information from increasingly complex and information-rich mass spectra to be of interest to the atmospheric science community and the readership of AMT. The manuscript strikes a balance between being an introduction to the technique and a demonstration of the technique on an established chemical system. I consider the chosen format and presentation appropriate for a new approach, and can see how this manuscript could serve as a reference for future studies using the technique.
I enjoyed reading this manuscript and would welcome a publication after some minor points have been addressed.
Could you touch briefly on the practical implementation of the algorithm? I do not readily find what software you used. What is the code availability? What are typical run times of the algorithm?
I did not check the correctness of the equations, but encourage the authors to double (or even triple) check the manuscript for typos in the equations before the final publication.
Otherwise, I found only a few typographical errors, which will be taken care of in copy-editing, so I limit my comments to observations on content.
paragraph 3.1.1: unclear on what data set you "ran the FCM algorithm 50 times", please clarify
paragraph 3.1.3: expand on the accuracy of m*. Are 1.42 and 1.52 significantly different or essentially the same? Generally, what can be considered as different?
Fig.4: I was a little thrown off by the missing x-axis label on the mass profile panels. Can you add a label, and/or label the dominant species in the individual panels? Also, consider changing the time axis label to elapsed time since start of experiment.
184: consider rewording "and only formulas within an accuracy tolerance of 10 ppm and with reasonable chemical meanings were considered." to "and only plausible formulas with relative m/z deviations smaller than 10 ppm were considered"
290: Incomplete sentence. Missing "be"?
300: consider rewording "the right choice" to "may not always be appropriate"
428: "mathematically unsolvable". Probably better to say that there is no simple analytical solution.
467: "punishing function"? While "punishing" seems to be used in some literature, "penalty function" may be the better term.
493: reword "looks reasonable"
512: drop "As a quick reminder"
699: "mathematically"? Consider making the paragraph (especially lines 699-701) more concise.
736: marker size: I appreciate the attempt to be specific about what the marker size represents. Please be fully specific by referring to the marker area or the diameter. This comment applies to a couple more figures in the main text and SI.
Fig. 11: Units for k
Supplementary information
page 2: proposed >the< Kwon index
page 2: punishing function, same as main text
page 4: it's --> it is
Fig S1: axis labels cut off
Fig S2: subscripts in O3, NO2, NO3
Fig S4: same comments as main text figure
Citation: https://doi.org/10.5194/egusphere-2023-1896-RC1
- AC2: 'Reply on RC1', Thomas Mentel, 08 Jan 2024
-
RC2: 'Comment on egusphere-2023-1896', Anonymous Referee #2, 17 Nov 2023
Please find my detailed review report in the attached pdf.
- AC1: 'Reply on RC2', Thomas Mentel, 08 Jan 2024
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
334 | 116 | 24 | 474 | 83 | 17 | 15 |
Rongrong Wu
Sören R. Zorn
Sungah Kang
Astrid Kiendler-Scharr
Andreas Wahner