the Creative Commons Attribution 4.0 License.
Automated compound speciation, cluster analysis, and quantification of organic vapours and aerosols using comprehensive two-dimensional gas chromatography and mass spectrometry
Abstract. Advances in analytical techniques, such as comprehensive two-dimensional gas chromatography coupled with mass spectrometry (GC×GC-MS), enable the efficient separation of complex organic matrices. Developing innovative methods for data processing and analysis is crucial to unlock the full potential of GC×GC-MS in understanding intricate chemical mixtures. In this study, we propose a method for the semi-automated identification and quantification of complex organic mixtures using GC×GC-MS. The method is built on self-constructed mass spectrum patterns and traversal algorithms and was applied to organic vapor and aerosol samples collected from tailpipe emissions of heavy-duty diesel vehicles (HDDVs) and from the ambient atmosphere. Thousands of compounds were filtered, speciated, and clustered into 26 categories, including aliphatic and cyclic hydrocarbons, aromatic hydrocarbons, aliphatic oxygenated species, phenols and alkyl-phenols, and heteroatom-containing species. The identified species accounted for over 80 % of all the eluted chromatographic peaks at the molecular level. A comprehensive analysis of quantification uncertainty was undertaken. Using representative compounds, quantification uncertainties were found to be less than 37.67 %, 22.54 %, and 12.74 % for alkanes, polycyclic aromatic hydrocarbons (PAHs), and alkyl-substituted benzenes, respectively, across the GC×GC space, excluding the first and the last time intervals. From a source apportionment perspective, adamantane was clearly isolated as a potential tracer for HDDV emissions. The systematic distribution of N-containing compounds in oxidized and reduced valences is discussed; many of them served as critical tracers for secondary nitrate formation processes.
The results highlight the benefits of developing a self-constructed model for enhanced peak identification, automated cluster analysis, robust uncertainty estimation, and source apportionment, and for achieving the full potential of GC×GC-MS in atmospheric chemistry.
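As a rough illustration of what pattern-based cluster assignment can look like, the sketch below scores a peak's mass spectrum against a small set of category patterns and keeps the best match. It is only a minimal sketch under stated assumptions, not the authors' algorithm: the two patterns, their relative intensities, the cosine-similarity score, the 0.7 threshold, and the names `PATTERNS`, `cosine`, and `classify` are all hypothetical. The only chemistry it relies on is the well-known set of characteristic fragment ions of alkanes (m/z 43, 57, 71, 85) and alkylbenzenes (m/z 77, 91, 105).

```python
import math

# Hypothetical category patterns: characteristic fragment ions (m/z) mapped to
# illustrative relative intensities. These are NOT the patterns from the paper.
PATTERNS = {
    "alkane": {43: 80, 57: 100, 71: 60, 85: 40},
    "alkylbenzene": {77: 20, 91: 100, 105: 60},
}


def cosine(a, b):
    """Cosine similarity between two sparse spectra given as {m/z: intensity}."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(spectrum, patterns=PATTERNS, threshold=0.7):
    """Traverse every pattern and return (category, score) for the best match.

    Peaks whose best score falls below the (assumed) threshold are reported
    as "unassigned" instead of being forced into a category.
    """
    best, best_score = "unassigned", 0.0
    for name, pattern in patterns.items():
        score = cosine(spectrum, pattern)
        if score > best_score:
            best, best_score = name, score
    if best_score < threshold:
        return "unassigned", best_score
    return best, best_score


if __name__ == "__main__":
    peak = {43: 70, 57: 100, 71: 55, 85: 35, 99: 10}
    print(classify(peak))
```

In a real workflow the score would likely be combined with chromatographic constraints, such as the first- and second-dimension retention times, before a peak is assigned to a cluster.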
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2024-1671', Yutong Liang, 07 Jul 2024
Review of Xiao He et al.
He et al. described an algorithm to cluster and quantify compounds measured by GC×GC. They applied this algorithm to HDDV emission and ambient PM2.5 samples and achieved promising results. I think this work presents a novel and comprehensive approach for the analysis of complex GC×GC data. However, more details of the algorithms should be provided. Overall, I think this work can be accepted for publication after a minor revision.
Major comments
My main concern is that the clustering algorithm is not described in enough detail, which makes it hard for researchers, especially those new to this field, to implement. For example, Lines 172-180 say that functional groups affect the mass spectra of compounds, and that indicative reaction schemes were incorporated into the algorithm. S1 in the SI describes these reaction schemes. However, it is not clear how these rules are applied in the algorithm.
Lines 193-198: These descriptions should be enriched. The authors may want to provide pseudo-code like the one for the quantification algorithm in the box in Figure 1.
Minor comments
Line 56: Should write NIST instead of NIST20 because NIST20 only refers to the 2020 version.
Line 57: Retention Index matching was introduced decades ago, not by Zang et al.
Line 88: What does “retention rate” mean here?
Lines 113-115: The authors may want to write down how many samples were collected (or used in this work).
Line 122: What do you mean by TD samples? Do you mean sorbent tubes? I would suggest renaming it because filter samples also went through thermal desorption. Also, why were the TD samples kept at room temperature? Won’t the analytes evaporate?
Line 145: Should give more description of the mass spectrometer, like resolution and ion source.
Line 155: The authors may want to list the deuterated internal standards in the SI.
Line 159: Would Retention Index be a better chromatographic variable to use in cluster matching?
Lines 227-229: The authors cited Franklin et al. to suggest that the decomposition products may not be very significant. But in that work, Emily derivatized the polar compounds with MSTFA, which helps to protect the thermally labile compounds. I think the comparison with her work here could be misleading. It is better for the authors to delete this comparison.
Line 233: How is this correlation calculated?
Section 3.2 Uncertainty Estimation: I feel this section is only about the uncertainty associated with (semi-)quantification of compounds correctly clustered, not about the uncertainty associated with clustering. The authors may want to clarify.
Line 356: S2?
Line 358: What do you mean by “filtered”?
Line 358: Cooking can also contribute substantially to carboxylic acids in ambient aerosol in urban areas like Shenzhen.
Citation: https://doi.org/10.5194/egusphere-2024-1671-RC1
AC1: 'Reply on RC1', Xiao He, 18 Jul 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-1671/egusphere-2024-1671-AC1-supplement.pdf
RC2: 'Comment on egusphere-2024-1671', Anonymous Referee #2, 20 Jul 2024
General comments:
The development of analytical techniques has put forward higher requirements for the identification and processing of complex organic species. Developing innovative data parsing methods is important to understand intricate chemical mixtures. The study reported an innovative method for the semi-automated identification and quantification of complex organic mixtures using GC×GC-MS and applied this method to organic vapor and aerosol samples collected from tailpipe emissions of heavy-duty diesel vehicles and the ambient atmosphere. The study is novel, providing an automated approach for chemical compound speciation and cluster analysis. The manuscript is well organized and written. I recommend that it be accepted after a minor revision.
Specific comments:
- Line 88: “Despite a low retention rate”. What is retention rate? Do the authors mean population rate in the whole vehicle fleet?
- Line 105: Section “2.1 Sample collection, treatment, and instrumental analysis”. Detailed sample information was not available in this section. For example, sampling season and the relevant PM2.5 concentration in the atmosphere were unclear. How many diesel vehicles were measured, and their emission levels, engine size, repetition frequency, etc.?
- Line 109-111: “The average temperature in the sampling train was precisely controlled at 47 °C, and airflow, relative humidity, and airflow, relative humidity, and pressure were monitored simultaneously”. “and airflow, relative humidity” were repeated.
- Line 163: Section “2.3 Algorithmic development”. I think the methodology by which the authors train, iterate, and optimize the scripts was introduced somewhat roughly, yet it determines whether this algorithm can be referenced by other studies. For example, how many parameters does the algorithm contain, how many of them can be optimized, and what are their impacts? How many times did the authors conduct training, how effective was the training, and so on?
- Line 286: Section “3.2 Model uncertainty estimation”. The authors have conducted a detailed uncertainty analysis on the model estimation. However, I still wonder how the results analyzed by this new approach differ from those of the traditional one. It would be better if some species could be validated by two different identification methods.
- Line 417: Figure 5. How many samples were used for the heavy-duty diesel vehicle emissions and for the ambient atmosphere, and how consistent were the samples within each group?
Citation: https://doi.org/10.5194/egusphere-2024-1671-RC2
AC2: 'Reply on RC2', Xiao He, 22 Jul 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-1671/egusphere-2024-1671-AC2-supplement.pdf
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
317 | 77 | 27 | 421 | 41 | 16 | 16 |
Xuan Zheng
Shuwen Guo
Lewei Zeng
Ting Chen
Bohan Yang
Shupei Xiao
Qiongqiong Wang
Zhiyuan Li
Yan You
Shaojun Zhang
Ye Wu