the Creative Commons Attribution 4.0 License.
UCB-GLOBES: An open-access mass spectral database of identified and unidentified atmospheric organic compounds
Abstract. Chemical characterization of atmospheric organic aerosols using gas chromatography with 70 eV electron ionization mass spectrometry (GC/EI-MS) has advanced molecular marker detection and identification for decades, though primarily through suspect screening and/or targeted analyses. To advance non-targeted analyses of environmental samples, we have catalogued approximately 27,000 mass spectra (MS) of semi-volatile organic aerosol (OA) analytes, observed in ambient samples from the U.S. and the Central Amazon and/or in laboratory simulations of secondary OA (SOA) formation, in the open-access University of California Berkeley Goldstein Library of Organic Biogenic Environmental Spectra (UCB-GLOBES). These samples are representative of OA under urban and biomass burning influences as well as SOA derived from biogenic precursors (e.g., isoprene, monoterpenes, sesquiterpenes) and biomass burning intermediates. MS are documented in UCB-GLOBES regardless of whether their chemical identity is known, and are annotated with extensive metadata such as sample source/experimental conditions, structural information gained from MS analyses, and predicted chemical properties such as average carbon oxidation state and carbon number. UCB-GLOBES MS can be imported into the NIST MS Search program, and we also provide a Jupyter Notebook for MS visualization and comparison. We demonstrate the utility of UCB-GLOBES through MS reanalyses of analytes previously observed in ambient data, finding a 20 % reduction in the number of analytes assigned to OA source categories solely on the basis of time series correlation and an overall 11 % increase in new MS-based OA source categorization for the Southeast U.S. For 1,513 analytes observed previously in the Central Amazon, we found 375 MS matches using UCB-GLOBES vs. 136 MS matches during prior analyses, representing a 14 % gain in newly confirmed or newly categorized OA species.
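The abstract lists predicted average carbon oxidation state (OSc) and carbon number (Nc) among the metadata. A widely used approximation estimates OSc from elemental ratios as OSc ≈ 2(O/C) - (H/C), neglecting heteroatoms such as N and S. The Python below is only an illustrative sketch: it assumes the elemental formula of the underivatized analyte is already known, and the function names are hypothetical, not part of UCB-GLOBES or its notebook.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C10H16O3' into element counts (no brackets)."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def carbon_number(formula: str) -> int:
    """Number of carbon atoms (Nc)."""
    return parse_formula(formula)["C"]


def carbon_oxidation_state(formula: str) -> float:
    """Approximate average carbon oxidation state, OSc ~ 2(O/C) - (H/C).

    Heteroatoms (N, S, Si, ...) are ignored, so this should be applied to the
    underivatized formula, not to a TMS derivative.
    """
    counts = parse_formula(formula)
    if counts["C"] == 0:
        raise ValueError("formula contains no carbon")
    return 2 * counts["O"] / counts["C"] - counts["H"] / counts["C"]


# Example: cis-pinonic acid, C10H16O3
print(carbon_number("C10H16O3"), round(carbon_oxidation_state("C10H16O3"), 2))  # -> 10 -1.0
```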
While OA from laboratory oxidation experiments in UCB-GLOBES are chemically highly diverse, on average only 29 % of UCB-GLOBES MS have a mass spectral match to another MS entry in UCB-GLOBES and/or in the NIST MS Database. This indicates that roughly 70 % of UCB-GLOBES MS are unique thus far, i.e., not observed more than once among the laboratory oxidation samples and ambient data in UCB-GLOBES. Further, only 18 % can be positively identified using the NIST MS database or known authentic standards. This points to a large gap between these laboratory simulations and ambient OA. Overall, the UCB-GLOBES database can be used to improve confidence in OA source categorization and/or identification, discover novel chemical markers, track chemical diversity, predict structures and properties de novo, and improve MS search and matching algorithms, and it can help inform future research priorities for the chemical characterization of atmospheric organic samples.
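Match fractions such as the 29 % above depend on how two spectra are scored and on the threshold used to call a match. As a minimal sketch, assuming unit-mass spectra stored as m/z-to-intensity mappings, the Python below computes a square-root-weighted cosine similarity and flags library entries above a hypothetical 0.8 cutoff. This is not the NIST MS Search match factor or the procedure used for UCB-GLOBES; the weighting, threshold, entry names, and spectra are illustrative assumptions.

```python
import math

Spectrum = dict[int, float]  # nominal m/z -> intensity (any scale)


def cosine_similarity(a: Spectrum, b: Spectrum, power: float = 0.5) -> float:
    """Cosine similarity after raising intensities to `power` (0.5 = square root)."""
    mzs = set(a) | set(b)
    wa = [a.get(mz, 0.0) ** power for mz in mzs]
    wb = [b.get(mz, 0.0) ** power for mz in mzs]
    dot = sum(x * y for x, y in zip(wa, wb))
    norm = math.sqrt(sum(x * x for x in wa)) * math.sqrt(sum(y * y for y in wb))
    return dot / norm if norm else 0.0


def find_matches(query: Spectrum, library: dict[str, Spectrum],
                 threshold: float = 0.8) -> list[str]:
    """Names of library spectra whose similarity to `query` meets the (hypothetical) threshold."""
    return sorted(name for name, spec in library.items()
                  if cosine_similarity(query, spec) >= threshold)


query = {73: 100.0, 147: 40.0, 219: 10.0}
library = {
    "entry_A": {73: 50.0, 147: 20.0, 219: 5.0},    # same pattern at half the scale
    "entry_B": {73: 100.0, 103: 80.0, 205: 30.0},  # shares only m/z 73
}
print(find_matches(query, library))  # -> ['entry_A']
```

Because ions such as m/z 73 are shared by many TMS derivatives, an unweighted score like this can overstate similarity; library search programs often weight higher m/z more heavily, which is one reason results from this sketch would differ from NIST MS Search.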
Status: open (until 22 Mar 2026)
- RC1: 'Comment on egusphere-2026-116', Anna Feerick, 14 Mar 2026
Data sets
ucbglobes2025_v1 Lindsay Yee https://doi.org/10.5281/zenodo.18176760
Interactive computing environment
UCB-GLOBES MS Data Visualization and Comparison Tool_v1 Lindsay Yee https://doi.org/10.5281/zenodo.18177255
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 225 | 121 | 14 | 360 | 53 | 47 | 57 |
General Comments
The preprint is of high quality. The paper addresses relevant scientific questions within the scope of AMT and, importantly, provides a much-desired resource for the gas chromatography nontarget analysis and atmospheric communities. While the current data in the UCB-GLOBES library are most useful for derivatized compounds separated by GCxGC, the development of a platform that the broader community can contribute to, and that provides an open-source option for semi-volatile organic aerosols, will greatly expand the depth to which researchers can explore their environmental sample sets. The value of the UCB-GLOBES library for retrospective analysis is demonstrated through the reclassification of source attributions for both the GoAmazon and Southeast US SOAS studies. Most importantly, for the study of organic aerosols, it provides a framework for comparing different SOA sources and informing future research directions for SOA characterization. Areas for improvement include (1) adding ions related to TMS derivatization to the mass spectra to improve the likelihood of a correct library match, and (2) providing a quantitative measure of similarity between source profiles.
Specific Comments
All mass spectra currently in the UCB-GLOBES library are of derivatized compounds. Stating this in the abstract or the conclusions would clarify the current use cases for other GC/EI-MS practitioners.
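Because every entry is a derivatized spectrum, users comparing against underivatized references need to account for the derivatization mass shift. Assuming trimethylsilyl (TMS) derivatization, each replaced active hydrogen adds a nominal 72 Da (Si(CH3)3 = 73 minus H = 1), and TMS derivatives commonly show an [M-15]+ ion from methyl loss. The Python below sketches this nominal arithmetic; the function names are hypothetical and the number of derivatized groups is assumed to be known.

```python
TMS_SHIFT = 72    # nominal Da added per active H replaced by Si(CH3)3 (73 - 1)
METHYL_LOSS = 15  # nominal Da lost as CH3 to form the [M-15]+ ion


def tms_derivative_mw(parent_mw: int, n_tms: int) -> int:
    """Nominal MW of the TMS derivative given the underivatized nominal MW."""
    return parent_mw + TMS_SHIFT * n_tms


def parent_mw_from_m_minus_15(ion_mz: int, n_tms: int) -> int:
    """Back-calculate the underivatized nominal MW from an assumed [M-15]+ ion."""
    return ion_mz + METHYL_LOSS - TMS_SHIFT * n_tms


# 2-Methyltetrols (C5H12O4, nominal MW 136) carry four OH groups.
derivative = tms_derivative_mw(136, 4)
print(derivative, derivative - METHYL_LOSS)  # -> 424 409
print(parent_mw_from_m_minus_15(409, 4))     # -> 136
```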
Line 254: Does the simple neighbor comparison return only true peak maxima, or does it also pick up local maxima that are not true peaks?
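To make the question concrete: a strict neighbor comparison returns every point that exceeds both adjacent points, so small noise spikes and shoulders on a larger peak are reported alongside the true apex, while flat-topped maxima are missed. The Python below is a one-dimensional illustration that assumes the manuscript's method resembles a strict neighbor test (a GCxGC implementation would compare against neighbors in both dimensions); the relative-height filter is a hypothetical remedy, not the authors' procedure.

```python
def neighbor_peaks(signal: list[float]) -> list[int]:
    """Indices of strict local maxima: points higher than both neighbors."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]
    ]


def filter_by_height(signal: list[float], peaks: list[int],
                     min_fraction: float = 0.05) -> list[int]:
    """Keep peaks at least `min_fraction` of the tallest detected peak (hypothetical filter)."""
    if not peaks:
        return []
    tallest = max(signal[i] for i in peaks)
    return [i for i in peaks if signal[i] >= min_fraction * tallest]


# A small noise spike (index 1) and a shoulder (index 8) next to a large peak (index 6).
trace = [0, 2, 1, 3, 10, 40, 100, 60, 62, 20, 5, 0]
candidates = neighbor_peaks(trace)
print(candidates)                           # -> [1, 6, 8]
print(filter_by_height(trace, candidates))  # -> [6, 8]
```

The height filter removes the spike but not the shoulder at index 8, which would need a prominence or minimum-separation criterion; whether any such criterion is applied is what the question above asks.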
Section: Automated metadata entries from MS featurization: MW prediction, base peak, five highest intensity m/z ions
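For readers unfamiliar with these fields, the base peak and top-five ions can be derived directly from a unit-mass spectrum. The Python below is a hypothetical illustration, not the authors' code; in particular, the tie-breaking rule (lower m/z first when intensities are equal) is an assumption that affects which ions are listed when intensities tie.

```python
def featurize(spectrum: dict[int, float], n_top: int = 5) -> dict:
    """Base peak, n_top most intense m/z values, and spectrum normalized to base = 100."""
    if not spectrum:
        raise ValueError("empty spectrum")
    # Sort by descending intensity; ties go to the lower m/z (assumed rule).
    ranked = sorted(spectrum.items(), key=lambda item: (-item[1], item[0]))
    base_mz, base_intensity = ranked[0]
    if base_intensity <= 0:
        raise ValueError("spectrum has no positive intensities")
    return {
        "base_peak": base_mz,
        "top_ions": [mz for mz, _ in ranked[:n_top]],
        "normalized": {mz: 100.0 * i / base_intensity for mz, i in spectrum.items()},
    }


features = featurize({45: 30, 73: 1000, 75: 180, 117: 95, 147: 420, 191: 60, 204: 60})
print(features["base_peak"], features["top_ions"])  # -> 73 [73, 147, 75, 117, 191]
```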
Line 298: How were retention index tolerances checked?
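One common way such a check is done, given here only as an assumed example since the manuscript's actual procedure is what the question asks about, is to compute a linear (van den Dool and Kratz) retention index from a bracketing n-alkane ladder, RI = 100 [n + (N - n)(t_x - t_n)/(t_N - t_n)], and accept a candidate when |RI_obs - RI_ref| falls within a fixed tolerance. The Python below uses a hypothetical 10-unit tolerance and an illustrative alkane ladder; for GCxGC data it would apply to the first-dimension retention time.

```python
from bisect import bisect_right


def linear_retention_index(rt: float, alkanes: dict[int, float]) -> float:
    """Van den Dool and Kratz RI; `alkanes` maps n-alkane carbon number -> retention time.

    Assumes retention time increases with carbon number across the ladder.
    """
    carbons = sorted(alkanes)
    times = [alkanes[c] for c in carbons]
    i = bisect_right(times, rt) - 1
    if i < 0 or i >= len(carbons) - 1:
        raise ValueError("retention time is outside the alkane ladder")
    n, big_n = carbons[i], carbons[i + 1]
    t_n, t_big_n = times[i], times[i + 1]
    return 100.0 * (n + (big_n - n) * (rt - t_n) / (t_big_n - t_n))


def within_ri_tolerance(ri_observed: float, ri_reference: float, tol: float = 10.0) -> bool:
    """Hypothetical acceptance rule: absolute RI difference no larger than `tol`."""
    return abs(ri_observed - ri_reference) <= tol


ladder = {10: 5.0, 11: 6.0, 12: 7.2}  # illustrative retention times (min)
ri = linear_retention_index(6.6, ladder)
print(round(ri, 1), within_ri_tolerance(ri, 1158.0))  # -> 1150.0 True
```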
Lines 455-456: The authors state that the copaiba oil spread shown in Figure 6 is similar to the sesquiterpene system in a) and the monoterpene system in c). I agree up until Nc > 20, beyond which I believe the copaiba oil lacks enough data points to confidently claim overlap with the monoterpenes. Since “similar spread” appears to be judged qualitatively rather than quantitatively, which leaves room for debate, I’d recommend including a quantitative metric for the degree of similarity between these datasets.
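As one possible quantitative metric, offered as an assumption rather than a prescription, the (Nc, OSc) points of two systems can be binned on a shared grid and compared with histogram intersection (1 for identical binned distributions, 0 for no overlap) or the base-2 Jensen-Shannon distance (0 for identical, 1 for disjoint). The bin widths and example points below are hypothetical, and the result depends on the binning; restricting the comparison to Nc > 20 would also show directly how few copaiba oil points fall in that region.

```python
import math
from collections import Counter

Point = tuple[float, float]  # (Nc, OSc)


def binned_distribution(points: list[Point], nc_width: float = 2.0,
                        osc_width: float = 0.25) -> dict[tuple[int, int], float]:
    """Fraction of points in each (Nc, OSc) grid cell; bin widths are assumptions."""
    counts = Counter((math.floor(nc / nc_width), math.floor(osc / osc_width))
                     for nc, osc in points)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def histogram_intersection(p: dict, q: dict) -> float:
    """Shared probability mass: 1 = identical binned distributions, 0 = disjoint."""
    return sum(min(p.get(c, 0.0), q.get(c, 0.0)) for c in set(p) | set(q))


def jensen_shannon_distance(p: dict, q: dict) -> float:
    """Base-2 Jensen-Shannon distance in [0, 1]."""
    cells = set(p) | set(q)
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in cells}

    def kl_to_m(dist: dict) -> float:
        return sum(v * math.log2(v / m[c]) for c, v in dist.items() if v > 0)

    # max() guards against tiny negative values from floating-point roundoff.
    return math.sqrt(max(0.0, 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)))


system_a = [(10, -1.0), (15, -0.6), (15, -0.5)]
system_b = [(10, -0.9), (15, -0.6), (30, 0.5)]
p, q = binned_distribution(system_a), binned_distribution(system_b)
print(round(histogram_intersection(p, q), 2), round(jensen_shannon_distance(p, q), 2))  # -> 0.67 0.58
```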
Technical corrections:
Table 1, Dataset Description:
Table 2, Description column
Figure 1: A border should be present separating the bars of Fire Science Lab and Napa, CA Fires 2017
Figure 3: ISOP and BBOA could use a color change to make them more distinct in black and white
Figure 4: The numbers overlapping the border on the right side are very difficult to read in black and white. I’d recommend removing the numbers on the right side of each graphic to reduce clutter.
Figure 5: Some of the borders between the matches and unmatched blocks look thicker than others. If possible, I’d recommend unifying the border size. Additionally, there is a light blue block at the top of the 2-methylfuran/OH bar. If this is part of the No Matches, Unknown category, I’d recommend combining the two.
Figure 6: These figures are difficult to parse in black and white, and sub-figures a, c, and d would benefit from more color variation. To improve the clarity of sub-figure a, I’d recommend splitting O3 and NOx into dark colors for one and light colors for the other. This would help visualize the similar spreads of OSc and Nc. Lightening the apin+NO3 in sub-figure c could bring similar clarity. For sub-figure d, darkening 3-me-fur+OH would help it stand out against the FSL_FIREX data. Capitalize “burn”, the last word in the title of sub-figure d.
Line 91: “have been born using” feels awkward. One option for alteration is to add “from” between born and using or replace with “have been made using”
Line 106: re-evaluate the use of “/” in this sentence. Are you using “/” as “or”, or is it implying that “known chemical identity” and “MS generated from authentic standards readily available” are the same thing in the previously mentioned resources?
Lines 498-500: Please clarify the sentence “In contrast…chemical categorization”. I could not follow it, which makes it difficult to understand what the following example is meant to highlight.
Line 167: “high resolution” lacks the hyphen used elsewhere (e.g., “high-resolution” in the following paragraph). Either add a hyphen here or remove the hyphens on lines 76 and 187 for consistency.
Line 178: remove space between “N-methyl-N-“ and “trimethylsilyl”
Line 227: The current phrasing implies that low polarity peaks were the contaminants. Is this correct? If not, and PFMD was the “additional known contaminants” make “contaminants” in this sentence singular.
Line 238: remove “be able to”
Line 329: May benefit from a paragraph break between “column.” and “Using UCB-GLOBES…”
Line 456: Add c) after the semi-colon to connect the following statement with the correct sub-figure.