the Creative Commons Attribution 4.0 License.
UCB-GLOBES: An open-access mass spectral database of identified and unidentified atmospheric organic compounds
Abstract. Chemical characterization of atmospheric organic aerosols using gas chromatography with 70 eV electron ionization mass spectrometry (GC/EI-MS) has been used for decades to advance molecular marker detection and identification, though primarily through suspect screening and/or targeted analyses. To advance non-targeted analyses of environmental samples, we have catalogued, in the open-access University of California Berkeley Goldstein Library of Organic Biogenic Environmental Spectra (UCB-GLOBES), approximately 27,000 mass spectra (MS) of semi-volatile organic aerosol (OA) analytes observed in ambient samples from the U.S. and the Central Amazon and/or in laboratory simulations of secondary OA (SOA) formation. These samples are representative of OA under urban and biomass burning influences as well as SOA derived from biogenic precursors (e.g., isoprene, monoterpenes, sesquiterpenes) and biomass burning intermediates. MS are documented in UCB-GLOBES without regard to known chemical identity and are annotated with extensive metadata such as sample source/experimental conditions, structural information gained from MS analyses, and predicted chemical properties such as average carbon oxidation state and carbon number. UCB-GLOBES MS can be imported into the NIST MS Search program, and we have also provided a Jupyter Notebook for MS visualization and comparisons. We demonstrate the utility of UCB-GLOBES through MS reanalyses of prior analytes observed in ambient data, finding a 20 % reduction in the number of analytes assigned to OA source categories reliant solely on time series correlation and an overall 11 % increase in new MS-based OA source categorization for the Southeast U.S. For 1,513 analytes observed previously in the Central Amazon, we found 375 MS matches using UCB-GLOBES vs. 136 MS matches during prior analyses, representing a 14 % gain in newly confirmed or newly categorized OA species.
While OA from laboratory oxidation experiments in UCB-GLOBES are highly diverse chemically, on average only 29 % of UCB-GLOBES MS have a mass spectral match to another MS entry in UCB-GLOBES and/or in the NIST MS Database. This indicates that roughly 70 % of UCB-GLOBES MS are unique thus far, having been observed no more than once among the laboratory oxidation samples and ambient data in UCB-GLOBES. Further, only 18 % can be positively identified using the NIST MS Database or known authentic standards. This points to a large gap between these laboratory simulations and ambient OA. Overall, the UCB-GLOBES database can be utilized for improving confidence in OA source categorization and/or identification, novel chemical marker discovery, tracking chemical diversity, de novo structure and properties prediction, and improving MS search and matching algorithms, and it can inform future research priorities for the chemical characterization of atmospheric organic samples.
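As a rough illustration of what an MS-to-MS comparison involves, the sketch below computes a plain cosine similarity between two unit-mass spectra. It is not the match factor used by the NIST MS Search program, nor necessarily the scoring used in the UCB-GLOBES Jupyter Notebook; the function name, the dictionary representation of a spectrum, and the toy peak lists are all assumptions made for this example.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra.

    Each spectrum is a dict mapping an integer m/z value to an intensity.
    This representation is hypothetical, chosen only for the illustration.
    """
    # Only m/z values present in both spectra contribute to the dot product.
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)

    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))

    # An empty or all-zero spectrum cannot match anything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy spectra with made-up peaks (not taken from UCB-GLOBES or NIST).
toy_a = {43: 100.0, 57: 80.0, 71: 40.0}
toy_b = {43: 90.0, 57: 85.0, 71: 35.0, 85: 10.0}

score = cosine_similarity(toy_a, toy_b)
print(f"cosine similarity: {score:.3f}")
```

A score near 1 indicates nearly identical relative peak patterns, while 0 indicates no shared peaks. Library search tools typically add intensity and m/z weighting on top of this basic idea, so thresholds from one scoring scheme should not be carried over to another.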
Status: open (until 13 Mar 2026)
Data sets
ucbglobes2025_v1 Lindsay Yee https://doi.org/10.5281/zenodo.18176760
Interactive computing environment
UCB-GLOBES MS Data Visualization and Comparison Tool_v1 Lindsay Yee https://doi.org/10.5281/zenodo.18177255