Non-Target Analysis of Atmospheric Organic Aerosols as a Tool to Discriminate Anthropogenic Contribution in Mixed Air Masses during the ACROSS campaign
Abstract. Organic aerosol is a major component in the particle phase of Earth's atmosphere and has influences on quality of life, health and climate. In this study, a non-target analysis of the chemical composition of atmospheric organic aerosols using liquid chromatography-Orbitrap mass spectrometry (LC-Orbitrap MS) was conducted to differentiate anthropogenic and biogenic sources through unsupervised KMeans clustering. The ACROSS campaign dataset (consisting of 36 wind-characterized samples) identified 4,916 compounds (in the range 50–400 m/z). Due to the location of the sampling site, the samples contain influences from the greater Paris area, as well as biogenic influences from the surrounding forest. K-means clustering, constrained to 2,917 compounds with strong wind-direction correlation, resolved distinct biogenic and anthropogenic clusters. Biogenic aerosols were dominated by CHO compounds (H/C: 1.2–1.7; O/C: 0.15–0.7), consistent with oxidized terpenes, while anthropogenic aerosols featured significant CHOS enrichment (H/C: 1.5–2.2; O/C: 0.2–1.0), including nitrogen-sulfur aromatics (e.g., C10H18NO8S− with nitro/sulfonic groups and aromatic fragments). The approach allows to quantify anthropogenic contribution in mixed air masses, demonstrating higher amounts of anthropogenic compounds ratios during Paris-influenced periods. Results validate wind-driven source apportionment for small sample size non-target studies, providing a transferable method for aerosol characterization.
The manuscript presents an interesting non-target LC-Orbitrap dataset and a potentially useful approach for distinguishing anthropogenic and biogenic molecular fingerprints using a reduced number of samples. However, the interpretation would benefit from further clarification of the role of seasonality and meteorological conditions.
Line 61: Should the original ACROSS overview paper be cited here, e.g. Cantrell and Michoud (2022)?
Line 169: Please clarify the basis for classifying air masses as “biogenic.” If this classification is mainly based on air masses not passing over Paris, they may still carry anthropogenic influence from other urban or industrial areas, for example Lille, Ghent, or Rouen, depending on trajectory direction and residence time.
Line 264: When stating that the results yielded similar findings to the larger dataset of Thoma et al. (2025), please clarify whether this comparison refers specifically to their summer results or to the full dataset. Could you include this comparison in the SI?
Line 250: Is there a basis for determining how many clusters were used? The use of two and three clusters shows that the anthropogenic signature was consistent, but was there an original rationale for choosing two and three clusters?
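To illustrate the kind of rationale meant here: one common option is to scan several cluster numbers and compare the mean silhouette score. The following is only a minimal sketch, assuming scikit-learn and a synthetic stand-in matrix. The function name `silhouette_by_k`, the candidate values of k, and the demo data are hypothetical and are not taken from the manuscript.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values=(2, 3, 4, 5, 6), seed=0):
    """Return {k: mean silhouette score} for K-means fits on X.

    X is an (objects x features) matrix; rows are the objects being clustered.
    """
    scores = {}
    for k in k_values:
        # n_init=10 restarts K-means from 10 random initialisations per k.
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = float(silhouette_score(X, labels))
    return scores


# Synthetic stand-in data (assumption): two well-separated groups of 18 rows.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0.0, 1.0, size=(18, 20)),
    rng.normal(6.0, 1.0, size=(18, 20)),
])

scores = silhouette_by_k(X_demo)
best_k = max(scores, key=scores.get)  # k with the highest mean silhouette
```

Reporting such a scan (or an equivalent criterion such as an elbow plot) in the SI would show whether two or three clusters are supported by the data or were chosen a priori.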
Line 296: The different biogenic signature is attributed to directional dominance, but it is not clear whether this reflects a difference in wind-direction strength only or a difference in back-trajectory origin. Are the dominant wind directions associated with distinct source regions, or are the back-trajectory origins broadly similar? In addition, the ACROSS period included distinct meteorological regimes, including a cleaner period and heatwave periods. Could the dataset be split into the clean period and heatwave period, following the period definitions used by Di Antonio et al. (2025), to test whether these conditions affected the clustering?
Line 393: Please clarify the “others” category shown in black. Does this group represent identified molecular formulas outside the CHO, CHON, and CHOS classifications, such as CHONS or halogenated species, or does it include unidentified peaks? This category appears to contribute substantially in some samples and should be more clearly defined.
Line 374: The wording “calculation” should be reconsidered or softened. The term implies quantitative estimates, as reported by Thoma et al. (2025). Would it be possible in the future to make such estimates with this method, given the large amount of auxiliary data available from the ACROSS campaign?
Line 381: Were the four samples removed for the reduced clustering test selected randomly, or were they chosen because they represent specific source conditions, wind directions, or concentration regimes?
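If the selection was not random, a repeated random leave-out test could complement it. The sketch below only draws the leave-out sets reproducibly; the clustering itself would be rerun on the remaining samples for each set. The helper name `draw_leave_out_sets`, the number of repeats, and the integer sample IDs are hypothetical placeholders, with only the totals (36 samples, 4 removed) taken from the manuscript.

```python
import random


def draw_leave_out_sets(sample_ids, n_remove=4, n_repeats=100, seed=0):
    """Draw n_repeats random sets of n_remove sample IDs, without replacement
    within each set, using a seeded generator so the draws are reproducible."""
    rng = random.Random(seed)
    return [
        tuple(sorted(rng.sample(sample_ids, n_remove)))
        for _ in range(n_repeats)
    ]


# Hypothetical integer IDs standing in for the 36 campaign samples.
sample_ids = list(range(1, 37))
leave_out_sets = draw_leave_out_sets(sample_ids)
```

Comparing the cluster assignments across these repeats would indicate whether the reduced-dataset result depends on which four samples are removed.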