the Creative Commons Attribution 4.0 License.
Differentiation of primary and secondary marine organic aerosol with machine learning
Abstract. Marine primary organic aerosols (POA) are important components of the marine climate system, regulating the solar radiation budget and cloud dynamics. Despite their importance, there is a lack of extensive long-term observations of POA properties, which introduces great uncertainty into their parameterization in models. This lack of information originates from the complexity of POA chemical composition, the scarcity of long-term high-resolution measurements of clean marine air, and the difficulty of performing source apportionment over a long-term period. In this study, we utilize a comprehensive high-resolution time-of-flight aerosol mass spectrometer dataset spanning a decade (2009–2018) and introduce a machine learning approach to differentiate marine POA from marine secondary organic aerosol (SOA) and quantify its contribution. Results indicate that marine POA concentrations peak during the summer months and reach their lowest levels in winter. On average, marine POA constitutes 51 % (ranging from 21 % to 76 %) of the marine organic aerosol annually and up to 63 % (48 % to 75 %) in summer. With the differentiated POA and SOA, we found that POA and SOA have different impacts on aerosol hygroscopicity and mixing state. An increase in POA reduces the hygroscopicity and leads to an external mixing state, while an increase in SOA sustains the relatively high hygroscopicity and leads to internal mixing. This study provides an observational dataset for marine POA and SOA and their different impacts on aerosol hygroscopicity, emphasizing the need for a better appreciation of marine POA and SOA to improve climate projections.
Status: closed
-
RC1: 'Comment on egusphere-2025-1415', Anonymous Referee #1, 29 Apr 2025
This paper applies a machine learning approach to try to interpret the POA and SOA data that is observed at the Mace Head atmospheric observatory. These have been reported in previous publications, but this uses a different approach to try to analyse and interpret a long-term dataset.
There are advantages to this type of approach over other methods such as PMF, in particular computational cost, the interpretation of dependencies on independent variables, and the ability to gap-fill where necessary. However, by using a supervised approach, the analysis can be said to be less objective. This is a particular concern because quite a lot of a priori information is brought into the analysis, in particular how POA is defined, where a number of assumptions are made based on previous experience analysing data. That being the case, it is perhaps not surprising that the model performs so well, although it should be noted that a certain amount of objectivity is brought in through the use of clustering to generate the representative POA. Furthermore, given that this specific site has proved fairly unique in producing observations such as these (owing to its location), it remains to be seen whether this technique is applicable to other sites.
All that said, I still found this an interesting and informative paper. The relationships between the POA and SOA and the other variables were of particular interest, and it was also interesting to compare these data products with the outputs of the HTDMA data, which could have implications for marine CCN populations. I do have some concerns, perhaps the biggest being that I didn't find enough technical information on how the models had been set up (see below), but besides that my comments are pretty minor and I recommend publication after these are addressed.
Major comments:
The article, as it stands, is severely lacking in the technical details of how the FCM and SVR methods were applied to the data. While it is not necessary to post the code that was used, certain technical details were missing from the article and the supplement. Specific things relating to the FCM that I found missing were the distance metric (while Euclidean distance is the default, it cannot be inferred), whether there was any pre-treatment to weight or linearise variables, and how the optimum number of factors was determined. Likewise with the SVR, more explanation should be given regarding the specifics of the input data and parameters. More details such as these should be included, in the supplement if necessary.
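To make the three FCM choices named above concrete, the following is a minimal sketch of fuzzy c-means in plain NumPy. It is not the authors' implementation: the Euclidean metric, the z-score pre-treatment in `standardize`, the fuzzifier `m = 2`, and the use of the partition coefficient as a cluster-number criterion are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def standardize(X):
    """Assumed pre-treatment: z-score each variable so none dominates the distance.
    Requires that no column is constant."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Plain fuzzy c-means with Euclidean distance (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centres are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every centre.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def partition_coefficient(U):
    """Mean squared membership: 1 for crisp clusters, 1/c for fully fuzzy ones."""
    return float((U ** 2).sum() / U.shape[0])
```

Under these assumptions, one way to report how the number of clusters was chosen would be to run `fuzzy_c_means` for a range of `n_clusters` and tabulate `partition_coefficient` for each. Other validity indices exist, and the choice of index is itself a detail worth stating.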
Minor comments:
L185: While Monte Carlo bootstrapping is a powerful method of assessing random uncertainties in the data, it does not address systematic uncertainties, which, given that these AMS OA types have been reported from few locations, are potentially substantial. This should be noted.
L224: A note of caution should be added regarding treating nss-SO4 as a secondary marker because, while it is indeed formed through secondary processes, its precursors and formation timescales are different from those of SOA, and the relationship between the two is inconsistent in terrestrial environments. There is a case to be made for them being correlated in the marine boundary layer if they are both assumed to originate from biological activity in the sea surface, but explicit correlation should not necessarily be expected.
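The distinction between random and systematic uncertainty can be illustrated with a small synthetic sketch. All numbers below are hypothetical and unrelated to the manuscript's data: a constant offset is added to a noisy series, and a percentile bootstrap of the mean is computed for both versions. The helper `bootstrap_mean_ci` is an illustrative name, not a function from the paper.

```python
import numpy as np

def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    means = values[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

rng = np.random.default_rng(42)
truth = 1.0                              # hypothetical true value
noise = rng.normal(0.0, 0.2, size=500)   # hypothetical random error
unbiased = truth + noise
biased = truth + 0.3 + noise             # hypothetical constant systematic offset

ci_unbiased = bootstrap_mean_ci(unbiased)
ci_biased = bootstrap_mean_ci(biased)
width_unbiased = ci_unbiased[1] - ci_unbiased[0]
width_biased = ci_biased[1] - ci_biased[0]
```

In this sketch the two intervals have the same width, yet the interval for the offset series lies entirely above `truth`. The bootstrap therefore reports the same apparent confidence in both cases and gives no indication of the offset.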
Figure S3: I actually found this one of the more interesting aspects of this work, and I would consider moving it to the main article.
Figure S7: The figure caption doesn't currently make sense. Reword.
Citation: https://doi.org/10.5194/egusphere-2025-1415-RC1
AC1: 'Reply on RC1', Wei Xu, 05 Aug 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1415/egusphere-2025-1415-AC1-supplement.pdf
-
RC2: 'Comment on egusphere-2025-1415', Anonymous Referee #2, 28 Jul 2025
-
AC2: 'Reply on RC2', Wei Xu, 05 Aug 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1415/egusphere-2025-1415-AC2-supplement.pdf