the Creative Commons Attribution 4.0 License.
Data-driven modeling of environmental factors influencing Arctic methanesulfonic acid aerosol concentrations
Abstract. Natural aerosol components such as particulate methanesulfonic acid (MSAp) play an important role in the Arctic climate. However, numerical models struggle to reproduce MSAp concentrations and seasonality. Here we present an alternative data-driven methodology for modeling MSAp at four High Arctic stations (Alert, Gruvebadet, Pituffik/Thule, and Utqiaġvik/Barrow). In our approach, we create input features that consider the ambient conditions during atmospheric transport (e.g., temperature, radiation, and cloud cover) for use in two data-driven models: a random forest (RF) regressor and an additive model (AM). The most important features were selected through automatic selection procedures, and their relationships with MSAp model output were investigated. Although the overall performance of our data-driven models on test data is modest (max. R² = 0.29), the models can capture variability in the data well (max. Pearson correlation coefficient = 0.77), outperform the current numerical models and reanalysis products, and produce physically interpretable results.
The data-driven models selected features related to the sources, chemical processing, and removal of MSAp, with specific differences between stations. The seasonal cycles and selected features suggest that gas-phase oxidation is relatively more important during peak concentration months at Alert, Gruvebadet, and Pituffik/Thule, while aqueous-phase oxidation is relatively more important at Utqiaġvik/Barrow. Alert and Pituffik/Thule appear to be more influenced by processes aloft than by processes in the boundary layer. Our models usually selected features related to chemical processing as the main factors influencing MSAp predictions, highlighting the importance of properly simulating oxidation-related processes in numerical models.
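The abstract reports a modest R² (0.29) alongside a much higher Pearson correlation (0.77). One way such a gap can arise is that Pearson correlation ignores offsets and amplitude errors in the predictions, whereas R² penalizes them. The sketch below illustrates this with made-up numbers. It assumes R² is the coefficient of determination, 1 - SS_res / SS_tot, which the abstract does not state, and it is not the authors' code or data; the function names `pearson_r` and `r_squared` are hypothetical.

```python
from math import sqrt


def pearson_r(obs, pred):
    """Pearson correlation: unchanged by shifting or positively rescaling pred."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = sqrt(sum((o - mo) ** 2 for o in obs))
    sp = sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def r_squared(obs, pred):
    """Coefficient of determination (assumed definition): 1 - SS_res / SS_tot."""
    mo = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Made-up "observations" and predictions that track them perfectly in shape
# but with a damped amplitude and an offset (hypothetical values only).
obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
pred = [0.5 * o + 1.0 for o in obs]

r = pearson_r(obs, pred)
r2 = r_squared(obs, pred)
print(f"Pearson r = {r:.2f}, R2 = {r2:.2f}")
```

In this toy case the predictions are perfectly correlated with the observations, yet R² is only about 0.45 because the predicted amplitude is too small. This is only an illustration of how the two metrics can diverge, not a diagnosis of the models in the preprint.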
Competing interests: Eliza Harris is an Editor for ACP.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-3379', Anonymous Referee #1, 30 Dec 2024
The authors developed data-driven models to simulate and predict methanesulfonic acid in Arctic aerosol based on observations from four sites, and compared these models with traditional numerical models. I recommend publication after the following issues are addressed.
1. In the “Abstract”, lines 24 and 30, the input features are described first as ambient conditions and then as sources, chemical processing, and MSA removal. These two descriptions do not match; it would be better to unify the wording.
2. In the “Introduction”, lines 58-60, “which is enzymatically cleaved to produce DMS… in the atmosphere” would be better written as “which is enzymatically cleaved to produce DMS, acting as the main source of marine atmosphere/aerosol”.
3. In the “Introduction”, lines 68-69, the references should be reduced to 2-3 classic or recent papers.
4. In the “Introduction”, line 74 repeats lines 63-64, as does line 277; please make this more concise.
5. In the “Introduction”, paragraph 2, the description of the relationship between DMA and MSA in the Arctic marine environment takes up about 30 lines. The authors should reduce its share of the introduction and pay more attention to describing and comparing the numerical and data-driven models applied in marine research, especially in the Arctic.
6. In “Results” Sec. 3.2, lines 610-622, this paragraph should be moved to the Methods.
7. In “Results” Sec. 3.3, line 743, it is hard to convince readers of “high accuracy” with a maximum R² of 0.29. The claim may hold in comparison with numerical models, but a more appropriate expression would be better. Fig. 9 is good evidence; consider moving this statement to the discussion of Fig. 9.
8. In “Results” Sec. 3.3, line 746, Fig. 4 contains no performance comparison between numerical and data-driven models, only the data-driven models (RF and AM); please confirm.
9. In “Results” Sec. 3.3, line 750, it is confusing to mention Fig. 9 and Fig. 10 here; later figures should not be cited when discussing the present figure.
Citation: https://doi.org/10.5194/egusphere-2024-3379-RC1
RC2: 'Comment on egusphere-2024-3379', Anonymous Referee #2, 18 Feb 2025
Pernov et al. present a data-driven approach to model particulate methanesulfonic acid (MSAp) concentrations in the Arctic using Random Forest (RF) and Additive Models (AM). The work is interesting because it combines several data sources and uses careful feature engineering to capture different environmental conditions. I suggest that the paper be published after the authors address the following points to deepen their discussion:
1. The paper mentions the atmospheric oxidation of dimethyl sulfide (DMS) to form MSAp but does not go into detail about how gas-phase and aqueous-phase oxidation work. I suggest the authors add a brief discussion of these processes and explain how they change under different Arctic conditions.
2. The paper uses RF and AM to analyze the data, but I suggest the authors discuss the strengths of these methods more clearly. For example, RF is useful for capturing non-linear relationships and provides feature importance, while AM offers a simple way to understand how each variable affects MSAp. A clear discussion of these points would help readers see why these methods were chosen.
3. While the authors note the low R² values, I suggest they discuss this further. They could explain which important variables might be missing and how these missing elements or measurement issues might affect the results. I also suggest discussing future improvements, such as using more advanced methods or combining physical models with data-driven approaches.
4. The paper explains how the models work but could do more to show how each feature relates to the physical processes in the Arctic. I suggest the authors compare the RF feature importance with the partial effects from the AM, so readers can see the real-world significance of the results.
Citation: https://doi.org/10.5194/egusphere-2024-3379-RC2
Data sets
Dataset for "Data-driven modeling of environmental factors influencing Arctic methanesulfonic acid aerosol concentrations" Jakob Boyd Pernov, William H. Aeberhard, Michele Volpi, Eliza Harris, Benjamin Hohermuth, and Julia Schmale https://gitlab.renkulab.io/arcticnap/msamodeling
Model code and software
Code for "Data-driven modeling of environmental factors influencing Arctic methanesulfonic acid aerosol concentrations" Jakob Boyd Pernov, William H. Aeberhard, Michele Volpi, Eliza Harris, Benjamin Hohermuth, and Julia Schmale https://gitlab.renkulab.io/arcticnap/msamodeling
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
207 | 76 | 14 | 297 | 28 | 13 | 15 |
Viewed (geographical distribution)
Country | # | Views | % |
---|---|---|---|
United States of America | 1 | 103 | 35 |
Switzerland | 2 | 19 | 6 |
Germany | 3 | 18 | 6 |
United Kingdom | 4 | 17 | 5 |
France | 5 | 15 | 5 |