Data-driven modeling of environmental factors influencing Arctic methanesulfonic acid aerosol concentrations
Abstract. Natural aerosol components such as particulate methanesulfonic acid (MSAp) play an important role in the Arctic climate. However, numerical models struggle to reproduce MSAp concentrations and seasonality. Here we present an alternative data-driven methodology for modeling MSAp at four High Arctic stations (Alert, Gruvebadet, Pituffik/Thule, and Utqiaġvik/Barrow). In our approach, we create input features that account for the ambient conditions during atmospheric transport (e.g., temperature, radiation, and cloud cover) for use in two data-driven models: a random forest (RF) regressor and an additive model (AM). The most important features were selected through automatic selection procedures, and their relationships with the MSAp model output were investigated. Although the overall performance of our data-driven models on test data is modest (max. R² = 0.29), the models capture variability in the data well (max. Pearson correlation coefficient = 0.77), outperform the current numerical models and reanalysis products, and produce physically interpretable results.
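A modest coefficient of determination alongside a high Pearson correlation is not contradictory: the correlation measures only how well predictions co-vary with observations, whereas R² also penalizes errors in amplitude and offset. The following minimal sketch illustrates this with synthetic numbers. The values, the damping factor, and the function names `r2_score` and `pearson_r` are assumptions made for illustration; they are not data or code from this study, and the sketch does not claim to identify why the two metrics differ here.

```python
import numpy as np


def r2_score(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    return np.corrcoef(obs, pred)[0, 1]


# Synthetic "observed" concentrations (arbitrary units, illustration only).
obs = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 3.0, 1.5, 0.8])

# Hypothetical predictions that track the observations perfectly in shape
# but reproduce only 30 % of the amplitude around the mean.
damping = 0.3
pred = obs.mean() + damping * (obs - obs.mean())

print(f"Pearson r = {pearson_r(obs, pred):.2f}")
print(f"R^2       = {r2_score(obs, pred):.2f}")
```

In this constructed case the predictions are a linear function of the observations, so the correlation is 1, while R² equals 1 − (1 − 0.3)² = 0.51 because the peaks are underestimated.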
The data-driven models selected features related to the sources, chemical processing, and removal of MSAp, with specific differences between stations. The seasonal cycles and selected features suggest that gas-phase oxidation is relatively more important during peak concentration months at Alert, Gruvebadet, and Pituffik/Thule, while aqueous-phase oxidation is relatively more important at Utqiaġvik/Barrow. Alert and Pituffik/Thule appear to be influenced more by processes aloft than by processes in the boundary layer. Our models usually selected features related to chemical processing as the main factors influencing MSAp predictions, highlighting the importance of properly simulating oxidation-related processes in numerical models.