the Creative Commons Attribution 4.0 License.
S2AS v1.0 and 2D polarity–volatility lumping framework v1.0: automated compound classification and scalable lumping for organic aerosol modelling
Abstract. Advancements in near-explicit chemical reaction mechanisms, such as the Master Chemical Mechanism (MCM) or the Generator of Explicit Chemistry and Kinetics of Organics in the Atmosphere (GECKO-A), have enabled highly detailed simulations of atmospheric chemistry. Such simulations offer a bottom-up approach to accompany and inform laboratory chamber experiments of organic aerosol formation or to model the complex chemistry of mixtures of volatile aerosol precursors for specific tropospheric conditions. These chemical reaction mechanisms, while comprehensive, generate hundreds to millions of organic components, creating computational challenges for subsequent applications in multiphase equilibrium gas–particle partitioning models to predict secondary organic aerosol (SOA) mass concentrations, phase compositions, and hygroscopicity. The wealth of simulated reactions and components also requires substantial simplifications for reduced-complexity representations in large-scale atmospheric models. This study introduces a suite of software tools to automate relevant pure-component property predictions as well as a 2-dimensional (2D) polarity–volatility lumping framework to systematically reduce the complexity of chemical mechanism outputs. We introduce a new polarity metric for use in the 2D framework, a ratio of a component's activity coefficients in water and an organic solvent (hexanediol). This ratio is computed using the Aerosol Inorganic–Organic Mixtures Functional groups Activity Coefficients (AIOMFAC) model. The 2D framework offers grid-based and cluster-based methods to select an adjustable number of surrogate species and offers flexibility in the choice of polarity axis. Our methods utilize the Simplified Molecular Input Line Entry System (SMILES) description of molecular structures. A new tool, SMILES to AIOMFAC subgroups (S2AS), is introduced to automatically generate AIOMFAC-model input files and to handle exception cases consistently. 
We demonstrate the application of our framework using systems of hundreds to thousands of components generated by near-explicit chemical mechanisms. The new framework enables tailored reduced-complexity representations of gas–particle systems.
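As a rough illustration of the grid-based variant described in the abstract, the following Python sketch bins components on a 2D volatility–polarity grid and picks one surrogate per occupied cell. It is not the authors' Fortran implementation. The function name `grid_lump`, the field names, the bin edges, the example values, and the rule of taking the highest-mass member of a cell as its surrogate are all assumptions made for illustration only.

```python
from bisect import bisect_right
from collections import defaultdict


def grid_lump(components, vol_edges, pol_edges):
    """Lump components on a 2D grid (hypothetical helper, not the paper's code).

    components: list of dicts with keys "name", "log10_psat",
                "log10_polarity", and "mass" (all names are assumptions).
    vol_edges, pol_edges: sorted lists of interior bin edges per axis.
    Returns {(i, j): {"surrogate", "mass", "n_members"}} for occupied cells.
    """
    cells = defaultdict(list)
    for comp in components:
        # Index of the bin each coordinate falls into along its axis.
        i = bisect_right(vol_edges, comp["log10_psat"])
        j = bisect_right(pol_edges, comp["log10_polarity"])
        cells[(i, j)].append(comp)

    lumped = {}
    for cell, members in cells.items():
        # Assumed selection rule: the member with the largest mass
        # represents the cell and carries the summed mass of all members.
        surrogate = max(members, key=lambda m: m["mass"])
        lumped[cell] = {
            "surrogate": surrogate["name"],
            "mass": sum(m["mass"] for m in members),
            "n_members": len(members),
        }
    return lumped


# Made-up example data, not taken from the manuscript.
components = [
    {"name": "A", "log10_psat": -6.2, "log10_polarity": 1.5, "mass": 2.0},
    {"name": "B", "log10_psat": -5.8, "log10_polarity": 1.1, "mass": 0.5},
    {"name": "C", "log10_psat": -2.0, "log10_polarity": -0.7, "mass": 1.0},
]
lumped = grid_lump(components, vol_edges=[-4.0], pol_edges=[0.0])
```

With a finer set of edges, the same routine yields more surrogates, which mirrors the adjustable number of surrogate species mentioned in the abstract.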
Status: open (until 27 Mar 2026)
- RC1: 'Comment on egusphere-2025-4673', Anonymous Referee #1, 11 Mar 2026
- CEC1: 'Comment on egusphere-2025-4673 - No compliance with the policy of the journal', Juan Antonio Añel, 11 Mar 2026
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived the code necessary to perform your study mostly on GitHub, and also on some other sites that you cite. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Most of the sites that you list do not fulfil GMD’s requirements for a persistent data archive because:
- They do not appear to have a published policy for data preservation over many years or decades (some flexibility exists over the precise length of preservation, but the policy must exist).
- They do not appear to have a published mechanism for preventing authors from unilaterally removing material. Archives must have a policy which makes removal of materials only possible in exceptional circumstances and subject to an independent curatorial decision.
- They do not appear to issue a persistent identifier such as a DOI or Handle for each precise dataset.
If we have missed a published policy which does in fact address this matter satisfactorily, please post a response linking to it. If you have any questions about this issue, please post them in a reply.
The GMD review and publication process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends, and on ensuring the provenance and replicability of the published papers for years after their publication. Please, therefore, publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (a link and a permanent identifier for it, e.g. a DOI) as soon as possible. We cannot have manuscripts under discussion that do not comply with our policy.
The 'Code and Data Availability' section must also be modified to cite the new repository locations, and corresponding references added to the bibliography.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-4673-CEC1
- AC1: 'Reply on CEC1', Andreas Zuend, 11 Mar 2026
Thank you for pointing out the non-compliance of our original code and data availability section with the GMD journal's Code and Data Policy.
Our intent was to add the Zenodo-archived versions of the introduced models as part of the final article version. However, we now understand that GMD requires such archived versions already with the preprint. Accordingly, we have revised the manuscript text and references and now provide Zenodo archives and pertaining links for the S2AS model as well as the 2D polarity–volatility framework, both referring to the exact model versions used to generate the data and findings in the manuscript. Zenodo archives for the exact UManSysProp code used as well as the data underlying all the figures and tables were already part of the original preprint submission. We further clarify, with added citations, that the Master Chemical Mechanism and related AtChem model are third-party software.
The revised code and data availability section reads as follows:
Code and data availability. The current Python code of the S2AS model and related documentation are available via an online code repository (https://github.com/andizuend/S2AS__SMILES_to_AIOMFAC) under the GNU General Public License v3.0. The exact version of the S2AS model (v1.0) applied to produce the results used in this article is archived on Zenodo under https://doi.org/10.5281/zenodo.18968164 (Amaladhasan and Zuend, 2026b). The current Fortran code of the 2D polarity–volatility framework as well as an associated plotting program and documentation are available via an online repository (https://github.com/andizuend/2D_Polarity_Volatility_lumping) under the GNU General Public License v3.0. The exact version of this framework (v1.0) applied to produce the results used in this article is archived on Zenodo under https://doi.org/10.5281/zenodo.18968224 (Amaladhasan and Zuend, 2026a). The UManSysProp code (v1.0) by Topping et al. (2016a) is available via an online code repository (https://github.com/loftytopping/UManSysProp_public; last access: 18 September 2025). The specific version of the used UManSysProp code, including the adaptations for temperature-dependent pure-component vapour pressure parameterizations used in this work, is archived on Zenodo under https://doi.org/10.5281/zenodo.17172675 (Zuend et al., 2025). The Master Chemical Mechanism (v3.3.1) (Jenkin et al., 1997; Saunders et al., 2003; Jenkin et al., 2003) and the related AtChem online box model are available online via https://mcm.york.ac.uk/MCM/ (last access: 18 September 2025). Predicted SOA mass concentrations and hygroscopicity parameters for various surrogate methods and polarity metrics used in this article are summarized in the electronic Supplement. The data underlying the shown figures and tables, as well as related output from the property prediction tools and the 2D lumping framework, are archived on Zenodo under https://doi.org/10.5281/zenodo.17088391 (Amaladhasan et al., 2025).
The manuscript's list of references now includes the following entries (the first two are new additions):
- Amaladhasan, D. A. and Zuend, A.: 2D polarity–volatility lumping framework, https://doi.org/10.5281/zenodo.18968224, 2026a.
- Amaladhasan, D. A. and Zuend, A.: SMILES to AIOMFAC subgroups (S2AS) tool, https://doi.org/10.5281/zenodo.18968164, 2026b.
- Amaladhasan, D. A., Zuend, A., and Hassan-Barthaux, D.: Alpha-pinene and Toluene SOA System data used in Amaladhasan et al for 2D lumping, https://doi.org/10.5281/zenodo.17088391, data set, 2025.
- Zuend, A., Hassan-Barthaux, D., and Amaladhasan, D. A.: SMILES_to_sat_vapour_pressure, https://doi.org/10.5281/zenodo.17172675, 2025.
Citation: https://doi.org/10.5194/egusphere-2025-4673-AC1
- CEC2: 'Reply on AC1', Juan Antonio Añel, 12 Mar 2026
Dear authors,
Many thanks for addressing this matter so quickly. With the new Code and Data Availability section we can consider the current version of your manuscript in compliance with the policy of the journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-4673-CEC2
- RC2: 'Comment on egusphere-2025-4673', Anonymous Referee #2, 17 Mar 2026
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2025-4673/egusphere-2025-4673-RC2-supplement.pdf
Data sets
SMILES_to_sat_vapour_pressure Andreas Zuend, Dalrin Ampritta Amaladhasan, and Dan Hassan-Barthaux https://doi.org/10.5281/zenodo.17172675
Alpha-pinene and Toluene SOA System data used in Amaladhasan et al for 2D lumping Dalrin Ampritta Amaladhasan, Dan Hassan-Barthaux, and Andreas Zuend https://doi.org/10.5281/zenodo.17088391
Model code and software
S2AS SMILES to AIOMFAC code repository Andreas Zuend, Dalrin Ampritta Amaladhasan, and Dan Hassan-Barthaux https://github.com/andizuend/S2AS__SMILES_to_AIOMFAC
2D polarity–volatility framework repository Andreas Zuend, Dalrin Ampritta Amaladhasan, and Dan Hassan-Barthaux https://github.com/andizuend/2D_Polarity_Volatility_lumping
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 193 | 126 | 33 | 352 | 34 | 16 | 18 |
General Comments
This paper is a foundational work for developing tools to benefit organic aerosol numerical modelling. It is a very well written manuscript that strikes a good balance: it provides enough detail to understand the tools developed, but not so much that the paper becomes a technical document. The description also discusses prior work and how the authors built upon it. The tools developed will be helpful to the community in developing simplified SOA mechanisms for a whole suite of precursors. I recommend that the manuscript be published with the minimal technical changes outlined below.
Specific Comments
- The paper is a little redundant in spots and thus could be shortened (e.g., line 362).
- It would be helpful to provide the reader with an approximate conversion from saturation vapor pressure (in Pa) to the C* variable (in µg m⁻³) that is common in the literature for the VBS. I know this depends on molecular weight. I assumed a molecular weight of 200 g/mol and used the ideal gas law to create an equivalent scale to help me interpret the figures. Maybe the authors could consider labeling an upper x-axis with C*, assuming an average molecular weight.
- The clustering algorithm seems like a novel approach with solid results. Even the simplified topologies give reasonable errors.
- I have one recommendation. Could the authors create an additional table with results from a 1x3 parameter space, where the 3 bins separate mass between water-soluble, partially water-soluble, and water-insoluble components? Can the authors calculate the fraction of product mass and the average ACR in these 3 bins for the two precursors (toluene, α-pinene)? Alternatively, if it seems more appropriate, 4 polarity bins could be used to better resolve the partial-solubility space. I think this information would be very helpful to constrain chemical transport models, most of which only resolve the 1D volatility space. Or maybe the mass concentration data from Figures 8b and 10b could be provided in tables so that readers can manipulate the data to suit their model needs.
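For orientation, the conversion between saturation vapor pressure and C* mentioned in the comments above can be sketched with the ideal gas law, C* = 10⁶ · M · γ · p_sat / (R · T). The 200 g/mol molecular weight is the value assumed in the comment. The temperature of 298.15 K, the activity coefficient of 1 (ideal mixing), and the function names are assumptions added here for illustration, so the numbers are indicative only.

```python
R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def psat_to_cstar(p_sat_pa, molar_mass_g_mol=200.0,
                  temperature_k=298.15, activity_coeff=1.0):
    """Convert saturation vapor pressure (Pa) to C* (ug m^-3).

    Assumes ideal-gas behavior; defaults for temperature and activity
    coefficient are illustrative assumptions, not values from the paper.
    """
    # M * p / (R * T) gives g m^-3; the factor 1e6 converts g to ug.
    return (1.0e6 * molar_mass_g_mol * activity_coeff * p_sat_pa
            / (R_GAS * temperature_k))


def cstar_to_psat(c_star_ug_m3, molar_mass_g_mol=200.0,
                  temperature_k=298.15, activity_coeff=1.0):
    """Inverse conversion: C* (ug m^-3) to saturation vapor pressure (Pa)."""
    return (c_star_ug_m3 * R_GAS * temperature_k
            / (1.0e6 * molar_mass_g_mol * activity_coeff))


# Example: p_sat = 1e-5 Pa corresponds to roughly 0.8 ug m^-3
# under the stated assumptions.
c_star_example = psat_to_cstar(1.0e-5)
```

Under these assumptions, one decade in p_sat maps to one decade in C*, so a secondary axis would be a constant shift of the logarithmic scale for a fixed molecular weight and temperature.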
Technical Correction
- Please check the grammar on line 351 after the words "activity coefficients ..."