the Creative Commons Attribution 4.0 License.
VaPOrS v1.0.1: An automated model for estimating vapor pressure of organic compounds using SMILES notation
Abstract. Volatile organic compounds play a significant role in atmospheric chemistry, influencing air quality and climate change. Accurate prediction of their physical properties is essential for understanding their behavior. This paper introduces VaPOrS (Vapor Pressure in Organics via SMILES), a comprehensive tool designed to process SMILES notation of organic compounds, identify key functional groups, and calculate their saturation vapor pressure and enthalpy of vaporization at any specified temperature. While this first study focuses on applying the SIMPOL method for parameterization, VaPOrS is inherently adaptable to other structure-based parameterization approaches, such as group additivity and volatility basis set (VBS) methods, because it extracts from each string the substructure information that property prediction techniques rely on. It can also be extended to any thermodynamic property that relies on structural group-based parameterizations. In its current version, the tool automates the detection of 30 critical structural groups and has been validated against manually counted functional groups and experimental saturation vapor pressure data for a diverse set of compounds. The results demonstrate high accuracy: the tool identifies the same functional groups as the manual count and then promptly provides saturation vapor pressure predictions according to the SIMPOL parameterization. The developed method can be integrated into large-scale simulation models targeting secondary aerosol formation and involving thousands of organic species at once. Thus, the developed tool offers a robust computational approach for research in atmospheric chemistry and environmental science, streamlining the analysis of large collections of organic compounds and aiding in the assessment of their climatic impacts.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-2564', Simon O'Meara, 15 Jul 2025
AC1: 'Reply on RC1', Mojtaba Bezaatpour, 04 Aug 2025
We appreciate the reviewer’s concern regarding the novelty and significance of our contribution, particularly in relation to the existing UManSysProp tool. In our original manuscript, we deliberately chose not to emphasize direct comparisons with established methods in order to remain neutral and objective, aiming to allow users and modelers to evaluate tools based on their specific needs. However, in light of the reviewer’s comment challenging the merits of VaPOrS relative to UManSysProp, we find it necessary to clarify and emphasize the methodological and practical strengths of our approach to defend its validity and utility. We respectfully submit that VaPOrS provides important advancements that directly address known limitations of UManSysProp, particularly in reading molecular structural information from a complex, general SMILES representation with subsequent estimation of condensational parameters that are critical for secondary organic aerosol (SOA) modeling. A detailed explanation is provided in the supplementary file uploaded with this response.
RC2: 'Reply on AC1', Simon O'Meara, 12 Aug 2025
Many thanks for your comprehensive reply. I appreciate the time and effort that has been dedicated to both the original paper and the response. The evidence presented in the response does demonstrate the significance of VaPOrS, in contrast to my evaluation in my original review. I will contact the editor to ask whether another review, in light of the response, is allowed/wanted. If permitted, I will suggest the supplied response be included as a major revision to the original paper.
Citation: https://doi.org/10.5194/egusphere-2025-2564-RC2
RC3: 'Comment on egusphere-2025-2564', Anonymous Referee #2, 04 Sep 2025
This manuscript introduces VaPOrS v1.0.1, a Python-based tool developed to estimate saturation vapor pressure and enthalpy of vaporization for organic compounds using their SMILES representations. A key feature of VaPOrS is its built-in capability to detect functional groups directly from SMILES strings, eliminating the need for external cheminformatics libraries and manual SMARTS definitions. The tool relies on the SIMPOL group contribution method developed by Pankow and Asher for its vapor pressure predictions and, at the moment, is able to recognize only the 30 functional groups needed to apply this method.
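For readers unfamiliar with the group contribution step, the following is a minimal sketch of how a SIMPOL-style estimate can be assembled once the group counts are known. It assumes the functional form log10 p = Σ ν_k b_k(T) with b_k(T) = B1/T + B2 + B3·T + B4·ln T, and obtains the enthalpy of vaporization from the Clausius-Clapeyron slope of that expression. The function names and, in particular, the coefficient values are hypothetical placeholders and are not the published SIMPOL parameters or the VaPOrS code; the pressure unit is whatever unit the real coefficients were fitted in.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1

# PLACEHOLDER coefficients (B1, B2, B3, B4) per group, invented for illustration.
# They are NOT the published SIMPOL values.
PLACEHOLDER_COEFFS = {
    "zeroeth": (-400.0, 2.0, 0.0, 0.0),   # constant term, counted once
    "carbon": (-70.0, 0.0, 0.0, 0.0),
    "hydroxyl": (-700.0, 0.0, 0.0, 0.0),
}


def group_term(coeffs, temperature):
    """b_k(T) = B1/T + B2 + B3*T + B4*ln(T) for one group."""
    b1, b2, b3, b4 = coeffs
    return b1 / temperature + b2 + b3 * temperature + b4 * math.log(temperature)


def log10_vapor_pressure(counts, temperature, table=PLACEHOLDER_COEFFS):
    """Sum of count * b_k(T) over all groups present in the molecule."""
    return sum(n * group_term(table[g], temperature) for g, n in counts.items())


def enthalpy_of_vaporization(counts, temperature, table=PLACEHOLDER_COEFFS):
    """Delta H_vap = -ln(10) * R * d(log10 p)/d(1/T), in J mol^-1."""
    slope = 0.0
    for g, n in counts.items():
        b1, _b2, b3, b4 = table[g]
        # d b_k / d(1/T) = B1 - B3*T^2 - B4*T
        slope += n * (b1 - b3 * temperature**2 - b4 * temperature)
    return -math.log(10.0) * R * slope


# Hypothetical group counts for an ethanol-like molecule.
ethanol_like = {"zeroeth": 1, "carbon": 2, "hydroxyl": 1}
print(log10_vapor_pressure(ethanol_like, 298.15))
print(enthalpy_of_vaporization(ethanol_like, 298.15))
```

The design point is that the property calculation is independent of how the counts were obtained, so the same two functions would serve a manual count, a SMARTS-based count, or a direct SMILES parser.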
The authors validated VaPOrS against the original SIMPOL dataset and demonstrated perfect agreement between the two approaches. Further testing on an external dataset (i.e., MCM database) showed strong correlation with manually derived SIMPOL predictions and other established models such as EVAPORATION and Nannoolal. The methodology is sound, and the tool appears robust and computationally efficient and has potential to be further expanded.
I recommend the manuscript for publication after revisions are made to address the concerns outlined below.
I think the main contribution of this work is the development of the functions to detect functional groups. This is done in an efficient way directly in the tool without relying on external libraries. I know how frustrating it can be to install and set up dependencies between different libraries and tools, and I value a self-contained tool that can be adapted to include different methods. So, I think the strongest point of the paper is the SMILES parser and group identification. The authors present a first implementation of the SIMPOL method and highlight that the tool can be expanded to include more group contribution methods to predict saturation vapour pressure. However, this feels somewhat restrictive, as the group recognition framework developed in VaPOrS is broadly applicable and could be adapted to predict a wider range of physicochemical properties (e.g., partition coefficients) and not only vapor pressure. I think this point should be stressed more in the manuscript and the tool should be presented as a general tool for SMILES parsing and group contribution method application. Conversely, the authors mainly focus on describing the SIMPOL implementation for the prediction of VP. This is an established method developed by other scientists. At a first reading it appears that VaPOrS just applies the SIMPOL method without making any contribution of its own, thus I understand the comments of the first reviewer. The main contribution of the paper is the efficient automation of fragment recognition, and I think this should be stressed more.
Related to this, since the real novelty is the SMILES parsing functions, I think a substantial validation of the group recognition method is missing in the paper. Section 3.4.1 (MCM data) briefly describes how the SMILES parsing functions have been tested on 126 external compounds. I think this should be one of the main sections of the paper, demonstrating that the functions are able to correctly recognize the functional groups needed by SIMPOL (or any other group contribution method implemented) on an external dataset. The authors have provided supplementary material in response to a previous reviewer’s comment, comparing their approach with UManSysProp and highlighting cases where UManSysProp fails to correctly identify certain groups, leading to inaccurate predictions. This comparison is highly relevant and should be integrated into the main text to underscore the robustness and reliability of VaPOrS. The authors criticize the SMARTS pattern recognition in OpenBabel, so a comparison between fragments identified by VaPOrS and fragments identified by OpenBabel should be included to highlight the strength of VaPOrS relative to OpenBabel and justify the development of ad hoc functions in a new method.
Specific comments:
Page 5, line 4, […tools like VaPOrS, enabling…] The VaPOrS acronym has not been established yet.
Page 7, lines 19-21 [In particular, the SMILES string must begin…], the authors write MUST BEGIN, implying that the SMILES needs to be provided in a specific way. A SMILES for a chemical can be written in many different variations (e.g., canonical vs. kekulized). To be valid, the tool must be able to recognize functional groups even for different variations of the same SMILES. Given that SMILES syntax can vary depending on generation method or canonicalization, it is important to demonstrate that the tool yields consistent fragment counts regardless of input variation. This would strengthen confidence in the robustness of the group recognition algorithm and its suitability for large-scale automated analyses. The manuscript should also clarify whether VaPOrS includes a SMILES standardization step prior to functional group parsing. Standardization is essential to ensure reproducibility in fragment recognition.
Furthermore, the manuscript should address how VaPOrS handles tautomeric variability in SMILES representations. Tautomers are chemically equivalent but structurally distinct forms of the same chemical that can be encoded differently. This variability can significantly impact functional group recognition and, consequently, the accuracy of property predictions. It is unclear whether the authors have tested the tool's consistency across different tautomeric forms of the same compound. I recommend including a discussion on this issue and, if not already performed, conducting a validation study to assess whether VaPOrS yields consistent fragment counts and predictions across tautomeric variants.
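The consistency check requested in the two comments above could be organised along the following lines. This is only a sketch: `check_variant_consistency` and `naive_counter` are hypothetical names, and the counter is a deliberately naive stand-in for a real group-recognition function (it is not the VaPOrS parser and would, for example, miscount chlorine as carbon). The three ethanol strings are alternative valid SMILES for the same molecule.

```python
def check_variant_consistency(count_groups, variants):
    """Apply a group counter to several SMILES of one compound.

    Returns (all_agree, counts_per_variant), where all_agree is True only
    if every variant yields the same counts as the first one.
    """
    counts = {smiles: count_groups(smiles) for smiles in variants}
    reference = counts[variants[0]]
    all_agree = all(c == reference for c in counts.values())
    return all_agree, counts


def naive_counter(smiles):
    # Placeholder counter for illustration only: tallies the letters C and O.
    # A real test would pass the tool's own group-recognition function here.
    return {"C": smiles.count("C"), "O": smiles.count("O")}


# Three different but equivalent SMILES strings for ethanol.
ethanol_variants = ["CCO", "OCC", "C(O)C"]
ok, per_variant = check_variant_consistency(naive_counter, ethanol_variants)
print(ok, per_variant)
```

The same harness could be run over tautomeric forms of a compound; in that case differing counts would be reported rather than hidden, which is the information the comment asks the authors to discuss.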
Page 30, lines 14-17 [Many data points are clustered close to this line…], this is a subjective comment. A more objective description would use a metric such as R² or the RMSE. Please provide some quantitative metrics to describe your correlation.
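As an illustration of the metrics meant here, the sketch below computes the RMSE and the coefficient of determination of predictions against reference values, both taken as log10 vapour pressures so that the error is measured in orders of magnitude. R² is evaluated relative to the 1:1 line rather than a fitted regression line, which is an assumption about what is most useful for this comparison. The numbers are invented for illustration and are not taken from the manuscript.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between paired values."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(predicted, reference):
    """Coefficient of determination of predictions against the 1:1 line."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for p, r in zip(predicted, reference))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Invented log10(p) values, for illustration only.
log_p_reference = [-2.0, -4.5, -6.1, -8.3]
log_p_predicted = [-2.2, -4.1, -6.4, -8.0]
print(rmse(log_p_predicted, log_p_reference))
print(r_squared(log_p_predicted, log_p_reference))
```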
Page 32, lines 6-10 [These discrepancies could be attributed to the structural complexity…] this paragraph concerns the applicability domain of the model. I know the SIMPOL model has not been developed by these authors, but could VaPOrS provide an applicability domain? Maybe something related to the group counts? For instance, does the presence of certain groups together, or the presence of too many instances of the same fragment, result in more uncertain predictions?
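One simple way such a count-based applicability domain could be expressed is sketched below: record the largest count of each group seen in a reference (fitting) set and flag any query compound that exceeds it or contains a group never seen. This is a hypothetical illustration of the idea raised in the comment, not a feature of VaPOrS or SIMPOL, and the group names and counts are invented.

```python
def max_counts(reference_set):
    """Largest count of each group observed across a reference set."""
    limits = {}
    for counts in reference_set:
        for group, n in counts.items():
            limits[group] = max(limits.get(group, 0), n)
    return limits


def outside_domain(counts, limits):
    """Groups whose count exceeds the reference maximum (or is unseen)."""
    return {g: n for g, n in counts.items() if n > limits.get(g, 0)}


# Invented reference compounds and query, for illustration only.
reference = [{"hydroxyl": 1, "carbon": 4}, {"hydroxyl": 2}]
limits = max_counts(reference)
query = {"hydroxyl": 3, "carbon": 2, "nitro": 1}
print(outside_domain(query, limits))
```

An empty result would mean the query stays within the counts covered by the reference set; it says nothing about combinations of groups, which would need a separate check.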
Figure 7, Antoine and SIMPOL methods seem to give a good agreement. However, there are instances where the two methods seem far from the experimental line (Decanedioic acid, Hexanamide, Diethyl-peroxide). Please comment.
Figure 11, 3D graphs look cool on a computer screen in interactive applications. On paper they are rather hard to read. For example, I cannot see the depth on one of the axes. I see that the information on the Mass can be interesting; perhaps a 2D correlation between Mass and the group counts would be better for a printed version of the paper.
Citation: https://doi.org/10.5194/egusphere-2025-2564-RC3
EC1: 'Comment on egusphere-2025-2564', Rolf Sander, 16 Oct 2025
I have received another reviewer comment:
"i) wherever an estimated vapour pressure is provided, the corresponding temperature should be stated in the main text and Figure/Table caption, for example the Figures in Section 4.2 and Tables in Section 4.3
ii) further clarity is needed when vapour pressures from 'SIMPOL' or the 'SIMPOL method' are provided. If these have been calculated manually, this should be stated in each sub-section of results, including in Sections 3.1 and 4.3. Or, if these have been obtained from the original Pankow and Asher paper (e.g. using the Antoine coefficients in their supplement), this should be stated (including where in the paper they come from). If there is an alternative source, then this should be stated. Otherwise the assumption is that values have come directly from the original paper, which could mean a different interpretation than for manual generation."
As I've already decided to accept your manuscript after minor revisions, I do not require that you take the new comments into account. However, if you find them useful, you're very welcome to adjust your manuscript accordingly.
Citation: https://doi.org/10.5194/egusphere-2025-2564-EC1
EC2: 'Comment on egusphere-2025-2564', Rolf Sander, 16 Oct 2025
I have received another reviewer comment. Again: As I've already decided to accept your manuscript after minor revisions, I do not require that you take the new comments into account. However, if you find them useful, you're very welcome to adjust your manuscript accordingly.
"There is a degree of circularity that arises from the same authors writing the matching patterns for the VaPOrS tool and manually estimating vapour pressures, when both are based on their interpretation of the SIMPOL rules. Ideally the VaPOrS estimates would be compared against independent estimates, e.g. from the original SIMPOL paper, though unfortunately that paper does not seem to supply the estimates in tabulated form.
But there is some interpretation of the SIMPOL rules needed, and some of the chemicals presented in Section 4.3 exemplify the point. E.g. dimethyl-hydroxylamine is strictly a hydroxylamine, not an amine, though my calculations show that the authors have included the secondary amine group to achieve the manually estimated (SIMPOL) vapour pressure. And to be fair, Table 1C of the Pankow and Asher paper lists this molecule as an amine, suggesting a consistency with the intended meaning of this group.
The same molecule includes a hydroxyl group bonded to a nitrogen atom, but both Table 5 of the Pankow and Asher paper and Section 2.1.5 of the paper under review refer to the hydroxyl group to be considered for the group contribution as an alkyl hydroxyl, which my understanding means the hydroxyl group bonded to a non-aromatic carbon atom. Arguably because the bond is to nitrogen rather than carbon the hydroxyl group should not contribute to the estimated vapour pressure in this case, but my calculations show that the authors have included it for their manual SIMPOL estimate.
I think the authors need to acknowledge at least these points of interpretation in the paper, plus any other points where interpretation has been needed for the SIMPOL rules."
Citation: https://doi.org/10.5194/egusphere-2025-2564-EC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 2,093 | 51 | 22 | 2,166 | 20 | 30 |
Bezaatpour et al. present a digital tool (VaPOrS) for estimating pure component saturation vapour pressures, and from this property, the enthalpy of vaporisation. The properties discussed are fundamentally important to understanding aerosols, with significant implications for climate, weather and health. And it is welcome that efforts are being made to further our scientific understanding of this topic. I do hope the authors continue their important work in this area despite my review.
I have a fundamental concern with the submitted paper, which is that it makes an insubstantial contribution to modelling science, making its scientific significance too little to justify publication. Specifically, the referenced UManSysProp tool (Topping et al. 2016) already provides the vapour pressure estimation technique covered by VaPOrS. Then, we ask, do the tools differ significantly in their method to provide these properties? The authors demonstrate in their introduction that there is a variation in method, namely that whilst UManSysProp depends on the OpenBabel package to convert SMILES to SMARTS, which are then parsed, VaPOrS parses the SMILES directly. UManSysProp depends on a self-contained, human-defined library of SMARTS to identify contributing groups (as described in and around Figure 3 of Topping et al. 2016), whilst VaPOrS depends on a self-contained (and, as far as I understand the paper, human-defined) library of SMILES to identify contributing groups. The Introduction of the paper argues that the VaPOrS method could give better control over pattern-matching logic than is possible in UManSysProp; however, I can't see how this is true, as both methods rely on a human to provide comprehensive libraries of relevant patterns (SMILES or SMARTS), and so the theoretical maximum degree of control is the same for both methods. Because this issue of insubstantial modelling significance is so important (justifying my rejection for publication), I do not provide further comments on other aspects of the paper at this stage.