the Creative Commons Attribution 4.0 License.
VaPOrS v1.0.1: An automated model for estimating vapor pressure of organic compounds using SMILES notation
Abstract. Volatile organic compounds play a significant role in atmospheric chemistry, influencing air quality and climate change. Accurate prediction of their physical properties is essential for understanding their behavior. This paper introduces VaPOrS (Vapor Pressure in Organics via SMILES), a comprehensive tool designed to process SMILES notation of organic compounds, identify key functional groups, and calculate their saturation vapor pressure and enthalpy of vaporization at any specified temperature. While this first study focuses on applying the SIMPOL method for parameterization, VaPOrS is inherently adaptable to other structure-based parameterization approaches, such as group additivity and volatility basis set (VBS) methods, because it extracts from each string the substructure information that is meaningful to property prediction techniques. It can also be extended to any thermodynamic property that relies on structural group-based parameterizations. In its current version, the tool automates the detection of 30 critical structural groups and has been validated against manually counted functional groups and experimental saturation vapor pressure data for a diverse set of compounds. The results demonstrate high accuracy: the tool identifies the same functional groups as the manual count and then promptly provides saturation vapor pressure predictions according to the SIMPOL parameterization. The developed method can be integrated into large-scale simulation models targeting secondary aerosol formation and involving thousands of organic species at once. Thus, the developed tool offers a robust computational approach for research in atmospheric chemistry and environmental science, streamlining the analysis of large collections of organic compounds and aiding in the assessment of their climatic impacts.
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2025-2564', Simon O'Meara, 15 Jul 2025
Bezaatpour et al. present a digital tool (VaPOrS) for estimating pure component saturation vapour pressures, and from this property, the enthalpy of vaporisation. The properties discussed are fundamentally important to understanding aerosols, with significant implications for climate, weather and health. And it is welcome that efforts are being made to further our scientific understanding of this topic. I do hope the authors continue their important work in this area despite my review.
I have a fundamental concern with the submitted paper, which is that it makes an insubstantial contribution to modelling science, making its scientific significance too little to justify publication. Specifically, the referenced UManSysProp tool (Topping et al. 2016) already provides the vapour pressure estimation technique covered by VaPOrS. Then, we ask, do the tools differ significantly in their method to provide these properties? The authors demonstrate in their introduction that there is a variation in method: whilst UManSysProp depends on the OpenBabel package to convert SMILES to SMARTS, which are then parsed, VaPOrS parses the SMILES directly. UManSysProp depends on a self-contained, human-defined library of SMARTS to identify contributing groups (as described in and around Figure 3 of Topping et al. 2016), whilst VaPOrS depends on a self-contained (and, as far as I understand the paper, human-defined) library of SMILES to identify contributing groups. The Introduction of the paper argues that the VaPOrS method could give better control over pattern-matching logic than is possible in UManSysProp. However, I can't see how this is true, as both methods rely on a human to provide comprehensive libraries of relevant patterns (SMILES or SMARTS), and so the theoretical maximum degree of control is the same for both methods. Because this issue of insubstantial modelling significance is so important (justifying my rejection for publication), I do not provide further comments on other aspects of the paper at this stage.
Citation: https://doi.org/10.5194/egusphere-2025-2564-RC1
-
AC1: 'Reply on RC1', Mojtaba Bezaatpour, 04 Aug 2025
We appreciate the reviewer’s concern regarding the novelty and significance of our contribution, particularly in relation to the existing UManSysProp tool. In our original manuscript, we deliberately chose not to emphasize direct comparisons with established methods in order to remain neutral and objective, aiming to allow users and modelers to evaluate tools based on their specific needs. However, in light of the reviewer’s comment challenging the merits of VaPOrS relative to UManSysProp, we find it necessary to clarify and emphasize the methodological and practical strengths of our approach to defend its validity and utility. We respectfully submit that VaPOrS provides important advancements that directly address known limitations of UManSysProp, particularly in reading molecular structural information from a complex, general SMILES representation and subsequently estimating the condensational parameters that are critical for secondary organic aerosol (SOA) modeling. A detailed explanation is provided in the supplementary file uploaded with this response.
-
RC2: 'Reply on AC1', Simon O'Meara, 12 Aug 2025
Many thanks for your comprehensive reply. I appreciate the time and effort that has been dedicated to both the original paper and the response. The evidence presented in the response does demonstrate the significance of VaPOrS, in contrast to my evaluation in my original review. I will contact the editor to ask whether another review, in light of the response, is allowed/wanted. If permitted, I will suggest the supplied response be included as a major revision to the original paper.
Citation: https://doi.org/10.5194/egusphere-2025-2564-RC2
-
RC3: 'Comment on egusphere-2025-2564', Anonymous Referee #2, 04 Sep 2025
This manuscript introduces VaPOrS v1.0.1, a Python-based tool developed to estimate saturation vapor pressure and enthalpy of vaporization for organic compounds using their SMILES representations. A key feature of VaPOrS is its built-in capability to detect functional groups directly from SMILES strings, eliminating the need for external cheminformatics libraries and manual SMARTS definitions. The tool relies on the SIMPOL group contribution method developed by Pankow and Asher for its vapor pressure predictions and, at the moment, is able to recognize only the 30 functional groups needed to apply this method.
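For reference, the SIMPOL.1 group contribution expression is usually written in the form below. The notation is chosen for this note and should be checked against Pankow and Asher (2008): $\nu_{k,i}$ is the number of groups of type $k$ in compound $i$, $k = 0$ is the constant (zeroth) term with $\nu_{0,i} = 1$, and $B_{1,k}, \dots, B_{4,k}$ are the fitted coefficients for group $k$. The enthalpy of vaporization then follows from the temperature dependence of the same sum.

```latex
\begin{align}
  \log_{10} p^{\circ}_{L,i}(T) &= \sum_{k} \nu_{k,i}\, b_k(T), \\
  b_k(T) &= \frac{B_{1,k}}{T} + B_{2,k} + B_{3,k}\, T + B_{4,k} \ln T, \\
  \Delta H_{\mathrm{vap},i}(T)
    &= -R \ln 10 \; \frac{\mathrm{d} \log_{10} p^{\circ}_{L,i}(T)}{\mathrm{d}(1/T)}
     = -R \ln 10 \sum_{k} \nu_{k,i} \left( B_{1,k} - B_{3,k}\, T^{2} - B_{4,k}\, T \right).
\end{align}
```

Once the group counts $\nu_{k,i}$ are known, the prediction is a fixed arithmetic step, which is why the correctness of the group detection determines the correctness of the output.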
The authors validated VaPOrS against the original SIMPOL dataset and demonstrated perfect agreement between the two approaches. Further testing on an external dataset (i.e., the MCM database) showed strong correlation with manually derived SIMPOL predictions and with other established models such as EVAPORATION and Nannoolal. The methodology is sound, and the tool appears robust and computationally efficient, with potential to be expanded further.
I recommend the manuscript for publication after revisions are made to address the concerns outlined below.
I think the main contribution of this work is the development of the functions to detect functional groups. This is done in an efficient way directly in the tool without relying on external libraries. I know how frustrating installing and setting up dependencies between different libraries and tools can be, and I value a self-contained tool that can be adapted to include different methods. So, I think the strongest point of the paper is the SMILES parser and group identification. The authors present a first implementation of the SIMPOL method and highlight that the tool can be expanded to include more group contribution methods to predict saturation vapour pressure. However, this feels somewhat restrictive, as the group recognition framework developed in VaPOrS is broadly applicable and could be adapted to predict a wider range of physicochemical properties (e.g., partition coefficients), not only vapor pressure. I think this point should be stressed more in the manuscript, and the tool should be presented as a general tool for SMILES parsing and the application of group contribution methods. Instead, the authors mainly focus on describing the SIMPOL implementation for the prediction of vapor pressure. This is an established method developed by other scientists. At a first reading it appears that VaPOrS just applies the SIMPOL method without making any contribution of its own, so I understand the comments of the first reviewer. The main contribution of the paper is the efficient automation of fragment recognition, and I think this should be stressed more.
Related to this, since the real novelty is the SMILES parsing functions, I think a substantial validation of the group recognition method is missing from the paper. Section 3.4.1 (MCM data) briefly describes how the SMILES parsing functions have been tested on 126 external compounds. I think this should be one of the main sections of the paper, demonstrating on an external dataset that the functions are able to correctly recognize the functional groups needed by SIMPOL (or any other group contribution method implemented). The authors have provided supplementary material in response to a previous reviewer’s comment, comparing their approach with UManSysProp and highlighting cases where UManSysProp fails to correctly identify certain groups, leading to inaccurate predictions. This comparison is highly relevant and should be integrated into the main text to underscore the robustness and reliability of VaPOrS. The authors criticize the SMARTS pattern recognition in OpenBabel, so a comparison between the fragments identified by VaPOrS and those identified by OpenBabel should be included to highlight the strength of VaPOrS relative to OpenBabel and to justify the development of ad hoc functions in a new method.
Specific comments:
Page 5, line 4, […tools like VaPOrS, enabling…]: the VaPOrS acronym has not been defined at this point.
Page 7, lines 19-21 [In particular, the SMILES string must begin…]: the authors write MUST BEGIN, implying that the SMILES needs to be provided in a specific way. A SMILES for a chemical can be written in many different variations (e.g., canonical vs. kekulized). To be valid, the tool must be able to recognize functional groups across different variations of the same SMILES. Given that SMILES syntax can vary depending on the generation method or canonicalization, it is important to demonstrate that the tool yields consistent fragment counts regardless of input variation. This would strengthen confidence in the robustness of the group recognition algorithm and its suitability for large-scale automated analyses. The manuscript should also clarify whether VaPOrS includes a SMILES standardization step prior to functional group parsing. Standardization is essential to ensure reproducibility in fragment recognition.
Furthermore, the manuscript should address how VaPOrS handles tautomeric variability in SMILES representations. Tautomers are chemically equivalent but structurally distinct forms of the same chemical that can be encoded differently. This variability can significantly impact functional group recognition and, consequently, the accuracy of property predictions. It is unclear whether the authors have tested the tool's consistency across different tautomeric forms of the same compound. I recommend including a discussion on this issue and, if not already performed, conducting a validation study to assess whether VaPOrS yields consistent fragment counts and predictions across tautomeric variants.
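The consistency check requested in the two comments above could be sketched as follows. This is only an illustration: `naive_group_counter` is a deliberately simplistic, hypothetical stand-in and is not the VaPOrS detector, whose interface is not described on this page. The three strings are different valid SMILES spellings of ethanol.

```python
def naive_group_counter(smiles):
    """Toy stand-in for a group detector (NOT VaPOrS).

    It counts the literal substring "CO", so its result depends on how
    the SMILES happens to be written. That order sensitivity is exactly
    what the check below is meant to expose.
    """
    return {"hydroxyl_like": smiles.count("CO")}


def inconsistent_variants(variants, counter):
    """Return the variants whose group counts differ from the first one."""
    reference = counter(variants[0])
    return [s for s in variants[1:] if counter(s) != reference]


# Three equivalent spellings of ethanol.
ethanol_variants = ["CCO", "OCC", "C(O)C"]

# A robust detector should return an empty list here; the toy one does not.
print(inconsistent_variants(ethanol_variants, naive_group_counter))
```

With the real detector passed in place of the toy counter, an empty result for every compound in a test set would demonstrate the requested invariance. The same harness could be run on tautomer pairs, where differing counts may be chemically real and are better reported than treated as failures.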
Page 30, lines 14-17 [Many data points are clustered close to this line…]: this is a subjective comment. A more objective description would use a metric such as R² or the RMSE. Please provide quantitative metrics to describe the correlation.
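As a minimal sketch of the metrics meant here, the snippet below computes R² (as the coefficient of determination relative to the 1:1 line, not the squared Pearson correlation) and the RMSE for paired log10 vapor pressures. The numbers are made up for illustration and are not taken from the manuscript.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative log10(p / atm) values only; NOT data from the paper.
log_p_experimental = [-2.0, -4.0, -6.0, -8.0]
log_p_predicted = [-2.5, -3.5, -6.5, -7.5]

print(r_squared(log_p_experimental, log_p_predicted))
print(rmse(log_p_experimental, log_p_predicted))
```

Reporting both is useful because R² alone can look high over a range spanning many orders of magnitude even when individual predictions are off by a sizeable factor, which the RMSE in log units makes visible.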
Page 32, lines 6-10 [These discrepancies could be attributed to the structural complexity…]: this paragraph concerns the applicability domain of the model. I know the SIMPOL model was not developed by these authors, but could VaPOrS provide an applicability domain? Maybe something related to the group counts? For instance, does the presence of certain groups together, or the presence of too many instances of the same fragment, result in a more uncertain prediction?
Figure 7: the Antoine and SIMPOL methods seem to agree well with each other. However, there are instances where both methods fall far from the experimental line (Decanedioic acid, Hexanamide, Diethyl-peroxide). Please comment.
Figure 11: 3D graphs look good on a computer screen in interactive applications, but on paper they are hard to read. For example, I cannot see the depth along one of the axes. I see that the information on the mass can be interesting; perhaps a 2D correlation between mass and group count would be better for a printed version of the paper.
Citation: https://doi.org/10.5194/egusphere-2025-2564-RC3