Loading [MathJax]/jax/output/HTML-CSS/fonts/TeX/fontdata.js
Preprints
https://doi.org/10.5194/egusphere-2025-1191
https://doi.org/10.5194/egusphere-2025-1191
20 Mar 2025
 | 20 Mar 2025
Status: the discussion of this preprint has been stopped.

Improved vapor pressure predictions using group contribution-assisted graph convolutional neural networks (GC2NN)

Matteo Krüger, Tommaso Galeazzo, Ivan Eremets, Bertil Schmidt, Ulrich Pöschl, Manabu Shiraiwa, and Thomas Berkemeier

Abstract. The vapor pressures (pvap) of organic molecules play a crucial role in the partitioning of secondary organic aerosol (SOA). Given the vast diversity of atmospheric organic compounds, experimentally determining pvap of each compound is unfeasible. Machine Learning (ML) algorithms allow the prediction of physicochemical properties based on complex representations of molecular structure, but their performance crucially depends on the availability of sufficient training data. We propose a novel approach to predict pvap using group contribution-assisted graph convolutional neural networks (GC2NN). The models use molecular descriptors like molar mass alongside molecular graphs containing atom and bond features as representations of molecular structure. Molecular graphs allow the ML model to better infer molecular connectivity compared to methods using other, non-structural embeddings. We achieve best results with an adaptive-depth GC2NN, where the number of evaluated graph layers depends on molecular size. We present two vapor pressure estimation models that achieve strong agreement between predicted and experimentally-determined pvap. The first is a general model with broad scope that is suitable for both organic and inorganic molecules and achieves a mean absolute error (MAE) of 0.67 log-units (R2=0.86). The second model is specialized on organic compounds with functional groups often encountered in atmospheric SOA, achieving an even stronger correlation with the test data (MAE=0.36 log-units, R2=0.97). The adaptive-depth GC2NN models clearly outperform existing methods, including parameterizations and group-contribution methods, demonstrating that graph-based ML techniques are powerful tools for the estimation of physicochemical properties, even when experimental data are scarce.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.
Share
Download
Short summary
This work uses machine learning to predict saturation vapor pressures of...
Share