https://doi.org/10.5194/egusphere-2025-4697
29 Oct 2025
Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

Evaluating machine learning model performance in a two-step colocation process for TVOC and BTEX sensor calibration

Caroline Frischmon, Jack Porter, Ethan Balagopalan, William Senga, Jill Johnston, and Michael Hannigan

Abstract. Calibration of low-cost air quality sensors (LCSs) for total volatile organic compound (TVOC) and benzene, toluene, ethylbenzene, and xylenes (BTEX) quantification remains challenging due to the sensors' cross-sensitivity to temperature and humidity and their tendency to drift over time. In this study, we aimed to improve TVOC and BTEX metal oxide sensor calibration using a two-step colocation strategy. This strategy made it possible to develop the calibration model under environmental conditions closely matching those of the field, which is essential for model transferability from colocation to field conditions. The approach also addressed intra-sensor variability and drift in the harmonization step. In addition to TVOC and BTEX, we applied the two-step colocation process to nitrogen dioxide (NO2) electrochemical sensors to demonstrate the broader applicability of our approach beyond TVOC and BTEX quantification.

Next, we compared the performance of multiple machine learning models, including ridge, lasso, random forest, gradient boosting, extreme gradient boosting, support vector regression, and linear regression, to investigate the optimal model choice for calibration. We found that no single model performed best across all pollutants. For example, gradient boosting excelled at capturing peak TVOC concentrations, while linear regression performed best for BTEX. Conversely, linear regression was the worst-performing model for NO2. Overall, the models achieved satisfactory root mean square error (RMSE) values of around 40–50 ppb for TVOC, 1.25–1.75 ppb for BTEX, and 4–6 ppb for NO2. However, all models also overestimated baseline concentrations and underestimated peaks. The severity of this bias depended on the reference concentration distribution, with the most severe peak underestimation occurring in the more heavily skewed TVOC and BTEX data. The systematic bias at baseline and peak concentrations was not evident in the overall mean bias error, which was near zero for all pollutants. This result underscores the need to evaluate model performance across the entire concentration distribution. Finally, we found that calibration performance was sensitive to the choice of training and testing data split. Future research could seek to optimize the training and testing split to ensure robust model transferability to field data.
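
The point that a near-zero overall mean bias error can hide opposite biases at baseline and peak concentrations can be illustrated with a small sketch. This is not the authors' code or data: the function name `binned_bias`, the quantile bin edges, and the synthetic right-skewed "reference" series are all assumptions chosen only for illustration, and the "model" is a made-up predictor that regresses toward the mean.

```python
import numpy as np


def binned_bias(reference, predicted, quantiles=(0.0, 0.25, 0.75, 1.0)):
    """Mean bias error (predicted - reference), overall and per reference-quantile bin.

    Hypothetical helper for illustration; the bin edges are an arbitrary choice.
    """
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = predicted - reference
    edges = np.quantile(reference, quantiles)

    results = {"overall": float(error.mean())}
    last = len(edges) - 2
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        # Include the right edge only in the final bin so the maximum is counted once.
        upper = (reference <= hi) if i == last else (reference < hi)
        mask = (reference >= lo) & upper
        label = f"q{quantiles[i]:.2f}-q{quantiles[i + 1]:.2f}"
        results[label] = float(error[mask].mean())
    return results


# Synthetic, right-skewed reference concentrations (assumed, not measured data).
rng = np.random.default_rng(0)
reference = rng.lognormal(mean=3.0, sigma=0.8, size=5000)

# Assumed toy model: a slope below 1 around the mean pulls low values up
# and high values down, plus a little noise.
predicted = (
    reference.mean()
    + 0.6 * (reference - reference.mean())
    + rng.normal(0.0, 2.0, size=reference.size)
)

bias = binned_bias(reference, predicted)
for label, value in bias.items():
    print(f"{label}: {value:+.2f}")
```

In this toy setup the overall bias stays close to zero while the lowest-quantile bin shows positive bias and the highest-quantile bin shows negative bias, which is the pattern a single summary statistic would miss.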

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 04 Dec 2025)

Latest update: 29 Oct 2025
Short summary
We implemented a two-step colocation strategy to improve the transferability of sensor calibration models to field conditions, particularly for total volatile organic compound (TVOC) and benzene, toluene, ethylbenzene, and xylenes (BTEX) sensors. In our comparison of various calibration models, we found that the models generally performed well, although they tended to overpredict baseline concentrations and underpredict peaks. This work provides important insights into TVOC and BTEX sensor calibration.