Preprints
https://doi.org/10.5194/egusphere-2025-4697
29 Oct 2025

Evaluating machine learning model performance in a two-step colocation process for TVOC and BTEX sensor calibration

Caroline Frischmon, Jack Porter, Ethan Balagopalan, William Senga, Jill Johnston, and Michael Hannigan

Abstract. Calibration of low-cost air quality sensors (LCSs) for total volatile organic compound (TVOC) and benzene, toluene, ethylbenzene, and xylenes (BTEX) quantification remains challenging due to the sensors' cross-sensitivity to temperature and humidity and their tendency to drift over time. In this study, we aimed to improve TVOC and BTEX metal oxide sensor calibration using a two-step colocation strategy. This strategy made it possible to develop the calibration model under environmental conditions closely matching those of the field, which is essential for model transferability from colocation to field conditions. The approach also addressed intra-sensor variability and drift in the harmonization step. In addition to TVOC and BTEX, we applied the two-step colocation process to nitrogen dioxide (NO2) electrochemical sensors to demonstrate the broader applicability of our approach beyond TVOC and BTEX quantification.
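The abstract does not spell out the mechanics of the two steps, so the following is only a minimal sketch of one way such a workflow could be organized. It assumes a harmonization step that maps each field sensor onto a designated primary sensor, followed by a calibration step that maps the primary sensor onto a reference instrument. The function names, the single-variable linear fits, and all numbers are illustrative assumptions, not the authors' implementation; a realistic model would also include temperature and humidity as predictors.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apply_line(coeffs, x):
    """Apply a fitted (slope, intercept) pair to a list of values."""
    slope, intercept = coeffs
    return [slope * xi + intercept for xi in x]


# Step 1 (harmonization, assumed form): while a field sensor and the primary
# sensor sample the same air, learn a map from field signal to primary signal.
primary_raw = [1.0, 2.0, 3.0, 4.0]   # synthetic primary-sensor signal
field_raw = [1.5, 3.5, 5.5, 7.5]     # synthetic field-sensor signal
harmonize = fit_line(field_raw, primary_raw)

# Step 2 (calibration, assumed form): while the primary sensor sits next to a
# reference instrument, learn a map from primary signal to concentration.
reference_ppb = [12.0, 22.0, 32.0, 42.0]  # synthetic reference values
calibrate = fit_line(primary_raw, reference_ppb)

# Deployment: harmonize new field readings, then apply the calibration.
new_field_raw = [2.5, 6.5]
harmonized = apply_line(harmonize, new_field_raw)
estimated_ppb = apply_line(calibrate, harmonized)
print(estimated_ppb)
```

Chaining the two fits means only the primary sensor needs to be placed next to the reference instrument, while every other sensor is corrected for unit-to-unit differences in the first step.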

Next, we compared the performance of multiple machine learning models, including ridge, lasso, random forest, gradient boosting, extreme gradient boosting, support vector regression, and linear regression, to investigate the optimal model choice for calibration. We found that no single model performed best across all pollutants. For example, gradient boosting excelled at capturing peak TVOC concentrations, while linear regression performed best for BTEX. Conversely, linear regression was the worst-performing model for NO2. Overall, the models showed satisfactory root mean square error (RMSE) values of around 40–50 ppb for TVOC, 1.25–1.75 ppb for BTEX, and 4–6 ppb for NO2. However, all models also overestimated baseline concentrations and underestimated peaks. The severity of this bias depended on the reference concentration distribution, with the most severe peak underestimation occurring in the more heavily skewed TVOC and BTEX data. The systematic bias at baseline and peak concentrations was not evident in the overall mean bias error, which was near zero for all pollutants. This result underscores the need to evaluate model performance across the entire concentration distribution. Finally, we found that calibration performance was sensitive to the choice of training and testing data split. Future research could seek to optimize the training and testing split to ensure robust model transferability to field data.
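To illustrate why a near-zero overall mean bias error can hide baseline overestimation and peak underestimation, the sketch below computes the bias both overall and within equal-count bins of the reference concentration. The data are synthetic and the predictions are deliberately shrunk toward the mean; the helper names and the choice of three bins are assumptions made for illustration, not the evaluation procedure used in the paper.

```python
import math


def rmse(ref, pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(ref, pred)) / len(ref))


def mbe(ref, pred):
    """Mean bias error; positive values mean the model overestimates."""
    return sum(p - r for r, p in zip(ref, pred)) / len(ref)


def binned_mbe(ref, pred, n_bins=3):
    """Mean bias error within equal-count bins of the sorted reference."""
    pairs = sorted(zip(ref, pred))
    size = len(pairs) // n_bins
    result = []
    for i in range(n_bins):
        start = i * size
        # The last bin absorbs any remainder.
        end = (i + 1) * size if i < n_bins - 1 else len(pairs)
        chunk = pairs[start:end]
        result.append(sum(p - r for r, p in chunk) / len(chunk))
    return result


# Synthetic, right-skewed reference data (ppb): mostly baseline, a few peaks.
reference = [10.0] * 6 + [20.0] * 4 + [100.0] * 2
mean_ref = sum(reference) / len(reference)

# Hypothetical model output that regresses halfway toward the mean.
predicted = [mean_ref + 0.5 * (r - mean_ref) for r in reference]

overall_bias = mbe(reference, predicted)
bias_by_bin = binned_mbe(reference, predicted)
print(f"RMSE: {rmse(reference, predicted):.2f} ppb")
print(f"Overall MBE: {overall_bias:.2f} ppb")
print("MBE by bin, lowest to highest:", [round(b, 2) for b in bias_by_bin])
```

In this toy case the overall bias cancels to zero while the lowest bin is biased high and the highest bin is biased low, which is the pattern a single summary statistic would miss.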

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Journal article(s) based on this preprint

04 May 2026
Evaluating machine learning model performance in a two-step colocation process for TVOC and BTEX sensor calibration
Caroline Frischmon, Jack Porter, Ethan Balagopalan, William Senga, Jill Johnston, and Michael Hannigan
Atmos. Meas. Tech., 19, 2923–2939, https://doi.org/10.5194/amt-19-2923-2026, 2026

Interactive discussion

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2025-4697', Anonymous Referee #1, 18 Nov 2025
    • AC1: 'Reply on RC1', Caroline Frischmon, 26 Mar 2026
  • RC2: 'Comment on egusphere-2025-4697', Anonymous Referee #2, 05 Mar 2026
    • AC2: 'Reply on RC2', Caroline Frischmon, 26 Mar 2026


Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
AR by Caroline Frischmon on behalf of the Authors (27 Mar 2026)
ED: Referee Nomination & Report Request started (07 Apr 2026) by Albert Presto
RR by Anonymous Referee #1 (07 Apr 2026)
RR by Anonymous Referee #2 (15 Apr 2026)
ED: Publish as is (23 Apr 2026) by Albert Presto
AR by Caroline Frischmon on behalf of the Authors (23 Apr 2026)


Viewed

Total article views: 1,387 (including HTML, PDF, and XML)
  • HTML: 828
  • PDF: 480
  • XML: 79
  • Total: 1,387
  • Supplement: 138
  • BibTeX: 109
  • EndNote: 95
[Figures: views and downloads, and cumulative views and downloads, calculated since 29 Oct 2025]

Viewed (geographical distribution)

Total article views: 1,384 (including HTML, PDF, and XML), of which 1,384 have a defined geography and 0 are of unknown origin.
Latest update: 13 Jun 2026

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary
We implemented a two-step colocation strategy to improve the transferability of sensor calibration models to field conditions, particularly for total volatile organic compound (TVOC) and benzene, toluene, ethylbenzene, and xylenes (BTEX) sensors. In our comparison of various calibration models, we found that the models generally performed well, although they tended to overpredict baseline concentrations and underpredict peaks. This work provides important insights on TVOC and BTEX sensor calibration.