the Creative Commons Attribution 4.0 License.
Evaluating machine learning model performance in a two-step colocation process for TVOC and BTEX sensor calibration
Abstract. Calibration of low-cost air quality sensors (LCSs) for total volatile organic compound (TVOC) and benzene, toluene, ethylbenzene, and xylenes (BTEX) quantification remains challenging due to the sensors' cross-sensitivity to temperature and humidity and their tendency to drift over time. In this study, we aimed to improve TVOC and BTEX metal oxide sensor calibration using a two-step colocation strategy. This strategy made it possible to develop the calibration model under environmental conditions closely matching those of the field, which is essential for model transferability from colocation to field conditions. The approach also addressed inter-sensor variability and drift in the harmonization step. In addition to TVOC and BTEX, we applied the two-step colocation process to nitrogen dioxide (NO2) electrochemical sensors to demonstrate the broader applicability of our approach beyond TVOC and BTEX quantification.
Next, we compared the performance of multiple machine learning models, including ridge, lasso, random forest, gradient boosting, extreme gradient boosting, support vector regression, and linear regression, to investigate the optimal model choice for calibration. We found that no single model performed best across all pollutants. For example, gradient boosting excelled at capturing peak TVOC concentrations, while linear regression performed best for BTEX. Conversely, linear regression was the worst-performing model for NO2. Overall, the models showed satisfactory RMSE around 40–50 ppb for TVOC, 1.25–1.75 ppb for BTEX, and 4–6 ppb for NO2. However, all models also overestimated baseline concentrations and underestimated peaks. The severity of this bias depended on the reference concentration distribution, with the most severe peak underestimation occurring in the more heavily skewed TVOC and BTEX data. The systematic bias at baseline and peak concentrations was not evident in the overall mean bias error, which was near zero for all pollutants. This result underscores the need to evaluate model performance across the entire concentration distribution. Finally, we found that calibration performance was sensitive to the choice of training and testing data split. Future research could seek to optimize the training and testing split to ensure robust model transferability to field data.
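The evaluation pitfall noted in the abstract — a near-zero overall mean bias error masking systematic baseline overestimation and peak underestimation — can be illustrated with a short sketch. The skewed data, the shrink-toward-the-mean predictor, and the quartile cutoffs below are hypothetical illustrations, not the study's models or measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical skewed "reference" concentrations (ppb), loosely TVOC-like
ref = rng.lognormal(mean=2.0, sigma=1.0, size=1000)
# Hypothetical calibration that shrinks predictions toward the mean
pred = 0.5 * ref + 0.5 * ref.mean()

mbe = np.mean(pred - ref)                    # overall mean bias error: ~0 here
rmse = np.sqrt(np.mean((pred - ref) ** 2))

low = ref < np.percentile(ref, 25)           # baseline quartile
high = ref > np.percentile(ref, 75)          # peak quartile
baseline_bias = np.mean(pred[low] - ref[low])  # > 0: baseline overestimated
peak_bias = np.mean(pred[high] - ref[high])    # < 0: peaks underestimated
```

Even though `mbe` comes out essentially zero, the two quartile-level biases are large and opposite in sign, which is why metrics stratified across the concentration distribution are needed.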
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-4697', Anonymous Referee #1, 18 Nov 2025
AC1: 'Reply on RC1', Caroline Frischmon, 26 Mar 2026
We are grateful for the feedback received from the reviewer and have responded to their comments below.
1. It would be helpful to explicitly list the MOX sensors and NO₂ sensors used in this study, so readers can quickly identify the instrumentation without searching through the text.
- Thank you for this suggestion. We added a table of sensors (Table 1) as well as specific sensor names in the abstract.
2. The reviewer is interested in whether BTEX and tVOC are naturally correlated. If they are, how would changes in this correlation influence the model performance? Some clarification or analysis on this point would strengthen the interpretation.
- In response to this comment, we assessed the correlation between BTEX and tVOC in the reference data using the Pearson correlation coefficient (R) and did not find a strong correlation between the two (R = 0.54). For comparison, the correlation between BTEX and NO2 was also 0.54, and the correlation between TVOC and NO2 was 0.61. Some correlation is expected because all pollutant concentrations are influenced by diurnal trends in dispersion related to boundary layer height.
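The correlation check described in this reply can be reproduced in a few lines. The arrays below are hypothetical time-aligned placeholders, not the study's reference data.

```python
import numpy as np

# Hypothetical time-aligned reference concentrations (ppb)
btex = np.array([1.2, 0.8, 2.5, 3.1, 1.0, 4.2, 2.0, 1.5])
tvoc = np.array([40.0, 35.0, 90.0, 70.0, 50.0, 120.0, 60.0, 45.0])

# Pearson correlation coefficient from the 2x2 correlation matrix
r = np.corrcoef(btex, tvoc)[0, 1]
```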
3. Providing a time-series plot showing how sensor readings and reference instrument measurements evolve over time would be necessary. Including an example of how training and testing datasets are selected, especially how temporal blocks are separated, would help demonstrate that autocorrelation issues are addressed and that overfitting is avoided.
- We added an example test/train split time series in Figure 4. We appreciate this suggestion as it provides an opportunity to clarify the test/train split process and to show how the reference data and secondary standard evolve over time, especially with seasonal differences. For example, BTEX and TVOC concentrations tended to be higher in the middle of the study period and lower at the start and end. Figure 5 shows how the field pods evolve over time by comparing the pre- and post-harmonization results. We emphasize here that Figure 4 shows only one test/train split out of 20. We used 20 runs to assess the robustness of the models. In each run, the starting point of the first test section was randomly selected, and the second section always started half the total time series length away from the first section to ensure the sections cover different conditions. This information is provided in the second paragraph of Section 2.3.
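A minimal sketch of the two-section test/train split described in this reply follows. Only the random start of the first section and the half-series offset of the second come from the reply; the series length, section length, and the wrap-around at the end of the series are our assumptions for illustration.

```python
import numpy as np

def two_section_test_indices(n, section_len, rng):
    """Pick two test sections: one at a random start, and a second offset
    by half the series length (wrapping past the end of the series)."""
    start1 = rng.integers(0, n)
    start2 = (start1 + n // 2) % n
    idx1 = np.arange(start1, start1 + section_len) % n
    idx2 = np.arange(start2, start2 + section_len) % n
    return np.concatenate([idx1, idx2])

rng = np.random.default_rng(42)
n = 1000           # hypothetical number of time steps
section_len = 100  # hypothetical test-section length (10% each)
test_idx = two_section_test_indices(n, section_len, rng)
train_idx = np.setdiff1d(np.arange(n), test_idx)  # remaining points train
```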
4. Is it feasible to report the importance of variables (mainly raw sensor values used as input) for each model? For Lasso, coefficients provide direct interpretability, but for other models (e.g., tree-based or ensemble methods), presenting variable-importance metrics would enhance transparency and allow readers to better understand the drivers behind model predictions.
- The reviewer’s suggestion to provide information on feature importance is helpful to clarify which sensors drive calibration model predictions. We originally only shared which features Lasso removed from the model, which can help to show which features are unimportant but does little to explain which are important out of those left in the model. Thus, we added a figure that displays the feature importance for the random forest, gradient boosting, and extreme gradient boosting models. This plot reveals that the importance of features is generally consistent across the ensemble decision tree-based methods. Discussion of this plot was added to Section 3.3.
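For readers unfamiliar with how tree-ensemble importances are obtained, a short sketch follows. The synthetic data and the feature names (`mox_signal`, `temperature`, `humidity`) are hypothetical stand-ins, not the study's actual inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Hypothetical predictors: raw sensor signal, temperature, humidity
X = rng.normal(size=(500, 3))
# Target dominated by the first predictor, weakly tied to the second
y = 2.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; the dominant predictor ranks first
names = ["mox_signal", "temperature", "humidity"]
importances = dict(zip(names, model.feature_importances_))
```

The same `feature_importances_` attribute exists on scikit-learn's gradient boosting estimator, so the comparison across ensemble models is straightforward.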
Citation: https://doi.org/10.5194/egusphere-2025-4697-AC1
RC2: 'Comment on egusphere-2025-4697', Anonymous Referee #2, 05 Mar 2026
The manuscript presents a practical study of the calibration of low-cost sensors for TVOC, BTEX, and NO2. I believe the calibration of TVOC and BTEX is worth exploring given their complex nature, impacts on human health and atmospheric chemistry, and rarity in low-cost sensing studies. I have a few comments for the authors to consider:
1. In the abstract, it is not clear what you mean by referring to a 'two-step colocation strategy'. It is better to briefly introduce its principles in one or two sentences for non-experts.
2. Line 66, I believe the harmonization is to address 'inter-sensor variability' (between different sensors) instead of 'intra-sensor variability' (deterioration of the same sensor due to drifting and ageing).
3. In the calibration models for TVOC and BTEX you include metal oxide signals. Is metal oxide also measured by your low-cost sensors, or is it a reading from the reference station? If it is from the reference station, how do you obtain concurrent and colocated metal oxide levels?
4. It seems you do not consider cross-sensitivity for NO2 sensors, despite the fact that there is known cross-sensitivity between NO2, NO, and Ozone signals. For TVOC and BTEX you only considered metal oxide for cross-sensitivity. Please discuss if there are other known cross-sensitivity sources and the potential uncertainty if omitted.
5. You tested several calibration algorithms. But due to the small number of explanatory variables, some more complicated models (GB, XGB, ANN...) perform no better than simple linear regression.
6. I think R2 is a very important performance metric to show in the main manuscript, as it is critical to at least get the correct trends from low-cost sensors. The calibration performance of a low-cost sensor is, after all, built on its reliability in measurement.
Citation: https://doi.org/10.5194/egusphere-2025-4697-RC2
AC2: 'Reply on RC2', Caroline Frischmon, 26 Mar 2026
We appreciate the reviewer’s comments and added clarification to the manuscript based on their feedback.
- In the abstract, it is not clear what you mean by referring to a 'two-step colocation strategy'. It is better to briefly introduce its principles in one or two sentences for non-experts.
- We appreciate this suggestion and added a brief description of the two-step colocation to the abstract.
- Line 66, I believe the harmonization is to address 'inter-sensor variability' (between different sensors) instead of 'intra-sensor variability' (deterioration of the same sensor due to drifting and ageing).
- Thank you for catching this! We changed every mention of intra-sensor variability to inter-sensor variability in the manuscript.
- In the calibration models for TVOC and BTEX you include metal oxide signals. Is metal oxide also measured by your low-cost sensors, or is it a reading from the reference station? If it is from the reference station, how do you obtain concurrent and colocated metal oxide levels?
- Metal oxide here refers to the type of TVOC and BTEX sensors used (“metal oxide VOC sensors”). Our original wording was confusing about this, so we added clarification to lines 109 and 165.
- It seems you do not consider cross-sensitivity for NO2 sensors, despite the fact that there is known cross-sensitivity between NO2, NO, and Ozone signals. For TVOC and BTEX you only considered metal oxide for cross-sensitivity. Please discuss if there are other known cross-sensitivity sources and the potential uncertainty if omitted.
- Colocating with a reference instrument under ambient conditions helps to correct for sensor cross-sensitivity to any non-target pollutants (including NO, O3, etc.) because the models account for any sensor response occurring under ambient conditions. This is mentioned in lines 27-29. Thus, we do not need to include these pollutant sensors in our models. (Also see the note above for the metal oxide clarification.)
- You tested several calibration algorithms. But due to the small number of explanatory variables, some more complicated models (GB, XGB, ANN...) perform no better than simple linear regression.
- In some cases, yes, though the linear regression performed worse for NO2. This may also indicate that the BTEX and TVOC sensors respond more linearly to the target pollutant than the NO2 sensor.
- I think R2 is a very important performance metric to show in the main manuscript, as it is critical to at least get the correct trends from low-cost sensors. The calibration performance of a low-cost sensor is, after all, built on its reliability in measurement.
- We did not include R2 within the performance metric figures in the paper because splitting R2 by percentile group does not provide useful analysis, since the range of values used to calculate R2 changes within each group. Thus, these values were not easy to fit into the performance metric figures. To avoid confusing the reader, we instead put the R2 figures into the supplement. Based on the reviewer’s feedback here, we added emphasis to the R2 values within each pollutant results section (Lines 286, 310, and 336).
Citation: https://doi.org/10.5194/egusphere-2025-4697-AC2
Overall, the manuscript is well-organized and clearly presented, with effective illustrations and figures. I would offer the following minor comments for further discussion and consideration: