Machine Learning Calibration of Low-Cost Air Quality Gas Sensors

Ioannidis, Giannis; Langat, Vincent; Papaconstantinou, Roubina; Bezantakos, Spyros; Kumar, Prashant; Biskos, George

doi:10.5194/egusphere-2026-897

01 Apr 2026

Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

Abstract. Low-cost sensors (LCSs) for measuring the concentrations of gaseous pollutants hold great promise for air quality monitoring (AQM), as they can improve the spatio-temporal resolution of observational networks. However, the performance of LCSs is affected by a number of factors, including the temperature and relative humidity of ambient air and cross-sensitivities to gaseous species other than the target gas, all of which degrade the quality of their measurements. To address these issues, data from LCSs can be calibrated against reference instruments using machine learning (ML) algorithms. Here, we evaluated the performance of a number of ML algorithms for calibrating measurements from CO, NO₂, O₃ and SO₂ LCSs against the respective reference measurements. The best model was then used to determine (1) the influence of the temporal resolution of the measurements on the calibration performance, (2) the minimum fraction of data needed for model training while keeping the quality of the calibrated measurements within acceptable levels, and (3) the optimal frequency of calibration with collocated reference measurements. We found that the quality of the LCS measurements improves significantly for all sensors after ML calibration, with Random Forest (RF) being the best-performing algorithm, corroborating previous works. Increasing the temporal resolution of the training data from 1 h to 2 min improves the performance of the RF model by 11–21 %, in terms of both the normalized root mean squared error and the relative expanded uncertainty calculated at the maximum observed concentration. The results also suggest that the minimum fraction of data required for training the ML models depends on how frequently collocated measurements with reference instruments are carried out and used to train the calibration model. If calibrations are carried out on a monthly basis, ca. 50 % of the period is needed for collecting data to train the RF algorithm and qualify the LCSs for indicative measurements as defined by the EU directive (2008/50/EC). If the training is carried out every 3 or 6 months, with the training data sampled continuously, ca. 60 % of the measuring period is required for collecting training data. In these cases, if the training data are instead sampled over specific periods every month, but the entire training dataset is used to calibrate the measurements over the 3 or 6 months, the amount of data required to qualify the LCSs for indicative measurements can be reduced significantly, to 22 %. However, this requires that the LCS measurements be calibrated retrospectively, which is not a problem for some applications.
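The abstract scores calibrated data with the normalized root mean squared error (NRMSE) and the relative expanded uncertainty (REU). The following is a minimal, stdlib-only sketch of both metrics for illustration; it is not the authors' code, and the function names are hypothetical. Assumptions: NRMSE is normalized by the mean reference concentration (the normalization used in the paper is not stated in the abstract); the REU follows the general form of the EU Guide to the Demonstration of Equivalence (GDE), with an orthogonal regression of calibrated on reference values, coverage factor k = 2, reference-method uncertainty `u_ref` defaulting to 0, and the result divided by the evaluation concentration `c` (for example the maximum observed reference concentration, as in the abstract). The RF calibration model itself (e.g. scikit-learn's `RandomForestRegressor`) is not shown.

```python
import math


def nrmse(reference, calibrated):
    """RMSE of calibrated vs. reference values, normalized by the mean reference value.

    Normalizing by the mean is an assumption; range or standard-deviation
    normalizations are also common.
    """
    n = len(reference)
    if n == 0 or n != len(calibrated):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((y - x) ** 2 for x, y in zip(reference, calibrated)) / n
    return math.sqrt(mse) / (sum(reference) / n)


def orthogonal_fit(x, y):
    """Orthogonal regression y = b0 + b1*x (equal error variance in x and y)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when x and y are uncorrelated")
    b1 = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return ybar - b1 * xbar, b1


def relative_expanded_uncertainty(reference, calibrated, c, u_ref=0.0):
    """GDE-style relative expanded uncertainty (k = 2) evaluated at concentration c."""
    n = len(reference)
    if n < 3 or n != len(calibrated):
        raise ValueError("need at least 3 paired observations of equal length")
    b0, b1 = orthogonal_fit(reference, calibrated)
    # Residual sum of squares around the orthogonal-regression line.
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(reference, calibrated))
    # Random term minus reference-method variance, plus squared bias at c.
    u2 = rss / (n - 2) - u_ref ** 2 + (b0 + (b1 - 1) * c) ** 2
    # Clamp at zero in case subtracting u_ref**2 makes u2 negative.
    return 2 * math.sqrt(max(u2, 0.0)) / c


# Synthetic example: calibrated values carry a constant +1 offset.
ref = [1.0, 2.0, 3.0, 4.0, 5.0]
cal = [2.0, 3.0, 4.0, 5.0, 6.0]
print(nrmse(ref, cal))                                      # 0.3333333333333333
print(relative_expanded_uncertainty(ref, cal, c=max(ref)))  # 0.4
```

With a constant +1 offset the fit recovers slope 1 and intercept 1, so the residual term is zero and the REU comes entirely from the bias term, 2 × 1 / 5 = 0.4.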

Received: 16 Feb 2026 – Discussion started: 01 Apr 2026

Status: open (until 07 May 2026)

CC1: 'Comment on egusphere-2026-897', Ronald Cohen, 01 Apr 2026

The two papers below provide performance metrics relevant to the interpretation described in this paper and show that excellent calibration metrics can be achieved with a physically interpretable model.
A.R. Winter, Y. Zhu, N.G. Asimow, M.Y. Patel, and R.C. Cohen, Sustained Performance of Low-Cost Air Quality Sensors in Long-Term Deployments, ACS Sensors, 2025, https://doi.org/10.1021/acssensors.5c00566.
A.R. Winter, Y. Zhu, N.G. Asimow, M.Y. Patel, and R.C. Cohen, A Scalable Calibration Method for Enhanced Accuracy in Dense Air Quality Monitoring Networks, Environmental Science & Technology, 2025, https://doi.org/10.1021/acs.est.4c08855. See Table 4 of the latter for a more comprehensive list of others who have worked on the issue of LCS performance metrics.

Citation: https://doi.org/10.5194/egusphere-2026-897-CC1

Supplement

https://doi.org/10.5194/egusphere-2026-897-supplement

Data sets

Machine Learning Calibration of Low-Cost Air Quality Gas Sensors - Data, Giannis Ioannidis, https://doi.org/10.5281/zenodo.18629746

Latest update: 02 Apr 2026

Short summary

Low-cost air pollution sensors could greatly expand monitoring, but their readings are often affected by environmental conditions. We studied how to improve their accuracy by comparing them with high-quality instruments and using machine learning methods to correct the data. We found that reliable results can be achieved while reducing the share of measurement time needed for calibration from about 70–80 % to about 22 %, lowering costs while maintaining data quality.
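The reduction to about 22 % rests on how the training data are sampled: one continuous block versus short periods taken every month and pooled. The following minimal Python sketch illustrates that contrast. It is a hypothetical illustration, not the authors' procedure: the record is split into whole days, a "month" is taken as a 30-day window, and `training_days` is a name introduced here. With `scheme="continuous"` the training days form one contiguous block at the start of the record; with `scheme="interleaved"` the same share is taken from the start of every window and all of those days are pooled into one training set.

```python
def training_days(n_days, fraction, scheme, window=30):
    """Return sorted indices of the days used to train the calibration model.

    scheme="continuous": one contiguous block at the start of the record.
    scheme="interleaved": the first `fraction` of every `window`-day period,
    pooled into a single training set (window length is an assumption).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if scheme == "continuous":
        return list(range(round(fraction * n_days)))
    if scheme == "interleaved":
        per_window = round(fraction * window)
        days = []
        for start in range(0, n_days, window):
            # Take the first per_window days of this window, without overrunning the record.
            days.extend(range(start, min(start + per_window, n_days)))
        return days
    raise ValueError(f"unknown scheme: {scheme!r}")


n_days = 180  # hypothetical ~6-month deployment, one entry per day
for scheme in ("continuous", "interleaved"):
    train = training_days(n_days, 0.22, scheme)
    train_set = set(train)
    # Everything not used for training is calibrated (retrospectively) and evaluated.
    test = [d for d in range(n_days) if d not in train_set]
    print(scheme, len(train), "training days, last one:", train[-1], "| test days:", len(test))
```

Both schemes use roughly 22 % of a 180-day record (40 vs 42 days, the difference coming from rounding per window), but the interleaved days reach day 156 instead of stopping at day 39. A plausible, though here unverified, reason why interleaved sampling needs less data is that it exposes the model to a wider range of temperature, humidity and concentration conditions.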