https://doi.org/10.5194/egusphere-2026-897
01 Apr 2026

Machine Learning Calibration of Low-Cost Air Quality Gas Sensors

Giannis Ioannidis, Vincent Langat, Roubina Papaconstantinou, Spyros Bezantakos, Prashant Kumar, and George Biskos

Abstract. Low-cost sensors (LCSs) for measuring the concentrations of gaseous pollutants hold great promise for air quality monitoring (AQM), as they can improve the spatio-temporal resolution of observational networks. However, the performance of LCSs is affected by a number of factors, including the temperature and relative humidity of ambient air and cross-sensitivities to gaseous species other than the target gas, all of which degrade the quality of their measurements. To address these issues, data from LCSs can be calibrated against reference instruments using machine learning (ML) algorithms. Here, we evaluated the performance of a number of ML algorithms for calibrating measurements from CO, NO2, O3 and SO2 LCSs against the respective reference measurements. The best model was then used to determine (1) the influence of the temporal resolution of the measurements on the calibration performance, (2) the minimum fraction of data needed for model training while keeping the quality of the calibrated measurements within acceptable levels, and (3) the ideal frequency of calibration with collocated reference measurements. We found that the quality of the LCS measurements improves significantly for all sensors after ML calibration, with Random Forest (RF) being the best-performing algorithm, corroborating previous works. When the temporal resolution of the training data is increased from 1 h to 2 min, the performance of the RF model, in terms of the normalized root mean squared error and the relative expanded uncertainty calculated at the maximum observed concentration, improves by 11–21 %. The results also suggest that the minimum fraction of data required for training the ML models depends on how often collocated measurements with reference instruments are carried out and used to train the calibration model. If the calibrations are carried out monthly, ca. 50 % of the period is needed for collecting the data used to train the RF algorithm and qualify the LCSs for indicative measurements as defined by the EU directive (2008/50/EC). If the training is carried out every 3 or 6 months with continuously sampled training data, ca. 60 % of the measuring period is required for collecting training data. In those cases, if the training data are instead sampled over specific periods every month, while the entire training dataset is used to calibrate the measurements over 3 or 6 months, the amount of data required for qualifying the LCSs for indicative measurements can be reduced significantly, to 22 %. However, this requires the LCS measurements to be calibrated retrospectively, which is not a problem for certain applications.
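The abstract scores the calibrated sensors with the normalized root mean squared error (NRMSE) and the relative expanded uncertainty (REU) at the maximum observed concentration. The sketch below shows one way to compute both, under stated assumptions: (i) the NRMSE is normalized by the mean reference concentration; (ii) the REU follows the orthogonal-regression form used in the EU Guide to the Demonstration of Equivalence, with the reference-instrument uncertainty `u_ref` supplied by the user; and (iii) the thresholds in `DQO_INDICATIVE` are the indicative-measurement data quality objectives of Annex I of 2008/50/EC. The function names are hypothetical, and the authors' exact definitions may differ.

```python
import math
from statistics import mean

# Assumed indicative-measurement data quality objectives (2008/50/EC, Annex I),
# as relative expanded uncertainty; check against the directive before use.
DQO_INDICATIVE = {"CO": 0.25, "NO2": 0.25, "SO2": 0.25, "O3": 0.30}


def nrmse(reference, predicted):
    """RMSE divided by the mean reference concentration (assumed normalization)."""
    rmse = math.sqrt(mean((p - r) ** 2 for r, p in zip(reference, predicted)))
    return rmse / mean(reference)


def orthogonal_regression(x, y):
    """Orthogonal (Deming, equal error variances) fit y = a + b * x."""
    xm, ym = mean(x), mean(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    syy = sum((yi - ym) ** 2 for yi in y)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    a = ym - b * xm
    return a, b


def relative_expanded_uncertainty(reference, calibrated, u_ref=0.0, k=2.0):
    """REU at the maximum observed reference concentration (assumed GDE-style form)."""
    n = len(reference)
    if n < 3:
        raise ValueError("need at least 3 paired samples")
    a, b = orthogonal_regression(reference, calibrated)
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(reference, calibrated))
    x_max = max(reference)
    # Random scatter, minus reference uncertainty, plus squared bias at x_max.
    u2 = rss / (n - 2) - u_ref ** 2 + (a + (b - 1) * x_max) ** 2
    return k * math.sqrt(max(u2, 0.0)) / x_max


ref = [1.0, 2.0, 3.0, 4.0, 5.0]
cal = [r + 0.5 for r in ref]  # hypothetical sensor with a constant +0.5 offset
score_nrmse = nrmse(ref, cal)
score_reu = relative_expanded_uncertainty(ref, cal)
print(f"NRMSE = {score_nrmse:.3f}, REU = {score_reu:.3f}, "
      f"indicative NO2: {score_reu <= DQO_INDICATIVE['NO2']}")
# prints: NRMSE = 0.167, REU = 0.200, indicative NO2: True
```

Because the bias term enters as (a + (b − 1)·x_max)/x_max = a/x_max + (b − 1), evaluating the REU at the maximum observed concentration shrinks the contribution of a constant offset `a`, while a slope error (b ≠ 1) is carried in full.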

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • CC1: 'Comment on egusphere-2026-897', Ronald Cohen, 01 Apr 2026
  • RC1: 'Comment on egusphere-2026-897', Anonymous Referee #1, 21 Apr 2026
  • RC2: 'Comment on egusphere-2026-897', Anonymous Referee #2, 03 May 2026

Data sets

Machine Learning Calibration of Low-Cost Air Quality Gas Sensors - Data. Giannis Ioannidis. https://doi.org/10.5281/zenodo.18629746


Viewed

Total article views: 437 (including HTML, PDF, and XML)
  • HTML: 310
  • PDF: 104
  • XML: 23
  • Total: 437
  • Supplement: 41
  • BibTeX: 14
  • EndNote: 18
Views and downloads (daily and cumulative), calculated since 01 Apr 2026.

Viewed (geographical distribution)

Of the 437 total article views (including HTML, PDF, and XML), 437 have a defined geographic origin and 0 are of unknown origin.
Latest update: 13 May 2026
Short summary
Low-cost air pollution sensors could greatly expand monitoring, but their readings are often affected by environmental conditions. We studied how to improve their accuracy by comparing them with high-quality instruments and using machine learning methods to correct the data. We found that reliable results can be achieved while reducing the time needed for calibration from about 70–80 % of the measurements to about 22 %, lowering costs while maintaining data quality.
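The savings described here come from how the collocation (training) periods are scheduled. The sketch below contrasts two of the schemes outlined in the abstract on a hypothetical 180-day hourly record: continuous training on the first 60 % of each 90-day period, versus sampling the first 22 % of every 30-day block and pooling those samples to train one model per 90-day window. The 30-day "months", the choice of the first part of each block for training, and the function names are assumptions for illustration, not the authors' procedure.

```python
from datetime import datetime, timedelta


def training_mask(timestamps, start, block_days, train_fraction):
    """True for samples in the first `train_fraction` of their block (hypothetical scheme)."""
    block = timedelta(days=block_days)
    cutoff = block * train_fraction
    return [(t - start) % block < cutoff for t in timestamps]


def window_ids(timestamps, start, window_days):
    """Index of the calibration window (one model per window) for each sample."""
    window = timedelta(days=window_days)
    return [(t - start) // window for t in timestamps]


def pooled_training_sets(mask, windows):
    """Group training-sample indices by window; each group would train one model."""
    sets = {}
    for i, (is_train, w) in enumerate(zip(mask, windows)):
        if is_train:
            sets.setdefault(w, []).append(i)
    return sets


start = datetime(2024, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(180 * 24)]
windows = window_ids(timestamps, start, 90)

# Scheme A: continuous collocation over the first 60 % of every 90-day window.
mask_a = training_mask(timestamps, start, 90, 0.60)
# Scheme B: first 22 % of every 30-day block, pooled per 90-day window.
mask_b = training_mask(timestamps, start, 30, 0.22)

for name, mask in (("A", mask_a), ("B", mask_b)):
    share = sum(mask) / len(mask)
    n_models = len(pooled_training_sets(mask, windows))
    print(f"Scheme {name}: {share:.1%} of samples used for training, {n_models} models")
```

In scheme B each 90-day model is trained on samples drawn from all three months of its window, so measurements early in the window can only be corrected once the last month's training block has been collected; this is the retrospective calibration the abstract refers to.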