Machine Learning Calibration of Low-Cost Air Quality Gas Sensors
Abstract. Low-cost sensors (LCSs) for measuring the concentrations of gaseous pollutants hold great promise for air quality monitoring (AQM), as they can improve the spatio-temporal resolution of observational networks. However, the performance of LCSs is affected by a number of factors, including the temperature and relative humidity of ambient air, as well as cross-sensitivities to gaseous species other than the target gas, which degrade the quality of their measurements. To address these issues, data from LCSs can be calibrated against reference instruments using machine learning (ML) algorithms. Here, we have evaluated the performance of a number of ML algorithms for calibrating measurements from CO, NO2, O3 and SO2 LCSs against the respective reference measurements. The best model is then used to determine (1) the influence of the temporal resolution of the measurements on the calibration performance, (2) the minimum fraction of data needed for model training while keeping the quality of the calibrated measurements within acceptable levels, and (3) the ideal frequency of calibration with collocated reference measurements. We found that the quality of LCS measurements improves significantly for all sensors after ML calibration, with Random Forest (RF) being the best-performing algorithm, corroborating previous work. By varying the temporal resolution of the training data from 1 h to 2 min, the performance of the RF model, in terms of the normalized root mean squared error and the relative expanded uncertainty calculated at the maximum observed concentration, improves by 11–21 %. The results also suggest that the minimum fraction of data required for training the ML models depends on how frequently collocated measurements with reference instruments are carried out and the resulting datasets used to train the calibration model. If the calibrations are carried out on a monthly basis, ca. 50 % of the period is needed for collecting data to train the RF algorithm and qualify the LCSs for indicative measurements as defined by the EU directive (2008/50/EC). If the training is carried out every 3 or 6 months by sampling the training data continuously, then ca. 60 % of the measuring period is required for collecting training data. In those cases, if the training data are sampled over specific periods every month, but the entire training dataset is used to calibrate the measurements over 3 or 6 months, the amount of data required for qualifying the LCSs for indicative measurements can be reduced substantially, to 22 %. However, this would require the measurements from the LCSs to be calibrated retrospectively, which is not much of a problem for specific applications.
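The two training-data sampling schemes compared in the abstract (one continuous block versus specific periods every month) can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the "specific periods" are assumed to be the first part of each calendar month, and the normalized RMSE is assumed to be the RMSE divided by the mean reference concentration, since the abstract does not state the normalization. The 60 % and 22 % fractions are taken from the abstract; the timestamps are synthetic.

```python
import math
from datetime import datetime, timedelta


def nrmse(calibrated, reference):
    """RMSE divided by the mean reference value (assumed normalization)."""
    n = len(reference)
    rmse = math.sqrt(sum((c - r) ** 2 for c, r in zip(calibrated, reference)) / n)
    return rmse / (sum(reference) / n)


def continuous_split(timestamps, train_fraction):
    """Training mask: the first `train_fraction` of the whole period."""
    n_train = int(round(len(timestamps) * train_fraction))
    return [i < n_train for i in range(len(timestamps))]


def monthly_block_split(timestamps, train_fraction):
    """Training mask: the first `train_fraction` of each calendar month.

    Assumes `timestamps` is sorted in ascending order.
    """
    by_month = {}
    for i, t in enumerate(timestamps):
        by_month.setdefault((t.year, t.month), []).append(i)
    mask = [False] * len(timestamps)
    for indices in by_month.values():
        n_train = int(round(len(indices) * train_fraction))
        for i in indices[:n_train]:
            mask[i] = True
    return mask


# Synthetic hourly timestamps covering a 3-month calibration cycle (Jan-Mar 2021).
start = datetime(2021, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(90 * 24)]

continuous_mask = continuous_split(timestamps, 0.60)   # one block at the start
monthly_mask = monthly_block_split(timestamps, 0.22)   # a block in every month

print(sum(continuous_mask) / len(timestamps))
print(sum(monthly_mask) / len(timestamps))
```

Rows flagged True would be used to fit the calibration model (for example a random forest regressor) and the remaining rows to compute `nrmse` against the reference instrument. Because the monthly scheme spreads the training windows over the whole cycle, the model is only complete at the end of the cycle, which is why the abstract notes that the calibration would have to be applied retrospectively.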
The two papers below provide performance metrics relevant to the interpretation described in this paper and show that excellent calibration metrics can be achieved with a physically interpretable model.
A.R. Winter, Y. Zhu, N.G. Asimow, M.Y. Patel, and R.C. Cohen, Sustained Performance of Low-Cost Air Quality Sensors in Long-Term Deployments, ACS Sensors, 10.1021/acssensors.5c00566, 2025.
A.R. Winter, Y. Zhu, N.G. Asimow, M.Y. Patel, and R.C. Cohen, A Scalable Calibration Method for Enhanced Accuracy in Dense Air Quality Monitoring Networks, ES&T, https://doi.org/10.1021/acs.est.4c08855, 2025. See Table 4 for a more comprehensive list of others who have worked on the issue of LCS performance metrics.