Recalibration of low-cost air pollution sensors: Is it worth it?

Gäbel, Paul; Hertig, Elke

DOI: https://doi.org/10.5194/egusphere-2025-2677

11 Jul 2025

Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

Abstract. The appropriate period of collocation of a low-cost air sensor (LCS) with reference measurements is often unknown. Previous LCS studies have shown that, owing to sensor ageing and the seasonality of environmental interferences, periodic sensor calibration is needed to guarantee sufficient data quality. While these limitations are well established, it remains unclear how often a sensor needs to be recalibrated. In this study, we aim to demonstrate how frequently widely used air sensors from two manufacturers (Alphasense and Sensirion) for the relevant air pollutants O₃ and PM2.5 should be recalibrated. Sensor calibration functions were built using Multiple Linear Regression, Ridge Regression, Random Forest and Extreme Gradient Boosting. We use state-of-the-art test protocols for air sensors provided by the United States Environmental Protection Agency (EPA) and the European Committee for Standardization (CEN) for evaluative guidance. We conducted a year-long collocation campaign at an urban background air and climate monitoring station next to the University Hospital Augsburg, Germany. The LCS were exposed to a wide range of environmental conditions, with air temperatures between -10 and 36 °C, relative humidity between 19 and 96 % and air pressure between 937 and 983 hPa. Ambient concentrations of O₃ and PM2.5 reached up to 83 ppb and 153 µg m⁻³, respectively. For the baseline single training period of 5 months, the calibrated O₃ and PM2.5 sensors reflected the hourly reference data well during the training period (R²: O₃ = 0.92–1.00; PM2.5 = 0.93–0.98) and the subsequent test period (R²: O₃ = 0.93–0.97; PM2.5 = 0.84–0.93). The sensor errors were also generally acceptable during the training period (RMSE: O₃ = 0.80–4.35 ppb; PM2.5 = 1.45–2.51 µg m⁻³) and the subsequent test period (RMSE: O₃ = 3.62–5.84 ppb; PM2.5 = 2.04–3.02 µg m⁻³).
Our investigation of different recalibration cycles using a pairwise calibration strategy indicates that regular in-season recalibration is required to obtain the highest quantitative validity for the analysed low-cost air sensors, with monthly recalibration appearing to be the most suitable approach. In contrast, extending the training period of the calibration models had only a minor overall impact on the sensors' ability to capture temporal variations in observed O₃ and PM2.5 concentrations. The measurement uncertainty of the calibrated O₃ and PM2.5 LCS met the data quality objective (DQO) for indicative measurements for different calibration models. Compared to a one-time pre-deployment sensor calibration, in-season recalibration can broaden the scope of application of an LCS (indicative measurements, objective estimation, non-regulatory supplemental and informational monitoring).
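The DQO check mentioned in the abstract can be illustrated with a minimal sketch. It assumes a relative expanded uncertainty computed in the style of the EU Guide to the Demonstration of Equivalence (orthogonal regression of LCS values on reference values, coverage factor k = 2), evaluated at a chosen concentration level, and it assumes DQOs of 30 % for O₃ and 50 % for PM2.5 for indicative measurements. The exact formulas, evaluation level and reference-method uncertainty used in the preprint may differ; the function names, the 60 ppb level and the example data are hypothetical.

```python
import math


def orthogonal_regression(x, y):
    """Orthogonal regression (equal error variances in x and y); returns (intercept, slope).
    Assumes positively correlated data (sxy > 0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - slope * mx, slope


def relative_expanded_uncertainty(ref, lcs, level, u_ref=0.0):
    """Relative expanded uncertainty (k = 2) of LCS values at concentration `level`.

    Assumed GDE-style combined uncertainty:
        u^2 = RSS / (n - 2) - u_ref^2 + (b0 + (b1 - 1) * level)^2
    """
    b0, b1 = orthogonal_regression(ref, lcs)
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(ref, lcs))
    u2 = rss / (len(ref) - 2) - u_ref ** 2 + (b0 + (b1 - 1) * level) ** 2
    return 2 * math.sqrt(max(u2, 0.0)) / level  # clip small negative values to 0


# Assumed DQOs for indicative measurements (relative, k = 2).
DQO = {"O3": 0.30, "PM2.5": 0.50}

# Hypothetical paired hourly values (ppb): the LCS reads 5 % high with a 1 ppb offset.
ref = [float(v) for v in range(5, 85, 5)]
lcs = [1.05 * v + 1.0 for v in ref]
w = relative_expanded_uncertainty(ref, lcs, level=60.0)
print(f"U_rel = {w:.1%}; meets assumed O3 DQO: {w <= DQO['O3']}")
```

In a real evaluation, u_ref would be set to the reference method's standard uncertainty and the function applied once per calibration model and test period.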

Received: 05 Jun 2025 – Discussion started: 11 Jul 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 07 Nov 2025)

RC1:
'Comment on egusphere-2025-2677', Laurent Spinelle, 18 Aug 2025

First of all, I would like to congratulate the authors on the work carried out and presented in this paper. After reading the full document, I am not sure that the conclusions or the study really answer the question asked in the title. The authors ask whether low-cost sensors need to be recalibrated, but they do not really answer this question in the document; instead, they present an interesting use of sensors for ambient air monitoring (the "pairwise calibration strategy") based on a monthly exchange of LCS between a collocation site and a measurement site. This strategy, while interesting in terms of sensor performance, is much more time-consuming than a classic network installation since, in the end, two LCS are always running and must be installed and removed every month. However, the comparison of calibration results for several training lengths against both US EPA and European standards is interesting and provides a lot of valuable information.
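The trade-off the reviewer describes (more handling effort in exchange for accuracy) can be illustrated with a small simulation. The sketch is hypothetical throughout: it assumes a sensor whose sensitivity decays linearly by 3 % per month, synthetic hourly reference data and a simple linear calibration, and it compares a one-time calibration fitted on the first collocation month with a monthly recalibration fitted on the preceding month. This only loosely mirrors the pairwise exchange described in the preprint; all names and parameters are illustrative.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def simulate_month(month, rng, hours=720, drift_per_month=0.03):
    """Hypothetical hourly (raw signal, reference O3 in ppb) pairs for one month."""
    ref = [rng.uniform(0, 80) for _ in range(hours)]
    gain = 1.0 - drift_per_month * month  # assumed linear sensitivity loss
    raw = [gain * c + 2.0 + rng.gauss(0, 1.5) for c in ref]
    return raw, ref


rng = random.Random(42)
months = [simulate_month(m, rng) for m in range(12)]

# One-time calibration: fitted once on month 0 (maps raw signal -> reference).
a_once, b_once = fit_line(*months[0])
once, monthly = [], []
for m in range(1, 12):
    raw, ref = months[m]
    once.append(rmse(ref, [a_once + b_once * r for r in raw]))
    # Monthly recalibration: refit on the preceding month's collocation data.
    a_m, b_m = fit_line(*months[m - 1])
    monthly.append(rmse(ref, [a_m + b_m * r for r in raw]))

for m, (e1, e2) in enumerate(zip(once, monthly), start=1):
    print(f"month {m:2d}: one-time RMSE {e1:5.2f} ppb, monthly RMSE {e2:5.2f} ppb")
```

Under pure drift the one-time error grows month by month, whereas the monthly refit only carries one month of drift; real sensors add seasonal interferences that this toy model ignores.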
I also made some minor comments on the document, listed below:
- Line 153: What is the length of this stabilization phase?
- Line 155: The comma could be removed.
- Line 157: The 3 of O3 should be in subscript.
- Line 165: Are the daily means for the LCS based on the hourly values or on the raw values? The end of this paragraph suggests that the daily means were calculated from hourly values. Did you check the impact on the data?
- Line 183: This sentence on the PM sensor does not seem to be in the right paragraph, as the PM data were discussed in the previous one.
- Lines 184-189: This explanation could perhaps be moved after the first paragraph of Section 2.4, where the use of T and RH in the calibration models is explained. It was somewhat confusing to read first that the data from the BME280 were not used and then to see that they are used after all. Only on a second reading did I notice that the BME280 data were not used for the gas sensors.
- Table 1: The first row is not easy to read, in particular for O3 and NO2, as there is no clear separation between T (end of O3) and VNO2 (beginning of NO2).
- Line 218: What do you mean by merging the data by hour? Does it refer to calculating hourly means?
- Line 395: In the previous Section 2.7 (Performance metrics and target values), you should mention that the measurements, and thus the evaluation, were carried out only at an urban background site, whereas the CEN document asks for different test sites, for example a rural site for O3.
- Figures 8, 9, 10 and 11: I would advise the authors to write the titles of the different graphs more clearly; at first glance, it is not easy to see the difference between the plots.
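The questions on Lines 165 and 218 matter whenever data coverage is uneven. A minimal sketch, assuming hypothetical 1-min raw values tagged with an hour index and an optional minimum number of raw values per valid hour (the 45-of-60 threshold below is an illustrative assumption, not a value from the preprint), shows that a daily mean of hourly means can differ markedly from a daily mean of raw values:

```python
from collections import defaultdict
from statistics import mean


def hourly_means(samples, min_samples=1):
    """Average (hour, value) samples per hour. Hours with fewer than
    `min_samples` raw values are treated as missing (assumed validity rule)."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {h: mean(v) for h, v in sorted(by_hour.items()) if len(v) >= min_samples}


def daily_mean_via_hourly(samples, min_samples=1):
    """Daily mean as the mean of the valid hourly means
    (raises statistics.StatisticsError if no hour is valid)."""
    return mean(hourly_means(samples, min_samples).values())


def daily_mean_via_raw(samples):
    """Daily mean taken directly over all raw values."""
    return mean(value for _, value in samples)


# Hypothetical 1-min data: hour 0 is complete (60 values at 10),
# hour 1 is sparse (6 values at 40) because of a data gap.
samples = [(0, 10.0)] * 60 + [(1, 40.0)] * 6
print(daily_mean_via_raw(samples))
print(daily_mean_via_hourly(samples))
print(daily_mean_via_hourly(samples, min_samples=45))
```

The raw-value mean weights the complete hour ten times more heavily than the sparse one, while the hourly route weights hours equally unless a coverage rule discards the sparse hour.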

Citation: https://doi.org/10.5194/egusphere-2025-2677-RC1
- AC1: 'Reply on RC1', Paul Gäbel, 16 Sep 2025
  
  The comments were uploaded in the form of a supplement.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2677-AC1
RC2:
'Comment on egusphere-2025-2677', Anonymous Referee #2, 24 Oct 2025
This manuscript presents different options for the calibration of LCS, in particular for O3 and PM2.5. The goal is to show the trade-off between the model accuracy obtained from an initial training dataset (in terms of its duration) and recurrent recalibrations.

The discussion is interesting, and this is an open question. Note that many issues need to be considered for this problem: the initial dataset (quality, range, duration, sampling frequency, deployment locations), the models used for calibration (statistical or AI-based, i.e. machine learning or deep learning) and the sensor types and features (target gas, cross-sensitivity, technology (electrochemical, metal oxide (MOX), NDIR and/or optical), ageing effects), to name a few. Nevertheless, the authors focus on the O3 (Alphasense Ox-B431) and PM2.5 (Sensirion AG SPS30) sensors and use four different models (MLR, RR, RF, XGB) for calibration.

Below are my suggested comments (C) to improve your manuscript:

C1.- The title should be clearer and more specific, including keywords such as trade-off, O3 and PM2.5.
C2.- The study is carried out with two sensors, O3 (Alphasense Ox-B431) and PM2.5 (Sensirion AG SPS30). Their selection should be justified and motivated: why these ones? Are they the most common, the most reliable, the best in terms of price-to-quality ratio, etc.? The authors should provide a survey (a state-of-the-art review) on this. This information is very useful for the reader.
In addition, in Section 2.1, the names of the O3 and PM2.5 sensors, their abbreviations (AS-B431, SAG-SPS30) and their features should be placed in a table for easier reading.

C3.- The references are a bit confusing. I am not sure they are in the proper format and correctly compiled (they are not linked to the reference section). For instance, (Gäbel et al., 2022) cannot be found directly in the reference list, although on a second lookup one can assume that it refers to a paper by the same authors in Sensors (MDPI).
Also, updating the references with more recent ones would be welcome.

C4.- Figure 1 is a bit confusing. A flow diagram of the manuscript's proposal (the trade-off between training duration and recalibration) might be better.
C5.- In my opinion, the analysis of two different deployments (AELCM009 and AELCM010) is interesting, as it shows the behaviour (variability) of the different sensors.
However, the content of this manuscript could be improved and made more comprehensive. This could be done by using the whole dataset and running the different variables of the trade-off on it: x = duration of the initial training, y = recalibration interval. Based on (x, y), you can plot the different metrics (R2, RMSE, REU, …) or a cost function (mentioned later in C11) as a heatmap (or 3D plot), instead of using a fixed training of 5 months, extended by periods of 1 month, and recalibrations with different periods. A heatmap would be easier to understand and would show the optimum more clearly than Figures 2-4 and 5-7. Note that these figures are ambiguous and unclear. Also, the captions are somewhat redundant, differing only in 1, 2 or 3 months.
Besides, it should be noted that datasets usually have a higher sampling frequency, typically 10 min (or even shorter intervals), rather than 1 hour. This should be explained. The sampling frequency could even be an additional variable in the trade-off, instead of 1 hour as the default.
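The (x, y) grid proposed in C5 can be sketched as follows. Everything here is hypothetical: synthetic monthly data from a sensor whose sensitivity drifts by 3 % per month, a linear calibration, initial training on months [0, T0), recalibration every R months on the month preceding each recalibration, and the mean monthly RMSE over a common evaluation period (months 6 to 11) as the z value. The printed table (rows T0, columns R, both in months) is what a heatmap, for example via matplotlib's imshow, would display. Because drift is the only error source in this toy model, pooling more initial training months cannot help; real data with seasonal interferences may behave differently.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def make_months(n_months=12, hours=720, drift=0.03, seed=7):
    """Hypothetical hourly (raw, reference) data; sensitivity drops by `drift` per month."""
    rng = random.Random(seed)
    data = []
    for m in range(n_months):
        ref = [rng.uniform(0, 80) for _ in range(hours)]
        raw = [(1 - drift * m) * c + 2 + rng.gauss(0, 1.5) for c in ref]
        data.append((raw, ref))
    return data


def mean_rmse(data, t0, interval, eval_months):
    """Mean monthly RMSE over eval_months (all >= t0) for an initial training on
    months [0, t0) and a recalibration every `interval` months, each fitted on
    the month preceding the recalibration."""
    initial = fit_line([r for m in range(t0) for r in data[m][0]],
                       [c for m in range(t0) for c in data[m][1]])
    scores = []
    for m in eval_months:
        k = t0 + ((m - t0) // interval) * interval  # latest recalibration month <= m
        a, b = initial if k == t0 else fit_line(*data[k - 1])
        raw, ref = data[m]
        sq = sum((c - a - b * r) ** 2 for r, c in zip(raw, ref))
        scores.append(math.sqrt(sq / len(ref)))
    return sum(scores) / len(scores)


data = make_months()
intervals = [1, 2, 3, 12]  # 12 = no recalibration within the year
print("T0 \\ R  " + "".join(f"{r:>8}" for r in intervals))
for t0 in range(1, 7):
    row = [mean_rmse(data, t0, r, range(6, 12)) for r in intervals]
    print(f"{t0:>6}  " + "".join(f"{v:8.2f}" for v in row))
```

Adding the sampling interval as a third axis, as the reviewer suggests, would only require generating the synthetic data at a different resolution before aggregation.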

C6.- In Section 2.1, it would be nice to include some pictures of the boxes and the deployment, although you refer to them in your own earlier work (Gäbel et al., 2022).

C7.- Section 2.4 requires a better description and more detail on the models used. This could be summarized in a table with a short description and a reference for each model. Additional information would be of interest, such as the library used, the hyperparameters (if needed), and whether the machine learning models overfit.
In Table 1, the target (in features/target) is not necessary if it has the same name as the model (in each column). Also, for clarity, it is recommended to show only the two models that you use: O3 and PM2.5.
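To illustrate the kind of model description and overfitting check requested in C7, here is a minimal sketch of the two linear models (MLR and a ridge variant) with training and test R² and RMSE; a large gap between training and test scores would indicate overfitting. It is illustrative only: the data are synthetic (a hypothetical O₃ signal with linear temperature and humidity interference, drawn over the temperature and humidity ranges quoted in the abstract), the penalty value is arbitrary, features are not standardised (which a real ridge setup would normally do), and R² is computed as the coefficient of determination, whereas some protocols use the squared Pearson correlation. Random Forest and XGBoost would typically come from libraries such as scikit-learn and xgboost and are not reproduced here.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(features, target, ridge=0.0):
    """Least-squares fit of target = b0 + sum(bi * xi) via the normal equations;
    ridge > 0 adds an L2 penalty on the slopes only (plain MLR when ridge == 0)."""
    rows = [[1.0, *f] for f in features]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(p)]
    for i in range(1, p):
        xtx[i][i] += ridge
    return solve(xtx, xty)


def predict(coef, features):
    return [coef[0] + sum(c * x for c, x in zip(coef[1:], f)) for f in features]


def r2_rmse(obs, pred):
    """Coefficient of determination and root-mean-square error."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(obs))


rng = random.Random(1)
X, y = [], []
for _ in range(800):
    o3, temp, rh = rng.uniform(0, 80), rng.uniform(-10, 36), rng.uniform(19, 96)
    signal = 0.8 * o3 + 0.5 * temp - 0.1 * rh + rng.gauss(0, 1.0)  # hypothetical sensor
    X.append((signal, temp, rh))
    y.append(o3)

for ridge in (0.0, 100.0):
    coef = fit_linear(X[:500], y[:500], ridge)
    train = r2_rmse(y[:500], predict(coef, X[:500]))
    test = r2_rmse(y[500:], predict(coef, X[500:]))
    print(f"ridge={ridge}: train R2={train[0]:.3f} RMSE={train[1]:.2f}; "
          f"test R2={test[0]:.3f} RMSE={test[1]:.2f} ppb")
```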
C8.- Abbreviations are defined repeatedly. As a general rule, define each abbreviation once and use it consistently thereafter, except in the abstract.
Besides, a glossary at the end of the paper would be useful.

C9.- In addition to Table 2 (with the statistics of the dataset for 1 day), why not plot the statistics for the whole period (1 year?) and/or plot their values over time?
Is 36 °C correct for Augsburg?
You could also include in Table 2 the same statistics for all the features (variables) of your dataset (AEMSxx, Vxx).

C10.- The conclusions are too long. You could simplify them and add more relevant conclusions, since it is well known that these LCS always require recalibration.
Besides, you should highlight your contribution in both the abstract and the conclusions.

C11.- As mentioned in C5, besides a heatmap, here are other suggestions for visualizing the results:
Error-vs-time curves: plot RMSE(t) for different recalibration strategies. This shows how quickly accuracy decays and how recalibration recovers it.

Heatmap: x-axis = initial training duration (T₀), y-axis = recalibration interval (days). z = a metrics (RMSE, R2, …). This visually shows regions where short initial training + frequent recalibration ≈ long initial training + infrequent recalibration.

Pareto frontier / cost-accuracy plot: x-axis = operational/calibration cost, y-axis = long-term mean RMSE. Mark strategies on the plot.

Bar chart: number of recalibrations vs mean RMSE for each T₀.

Time-to-failure distributions: for threshold-triggered policies, plot a histogram of detection delays.

Uncertainty band plots (error ± CI) to show statistical significance between strategies.
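The Pareto-frontier suggestion can be made concrete with a short sketch that keeps only the strategies not dominated in (cost, error). The strategy names, the cost proxy (collocation visits per year) and the RMSE values are invented for illustration and are not results from the preprint.

```python
def pareto_front(strategies):
    """Return (name, cost, error) tuples not dominated by any other strategy.

    A strategy is dominated if another one has lower-or-equal cost and
    lower-or-equal error. After sorting by cost, then error, a single pass
    keeps each strategy that improves on the best error seen so far
    (exact duplicates are kept only once).
    """
    front, best_error = [], float("inf")
    for name, cost, error in sorted(strategies, key=lambda s: (s[1], s[2])):
        if error < best_error:
            front.append((name, cost, error))
            best_error = error
    return front


# Hypothetical strategies: (name, collocation visits per year, mean RMSE in ppb).
strategies = [
    ("one-time", 1, 6.0),
    ("quarterly", 4, 4.5),
    ("bi-monthly", 6, 4.8),
    ("monthly", 12, 3.5),
]
for name, cost, error in pareto_front(strategies):
    print(f"{name:>10}: cost {cost:2d}, RMSE {error:.1f} ppb")
```

Plotting all strategies and connecting the frontier points gives the cost-accuracy plot the reviewer describes; here the hypothetical bi-monthly option drops out because the quarterly one is both cheaper and more accurate.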

Citation: https://doi.org/10.5194/egusphere-2025-2677-RC2

Supplement

https://doi.org/10.5194/egusphere-2025-2677-supplement

Viewed

Total article views: 920 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
750	148	22	920	32	18	23

Views and downloads (calculated since 11 Jul 2025)

Month	HTML	PDF	XML	Total
Jul 2025	103	25	7	135
Aug 2025	158	37	4	199
Sep 2025	415	35	6	456
Oct 2025	74	51	5	130

Viewed (geographical distribution)

Total article views: 916 (including HTML, PDF, and XML). Of these, 916 have a defined geographic origin and 0 are of unknown origin.

Latest update: 28 Oct 2025

Short summary

Our study investigated the performance of low-cost air sensors for monitoring ozone and fine particulate matter. The sensors benefit from regular in-season adjustments (monthly recalibration proved most effective) to deliver reliable data. Using a year-long study and state-of-the-art air sensor test protocols as evaluative guidance, we demonstrated the importance of frequent calibration to maximize sensor performance and to broaden the sensors' scope of application, particularly for ozone monitoring.


Total:	0
HTML:	0
PDF:	0
XML:	0