the Creative Commons Attribution 4.0 License.
Development of Low-Cost Air Quality Stations for Next Generation Monitoring Networks: Calibration and Validation of NO2 and O3 Sensors
Abstract. A pre-deployment calibration and a field validation of two low-cost (LC) stations equipped with O3 and NO2 metal oxide sensors were performed. Pre-deployment calibration was carried out after developing and implementing a comprehensive calibration framework including several supervised learning models: univariate linear and non-linear algorithms, as well as multiple linear and non-linear algorithms. Univariate linear models included linear and robust regression, while univariate non-linear models included support vector machine, random forest, and gradient boosting. Multiple models consisted of both parametric and non-parametric algorithms. Internal temperature, relative humidity and gaseous interference compounds proved to be the most suitable predictors for multiple models, as they helped effectively mitigate the impact of environmental conditions and pollutant cross-sensitivity on sensor accuracy. A feature analysis, implementing dominance analysis, feature permutations and the SHapley Additive exPlanations method, was also performed to provide further insight into the role played by each individual predictor and its impact on sensor performance. This study demonstrated that while multiple random forest (MRF) returned higher accuracy than multiple linear regression (MLR), it did not accurately represent physical models beyond the pre-deployment calibration dataset, so that a linear approach may overall be a more suitable solution. Furthermore, as well as being less computationally demanding and generally more suitable for non-experts, parametric models such as MLR have a defined equation with only a few parameters, which allows easy adjustments for possible changes over time. Thus, drift correction or periodic automatable recalibration operations can be easily scheduled, which is particularly relevant for NO2 and O3 metal oxide sensors: as demonstrated in this study, they performed well with the same linear model form, but required unique parameter values due to inter-sensor variability.
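The idea of one linear model form with sensor-specific parameters can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' calibration code: the response coefficients, noise level, value ranges, and the helper names `fit_mlr` and `predict_mlr` are all assumptions made for the example, and only the station labels AQ1 and AQ2 come from the discussion of the preprint.

```python
import numpy as np

def design_matrix(raw, temp, rh):
    """Columns: intercept, raw sensor signal, internal temperature, relative humidity."""
    return np.column_stack([np.ones_like(raw), raw, temp, rh])

def fit_mlr(raw, temp, rh, reference):
    """Least-squares fit of reference = b0 + b1*raw + b2*temp + b3*rh."""
    coef, *_ = np.linalg.lstsq(design_matrix(raw, temp, rh), reference, rcond=None)
    return coef

def predict_mlr(coef, raw, temp, rh):
    return design_matrix(raw, temp, rh) @ coef

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic co-location data (all values are invented for illustration).
rng = np.random.default_rng(0)
n = 500
temp = rng.uniform(5.0, 35.0, n)        # internal temperature, degC
rh = rng.uniform(20.0, 90.0, n)         # relative humidity, %
reference = rng.uniform(0.0, 120.0, n)  # reference-grade concentration

# Hypothetical sensor responses: same functional form, different parameters.
sensor_response = {"AQ1": (10.0, 0.8, 0.5, -0.1), "AQ2": (25.0, 1.3, 0.9, -0.3)}

raw, coefs = {}, {}
for name, (a0, a1, a2, a3) in sensor_response.items():
    raw[name] = a0 + a1 * reference + a2 * temp + a3 * rh + rng.normal(0.0, 1.0, n)
    coefs[name] = fit_mlr(raw[name], temp, rh, reference)

# In-sample error for AQ2 with its own parameters versus AQ1's parameters.
own_rmse = rmse(predict_mlr(coefs["AQ2"], raw["AQ2"], temp, rh), reference)
cross_rmse = rmse(predict_mlr(coefs["AQ1"], raw["AQ2"], temp, rh), reference)
print(f"AQ2 with own parameters:   RMSE = {own_rmse:.2f}")
print(f"AQ2 with AQ1's parameters: RMSE = {cross_rmse:.2f}")
```

Because the model is just four coefficients per sensor, a periodic recalibration in this sketch amounts to calling `fit_mlr` again on fresh co-location data.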
-
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2023-673', Mark Joseph Campmier, 10 Jul 2023
Overall, this manuscript effectively communicates the full pipeline of data collection, calibration, and validation of NO2 and O3 low-cost sensors. The authors have shown a clear commitment to transparent science and carefully applied data science principles while staying relevant to the domain of atmospheric science. Importantly, rather than just fitting a plethora of models, the overall interpretability of features is investigated – including how the relevance of features varies across different meteorological and pollutant loading regimes over the course of a relatively long timeline.
However, this paper suffers from several structural weaknesses. It overstates the novelty of applying SHAP and generally should cite more recent literature throughout. Furthermore, the novelty of exploring feature relevance is somewhat lost in the acronym-dense model performance metrics – most of which are already well characterized in the literature. After restructuring the paper to more precisely state and describe its novel findings, it will be a useful reference for the community.
Introduction:
Overall, the introduction should be better organized. It would be helpful to first motivate applications of low-cost sensors with a short (1-2 sentences) summary of the relevance of fine-scale spatial-temporal NO2 and O3 patterns before diving into their design. Consider citing work like: doi.org/10.1016/j.envint.2018.04.002, doi.org/10.1136/bmj.n534. Don’t use phrases like “in the last few years” or “nowadays” (line 23) – be specific about when low-cost sensors emerged and when deployments scaled.
The phrasing in Line 45 is confusing. I suggest removing the claim that there are “no established protocols” and instead merely stating there are two common strategies. There are some existing guidelines from relevant government agencies (cfpub.epa.gov/si/si_public_file_download.cfm?p_download_id=517654, as well as publications.jrc.ec.europa.eu/repository/handle/JRC83791).
Please restructure the last two paragraphs of the introduction so as not to exaggerate claims of novelty. This would not be the first study to use SHAP for environmental low-cost sensor evaluation, as the authors claim at the end of the introduction section: doi.org/10.1016/j.atmosenv.2023.119692, doi.org/10.3390/s20195497, & doi.org/10.1109/SENSORS52175.2022.9967180. Furthermore, although the 3 gaps identified by the authors in the literature are still relevant areas of investigation, please better contextualize them – for example, (iii) has been an active area of investigation with many recent publications, as referenced earlier regarding SHAP and low-cost sensors.
Materials and Methods:
Clustering analysis does not obviously follow from a correlation analysis – especially if it is expected that many environmental variables will be collinear. Additionally, this paper emphasizes the importance of trying many supervised methods but offers no justification for the unsupervised method – this is especially tricky since K-means is probably not the best method for identifying robust clusters given the expected collinearity.
This section is very acronym- and initialism-heavy; please consider writing out at least some of the less utilized terms. It may also enhance readability to move many of the details describing the exact model instantiations and hyperparameters (2.3.1 & 2.3.2) employed to a table in the appendix or SI – especially since many of the models are “off-the-shelf” from scikit-learn and not developed by the authors.
SI Table 1 would benefit from also including some historical data about pollution concentrations from NO2 and O3.
Results:
I recommend changing the joint plots in Figure 3 from hex-binned heatmaps to the much more intuitive scatterplots.
The purpose of the k-means analysis is still unclear here. These 6 clusters may be the most robust set k-means could identify, but that does not mean they are a meaningful or interpretable clustering. From Figure 4, there does not seem to be an obvious regime change or large Euclidean distance between clusters. I would recommend removing these results or considering a density-based clustering approach. Similarly, the regression lines in Figure 4 are not obviously interpretable; they seem noisy and less robust than simply relying on a correlation matrix. I would suggest promoting the bottom triangles of the two Pearson’s r matrices in SI Fig S2 and removing Figure 4. If Figure 4 stays in the manuscript, it should also use a different colormap, as the yellow does not appear clear on my computer screen. I would recommend a categorical colormap such as Hawaii from https://www.fabiocrameri.ch/colourmaps/.
Figures 5 & 6 are very useful for telling the story of your paper. Consider enhancing them by increasing the height of the y-axis or adding jitter to the points to avoid the overlap as it adds to visual clutter. Furthermore, please change the colormap as suggested for Figure 4.
While Taylor plots like those in Figure 8 can be useful, it seems a table would more succinctly get the point across. I’d recommend moving it to the supplement.
Discussion & Conclusions
The first two paragraphs of the Discussions section can be combined and made more concise. They do not communicate the novelty of the study and reflect an overall structural problem with this manuscript – too much emphasis is placed on the individual “off-the-shelf” models rather than the much more interesting implications of feature relevance at differing concentration regimes or the role of model complexity in spatial-temporal transferability. The discussion point on seasonal transferability is quite interesting and I recommend expanding on it. The comparison of DA, PFI, and SHAP is out of place and would be much better in the methods section with references to literature.
The conclusions should include some detail about implications for work outside of Italy, in differing pollution and meteorological regimes.
Consider promoting SI Figure S10 to the main text; it is useful for understanding the discussion points as well as contextualizing the range of pollutant concentrations of this study.
Citation: https://doi.org/10.5194/egusphere-2023-673-RC1
AC2: 'Reply on RC1', Alice Cavaliere, 06 Aug 2023
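For readers unfamiliar with the permutation feature importance (PFI) referred to in the comment above, the technique can be sketched in a few lines. This is an illustrative implementation on synthetic data, not the code used in the preprint: the feature names, coefficients, and the helper `permutation_importance` defined here are assumptions for the example. Importance is measured as the mean increase in RMSE when one feature column is shuffled, and for brevity it is evaluated on the same data used for fitting.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean RMSE increase after shuffling each column of X in turn."""
    rng = np.random.default_rng(seed)
    baseline = rmse(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link to y
            increases.append(rmse(y, predict(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances

# Synthetic predictors (invented): the target depends on raw and temp only.
rng = np.random.default_rng(1)
n = 500
raw = rng.uniform(0.0, 100.0, n)
temp = rng.uniform(5.0, 35.0, n)
unrelated = rng.normal(0.0, 1.0, n)
X = np.column_stack([raw, temp, unrelated])
y = 1.2 * raw - 0.8 * temp + rng.normal(0.0, 1.0, n)

# Fit a multiple linear regression by least squares.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(M):
    return coef[0] + M @ coef[1:]

names = ["raw", "temp", "unrelated"]
importance = dict(zip(names, permutation_importance(predict, X, y)))
for name, value in importance.items():
    print(f"{name:>10}: RMSE increase = {value:.2f}")
```

scikit-learn ships a ready-made version of this procedure as `sklearn.inspection.permutation_importance`; the hand-written loop is used here only to make the mechanism visible.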
-
RC2: 'Comment on egusphere-2023-673', Anonymous Referee #2, 24 Jul 2023
General Comments:
This manuscript details the calibration process of low-cost metal oxide NO2 and O3 sensors. The authors evaluated the performances of several univariate, multivariate, linear, and non-linear calibration models. For these models the authors also analyzed the impact of individual predictors on model performance. The authors recommended using multiple covariates in multiple regression models and analyzing the importance of the features used. Additionally, they found that machine learning models can greatly improve accuracy but have a harder time on data outside the calibration dataset.
Specific Comments:
At times the novelty of the approach seems to be overstated. Lines 67 and 298 talk about the use of internal temperature as a calibration factor. Off-the-shelf sensors such as the Clarity Node S (which measures NO2 with an electrochemical cell) use RH and internal temperature to adjust their NO2 readings. Additionally, in Line 221 you state that there is no statistical difference between using internal or external temperature. On line 298 you reference Figure 4 to explain why internal temperature was chosen, but you do not show the same analysis for external temperature or for NO2.
Section 2.3: Please include more information on the sensor pre-deployment calibration with the HORIBA instruments. It is unclear whether this calibration was conducted indoors or outdoors or the spatial relationship between the AQ stations and the HORIBA instruments. If indoors please explain the lab environment where testing occurred.
Line 196: Do you have any explanation for why more data was withdrawn from AQ2 compared to AQ1?
Figure 8a: Should this legend read "AQ1 O3 MLR" rather than NO2?
While Figure S10 summarizes the NO2 and O3 concentrations across the validation period, and Table 6 does so for the field validation, it would also be useful to include a table detailing the historical environmental conditions, such as RH and temperature, of both the field validation and calibration periods. This could help support the points made in line 340, as the MRF model may suffer when the environmental conditions differ between the pre-deployment calibration and the deployment/validation period.
Line 331: Please re-word this sentence as the point is unclear.
Line 339: You mention global impacts of this analysis but provide no other information on how this work extends beyond Italy.
Citation: https://doi.org/10.5194/egusphere-2023-673-RC2
AC1: 'Reply on RC2', Alice Cavaliere, 06 Aug 2023
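The transferability concern raised in the comments (a non-parametric model degrading when deployment conditions fall outside the calibration range) can be illustrated with a toy example. The sketch below is not a random forest: it uses a binned-mean regressor as a deliberately simple stand-in, since it shares the piecewise-constant behaviour of tree-based models and therefore cannot predict beyond the values seen during fitting. The class name `BinnedMeanRegressor`, the data ranges, and the linear ground truth are all assumptions made for illustration.

```python
import numpy as np

class BinnedMeanRegressor:
    """Piecewise-constant model: predicts the mean target of the input's bin.

    Inputs outside the fitted range fall into the first or last bin, much as
    a decision tree routes them to an edge leaf.
    """

    def __init__(self, n_bins=20):
        self.n_bins = n_bins

    def _bin(self, x):
        # Interior edges only, so indices run from 0 to n_bins - 1.
        return np.digitize(x, self.edges_[1:-1])

    def fit(self, x, y):
        self.edges_ = np.linspace(x.min(), x.max(), self.n_bins + 1)
        idx = self._bin(x)
        self.means_ = np.array([y[idx == b].mean() for b in range(self.n_bins)])
        return self

    def predict(self, x):
        return self.means_[self._bin(x)]

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def truth(x, rng):
    """Invented linear ground truth with Gaussian noise."""
    return 2.0 * x + 5.0 + rng.normal(0.0, 2.0, x.size)

rng = np.random.default_rng(2)
x_cal = rng.uniform(0.0, 50.0, 400)    # calibration period: low range
x_dep = rng.uniform(50.0, 100.0, 400)  # deployment period: unseen higher range
y_cal, y_dep = truth(x_cal, rng), truth(x_dep, rng)

binned = BinnedMeanRegressor().fit(x_cal, y_cal)
slope, intercept = np.polyfit(x_cal, y_cal, 1)

rmse_binned_cal = rmse(binned.predict(x_cal), y_cal)
rmse_binned_dep = rmse(binned.predict(x_dep), y_dep)
rmse_linear_dep = rmse(slope * x_dep + intercept, y_dep)
print(f"binned, calibration range: {rmse_binned_cal:.2f}")
print(f"binned, deployment range:  {rmse_binned_dep:.2f}")
print(f"linear, deployment range:  {rmse_linear_dep:.2f}")
```

The linear fit extrapolates well here only because the synthetic ground truth is itself linear; with a real sensor the comparison would depend on how closely the true response follows the assumed model form.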
Data sets
Dataset Alice Cavaliere https://doi.org/10.5281/zenodo.7826791
Model code and software
Jupyter notebook Alice Cavaliere https://doi.org/10.5281/zenodo.7826791
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
366 | 144 | 19 | 529 | 39 | 12 | 12 |
Alice Cavaliere
Lorenzo Brilli
Bianca Patrizia Andreini
Federico Carotenuto
Beniamino Gioli
Tommaso Giordano
Marco Stefanelli
Carolina Vagnoli
Alessandro Zaldei
Giovanni Gualtieri