the Creative Commons Attribution 4.0 License.
Evaluation of Calibration Performance of a Low-cost Particulate Matter Sensor Using Colocated and Distant NO2
Abstract. Low-cost optical particle sensors have the potential to supplement existing particulate matter (PM) monitoring systems and provide high spatial and temporal resolution. However, low-cost PM sensors have often shown questionable performance under various ambient conditions. Temperature, relative humidity (RH), and particle composition have been identified as factors that directly affect the performance of low-cost PM sensors. This study investigated whether NO2, which forms PM2.5 through chemical reactions in the atmosphere, can be used to improve the calibration performance of low-cost PM2.5 sensors. To this end, we evaluated the PurpleAir PA-II (hereafter PA-II), a popular air monitoring system that uses two low-cost PM sensors and is frequently deployed near air quality monitoring sites of the Environmental Protection Agency (EPA). We selected a single location where 14 PA-II units have operated for more than two years since July 2017. Based on the operating periods of the PA-II units, we chose Jan. 2018 to Dec. 2019 as the study period. Among the 14 units, a single unit with more than 23 months of measurement data and a high correlation between its two PMS sensors was selected for analysis. Daily and hourly PM2.5 measurement data from the PA-II unit and a BAM-1020 instrument, respectively, were compared using the federal reference method (FRM), and a per-month analysis was conducted against the BAM-1020 using hourly PM2.5 data. The per-month analysis considered three key features: temperature, RH, and NO2. The NO2 data, called colocated NO2, were collected from a reliable instrument colocated with the PA-II unit. The per-month analysis showed that the PA-II unit correlated well with the BAM-1020 (coefficient of determination, R2 > 0.819) during Nov., Dec., and Jan. in both 2018 and 2019, but the correlation was only moderate during other months, such as July and Sep. 2018, and Aug., Sep., and Oct. 2019. NO2 was shown to be a key factor in increasing R2 in the months when PM2.5 alone yielded only moderate correlation. This study calibrated a PA-II unit using multiple linear regression (MLR) and random forest (RF) methods based on the same three features used in the analysis as well as their multiplicative terms. In both models, adding NO2 had a much larger effect than adding RH when PM2.5 and temperature were already considered for calibration. When NO2, temperature, and RH were all considered, the MLR method achieved calibration performance similar to that of the RF method. Since it is practically infeasible to colocate a reliable, high-accuracy NO2 instrument with low-cost PM sensors, we investigated the effectiveness of using NO2 data collected from monitoring sites far from the considered low-cost PM sensor (which we call distant NO2) to enhance calibration performance. The use of distant NO2 was shown to enhance calibration performance, compared to calibration without NO2, when the distant NO2 is highly correlated with colocated NO2. Overall, PA-II units show good agreement with high-quality PM2.5 monitoring systems. Moreover, the calibration performance can be improved by using machine learning algorithms and by considering temperature, RH, and especially NO2.
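The MLR calibration described in the abstract (additive terms plus multiplicative terms, fitted with and without NO2) can be illustrated with a minimal sketch. This is not the authors' code or data: the synthetic measurements, the invented sensor response, the choice of product terms, and the helper names `design_matrix`, `fit_mlr`, `r2`, and `rmse` are all assumptions made for illustration. The split is chronological (first half for training, second half for evaluation), loosely mirroring the train-on-2018, apply-to-2019 setup that Referee #1 asks about in the interactive discussion.

```python
import numpy as np


def design_matrix(pm, temp, rh, no2=None):
    """Stack an intercept, additive terms, and PM x covariate products."""
    cols = [np.ones_like(pm), pm, temp, rh, pm * temp, pm * rh]
    if no2 is not None:
        cols += [no2, pm * no2]
    return np.column_stack(cols)


def fit_mlr(X, y):
    """Ordinary least squares fit; returns the coefficient vector."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def r2(y_true, y_pred):
    """Coefficient of determination (unitless)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root mean square error, in the units of y (here ug/m3)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in data (NOT the study's measurements).
rng = np.random.default_rng(0)
n = 2000
ref = rng.gamma(shape=2.0, scale=8.0, size=n)   # "reference" PM2.5, ug/m3
temp = rng.uniform(5.0, 35.0, size=n)           # temperature, deg C
rh = rng.uniform(20.0, 90.0, size=n)            # relative humidity, %
no2 = rng.uniform(2.0, 40.0, size=n)            # NO2, ppb
# Invented sensor response whose bias depends on RH and NO2 (assumption).
sensor = ref * (1.4 + 0.004 * rh - 0.008 * no2) + rng.normal(0.0, 1.0, size=n)

# Chronological split: first half trains, second half evaluates.
half = n // 2
train, test = slice(0, half), slice(half, n)

results = {}
for label, gas in (("without_no2", None), ("with_no2", no2)):
    X = design_matrix(sensor, temp, rh, gas)
    coef = fit_mlr(X[train], ref[train])
    pred = X[test] @ coef
    results[label] = {"r2": r2(ref[test], pred), "rmse": rmse(ref[test], pred)}

for label, scores in results.items():
    print(f"{label}: R2={scores['r2']:.3f}, RMSE={scores['rmse']:.2f} ug/m3")
```

Because the synthetic sensor bias was built to depend on NO2, the model with NO2 terms fits the held-out half better here. That outcome is a property of the toy data, not evidence about real PA-II units.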
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2023-1344', Anonymous Referee #1, 15 Aug 2023
This manuscript presents a novel approach to improve calibration of low-cost PM2.5 sensors using reference NO2 measurements from distant reference monitors (<20km). Publicly available US EPA and PurpleAir PA-II outdoor sensor data were used in this study. The authors first located and quality controlled several PA-II sensors located near a reference monitor site (<100m). After successfully locating a sensor-monitor pair in Rubidoux, CA, the authors made daily comparisons between the sensor and FRM instrument and hourly comparisons between the sensor and a BAM instrument. They concluded that the BAM instrument was sufficiently accurate to make the hourly comparisons. Over a two-year study period they observed a lower correlation in summer months compared to the winter and that temperature and relative humidity had less of an impact in winter compared to summer. They tested two calibration methods, Multiple Linear Regression and Random Forest, considering additive and multiplicative terms including PM2.5, temperature, relative humidity, and NO2. The inclusion of NO2 in these methods did result in improved sensor performance, even when distant NO2 was included. While this study did test many variables/combinations of variables on sensor performance, the results are only representative of a single PA-II sensor. Additionally, a whole year was used to train the calibration models. These two factors ultimately limit the applicability of their conclusions without major revisions.
Major Comments:
Section 3.1: 14 sensors were originally identified in this study; however, only 5 were selected based on their months of valid measurement data. Of these 5, 2 were explicitly eliminated based on correlation analysis between the sensors' A and B units. Based on Figure 1 it seems like both PA-II 7 and 8 would be suitable for this study while PA-II 2, 3, 5, & 6 were not (Sensor 5 included in Figure 1 but not on line 188). Your final results will be more applicable if you are able to demonstrate improvements in more than 1 sensor, even if the study period is less than 2 years.
Line 298: What is the reasoning behind this 1:1 data split, specifically using the whole year of 2018 to train the models and applying them to 2019? This implies that in practice you have to wait a whole year before collecting valid/corrected data with this method, which hinders the use of low-cost sensors. It also assumes minimal sensor drift from 2018 to 2019 and similar environmental conditions.
Minor Comments:
Figure 1: Please include info about PA sensors A and B in the caption as you did on line 193.
Figure 2: Include a 1:1 line for comparison.
Figure 3: Ensure x-axes are the same for the PM2.5 graph and temperature+RH graph.
Figure sizes could be increased to improve readability.
Line 36: Please clarify that FRM and FEM are US EPA designations and may not be applicable to every country.
Line 61: "good a correlation" Please correct to "a good correlation".
Line 74: More discussion needed on how NO2 contributes to PM2.5 formation.
Line 127: Typo for US EPA
Line 131: What is the purpose of the 2-minute vs 80 sec interval?
Line 178: Please clarify the difference between the FRM instrument and the BAM instrument. Does the FRM only report daily values?
Line 206 + 236: You list 6 significant figures/3 decimal points for several of the PA-II sensors, yet these sensors are not that accurate. As per the manufacturer +/-10 ug/m3 for 0-100 ug/m3 and +/-10% for 100-500 ug/m3. Please correct.
Line 219: How are you defining the r correlation of 0.928 as "good"?
Line 220: You say performance of FRM and BAM did not correlate favorably, yet in line 203 you state that the non-FEM method compared well to FRM? Why do you conclude that the BAM is less favorably correlated to the FRM when its statistics are better than the PAs?
Line 230: Please clarify why the FRM instrument was not used to evaluate hourly performance? Were hourly FRM measurements not available?
Line 272: The referenced article does not actually consider NO2 in their PM2.5 calibration. They only used PM2.5, Temperature, RH, CO, and wind speed in their models.
Line 293: "because month has a different slope..." Do you mean " because each month..."?
Lines 311 + 355: Can these lists be included as Tables rather than in-text to improve readability and when readers look at Tables 3-5.
Line 395: "Corresponding R2 values did not differ meaningfully" Based on what statistics, do you have a p-value?
Line 408: How are you defining moderate and high correlations?
Line 412: "We used NO2 for training a calibration model" Which NO2 data to train from, from Rubidoux? Please clarify.
Line 430: "but not significantly" Based on what statistics, do you have a p-value?
Line 447: Please re-word sentence as the point is unclear.
Line 448: Please re-word to clarify that the inclusion of NO2 as an environmental factor in the calibration has potential to improve...
Section 2.2: Please include more information about the monitoring instrumentation used, especially the NO2 monitoring sites.
Section 3.2 + 3.3: At various points you include or drop units for your RMSE, MSE, MAE and r stats. Please be consistent. Shouldn't r (R2) be unitless? Please be consistent in using r vs R2.
Section 3.6.3: Please check units of ug/m3 as you often have "ugm3" in this section.
Equations 3, 4, & 5 could be included in the methods section rather than results.
Citation: https://doi.org/10.5194/egusphere-2023-1344-RC1
AC1: 'Reply on RC1', Kabseok Ko, 08 Dec 2023
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-1344/egusphere-2023-1344-AC1-supplement.pdf
-
RC2: 'Comment on egusphere-2023-1344', Anonymous Referee #3, 31 Oct 2023
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-1344/egusphere-2023-1344-RC2-supplement.pdf
-
AC2: 'Reply on RC2', Kabseok Ko, 08 Dec 2023
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-1344/egusphere-2023-1344-AC2-supplement.pdf
-
RC3: 'Comment on egusphere-2023-1344', Anonymous Referee #4, 06 Nov 2023
The article's concept is very interesting and could benefit the scientific community. Having a calibration methodology that allows scientists to leverage openly available data to increase the spatial resolution of currently available official measurements at a low cost could provide novel insight into the field. Unfortunately, the study structure and data do not support the authors' claims due to the lack of a robust dataset and an unclear strategy between model training and model evaluation data groups.
- The article is confusing and hard to follow. Too much detail is given for non-relevant information but not enough for evaluation.
- The authors argue against multivariable linear regression analyses but use MLR without offering a reasonable justification for its use or explaining why the results from RF and MLR are comparable.
I recommend a major revision of the dataset, hypothesis setup, and result analysis.
Other Notes:
- Title misspelled "Collocated".
- Line 47: "however" seems to be misplaced.
- Line 121: This sentence is poorly constructed and confusing.
Citation: https://doi.org/10.5194/egusphere-2023-1344-RC3
AC3: 'Reply on RC3', Kabseok Ko, 08 Dec 2023
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-1344/egusphere-2023-1344-AC3-supplement.pdf
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
395 | 175 | 27 | 597 | 51 | 14 | 15 |
Kabseok Ko
Seokheon Cho
Ramesh R. Rao