https://doi.org/10.5194/egusphere-2026-1273
03 Jun 2026
Status: this preprint is open for discussion and under review for Atmospheric Measurement Techniques (AMT).

A tuneable framework for outlier detection in PM2.5 air sensor networks during wildland fire smoke events

Stuart J. Illson and Karoline K. Barkjohn

Abstract. In recent years, the use of air sensors to measure fine particulate matter (PM2.5) has rapidly expanded across North America, particularly in response to increasing air quality impacts from wildland fire. With the benefit of enhanced spatial and temporal coverage, the scientific community and the public have come to rely on sensor networks as valuable sources of air quality information. With an increasing variety of sensor devices being deployed, there is a need to validate and harmonize PM2.5 data between different device types. While significant attention has been given to calibration and correction equations that improve the accuracy of a given sensor's measurement, there is also a need to develop tractable and generalizable methods of identifying malfunctioning or unreliable sensors, given that the maintenance, siting, and operation of many of these devices are unknown. In this paper, we propose a method of identifying outlier PM2.5 sensors, defined as those whose measurements deviate strongly from other local measurements due to hardware faults or to hyper-local environmental conditions that are not representative of typical ambient air quality conditions. While detecting outliers during typical conditions is a fairly straightforward task, detecting outliers during smoke events is challenging due to real, erratic shifts in PM2.5 concentrations. Here, we present a novel method of detecting outliers within sensor networks by combining measures from information theory and machine learning. We first define a tuneable, rule-based detection function that balances the Shannon entropy of a local network against the information content of an individual sensor's measurement. We then use this function, together with additional information-theoretic and short-term temporal features, to train a gradient-boosted decision tree for automated outlier detection.
Hourly PM2.5 measurements from various device types were collected for 11 unique smoke events across North America in 2024 and 2025, and a stratified sample of the sensor data was randomly perturbed to simulate five commonly seen faults. In each of these cases, we assessed each method's ability to detect the simulated faults. We demonstrate that either of these methods, although developed on a semi-synthetic dataset, can act as a useful data validation procedure when applied to both real-time air quality reporting and retrospective analysis.
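
The abstract does not give the functional form of the rule-based detection function, so the following Python sketch is only one illustrative way that Shannon entropy and information content could be balanced, not the authors' implementation. It assumes fixed-width concentration bins, a small pseudo-count for empty bins, and a hypothetical tuning weight `alpha`; all function names, parameters, and default values are invented for illustration.

```python
import math
from collections import Counter


def bin_index(value, bin_width=10.0, n_bins=50):
    """Map a PM2.5 value (ug/m3) to a fixed-width bin, clamped to [0, n_bins - 1]."""
    return max(0, min(int(value // bin_width), n_bins - 1))


def local_distribution(readings, bin_width=10.0, n_bins=50, pseudo=0.01):
    """Empirical bin probabilities for the local network.

    A small pseudo-count (an assumption of this sketch) keeps empty bins at a
    nonzero probability so that their information content stays finite.
    """
    counts = Counter(bin_index(r, bin_width, n_bins) for r in readings)
    total = len(readings) + pseudo * n_bins
    return [(counts.get(b, 0) + pseudo) / total for b in range(n_bins)]


def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_content(value, probs, bin_width=10.0):
    """Information content (surprisal) in bits of one sensor's reading: -log2(p)."""
    return -math.log2(probs[bin_index(value, bin_width, len(probs))])


def is_outlier(value, neighbours, alpha=3.0, bin_width=10.0, n_bins=50):
    """Hypothetical rule: flag a reading whose surprisal exceeds alpha times
    the entropy of the neighbouring sensors' readings.

    A larger alpha tolerates more deviation when the local network is itself
    highly variable, as it may be during a smoke event.
    """
    probs = local_distribution(neighbours, bin_width, n_bins)
    return information_content(value, probs, bin_width) > alpha * shannon_entropy(probs)


if __name__ == "__main__":
    # Made-up neighbour readings, for illustration only.
    calm = [8, 9, 7, 8.5, 9.5, 8, 7.5, 9]
    smoke = [35, 62, 88, 120, 47, 95, 141, 73]
    for name, neighbours in (("calm", calm), ("smoke", smoke)):
        print(name, is_outlier(150.0, neighbours))
```

Under these assumptions, a reading of 150 µg/m³ is flagged when the neighbouring sensors agree closely, but is tolerated when they are spread widely enough that the network entropy is high. Lowering `alpha` makes the rule stricter under variable conditions, which is the kind of trade-off a tuneable threshold is meant to expose.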

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 08 Jul 2026)

Latest update: 03 Jun 2026
Short summary
This study investigates a generalizable method for identifying outliers within networks of air quality sensors and discusses the challenges of doing so during wildland fire smoke events, whose real variability can confound other approaches. Tested using 9 months of data from over 19,000 devices, the methods developed distinguish among malfunctioning sensors, localized conditions affecting a single sensor, and genuine air quality impacts across a range of network densities and smoke conditions.