<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" specific-use="SMUR" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher">EGUsphere</journal-id>
<journal-title-group>
<journal-title>EGUsphere</journal-title>
<abbrev-journal-title abbrev-type="publisher">EGUsphere</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">EGUsphere</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub"></issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5194/egusphere-2026-1273</article-id>
<title-group>
<article-title>A tuneable framework for outlier detection in PM2.5 air sensor networks during wildland fire smoke events</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Illson</surname>
<given-names>Stuart J.</given-names>
<ext-link>https://orcid.org/0000-0001-8927-817X</ext-link>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Barkjohn</surname>
<given-names>Karoline K.</given-names>
<ext-link>https://orcid.org/0000-0001-6197-4499</ext-link>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
</contrib-group><aff id="aff1">
<label>1</label>
<addr-line>School of Environmental and Forestry Sciences, University of Washington, Seattle, WA 98195, USA</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>Office of State Air Partnerships, U.S. Environmental Protection Agency, Durham, NC 27711, USA</addr-line>
</aff>
<pub-date pub-type="epub">
<day>03</day>
<month>06</month>
<year>2026</year>
</pub-date>
<volume>2026</volume>
<fpage>1</fpage>
<lpage>49</lpage>
<permissions>
<copyright-statement>Copyright: &#x000a9; 2026 Stuart J. Illson</copyright-statement>
<copyright-year>2026</copyright-year>
<license license-type="open-access">
<license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri"  xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p>
</license>
</permissions>
<self-uri xlink:href="https://egusphere.copernicus.org/preprints/2026/egusphere-2026-1273/">This article is available from https://egusphere.copernicus.org/preprints/2026/egusphere-2026-1273/</self-uri>
<self-uri xlink:href="https://egusphere.copernicus.org/preprints/2026/egusphere-2026-1273/egusphere-2026-1273.pdf">The full text article is available as a PDF file from https://egusphere.copernicus.org/preprints/2026/egusphere-2026-1273/egusphere-2026-1273.pdf</self-uri>
<abstract>
<p>In recent years, the use of air sensors to measure fine particulate matter (PM<sub>2.5</sub>) has expanded rapidly across North America, particularly in response to increasing air quality impacts from wildland fire. Because these networks offer enhanced spatial and temporal coverage, the scientific community and the public have come to rely on them as valuable sources of air quality information. With an increasing variety of sensor devices being deployed, there is a need to validate and harmonize PM<sub>2.5</sub> data across device types. While significant attention has been given to calibration and correction equations that improve the accuracy of a given sensor&apos;s measurement, tractable and generalizable methods of identifying malfunctioning or unreliable sensors are still needed, because the maintenance, siting, and operation of many of these devices are unknown. In this paper, we propose a method of identifying outlier PM<sub>2.5</sub> sensors, defined as those whose measurements deviate strongly from other local measurements due to hardware faults or to hyper-local environmental conditions that are not representative of typical ambient air quality. While detecting outliers during typical conditions is fairly straightforward, detecting them during smoke events is challenging because of real, erratic shifts in PM<sub>2.5</sub> concentrations. Here, we present a novel method of detecting outliers within sensor networks by combining measures from information theory and machine learning. We first define a tuneable, rule-based detection function that balances the Shannon entropy of a local network against the information content of an individual sensor&apos;s measurement. We then use this function, together with additional information-theoretic and short-term temporal features, to train a gradient-boosted decision tree for automated outlier detection.
Hourly PM<sub>2.5</sub> measurements from various device types were collected for 11 unique smoke events across North America in 2024 and 2025, and a stratified sample of the sensor data was randomly perturbed to simulate five commonly observed faults. In each case, we assessed each method&apos;s ability to detect the simulated faults. We demonstrate that either of these methods, although trained on a semi-synthetic dataset, can serve as a useful data validation procedure for both real-time air quality reporting and retrospective analysis.</p>
</abstract>
<counts><page-count count="49"/></counts>
</article-meta>
</front>
<body/>
<back>
</back>
</article>