the Creative Commons Attribution 4.0 License.
Real-time automated quality control of atmospheric aerosol time series
Abstract. Automated, reproducible quality control (QC) of aerosol optical time series is essential for comparable, reusable data in observation networks and research infrastructures. We present a real-time-capable QC workflow for AE33 Aethalometer equivalent black carbon (eBC) and Aurora 4000 nephelometer scattering measurements (σsca). The workflow is implemented in the open-source SaQC framework and wrapped by the actris_qc package, which encodes community QA/QC recommendations into a three-stage, machine-readable rule set for device-control, channel-specific and derived-variable checks. The pipeline is parameterised using a catalogue of instrument anomalies from the urban TROPOS site and validated at the rural Melpitz station, demonstrating robust transfer across contrasting environments. It reliably flags outliers, noise, plateaus and device malfunctions while preserving genuine atmospheric variability and achieves real-time performance on standard hardware. The modular, configuration-driven design enables straightforward adaptation to additional instruments and networks, providing harmonised QC flags and provenance and enabling FAIR-compliant and AI-ready downstream use in monitoring systems.
Status: final response (author comments only)
- EC1: 'Comment on egusphere-2026-235', Griša Močnik, 07 Apr 2026
- RC1: 'Comment on egusphere-2026-235', Anonymous Referee #1, 05 Jun 2026
General comments
Houben and coauthors present a new data processing workflow for quality control (QC) of aethalometer (AE33) and nephelometer (Aurora 4000) measurements. This is an important and timely topic given the intense work on standardized, in situ aerosol data chains currently being performed by many research groups across the ACTRIS network. It is a relevant topic for AMT. However, there are major shortcomings in the work that should be addressed before publication.
We write this review as potential users of these tools. We have assessed the manuscript as well as the companion software package on which it is based.
The first major issue is one of scope. The abstract and introduction promise an automated, real-time capable, FAIR-compliant, and AI-ready QC pipeline for AE33 and Aurora 4000 measurements. In reality, what is presented is a proof-of-concept of a potential first step that could be used on raw, level 0 data to partially take it towards level 1 data. It is still a long way from a true end-to-end data pipeline for these instruments. This is fine since the overall goal is probably too big to be dealt with in a single research article. But the limited scope of the work should be much more faithfully represented. We think it is important to clarify the steps that this workflow does not perform - ideally in specific relation to the ACTRIS requirements for level 0-2 AE33 and Aurora 4000 data - in order to help direct future work.
The second major issue is conciseness. The manuscript reads more like a thesis or project report. Very substantial editing is required. For example, the introductory Sections 1 and 2 currently take up almost 7 pages. A lot of this is repeated content. To take only one example, the paragraph on line 30, page 4 simply repeats the same statements in the list above it. There are also many instances where the same references are repeatedly cited. In addition, there is a lot of irrelevant content. For example, Section 2.4 discusses low-cost sensors in detail, even though these sensors are not part of the presented research results. By removing irrelevant details and repeated information it should be possible to drastically reduce the length of the manuscript without losing any of the essential details.
There are several fundamental technical issues with the presented workflow.
Comments on the flagging scheme and associated workflow outputs
The actris_qc package defines a basic flagging scheme with only three categories (GOOD, DOUBTFUL, BAD). It is mentioned that users could supply their own flag dictionaries on top of this scheme for their own specific needs. However, it is not clear how these three basic categories could map onto the much more detailed flagging scheme used for ACTRIS/EBAS data submissions.
For example, the workflow provides no mechanism for distinguishing values that exceed the fixed DOUBTFUL and BAD thresholds due to instrument malfunction from values that exceed them due to genuine extreme atmospheric and/or contamination events. This means the workflow does not distinguish between data that is BAD because the instrument was malfunctioning and data that is BAD because it is measuring a non-representative signal - a distinction which is formally encoded in the ACTRIS/EBAS framework as the difference between invalid and contaminated data. Contaminated data may be scientifically valuable for various reasons even if it should be excluded from background climatology. Collapsing both situations into a single BAD flag discards potentially useful information and may produce AI training datasets with artificially truncated distributions that underrepresent genuine high-concentration events.
For the example datasets used in the study, it is reported that 24% of data was flagged as DOUBTFUL or BAD. It would be useful to know if this is expected behavior and standard for this type of dataset, and to discuss the implications for ACTRIS data reporting requirements.
The manuscript mentions that the workflow produces harmonized, FAIR-compliant QC flags with full provenance, but inspection of the actual CSV outputs reveals that the flag cause field is populated with OTHER for every flagged data point without exception. Specific cause codes used by the community are available, for example, in the ACTRIS vocabulary (INSTRUMENT_ERROR, MISSING_VALUE, ...). Is there a reason such codes have not been introduced and used?
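One way to keep the invalid/contaminated distinction and a specific cause is to store the quality level and the cause as separate fields. The sketch below is illustrative only: `Level`, `Cause`, `Flag`, `classify` and the `CONTAMINATION` member are hypothetical names that are not taken from actris_qc or SaQC, and the threshold logic is deliberately simplistic. INSTRUMENT_ERROR and MISSING_VALUE are the cause codes mentioned above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    GOOD = "GOOD"
    DOUBTFUL = "DOUBTFUL"
    BAD = "BAD"


class Cause(Enum):
    NONE = "NONE"
    INSTRUMENT_ERROR = "INSTRUMENT_ERROR"  # cause code named in the review
    MISSING_VALUE = "MISSING_VALUE"        # cause code named in the review
    CONTAMINATION = "CONTAMINATION"        # hypothetical label for a non-representative signal


@dataclass(frozen=True)
class Flag:
    level: Level
    cause: Cause


def classify(value: Optional[float], upper: float, instrument_ok: bool) -> Flag:
    """Assign a level and a cause instead of a bare BAD/OTHER pair."""
    if value is None:
        return Flag(Level.BAD, Cause.MISSING_VALUE)
    if not instrument_ok:
        return Flag(Level.BAD, Cause.INSTRUMENT_ERROR)
    if value > upper:
        # The instrument is healthy, so the exceedance is kept as its own category.
        return Flag(Level.DOUBTFUL, Cause.CONTAMINATION)
    return Flag(Level.GOOD, Cause.NONE)
```

With two fields, a downstream user can exclude contaminated values from a background climatology while still retaining them for event studies.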
Also missing is a discussion of how the generated QC flags written to CSV should flow back into the workflow and be applied to the data. This is a non-trivial step that requires thoughtful design.
One consequence of this is that there is no implemented feedback loop between the continuous QC output and an operator review workflow. The manuscript implies a seamless path from automated flagging to FAIR-compliant data products, but the actual system appears to have no tooling for an operator to query recent flags, confirm or dispute automated decisions, or trigger targeted reprocessing of a specific window without modifying source code. This gap between the described capability and the implemented reality should at least be acknowledged as a significant limitation of the current software release.
The comment field in the output CSV is always empty for flagged data points. The descriptive label strings defined in the code (for example "BAD:flagRange: flow_1 < 3.7 or flow_1 > 4.05") appear to be used internally by SaQC for plotting but are never written to the output file. As a consequence it is impossible to determine from the CSV output alone whether a flagRange entry was triggered by the BAD threshold or the DOUBTFUL threshold, or which source variable caused a transferFlags entry.
The QC flag values exported to CSV are numeric codes with no legend, mapping table, or reference specification saved alongside them. A downstream user cannot interpret the flags without reading the source code, which is inconsistent with the FAIR Reusable principle that data should be self-describing.
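A minimal sketch of a self-describing export could look as follows. It assumes numeric codes 0/1/2, which are invented here because the actual codes are not documented; `FLAG_LEGEND` and `write_flags` are hypothetical names and the timestamp is made up. The comment string is the label quoted above.

```python
import csv
import io
import json

# Hypothetical numeric codes; the real actris_qc codes are not stated in the review.
FLAG_LEGEND = {0: "GOOD", 1: "DOUBTFUL", 2: "BAD"}


def write_flags(rows, csv_file, legend_file):
    """Write flags to CSV with a readable label and comment, plus a JSON legend."""
    writer = csv.writer(csv_file)
    writer.writerow(["time", "flag", "flag_label", "comment"])
    for time, code, comment in rows:
        writer.writerow([time, code, FLAG_LEGEND[code], comment])
    # The legend travels with the data so the codes can be read without the source code.
    json.dump({str(k): v for k, v in FLAG_LEGEND.items()}, legend_file, indent=2)


csv_buf, legend_buf = io.StringIO(), io.StringIO()
write_flags(
    [("2024-01-01T00:00:00", 2, "BAD:flagRange: flow_1 < 3.7 or flow_1 > 4.05")],
    csv_buf,
    legend_buf,
)
```

In practice the two buffers would be files written side by side, so that the threshold that triggered a flag and the meaning of each code can be recovered from the output alone.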
Thresholds, configuration, reproducibility, and transferability
A key goal of this work is that the pipeline is transferable to other stations. However, we see several major barriers that will hinder this.
The decision to keep scientifically grounded QC thresholds out of the operational config.yml is defensible and arguably preferable to exposing them as freely editable parameters. However, the current implementation scatters these values across hundreds of lines of processing code with no central documentation, no inline source references, and no mechanism for distinguishing universally applicable thresholds from those that were empirically derived for TROPOS and may require adjustment at other sites. This problem could be solved, for example, by consolidating all thresholds into a dedicated, well-documented module or parameter file where each value is annotated with its scientific source, units, and applicability scope.
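The consolidation suggested above could be sketched as a small registry in which every threshold carries its units, source and applicability scope. Everything here is an assumption for illustration: the `Threshold` class, the registry layout and the `needs_site_review` helper are hypothetical, and only the flow_1 limits are taken from the label quoted earlier in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    name: str
    lower: float
    upper: float
    units: str
    source: str  # where the value comes from (manual, recommendation, site analysis)
    scope: str   # "universal" or "site-specific"


# The flow_1 limits are quoted in this review; units, source and scope are placeholders.
THRESHOLDS = {
    "flow_1_bad": Threshold(
        name="flow_1_bad",
        lower=3.7,
        upper=4.05,
        units="unspecified",
        source="placeholder: cite manual or recommendation here",
        scope="site-specific",
    ),
}


def needs_site_review(registry):
    """Return names of thresholds that should be re-checked before use at a new site."""
    return sorted(t.name for t in registry.values() if t.scope != "universal")
```

A new station could then call `needs_site_review(THRESHOLDS)` to obtain the list of values that were derived empirically and may need adjustment.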
The configuration file used to produce a given QC result is never saved alongside the output. If thresholds are adjusted between runs there is no mechanism to determine which parameter values produced which output file. Combined with the absence of a software version identifier anywhere in the output, this means the outputs cannot be unambiguously traced back to a specific code and configuration state, which is inconsistent with the FAIR Reusable principle of detailed provenance.
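One possible remedy is a small provenance record written next to each output file. The sketch below assumes the configuration is available as text and that the package exposes a version string; `provenance_record`, the example configuration and the version value are all hypothetical.

```python
import hashlib
import json


def provenance_record(config_text: str, software_version: str) -> dict:
    """Build a record that ties an output file to its code and configuration state."""
    digest = hashlib.sha256(config_text.encode("utf-8")).hexdigest()
    return {
        "software_version": software_version,
        "config_sha256": digest,
        "config": config_text,  # verbatim copy of the configuration that was used
    }


# Hypothetical configuration content and version string, for illustration only.
record = provenance_record("station: example\n", "0.0.0-example")
sidecar = json.dumps(record, indent=2)  # would be written next to the QC output
```

The hash lets two outputs be compared quickly, while the verbatim copy allows the exact run to be repeated.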
The noise detection parameter optimization files are hardcoded to specific device serial numbers: noise_ebc_w1_ae33_sn706_OPT.csv for the aethalometer and aurora4000_sn213450_OPT.csv for the nephelometer. The Melpitz device (ae33_sns0700705) has no corresponding optimized parameter file, which presumably means the noise detection will fail or fall back silently for that instrument at runtime (see further discussion below).
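A lookup that fails loudly would avoid the silent fallback suspected above. This is a sketch under assumptions: the `noise_parameter_file` helper, the exception class and the dictionary key format are hypothetical; only the file name and the Melpitz serial string are taken from this review.

```python
from pathlib import Path


class MissingNoiseParameters(LookupError):
    """Raised when no optimized noise parameters exist for a device."""


def noise_parameter_file(serial: str, available: dict) -> Path:
    """Return the parameter file for a device, failing loudly if none is registered."""
    try:
        return Path(available[serial])
    except KeyError:
        raise MissingNoiseParameters(
            f"no optimized noise parameters for device {serial!r}; "
            "run the optimization or choose an explicit fallback"
        ) from None


# File name quoted in the review; the dictionary key format is an assumption.
AVAILABLE = {"ae33_sn706": "noise_ebc_w1_ae33_sn706_OPT.csv"}
```

Calling `noise_parameter_file("ae33_sns0700705", AVAILABLE)` raises `MissingNoiseParameters`, which makes the missing optimization visible at start-up rather than at some later point in the run.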
There is a discrepancy between the documented lower eBC threshold (-0.075 µg/m³) and corresponding implemented value (-0.0075 µg/m³). It is unclear which value was used in the results presented in the manuscript.
One general limitation of the workflow is that it is applied to only two measurement channels, i.e., the shortest wavelength channels of each of the instruments (ebc_w1 at 370 nm; sca_450 at 450 nm). This is actually a quite major limitation that should be more clearly discussed (see general comment above), including its implications for multi-wavelength scientific and regulatory use cases. In the context of the configuration issues, it is currently unclear whether and to what extent the developed parameterizations can be transferred to other channels, given that artefacts like LED degradation, detector drift, and filter inhomogeneity can be wavelength dependent.
The anomaly catalogue plays a dual role as both training data for parameter optimization and evaluation benchmark, which creates a circularity that the Melpitz validation only partially resolves. The paper would benefit from a clearer description of the catalogue construction process - how many distinct anomaly periods were identified, what fraction of the total record they represent, and whether the taxonomy was developed prospectively or retrospectively. It would also strengthen the work to clarify whether the root causes assigned to each anomaly (e.g. filter change, LED instability) were confirmed from instrument event logs or inferred from the signal alone, since this affects how confidently the parameterization can be expected to transfer to sites without the same maintenance history.
It is also uncertain how it is envisioned that the anomaly catalogue fits into the intended operational workflow. Inspection of the code reveals that anomalies are hardcoded entries in a Python source file (examples/anomalies.py), which means that adding a new anomaly requires editing and redeploying source code. For an operational system this is not sustainable. More importantly, the anomaly system is a manual reprocessing mechanism, not automated anomaly detection: it requires a human to have already identified a problematic time window before the system can process it. This appears to be inconsistent with the aim of developing an automated, real-time QC system. These are acceptable (and perhaps necessary) limitations, but they should be explicitly noted.
Parameter optimization transparency
The parameter optimization procedure for the noise detection algorithm lacks methodological detail. The optimizer is a multi-objective genetic algorithm that searches over four parameters of the flagByScatterLowpass function. These parameters were optimized against a manually labelled ground truth dataset that was obtained using the flagByClick functionality. This manually labelled ground truth is not included in the data repository. Furthermore, no details of the optimization procedure are provided. Therefore, it is not clear if or how this process could be reproduced for other sites. One can only assume with some difficulty, since the optimization workflow was not repeated for Melpitz or for any other device. These limitations should be discussed.
Testing and software
The README file in the software package references a tests/ directory with conftest.py and test_data_manager.py, but these files are absent from the actual codebase. For a software package submitted as a companion to a scientific paper, intended for operational deployment, and claimed to produce reproducible and FAIR-compliant outputs, the complete absence of automated tests is a worry. We suggest that at the least, tests that run the pipeline against the bundled example data files and verify the output format and flag scheme should be added.
It also appears that exceptions during anomaly processing are caught, printed to stdout, and silently swallowed - the pipeline continues to the next anomaly and run.py always exits with code 0 regardless of whether any data was successfully processed. This means a silently failed QC run produces a data gap with no alert, no structured failure record, and no non-zero exit code for monitoring systems to detect. This limitation should be addressed or at least discussed.
Other specific comments
P1, L31: Many acronyms used without first being defined.
P10, L21: Please provide more details on the data sources. E.g., the data loggers used and whether any previous processing steps were applied to their outputs.
P11, Fig. 2: It is not clear what 'exceeding values' means.
P13, L5: It should be clarified here and throughout the manuscript that this is an 'ACTRIS aerosol in situ' solution. ACTRIS has 5 other components that use their own QA/QC workflows, some of which are already quite developed and probably worthwhile discussing given their similarities to the present case (e.g., the gas phase in situ QC workflow).
P13, L11: This workflow can't be described as 'station-level', or not even 'instrument-level' for that matter, given that currently it only operates on single measurement channels.
P14, L42: How are the 10 sec level flags resampled to 1 min?
P15, Fig. 5: The output cannot be considered a 'final quality assured dataset' given the limitations discussed above.
P17, L2: What happens for ATTN values between 100 and 120?
P17, L9: How was this threshold determined for this AE33 unit?
P21, Fig. 7: Panel a shows that the Aurora 4000 instrument zeroing periods are included in the input data. This requires further discussion. Such data should be flagged independently as it should be possible to isolate and analyze these data independently.
P22, L2: Signal-to-noise appears to be the wrong term to use here.
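Regarding the question on P14, L42, one conceivable rule for resampling 10 s flags to 1 min is "worst flag wins". The sketch below illustrates that rule only; it is not a description of what the manuscript or SaQC actually does, the `resample_flags_worst` helper is hypothetical, and the timestamps are made up. A fraction-based rule (for example, BAD only if more than a given share of samples is BAD) would be an equally plausible alternative.

```python
from collections import defaultdict
from datetime import datetime

SEVERITY = {"GOOD": 0, "DOUBTFUL": 1, "BAD": 2}


def resample_flags_worst(flags):
    """Aggregate (timestamp, flag) pairs to 1 min bins, keeping the most severe flag."""
    bins = defaultdict(list)
    for ts, flag in flags:
        # Truncate each timestamp to the start of its minute.
        bins[ts.replace(second=0, microsecond=0)].append(flag)
    return {
        minute: max(group, key=SEVERITY.__getitem__)
        for minute, group in sorted(bins.items())
    }


# Made-up 10 s flags covering two minutes.
flags_10s = [
    (datetime(2024, 1, 1, 0, 0, 0), "GOOD"),
    (datetime(2024, 1, 1, 0, 0, 10), "BAD"),
    (datetime(2024, 1, 1, 0, 0, 20), "GOOD"),
    (datetime(2024, 1, 1, 0, 1, 0), "GOOD"),
    (datetime(2024, 1, 1, 0, 1, 10), "DOUBTFUL"),
]
flags_1min = resample_flags_worst(flags_10s)
```

Whichever rule is used, it should be stated explicitly, since it determines how much 1 min data is lost to a single bad 10 s sample.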
Citation: https://doi.org/10.5194/egusphere-2026-235-RC1 -
RC2: 'Comment on egusphere-2026-235', Anonymous Referee #2, 08 Jun 2026
Houben et al. present a data processing workflow and software for aethalometer and nephelometer observations. Given the high number of sites (i.e. potential users) and the need for harmonized QA/QC, this work is crucial for the aerosol in situ community, not only for ACTRIS but also at a global scale. As such, it is very relevant for AMT.
Like Reviewer #1, I am reviewing the paper as a potential user of the software. I have some concerns regarding the article, which are listed below.
- The manuscript is long; some sections, such as the introduction or Section 2, should be more concise. For instance, I don't think background on low-cost sensors is needed, since the AE33 and Aurora 4000 don't fit in this category.
- I think the scope of the paper should be clarified. The authors present a workflow and a software package, but also, through detailed examples, a way to configure the QA/QC: which parameters to use, which thresholds, which flags. It is therefore not clear whether the authors also provide recommendations for these, and how they align with the current ACTRIS recommendations provided by CAIS-ECAC. Harmonization comes from using common tools and common protocols. It is likewise not clear whether previous data should be re-evaluated with this tool.
On a more general note, it is not clear how the paper fits into the ACTRIS Data Management Plan and the existing software already in place. Does it handle the NASA-AMES format? Can it be used (or how can it be implemented) for level 0, 1 and 2 generation?
How can the "BAD", "DOUBTFUL" and "VALID" flags be made compatible with the ACTRIS flagging system? For instance, the authors flag the data as "BAD" if the inlet RH is higher than 40%. In ACTRIS, such data should be flagged with 640 ("Instrument internal relative humidity above 40%") but remain valid. For the AE33, status != 0 leads to a "BAD" flag; this is far too reductive. Some status numbers are only warnings and do not report a malfunction of the instrument. In addition, the status variable results from a binary combination of individual status values which can be decoded. I think it is important to take this into account.
- The real-time aspect of the workflow/software needs to be better demonstrated. NRT is a potential application, but the automated QA/QC can also be applied in an "offline mode", right? In this regard, I would mention the NRT application in the text, but not in the title.
p13-14: NRT is discussed briefly, but it is not clear how producing qualified data locally (i.e. at the station) is compatible with the ACTRIS Data Management Plan.
- In NRT (but also in "offline" mode), what does the "DOUBTFUL" flag mean? Can the data still be considered valid or not?
- As a potential user, I tried to implement the workflow and the software for my site. I must admit that I failed miserably. Clear step-by-step guidance is critical. I sincerely suggest that the authors put themselves in the users' place, including users with a strong atmospheric science background but little coding experience. This will determine whether the tool becomes a success story or not. User-friendliness and guidance should always be an important aspect of the development process.
- p8: it is not clear what "uncorrected and corrected eBC" means. eBC refers to a well-defined concept with proper recommendations and not to "raw" BC data. It is not clear whether the proposed software is able to provide eBC concentrations as defined in Savadkoohi et al. (2024).
- As suggested by the Editor, it would be very valuable to implement the automatic evaluation of the compensation algorithm by comparing absorption (or BC) before and after each spot change. It would also be interesting to use the complete spectral dependence of BC to calculate the AAE from ln(abs) vs ln(lambda), which makes it possible to check the quality of the regression (r²), as recommended in Savadkoohi et al. (2025).
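The AAE check in the last point could be sketched as an ordinary least-squares fit of ln(abs) against ln(lambda), where the AAE is the negative slope and r² measures how well a single power law describes the spectrum. This is a generic illustration, not the procedure of the cited recommendation: `aae_fit` is a hypothetical helper, the input spectrum is synthetic, and all absorption values are assumed to be positive.

```python
import math

# Nominal AE33 wavelengths in nm.
AE33_WAVELENGTHS_NM = (370, 470, 520, 590, 660, 880, 950)


def aae_fit(wavelengths_nm, absorption):
    """Least-squares fit of ln(abs) against ln(lambda); returns (AAE, r_squared)."""
    x = [math.log(w) for w in wavelengths_nm]
    y = [math.log(a) for a in absorption]  # requires strictly positive absorption
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return -slope, r_squared  # abs ~ lambda**(-AAE), so AAE is minus the slope


# Synthetic spectrum following abs ~ lambda**-1.2 exactly (not measured data).
synthetic = [10.0 * (w / 880.0) ** -1.2 for w in AE33_WAVELENGTHS_NM]
aae, r2 = aae_fit(AE33_WAVELENGTHS_NM, synthetic)
```

A low r² for a given time step could then be turned into a DOUBTFUL flag on the derived variable; the cut-off value would need to be chosen and justified by the authors.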
Citation: https://doi.org/10.5194/egusphere-2026-235-RC2
Data sets
Real-time automated quality control of atmospheric aerosol time series Timo Houben, Thomas Müller, Peter Lünenschloss, Jens Voigtländer, Thomas Trabert, Ema Vosgerau, David Schäfer, and Jan Bumberger https://doi.org/10.5281/zenodo.18234701
Model code and software
Real-time automated quality control of atmospheric aerosol time series Timo Houben, Thomas Müller, Peter Lünenschloss, Jens Voigtländer, Thomas Trabert, Ema Vosgerau, David Schäfer, and Jan Bumberger https://doi.org/10.5281/zenodo.18234956
Dear authors,
An alternative way to check for the loading effect in real time is to look for jumps in the eBC concentration(s) at the time of the tape advance. With the loading parameter k = 0.05 (fresh urban BC), this can easily be a 50% jump.
Kindest regards,
Griša Močnik
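The check suggested in this comment could be sketched as the relative change of the median concentration across a tape advance. The helper name `tape_advance_jump`, the window length and the example series are assumptions made for illustration; how the tape advance is detected in the instrument data is left open.

```python
from statistics import median


def tape_advance_jump(values, advance_index, window=5):
    """Relative change of the median concentration across a tape advance.

    values: concentrations in time order.
    advance_index: index of the first sample on the new spot.
    """
    before = values[max(0, advance_index - window):advance_index]
    after = values[advance_index:advance_index + window]
    if not before or not after:
        raise ValueError("not enough samples around the tape advance")
    ref = median(before)
    if ref == 0:
        raise ValueError("median before the tape advance is zero")
    return (median(after) - ref) / ref


# Made-up series: the signal is suppressed before the advance at index 5.
series = [1.0, 1.0, 1.0, 1.0, 1.0, 1.5, 1.5, 1.5, 1.5, 1.5]
jump = tape_advance_jump(series, advance_index=5)
```

For this series the result is 0.5, i.e. a 50% jump; medians are used so that a single outlier near the advance does not dominate the estimate.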