the Creative Commons Attribution 4.0 License.
Benchmarking Laser-Induced Fluorescence and Machine Learning for real-time identification of bacteria in bioaerosols
Abstract. Microorganisms are ubiquitous in the environment, playing key roles in all ecosystems, including the atmosphere, with airborne dissemination via particulate matter being essential for many microorganisms’ life cycles. However, the atmosphere as a microbial ecosystem has been severely understudied, mostly due to the technical difficulty of sampling and characterizing it and the presumed irrelevance of the atmospheric environment for microbes. Most recent studies use metagenomic sequencing to assess aerobiome diversity, an approach that can be biased and hampered by the inherently ultra-low DNA yield of air samples. Previous research has already demonstrated the potential use of Laser-Induced Fluorescence (LIF) and machine learning (ML) to characterize the vegetal fraction of bioaerosols, by classifying pollen particles using the Rapid-E bioaerosol detector (Plair SA) and neural network classifiers. In this study, we present a new methodology for near real-time (NRT) automatic recognition of microbial particles in the air, based on replacing Rapid-E’s visible and ultraviolet (UV) laser (337 nm) with another laser (266 nm) optimized to excite fluorophores in bacterial and fungal cell membranes. We tested this new setup with artificially generated aerosols enriched with five distinct bacterial species. Employing Random Forest classifiers, we were able to: (a) detect bacterial particles (96.74 % class-balanced accuracy), and (b) discriminate between the different species (69.24 % class-balanced accuracy across the different species in the validation set). This approach opens a new range of possibilities for the rapid and precise monitoring of airborne microbial communities, offering a valuable tool for both ecological studies and public health surveillance.
Status: open (until 09 Oct 2025)
RC1: 'Comment on egusphere-2025-2484', Federico Carotenuto, 05 Aug 2025
AC1: 'Reply on RC1 (0)', Alejandro Fontal, 04 Sep 2025
We thank the reviewer for the positive assessment of our work and for highlighting its relevance within the field of aerobiology. We agree that optical methods remain largely focused on pollen taxa, and one of the motivations of this study was precisely to explore the potential of UV-LIF instruments for bacterial discrimination. We appreciate the reviewer’s constructive comments and suggestions, which have helped us to improve the clarity and robustness of the manuscript.
Below we address each of the points raised to the best of our knowledge.
We will reply to each point individually, in line with the interactive discussion format, so that every comment can be considered and followed up if needed.
Citation: https://doi.org/10.5194/egusphere-2025-2484-AC1
AC2: 'Reply on RC1 (1)', Alejandro Fontal, 04 Sep 2025
Can you please (even as supplementary material) give more information about the modifications made to the PLAIR? I think it would be interesting for other researchers as well to understand more in detail how such a modification can be made.
Paragraph 2.2.1. Can you please provide a figure of your aerosolization set-up?
We have added a section in the supplementary material where we describe in more detail the modifications performed on the Rapid-E device to integrate the new laser. In addition, we generated a schematic diagram that illustrates these changes and the overall aerosolization set-up in greater detail. We have now replaced Figure 2 in the manuscript, which was a simple picture of the updated device, with this figure so as to provide a more comprehensive idea of the modifications and the aerosolization process. The figure shows the process by which samples are aerosolized using the Palas AGK 2000 nebulizer, transferred through aerosol tubing, and introduced into the Rapid-E’s inlet.
To better illustrate the modifications to the device, the diagram also shows the integration of the ONDA NS 266 nm laser, which was soldered into a new module positioned just below the original unit and connected via a system of mirrors and tubing to direct the laser beam into the particle stream entering Rapid-E through its nozzle.
Since we cannot attach images over 500x500 px here (and the system seems to crash whenever we try a smaller one, anyway), a high resolution version of the figure can be accessed in the project's GitHub repository, along with the rest of the code and outputs of the study:
https://github.com/AlFontal/lif-bacteria-aerosols-ms/blob/main/output/figures/combi_ms/fig_2_aerosolization_uv_integration_diagram.png
PS: We are also adding a PDF with the image as a supplement, just in case, as we are not supposed to submit the revised manuscript in the interactive discussion either.
AC3: 'Reply on RC1 (2)', Alejandro Fontal, 04 Sep 2025
Also, was this modification purely driven by optical considerations or from a previous experience where the instrument “as-is” failed to detect bacteria? A comparison of non-modified and modified PLAIR would also be interesting to understand the degree of improvement in detection.
The version of the Rapid-E originally available to us was equipped with a 335 nm UV laser, a wavelength close to optimal for chlorophyll excitation. This choice suited the original purpose, given that the instrument was initially designed for pollen identification and classification. However, excitation at 335 nm is suboptimal for many bacterial fluorophores. Discussions with the manufacturer and colleagues, together with published studies, all indicated that this wavelength would not provide sufficient sensitivity for bacterial detection, especially in a semi-real-time setting such as ours.
As reported in the literature and discussed in the main text, key biomolecules characteristic of bacterial cells, including NADH, FAD, and the aromatic amino acids tryptophan, tyrosine, and phenylalanine, show stronger excitation in the deep-UV range around 260–280 nm, even at particle sizes typical of bacteria (Sivaprakasam et al., 2004; Pan et al., 2009; Hill et al., 2013, 2015). For this reason, and to maximize discrimination capability, we opted to integrate a 266 nm DPSS laser for the specific purpose of bacterial detection. Unfortunately, we did not run a comparable test with the original device “as is” before the modification, and since the process was long and iterative, we did not perform a direct comparison between the original configuration and the modified version either.
We agree with the reviewer that such a benchmark would be valuable to quantify the exact gain in classification power provided by the modifications, but we consider it outside the scope of this particular work.
Our aim was instead to demonstrate that the repurposed modified instrument could classify bacterial aerosols with sufficient discriminatory power.
That being said, and to explicitly answer the reviewer's question: the modification was mostly driven by optical considerations, but it also drew on the experience of colleagues who had previously been unsuccessful at microbial detection with similar setups.
References:
Sivaprakasam, V., Huston, A. L., Scotto, C., & Eversole, J. D. (2004). Multiple UV wavelength excitation and fluorescence of bioaerosols. Optics express, 12(19), 4457-4466.
Pan, Y. L., Pinnick, R. G., Hill, S. C., & Chang, R. K. (2009). Particle-fluorescence spectrometer for real-time single-particle measurements of atmospheric organic carbon and biological aerosol. Environmental science & technology, 43(2), 429-434.
Hill, S. C., Pan, Y. L., Williamson, C., Santarpia, J. L., & Hill, H. H. (2013). Fluorescence of bioaerosols: mathematical model including primary fluorescing and absorbing molecules in bacteria. Optics express, 21(19), 22285-22313.
Hill, S. C., Williamson, C. C., Doughty, D. C., Pan, Y. L., Santarpia, J. L., & Hill, H. H. (2015). Size-dependent fluorescence of bioaerosols: Mathematical model using fluorescing and absorbing molecules in bacteria. Journal of Quantitative Spectroscopy and Radiative Transfer, 157, 54-70.
Citation: https://doi.org/10.5194/egusphere-2025-2484-AC3
AC4: 'Reply on RC1 (3)', Alejandro Fontal, 04 Sep 2025
Paragraph 2.2.3. Can you please provide details about bacterial growth (media, time, temperature of growth, …) as well as the technique used for identification of the species (how was MS-TOF used in this context?).
We have edited the text and expanded the details regarding bacterial growth and identification by MALDI-TOF:
L132-140 now:
"We analysed five bacterial species commonly found in urban bioaerosols, which were obtained from air samples collected on quartz fiber filters using a high-volume sampler (MCV, Spain) on the rooftop of our laboratory (AIRLAB, Barcelona, Spain). Filter portions were placed in contact with nutrient agar plates, then removed, and the plates were incubated at 37 °C for 24 h. Morphologically distinct colonies were subsequently subcultured under the same conditions to obtain pure isolates. These isolates were identified using MALDI-TOF MS (LT MicroFLex, Bruker Daltonics, Germany). For each isolate, a small fraction of biomass was spotted onto the target plate, after which 1 μL of a saturated HCCA matrix solution was added and allowed to dry. Each sample was spotted in duplicate, and each spot was measured twice, yielding four mass spectra per isolate. The resulting spectra were compared with the Bruker bacterial library v.9.0. The complete taxonomic classification of the bacterial species used in the experiments is presented in Table 1."
Citation: https://doi.org/10.5194/egusphere-2025-2484-AC4
AC5: 'Reply on RC1 (4)', Alejandro Fontal, 04 Sep 2025
Paragraph 2.3.1. At line 158 is stated that “No transformation was needed for incorporating fluorescence spectra and lifetime data into the models”. I would like to better understand this point since, as far as I understand, per each particle (i.e.: sample) your features were all different in size. The fluorescence spectrum should be a [1,32] vector (intensity vs. 32 wavelengths); the lifetime a [4,64] matrix (4 bands vs. 64 nanoseconds) and the scattering a [24,60] matrix (24 angles vs. 60 microseconds following your cropping). How was this difference handled to generate a consistent input for the random forest classifier?
We appreciate the comment and understand that our original phrasing was ambiguous. What we meant is that both fluorescence spectra and fluorescence lifetime outputs are consistently acquired by the instrument as fixed-size arrays, so they could be directly incorporated into the models without further preprocessing, unlike the scattering data. Specifically, the fluorescence spectra are recorded as 32-channel vectors at 8 different timepoints (so, actually, an [8,32] matrix) and the lifetime outputs as [4,64] matrices (Figure 1 should clarify this). These matrices are then flattened into one-dimensional vectors ([256] and [256]) before being passed to the classifier.
In contrast, scattering signals varied in duration between particles, which is why we applied the cropping, zero-padding, and normalization procedure described in the text to ensure consistent input dimensionality.
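As an illustration of the input handling described above, the following is a minimal sketch of how the three signals could be turned into one fixed-length feature vector. The array shapes ([8,32], [4,64], 24 angles, 60 acquisitions) come from this discussion; the function names, the choice to keep the first 60 acquisitions, right-side zero-padding, and max-scaling are assumptions made for the example and are not necessarily the procedure used in the manuscript. Plain Python lists are used to keep the sketch dependency-free; an array library would be the natural choice in practice.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]

SPECTRUM_SHAPE = (8, 32)   # acquisitions x spectral channels
LIFETIME_SHAPE = (4, 64)   # lifetime ranges x time channels
SCATTER_ANGLES = 24        # detector angles (rows of the scattering image)
SCATTER_WIDTH = 60         # fixed number of acquisitions kept per particle


def check_shape(matrix: Matrix, shape: tuple, name: str) -> None:
    """Raise if a fixed-size instrument output does not have the expected shape."""
    rows, cols = shape
    if len(matrix) != rows or any(len(r) != cols for r in matrix):
        raise ValueError(f"{name} must have shape {shape}")


def flatten(matrix: Matrix) -> List[float]:
    """Row-major flattening of a 2-D matrix into a 1-D list."""
    return [float(v) for row in matrix for v in row]


def fix_width(image: Matrix, width: int = SCATTER_WIDTH) -> List[List[float]]:
    """Crop each row to `width` columns, or right-pad it with zeros (assumed scheme)."""
    fixed = []
    for row in image:
        kept = [float(v) for v in list(row)[:width]]
        fixed.append(kept + [0.0] * (width - len(kept)))
    return fixed


def normalize(image: List[List[float]]) -> List[List[float]]:
    """Scale by the image maximum (assumed scheme); all-zero images pass through."""
    peak = max(max(row) for row in image)
    if peak <= 0:
        return image
    return [[v / peak for v in row] for row in image]


def build_features(spectrum: Matrix, lifetime: Matrix, scatter: Matrix) -> List[float]:
    """Concatenate the three signals into one vector of 256 + 256 + 24*60 values."""
    check_shape(spectrum, SPECTRUM_SHAPE, "spectrum")
    check_shape(lifetime, LIFETIME_SHAPE, "lifetime")
    if len(scatter) != SCATTER_ANGLES:
        raise ValueError(f"scatter must have {SCATTER_ANGLES} rows")
    return (
        flatten(spectrum)
        + flatten(lifetime)
        + flatten(normalize(fix_width(scatter)))
    )
```

With these assumptions every particle yields 1952 features regardless of how long its scattering signal lasted, which is the property a tabular classifier such as a Random Forest needs.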
Random Forests (and essentially all tree-based methods) do not rely on distance metrics, so they are not sensitive to the scale of the inputs, which is advantageous in our case because it makes it easier to incorporate heterogeneous inputs such as ours. The structure of the data, however, is lost: the RF treats each feature as independent of the others, and it needs to “learn” any relationship between them if relevant. This doesn’t seem to be a problem for the fluorescence spectra/lifetimes, but might explain why we see little predictive power gain from the scattering images.
In any case, to clarify this point, we have rephrased the relevant sentence in the manuscript as follows (now L165-168):
“No additional transformation is required for incorporating fluorescence spectra and lifetime data into the models, as these are acquired by the instrument in fixed dimensions (32-channel spectra x 8 acquisitions and 4 lifetime ranges x 64-channels, all later flattened into vectors). However, light scattering images present a challenge due to their irregular shapes, as the total number of acquisitions depends on the duration of the detected scattered light signal”
Citation: https://doi.org/10.5194/egusphere-2025-2484-AC5
AC6: 'Reply on RC1 (5)', Alejandro Fontal, 04 Sep 2025
Figure 4. From the figure and from the text it is unclear to me how the samples were labelled. Were simply all the spectra from the aerosolization of a given bacteria considered as pure bacterial spectra with no filtering (except for the time cropping and the fluorescence threshold)? So, essentially, all spectra were “auto-labelled” depending on their source assuming no interference? Also, how was the multiclass classifier trained?
Indeed, this is one of the main challenges when generating training data from bioaerosols: we know that some of the particles produced will contain the species of interest, but we cannot guarantee that every droplet entering the device will do so (or even that they will always carry any biological material at all). This makes the labelling process closer to pseudo-labelling than to generating a real ground truth. With this in mind, however, we have done our best to minimize the potential labelling errors in our process:
- First, we apply a strict fluorescence threshold, using a higher cutoff than in previous studies (e.g., Šaulienė et al. 2019), which also relied on thresholds to exclude unwanted particles such as pollen. Our 2000 a.u. cutoff lies within the 95th to 99th percentile range across all sample groups, so the majority of “empty” particles are discarded at this step.
- Second, we observed that the control group also contained particles with fluorescence signals. This means that fluorescence alone cannot serve as the only variable to distinguish bacteria-containing particles from those in bacterial-free aerosols. For this reason, we first train the binary classifier to separate bacteria-containing particles from all others (using the subset already passing the fluorescence threshold), and then train the multiclass classifier exclusively on the bacterial-enriched aerosols.
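The two steps above can be sketched as follows. This is only an illustration of how the pseudo-labelled training sets might be assembled, not the authors' code: the `Particle` record, the `"control"` label, the placeholder species names, and the use of a single peak-fluorescence value compared with `>=` against the 2000 a.u. cutoff are all assumptions made for the example. Training the classifiers themselves is left out.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

FLUORESCENCE_CUTOFF = 2000.0  # a.u., the cutoff quoted in the reply
CONTROL = "control"           # hypothetical label for bacteria-free aerosols


@dataclass
class Particle:
    peak_fluorescence: float  # assumed summary value compared with the cutoff
    source: str               # aerosol the particle came from (pseudo-label)
    features: List[float]     # flattened predictors for the classifier


def above_cutoff(particles: Sequence[Particle],
                 cutoff: float = FLUORESCENCE_CUTOFF) -> List[Particle]:
    """Step 1: discard weakly fluorescent ("empty") particles."""
    return [p for p in particles if p.peak_fluorescence >= cutoff]


def binary_training_set(particles: Sequence[Particle],
                        cutoff: float = FLUORESCENCE_CUTOFF
                        ) -> Tuple[List[List[float]], List[int]]:
    """Step 2a: bacteria-enriched (1) versus control (0), after thresholding."""
    kept = above_cutoff(particles, cutoff)
    X = [p.features for p in kept]
    y = [0 if p.source == CONTROL else 1 for p in kept]
    return X, y


def multiclass_training_set(particles: Sequence[Particle],
                            cutoff: float = FLUORESCENCE_CUTOFF
                            ) -> Tuple[List[List[float]], List[str]]:
    """Step 2b: species labels, using bacteria-enriched aerosols only."""
    kept = [p for p in above_cutoff(particles, cutoff) if p.source != CONTROL]
    X = [p.features for p in kept]
    y = [p.source for p in kept]
    return X, y


# Usage with placeholder data: labels are inherited from the aerosol source,
# so they are pseudo-labels rather than verified ground truth.
particles = [
    Particle(2500.0, CONTROL, [0.1]),
    Particle(1500.0, "species_A", [0.2]),  # below cutoff, dropped
    Particle(3000.0, "species_A", [0.3]),
    Particle(2000.0, "species_B", [0.4]),
]
X_bin, y_bin = binary_training_set(particles)
X_multi, y_multi = multiclass_training_set(particles)
```

Keeping the control particles that pass the cutoff in the binary set is what lets the first classifier learn to reject fluorescent but bacteria-free particles, which the threshold alone cannot do.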
That said, we acknowledge that many empty particles likely remain among those labelled as bacterial, and conversely, that many valid bacterial particles may have been discarded because they did not reach the fluorescence threshold. The fraction of non-bacterial particles incorrectly labelled as bacterial is likely limited, since the models show strong discrimination performance at the binary level on completely unseen data. Moreover, even simple two-dimensional PCA projections of the predictors already reveal clear separability between groups, which would be unlikely if both contained largely overlapping “empty” signals.
Citation: https://doi.org/10.5194/egusphere-2025-2484-AC6
AC7: 'Reply on RC1 (6)', Alejandro Fontal, 04 Sep 2025
Thanks for the remark; the current plot was indeed rather ambiguous without labels or ticks on the y-axis. The y-axis represents the index of each particle, and as the selected particles were the 100 with the highest peak fluorescence at any acquisition time, it indicates the top n particles with respect to that metric for each of the groups. For clarity, we have updated the panel C figure to include the axis label and the indexes 1, 50 and 100 as ticks. We have also opted to reverse the order of the y-axis, since the previous version ran from 100 to 1 from top to bottom, which is rather confusing, especially now that we explicitly label the indexes. For further clarification, I attach to this comment a high-resolution PNG and an SVG of the figure. The construction of the data behind the figure, along with the commands used, can be openly accessed here:
https://alfontal.github.io/lif-bacteria-aerosols-ms/fluorophores_ms.html#spectra-for-top-most-fluorescent-particles
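The selection described in this reply can be illustrated with a short sketch. It assumes each particle's fluorescence spectrum is available as a nested list of acquisitions by channels and that index 1 corresponds to the most fluorescent particle; the function names are hypothetical and the linked notebook remains the reference for how the figure was actually built.

```python
from typing import List, Sequence, Tuple

Spectrum = Sequence[Sequence[float]]  # acquisitions x spectral channels


def peak_fluorescence(spectrum: Spectrum) -> float:
    """Highest intensity across all channels and acquisition times."""
    return max(max(row) for row in spectrum)


def top_n_by_peak(spectra: Sequence[Spectrum],
                  n: int = 100) -> List[Tuple[int, int, float]]:
    """Return (rank, original position, peak) for the n brightest particles.

    Rank 1 is the particle with the highest peak (assumed convention).
    """
    peaks = [(peak_fluorescence(s), i) for i, s in enumerate(spectra)]
    # sort() is stable, so particles with equal peaks keep their input order
    peaks.sort(key=lambda item: item[0], reverse=True)
    return [(rank, idx, peak)
            for rank, (peak, idx) in enumerate(peaks[:n], start=1)]
```

Applied per group, the returned rank is what the relabelled y-axis (ticks at 1, 50 and 100) would show.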
Data sets
Rapid-E output for aerosolized fluorophores and Bacteria Alejandro Fontal, Sílvia Borràs, Lídia Cañas, Sofya Podzniakova, Xavier Rodó https://doi.org/10.5281/zenodo.15485702
Model code and software
GitHub Repository containing model code definitions and figures generation Alejandro Fontal, Sílvia Borràs, Lídia Cañas, Sofya Podzniakova, Xavier Rodó https://github.com/AlFontal/lif-bacteria-aerosols-ms
The presented work showcases the capabilities of a modified version of a PLAIR UV-LIF to distinguish certain bacteria from background noise as well as one bacterial species from the others. These kinds of investigations are certainly relevant for aerobiology as optical methods represent a new frontier of aerobiological sampling, and these kinds of studies are generally limited to pollen taxa.
While the work is interesting, I would have some comments for the authors: