the Creative Commons Attribution 4.0 License.
Chemical sparsity in Bayesian receptor models for aerosol source apportionment
Abstract. Aerosol source apportionment is a key tool for understanding the origins of atmospheric particulate matter and for guiding effective air quality management strategies. However, source apportionment techniques still struggle to properly separate highly correlated sources without relying on restrictive a priori information, possibly skewing the solution and adding subjective operator input, with varying degrees of benefit. This study introduces sparsity into the Bayesian Autocorrelated Matrix Factorisation (BAMF) model with the aim of removing non-essential species contributions in the unconstrained profiles, which is expected to improve the separation of factors. The regularised horseshoe prior (HS) has been added to BAMF (BAMF+HS) to promote sparsity in the composition matrix F, shrinking low-signal contributions to the solutions. BAMF+HS was evaluated using three synthetic datasets designed to reflect increasing levels of data complexity (Toy, Offline, and Online), and a real-world multi-site filter dataset. The results demonstrate that BAMF+HS effectively enforces sparsity in offline datasets and that this improves accuracy in reconstructing source profiles and time series compared to BAMF and Positive Matrix Factorisation (PMF). However, its application to higher-complexity ACSM datasets revealed sensitivity to sampling instability, which hindered sparsification. Nevertheless, even though sparsity was not achieved, the quality of the BAMF+HS solution metrics was not degraded compared to BAMF. Overall, this work underscores the value of incorporating profile sparsity as a solution property in Bayesian source apportionment, and positions BAMF+HS as a promising model for source apportionment.
Status: open (until 21 Jan 2026)
- RC1: 'Comment on egusphere-2025-5253', Anonymous Referee #1, 08 Jan 2026
Data sets
Datasets for BAMF+HS test Marta Via et al. https://github.com/martavia0/BAMF-horseshoe/tree/main/datasets
Model code and software
Models for Bayesian Matrix Factorisation Marta Via et al. https://github.com/martavia0/BAMF-horseshoe/tree/main/models
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 144 | 67 | 13 | 224 | 38 | 24 | 21 |
Via et al. extend the Bayesian Autocorrelated Matrix Factorisation (BAMF) model for aerosol source apportionment by introducing profile sparsity via a regularised horseshoe (HS) prior on the composition matrix F. This yields BAMF+HS, a Bayesian receptor model in which the classical receptor formulation is retained, the temporal autocorrelation in source contributions (from BAMF) is kept, and a regularised HS prior shrinks low-signal entries of F toward zero, encouraging chemically parsimonious profiles.
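The shrinkage behaviour described above can be illustrated with a minimal sketch of a regularised horseshoe prior in its commonly used form, where each entry has a half-Cauchy local scale `lam`, a shared global scale `tau`, and a slab width `c` that caps the effective scale. This is not the authors' implementation (their models are in the linked repository); the function names, the values of `tau` and `c`, and the folding of draws to non-negative values to mimic a non-negative profile matrix are all assumptions made for illustration.

```python
import numpy as np

def regularised_horseshoe_scale(lam, tau, c):
    """Effective prior scale tau * lambda_tilde, where
    lambda_tilde^2 = c^2 * lam^2 / (c^2 + tau^2 * lam^2).
    Written with hypot to avoid overflow for very large local scales."""
    x = tau * lam
    return c * x / np.hypot(c, x)

def sample_profile_entries(n, tau, c, rng):
    """Draw n prior samples for hypothetical profile entries.
    Assumption: entries are folded to be non-negative."""
    lam = np.abs(rng.standard_cauchy(n))           # half-Cauchy(0, 1) local scales
    scale = regularised_horseshoe_scale(lam, tau, c)
    return np.abs(rng.normal(0.0, scale))          # folded normal draws

rng = np.random.default_rng(0)
entries = sample_profile_entries(10_000, tau=0.1, c=1.0, rng=rng)  # illustrative values
print("median entry:", np.median(entries))
print("fraction of entries below 0.05:", np.mean(entries < 0.05))
```

With a small `tau`, most draws concentrate near zero, while the heavy-tailed local scales still allow a few entries to reach the slab scale `c`. This is the sense in which low-signal entries are shrunk without forcing every entry to be small.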
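They evaluate BAMF+HS on three synthetic datasets of increasing complexity (Toy, Offline, and Online) and on a real-world multi-site filter dataset.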
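They compare against BAMF (without sparsity) and PMF. The main findings, as stated in the abstract, are that BAMF+HS enforces sparsity in the offline datasets and improves the reconstruction of source profiles and time series relative to BAMF and PMF, whereas for the higher-complexity ACSM datasets sampling instability hinders sparsification, although the solution metrics are not degraded compared to BAMF.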
Overall, this is a timely and well-motivated methodological contribution to the atmospheric source-apportionment literature. The explicit use of a regularised horseshoe prior to enforce chemical sparsity in a Bayesian receptor model addresses a longstanding challenge in separating correlated sources without heavy, subjective constraints. The synthetic evaluations and the application to a real multi-site offline filter dataset are convincing. I recommend publication in AMT after major revision addressing the following comments:
General comments:
Specific comments:
Line 25: What is Toy?
Line 28: Define ACSM.
Line 39: Be more careful in referring to OP as toxicity. I would be more conservative here and refer to it as one of the health metrics.
Line 43: PMF needs to be spelled out when it is first introduced. Also, PMF is one of the RMs used to conduct source apportionment analysis, not the only approach to source apportionment. Please rephrase. Also, SA is not just identification but also quantification of sources. You will need to make that clear.
Line 46: decomposes -> deconvolute.
Line 50: I would avoid mathematical notation such as ℝ^(n·m) to accommodate a wider audience; I suggest spelling it out.
Line 53: Some introduction to unconstrained and constrained PMF is necessary.
Line 55: disentanglement -> identification
Line 57: CMB is not exactly equivalent to fully constrained PMF. Also, I don’t know why you introduce CMB here. Perhaps you need a few sentences in this paragraph to introduce the limitations of PMF or CMB in general.
Line 70: approach -> conduct
Line 82: overlapping emissions -> mixed emission sources
Line 83: slight F differences -> slight differences of F
Line 83-84: Use even simpler language to briefly explain sparsity and why it makes sense to enforce it in PMF analyses. Also, change “elements” to “variables”, since not all the variables of F are elements.
Line 86-87: Change it to: “The accomplishment of sparse source fingerprints could represent “cleaner” emission sources without mixing among resolved factor profiles.”
Line 102: What is N? Please introduce it in the text.
Line 222: Avoid using “hence” twice in one sentence.
Line 256: OK, now I understand what the toy dataset is. Would “dummy” be more appropriate than “toy”? The term did not make much sense to me when I first saw it at the beginning of the manuscript without context.
Table 3: For the “toy” dataset, BAMF and BAMF+HS generally perform worse than PMF. Could this be a major flaw of BAMF? How can this be addressed?
Figure 1: The y-axis is a bit confusing to me. The values are not real m/z, right? Also, what is the unit of the y-axis? Have you run repeats of CMB or CMB+HS, so that the y-axis shows the frequency of iterations that end up with these concentrations? This is not clear from your text and figure captions. Please clarify.
Section C.1 of SI: The naming is inconsistent: BMF vs. BAMF, and BAMF-GS vs. BAMF+GS.
Figure 3: You will need a legend for which color is which…
Figure 8: I’m still confused about what you are showing here. Is it the autocorrelation of each source for each model? Is it the model vs. truth comparison for each source and each model? Or is it the correlation between the modelled and true autocorrelations? If it is the third one, what does it mean?