https://doi.org/10.5194/egusphere-2022-1221
20 Dec 2022

Positive Matrix Factorization of Large Aerosol Mass Spectrometry Datasets Using Error-Weighted Randomized Hierarchical Alternating Least Squares

Benjamin Sapper, Daven Henze, Manjula Canagaratna, and Harald Stark

Abstract. Weighted positive matrix factorization (PMF) has been used by scientists to find small sets of underlying factors in environmental data. However, as datasets have grown, rising computational costs have made traditional factorization methods impractical. In this paper, we present a new weighting method that dramatically decreases the computational cost of these traditional algorithms. We then apply this weighting method with the Randomized Hierarchical Alternating Least Squares (RHALS) algorithm to a large environmental dataset and show that interpretable factors can be reproduced. The resulting algorithm achieves computational speedups by factors of 38, 67, and 634 over the Multiplicative Update (MU), deterministic Hierarchical Alternating Least Squares (HALS), and non-negative Alternating Least Squares (ALS) algorithms, respectively. We also investigate rotational ambiguity in the solution and present a simple “pulling” method to rotate a set of factors. This method is shown to find alternative solutions and, in some cases, to lower the weighted residual error of the algorithm.
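
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of a deterministic, element-wise weighted HALS update for the weighted PMF objective Q = sum over i, j of ((X_ij - (WH)_ij) / sigma_ij)^2 with W, H >= 0. It is not the authors' weighting method and not the randomized RHALS variant, neither of which is described on this page. The function name `weighted_hals`, its parameters, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def weighted_hals(X, sigma, rank, n_iter=100, seed=0, eps=1e-12):
    """Illustrative element-wise weighted HALS (hypothetical helper, not the paper's code).

    Minimizes Q = sum(((X - W @ H) / sigma) ** 2) subject to W >= 0 and H >= 0
    by cycling over the rank-one components one at a time.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    V = 1.0 / sigma**2                 # element-wise weights 1 / sigma_ij^2
    W = rng.random((m, rank))          # random non-negative start
    H = rng.random((rank, n))
    q_history = []
    for _ in range(n_iter):
        R = X - W @ H                  # current residual
        for k in range(rank):
            # Residual with component k removed
            Rk = R + np.outer(W[:, k], H[k])
            # Weighted least-squares update of row k of H, clipped at zero
            num = (V * Rk).T @ W[:, k]
            den = V.T @ (W[:, k] ** 2) + eps
            H[k] = np.maximum(num / den, 0.0)
            # Weighted least-squares update of column k of W, clipped at zero
            num = (V * Rk) @ H[k]
            den = V @ (H[k] ** 2) + eps
            W[:, k] = np.maximum(num / den, 0.0)
            # Put the updated component back into the residual
            R = Rk - np.outer(W[:, k], H[k])
        q_history.append(float(np.sum(V * R**2)))
    return W, H, q_history

# Synthetic demonstration data (assumed, not from the preprint)
rng = np.random.default_rng(1)
W_true = rng.random((30, 3))
H_true = rng.random((3, 20))
X = W_true @ H_true
sigma = 0.05 + 0.1 * rng.random(X.shape)   # assumed per-element uncertainties

W, H, q_history = weighted_hals(X, sigma, rank=3, n_iter=50)
print("weighted residual Q, first and last iteration:", q_history[0], q_history[-1])
```

Each pass forms a weighted residual of the full data matrix for every component, which illustrates why element-wise weighting becomes expensive on large datasets.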

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Journal article(s) based on this preprint

19 May 2025
Positive matrix factorization of large real-time atmospheric mass spectrometry datasets using error-weighted randomized hierarchical alternating least squares
Benjamin C. Sapper, Sean Youn, Daven K. Henze, Manjula Canagaratna, Harald Stark, and Jose L. Jimenez
Geosci. Model Dev., 18, 2891–2919, https://doi.org/10.5194/gmd-18-2891-2025, 2025

Interactive discussion

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2022-1221', Anonymous Referee #1, 11 Apr 2023
  • RC2: 'Comment on egusphere-2022-1221', Anonymous Referee #2, 20 Apr 2023

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
AR by Benjamin Sapper on behalf of the Authors (15 Jun 2023)
ED: Referee Nomination & Report Request started (20 Jun 2023) by Klaus Klingmüller
RR by Anonymous Referee #1 (07 Jul 2023)
RR by Anonymous Referee #2 (31 Jul 2023)
ED: Reconsider after major revisions (26 Aug 2023) by Klaus Klingmüller
AR by Sean Youn on behalf of the Authors (01 Aug 2024)
ED: Referee Nomination & Report Request started (14 Aug 2024) by Klaus Klingmüller
RR by Anonymous Referee #3 (08 Feb 2025)
ED: Publish subject to minor revisions (review by editor) (11 Feb 2025) by Klaus Klingmüller
AR by Sean Youn on behalf of the Authors (21 Feb 2025)
ED: Publish as is (03 Mar 2025) by Klaus Klingmüller
AR by Sean Youn on behalf of the Authors (05 Mar 2025)

Viewed

Total article views: 1,005 (including HTML, PDF, and XML)
  • HTML: 662
  • PDF: 300
  • XML: 43
  • Total: 1,005
  • BibTeX: 40
  • EndNote: 45
Views and downloads, including cumulative totals, are calculated since 20 Dec 2022.

Viewed (geographical distribution)

Total article views: 978 (including HTML, PDF, and XML), of which 978 have a defined geography and 0 are of unknown origin.
Latest update: 19 May 2025

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary
Positive Matrix Factorization (PMF) has been used by atmospheric scientists to extract underlying factors present in large datasets. This paper presents a new technique for weighted PMF that drastically reduces the computational costs of previously developed algorithms. We use this technique to deliver interpretable factors and solution diagnostics from an atmospheric chemistry dataset.