the Creative Commons Attribution 4.0 License.
Reviews and syntheses: Use and misuse of peak intensities from high resolution mass spectrometry in organic matter studies: opportunities for robust usage
William Kew
Allison Myers-Pigg
Christine Chang
Sean Colby
Josie Eder
Malak Tfaily
Jeffrey Hawkes
Rosalie Chu
Abstract. Earth’s biogeochemical cycles are intimately tied to the biotic and abiotic processing of organic matter (OM). Spatial and temporal variation in OM chemistry is often studied using high resolution mass spectrometry (HRMS). An increasingly common approach is to use ecological metrics (e.g., within-sample diversity) to summarize high-dimensional HRMS data, notably Fourier transform ion cyclotron resonance MS (FTICR MS). However, problems arise when HRMS peak intensity data are used in a way that is analogous to abundances in ecological analyses (e.g., species abundance distributions). Using peak intensity data in this way requires the assumption that intensities act as direct proxies for concentrations, which is often invalid. Here we discuss theoretical expectations and provide empirical evidence why concentrations do not map to HRMS peak intensities. The theory and data show that comparisons of the same peak across samples (within-peak) may carry information regarding variation in relative concentration, but comparing different peaks (between-peak) within or between samples does not. We further developed a simulation model to study the quantitative implications of both within-peak and between-peak errors that decouple concentration from intensity. These implications are studied in terms of commonly used ecological metrics that quantify different aspects of diversity and functional trait values. We show that despite the poor linkages between concentration and intensity, the ecological metrics often perform well in terms of providing robust qualitative inferences and sometimes quantitatively-accurate estimates of diversity and trait values. We conclude with recommendations for using peak intensities in an informed and robust way for natural organic matter studies. 
A primary recommendation is the use and extension of the simulation model to provide objective, quantitative guidance on the degree to which conceptual and quantitative inferences can be made for a given analysis of a given dataset. Without such guidance, researchers who use peak intensities do so with unknown levels of uncertainty and bias, potentially leading to spurious scientific outcomes.
William Kew et al.
Status: final response (author comments only)
RC1: 'Comment on kew et al _ egusphere', Anonymous Referee #1, 15 Nov 2022
Comments on: Use and misuse of peak intensities for HRMS on OM studies: opportunities for robust usage. Kew et al. https://doi.org/10.5194/egusphere-2022-1105
General comments
The authors present a theoretical review of factors affecting the relationship between absolute quantity of a compound vs. analytical response as measured by high resolution mass spectrometry, as well as an experiment showing the differences in response factor for standard compounds in (negative ion) ESI-MS when measured in different matrices. They then discuss the ramifications of the fact that equal quantities of different compounds in a sample can produce different mass peak intensities or that the same quantity of compound may have a variable response in different samples depending on matrix, for the treatment of HRMS data from DOM analysis with statistical approaches designed for population ecology. As my expertise is mostly in the area of MS I will focus most of my comments on those sections.
The theoretical factors governing the response and identification of compounds in MS analysis, which the authors describe, are well known. Within the LC/MS community it is well known that one should not compare apples with pears, and that even comparing apples may be tricky, as quantitation is difficult and often semi-quantitative. Not understanding the confounding factors can lead to over-interpretation of LC/MS data. Whether this message is something that (still) needs to be learned within the DOM community is unclear to me; I hope a reviewer from that community has more to say about that. The experiments with a set of standard compounds measured in different matrices are a nice illustration of the effect of ion suppression by matrix, but they are probably not necessary for the message of the manuscript, as they are mostly a textbook experiment with an accordingly predictable outcome. The more novel part of this manuscript lies in the discussion of how these quantitative errors and uncertainties might influence the data treatment and outcomes. However, as I indicate in the comments below, I do not think that the in silico data set was generated with sound choices. Below I list several specific comments and questions to be addressed. In general, my recommendation is to shorten and simplify the descriptions of the factors governing quantitative response in (LC)MS, and to focus the manuscript on the consequences for data treatment. A welcome addition would be a discussion of how to remedy the problems. As this probably entails more than major revisions, I recommend rejection at this time.
Specific comments
The second paragraph (lines 49 to 55) can be shortened and incorporated in the third paragraph, specifically in between line 67 and 68.
Line 75 to 76: peak intensity actually is proportional to differences in concentration for a given compound, if the conditions are kept the same; otherwise no one would be able to produce a standard curve. However, the response factor (response per amount) may differ from compound to compound, and for a given compound it may vary with matrix and other factors. This should be rephrased.
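The reviewer's distinction can be illustrated with a minimal sketch. The response factors and compound names below are hypothetical, not values from the manuscript; the point is only that a linear within-compound response supports a standard curve, while differing response factors break between-compound comparisons.

```python
# Hypothetical response factors (signal counts per ppb); illustrative only,
# not values from the manuscript.
rf = {"compound_A": 1200.0, "compound_B": 90.0}

def intensity(compound, conc_ppb, matrix_factor=1.0):
    """Idealized linear response: intensity = RF * concentration * matrix factor."""
    return rf[compound] * conc_ppb * matrix_factor

# Within-compound, same conditions: doubling concentration doubles intensity,
# which is exactly why a standard curve works.
assert intensity("compound_A", 20) == 2 * intensity("compound_A", 10)

# Between-compound: equal concentrations give very different intensities,
# so an intensity ratio reflects the response-factor ratio, not the
# (here 1:1) concentration ratio.
print(intensity("compound_A", 10) / intensity("compound_B", 10))
```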
Line 96 to 99: move this section to the end of this section 2.3 as this is a concluding remark.
Line 101 to 112: Formation of (mixed) dimers and trimers should also be mentioned, as well as in-source fragmentation. There are several nice reviews on ion suppression; it would be good to reference a few here. The influence of the choice of ionization mode (+ vs. - ionization; APCI vs. ESI) should also be discussed. A discussion of the use or (mis)use of internal standards would also be a nice addition.
Line 114 to 122: the issue described here is in fact not an ionization bias, but the inability to separate isobaric species. It simply describes the fact that when analyzing a complex (DOM) sample, any peak in any MS1 spectrum can in fact consist of the signal of multiple compounds with identical m/z (within the mass accuracy specifications) and is therefore a cumulative response of those compounds. This would still be the case even if the ionization of each compound was perfect at 100% efficiency.
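The cumulative-response effect the reviewer describes can be sketched as follows. The m/z values, intensities, and tolerance below are made up for illustration; the point is that two distinct compounds closer in mass than the instrument can resolve appear as a single summed peak, even with perfect ionization.

```python
# Hypothetical ions: (m/z, intensity). The first two differ by less than the
# assumed mass tolerance, so they are observed as one cumulative peak.
ions = [(301.14102, 5.0e6), (301.14109, 2.0e6), (315.12045, 1.0e6)]
tol_ppm = 1.0

def merge_isobars(ions, tol_ppm):
    """Greedily merge ions closer than tol_ppm into single observed peaks."""
    merged = []
    for mz, inten in sorted(ions):
        if merged and (mz - merged[-1][0]) / mz * 1e6 <= tol_ppm:
            pmz, pint = merged[-1]
            # Intensity-weighted centroid; intensities simply add.
            merged[-1] = ((pmz * pint + mz * inten) / (pint + inten), pint + inten)
        else:
            merged.append((mz, inten))
    return merged

# The two 301.141x ions collapse into one peak of intensity 7.0e6,
# a cumulative response that no ionization efficiency could disentangle.
print(merge_isobars(ions, tol_ppm))
```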
Section 2.3: The effect of dilution of any compound by abundant matrix should be discussed. In trapping instruments, the fill rate of the trap (or ion target) is often a programmable parameter. If a compound present at a given quantity is the most abundant compound there, then the trap is mostly filled with that compound. If the compound is present in the same quantity but together with a lot of matrix ions, the matrix ions will take up part or most of the available space in the trap, effectively "diluting" the compound and thereby leading to underestimation of its actual quantity. Maybe a better title for this section would be "ion transmission and collection".
Lines 172 to 251: Section 3, in my opinion, can be removed in its entirety. These are predictable textbook experiments, and whatever extra information is discussed can be incorporated in section 2.
Line 331 to 354: generation of the in silico data set. I find the use of errors well above 1 (like 1.5) debatable. Although ion enhancement does happen, it is very rare and seldom that pronounced. Even in Fig. 4, the increase shown in panel C from 0 to 2 ppm matrix added is attributed to the addition of endogenous compounds from the matrix. So is this realistic? What would happen if you made the error non-Gaussian?
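The referee's question about non-Gaussian errors can be explored with a small sketch. The error distributions and parameters below are hypothetical stand-ins, not the manuscript's actual error model; the sketch only contrasts a symmetric multiplicative error (which permits enhancement factors above 1) with a skewed, suppression-only alternative.

```python
import random

random.seed(0)
true_conc = [random.uniform(1, 100) for _ in range(1000)]  # arbitrary "true" values

def gaussian_error(c):
    # Symmetric multiplicative error: factors above 1 (ion enhancement)
    # are as likely as factors below 1 (suppression).
    return c * max(random.gauss(1.0, 0.25), 0.0)

def suppression_only_error(c):
    # Skewed, non-Gaussian alternative: factors drawn in (0, 1),
    # reflecting the reviewer's point that enhancement is rare.
    return c * random.betavariate(5, 2)

gauss_obs = [gaussian_error(c) for c in true_conc]
supp_obs = [suppression_only_error(c) for c in true_conc]

# Symmetric errors preserve the total on average; suppression-only errors
# bias every observed intensity low (mean of Beta(5, 2) is 5/7).
print(sum(gauss_obs) / sum(true_conc))
print(sum(supp_obs) / sum(true_conc))
```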
Line 375: Again a comment about the choices underlying the in silico data set: ionization efficiencies do not randomly vary for a compound across samples. In general, more complex samples with more matrix will have lower ionization efficiency for all compounds in that sample. The deviations are not truly random.
Lines 385-408: As wordy as some of the other sections are, the authors go just as quickly through the consequences for the application of the statistical models. Some parts were truly unreadable to me as a person not involved in statistics and modeling. The concept of mean trait values and Figure 11 are hardly introduced and poorly explained.
Lines 483-489: After reading the entire manuscript, the conclusion that the ecological metrics actually perform quite well came as a bit of a surprise to me. The tone of the manuscript as a whole is quite negative towards these concepts.
Fig. 4: What does relative intensity mean here? Relative to what? As no bar ever reaches 100%, is the response of the compound with no matrix or SRFA added 100%? Please clarify. Which of the differences between bars are statistically significant? However, I recommend removing this figure along with section 3.
Fig. 8: Panels A, B, and D are data clouds, which seems a logical outcome of the way errors are assigned in the in silico data set. But why the convergence to 0 for the true-to-observed difference in panel C? I do not see the strong resemblance between panels A and C described in line 383 of the text.
Citation: https://doi.org/10.5194/egusphere-2022-1105-RC1
RC2: 'Comment on egusphere-2022-1105', Anonymous Referee #2, 20 Mar 2023
Given the growing literature on the application of ecological analyses to high-resolution mass spectrometry, a careful assessment of the bias, flaws, and assumptions is timely and a welcome contribution to the expanding subdiscipline. In particular, the suggestion to expand modeling approaches, including machine learning or hierarchical modeling, to quantify the magnitude of errors is an important one for future research.
Meanwhile, the empirical results in this study show that many ecological metrics derived from the peak intensities provide valid patterns. This is an important result and could be highlighted alongside papers demonstrating the acquisition of quantitative data from HRMS (e.g., Kruve, 2020; Groff et al., 2022).
On line 483, it is stated that HRMS has many weaknesses, just like any analytical platform. Most practitioners of HRMS would agree that the biases presented in this paper are present (as reviewed in Urban, 2016; Kujawinski, 2002). Many of these biases exist with other compositional data as well (unlike the statement on lines 73-77), such as microbiome data (Vieira-Silva et al., 2019), and have produced solutions such as those reviewed in Gloor et al. (2017), including the creation of internal standards (Hardwick et al., 2018).
Together, these two foci highlight the most problematic aspect of this paper: The tone is much too negative to accurately reflect the reality of the (very common) use of peak intensities. A more balanced and contrasted view should be taken so as not to alienate specialist readers or mislead those less well-versed. The tone leaves the feeling that the bulk of the literature in the past decade is highly flawed and not to be trusted. This is not impossible, but if this is what the authors are trying to convey, the analysis must be made much more robust. I suggest the authors change the tone to ensure that the key messages (the utility of ecological metrics and some of their drawbacks) are most effectively conveyed.
I am also concerned that some of the assumptions of the empirical work have significant technical flaws. This may be improved by providing greater transparency for the selection choices.
1) The errors of their simulation model. How can a random selection between 0 and 100 for the simulated errors be justified (lines 352 and 369)? Why is 0 included in the random selection? The decision for this range should be motivated by actual evidence, such as from the experiments measuring variation in peak intensities of analytes of known concentration. Examining Fig. 4b, it looks like a better error selection would be between 1 and 8. Without further justification, the results of the in silico simulation model appear quite arbitrary.
2) Peak intensities are normally distributed in HRMS data (e.g., He et al., 2020). The way the authors generate random intensities does not reproduce the normal distribution of peak intensities.
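One way to address this concern, sketched here under wholly hypothetical assumptions (the stand-in "observed" distribution below is simulated, and this is not the manuscript's method), is to draw simulated intensities by resampling an empirical intensity distribution rather than drawing uniformly, which preserves the distribution's shape by construction.

```python
import random

random.seed(1)
# Stand-in for an observed intensity distribution (hypothetical values):
# many low-intensity peaks, few high-intensity ones.
observed = [random.expovariate(1 / 2e6) + 1e5 for _ in range(5000)]

# Uniform draws over the observed range ignore the distribution's shape:
uniform_sim = [random.uniform(min(observed), max(observed)) for _ in range(5000)]

# Resampling the empirical distribution preserves it by construction:
empirical_sim = [random.choice(observed) for _ in range(5000)]

obs_mean = sum(observed) / len(observed)
# The empirical resample matches the observed mean closely;
# the uniform draw overshoots it badly.
print(sum(empirical_sim) / len(empirical_sim) / obs_mean)
print(sum(uniform_sim) / len(uniform_sim) / obs_mean)
```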
3) The number of peaks. The simulation models use either 100 or 1000 peaks. These are not environmentally relevant. Most environmental studies have several thousand peaks, where the authors nicely and unequivocally show in Fig. S2 that there is absolutely no bias in the calculation of ecological metrics, that is, the observed vs true R2 values approximate 1. For this reason, any simulations with small numbers of peaks are misleading and not relevant to most studies.
4) Why these specific standards? What are their features beyond just an absence in natural DOM? I think there needs to be a description of what makes the molecular structures, ionization properties, etc. of these analytes appropriate spike-ins.
5) No evidence is presented for why these higher analyte concentrations (>200 ppb), or relative intensities of individual peaks >1%, will ever be realistic. I similarly do not understand why the summed relative intensities exceed 100% in Fig. 4a in the absence of SRFA.
There are also several strong statements that I do not believe are sufficiently supported by the scientific evidence presented in the paper. These are on line 245 “Strategies to use calibration curves will fail” and line 324 “The previous sections show that between-peak changes in peak intensity do not accurately reflect between-peak changes in abundance”.
Fig. 4 seems to show that the relative intensity scales with concentration. The authors could model this with a nonlinear model or GAM for each analyte, or with a single model where the slope varies with m/z (for example). Without having attempted such an analysis, it is difficult to understand how this statement is supported. Figure 4 also nicely shows that quantitative conclusions can be reached between peaks. As shown in Figure 4a, at low concentrations of the three different molecules (<100 ppb), the signal intensities seem statistically indistinguishable. A similar result is seen, especially in the MeOH matrix, at higher concentrations of SRFA that effectively dilute the analytes to representative concentrations. These results suggest that the analytes are performing quantitatively.
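The per-analyte fit the reviewer suggests could be sketched as below. The calibration points are invented for illustration (they are not the Fig. 4 data), and a power-law fit in log-log space stands in for the suggested nonlinear model or GAM; an exponent near 1 would indicate a near-linear response over the calibrated range.

```python
import math

# Hypothetical calibration data per analyte: (concentration in ppb, peak intensity).
# These values are made up and do not come from the manuscript.
calib = {
    "analyte_1": [(10, 2.1e5), (50, 9.8e5), (100, 1.9e6), (200, 3.5e6)],
    "analyte_2": [(10, 4.0e4), (50, 2.2e5), (100, 4.5e5), (200, 9.1e5)],
}

def fit_power_law(points):
    """Least-squares fit of log(I) = b*log(C) + a, i.e. I = e^a * C^b."""
    xs = [math.log(c) for c, _ in points]
    ys = [math.log(i) for _, i in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

for name, pts in calib.items():
    a, b = fit_power_law(pts)
    # b near 1 indicates a near-linear intensity-concentration response.
    print(name, round(b, 3))
```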
Additional comments:
Line 60: The authors should cite the reference of the first use of 21T FT-ICR-MS (Smith et al., 2018).
L195-205 – This point is already made on L122
Figure 4- relative intensity is not explained. Relative to what?
Thank you for your thoughtful consideration of these points.
Refs:
Kruve, A. (2020). Strategies for drawing quantitative conclusions from nontargeted liquid chromatography–high-resolution mass spectrometry analysis.
Groff, L. C., Grossman, J. N., Kruve, A., Minucci, J. M., Lowe, C. N., McCord, J. P., ... & Sobus, J. R. (2022). Uncertainty estimation strategies for quantitative non-targeted analysis. Analytical and Bioanalytical Chemistry, 414(17), 4919-4933.
Urban, P. L. (2016). Quantitative mass spectrometry: an overview. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2079), 20150382.
Kujawinski, E. B. (2002). Electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry (ESI FT-ICR MS): characterization of complex environmental mixtures. Environmental Forensics, 3(3-4), 207-216
Vieira-Silva, S., Sabino, J., Valles-Colomer, M., Falony, G., Kathagen, G., Caenepeel, C., ... & Raes, J. (2019). Quantitative microbiome profiling disentangles inflammation-and bile duct obstruction-associated microbiota alterations across PSC/IBD diagnoses. Nature microbiology, 4(11), 1826-1831.
Gloor, G. B., Macklaim, J. M., Pawlowsky-Glahn, V., & Egozcue, J. J. (2017). Microbiome datasets are compositional: and this is not optional. Frontiers in microbiology, 8, 2224.
Hardwick, S. A., Chen, W. Y., Wong, T., Kanakamedala, B. S., Deveson, I. W., Ongley, S. E., ... & Mercer, T. R. (2018). Synthetic microbe communities provide internal reference standards for metagenome sequencing and analysis. Nature communications, 9(1), 3096.
He, Chen, et al. "In-house standard method for molecular characterization of dissolved organic matter by FT-ICR mass spectrometry." ACS omega 5.20 (2020): 11730-11736.
Smith, D. F., Podgorski, D. C., Rodgers, R. P., Blakney, G. T., & Hendrickson, C. L. (2018). 21 tesla FT-ICR mass spectrometer for ultrahigh-resolution analysis of complex organic mixtures. Analytical chemistry, 90(3), 2041-2047.
Citation: https://doi.org/10.5194/egusphere-2022-1105-RC2
Viewed
- HTML: 485
- PDF: 237
- XML: 11
- Total: 733
- Supplement: 54
- BibTeX: 3
- EndNote: 9