the Creative Commons Attribution 4.0 License.
Occupancy history influences extinction risk of fossil marine microplankton groups
Abstract.
Geographic range has long been acknowledged as an important determinant of extinction risk. The trajectory of geographic range through time, however, has not received as much scientific attention. Here, we test the role of change in geographic range – assessed by a measure of proportional occupancy of grid cells – in determining the extinction risk in four major microplankton groups: foraminifera, calcareous nannofossils, radiolarians, and diatoms. Logistic regression was used to assess the importance of standing occupancy, occupancy change, and sampling probability in the extinction risk of species. We find that while standing occupancy is a major determinant of extinction risk in all microplankton groups, change in occupancy accounts for an average of 52 % of the explanatory power of the three analyzed variables, with a maximum value of 92 %. Sampling probability was also found to be consistently informative, with an average of 6 % and a maximum value of 22 %. Our results highlight the importance of incorporating both geographic range and its change through time, as well as sampling probability, into extinction models. The ability of occupancy trajectory to help predict extinction risk underlines the necessity of paleontological data in modern conservation efforts.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-2597', Steven Holland, 05 Feb 2025
# Overview
This is an intriguing manuscript of a rich data set appropriately analyzed with multivariate logistic regression. It makes a significant point for four major planktonic groups: that change in occupancy is a good predictor of extinction risk. It substantially adds to the ecological literature on extinction risk, emphasizing a factor that can be addressed only through the fossil record. As the authors point out, the importance of occupancy for extinction risk has been shown for marine invertebrates. Still, it is encouraging to see that change in occupancy has importance that extends to the plankton and is likely to matter more broadly.
The manuscript is well-written, and the figures and tables are well-constructed. I have a few general points that should be addressed before publication, plus a few minor comments. In short, the authors should:
* Simplify the models so that they include only statistically significant terms.
* Present a table of model coefficients for each of the 16 taxon/time-bin combinations, displaying only the statistically significant fitted coefficients.
* Present a table of LMG relative importance for each of the statistically significant coefficients, as well as the overall percent variance explained for each of the 16 taxon/time-bin combinations.
* Update the text to emphasize patterns in LMG relative importance across the models.
* Delete the tables (2 and 3) and figures (3 and 4) that are no longer needed.

These are easily accomplished, so I consider these moderate revisions. I would be happy to review a revised manuscript.
# Principal comments
## Multiplicative vs. additive models
There's a mixup about the meaning of the + and * symbols in specifying models in R, and this is not unique to the authors; many are misled by these symbols. These symbols are overloaded (in programmer speak); that is, in the context of specifying a linear model, they do not have their commonplace meanings of add and multiply. In R's formulation of a linear model, y ~ x + z means "y as a function of x and z," not "y as a function of x plus (or added to) z." Likewise, y ~ x * z means "y as a function of x and z and the interaction of x and z," not "y as a function of x times (or multiplied by) z."
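The expansion described above can be checked directly in R without fitting anything. The following is a minimal sketch; `expand_terms` is a hypothetical helper name, and `x`, `z`, `o`, `oc`, and `s` are placeholder variable names only.

```r
# List the terms R will fit for a given model formula (no data required).
# expand_terms is a hypothetical helper name used only for illustration.
expand_terms <- function(f) attr(terms(f), "term.labels")

expand_terms(y ~ x + z)         # "x" "z": main effects only
expand_terms(y ~ x * z)         # "x" "z" "x:z": main effects plus the interaction
expand_terms(ext ~ o * oc * s)  # seven terms: three main effects, three two-way
                                # interactions, and one three-way interaction
```

Terms containing a colon are interactions, so `*` is shorthand for "main effects and all of their interactions" rather than multiplication.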
Confusing these meanings has led the authors to describe some models as additive and others as multiplicative; instead, these should be described as "without interactions" and "with all interactions." This is a small matter that can be quickly dealt with; I raise it because it feeds into the following more critical comments on model selection and reporting.
## Model selection
A surprising result of this study is that the full-interaction model (extinction ~ o * oc * s, where o is occupancy, oc is occupancy change, and s is sampling) is selected in 15 out of the 16 group/bin-size combinations (Table 2). When alternative models — Table S2, where 16/16 favor the full-interaction model, and Table S4, where 13/16 favor the full-interaction model — are considered, this rises to 44/48 (92%) of all models. This full-interaction model (extinction ~ o * oc * s) expands to the formula extinction ~ o + oc + s + o:oc + o:s + oc:s + o:oc:s (each of the terms with a colon indicates the interaction of predictor variables), so it is also called the saturated model.
The fact that the saturated model was selected in 92% of the cases is highly unusual, and I've never come across such a situation in my work, including all the student projects in my data analysis class. By selecting these models, the authors are, in effect, claiming that all seven terms (o, oc, s, o:oc, o:s, oc:s, and o:oc:s) are statistically significant (i.e., the coefficients are demonstrably non-zero). That's what is unusual; seldom are all the interaction terms significant, and often, one or more non-interaction terms are not significant.
Examining the supplemental material gives a glimpse of what is going on. For example, using the foraminifera / 1 m.y. combination, where the full-interaction ("multiplicative" in Table 2) model was selected, a summary of the model [using summary(foram_1000_mod1)] shows that four of the predictors are not statistically significant (o, o:oc, o:s, o:oc:s). The model must be simplified to include only the significant terms. Although one might be tempted to discard all the non-significant terms at this step, that's inadvisable because removing one term can cause another term to become significant. One must use model simplification so that significant terms are included and non-significant terms are excluded (see the excellent coverage in Crawley's The R Book).
With seven predictor variables, exploring all possible (128) combinations manually is not feasible. The stepAIC() function provides a way to simplify the process. It starts with the saturated model (e.g., ext ~ o * oc * s), removes non-significant terms using AIC, and stops when a simplified model containing only statistically significant (i.e., demonstrably non-zero) coefficients is left. Calling summary() on that model shows the coefficients. What it produces is the simplest model that fits the data best. The authors can do this easily by calling three commands for each model; here's an example using the foraminifera/1 m.y. data:
library(MASS)  # provides stepAIC()
saturatedModel <- glm(extinction ~ raw_prop_sampled * delta_1 * sampling, family = binomial(link = "logit"), data = F_1000000[[1]])
simplifiedModel <- stepAIC(saturatedModel)
summary(simplifiedModel)
Using AIC for model selection, as the authors have done, is good, but there are many more models to consider than the five the authors picked (o, oc, s, o+oc+s, and o*oc*s). stepAIC() facilitates finding the best models. The saturated model may be better than the other four in most cases, but this is likely because the four simple models miss significant predictors. Using stepAIC() also eliminates the need to test explicitly the four simple cases that the authors currently test. Instead, start with the saturated model and let stepAIC() simplify it. If the best model is one of the authors’ four simple cases (o, oc, s, o+oc+s), that will be discovered by using stepAIC().
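For readers without the authors' data objects, the same workflow can be tried end to end on simulated data. This is a minimal sketch under stated assumptions: the data frame `sim`, its column names, and the simulated effect sizes are all hypothetical and are not taken from the manuscript.

```r
library(MASS)  # stepAIC(); MASS is distributed with standard R installations

set.seed(1)
n <- 500
# Hypothetical stand-ins for occupancy (o), occupancy change (oc), and sampling (s)
sim <- data.frame(o = runif(n), oc = rnorm(n, sd = 0.2), s = runif(n))
# In this toy data set, extinction depends only on o and oc, with no interactions
p_ext <- plogis(-1 - 2 * sim$o - 3 * sim$oc)
sim$extinction <- rbinom(n, size = 1, prob = p_ext)

# Start from the full-interaction model and let AIC-based backward
# elimination drop terms that do not improve the fit
saturated <- glm(extinction ~ o * oc * s,
                 family = binomial(link = "logit"), data = sim)
simplified <- stepAIC(saturated, trace = FALSE)

formula(simplified)         # terms retained after simplification
AIC(saturated, simplified)  # the simplified model never has a higher AIC
summary(simplified)         # coefficients and p-values of the retained terms
```

Setting `trace = FALSE` only suppresses the step-by-step printout; leaving it at the default shows which term is dropped at each step.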
## Model reporting
The results of the models need to be reported, something that’s largely missing from the current manuscript. For any mathematical model, two aspects need to be reported: the fitted coefficients and their effect sizes, that is, how necessary each coefficient is for modeling the response variable (extinction here). Although the authors touch on effect size partly by reporting some LMG relative importance values, they don't do it comprehensively, which makes comparing the relative importance of the predictor variables impossible. LMG relative importance is one of the most informative parts of the modeling because it's not just whether occupancy change matters but how much it matters relative to the other predictors (occupancy, sampling, and the various interactions).
Two tables are needed in the main body of the manuscript and not in the supplemental material; both should have 16 rows, one each for the 16 taxon/bin-size combinations. The first should have seven columns for the model coefficients for o, oc, s, o:oc, o:s, oc:s, and o:oc:s, obtained with summary(model). Where a coefficient is not statistically significant (i.e., not demonstrably different from zero), it should be left blank, or a dash should be used instead of its value. The caption should indicate that these are the statistically significant (i.e., statistically non-zero) coefficients for each model. Given the rampant confusion over what statistically significant means, it's advisable to add the "i.e., statistically non-zero" to clarify what significance means.
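One row of such a coefficient table could be assembled along the following lines. This is a sketch only: `sig_coef_row` is a hypothetical helper, the 0.05 threshold is an assumed convention, and the data are simulated rather than drawn from the manuscript.

```r
# Hypothetical helper: return one table row of fitted coefficients,
# with NA for terms that are absent from the model or have p >= alpha
sig_coef_row <- function(model, terms, alpha = 0.05) {
  ct <- summary(model)$coefficients        # columns: estimate, SE, z value, p-value
  row <- setNames(rep(NA_real_, length(terms)), terms)
  keep <- intersect(terms, rownames(ct)[ct[, 4] < alpha])
  row[keep] <- ct[keep, "Estimate"]
  row
}

# Simulated example in which only occupancy change (oc) affects extinction
set.seed(2)
n <- 400
sim <- data.frame(o = runif(n), oc = rnorm(n), s = runif(n))
sim$extinction <- rbinom(n, 1, plogis(-0.5 - 2.5 * sim$oc))
fit <- glm(extinction ~ o + oc + s, family = binomial, data = sim)

all_terms <- c("o", "oc", "s", "o:oc", "o:s", "oc:s", "o:oc:s")
round(sig_coef_row(fit, all_terms), 2)  # NA marks blank cells in the table
```

Binding the rows for all 16 taxon/bin-size models with `rbind()` would give the full table, with the NA entries printed as blanks or dashes.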
The second table is similar, but it should present effect size through LMG relative importance for each of the statistically significant model coefficients (all seven). Again, leave blanks or just a dash for the non-significant coefficients in any model. The table should also have an additional column for overall explained variance or something similar to show how completely the entire model can explain extinction.
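To make the idea of LMG relative importance concrete, the sketch below averages each predictor's sequential contribution to model fit over all orderings of the predictors. It is an illustration under several assumptions and not the authors' procedure: the fit measure is the fraction of deviance explained (D²), only the three main effects are decomposed, the data are simulated, and `perms`, `d2`, and `lmg_shares` are hypothetical helper names.

```r
# All orderings of a character vector (adequate for a handful of predictors)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Fraction of null deviance explained by a logistic model with the given predictors
d2 <- function(vars, data, response = "extinction") {
  rhs <- if (length(vars) == 0) "1" else paste(vars, collapse = " + ")
  fit <- glm(as.formula(paste(response, "~", rhs)), family = binomial, data = data)
  1 - fit$deviance / fit$null.deviance
}

# LMG-style shares: mean gain in D2 when a predictor enters, over all orderings
lmg_shares <- function(vars, data) {
  orders <- perms(vars)
  contrib <- setNames(numeric(length(vars)), vars)
  for (ord in orders) {
    for (k in seq_along(ord)) {
      gain <- d2(ord[seq_len(k)], data) - d2(ord[seq_len(k - 1)], data)
      contrib[ord[k]] <- contrib[ord[k]] + gain
    }
  }
  contrib / length(orders)
}

set.seed(4)
n <- 400
sim <- data.frame(o = runif(n), oc = rnorm(n), s = runif(n))
sim$extinction <- rbinom(n, 1, plogis(-0.5 - 1 * sim$o - 1.5 * sim$oc))

shares <- lmg_shares(c("o", "oc", "s"), sim)
round(100 * shares / sum(shares), 1)  # relative importance in percent
sum(shares)                           # total D2 of the three-predictor model
```

Because the shares sum to the D² of the full model, the same calculation supplies both the per-predictor columns and the overall explained-deviance column of the proposed table.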
These two tables will convey what's important about the models, and they allow Tables 2 & 3 and Figures 3 & 4 to be deleted as they become unnecessary. It is unnecessary to report the model AIC values in the main body of the text; they’re helpful in model selection but not for comparing among the models owing to differences in sample size. They’re also not as relevant as the model coefficients and relative importance.
Adding these tables will also necessitate new text to discuss the patterns in relative importance among the taxon/bin-size combinations. Those will be an essential take-home from this research and one of broad interest, especially for the comparative importance of occupancy, occupancy change, and sampling. If specific interactions are common and meaningful, that will also be worth discussion.
## Why history might matter
It's fascinating that occupancy change matters, as has been shown for several marine invertebrate groups, especially so at the time scales of 100 kyr, 200 kyr, 500 kyr, and 1 myr. Organisms have no way of looking backward; their populations go up or down based on conditions in the moment, so the importance of occupancy change must lie in the state of the ecological system as a whole. In other words, if occupancy change matters, it must mean that something about the present ecological system is unfavorable, and that it has been persistently unfavorable.
It does make me wonder if the increasing importance of occupancy at longer time scales is just a reflection of a species being on the declining side of Foote/Liow occupancy curves. In other words, is all that's being said that a species in a long-term decline is more likely to go extinct than one that is not? Even if it is, that's distinct from rarity, which ecologists focus on, and it's an insight that can be gained only from the fossil record.
# Minor matters, keyed to line numbers
48: replace "looking at" with "of".
52: By disappearance, is extinction meant? Extirpation, too?
63: Insert a short explanation of what is meant by sampling. Is it the proportion of cells having some data?
96: No change is needed, but I am surprised by the low proportion of extant species with usable records — at best 12%?!
121-123: No change is needed, but I am surprised that pacman profiling trims so much of the data, particularly from the top of the column. I would have guessed that last occurrences would be better constrained and more reliable in a core, given that downhole collapse affects first occurrences.
128–129: Most methods are admirably well-explained, with only two exceptions: simple completeness and three-timer completeness. Both could use a single-sentence definition for readers not familiar with them.
146–148: Delete "simply" and "actually" (lines 146–148, for example) from the methods; they're not needed.
206–211: I'm wondering about multicollinearity in this model, that is, the extent to which the three predictors (o, oc, and s) are correlated. A table of the correlations of the predictors for each data set could be added to the supplemental material. If these are large, though, the authors will need to address the implications of multicollinearity for model interpretation, which can be difficult.
266, 359: There should be a space between a value and its units.
314: It seems improbable that three of these values are all 0.053. Is that correct, or are there typos here?
333, elsewhere: Results should be reported in the present tense, not the past tense.
541: Burnham and Anderson (2002) isn't cited. It's worth checking the other references as well.
Supplemental 116–117: replace "approaching the present" with "from the Pliocene to the Pleistocene".
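Regarding the multicollinearity comment on lines 206–211 above, a predictor correlation table of the kind suggested there could be produced roughly as follows. This is a sketch on simulated data; the column names, the built-in dependence of sampling on occupancy, and the choice of Spearman correlation are assumptions made for illustration.

```r
set.seed(3)
n <- 300
o <- runif(n)
# Hypothetical predictors; sampling (s) is simulated to covary with occupancy (o)
sim <- data.frame(
  o  = o,
  oc = rnorm(n, sd = 0.2),
  s  = pmin(1, pmax(0, o + rnorm(n, sd = 0.2)))
)

# Pairwise rank correlations among the three predictors
cor_tab <- cor(sim, method = "spearman")
round(cor_tab, 2)
```

One such matrix per data set would show at a glance which predictor pairs are strongly correlated.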
The authors should feel free to contact me with regards to any aspect of this review.
With kind regards,
Steven Holland
stratum@uga.edu
Citation: https://doi.org/10.5194/egusphere-2024-2597-RC1
RC2: 'Comment on egusphere-2024-2597', Anonymous Referee #2, 08 Feb 2025
Thank you for the opportunity to read this manuscript, ‘Occupancy history influences extinction risk of fossil marine microplankton groups,’ by Smith, Kocsis, and Kiessling. I was extremely intrigued by the premise and results of the study. The prediction of modern extinction was interesting. The addition of robustness testing in analysis made this work particularly convincing to me, and as such, I have relatively few/minor comments.
I apologise for my delay returning these comments, and I hope that these suggestions prove helpful in refining the work.
Minor comments
- The result that ‘the temporal scale by which we analyze these data can influence our understanding of extinction risk in marine microplankton’ stands out as a strong finding. I recommend adding this to the abstract.
- Line 47: Saulsbury et al. (2023) in PNAS is a recent paper to add. Although it is couched in terms of age-dependent extinction risk, it is equally about abundance-dependent and therefore range-size-dependent extinction risk. https://doi.org/10.1073/pnas.2307629121
- Line 47: I would also ask to rewrite the manuscript sentence on this line. It’s unnecessary to claim the topic is understudied - I am convinced of its inherent importance regardless! To say range trajectory dynamics are ‘sparingly’ considered and ‘many studies exclude [it]’ feels as though it could slight those who have indeed worked on the topic.
- Line 50: I suggest omitting this sentence (beginning ‘increasing anthropogenic impact’), as it reads as a non sequitur. The topic is already well-justified as worthy of study without this.
- Line 60: I don’t follow this sentence (‘variations in the material…’) and therefore recommend removing it for concision.
- Line 67: Rather than pose the possible results as a binary set of options (‘whether the trajectory of geographic occupancy actually influences extinction risk [or not]’), I would reword this to reflect the nuance of your methods to investigate the degree to which geographic occupancy explains extinction risk.
- Line 179: I assume the authors mean the final data table, i.e. dataset?
- Line 279: Name specific examples of the ‘various biotic events in the Cenozoic’ that ‘can be detected’.
- Line 318: The first use of ‘relative importance value’ in this paragraph needs explaining – this is the relative importance of occupancy change to standing occupancy, correct?
- Line 348: What was the D2 and the relative importance of the occupancy change term in this robustness-test version of analysis?
- Line 372: Changes in circulation and stratification are key influences on plankton biodiversity distributions, yes, but I am less confident to say they are THE key influences. There are additional influences beyond these two factors as well.
- Line 374: There is a lot of contrary evidence to this. For instance, plankton lateral shifts are documented in two Nature papers out this year - Ying et al. [https://doi.org/10.1038/s41586-024-08029-0] and Chaabane et al. [https://doi.org/10.1038/s41586-024-08191-5] A separate factor that matters is edge effects: species at poles and the equator are at the edge of the world, climatically speaking, as there is nowhere for them to migrate that is colder than found at poles or warmer than found at equator, respectively. I would consider this geographic constraint different to a mobility/biological constraint.
- Line 443: There are some logical steps in this sentence that are currently missing and need to be spelled out (or remove the sentence). What does nutrient limitation have to do with this?
- Line 478: Rather than ‘across the tree of life,’ I would say ‘across marine life’. Neither this work nor Kiessling and Kocsis (2016) analysed terrestrial species, which have substantially different migration capacities and population sizes.
- Section 5 (Conclusion): While I agree historic data are relevant to informing future predictions of biodiversity loss, I am (regretfully) unconvinced conservation biologists would take the findings of this study as important to influence management decisions on ‘where to spend limited conservation funding in the future’—something that is sadly dictated more often by what is feasible to achieve in local sociopolitical contexts than what is forecast by scientific models. What about substituting the second sentence in this paragraph and instead mentioning how these findings are empirical contributions to help resolve open evolutionary theory about how range reductions end in extinction over the long timescales of species ‘lifetimes’, i.e. age-dependency of extinction rates?
- Table 2: Were the top models always only (a) fully multiplicative or (b) fully additive? That is, were there never cases of a model such as ex ~ oc * ch + p? Or were such models not considered? This is hard to interpret based on how Table 2 is presented.
- Cite version numbers for all software packages: NSBcompanion (line 76), divDyn (line 131), taxize (line 221), and any others.
Modelling suggestions
I have three suggestions about the analysis framework, none of which I expect to substantively change the findings but which may increase the informativeness (for the first), rigor (for the second), and accuracy (for the third).
My first observation and the one that would require more work is about the definition of ‘completeness.’ Section 2.2, beginning line 126, describes metrics that measure only temporal completeness. I agree range-through completeness is relevant and interesting as a parameter; I found myself repeatedly wondering what an additional measure of spatial completeness would show. For example, if the analysis were repeated with a count of global occupied grid cells (already used to derive proportional occupancy) as a measure of ‘completeness,’ what would be its explanatory power?
A different thought about the models is whether a mixed-effects framework might be appropriate. My understanding of the input data is that individual species contribute multiple observations to the dataset (from different points in their ‘lifespan’). Is it possible to add species identity as a random effect, reducing the estimated degrees of freedom to account for pseudo-replicate observations, and still calculate deviance? Maybe adding a random effect to the GLMs will make them fail to converge – it might not be feasible – but then again it might be simple to implement. I wouldn’t require this change but do think it worth investigating.
Lastly, since the model used to predict modern taxa (section 2.7) couldn’t include a three-timer sampling value, the entire model selection and model fitting process should omit three-timer completeness as a possible variable. Perhaps this was already done, but the paragraph beginning line 228 made it seem as though the process was to take the best-fit model and then leave out the completeness covariate when predicting on the modern data. It would be better to find the best-fit model out of the model set that doesn’t have this term to begin with, then make predictions.
Citation: https://doi.org/10.5194/egusphere-2024-2597-RC2
Data sets
Preprint – Occupancy Trajectory Microplankton Isaiah E. Smith https://doi.org/10.5281/zenodo.7745607
Model code and software
Preprint – Occupancy Trajectory Microplankton Isaiah E. Smith https://doi.org/10.5281/zenodo.7745607