the Creative Commons Attribution 4.0 License.
Occupancy history influences extinction risk of fossil marine microplankton groups
Abstract.
Geographic range has long been acknowledged as an important determinant of extinction risk. The trajectory of geographic range through time, however, has not received as much scientific attention. Here, we test the role of change in geographic range – assessed by a measure of proportional occupancy of grid cells – in determining the extinction risk in four major microplankton groups: foraminifera, calcareous nannofossils, radiolarians, and diatoms. Logistic regression was used to assess the importance of standing occupancy, occupancy change, and sampling probability in the extinction risk of species. We find that while standing occupancy is a major determinant of extinction risk in all microplankton groups, change in occupancy accounts for an average of 52 % of the explanatory power of the three analyzed variables, with a maximum value of 92 %. Sampling probability was also found to be consistently informative, with an average of 6 % and a maximum value of 22 %. Our results highlight the importance of incorporating both geographic range and its change through time, as well as sampling probability, into extinction models. The ability of occupancy trajectory to help predict extinction risk underlines the necessity of paleontological data in modern conservation efforts.
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2024-2597', Steven Holland, 05 Feb 2025
# Overview
This is an intriguing manuscript of a rich data set appropriately analyzed with multivariate logistic regression. It makes a significant point for four major planktonic groups: that change in occupancy is a good predictor of extinction risk. It substantially adds to the ecological literature on extinction risk, emphasizing a factor that can be addressed only through the fossil record. As the authors point out, the importance of occupancy for extinction risk has been shown for marine invertebrates. Still, it is encouraging to see that change in occupancy has importance that extends to the plankton and is likely to matter more broadly.
The manuscript is well-written, and the figures and tables are well-constructed. I have a few general points that should be addressed before publication, plus a few minor comments. In short, the authors should:
* Simplify the models so that they include only statistically significant terms.
* Present a table of model coefficients for each of the 16 taxon/time-bin combinations, displaying only the statistically significant fitted coefficients.
* Present a table of LMG relative importance for each of the statistically significant coefficients, as well as the overall percent variance explained for each of the 16 taxon/time-bin combinations.
* Update the text to emphasize patterns in LMG relative importance across the models.
* Delete the tables (2 and 3) and figures (3 and 4) that are no longer needed.
These are easily accomplished, so I consider these moderate revisions. I would be happy to review a revised manuscript.
# Principal comments
## Multiplicative vs. additive models
There's a mixup about the meaning of the + and * symbols in specifying models in R, and this is not unique to the authors; many are misled by these symbols. These symbols are overloaded (in programmer speak); that is, in the context of specifying a linear model, they do not have their commonplace meanings of add and multiply. In R's formulation of a linear model, y ~ x + z means "y as a function of x and z," not "y as a function of x plus (or added to) z." Likewise, y ~ x * z means "y as a function of x and z and the interaction x and z," not "y as a function of x times (or multiplied by) z."
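The expansion of these operators can be checked directly in R without fitting anything. The following is a minimal sketch using the placeholder names from this comment (y, x, z, and o, oc, s for occupancy, occupancy change, and sampling); it only inspects formula terms and does not touch the authors' data.

```r
# Inspect how R expands formula operators; no data are needed for this.
no_interaction <- attr(terms(y ~ x + z), "term.labels")
two_way        <- attr(terms(y ~ x * z), "term.labels")
three_way      <- attr(terms(extinction ~ o * oc * s), "term.labels")

print(no_interaction)  # main effects only
print(two_way)         # main effects plus the x:z interaction
print(three_way)       # all main effects and all interactions
```

Printing the three vectors shows that `*` adds interaction terms to the main effects rather than multiplying the predictors together.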
Confusing these meanings has led the authors to describe some models as additive and others as multiplicative; instead, these should be described as "without interactions" and "with all interactions." This is a small matter that can be quickly dealt with; I raise it because it feeds into the following more critical comments on model selection and reporting.
## Model selection
A surprising result of this study is that the full-interaction model (extinction ~ o * oc * s, where o is occupancy, oc is occupancy change, and s is sampling) is selected in 15 out of the 16 group/bin-size combinations (Table 2). When alternative models — Table S2, where 16/16 favor the full-interaction model, and Table S4, where 13/16 favor the full-interaction model — are considered, this rises to 44/48 (92%) of all models. This full-interaction model (extinction ~ o * oc * s) expands to the formula extinction ~ o + oc + s + o:oc + o:s + oc:s + o:oc:s (each of the terms with a colon indicates the interaction of predictor variables), so it is also called the saturated model.
The fact that the saturated model was selected in 92% of the cases is highly unusual, and I've never come across such a situation in my work, including all the student projects in my data analysis class. By selecting these models, the authors are, in effect, claiming that all seven terms (o, oc, s, o:oc, o:s, oc:s, and o:oc:s) are statistically significant (i.e., the coefficients are demonstrably non-zero). That's what is unusual; seldom are all the interaction terms significant, and often, one or more non-interaction terms are not significant.
Examining the supplemental material gives a glimpse of what is going on. For example, using the foraminifera / 1 m.y. combination, where the full-interaction ("multiplicative" in Table 2) model was selected, a summary of the model [using summary(foram_1000_mod1)] shows that four of the predictors are not statistically significant (o, o:oc, o:s, o:oc:s). The model must be simplified to include only the significant terms. Although one might be tempted to discard all the non-significant terms at this step, that's inadvisable because removing one term can cause another term to become significant. One must use model simplification so that significant terms are included and non-significant terms are excluded (see the excellent coverage in Crawley's The R Book).
With seven predictor variables, exploring all possible (128) combinations manually is not feasible. The stepAIC() function provides a way to simplify the process. It starts with the saturated model (e.g., ext ~ o * oc * s), removes non-significant terms using AIC, and stops when a simplified model containing only statistically significant (i.e., demonstrably non-zero) coefficients is left. Calling summary() on that model shows the coefficients. What it produces is the simplest model that fits the data best. The authors can do this easily by calling three commands for each model; here's an example using the foraminifera/1 m.y. data:
library(MASS)  # provides stepAIC()
saturatedModel <- glm(extinction ~ raw_prop_sampled * delta_1 * sampling, family = binomial(link = "logit"), data = F_1000000[[1]])
simplifiedModel <- stepAIC(saturatedModel)
summary(simplifiedModel)
Using AIC for model selection, as the authors have done, is good, but there are many more models to consider than the five the authors picked (o, oc, s, o+oc+s, and o*oc*s). The stepAIC() function facilitates finding the best models. The saturated model may be better than the other four in most cases, but this is likely because the four simple models miss significant predictors. Using stepAIC() also eliminates the need to test the four simple cases they do explicitly. Instead, start with the saturated model and let stepAIC() simplify it. If the best model is one of the authors' four simple cases (o, oc, s, o+oc+s), that will be discovered by using stepAIC().
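For readers without access to the supplemental data, the same workflow can be tried on simulated data. This is only a sketch under stated assumptions: the data frame `sim`, its columns o, oc, and s, and the data-generating process are invented for illustration and are not the authors' data or results.

```r
library(MASS)  # provides stepAIC()

set.seed(42)
n <- 500
# Simulated stand-ins for occupancy (o), occupancy change (oc), and sampling (s)
sim <- data.frame(o = runif(n), oc = rnorm(n, sd = 0.2), s = runif(n))
# Assumed data-generating process: extinction depends on o and oc only
p_ext <- plogis(-1 - 2 * sim$o - 4 * sim$oc)
sim$extinction <- rbinom(n, size = 1, prob = p_ext)

# Start from the full-interaction (saturated) model and simplify backward by AIC
saturated  <- glm(extinction ~ o * oc * s,
                  family = binomial(link = "logit"), data = sim)
simplified <- stepAIC(saturated, direction = "backward", trace = FALSE)

print(formula(simplified))                                  # terms retained
print(c(saturated = AIC(saturated), simplified = AIC(simplified)))
```

Because the selection criterion here is AIC rather than a p-value threshold, it is still worth inspecting summary(simplified) to confirm which retained coefficients are distinguishable from zero.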
## Model reporting
The results of the models need to be reported, something that’s largely missing from the current manuscript. For any mathematical model, two aspects need to be reported: the fitted coefficients and their effect sizes, that is, how necessary each coefficient is for modeling the response variable (extinction here). Although the authors touch on effect size partly by reporting some LMG relative importance values, they don't do it comprehensively, which makes comparing the relative importance of the predictor variables impossible. LMG relative importance is one of the most informative parts of the modeling because it's not just whether occupancy change matters but how much it matters relative to the other predictors (occupancy, sampling, and the various interactions).
Two tables are needed in the main body of the manuscript and not in the supplemental material; both should have 16 rows, one each for the 16 taxon/bin-size combinations. The first should have seven columns for the model coefficients for o, oc, s, o:oc, o:s, oc:s, and o:oc:s, obtained with summary(model). Where a coefficient is not statistically significant (i.e., not demonstrably different from zero), it should be left blank, or a dash should be used instead of its value. The caption should indicate that these are the statistically significant (i.e., statistically non-zero) coefficients for each model. Given the rampant confusion over what statistically significant means, it's advisable to add the "i.e., statistically non-zero" to clarify what significance means.
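One way such a coefficient table could be assembled is sketched below. The helper `coef_row()`, the 0.05 threshold, and the simulated data are hypothetical and are used only to show the mechanics of blanking non-significant terms; they do not come from the manuscript.

```r
set.seed(1)
n <- 400
# Simulated stand-in data (assumption; not the authors' data)
sim <- data.frame(o = runif(n), oc = rnorm(n, sd = 0.2), s = runif(n))
sim$extinction <- rbinom(n, 1, plogis(-1 - 2 * sim$o - 4 * sim$oc))
model <- glm(extinction ~ o + oc + s + o:oc, family = binomial, data = sim)

# Hypothetical helper: one table row per fitted model, with NA for terms that
# are absent from the model or not significant at the chosen alpha
coef_row <- function(model, all_terms, alpha = 0.05) {
  tab <- coef(summary(model))  # columns: Estimate, Std. Error, z value, Pr(>|z|)
  row <- setNames(rep(NA_real_, length(all_terms)), all_terms)
  present <- intersect(all_terms, rownames(tab))
  signif_terms <- present[tab[present, "Pr(>|z|)"] < alpha]
  row[signif_terms] <- tab[signif_terms, "Estimate"]
  round(row, 3)
}

all_terms <- c("o", "oc", "s", "o:oc", "o:s", "oc:s", "o:oc:s")
print(coef_row(model, all_terms))
```

Applying the helper to each of the 16 fitted models and stacking the rows with rbind() would give a 16 × 7 table in which NA marks the cells to leave blank.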
The second table is similar, but it should present effect size through LMG relative importance for each of the statistically significant model coefficients (all seven). Again, leave blanks or just a dash for the non-significant coefficients in any model. The table should also have an additional column for overall explained variance or something similar to show how completely the entire model can explain extinction.
These two tables will convey what's important about the models, and they allow Tables 2 & 3 and Figures 3 & 4 to be deleted as they become unnecessary. It is unnecessary to report the model AIC values in the main body of the text; they’re helpful in model selection but not for comparing among the models owing to differences in sample size. They’re also not as relevant as the model coefficients and relative importance.
Adding these tables will also necessitate new text to discuss the patterns in relative importance among the taxon/bin-size contributions. Those will be an essential take-home from this research and one of broad interest, especially for the comparative importance of occupancy, occupancy change, and sampling. If specific interactions are common and meaningful, that will also be worth discussion.
## Why history might matter
It's fascinating that occupancy change matters, as has been shown for several marine invertebrate groups, especially so at the time scales of 100 kyr, 200 kyr, 500 kyr, and 1 myr. Organisms have no way of looking backward; their populations go up or down based on conditions in the moment, so the importance of occupancy change must lie in the state of the ecological system as a whole. In other words, if occupancy change matters, it must mean that something about the present ecological system is unfavorable, and that it has been persistently unfavorable.
It does make me wonder if the increasing importance of occupancy at longer time scales is just a reflection of a species being on the declining side of Foote/Liow occupancy curves. In other words, is all that's being said that a species in a long-term decline is more likely to go extinct than one that is not? Even if it is, that's distinct from rarity, which ecologists focus on, and it's an insight that can be gained only from the fossil record.
# Minor matters, keyed to line numbers
48: replace "looking at" with "of".
52: By disappearance, is extinction meant? Extirpation, too?
63: Insert a short explanation of what is meant by sampling. Is it the proportion of cells having some data?
96: No change is needed, but I am surprised by the low proportion of extant species with usable records — at best 12%?!
121-123: No change is needed, but I am surprised that pacman profiling trims so much of the data, particularly from the top of the column. I would have guessed that last occurrences would be better constrained and more reliable in a core, given that downhole collapse affects first occurrences.
128–129: Most methods are admirably well-explained, with only two exceptions: simple completeness and three-timer completeness. Both could use a single-sentence definition for readers not familiar with them.
146–148: Delete simply and actually (lines 146–148, for example) from the methods; they're not needed.
206–211: I'm wondering about multicollinearity in this model, that is, the extent to which the three predictors (o, oc, and s) are correlated. A table of the correlations of the predictors for each data set could be added to the supplemental material. If these are large, though, the authors will need to address the implications of multicollinearity for model interpretation, which can be difficult.
266, 359: There should be a space between a value and its units.
314: It seems improbable that three of these values are all 0.053. Is that correct, or are there typos here?
333, elsewhere: Results should be reported in the present tense, not the past tense.
541: Burnham and Anderson (2002) isn't cited. It's worth checking the other references as well.
Supplemental 116–117: replace "approaching the present" with "from the Pliocene to the Pleistocene".
The authors should feel free to contact me with regard to any aspect of this review.
With kind regards,
Steven Holland
stratum@uga.edu
Citation: https://doi.org/10.5194/egusphere-2024-2597-RC1 -
AC1: 'Reply on RC1', Isaiah Smith, 01 Mar 2025
# Overview
This is an intriguing manuscript of a rich data set appropriately analyzed with multivariate logistic regression. It makes a significant point for four major planktonic groups: that change in occupancy is a good predictor of extinction risk. It substantially adds to the ecological literature on extinction risk, emphasizing a factor that can be addressed only through the fossil record. As the authors point out, the importance of occupancy for extinction risk has been shown for marine invertebrates. Still, it is encouraging to see that change in occupancy has importance that extends to the plankton and is likely to matter more broadly.
Response: We thank the reviewer for this encouraging feedback, as well as the following constructive suggestions.
The manuscript is well-written, and the figures and tables are well-constructed. I have a few general points that should be addressed before publication, plus a few minor comments. In short, the authors should:
* Simplify the models so that they include only statistically significant terms.
Response: We understand the point of the Reviewer, and have simplified the models accordingly.
* Present a table of model coefficients for each of the 16 taxon/time-bin combinations, displaying only the statistically significant fitted coefficients.
Response: We have taken the Reviewer’s advice, and created the suggested table showing coefficients for each model.
* Present a table of LMG relative importance for each of the statistically significant coefficients, as well as the overall percent variance explained for each of the 16 taxon/time-bin combinations.
Response: We have also created a table showing the LMG values for each statistically significant coefficient, and overall percent variance explained.
* Update the text to emphasize patterns in LMG relative importance across the models.
Response: We will update the text accordingly.
* Delete the tables (2 and 3) and figures (3 and 4) that are no longer needed.
These are easily accomplished, so I consider these moderate revisions. I would be happy to review a revised manuscript.
Response: We thank the Reviewer for taking the time to review this manuscript.
# Principal comments
## Multiplicative vs. additive models
There's a mixup about the meaning of the + and * symbols in specifying models in R, and this is not unique to the authors; many are misled by these symbols. These symbols are overloaded (in programmer speak); that is, in the context of specifying a linear model, they do not have their commonplace meanings of add and multiply. In R's formulation of a linear model, y ~ x + z means "y as a function of x and z," not "y as a function of x plus (or added to) z." Likewise, y ~ x * z means "y as a function of x and z and the interaction x and z," not "y as a function of x times (or multiplied by) z."
Confusing these meanings has led the authors to describe some models as additive and others as multiplicative; instead, these should be described as "without interactions" and "with all interactions." This is a small matter that can be quickly dealt with; I raise it because it feeds into the following more critical comments on model selection and reporting.
Response: Indeed, there was a mixup in the previous version of our manuscript, but only in terminology and not in meaning. We will update the text following the suggestions.
## Model selection
A surprising result of this study is that the full-interaction model (extinction ~ o * oc * s, where o is occupancy, oc is occupancy change, and s is sampling) is selected in 15 out of the 16 group/bin-size combinations (Table 2). When alternative models — Table S2, where 16/16 favor the full-interaction model, and Table S4, where 13/16 favor the full-interaction model — are considered, this rises to 44/48 (92%) of all models. This full-interaction model (extinction ~ o * oc * s) expands to the formula extinction ~ o + oc + s + o:oc + o:s + oc:s + o:oc:s (each of the terms with a colon indicates the interaction of predictor variables), so it is also called the saturated model.
The fact that the saturated model was selected in 92% of the cases is highly unusual, and I've never come across such a situation in my work, including all the student projects in my data analysis class. By selecting these models, the authors are, in effect, claiming that all seven terms (o, oc, s, o:oc, o:s, oc:s, and o:oc:s) are statistically significant (i.e., the coefficients are demonstrably non-zero). That's what is unusual; seldom are all the interaction terms significant, and often, one or more non-interaction terms are not significant.
Examining the supplemental material gives a glimpse of what is going on. For example, using the foraminifera / 1 m.y. combination, where the full-interaction ("multiplicative" in Table 2) model was selected, a summary of the model [using summary(foram_1000_mod1)] shows that four of the predictors are not statistically significant (o, o:oc, o:s, o:oc:s). The model must be simplified to include only the significant terms. Although one might be tempted to discard all the non-significant terms at this step, that's inadvisable because removing one term can cause another term to become significant. One must use model simplification so that significant terms are included and non-significant terms are excluded (see the excellent coverage in Crawley's The R Book).
With seven predictor variables, exploring all possible (128) combinations manually is not feasible. The stepAIC() function provides a way to simplify the process. It starts with the saturated model (e.g., ext ~ o * oc * s), removes non-significant terms using AIC, and stops when a simplified model containing only statistically significant (i.e., demonstrably non-zero) coefficients is left. Calling summary() on that model shows the coefficients. What it produces is the simplest model that fits the data best. The authors can do this easily by calling three commands for each model; here's an example using the foraminifera/1 m.y. data:
library(MASS)  # provides stepAIC()
saturatedModel <- glm(extinction ~ raw_prop_sampled * delta_1 * sampling, family = binomial(link = "logit"), data = F_1000000[[1]])
simplifiedModel <- stepAIC(saturatedModel)
summary(simplifiedModel)
Using AIC for model selection, as the authors have done, is good, but there are many more models to consider than the five the authors picked (o, oc, s, o+oc+s, and o*oc*s). The stepAIC() function facilitates finding the best models. The saturated model may be better than the other four in most cases, but this is likely because the four simple models miss significant predictors. Using stepAIC() also eliminates the need to test the four simple cases they do explicitly. Instead, start with the saturated model and let stepAIC() simplify it. If the best model is one of the authors' four simple cases (o, oc, s, o+oc+s), that will be discovered by using stepAIC().
Response: We thank the Reviewer for this helpful comment. Although in earlier versions of this manuscript, we did employ stepwise model selection (the stepAIC() function), we eventually decided to analyze only the five explicit models in order to simplify reporting. We updated the analytical pipeline to include the stepwise model selection, which does not substantially impact the final results of this study. We are happy to improve the manuscript in this way.
## Model reporting
The results of the models need to be reported, something that’s largely missing from the current manuscript. For any mathematical model, two aspects need to be reported: the fitted coefficients and their effect sizes, that is, how necessary each coefficient is for modeling the response variable (extinction here). Although the authors touch on effect size partly by reporting some LMG relative importance values, they don't do it comprehensively, which makes comparing the relative importance of the predictor variables impossible. LMG relative importance is one of the most informative parts of the modeling because it's not just whether occupancy change matters but how much it matters relative to the other predictors (occupancy, sampling, and the various interactions).
Two tables are needed in the main body of the manuscript and not in the supplemental material; both should have 16 rows, one each for the 16 taxon/bin-size combinations. The first should have seven columns for the model coefficients for o, oc, s, o:oc, o:s, oc:s, and o:oc:s, obtained with summary(model). Where a coefficient is not statistically significant (i.e., not demonstrably different from zero), it should be left blank, or a dash should be used instead of its value. The caption should indicate that these are the statistically significant (i.e., statistically non-zero) coefficients for each model. Given the rampant confusion over what statistically significant means, it's advisable to add the "i.e., statistically non-zero" to clarify what significance means.
Response: We will add the suggested table to the manuscript summarizing the model coefficients for each of the statistically significant terms.
The second table is similar, but it should present effect size through LMG relative importance for each of the statistically significant model coefficients (all seven). Again, leave blanks or just a dash for the non-significant coefficients in any model. The table should also have an additional column for overall explained variance or something similar to show how completely the entire model can explain extinction.
Response: We will also add a table summarizing LMG relative importance values. We will report the raw LMG values for each term, instead of the standardized percentage with respect to the total of the LMG values for the three analyzed model terms.
These two tables will convey what's important about the models, and they allow Tables 2 & 3 and Figures 3 & 4 to be deleted as they become unnecessary. It is unnecessary to report the model AIC values in the main body of the text; they’re helpful in model selection but not for comparing among the models owing to differences in sample size. They’re also not as relevant as the model coefficients and relative importance.
Adding these tables will also necessitate new text to discuss the patterns in relative importance among the taxon/bin-size contributions. Those will be an essential take-home from this research and one of broad interest, especially for the comparative importance of occupancy, occupancy change, and sampling. If specific interactions are common and meaningful, that will also be worth discussion.
Response: We thank the Reviewer for these suggestions. Although preliminary re-analysis shows that our general conclusions relating to temporal grain and siliceous versus calcareous plankton do not change with the updates, we are excited about the opportunity to further explore/discuss more deeply the patterns in relative importance for each taxon/bin-size combination and for each variable, as this will also be valuable and of broad interest to the scientific community.
## Why history might matter
It's fascinating that occupancy change matters, as has been shown for several marine invertebrate groups, especially so at the time scales of 100 kyr, 200 kyr, 500 kyr, and 1 myr. Organisms have no way of looking backward; their populations go up or down based on conditions in the moment, so the importance of occupancy change must lie in the state of the ecological system as a whole. In other words, if occupancy change matters, it must mean that something about the present ecological system is unfavorable, and that it has been persistently unfavorable.
It does make me wonder if the increasing importance of occupancy at longer time scales is just a reflection of a species being on the declining side of Foote/Liow occupancy curves. In other words, is all that's being said that a species in a long-term decline is more likely to go extinct than one that is not? Even if it is, that's distinct from rarity, which ecologists focus on, and it's an insight that can be gained only from the fossil record.
Response: We are thankful for this feedback; we will explore this idea further in the discussion section.
# Minor matters, keyed to line numbers
48: replace "looking at" with "of".
Response: We will update the manuscript accordingly.
52: By disappearance, is extinction meant? Extirpation, too?
Response: We meant global extinction and will update the manuscript for clarity.
63: Insert a short explanation of what is meant by sampling. Is it the proportion of cells having some data?
Response: We will add some explanation to the main text.
96: No change is needed, but I am surprised by the low proportion of extant species with usable records — at best 12%?!
Response: We will clarify this in the text. Of the total data set, 12% of the records belong to extant species, and also have adequate spatial and temporal data. The remaining 88% of the records are mostly extinct organisms (mostly with usable records).
121-123: No change is needed, but I am surprised that pacman profiling trims so much of the data, particularly from the top of the column. I would have guessed that last occurrences would be better constrained and more reliable in a core, given that downhole collapse affects first occurrences.
Response: We will double check this.
128–129: Most methods are admirably well-explained, with only two exceptions: simple completeness and three-timer completeness. Both could use a single-sentence definition for readers not familiar with them.
Response: We will add a sentence to explain these methods.
146–148: Delete simply and actually (lines 146–148, for example) from the methods; they're not needed.
Response: We will update the manuscript accordingly.
206–211: I'm wondering about multicollinearity in this model, that is, the extent to which the three predictors (o, oc, and s) are correlated. A table of the correlations of the predictors for each data set could be added to the supplemental material. If these are large, though, the authors will need to address the implications of multicollinearity for model interpretation, which can be difficult.
Response: We calculated the Pearson correlation for occupancy and occupancy change (the sampling term was removed from the analysis as per the suggestion in RC2). The correlation values for the 16 taxon/bin-size combinations had a mean of 0.23, a median of 0.23, and a maximum of 0.34. Although there exists some correlation, it is relatively weak. We are happy to include these results in the supplemental material.
266, 359: There should be a space between a value and its units.
Response: We will update the manuscript accordingly.
314: It seems improbable that three of these values are all 0.053. Is that correct, or are there typos here?
Response: We have double-checked these values, and they are indeed correct.
333, elsewhere: Results should be reported in the present tense, not the past tense.
Response: We will update the manuscript accordingly.
541: Burnham and Anderson (2002) isn't cited. It's worth checking the other references as well.
Response: We will correct this and check the other references.
Supplemental 116–117: replace "approaching the present" with "from the Pliocene to the Pleistocene".
Response: We will do this.
Citation: https://doi.org/10.5194/egusphere-2024-2597-AC1
-
RC2: 'Comment on egusphere-2024-2597', Anonymous Referee #2, 08 Feb 2025
Thank you for the opportunity to read this manuscript, ‘Occupancy history influences extinction risk of fossil marine microplankton groups,’ by Smith, Kocsis, and Kiessling. I was extremely intrigued by the premise and results of the study. The prediction of modern extinction was interesting. The addition of robustness testing in analysis made this work particularly convincing to me, and as such, I have relatively few/minor comments.
I apologise for my delay returning these comments, and I hope that these suggestions prove helpful in refining the work.
Minor comments
- The result that ‘the temporal scale by which we analyze these data can influence our understanding of extinction risk in marine microplankton’ stands out as a strong finding. I recommend adding this to the abstract.
- Line 47: Saulsbury et al. (2023) in PNAS is a recent paper to add. Although it is couched in terms of age-dependent extinction risk, it is equally about abundance-dependent and therefore range-size-dependent extinction risk. https://doi.org/10.1073/pnas.2307629121
- Line 47: I would also ask to rewrite the manuscript sentence on this line. It’s unnecessary to claim the topic is understudied - I am convinced of its inherent importance regardless! To say range trajectory dynamics are ‘sparingly’ considered and ‘many studies exclude [it]’ feels as though it could slight those who have indeed worked on the topic.
- Line 50: I suggest omitting this sentence (beginning ‘increasing anthropogenic impact’), as it reads as a non sequitur. The topic is already well-justified as worthy of study without this.
- Line 60: I don’t follow this sentence (‘variations in the material…’) and therefore recommend removing it for concision.
- Line 67: Rather than pose the possible results as a binary set of options (‘whether the trajectory of geographic occupancy actually influences extinction risk [or not]’), I would reword this to reflect the nuance of your methods to investigate the degree to which geographic occupancy explains extinction risk.
- Line 179: I assume the authors mean the final data table, i.e. dataset?
- Line 279: Name specific examples of the ‘various biotic events in the Cenozoic’ that ‘can be detected’.
- Line 318: The first use of ‘relative importance value’ in this paragraph needs explaining – this is the relative importance of occupancy change to standing occupancy, correct?
- Line 348: What was the D2 and the relative importance of the occupancy change term in this robustness-test version of analysis?
- Line 372: Changes in circulation and stratification are key influences on plankton biodiversity distributions, yes, but I am less confident to say they are THE key influences. There are additional influences beyond these two factors as well.
- Line 374: There is a lot of contrary evidence to this. For instance, plankton lateral shifts are documented in two Nature papers out this year - Ying et al. [https://doi.org/10.1038/s41586-024-08029-0] and Chaabane et al. [https://doi.org/10.1038/s41586-024-08191-5] A separate factor that matters is edge effects: species at poles and the equator are at the edge of the world, climatically speaking, as there is nowhere for them to migrate that is colder than found at poles or warmer than found at equator, respectively. I would consider this geographic constraint different to a mobility/biological constraint.
- Line 443: There are some logical steps in this sentence that are currently missing and need to be spelled out (or remove the sentence). What does nutrient limitation have to do with this?
- Line 478: Rather than ‘across the tree of life,’ I would say ‘across marine life’. Neither this work nor Kiessling and Kocsis (2016) analysed terrestrial species, which have substantially different migration capacities and population sizes.
- Section 5 (Conclusion): While I agree historic data are relevant to informing future predictions of biodiversity loss, I am (regretfully) unconvinced conservation biologists would take the findings of this study as important to influence management decisions on ‘where to spend limited conservation funding in the future’—something that is sadly dictated more often by what is feasible to achieve in local sociopolitical contexts than what is forecast by scientific models. What about substituting the second sentence in this paragraph and instead mentioning how these findings are empirical contributions to help resolve open evolutionary theory about how range reductions end in extinction over the long timescales of species ‘lifetimes’, i.e. age-dependency of extinction rates?
- Table 2: Were the top models always only (a) fully multiplicative or (b) fully additive? That is, were there never cases of a model such as ex ~ oc * ch + p? Or were such models not considered? This is hard to interpret based on how Table 2 is presented.
- Cite version numbers for all software packages: NSBcompanion (line 76), divDyn (line 131), taxize (line 221), and any others.
Modelling suggestions
I have three suggestions about the analysis framework, none of which I expect to substantively change the findings but which may increase the informativeness (for the first), rigor (for the second), and accuracy (for the third).
My first observation and the one that would require more work is about the definition of ‘completeness.’ Section 2.2, beginning line 126, describes metrics that measure only temporal completeness. I agree range-through completeness is relevant and interesting as a parameter; I found myself repeatedly wondering what an additional measure of spatial completeness would show. For example, if the analysis were repeated with a count of global occupied grid cells (already used to derive proportional occupancy) as a measure of ‘completeness,’ what would be its explanatory power?
A different thought about the models is whether a mixed-effects framework might be appropriate. My understanding of the input data is that individual species contribute multiple observations to the dataset (from different points in their ‘lifespan’). Is it possible to add species identity as a random effect, reducing the estimated degrees of freedom to account for pseudo-replicate observations, and still calculate deviance? Maybe adding a random effect to the GLMs will make them fail to converge – it might not be feasible – but then again it might be simple to implement. I wouldn’t require this change but do think it worth investigating.
Lastly, since the model used to predict modern taxa (section 2.7) couldn’t include a three-timer sampling value, the entire model selection and model fitting process should omit three-timer completeness as a possible variable. Perhaps this was already done, but the paragraph beginning line 228 made it seem as though the process was to take the best-fit model and then leave out the completeness covariate when predicting on the modern data. It would be better to find the best-fit model out of the model set that doesn’t have this term to begin with, then make predictions.
Citation: https://doi.org/10.5194/egusphere-2024-2597-RC2
AC2: 'Reply on RC2', Isaiah Smith, 01 Mar 2025
[To the editor: We greatly appreciate the time and effort that the editorial staff have taken in handling our manuscript. We are also very grateful for the thorough and thoughtful feedback provided by both reviewers. We have responded to each of the reviewers’ comments, and have made updates to the analytical workflow and discussion based on the feedback received. Please also note that some of the data set descriptive statistics will be updated to reflect a correction in the trimming script, and Figure 1 will be updated to reflect a correction in the flow of time. While the recommended improvements strengthen the rigor of the models and results, the key conclusions and takeaways of this project remain unchanged.]
Reply on RC2:
Thank you for the opportunity to read this manuscript, ‘Occupancy history influences extinction risk of fossil marine microplankton groups,’ by Smith, Kocsis, and Kiessling. I was extremely intrigued by the premise and results of the study. The prediction of modern extinction was interesting. The addition of robustness testing in analysis made this work particularly convincing to me, and as such, I have relatively few/minor comments.
I apologise for my delay returning these comments, and I hope that these suggestions prove helpful in refining the work.
Response: The authors appreciate the time and input of the Reviewer. We are grateful for the useful comments provided below, and look forward to improving the manuscript based on these suggestions.
Minor comments
- The result that ‘the temporal scale by which we analyze these data can influence our understanding of extinction risk in marine microplankton’ stands out as a strong finding. I recommend adding this to the abstract.
- Response: We are thankful for the suggestion. We will add this finding to the abstract.
- Line 47: Saulsbury et al. (2023) in PNAS is a recent paper to add. Although it is couched in terms of age-dependent extinction risk, it is equally about abundance-dependent and therefore range-size-dependent extinction risk. https://doi.org/10.1073/pnas.2307629121
- Response: We appreciate the paper recommendation, and will include it in the next version of the manuscript.
- Line 47: I would also ask to rewrite the manuscript sentence on this line. It’s unnecessary to claim the topic is understudied - I am convinced of its inherent importance regardless! To say range trajectory dynamics are ‘sparingly’ considered and ‘many studies exclude [it]’ feels as though it could slight those who have indeed worked on the topic.
- Response: We will update this to better reflect the current state of the research.
- Line 50: I suggest omitting this sentence (beginning ‘increasing anthropogenic impact’), as it reads as a non sequitur. The topic is already well-justified as worthy of study without this.
- Response: We will update the manuscript accordingly.
- Line 60: I don’t follow this sentence (‘variations in the material…’) and therefore recommend removing it for concision.
- Response: We will make the recommended change.
- Line 67: Rather than pose the possible results as a binary set of options (‘whether the trajectory of geographic occupancy actually influences extinction risk [or not]’), I would reword this to reflect the nuance of your methods to investigate the degree to which geographic occupancy explains extinction risk.
- Response: We will update the way we present the results.
- Line 179: I assume the authors mean the final data table, i.e. dataset?
- Response: This is correct, we will update for clarity.
- Line 279: Name specific examples of the ‘various biotic events in the Cenozoic’ that ‘can be detected’.
- Response: We will update the manuscript to include specific examples.
- Line 318: The first use of ‘relative importance value’ in this paragraph needs explaining – this is the relative importance of occupancy change to standing occupancy, correct?
- Response: That is correct: we reported the importance of the occupancy change term with respect to the sum of the importance values for all three of the analyzed model terms. Based on feedback from RC1, we will report the raw LMG values (without converting to a standardized percentage). We will still ensure that we clearly explain what exactly is being reported.
- Line 348: What was the D2 and the relative importance of the occupancy change term in this robustness-test version of analysis?
- Response: We will add these values to the text.
- Line 372: Changes in circulation and stratification are key influences on plankton biodiversity distributions, yes, but I am less confident to say they are THE key influences. There are additional influences beyond these two factors as well.
- Response: We will update the text to include these other influences.
- Line 374: There is a lot of contrary evidence to this. For instance, plankton lateral shifts are documented in two Nature papers out this year - Ying et al. [https://doi.org/10.1038/s41586-024-08029-0] and Chaabane et al. [https://doi.org/10.1038/s41586-024-08191-5] A separate factor that matters is edge effects: species at poles and the equator are at the edge of the world, climatically speaking, as there is nowhere for them to migrate that is colder than found at poles or warmer than found at equator, respectively. I would consider this geographic constraint different to a mobility/biological constraint.
- Response: We will explore these ideas more deeply in the next version of the manuscript, and will reconsider how we address plankton range migrations with respect to changes in climate. We also appreciate the two suggested papers and will update our discussion based on these references.
- Line 443: There are some logical steps in this sentence that are currently missing and need to be spelled out (or remove the sentence). What does nutrient limitation have to do with this?
- Response: We appreciate the point, and agree that more explanation is needed here. We will make those updates in the revised version of the manuscript.
- Line 478: Rather than ‘across the tree of life,’ I would say ‘across marine life’. Neither this work nor Kiessling and Kocsis (2016) analysed terrestrial species, which have substantially different migration capacities and population sizes.
- Response: This is a good point, and we will update the text accordingly.
- Section 5 (Conclusion): While I agree historic data are relevant to informing future predictions of biodiversity loss, I am (regretfully) unconvinced conservation biologists would take the findings of this study as important to influence management decisions on ‘where to spend limited conservation funding in the future’—something that is sadly dictated more often by what is feasible to achieve in local sociopolitical contexts than what is forecast by scientific models. What about substituting the second sentence in this paragraph and instead mentioning how these findings are empirical contributions to help resolve open evolutionary theory about how range reductions end in extinction over the long timescales of species ‘lifetimes’, i.e. age-dependency of extinction rates?
- Response: We appreciate the feedback, and will update the conclusion to focus more on the broader implications to evolutionary theory to which this study contributes.
- Table 2: Were the top models always only (a) fully multiplicative or (b) fully additive? That is, were there never cases of a model such as ex ~ oc * ch + p? Or were such models not considered? This is hard to interpret based on how Table 2 is presented.
- Response: They were not considered in the initial version of the manuscript (they were, however, in previous iterations of this study). Upon reading the feedback from both Reviewers, we see the importance of fully analyzing all possible model formulas. As such, we have updated the experimental design to analyze all possible model combinations using the stepAIC() function.
- Cite version numbers for all software packages: NSBcompanion (line 76), divDyn (line 131), taxize (line 221), and any others.
- Response: We will add version numbers.
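To illustrate what a candidate set covering "all possible model combinations" could look like for the Table 2 question, here is a minimal sketch that lists every formula in which each interaction is accompanied by its lower-order terms. The variable names `ex`, `oc`, `ch`, and `p` come from the reviewer's example formula; the helper functions are hypothetical. This is an exhaustive enumeration, which is not how `stepAIC()` works (it searches stepwise through such a space), so the sketch shows the search space only and is not the authors' workflow.

```python
from itertools import combinations

# Predictor names taken from the reviewer's example formula (ex ~ oc * ch + p).
MAIN_EFFECTS = ("oc", "ch", "p")


def candidate_terms(main_effects):
    """All main effects plus every interaction among them, as tuples of names."""
    terms = []
    for order in range(1, len(main_effects) + 1):
        terms.extend(combinations(main_effects, order))
    return terms


def is_hierarchical(model):
    """True if every interaction is accompanied by all of its lower-order terms."""
    chosen = set(model)
    for term in model:
        for order in range(1, len(term)):
            for sub in combinations(term, order):
                if sub not in chosen:
                    return False
    return True


def enumerate_models(main_effects=MAIN_EFFECTS, response="ex"):
    """Return R-style formula strings for every hierarchical, non-empty model."""
    terms = candidate_terms(main_effects)
    formulas = []
    for size in range(1, len(terms) + 1):
        for model in combinations(terms, size):
            if is_hierarchical(model):
                rhs = " + ".join(":".join(term) for term in model)
                formulas.append(f"{response} ~ {rhs}")
    return formulas


formulas = enumerate_models()
for formula in formulas:
    print(formula)
```

For three predictors this yields 18 candidate formulas. The mixed case raised by the reviewer, `ex ~ oc * ch + p`, appears in expanded form as `ex ~ oc + ch + p + oc:ch`.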
Modelling suggestions
I have three suggestions about the analysis framework, none of which I expect to substantively change the findings but which may increase the informativeness (for the first), rigor (for the second), and accuracy (for the third).
Response: We are thankful for these modelling suggestions. Upon incorporating these suggestions into our workflow, we see improvements to the quality of the manuscript. Additionally, our main findings remain the same.
My first observation and the one that would require more work is about the definition of ‘completeness.’ Section 2.2, beginning line 126, describes metrics that measure only temporal completeness. I agree range-through completeness is relevant and interesting as a parameter; I found myself repeatedly wondering what an additional measure of spatial completeness would show. For example, if the analysis were repeated with a count of global occupied grid cells (already used to derive proportional occupancy) as a measure of ‘completeness,’ what would be its explanatory power?
Response: Although we did examine temporal completeness, the Reviewer has correctly pointed out that we did not explicitly examine spatial completeness in this manuscript. We have re-run these models with and without a “paired-cell approach”, whereby we calculated the proportional occupancy and the change in occupancy using only the cells that were common to both bin i and bin i + 1 (see Kiessling and Kocsis 2016). This paired-cell approach was used to help account for variability in spatial sampling between temporal bins. With this approach, heavily (spatially) sampled intervals are trimmed down for each pairwise analysis such that only the geographic cells common to both time bins are considered as the denominator of the proportional occupancy when calculating standing occupancy or the change in occupancy between bins. The drawback of such an approach is that it reduces the amount of data that can be used to fit a model. The paired-cell approach changed the results only minimally, and it serves as a robustness test and a check against biases caused by variations in spatial sampling. We are happy to include and discuss these findings as a means of examining the impact of spatial completeness on our results as another “robustness test”.
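The paired-cell calculation described in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data layout (one dictionary per time bin, mapping species to sets of grid-cell ids), the function names, and the toy values are all hypothetical. It is also assumed that both the numerator and the denominator are restricted to the shared cells, and that change is taken as occupancy in bin i + 1 minus occupancy in bin i.

```python
def sampled_cells(bin_occurrences):
    """Union of all grid cells in which any species was recorded in one bin."""
    cells = set()
    for species_cells in bin_occurrences.values():
        cells |= species_cells
    return cells


def paired_cell_occupancy(bin_i, bin_next, species):
    """Occupancy of `species` in two adjacent bins, restricted to shared cells.

    Returns (occupancy_i, occupancy_next, change), or None when the two bins
    share no sampled cells and the proportion is therefore undefined.
    """
    # Only cells sampled in BOTH bins count (assumption: applied to the
    # numerator as well as the denominator).
    common = sampled_cells(bin_i) & sampled_cells(bin_next)
    if not common:
        return None
    occ_i = len(bin_i.get(species, set()) & common) / len(common)
    occ_next = len(bin_next.get(species, set()) & common) / len(common)
    # Assumed sign convention: later bin minus earlier bin.
    return occ_i, occ_next, occ_next - occ_i


# Hypothetical toy data: species -> set of occupied grid-cell ids, per time bin.
bin_i = {"sp_a": {1, 2, 3}, "sp_b": {3, 4}}
bin_next = {"sp_a": {2, 5}, "sp_b": {3, 4, 6}}

result = paired_cell_occupancy(bin_i, bin_next, "sp_a")
print(result)
```

In the toy data, cells 1, 5, and 6 are each sampled in only one of the two bins, so they drop out of both proportions. This shows the trade-off noted in the response: the comparison is less sensitive to uneven spatial sampling, but fewer cells contribute to each estimate.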
A different thought about the models is whether a mixed-effects framework might be appropriate. My understanding of the input data is that individual species contribute multiple observations to the dataset (from different points in their ‘lifespan’). Is it possible to add species identity as a random effect, reducing the estimated degrees of freedom to account for pseudo-replicate observations, and still calculate deviance? Maybe adding a random effect to the GLMs will make them fail to converge – it might not be feasible – but then again it might be simple to implement. I wouldn’t require this change but do think it worth investigating.
Response: We are thankful for this interesting suggestion. We have re-run the models using a mixed-effects framework, and found that this update did not substantially change the LMG values, although several of the models did fail to converge. Given the updates in how we will report results (see the two tables requested in RC1), we did not apply the mixed-effects framework to the main analysis. Nonetheless, we see the utility of and rationale behind a mixed-effects framework, so we are happy to discuss the results of the mixed-effects models and include them in the supplementary material.
Lastly, since the model used to predict modern taxa (section 2.7) couldn’t include a three-timer sampling value, the entire model selection and model fitting process should omit three-timer completeness as a possible variable. Perhaps this was already done, but the paragraph beginning line 228 made it seem as though the process was to take the best-fit model and then leave out the completeness covariate when predicting on the modern data. It would be better to find the best-fit model out of the model set that doesn’t have this term to begin with, then make predictions.
Response: We appreciate the Reviewer pointing this out, and agree with this point. When fitting the models on extant data, the sampling term was not included at all (extant-only models were fit separately from full-data models). Given these suggestions, we will remove the sampling term completely from all models, thus making comparisons between extinct- and extant-species models more straightforward.
Citation for Kiessling and Kocsis (2016): https://doi.org/10.1098/rsbl.2015.0813
Citation: https://doi.org/10.5194/egusphere-2024-2597-AC2
Data sets
Preprint – Occupancy Trajectory Microplankton Isaiah E. Smith https://doi.org/10.5281/zenodo.7745607
Model code and software
Preprint – Occupancy Trajectory Microplankton Isaiah E. Smith https://doi.org/10.5281/zenodo.7745607