the Creative Commons Attribution 4.0 License.
Information loss in palaeoecological data from process and observer error
Abstract. Palaeoecological data give us insight into how ecosystems have changed in the past, and with the development of new sources of proxy data and statistical methods, they are being used to address questions around the underlying mechanisms of change, such as biotic- and climate-ecosystem interactions. However, inferences from palaeoecological data can be hindered by uncertainties inherent in core-type samples that arise from environmental processes and observer-introduced error. Environmental processes, core extraction methods, sub-sampling strategies, laboratory methods, and data processing can potentially mask ‘true’ signals in the data. The influence of sources of uncertainty on inferences drawn from palaeoecological data is rarely assessed but is critical to the confidence of our conclusions. To address this concern, we use a virtual ecological approach to assess the influences of environmental and observer-introduced uncertainty to better understand which of them have the strongest influence on statistical methods applied to the data. Quantifying information loss from uncertainty can be used to inform study design before a project is carried out to increase the likelihood of detecting a given signal of interest and make more robust inferences from statistical analyses of palaeoproxy data. We generate synthetic ‘error-free’ core-type samples of pseudoproxies, on which environmental and observational processes are systematically introduced to impose uncertainties on the simulated pseudoproxies. Three sources of uncertainty (core mixing, sub-sampling, and proxy quantification from sub-subsamples) are assessed for their individual and combined effects on two statistical methods: Fisher Information and principal curves. Increasing sub-sampling intervals has the most substantial influence on the two statistical methods applied to the pseudoproxy data.
When combined, the interaction between increasing sub-sampling interval and decreasing the number of proxies counted per sub-sample has the strongest influence on Fisher Information and principal curves. Fisher Information and principal curves are not affected in the same way by introducing uncertainty, with principal curves being less influenced by simulated proxy counting and sub-sampling of the core. Virtually assessing uncertainties is a powerful method to better understand the influence that uncertainties introduced at different parts of the analytical process have on conclusions drawn from palaeoecological data.
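The three sources of uncertainty named in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation: every function name, the moving-average model of core mixing, the multinomial model of proxy counting, and all parameter values below are assumptions chosen only to make the idea concrete.

```python
import random
from collections import Counter


def make_error_free_core(n_steps, n_taxa, shift_at, rng):
    """Hypothetical 'error-free' core: relative abundances per time step,
    with an abrupt switch in composition at `shift_at`."""
    before = [rng.random() + 0.1 for _ in range(n_taxa)]
    after = [rng.random() + 0.1 for _ in range(n_taxa)]
    core = []
    for t in range(n_steps):
        weights = before if t < shift_at else after
        total = sum(weights)
        core.append([w / total for w in weights])
    return core


def mix(core, window):
    """Core mixing, assumed here to be a centred moving average
    over up to `window` adjacent layers."""
    half = window // 2
    mixed = []
    for i in range(len(core)):
        lo, hi = max(0, i - half), min(len(core), i + half + 1)
        layers = core[lo:hi]
        mixed.append([sum(col) / len(layers) for col in zip(*layers)])
    return mixed


def sub_sample(core, interval):
    """Sub-sampling: keep every `interval`-th layer of the core."""
    return core[::interval]


def count_proxies(core, n_counted, rng):
    """Proxy quantification, assumed here to be a multinomial draw of
    `n_counted` individuals from each layer's relative abundances."""
    counted = []
    for proportions in core:
        draws = rng.choices(range(len(proportions)),
                            weights=proportions, k=n_counted)
        tally = Counter(draws)
        counted.append([tally[i] for i in range(len(proportions))])
    return counted


# Apply the three sources of uncertainty in sequence (illustrative values).
rng = random.Random(42)
core = make_error_free_core(n_steps=200, n_taxa=5, shift_at=100, rng=rng)
observed = count_proxies(sub_sample(mix(core, window=5), interval=10),
                         n_counted=300, rng=rng)
```

Varying `window`, `interval`, and `n_counted` one at a time or together mirrors the idea of assessing individual and combined effects; a statistical method of interest would then be applied to `core` and to `observed` and the results compared.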
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-3845', Anonymous Referee #1, 24 Apr 2025
General comment
Asena et al. used a virtual ecology approach to assess how sources of uncertainty affect inferences drawn from palaeoecological data. In particular, they focused on environmental (e.g., mixing, preservation, catchment erosion) and observer (e.g., core compression, sub-sampling, counting) uncertainties to better understand which of them have the strongest influence on statistical methods applied to the data. They generated synthetic ‘error-free’ core-type samples of pseudoproxies, on which environmental and observational processes are systematically introduced to impose uncertainties on the simulated pseudoproxies. Three sources of uncertainty (core mixing, sub-sampling, and proxy quantification from sub-subsamples) were assessed for their individual and combined effects on two statistical methods: Fisher Information and principal curves. Increasing sub-sampling intervals has the most substantial influence on the two statistical methods applied to the pseudoproxy data. Asena et al. also showed that Fisher Information and principal curves are not affected in the same way by introducing uncertainty. Asena et al. concluded that the principal curves method is more relevant for analysing a network of core data over a large geographic region, where the observer is interested in the spatial consistency of the system’s trajectory but does not have the resources to extract highly resolved data from each core. In contrast, Fisher Information is useful for studying short-term change in a single core.
The objectives and the method of this study fall within the scope of CP. However, the introduction and the method need to be reshaped so that they are more easily understood by readers with different backgrounds (data, modelling, proxies, etc.). I also have some questions about the method and how it can be useful for the researchers producing the data. For this reason, I recommend major revisions before publication in CP.
Major comments
- The method section is difficult to follow for readers not in the field of pseudoproxy experiments. I know it can be quite technical, but the authors should make an effort to simplify the language and add schematic figures that help readers understand the method. Figure 1, for example, does not add much. The terms driver, archive, sensor, and observer, and the links between them, should be explained and summarized in a figure. The authors should explain clearly what features are. A figure or more concrete examples at the beginning of sections 2.1 and 2.2 would help too. For the moment, these parts are hard to follow on a first reading.
- On clarification, it might be a good idea to remind readers what palaeoecological data are (pollen, fossils, etc.).
- At lines 150-152, the authors explain that their analysis is for scenario 1 (Table S1), which means an abrupt environmental driver switching between two constant conditions and a randomly changing driver. Why did the authors focus on this scenario and not on the others? Is it the most frequent one in the past? Moreover, to help the reader understand, I would give some concrete real examples of such a scenario.
- At line 129, the authors wrote that they followed the proxy system model framework of Evans et al. (2013). The problem with proxy system models in climatology is that they often give poorer results than simpler linear models. Does this apply to this study? Moreover, in the conclusion, the authors wrote “A better understanding of the proxy system models of different proxies (i.e., how different proxies record environmental signals in an archive) and the uncertainties around quantifying and analysing proxy data can bring us closer to understanding long-term climate and ecosystem dynamics.”. In addition, the authors did not analyze the (potentially huge) uncertainties related to proxy system models (water isotopes measured in speleothems). Do these apply to palaeoecological data? Finally, only some observer and environmental sources of uncertainty are analysed here (investigating all of them would be very difficult, I suppose). This is also related to the sentence at line 353.
- Section 3.3: It is not completely clear to me whether the effects of three combined sources of uncertainty provide more information than the effects of two combined sources (from Figure 3 I can guess what Figures 4 and 5 will look like). The authors should clarify this.
- In section 4.3, I would like the authors to give more concrete examples of how their tool can be used in relation to past palaeoecological data studies, if possible.
Specific comments
- Supplementary figures and tables are not referenced adequately (for example supplementary table 2 instead of Table S1 or S2). Please check.
- Lines 18-19: the word “influence” appears twice in the sentence.
- Line 60: type-I error rates. Maybe quite technical for the introduction.
- Line 218: “among treatment levels, feature anlysis…”
- Line 221: explain what features are; it is not very clear.
- Line 233: give example of regime shifts.
- Line 238: define the ecological gradient.
- Lines 240-242: this sentence is very difficult to understand.
- Line 266: define what a treatment level is.
- Figure 4: say this is for FI features.
- Figure 5: say this is for PrC features. Also, in the legend, you can simply say this is the same as Figure 4 but for PrC features.
- Lines 381-382: A word is missing at the beginning of the sentence, I think.
- Lines 484-485: Maybe more perspectives considering proxy system models of different proxies?
Citation: https://doi.org/10.5194/egusphere-2024-3845-RC1
AC1: 'Reply on RC1', Quinn Asena, 18 Aug 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3845/egusphere-2024-3845-AC1-supplement.pdf
AC3: 'Reply on RC1', Quinn Asena, 18 Aug 2025
Dear reviewer 1,
I have been able to upload a response, but unable to upload the revised version of the manuscript. I have contacted Copernicus to resolve the issue.
Kind regards!
Citation: https://doi.org/10.5194/egusphere-2024-3845-AC3
RC2: 'Comment on egusphere-2024-3845', Anonymous Referee #2, 07 May 2025
Quinn Asena et al., Information loss in palaeoecological data from process and observer error. (CP manuscript egusphere-2024-3845 available as preprint on EGUsphere.)
General comments:
This study addresses the challenges of uncertainty in palaeoecological data derived from core-type samples. To quantify information loss due to environmental uncertainties and observer errors in palaeoecological research, the researchers employed a virtual ecological approach. They generated synthetic, error-free core-type samples of pseudoproxies and systematically introduced uncertainties related to core mixing, sub-sampling, and proxy quantification. The Fisher Information and principal curves statistical methods were used to assess the effect of different uncertainties imposed on the error-free samples. Their findings showed that increasing sub-sampling intervals had the most significant impact on both statistical methods. Moreover, the combination of increased sub-sampling intervals and decreased proxy counts per sub-sample showed the strongest influence on the analyses. Notably, principal curves were less affected by uncertainties related to proxy counting and sub-sampling compared to Fisher Information, which is more sensitive to short-term variability and driver interactions.
This paper presents a novel and interesting approach to dealing with environmental and observer-introduced uncertainty in palaeoecology, and its objectives are within the scope of CP. The objectives of the study are clearly outlined, and the results and their interpretation support the conclusions. However, the methods section is very complicated and needs to be restructured to help readers who are not familiar with virtual ecology (like myself) understand it better. The way the methods are written makes reproducing the study rather difficult. On the other hand, the results and discussion sections flow a lot better and are clearly written. I would recommend major revisions to the presentation of the methodology before publication in CP.
Specific comments:
I agree with the previous comment that the methods section needs more clarification. At the moment it is very difficult to follow the workflow of the study. There is a lot of terminology involved, and the authors should try to explain it more simply. It would be very useful to have a methods figure that explains the whole process and maybe another diagram explaining how the pseudoproxies are formed. Currently, there is no direct linkage between the method steps, which makes it very challenging for the reader to follow the paper. For example, I started to understand the methodology only when I proceeded to the results section, and I had to go back and re-read the methods.
A little more clarification regarding the replicated models is needed. For example, why is there a specific number of replicates (31)? Is it a decision the authors took or is there another explanation? Why are there 5000 time steps? These might be parameters that are well known to readers who are familiar with virtual ecology studies but are not clear to all readers.
Both virtual sub-sampling and virtual proxy counts are referred to as “observation models”. Could these two sources of uncertainty be distinguished in some way?
It would be helpful to know what kind of simulated pseudoproxies were used in the study. As explained in the discussion (443-457), some proxies are more sensitive to environmental sensors, so knowing what kind of pseudoproxies were used in the simulation models will make them more transparent and interpretable.
The limitations of the study are mentioned for the first time in the conclusion. A brief discussion of those limitations should be given in the discussion section along with a few examples where this approach may not be suitable. Virtual ecology as a tool to quantify uncertainty in palaeoecology is a novel approach, and therefore the authors should explain in more detail the benefits as well as the limitations of their method.
Technical corrections:
- Line 237: “2.4. Principal curves” should be: “2.3.2. Principal curves”
- Line 254: “2.4.1. Feature analysis” should be: “2.4. Feature analysis”
- In Figure 3A and Figure 3B the heatmaps should have more space horizontally.
Citation: https://doi.org/10.5194/egusphere-2024-3845-RC2
AC2: 'Reply on RC2', Quinn Asena, 18 Aug 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3845/egusphere-2024-3845-AC2-supplement.pdf
AC4: 'Reply on RC2', Quinn Asena, 18 Aug 2025
Dear reviewer 2,
I have been able to upload a response, but unable to upload the revised version of the manuscript. I have contacted Copernicus to resolve the issue.
Kind regards!
Citation: https://doi.org/10.5194/egusphere-2024-3845-AC4
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote
---|---|---|---|---|---|---
380 | 61 | 21 | 462 | 31 | 25 | 45