the Creative Commons Attribution 4.0 License.
Information loss in palaeoecological data from process and observer error
Abstract. Palaeoecological data give us insight into how ecosystems have changed in the past, and with the development of new sources of proxy data and statistical methods, they are being used to address questions around the underlying mechanisms of change, such as biotic- and climate-ecosystem interactions. However, inferences from palaeoecological data can be hindered by uncertainties inherent in core-type samples that arise from environmental processes and observer-introduced error. Environmental processes, core extraction methods, sub-sampling strategies, laboratory methods, and data processing can potentially mask ‘true’ signals in the data. The influence of sources of uncertainty on inferences drawn from palaeoecological data is rarely assessed but is critical to the confidence of our conclusions. To address this concern, we use a virtual ecological approach to assess the influences of environmental and observer-introduced uncertainty and to better understand which of them have the strongest influence on statistical methods applied to the data. Quantifying information loss from uncertainty can be used to inform study design before a project is carried out, to increase the likelihood of detecting a given signal of interest and to make more robust inferences from statistical analyses of palaeoproxy data. We generate synthetic ‘error-free’ core-type samples of pseudoproxies, on which environmental and observational processes are systematically introduced to impose uncertainties on the simulated pseudoproxies. Three sources of uncertainty (core mixing, sub-sampling, and proxy quantification from sub-subsamples) are assessed for their individual and combined effects on two statistical methods: Fisher Information and principal curves. Increasing sub-sampling intervals has the most substantial influence on the two statistical methods applied to the pseudoproxy data.
When combined, the interaction between increasing the sub-sampling interval and decreasing the number of proxies counted per sub-sample has the strongest influence on Fisher Information and principal curves. Fisher Information and principal curves are not affected in the same way by introducing uncertainty, with principal curves being less influenced by simulated proxy counting and sub-sampling of the core. Virtually assessing uncertainties is a powerful method to better understand the influence that uncertainties introduced at different parts of the analytical process have on conclusions drawn from palaeoecological data.
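The workflow described in the abstract (an error-free pseudoproxy core, then mixing, sub-sampling, and proxy counting) can be illustrated with a minimal sketch. This is not the authors' code: every function name is hypothetical, the sinusoidal abundances and the moving-average stand-in for core mixing are assumptions made only for illustration, and the window, interval, and count values are arbitrary.

```python
import math
import random
from collections import Counter


def make_error_free_core(n_steps, n_taxa):
    """Hypothetical 'error-free' pseudoproxy: smooth, strictly positive abundances."""
    return [
        [1.1 + math.sin(2 * math.pi * (t / n_steps + k / n_taxa)) for k in range(n_taxa)]
        for t in range(n_steps)
    ]


def mix_core(core, window):
    """Centred moving average over time steps (an assumed, crude stand-in for core mixing)."""
    half = window // 2
    mixed = []
    for t in range(len(core)):
        lo, hi = max(0, t - half), min(len(core), t + half + 1)
        n_taxa = len(core[t])
        mixed.append(
            [sum(core[s][k] for s in range(lo, hi)) / (hi - lo) for k in range(n_taxa)]
        )
    return mixed


def subsample_core(core, interval):
    """Keep every `interval`-th time step, as if sampling the core at wider spacing."""
    return core[::interval]


def count_proxies(core, n_counted, rng):
    """Simulate counting `n_counted` individuals per sub-sample (a multinomial draw)."""
    counted = []
    for row in core:
        draws = rng.choices(range(len(row)), weights=row, k=n_counted)
        tally = Counter(draws)
        counted.append([tally[k] for k in range(len(row))])
    return counted


rng = random.Random(42)  # fixed seed so the sketch is repeatable
truth = make_error_free_core(n_steps=5000, n_taxa=4)
mixed = mix_core(truth, window=5)
sampled = subsample_core(mixed, interval=50)
observed = count_proxies(sampled, n_counted=300, rng=rng)
print(len(truth), len(observed), sum(observed[0]))
```

Keeping each source of uncertainty in its own function means they can be applied one at a time or in combination, which mirrors the individual and combined assessment described in the abstract. Any statistic of interest could then be computed on both `truth` and `observed` and compared.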
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2024-3845', Anonymous Referee #1, 24 Apr 2025
- AC1: 'Reply on RC1', Quinn Asena, 18 Aug 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3845/egusphere-2024-3845-AC1-supplement.pdf
- AC3: 'Reply on RC1', Quinn Asena, 18 Aug 2025
Dear reviewer 1,
I have been able to upload a response, but unable to upload the revised version of the manuscript. I have contacted Copernicus to resolve the issue.
Kind regards!
Citation: https://doi.org/10.5194/egusphere-2024-3845-AC3
- RC2: 'Comment on egusphere-2024-3845', Anonymous Referee #2, 07 May 2025
Quinn Asena et al., Information loss in palaeoecological data from process and observer error. (CP manuscript egusphere-2024-3845 available as preprint on EGUsphere.)
General comments:
This study addresses the challenges of uncertainty in palaeoecological data derived from core-type samples. To quantify information loss due to environmental uncertainties and observer errors in palaeoecological research, the researchers employed a virtual ecological approach. They generated synthetic, error-free core-type samples of pseudoproxies and systematically introduced uncertainties related to core mixing, sub-sampling, and proxy quantification. Fisher Information and principal curves were used to assess the effect of the different uncertainties imposed on the error-free samples. Their findings showed that increasing sub-sampling intervals had the most significant impact on both statistical methods. Moreover, the combination of increased sub-sampling intervals and decreased proxy counts per sub-sample showed the strongest influence on the analyses. Notably, principal curves were less affected by uncertainties related to proxy counting and sub-sampling than Fisher Information, which is more sensitive to short-term variability and driver interactions.
This paper presents a novel and interesting approach to dealing with environmental and observer-introduced uncertainty in palaeoecology, and its objectives are within the scope of CP. The objectives of the study are clearly outlined, and the results and their interpretation support the conclusions. However, the methods section is very complicated and needs to be restructured to help readers who are not familiar with virtual ecology (like myself) understand it better. The way the methods are written makes reproducing the study rather difficult. On the other hand, the results and discussion sections flow a lot better and are clearly written. I would recommend major revisions to the presentation of the methodology before publication in CP.
Specific comments:
I agree with the previous comment that the methods section needs more clarification. At the moment it is very difficult to follow the workflow of the study. There is a lot of terminology involved, and the authors should try to explain it more simply. It would be very useful to have a methods figure that explains the whole process, and perhaps another diagram explaining how the pseudoproxies are formed. Currently, there is no direct linkage between the method steps, which makes it very challenging for the reader to follow the paper. For example, I only started to understand the methodology when I proceeded to the results section, and I had to go back and re-read the methods.
A little more clarification regarding the replicated models is needed. For example, why is there a specific number of replicas (31)? Is it a decision the authors took, or is there another explanation? Why are there 5000 time-steps? These might be parameters that are well known to readers who are familiar with virtual ecology studies, but they are not clear to all readers.
Both virtual sub-sampling and virtual proxy counts are referred to as “observation models”. Could these two uncertainty drivers be distinguished in some way?
It would be helpful to know what kind of simulated pseudoproxies were used in the study. As explained in the discussion (443-457), some proxies are more sensitive to environmental sensors, so knowing what kind of pseudoproxies were used in the simulation models would make them more transparent and interpretable.
The limitations of the study are mentioned for the first time in the conclusion. A brief discussion of those limitations should be given in the discussion section, along with a few examples of where this approach may not be suitable. Virtual ecology as a tool to quantify uncertainty in palaeoecology is a novel approach, and therefore the authors should explain in more detail the benefits as well as the limitations of their method.
Technical corrections:
- Line 237: “2.4. Principal curves” should be: “2.3.2. Principal curves”
- Line 254: “2.4.1. Feature analysis” should be: “2.4. Feature analysis”
- In Figure 3A and Figure 3B the heatmaps should have more space horizontally.
Citation: https://doi.org/10.5194/egusphere-2024-3845-RC2
- AC2: 'Reply on RC2', Quinn Asena, 18 Aug 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3845/egusphere-2024-3845-AC2-supplement.pdf
- AC4: 'Reply on RC2', Quinn Asena, 18 Aug 2025
Dear reviewer 2,
I have been able to upload a response, but unable to upload the revised version of the manuscript. I have contacted Copernicus to resolve the issue.
Kind regards!
Citation: https://doi.org/10.5194/egusphere-2024-3845-AC4
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
660 | 65 | 21 | 746 | 40 | 25 | 46 |
General comment
Asena et al. used a virtual ecology approach to assess the sources of uncertainty in inferences drawn from palaeoecological data. In particular, they focused on environmental (e.g., mixing, preservation, catchment erosion) and observer (core compression, sub-sampling, counting…) uncertainties to better understand which of them have the strongest influence on statistical methods applied to the data. They generated synthetic ‘error-free’ core-type samples of pseudoproxies, on which environmental and observational processes are systematically introduced to impose uncertainties on the simulated pseudoproxies. Three sources of uncertainty (core mixing, sub-sampling, and proxy quantification from sub-subsamples) were assessed for their individual and combined effects on two statistical methods: Fisher Information and principal curves. Increasing sub-sampling intervals has the most substantial influence on the two statistical methods applied to the pseudoproxy data. Asena et al. also showed that Fisher Information and principal curves are not affected in the same way by introducing uncertainty. Asena et al. concluded that the principal curves method is more relevant for analysing a network of core data over a large geographic region, where the observer is interested in the spatial consistency of the system’s trajectory but does not have the resources to extract highly resolved data from each core. In contrast, Fisher Information is useful for detecting short-term change in a single core.
The objectives and the method of this study correspond to the scope of CP. However, the introduction and the methods need to be reshaped so that they are more easily understood by readers with different backgrounds (data, modelling, proxy, etc.). I also have some questions about the method and how it can be useful for the researchers producing the data. For this reason, I recommend major revisions before publication in CP.
Major comments
Specific comments