Information loss in palaeoecological data from process and observer error
Abstract. Palaeoecological data give us insight into how ecosystems have changed in the past, and, with the development of new sources of proxy data and statistical methods, they are increasingly used to address questions about the underlying mechanisms of change, such as biotic- and climate-ecosystem interactions. However, inferences from palaeoecological data can be hindered by uncertainties inherent in core-type samples that arise from environmental processes and observer-introduced error. Environmental processes, core extraction methods, sub-sampling strategies, laboratory methods, and data processing can all potentially mask ‘true’ signals in the data. The influence of these sources of uncertainty on inferences drawn from palaeoecological data is rarely assessed, yet it is critical to the confidence we can place in our conclusions. To address this concern, we use a virtual ecological approach to assess the influence of environmental and observer-introduced uncertainty and to identify which sources have the strongest influence on statistical methods applied to the data. Quantifying information loss from uncertainty can inform study design before a project is carried out, increasing the likelihood of detecting a given signal of interest and supporting more robust inferences from statistical analyses of palaeoproxy data. We generate synthetic ‘error-free’ core-type samples of pseudoproxies, to which environmental and observational processes are systematically applied to impose uncertainty. The individual and combined effects of three sources of uncertainty (core mixing, sub-sampling, and proxy quantification from sub-samples) are assessed on two statistical methods: Fisher Information and principal curves. Increasing the sub-sampling interval has the most substantial influence on both statistical methods applied to the pseudoproxy data.
When combined, the interaction between increasing the sub-sampling interval and decreasing the number of proxies counted per sub-sample has the strongest influence on Fisher Information and principal curves. Fisher Information and principal curves are not affected in the same way by the introduced uncertainty: principal curves are less influenced by simulated proxy counting and sub-sampling of the core. Virtually assessing uncertainty is a powerful way to understand how uncertainty introduced at different stages of the analytical process influences the conclusions drawn from palaeoecological data.
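The abstract does not specify how each uncertainty source is simulated, so the following Python sketch is only a hypothetical illustration of the general idea: start from an error-free pseudoproxy record and degrade it by core mixing, sub-sampling, and counting. The signal model, the moving-average mixing kernel, and every name and parameter (`simulate_core`, `mix_core`, `subsample`, `count_proxies`, `depth`, `interval`, `thickness`, `n_count`) are assumptions rather than the authors' implementation. Fisher Information and principal curves are not computed here.

```python
import numpy as np


def simulate_core(n_years=1000, n_taxa=5, seed=0):
    """Error-free pseudoproxy relative abundances, one row per annual layer.

    Hypothetical signal model: each taxon follows a smooth sinusoid in
    log-abundance, and taxon 0 undergoes an abrupt shift halfway through.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_years)
    phases = rng.uniform(0, 2 * np.pi, n_taxa)
    logits = np.sin(2 * np.pi * t[:, None] / 300 + phases[None, :])
    logits[n_years // 2:, 0] += 2.0  # abrupt regime shift in taxon 0
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def mix_core(props, depth=10):
    """Core mixing, assumed here to be a uniform moving average over `depth` layers."""
    kernel = np.ones(depth) / depth
    mixed = np.column_stack(
        [np.convolve(props[:, j], kernel, mode="same") for j in range(props.shape[1])]
    )
    # Renormalise so rows near the core ends (partial kernel overlap) sum to 1.
    return mixed / mixed.sum(axis=1, keepdims=True)


def subsample(props, interval=20, thickness=1):
    """Keep one slice of `thickness` layers every `interval` layers, averaged within the slice."""
    starts = np.arange(0, props.shape[0] - thickness + 1, interval)
    return np.array([props[s:s + thickness].mean(axis=0) for s in starts])


def count_proxies(props, n_count=300, seed=1):
    """Observer counting error: draw `n_count` individuals per sub-sample (multinomial)."""
    rng = np.random.default_rng(seed)
    counts = np.array([rng.multinomial(n_count, p) for p in props])
    return counts / n_count


true = simulate_core()
observed = count_proxies(subsample(mix_core(true, depth=10), interval=20), n_count=300)
print(true.shape, observed.shape)  # (1000, 5) (50, 5)
```

Because each step is a separate function, the sources of uncertainty can be switched on individually or chained together, which mirrors the distinction the abstract draws between individual and combined effects.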