the Creative Commons Attribution 4.0 License.
StratoBayes: A Bayesian method for automated stratigraphic correlation and age modelling
Abstract. Stratigraphic correlation and age modelling are fundamental to reconstructing Earth’s history, biological evolution, and palaeoclimate, and underpin the exploration for subsurface resources. Correlations are produced by integrating diverse stratigraphic data across multiple sites, typically by visual inspection. Here, we introduce ‘StratoBayes’, a Bayesian statistical framework that combines stratigraphic correlation and depositional age estimation of stratigraphic horizons, i.e. age modelling. Our method aligns quantitative signals from two or more sites by shifting and scaling, allowing for sedimentation rate changes between stratigraphic partitions. The likelihood of an alignment is evaluated by how well the adjusted signals conform to a shared smooth trend, represented by a cubic spline. Tie points or independent age constraints, such as radiometric dates or biostratigraphic markers, can be integrated within this framework, providing age estimates for all sites. Our approach identifies multiple alignments where distinct alternatives exist, estimates their relative probabilities, and quantifies the uncertainty associated with correlations and age estimates. We apply StratoBayes to a lower Cambrian dataset comprising a combination of δ13C records, radiometric dates and astrochronology from four sites in Morocco and Siberia. The results demonstrate its capacity to quantify existing alignments, and provide the first precise age estimate for the evolutionary appearance of trilobites in Siberia, one of the hallmarks of the Cambrian Explosion. Beyond this application, StratoBayes offers a generalisable framework for probabilistic stratigraphic correlation, with potential to improve age models across a range of proxy records and time intervals.
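As a rough illustration of the idea described in the abstract, the sketch below scores a candidate alignment by shifting and scaling one site's heights onto a reference scale and measuring how closely the pooled signals follow a common smooth curve. It is a minimal sketch under stated assumptions, not the StratoBayes implementation: a leave-one-out Gaussian kernel smoother stands in for the cubic spline, and the function names, the synthetic sine signal, the noise level `sigma` and the `bandwidth` are all hypothetical choices made for the example.

```python
import math

def to_reference(heights, shift, scale):
    """Map site heights onto the reference scale: h_ref = shift + scale * h."""
    return [shift + scale * h for h in heights]

def kernel_mean(x0, xs, ys, bandwidth):
    """Gaussian-kernel weighted mean of ys at x0 (stand-in for a spline trend)."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def alignment_log_likelihood(sites, placements, sigma=0.2, bandwidth=1.0):
    """Gaussian log-likelihood of pooled signals around a shared smooth trend.

    sites      : list of (heights, values) pairs, one per site
    placements : list of (shift, scale) pairs, one per site (assumed names)
    """
    xs, ys = [], []
    for (heights, values), (shift, scale) in zip(sites, placements):
        xs.extend(to_reference(heights, shift, scale))
        ys.extend(values)
    log_lik = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Leave the point itself out so it cannot trivially explain itself.
        trend = kernel_mean(x, xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], bandwidth)
        log_lik += -0.5 * ((y - trend) / sigma) ** 2
        log_lik -= math.log(sigma * math.sqrt(2 * math.pi))
    return log_lik

# Synthetic example: site 2 records the same signal as site 1, but its
# heights relate to the reference scale through a hypothetical shift and scale.
site1_heights = [float(h) for h in range(21)]
site1 = (site1_heights, [math.sin(h / 3) for h in site1_heights])
true_shift, true_scale = 5.5, 2.0
site2_heights = [0.5 * k for k in range(11)]
site2 = (site2_heights,
         [math.sin((true_shift + true_scale * h) / 3) for h in site2_heights])

ll_true = alignment_log_likelihood([site1, site2],
                                   [(0.0, 1.0), (true_shift, true_scale)])
ll_wrong = alignment_log_likelihood([site1, site2], [(0.0, 1.0), (0.0, 1.0)])
print(ll_true > ll_wrong)
```

In a Bayesian setting, a score of this kind would be combined with priors on the shift and scale and explored by MCMC; the comparison of two fixed placements here only shows that the correct placement fits the shared trend better.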
Status: closed
- RC1: 'Comment on egusphere-2025-1355', Maarten Blaauw, 25 May 2025
This manuscript proposes a new method to align multiple proxy records based on assumed synchroneity (e.g. appearance of key trilobite fossils); additional data such as radiometric dates or known ages of fossils can also be added. The model draws a Bayesian cubic spline (Heaton et al., 2020) per to-be-aligned proxy, using evenly-spaced knots and smoothness parameters. The model is applied to some synthetic and real-world examples.
I like the fact that not just one alignment is chosen, displayed and discussed, but a range of alignments (e.g., Fig. 6 and section 5.1). This clearly shows the probabilistic and uncertain nature of aligning multiple records, and thus the need and potential for a Bayesian framework. Could the age-depth relationships of the three solutions from Fig. 6 also be shown in a Figure akin to Fig. 7, to see how variable the reconstructed rates and hiatuses are?
Sometimes stratigraphical correlation is the only way to obtain a chronology for a proxy record, e.g. where no absolute/radiometric age estimates are available. However, it would be good to also highlight potential problems with aligning records based on their assumed synchroneity, e.g. problems with circular reasoning, possible erroneous choice of tie-points, and the introduction of a dependence between records. These problems are reviewed by Blaauw 2012 (doi:10.1016/j.quascirev.2010.11.012).
Line 76: would it be useful to mention the astroBayes age model of Trayler et al. (2024), which includes hiatuses (doi:10.5194/gchron-6-107-2024)?
Lines 228-32 and 646-52 list an important limitation of the proposed model: assumed linear sedimentation rates will not cause chronological uncertainties to widen further away from age constraints. Some of the reconstructed age-model uncertainties seem very narrow indeed, e.g. in Fig. 7d. Does setting spline knots at regular intervals not help?
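The point about linear sedimentation rates can be made concrete with a deliberately simplified calculation. Assuming a single straight age-depth line through two dated horizons with independent Gaussian age errors (an assumption for this sketch, not the StratoBayes model), the age at an intermediate depth is a weighted average of the two dates, so its standard deviation shrinks towards the midpoint instead of growing with distance from the constraints. The function name and the numbers are hypothetical.

```python
import math

def interpolated_age_sd(depth, depth_a, depth_b, sd_a, sd_b):
    """Standard deviation of the age at `depth` on a straight line through
    two dated horizons with independent Gaussian age errors sd_a and sd_b."""
    w = (depth - depth_a) / (depth_b - depth_a)  # 0 at horizon a, 1 at horizon b
    return math.sqrt(((1 - w) * sd_a) ** 2 + (w * sd_b) ** 2)

# Two dated horizons 100 depth units apart, each with an age error of 1.0.
for depth in (0.0, 25.0, 50.0, 75.0, 100.0):
    print(depth, round(interpolated_age_sd(depth, 0.0, 100.0, 1.0, 1.0), 3))
```

Models that let the accumulation rate vary between constraints, for example through a prior on rate variability, would add a term that grows with distance from the dated horizons.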
For a frequently-used Bayesian age-depth model that includes priors on sedimentation rates and variability, please cite Bacon (Blaauw & Christen 2011, https://projecteuclid.org/euclid.ba/1339616472). Bacon is a piece-wise linear model much like what is proposed here; it also includes time hiatuses, slumps (depth 'hiatuses') and changes in sedimentation rates. It uses the t-walk, a flexible MCMC (Christen & Fox 2010, http://projecteuclid.org/euclid.ba/1340218339). Although Bacon is most often used on radiocarbon-dating timescales, it has also been applied to much longer time-scales. That said, the usage of dozens of parameters per site (owing to long cores with thin sections) would probably cause the MCMC to run much, much slower than the 5 days reported here.
I ran a quick toy age-model in R using the vignette provided and all ran fine. This is important, because other recently proposed methods I've seen rely on many additional packages and on software external to R such as JAGS to run (often resulting in failure). It is a pity, though, that only binary versions are provided; could the C++ source code also be provided? That would enable users on other operating platforms to run the code, would give users a better idea of what exactly is done, and would be much more future-proof.
The MCMC runs multiple chains but only retains the samples from one chain (both a burn-in and thinning are applied afterward). Is this a standard approach?
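For readers unfamiliar with the terms, the sketch below shows burn-in and thinning applied to several independent chains, and contrasts keeping one chain with pooling all of them. It uses a toy random-walk Metropolis sampler on a standard normal target; the sampler, the target and every setting are assumptions chosen for illustration and do not reproduce the sampler used in StratoBayes.

```python
import math
import random

def metropolis_chain(log_post, start, n_iter, step, seed):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    samples = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        samples.append(x)
    return samples

def burn_and_thin(samples, burn_in, thin):
    """Drop the first `burn_in` draws, then keep every `thin`-th draw."""
    return samples[burn_in::thin]

def log_standard_normal(x):
    return -0.5 * x * x

# Three chains started from dispersed values (hypothetical settings).
chains = [metropolis_chain(log_standard_normal, start, 1000, 1.0, seed)
          for seed, start in enumerate((-5.0, 0.0, 5.0))]
kept = [burn_and_thin(chain, burn_in=200, thin=10) for chain in chains]

single_chain = kept[0]                           # retain one chain only
pooled = [x for chain in kept for x in chain]    # or pool all chains
print(len(single_chain), len(pooled))            # prints "80 240"
```

Pooling uses every post-burn-in draw, whereas retaining one chain discards the others; comparing chains before pooling is a common way to check that they have converged to the same distribution.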
Could you clarify $\mu$ in section 2.1: is this a hypothetical target to which all sites are tuned, or is this akin to target/reference Site 1 as in Fig. 1?
Fig. 1 (the hypothetical example): can the $\alpha$ and $\gamma$ values of the placement in c) be depicted as vertical lines overlying the prior distributions of panel b)? I ask because in this example site 2 is compressed a lot (2.8 times faster than site 1), and it would be nice to see where it falls on the log-normal prior (as well as, of course, the placement on the uniform prior, 12.5 m). In this example, site 2 accumulates linearly over time.
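A quick way to see where such a placement falls on its priors is to evaluate the prior densities directly. The sketch below does this for the values quoted above (a rate ratio of 2.8 and a shift of 12.5 m). The prior parameters are not given in this comment, so the log-normal parameters and the uniform bounds used here are purely hypothetical placeholders, as are the function names.

```python
import math

def lognormal_pdf(x, meanlog, sdlog):
    """Density of a log-normal distribution at x."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - meanlog) / sdlog
    return math.exp(-0.5 * z * z) / (x * sdlog * math.sqrt(2 * math.pi))

def lognormal_cdf(x, meanlog, sdlog):
    """Cumulative probability of a log-normal distribution at x."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - meanlog) / sdlog
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def uniform_pdf(x, lower, upper):
    """Density of a uniform distribution on [lower, upper] at x."""
    return 1.0 / (upper - lower) if lower <= x <= upper else 0.0

rate_ratio = 2.8   # value quoted in the comment
shift_m = 12.5     # value quoted in the comment

MEANLOG, SDLOG = 0.0, 1.0           # hypothetical log-normal prior parameters
SHIFT_MIN, SHIFT_MAX = 0.0, 30.0    # hypothetical uniform prior bounds (m)

rate_density = lognormal_pdf(rate_ratio, MEANLOG, SDLOG)
rate_quantile = lognormal_cdf(rate_ratio, MEANLOG, SDLOG)
shift_density = uniform_pdf(shift_m, SHIFT_MIN, SHIFT_MAX)
print(rate_density, rate_quantile, shift_density)
```

With these placeholder parameters the rate ratio sits in the upper tail of the log-normal prior but well within its plausible range; the actual position depends entirely on the prior settings used in the manuscript.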
Eq. 8, shouldn't the hiatuses $\delta$ be expressed as gaps/jumps in time, not depth/height?
Citation: https://doi.org/10.5194/egusphere-2025-1355-RC1
- AC1: 'Reply on RC1', Kilian Eichenseer, 30 Jun 2025
- RC2: 'Comment on egusphere-2025-1355', Andrew Curtis, 02 Jun 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-1355/egusphere-2025-1355-RC2-supplement.pdf
- AC2: 'Reply on RC2', Kilian Eichenseer, 30 Jun 2025
Data sets
Data and R code K. Eichenseer et al. https://zenodo.org/records/15065336
Model code and software
Software installation K. Eichenseer et al. https://stratobayes.github.io/software.html
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
981 | 111 | 18 | 1,110 | 18 | 32 |