the Creative Commons Attribution 4.0 License.
Towards spatio-temporal comparison of transient simulations and temperature reconstructions for the last deglaciation
Abstract. An increasing number of climate model simulations is becoming available for the transition from the Last Glacial Maximum to the Holocene. Assessing the simulations’ reliability requires benchmarking against environmental proxy records. To date, no established method exists to compare these two data sources in space and time over a period with changing background conditions. Here, we develop a new algorithm to rank simulations according to their deviation from reconstructed magnitudes and temporal patterns of orbital- as well as millennial-scale temperature variations. The use of proxy forward modeling avoids the need to reconstruct gridded or regional mean temperatures from sparse and uncertain proxy data.
First, we test the reliability and robustness of our algorithm in idealized experiments with prescribed deglacial temperature histories. We quantify the influence of limited temporal resolution, chronological uncertainties, and non-climatic processes by constructing noisy pseudo-proxies. While model-data comparison results become less reliable with increasing uncertainties, we find that the algorithm discriminates well between simulations under realistic non-climatic noise levels. To obtain reliable and robust rankings, we advise spatial averaging of the results for individual proxy records.
Second, we demonstrate our method by quantifying the deviations between an ensemble of transient deglacial simulations and a global compilation of sea surface temperature reconstructions. The ranking of the simulations differs substantially between the considered regions and timescales. We attribute this diversity in the rankings to more regionally confined temperature variations in reconstructions than in simulations, which could be the result of uncertainties in boundary conditions, shortcomings in models, or regionally varying characteristics of reconstructions such as recording seasons and depths. Future work towards disentangling these potential reasons can leverage the flexible design of our algorithm and its demonstrated ability to identify varying levels of model-data agreement.
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
- RC1: 'Comment on egusphere-2023-986', Anonymous Referee #1, 15 Jun 2023
The manuscript “Towards spatio-temporal comparison of transient simulations and temperature reconstructions for the last deglaciation” introduces a new algorithm using proxy system models (PSM) to allow for a comprehensive data-model comparison of transient deglacial model simulations. The manuscript is predominantly a methods paper in which the authors introduce their algorithm, test the sensitivity of its parameters, and apply it in a perfect-model pseudo-proxy experiment context. The review and vetting of the algorithm comprise the bulk of the text, whereas the application of the algorithm to benchmark paleo simulations against data is comparatively much shorter. While I am not an expert in the statistical models used within the methodology, the body of work appears sound without any immediate issues. I feel the wording justifying the need for the extra complexity of using a PSM could be improved. One of the core justifications for using the PSM is that it avoids the need to rely on “sparse and uncertain proxy data”, yet the PSM largely degrades or reforms the model output to be more compatible with the proxy data and then uses the same sparse and uncertain proxy data to benchmark the PSM-created forward-modeled proxy time series. The authors also do not address why more traditional signal-processing methods, such as Principal Component Analysis, could not instead be used to extract signal from the messy proxy data without the added complexity (and caveats) of using the PSM. I don’t think these criticisms undercut the work in any way, only that the justification for the need for the algorithm and PSM could be framed better.
While less text is dedicated to evaluating the model simulations against data, I disagree with one of the key findings: “Comparing the MPI-ESM and CCSM3 simulations that employ orbital, GHG, and ice sheet forcing, we find no systematic differences between the two climate models. In particular, TraCE-ALL is mostly within the IQD spread of the six MPI-ESM simulations.” This is an important statement, so the language should be more precise. What specifically are the authors referring to? There is only one MPI-ESM simulation that employs orbital, GHG, and ice sheet forcing (MPI_Ice6G_P2_glob), and there is no comparable TraCE simulation since TraCE-GHG and TraCE-ORB fix the ice sheet forcing to LGM (see Table 1). When I look at Figure 7, I don’t see TraCE-ALL effectively being the same as MPI with freshwater flux, especially in the North Atlantic and North Pacific. What does it mean that MPI_Ice6G_P2_noMW outperforms TraCE-ALL in the North Atlantic for the orbital pattern, when TraCE-ALL is specifically designed to reproduce the reconstructed AMOC variability? AMOC variability is sub-orbital scale of course, but what does it mean that a model without hosing is capturing that scale of variability better than TraCE-ALL (or conversely, that the addition of hosing degrades the orbital-scale performance)? Likewise, what are the implications of TraCE-ALL and TraCE-GHG having nearly identical millennial pattern deviations in the North Atlantic, when TraCE-GHG doesn’t include freshwater hosing? The TraCE-ALL and hosed MPI IQDs in the Figure 9 legend are largely not similar, let alone TraCE-ALL being bracketed by the MPI simulations. It is often difficult and nuanced to say when one model is performing better than another, but I don’t think the analysis and figures here support the claim that TraCE-ALL and hosed MPI are effectively the same when compared to data.
The text notes “More generally, all simulations with meltwater input show a better agreement with reconstructions for millennial magnitudes than those without meltwater input.” I don’t think this is strictly true. There are cases where MPI_Ice6G_P2_noMW performs similarly to, if not better than, the routed MPI-ESM simulations. In either case, this only means the millennial-scale variability is more like the data when hosing is added, not that the pattern is realistic (as noted around line 550). This is more apparent with the TraCE simulations, where in some locations the addition of hosing degrades model performance. Getting the magnitude of variability correct but the patterns (i.e. trends) of the deglaciation wrong isn’t particularly satisfying, which could be emphasized here.
I feel the manuscript would be improved from relatively minor revisions for clarity.
----------------
Minor comments and notes:
Line ~125: define or give examples of “sensor” for the novice.
Line 137: Osman et al., 2021 uses four proxy types, so I am not sure why it is cited here.
Lines 177: “Computing averages in this last step instead of averaging temperature time series in the beginning avoids interpolating proxy records with irregular time axes to a common resolution.” Explain this. It seems like in some portions of the analysis the data are binned to a fixed 100-yr timestep (i.e. Section 3.2). The time series displayed in Figure 9 are regional averages; are these first binned to a 100-yr interval, or are they somehow calculated on irregular time spacing for multiple records and ensemble members?
Lines ~220: magnitude is defined as the standard deviation of each ensemble member (for the decomposed time series). Since standard deviation is an absolute value, doesn’t this fail to discriminate between trends in opposite directions (i.e. the data is cooling when the model is warming)?
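The reviewer's point about sign-blindness can be made concrete with a small numerical illustration (this is a hypothetical sketch, not the authors' code): a warming and a cooling trend of equal amplitude yield identical standard deviations, so a magnitude-only metric cannot distinguish them.

```python
import numpy as np

# Two decomposed pseudo time series with opposite trends of equal amplitude
t = np.linspace(0.0, 1.0, 101)
warming = 2.0 * t    # "model" warms by 2 degrees over the interval
cooling = -2.0 * t   # "data" cools by 2 degrees over the same interval

# Standard deviation (the "magnitude" metric) is identical for both,
# even though the temporal patterns are opposite
print(np.isclose(np.std(warming), np.std(cooling)))  # True
```

This is why the pattern metric, which retains the time ordering, is needed alongside the magnitude metric.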
Throughout the text “magnitude” is used to denote the degree of variability in the decomposed time series, which is just saying the strength of variability. It may not be obvious to the reader what the utility of this metric is. We tend to think in terms of time series, so “pattern” (as defined here) is far more intuitive.
Lines 235: Since N is either 100 or 1000, I assume an empirical probability distribution is used rather than a fitted distribution.
Line 244: How does IQD integrate differences in time series? Each forward-modeled proxy time series is on the same irregular age model spacing as the proxy data, but how is the time series translated to distributions used in the IQD equation?
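For readers unfamiliar with the metric raised in the comment above: assuming the IQD is the integrated quadratic distance between empirical cumulative distribution functions, IQD(F, G) = ∫ (F(x) − G(x))² dx, a minimal sketch of computing it from two ensembles of scalar summaries (e.g. magnitudes from N ensemble members) could look like the following. The function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def iqd(sample_f, sample_g):
    """Integrated quadratic distance between the empirical CDFs of two
    samples: IQD = integral of (F(x) - G(x))^2 dx over the pooled support."""
    f = np.sort(np.asarray(sample_f, dtype=float))
    g = np.sort(np.asarray(sample_g, dtype=float))
    grid = np.sort(np.concatenate([f, g]))
    # Empirical CDF value just to the right of each pooled grid point
    F = np.searchsorted(f, grid, side="right") / f.size
    G = np.searchsorted(g, grid, side="right") / g.size
    # Step functions are constant on each interval [grid[i], grid[i+1])
    widths = np.diff(grid)
    return float(np.sum((F[:-1] - G[:-1]) ** 2 * widths))
```

For identical samples the IQD is zero, and it grows with the separation between the two distributions; it is symmetric in its arguments. How the paper maps irregularly spaced time series onto such distributions is precisely the reviewer's question.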
Lines ~254: Zonal IQD seems to only be used in part of Figure 2, which is a flowchart of the analysis. If it is not used in the results section, could it be removed?
Section 4.2 Comparison of simulations against SST reconstructions: I think it would be really useful to the reader to plot orbital + millennial time series for the regions summarized in Figure 7 (i.e. this new plot should come before Figure 7). I envision something like Figure 9, which would give the reader a feel for what the models are simulating (relative to the data) prior to the decomposition. Those raw trends are somewhat abstracted away by plotting orbital and millennial time series separately. For example, in Figure 9 MPI_Ice6G_P3 has a cooling trend around 14–13 ka in both the orbital and millennial scales. If it is caused by the injection of freshwater forcing, I would expect it to appear only in the millennial scale (also perhaps implying freshwater forcing is showing up in the orbital-scale decomposition).
Figure 7: It would be very verbose, but would it be worth plotting a magnitude versus pattern IQD scatter plot? There are too many combinations for the main text, so perhaps an example (perhaps millennial magnitude versus pattern for the North Atlantic)? The best models should converge in the lower left of the plot near the plot origin (0,0).
Line 597: “To avoid the need to reconstruct gridded or regional mean temperatures from sparse and uncertain proxy data, the algorithm applies proxy system models to simulation output and quantifies the deviation between the resulting forward-modeled proxy time series and temperature reconstructions”. Doesn’t Figure 9 create regional stacks? I understand mean IQD is used to summarize regions (as explained in section 3.1.4), but how are time series of regional averages constructed?
Figure S2. Many of the plot titles in Figure S2 are identical. I assume this is depicting multiple records from the same core site. Perhaps this could be denoted better in the plot titles or figure caption. Also, how are the regional stack time series in Figure 9 made when not all records in Figure S2 span 19 – 9 ka?
Citation: https://doi.org/10.5194/egusphere-2023-986-RC1 - AC1: 'Reply on RC1', Nils Weitzel, 15 Dec 2023
- RC2: 'Comment on egusphere-2023-986', Anonymous Referee #2, 18 Oct 2023
In this manuscript the authors present a new methodology to compare simulated and reconstructed sea surface temperatures and apply it to the case of the last deglaciation. The method nicely separates different aspects of temperature variability on different time-scales and will become a valuable tool to quantify model-data agreement as new transient model simulations and more proxy records of past climate intervals become available.
The paper is well written, the results are clearly presented, and I therefore recommend publication of the paper in Climate of the Past after some, mostly minor, issues have been addressed.
Main comments
I’m not convinced that the title properly reflects the content of the paper, namely a comparison of simulated and reconstructed sea surface temperatures. Possibly reformulate to something like:
Towards spatio-temporal comparison of simulated and reconstructed (sea surface) temperatures for the last deglaciation
I realize that this is a predominantly methodological paper, but the authors could possibly consider shortening the technical part a bit by moving some details to the supplement, and focusing a bit more on the results in terms of how well different models reproduce different aspects of the temperature evolution over the last deglaciation. For example, Fig. 6 seems to add very little information, and the corresponding section 4.1.3 seems very extended considering that the important messages are simply that i) the reliability and robustness of the algorithm seem to be very little affected by misspecified SNRs and ii) the effect of under- or overestimating the temporal persistence of non-climatic noise is negligible in our PPEs.
A nice addition to Figures 8 and 9 would be a figure showing also a direct comparison of simulated and reconstructed temperature (anomalies) time series for the different regions. I believe that a simple visual inspection of model and observation time series is still useful to get a first idea about model-data agreement.
I suggest removing the simulations which do not include all forcings, i.e. TraCE-ORB and TraCE-GHG, from the main analysis. For example, in Fig. 1a it is confusing to show simulations which do not include the full forcing as it gives the impression that the spread among models is even larger than it actually already is. I think it is perfectly fine to include those simulations to test the methodology, but not when it comes to the actual comparison of how well models simulate different aspects of the last deglaciation (e.g. lines 540-545).
Can the presented new methodology, which is here applied to SSTs, in principle also be extended and applied to other variables? It is mentioned that it could be extended to land temperature reconstructions, but what about very different variables like carbon or oxygen isotopes?
Some parts of the text are filled with acronyms, which makes it sometimes a bit hard to read. Please consider if the use of acronyms could be reduced, particularly also in figure captions. One example: is LD really needed?
Minor comments:
L. 30: The references cited do not support the 3-8°C range given in the paper. From the abstracts: Tierney: -5.7 to -6.5°C; Annan: -4.5 ± 0.9°C.
Section 2: consistently use either past or present tense
L. 106: are -> is
L. 130-131: How is that justified? What does sub-surface mean? Is it still in the mixed layer?
L. 396: FRPRR -> FPRR ?
Fig. 2 top-right, Proxy System Model: the single line representing the simulation is possibly a bit misleading, as there are 4 time series from different model grid cells that enter the PSM, if I understood correctly.
Fig. 2 bottom: how are the latitudinal belts defined? By the dashed vertical lines? It is not clear from the figure caption. Please repeat the text in 3.1.4.
Fig. 2 Timescale decomposition panels: this is just a detail, but it would be nice and more intuitive if the lines would be plotted and shown in the legends in order of increasing ‘smoothing’, i.e. 1) Reconstruction, 2) Orbital+millennial, 3) Orbital.
Citation: https://doi.org/10.5194/egusphere-2023-986-RC2 - AC2: 'Reply on RC2', Nils Weitzel, 15 Dec 2023
Peer review completion
Journal article(s) based on this preprint
Model code and software
Code in support of "Towards spatio-temporal comparison of transient simulations and temperature reconstructions for the last deglaciation" N. Weitzel, H. Andres, J.-P. Baudouin, M. Kapsch, U. Mikolajewicz, L. Jonkers, O. Bothe, E. Ziegler, T. Kleinen, A. Paul, and K. Rehfeld https://doi.org/10.5281/zenodo.7924110
Nils Weitzel
Heather Andres
Jean-Philippe Baudouin
Marie Kapsch
Uwe Mikolajewicz
Lukas Jonkers
Oliver Bothe
Elisa Ziegler
Thomas Kleinen
André Paul
Kira Rehfeld