Using ocean surface paleo-density to evaluate PMIP3 and PMIP4 Last Glacial Maximum climate simulations

Barathieu, Héloïse; Caley, Thibaut; Kageyama, Masa; Swingedouw, Didier; Braconnot, Pascale

doi:10.5194/egusphere-2026-254

Preprints

https://doi.org/10.5194/egusphere-2026-254

02 Feb 2026

Abstract. Quantitative reconstruction of ocean surface density during the Last Glacial Maximum (LGM), when global temperatures were about 4.5 °C to 6 °C colder than today, offers valuable insights into the ability of climate models to simulate past climate conditions. We assess the performance of the LGM climate simulations from the third and fourth phases of the Paleoclimate Modelling Intercomparison Project (PMIP3 and PMIP4), using a recent ocean surface density reconstruction based on the δ¹⁸O of foraminiferal calcite (δ¹⁸Oc). We consider both the differences between the LGM and pre-industrial climates and each period separately, at global and regional scales. Because surface density reflects the combined effects of temperature and salinity, we also examine sea surface temperature (SST) to better identify the processes underlying model–data differences.

Surface density reconstructions show greater variability than simulated surface density. Models therefore struggle to reproduce the spatial variability of the density difference (LGM – pre-industrial (PI)), but part of the mismatch may arise from the uneven spatial distribution of reconstructions, which are mostly located near coastal areas.

Density anomaly (LGM – PI) differences between data and models are largely controlled by sea surface salinity (SSS), with SST contributing to a lesser extent. This influence of SSS is directly linked to the reduction in tropical precipitation during the LGM: models that best match the large-scale density anomalies also simulate the strongest reductions in reconstructed low-latitude precipitation during the LGM, highlighting the key role of hydrological cycle changes in shaping surface density.

On a global scale, all model simulations show a statistically significant relationship with surface density reconstructions when the LGM and PI are considered separately. However, on a regional scale, some features are poorly simulated, leading to weaker agreement between data and model simulations, particularly in the North Indian and Southern Oceans. Our analysis concludes with a focus on the Indo-Pacific Warm Pool. Reconstructions indicate a weakened west–east surface density gradient in the Indian Ocean at the LGM, but only 7 out of 14 models (50 %) reproduce this feature. These results highlight the need to better constrain regional hydrological cycle changes in models, as improving their representation is crucial to reduce uncertainties in both paleoclimate simulations and future climate projections.

Received: 16 Jan 2026 – Discussion started: 02 Feb 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1:
'Comment on egusphere-2026-254', Chris Brierley, 14 Mar 2026

Review by Chris Brierley
This manuscript by Barathieu et al. moves paleoclimate data-model comparisons in a novel direction by providing the first example of assessing ocean density changes. It builds on the foram-based compilation by Caley et al. for the Last Glacial Maximum, combined with the PMIP coupled model simulations.
I believe that this manuscript is a good fit for Climate of the Past and should be published after revision. While I outline some specific comments below, they all fall under a single overarching comment: I found that this article contained so much detail (especially about the individual comparison results) that it obscured your message at times. I feel that you will gain more impact and traction from the manuscript if you are able to be more concise.
Specific comments (not in order of importance).
Sect 2.1.1 You describe the methodology of the Caley et al (2025) data compilation. However you do not describe any of the broad findings from this dataset. Neither do you provide any information about the uncertainty in the reconstructions, such as an approximate calibration error.
L210. I do not see why you would want to interpolate all these results onto a common grid resolution prior to computing the density. Surely it would be more accurate, and more computationally efficient, to perform all the analysis on each model’s native grid.
L214. Is there a reason that you chose not to use any density fields that had been stored on the ESGF?
Table 1. You classify HadCM3 and iLOVECLIM as CMIP6 models. HadCM3 was originally built as part of CMIP3. I accept that it has performed the PMIP4 protocol and is part of PMIP4, but surely not “HadCM3-PMIP3”. Please be more accurate in your model descriptions, as this has implications for your conclusions about the improvement of models between generations (such as in Fig 1).
L239-241. Please explain what these numbers mean, and why I need to know them.
L260. Please number the first figure you introduce in your manuscript as Fig 1.
L261. Please describe what the pseudo-proxy approach means here. I have heard the term as a way of combining changes in SST and SSS to give changes in (coral) d18O. But this clearly is not what you mean.
Fig. 1. Is this not global? So why is the x-axis labelled ‘in the basin’?
Fig. 1. Consider whether it is appropriate to subdivide between the two model generations.
(e.g.) L303. Why do you write *absolute* density anomalies. Do relative density anomalies have any meaning (I guess they will all be around 0.1%)? So what is the ‘absolute’ clarifying?
L308. Please put the values in Table 1.
Fig. 2. This is very hard to read, and it is especially hard to see the difference between the model and proxy data. Consider things like different projections to minimise the amount of white space. Maybe consider a separate ensemble-mean map, which could be larger? Provide the units of the anomaly.
Fig. 3. I find this very hard to interpret. Why not remove the individual data points and plot as zonal means?
L348. Presumably you have been undertaking this decomposition at individual grid points, and comparing to the uncertainty in each proxy reconstruction. However, I can’t see how this results in Fig. 4. The bars look like they are guaranteed to add to 100%, but there is some non-linearity in the equation of state. How have you accounted for this?
Fig. 4: Can you provide an estimate of the uncertainty in your decomposition?
L380. Remove repeated word: reconstruct
Fig. 5. Shouldn’t these regression lines go through the origin, by definition of being anomalies? What is the implication of the offsets?
Fig. 5. Previously you stated the global reconstruction mean of the density anomaly was 1.5 (L310), but here it is shown as 0.8. Please clarify!
L417. I believe that the proxy reconstruction IQRs are the range of the values averaged across the globe (or subsequently a region). But surely there is a calibration error attached to the reconstruction. How are you treating this uncertainty, and how does it alter the IQR?
Fig. 8. I do not understand what this figure is assessing, and how to interpret that from the analysis. Is it trying to measure the ability of the models to capture the spatial pattern of surface density at both the PI and LGM? If so, why don’t you present some maps. Or even better move to using a Taylor Diagram, to summarise the observational comparison?
L531. You perform a basin-by-basin analysis. But your previous section concluded the tropics and extratropics behaved quite differently. However, your chosen analysis folds these two regions into the same basin. Please justify your choice.
L542. Please comment on how this finding compares with your earlier findings.
L546. Please provide more detail. The model’s IQR are very small in the figures.
Fig. 10. I cannot see the IQR in the Southern Ocean panel, but the reason for this is never explained.
L566-7. I cannot tell from the figure which are the ‘several regions’ you are referring to.
Section 4. Personally, I would remove this section (as it replicates Fig. 8 which I didn’t understand).
L708-710. This sentence reads awkwardly. I suggest removing it.
Fig. 12. Can you please also add something to give readers an idea of how low-frequency variability in the IOD might compare to these changes in gradient.
L784. This paper also highlights that HadCM3 was the most realistic, if I recall correctly.
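
The point raised for L348 and Fig. 4 (whether the SST and SSS contributions can sum to exactly 100 %) can be made concrete with a one-at-a-time decomposition. This is a generic sketch and not necessarily the procedure used in the manuscript; the symbols $T_{\mathrm{LGM}}$, $S_{\mathrm{LGM}}$, $T_{\mathrm{PI}}$, $S_{\mathrm{PI}}$ and the residual $\varepsilon$ are introduced here for illustration only.

```latex
\begin{align}
\Delta\rho   &= \rho(T_{\mathrm{LGM}}, S_{\mathrm{LGM}}) - \rho(T_{\mathrm{PI}}, S_{\mathrm{PI}}) \\
\Delta\rho_T &= \rho(T_{\mathrm{LGM}}, S_{\mathrm{PI}}) - \rho(T_{\mathrm{PI}}, S_{\mathrm{PI}}) \\
\Delta\rho_S &= \rho(T_{\mathrm{PI}}, S_{\mathrm{LGM}}) - \rho(T_{\mathrm{PI}}, S_{\mathrm{PI}}) \\
\varepsilon  &= \Delta\rho - \Delta\rho_T - \Delta\rho_S \\
             &= \rho(T_{\mathrm{LGM}}, S_{\mathrm{LGM}}) - \rho(T_{\mathrm{LGM}}, S_{\mathrm{PI}})
              - \rho(T_{\mathrm{PI}}, S_{\mathrm{LGM}}) + \rho(T_{\mathrm{PI}}, S_{\mathrm{PI}})
\end{align}
```

The residual $\varepsilon$ vanishes whenever $\rho$ is a sum of a function of $T$ alone and a function of $S$ alone, for example the linear form $\rho = \rho_0\,[1 - \alpha (T - T_0) + \beta (S - S_0)]$ with constant $\alpha$ and $\beta$. With the full non-linear equation of state it is generally non-zero, so the two contributions add to 100 % only if the residual is dropped or reassigned to one of them.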

Citation: https://doi.org/10.5194/egusphere-2026-254-RC1
- AC1: 'Reply on RC1', Héloïse Barathieu, 27 Apr 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-254/egusphere-2026-254-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2026-254-AC1
RC2:
'Comment on egusphere-2026-254', Anonymous Referee #2, 17 Mar 2026

The manuscript “Using ocean surface paleo-density to evaluate PMIP3 and PMIP4 Last Glacial Maximum climate simulations” provides a novel data-model comparison for the LGM using a new surface density reconstruction. Surface density integrates both sea surface temperature and sea surface salinity, which allows for a more robust evaluation of model performance that indirectly includes the model representation of changes in the hydrological cycle (via changes in coastal salinity). The authors include an analysis of SST versus data for additional context, where the MARGO database and the Tierney et al. (2020) data set were used as benchmarks. Model performance for SSS was not directly benchmarked. While I feel the analysis provides valuable insight into the PMIP simulations, there is also a reliance on R² correlation and regression slopes, which are likely imperfect metrics. These may still be acceptable choices, as finding the appropriate statistical methods to compare data and model under paleo settings is itself a non-trivial task. As it is used herein, correlation is a rough estimate of whether the spatial patterns of the data (LGM minus LH) match those of the model simulations. The regression slopes between data anomalies and model anomalies loosely test whether the magnitude of the model anomalies matches the data, within the context of the spatial patterns, where the authors used a 0.8 to 1.2 (+/- 20%) bound. As an example, one could simply add +5 °C to the data globally, and the correlation would have an R² of 1 and a slope of 1. The authors may choose to address this near line 478. I found the distribution-based KS test and IQR-overlap evaluations to be more satisfying, so I’m unsure why the authors didn’t apply those methods to the raw values (LGM: data vs model) in addition to the anomalies (LGM-LH model vs data). More rigorous metrics, such as RMSE, could have been used to evaluate model skill, but these metrics may be unkind to the models in raw terms.
Overall the text is well written and highlights valuable findings using a novel approach, but I am unsure about the overreliance on regression and correlation as the primary metrics of model skill without additional justification or consideration of other metrics, such as RMSE, Nash–Sutcliffe efficiency (NSE), or Kling–Gupta efficiency (KGE).
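
The offset example above can be checked numerically. The following is a minimal sketch using only the Python standard library; the values in `data` are hypothetical anomalies chosen for illustration and are not taken from the manuscript, and `linear_fit_stats` is a helper name introduced here.

```python
import math

def linear_fit_stats(obs, mod):
    """Return (slope, r_squared, rmse) for mod regressed on obs (ordinary least squares)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_m = sum(mod) / n
    sxx = sum((o - mean_o) ** 2 for o in obs)
    syy = sum((m - mean_m) ** 2 for m in mod)
    sxy = sum((o - mean_o) * (m - mean_m) for o, m in zip(obs, mod))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    rmse = math.sqrt(sum((m - o) ** 2 for o, m in zip(obs, mod)) / n)
    return slope, r_squared, rmse

# Hypothetical anomalies (arbitrary units), not values from the manuscript.
data = [-3.1, -2.4, -1.8, -2.9, -0.7, -1.2]
shifted = [x + 5.0 for x in data]  # identical spatial pattern, uniform +5 offset

slope, r_squared, rmse = linear_fit_stats(data, shifted)
print(f"slope={slope:.3f}  R2={r_squared:.3f}  RMSE={rmse:.3f}")
```

Both the slope and R² stay at 1 because they depend only on deviations from the mean, whereas the RMSE equals the size of the offset. This is why a bias-sensitive metric complements the regression-based criteria.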
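
For reference, the three metrics named above can be written in a few lines. This is a sketch of the standard definitions (NSE as one minus the ratio of squared error to observed variance, KGE in its original three-component form), not an implementation taken from the manuscript. The sample values are hypothetical. Note that the bias ratio in KGE divides by the observed mean, so it may behave poorly for anomalies whose mean is close to zero.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def rmse(obs, mod):
    """Root-mean-square error between model and observed values."""
    return math.sqrt(sum((m - o) ** 2 for o, m in zip(obs, mod)) / len(obs))

def nse(obs, mod):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean as predictor."""
    mean_o = _mean(obs)
    num = sum((m - o) ** 2 for o, m in zip(obs, mod))
    den = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - num / den

def kge(obs, mod):
    """Kling-Gupta efficiency from correlation r, variability ratio and bias ratio."""
    mean_o, mean_m = _mean(obs), _mean(mod)
    sd_o = math.sqrt(_mean([(o - mean_o) ** 2 for o in obs]))
    sd_m = math.sqrt(_mean([(m - mean_m) ** 2 for m in mod]))
    cov = _mean([(o - mean_o) * (m - mean_m) for o, m in zip(obs, mod)])
    r = cov / (sd_o * sd_m)
    alpha = sd_m / sd_o      # variability ratio
    beta = mean_m / mean_o   # bias ratio (undefined if the observed mean is zero)
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)

# Hypothetical values: a model with the right pattern but a uniform +1 bias.
obs = [1.0, 2.0, 3.0, 4.0]
biased = [o + 1.0 for o in obs]
print(rmse(obs, biased), nse(obs, biased), kge(obs, biased))
```

Unlike R² and the regression slope, all three scores drop below their ideal values for the biased series even though its pattern is identical to the observations.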

Minor comments:
Line ~236: note the global KS threshold for the sample size.
Figure 3, MPI-ESM-P label isn’t in the plot.
Figure 7, model median is in the legend, but I don’t think it is shown in the plot.
Line 445: Does figure 7 show all three criteria?
Figure 12: it is not clear to me why the gray uncertainty bands don’t align to the data marker (star + 1-sigma).
Figure E1: In the caption, note LH = Late Holocene
~Line 730: Why do the GCM SST anomalies vary so much between Tierney (Fig 12b) and MARGO (Fig 12c)? Is it from the location of the sites or the differing counts?
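
The KS threshold requested in the first minor comment depends only on the two sample sizes and the chosen significance level. The sketch below computes the two-sample KS statistic and the usual large-sample critical value, c(α)·sqrt((n+m)/(n·m)) with c(α) = sqrt(−ln(α/2)/2). The function names and sample values are illustrative assumptions, and the asymptotic threshold is only approximate for small samples.

```python
import math

def ks_two_sample_statistic(a, b):
    """Maximum distance between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a_sorted[i], b_sorted[j])
        # Advance both samples past x so that ties are handled together.
        while i < n and a_sorted[i] <= x:
            i += 1
        while j < m and b_sorted[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic two-sample KS threshold; reject equality if the statistic exceeds it."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))

# Hypothetical samples standing in for reconstructed and simulated values.
reconstructed = [1.0, 2.0, 3.0, 4.0]
simulated = [3.0, 4.0, 5.0, 6.0]
d_stat = ks_two_sample_statistic(reconstructed, simulated)
d_crit = ks_critical_value(len(reconstructed), len(simulated))
print(d_stat, round(d_crit, 3), d_stat > d_crit)
```

Because the threshold shrinks as the sample sizes grow, stating it alongside the number of sites makes the significance of each comparison easier to judge.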

Citation: https://doi.org/10.5194/egusphere-2026-254-RC2
- AC2: 'Reply on RC2', Héloïse Barathieu, 27 Apr 2026
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-254/egusphere-2026-254-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2026-254-AC2
EC1: 'Editor Comment on egusphere-2026-254', Christo Buizert, 18 Mar 2026

Dear authors,
Your manuscript has now been seen by two reviewers. As you can see from their comments, they are both generally supportive of publication. However they have some concerns that will need to be addressed before the work can be accepted.
At this stage, please write a clear and detailed response to their comments. I will likely be inviting you to submit a revised version of your manuscript in the future, so please feel free to write your responses in terms of proposed changes to the manuscript. It would be most helpful if you can be detailed and specific in your responses. For example, instead of writing something generic like "we will improve the discussion in the revised manuscript", please provide the proposed text you would actually add to the discussion.
Good luck with your responses, and feel free to reach out if you have questions.

All the best, Christo Buizert (CP editor)

Citation: https://doi.org/10.5194/egusphere-2026-254-EC1

Supplement

https://doi.org/10.5194/egusphere-2026-254-supplement

Latest update: 19 May 2026

Short summary

This study evaluates climate model simulations of the Last Glacial Maximum using ocean surface density reconstructions based on the δ¹⁸O of foraminiferal shells. Models capture global patterns, but regional climate changes are less well simulated, especially in the North Indian Ocean. Tropical differences between reconstructions and model simulations are mainly driven by changes in ocean salinity linked to precipitation.

