Evaluating the performance of CMIP6 models in simulating Southern Ocean biogeochemistry

Cheng, Ming; Maher, Nicola; Ellwood, Michael J.

doi:https://doi.org/10.5194/egusphere-2025-2633

Preprints

25 Jul 2025

Abstract. The Southern Ocean plays a vital role in global biogeochemical cycles, yet the quality of its representation in Earth System Models (ESMs) remains unquantified. This study evaluates the performance of 14 Coupled Model Intercomparison Project Phase 6 (CMIP6) models in simulating key biogeochemical variables south of 30° S, including austral-summer surface chlorophyll, deep chlorophyll maxima (DCMs), nitrate, silicate, dissolved iron, and yearly particulate organic carbon (POC). Model output for the period 2000–2014 is compared to multiple observational datasets, such as Copernicus for chlorophyll and POC, the World Ocean Atlas (WOA) for nitrate and silicate, and GEOTRACES for dissolved iron. Model performance is assessed using statistical metrics including mean bias error (MBE), standardised standard deviation (SSD), root mean squared deviation (RMSD), and correlation coefficient (CC). The results reveal substantial inter-model variability, with individual models exhibiting strengths in simulating different variables. GFDL-ESM4 best reproduces chlorophyll and DCM patterns, IPSL-CM6A-LR performs well for nutrients, MIROC-ES2L for dissolved iron, and CMCC-ESM2 for POC. Based on composite rankings, the top-performing models are IPSL-CM6A-LR, GFDL-ESM4, CMCC-ESM2, UKESM1-0-LL, and CNRM-ESM2-1. This work underscores the importance of multi-model evaluation for identifying model strengths and guiding future improvements in biogeochemical (BGC) model development, particularly in the context of understanding and projecting Southern Ocean biogeochemistry under climate change.

Received: 04 Jun 2025 – Discussion started: 25 Jul 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 5944 KB)

Supplement (6046 KB)

Status: final response (author comments only)

RC1:
'Comment on egusphere-2025-2633', Anonymous Referee #1, 02 Sep 2025

Review of "Evaluating the performance of CMIP6 models in simulating Southern Ocean biogeochemistry" by Ming Cheng et al.

Scope of the manuscript, general comments and recommendation
------------------------------------------------------------
The manuscript by Cheng et al. evaluates the performance of the biogeochemical part of CMIP6 models in reproducing Southern Ocean biogeochemical observations. As the Southern Ocean is one of the regions where biogeochemical models diverge most strongly, this is an important subject for a study, especially since biogeochemical models have become quite a bit more complex on average in the transition from CMIP5 to CMIP6 (Seferian et al, .

The evaluation in the manuscript is performed using the typical tools used in that type of study, namely looking at biases, correlation etc. between model output and climatologies of observations, in the end combining the different metrics into an overall ranking of the models. The study is, however, untypical in that it attempts to judge the models not only against the 'classical' observations for which good climatologies are available, namely the macronutrients and chlorophyll, but also against observations of the micronutrient iron, estimated depths and chlorophyll levels of deep chlorophyll maxima, where those are present, and finally the concentration of POC and even separately the biomasses of zooplankton, detritus and bacteria. Other 'standard' observations, like satellite-based net primary production, dissolved inorganic carbon and total alkalinity, are not taken into account.

While I think that the attempt to include new variables in the assessment of biogeochemical models is progress, the manuscript does not take into account the uncertain state of our knowledge of many of the variables that the authors use. In my view the manuscript is too uncritical of the observational database against which the models are compared, and consequently too confident in the ability to judge model outcomes.

Here are my main criticisms concerning this point:
- Firstly, for their iron validation, the authors use the compilation of observed bottle data from Tagliabue et al. (2012). Contrary to the attribution of this dataset to GEOTRACES made in the manuscript, which is simply wrong, these data are mostly a compilation of high-quality pre-GEOTRACES data. Since the publication of this data set, a large amount of additional data has become available through the GEOTRACES intermediate data products, especially for the Southern Ocean. Why has this data not been taken into account?
- For the evaluation of the depth of the deep chlorophyll maximum and the chlorophyll concentration at the maximum, the authors have chosen the product from Copernicus, which is based on the work of Sauzede et al. (2016). The authors mention that this dataset estimates POC and chlorophyll using a neural network method, but do not give any further details. Here is therefore my summary of the method: The data set estimates the vertical distribution of particle backscatter (which can be used as a measure of POC) from the large database of Argo vertical profiles of temperature and salinity, and co-located surface satellite estimates of particle backscatter and chlorophyll a from MODIS. Actually, contrary to the statement made in the manuscript, the method presented in Sauzede et al. (2016) only describes the estimation of POC profiles, NOT of chlorophyll. For the chlorophyll estimation one should probably cite the data manual (https://documentation.marine.copernicus.eu/QUID/CMEMS-MOB-QUID-015-010.pdf). While this data set is unique in that it allows, for the first time, a look at the vertical distribution of biological activity in the ocean, it is not 'observations' (which is how it is repeatedly referred to in the manuscript), but a fairly indirect estimate. The limits of this data set and its possible errors are not discussed at all in this manuscript, nor are the error estimates, which are present in the data themselves, taken into account in the model assessment. Instead, the data set is uncritically taken as 'truth'.
- Why is the same data set also taken for the evaluation of surface chlorophyll and POC? As the processing of the data in the Copernicus product involves chlorophyll and backscatter estimates from MODIS, it would remove one possible source of error to use the satellite data directly here. At this point it should also be discussed that the standard algorithm used in satellite estimates of chlorophyll has been questioned in the Southern Ocean by Johnson et al. 2009 (which is cited in the manuscript); the algorithm proposed in Johnson et al. 2009 gives on average higher values of chlorophyll in the Southern Ocean than the standard algorithm used at that time for SeaWiFS. I think this also holds for the GlobColour product used in the Copernicus data set, but I have to admit that this is getting beyond my expertise. But I think it illustrates yet another source of uncertainty in the 'observations' that should be discussed.
- Just out of curiosity: Many model assessments also use satellite-based estimates of net primary production. Is there a specific reason why this was not done here?
- And finally, the authors use ONE number for how POC is distributed over phytoplankton, zooplankton, dead organic matter and bacteria, estimated for the Southern Ocean, to convert the Copernicus estimate of POC into estimates of phytoplankton, zooplankton, detritus, and bacterial carbon biomass. In their Tables 6 and 7 they then judge whether models 'underestimate zooplankton' etc. But when you actually read the paper by Yang et al. 2022, one immediately realizes the limits of that comparison. Firstly, the paper does not describe microzooplankton, but only the biomass of zooplankton that can be caught in plankton nets. Secondly, the biomasses of the three zooplankton groups studied in that paper (mesozooplankton, krill and salps) have a large regional variability, as for example shown in their figure 2. While the Yang paper indeed demonstrates that there is an inverted trophic pyramid in the Southern Ocean, the actual biomass numbers probably have a large uncertainty from sampling bias. Taking the one biomass number obtained there for the whole Southern Ocean and using it to convert a totally different POC estimate into zooplankton biomass leads to further errors. To add to that, the authors do not describe how they have combined the estimates from the three different papers cited into one. In my view it makes sense to investigate whether models obtain a similar inverted trophic pyramid as described in Yang et al., but not to write sentences like 'Most models describe integrated phytoplankton carbon reasonably well with values comparable to observations' when the observations are just indirect estimates of POC from Copernicus, multiplied by one Southern Ocean estimate of the phytoplankton carbon:POC ratio, without taking possible errors into account. The whole section from line 412 to line 445 should in my view be scrapped.

Given these criticisms I don't think the paper can be published without quite major revisions. To make it publishable, I think the following needs to be done:
- Extend the data set used for the comparison of modeled iron with the data from the latest GEOTRACES intermediate data product and repeat the comparison.
- Redo the comparison of deep chlorophyll maximum frequency and chlorophyll levels, taking into account the uncertainty of the Copernicus data set.
- Use (at least in addition to the Copernicus data set) the direct satellite-based estimates of chlorophyll and POC from MODIS for the surface comparison; possibly also discuss the issue of the chlorophyll algorithm uncertainty raised by Johnson et al., 2009.
- Either remove the comparison with the different components of POC completely or do it properly by accounting for the error margins.

I think all these changes would probably be incompatible with the strong focus of the paper on 'ranking' of the different models, i.e. saying which one is 'the best', which comes second etc. Given the uncertainty of the data sets used, which is completely neglected in the present manuscript, I don't really think this can be done with any confidence.

As this will require more or less a complete rewrite of the manuscript I limit my further specific comments to the most important ones.

Specific comments
-----------------
Line 135-136: '.. we use yearly data instead, as carbon export predominantly occurs during summer months': I don't understand the reasoning here. If carbon export predominantly occurs in summer, does using annual POC values not make the connection to export less reliable?

Formula 4: The formula for the root-mean-square difference is given here correctly; but in the Taylor diagram one should use the RMSD after correction for the mean model-data bias, otherwise the connection between CC, SSD and RMSD that is used to construct the diagram does not hold (Taylor 2001). Was this done here?

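The relationship behind this comment can be illustrated with a minimal Python sketch. It assumes NumPy and uses synthetic arrays; the names `taylor_stats`, `model` and `obs` are hypothetical and nothing here is taken from the manuscript. The sketch checks numerically that the centred RMSD (computed after removing each field's mean) satisfies the law-of-cosines relation of Taylor (2001), while the total RMSD additionally contains the mean bias.

```python
import numpy as np

def taylor_stats(model, obs):
    """Bias, total RMSD, centred RMSD, standard deviations and correlation."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    bias = model.mean() - obs.mean()
    rmsd_total = np.sqrt(np.mean((model - obs) ** 2))
    # Centred RMSD: subtract each field's own mean before differencing.
    anomalies = (model - model.mean()) - (obs - obs.mean())
    rmsd_centred = np.sqrt(np.mean(anomalies ** 2))
    return {
        "bias": bias,
        "rmsd_total": rmsd_total,
        "rmsd_centred": rmsd_centred,
        "sd_model": model.std(),  # population form (ddof=0)
        "sd_obs": obs.std(),
        "cc": np.corrcoef(model, obs)[0, 1],
    }

# Synthetic example (assumption): a "model" with a deliberate mean offset of about +1.
rng = np.random.default_rng(0)
obs = rng.normal(10.0, 2.0, size=500)
model = 0.8 * obs + 3.0 + rng.normal(0.0, 1.0, size=500)

s = taylor_stats(model, obs)
cosine_law = (s["sd_model"] ** 2 + s["sd_obs"] ** 2
              - 2 * s["sd_model"] * s["sd_obs"] * s["cc"])
# The centred RMSD satisfies the Taylor-diagram relation ...
print(np.isclose(s["rmsd_centred"] ** 2, cosine_law))
# ... whereas the total RMSD also carries the squared mean bias.
print(np.isclose(s["rmsd_total"] ** 2, s["bias"] ** 2 + cosine_law))
```

If the total RMSD were plotted instead, a model with a large mean bias would sit at a position inconsistent with its CC and SSD on the diagram.
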
Line 153: 'the number of grid points..' Does that depend on the grid resolution? Is that a problem?

Table S1: Were the calculations of CC and other statistical quantities for chlorophyll done using log-transformed data, as is done most of the time?

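As an illustration of what a log-transformed statistic looks like, here is a minimal Python sketch, assuming NumPy. The function name `log10_correlation`, the `floor` parameter and the synthetic lognormal fields are assumptions for illustration only and do not reproduce the manuscript's calculation.

```python
import numpy as np

def log10_correlation(model_chl, obs_chl, floor=1e-3):
    """Pearson correlation of log10 chlorophyll.

    `floor` (an assumed value, in the same units as the data) prevents
    taking the logarithm of zero or negative concentrations.
    """
    m = np.log10(np.maximum(np.asarray(model_chl, dtype=float), floor))
    o = np.log10(np.maximum(np.asarray(obs_chl, dtype=float), floor))
    return np.corrcoef(m, o)[0, 1]

# Synthetic, roughly lognormal "chlorophyll" fields (assumption, not real data).
rng = np.random.default_rng(1)
obs_chl = rng.lognormal(mean=-1.0, sigma=1.0, size=1000)
model_chl = obs_chl ** 0.9 * rng.lognormal(mean=0.0, sigma=0.3, size=1000)

cc_linear = np.corrcoef(model_chl, obs_chl)[0, 1]  # dominated by the few high values
cc_log = log10_correlation(model_chl, obs_chl)     # weights all magnitudes more evenly
print(cc_linear, cc_log)
```

Because chlorophyll spans orders of magnitude, the two coefficients can differ noticeably, which is why it matters which one a table reports.
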
Comparison of surface nitrate and silicate: Given that the Southern Ocean is an upwelling region, would it make sense to also check the concentration of these nutrients in Circumpolar Deep Water against data when trying to explain the model-data difference at the surface?

When comparing dissolved iron with the Tagliabue et al. 2012 data set, mean bias estimates are given. Does a mean make sense in such a sparse data set? Should one perhaps at least also have a look at the median?

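A small Python sketch, assuming NumPy, shows how mean and median bias can diverge in a sparse sample. The five values are invented for illustration (hypothetical dissolved iron concentrations in arbitrary consistent units), and `bias_summary` is a hypothetical helper, not a function from the manuscript.

```python
import numpy as np

def bias_summary(model_values, obs_values):
    """Mean and median of the pointwise model-minus-observation difference."""
    diff = np.asarray(model_values, dtype=float) - np.asarray(obs_values, dtype=float)
    return {
        "mean_bias": float(np.mean(diff)),
        "median_bias": float(np.median(diff)),
        "n": int(diff.size),
    }

# Invented sparse sample: four low observations and one high outlier,
# compared against a model that is nearly uniform.
obs_dfe = np.array([0.10, 0.15, 0.20, 0.25, 2.50])
model_dfe = np.array([0.30, 0.30, 0.30, 0.30, 0.30])

summary = bias_summary(model_dfe, obs_dfe)
print(summary)
```

In this toy case the single high observation drives the mean bias negative, while the median bias is positive, so the two statistics would lead to opposite statements about whether the model over- or underestimates.
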
In the iron comparison, the 'limited availability of observational data' is repeatedly referred to, which is correct. But the data is not that limited, given the GEOTRACES data that is ignored here.

Model ranking: it is unclear to me how the different statistical quantities used to judge model-'data' agreement are converted into one ranking. Is the lowest RMSD the criterion, or the highest CC?

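One way such a conversion could work is sketched below in Python, assuming NumPy. This is purely illustrative: the manuscript's actual ranking procedure is not described in this review, and the function names, the choice of penalties, the equal weighting of metrics and the three-model toy values are all assumptions.

```python
import numpy as np

def rank_ascending(scores):
    """Rank 1 = smallest score; ties are broken by input order."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="stable")
    ranks = np.empty(scores.size, dtype=int)
    ranks[order] = np.arange(1, scores.size + 1)
    return ranks

def composite_rank(mbe, ssd, rmsd, cc):
    """Average per-metric ranks, each metric expressed so that smaller is better."""
    penalties = [
        np.abs(np.asarray(mbe, dtype=float)),        # bias closest to 0
        np.abs(np.asarray(ssd, dtype=float) - 1.0),  # SSD closest to 1
        np.asarray(rmsd, dtype=float),               # lowest RMSD
        -np.asarray(cc, dtype=float),                # highest correlation
    ]
    return np.mean([rank_ascending(p) for p in penalties], axis=0)

# Toy values for three hypothetical models.
models = ["A", "B", "C"]
mean_rank = composite_rank(
    mbe=[0.1, -0.5, 0.05],
    ssd=[1.1, 0.6, 1.0],
    rmsd=[0.4, 0.9, 0.5],
    cc=[0.8, 0.3, 0.9],
)
best = models[int(np.argmin(mean_rank))]
print(dict(zip(models, mean_rank.tolist())), best)
```

Note that in this toy case model A has the lowest RMSD but model C obtains the best average rank, which shows why the aggregation rule needs to be stated explicitly.
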
Line 383: "DCMs are primarily driven by photoacclimation". No, not all of them, see Cornec et al. 2021. The whole discussion of DCMs and the factors driving them is a bit superficial.

References
----------
Cornec, M., Claustre, H., Mignot, A., Guidi, L., Lacour, L., Poteau, A., et al. (2021). Deep chlorophyll maxima in the global ocean: Occurrences, drivers and characteristics. Global Biogeochemical Cycles, 35, e2020GB006759. https://doi.org/10.1029/2020GB006759

Citation: https://doi.org/10.5194/egusphere-2025-2633-RC1
- AC1: 'Reply on RC1', Ming Cheng, 25 Sep 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-2633/egusphere-2025-2633-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-2633-AC1
RC2:
'Comment on egusphere-2025-2633', Anonymous Referee #2, 12 Sep 2025

The manuscript “Evaluating the performance of CMIP6 models in simulating Southern Ocean biogeochemistry” analyzes coupled carbon-climate Earth system model fidelity for surface chlorophyll, nitrate, silicate, and iron, the deep chlorophyll maximum, and particulate organic carbon across subregions of the Southern Ocean in order to rank the models. This is a highly valuable analysis given the historical challenges in both observation collection and model fidelity, and the importance of the Southern Ocean for heat and carbon uptake. The biggest weakness of the current manuscript is the assumption that inter-model differences and biases should be attributed to biogeochemical formulation rather than the underlying physics, including representation of temperature, mixed layer depth, upwelling, and upper ocean stratification, transport, and turbulence, all of which are long-standing challenges in the climate community. While a detailed discussion of potential physical biases and their implications is outside the scope of the present manuscript, the possibility of physical attribution should be mentioned. Otherwise I have only minor comments.
Specific comments by line number:
9 - This assertion is highly overstated - see lines 67-82 which contradict this as well as such literature as:

Frölicher, T.L., Sarmiento, J.L., Paynter, D.J., Dunne, J.P., Krasting, J.P. and Winton, M., 2015. Dominance of the Southern Ocean in anthropogenic carbon and heat uptake in CMIP5 models. Journal of Climate, 28(2), pp.862-886.

Mongwe, N.P., Vichi, M. and Monteiro, P.M., 2018. The seasonal cycle of pCO2 and CO2 fluxes in the Southern Ocean: diagnosing anomalies in CMIP5 Earth system models. Biogeosciences, 15(9), pp.2851-2872.

Rickard, G.J., Behrens, E., Chiswell, S., Law, C.S. and Pinkerton, M.H., 2023. Biogeochemical and physical assessment of CMIP5 and CMIP6 ocean components for the southwest Pacific Ocean. Journal of Geophysical Research: Biogeosciences, 128(5), p.e2022JG007123.

Nevison, C.D., Manizza, M., Keeling, R.F., Stephens, B.B., Bent, J.D., Dunne, J., Ilyina, T., Long, M., Resplandy, L., Tjiputra, J. and Yukimoto, S., 2016. Evaluating CMIP5 ocean biogeochemistry and Southern Ocean carbon uptake using atmospheric potential oxygen: Present‐day performance and future projection. Geophysical Research Letters, 43(5), pp.2077-2085.
48 - which? High iron requirement?
60 - By "integration of", do the authors mean "assessment with"? It is not clear what "data" is integrated into these models to represent the Southern Ocean except for topography and radiative forcing.
99 - Why define the acronym when it is not used again until the acknowledgments and also defined there?
Eq 1-4 - These are all pretty common statistical definitions which could be removed for space.
167 – “MPI-ESM models” should be “MPI-ESMs”
225 - Should be "Fig. 1" to point to chlorophyll.
510-558 - The attribution here to biological complexity seems to assume that the Southern Ocean physics that drives the biogeochemistry is perfect in these models. This is not the case and is the subject of many papers. Much of the focus has been on wind and sea ice biases and upper ocean stratification (e.g. Beadling et al., 2020), temperature (Luo et al., 2023) and polynyas (Mohrmann et al., 2021):
Beadling, R. L., Russell, J. L., Stouffer, R. J., Mazloff, M., Talley, L. D., Goodman, P. J., ... & Pandde, A. (2020). Representation of Southern Ocean properties across coupled model intercomparison project generations: CMIP3 to CMIP6. Journal of Climate, 33(15), 6555-6581.
Luo, F., Ying, J., Liu, T., & Chen, D. (2023). Origins of Southern Ocean warm sea surface temperature bias in CMIP6 models. npj Climate and Atmospheric Science, 6(1), 127.
Mohrmann, M., Heuzé, C., & Swart, S. (2021). Southern Ocean polynyas in CMIP6 models. The Cryosphere, 15(9), 4281-4313.

Citation: https://doi.org/10.5194/egusphere-2025-2633-RC2
- AC2: 'Reply on RC2', Ming Cheng, 25 Sep 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-2633/egusphere-2025-2633-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-2633-AC2

Supplement

https://doi.org/10.5194/egusphere-2025-2633-supplement

Viewed

Total article views: 924 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
715	182	27	924	36	20	18

Views and downloads (calculated since 25 Jul 2025)

Month	HTML	PDF	XML	Total
Jul 2025	74	45	3	122
Aug 2025	176	50	8	234
Sep 2025	394	51	13	458
Oct 2025	62	30	3	95
Nov 2025	9	6	0	15

Viewed (geographical distribution)

Total article views: 898 (including HTML, PDF, and XML) Thereof 898 with geography defined and 0 with unknown origin.

Latest update: 05 Nov 2025

Short summary

The Southern Ocean helps regulate Earth’s climate by cycling nutrients and carbon. We studied how well 14 modern climate models represent key ocean properties, such as plant growth, nutrients, and carbon particles. By comparing model results with real-world observations, we found large differences in model performance. Some models captured certain features better than others. Our findings can guide future improvements in ocean and climate predictions.

