the Creative Commons Attribution 4.0 License.
Evaluating the performance of CMIP6 models in simulating Southern Ocean biogeochemistry
Abstract. The Southern Ocean plays a vital role in global biogeochemical cycles, yet the quality of its representation in Earth System Models (ESMs) remains unquantified. This study evaluates the performance of 14 Coupled Model Intercomparison Project Phase 6 (CMIP6) models in simulating key biogeochemical variables south of 30° S, including austral-summer surface chlorophyll, deep chlorophyll maxima (DCMs), nitrate, silicate, dissolved iron, and yearly particulate organic carbon (POC). Model output for the period 2000–2014 is compared to multiple observational datasets, such as Copernicus for chlorophyll and POC, the World Ocean Atlas (WOA) for nitrate and silicate, and GEOTRACES for dissolved iron. Model performance is assessed using statistical metrics including mean bias error (MBE), standardised standard deviation (SSD), root mean squared deviation (RMSD), and correlation coefficient (CC). The results reveal substantial inter-model variability, with individual models exhibiting strengths in simulating different variables. GFDL-ESM4 best reproduces chlorophyll and DCM patterns, IPSL-CM6A-LR performs well for nutrients, MIROC-ES2L for dissolved iron, and CMCC-ESM2 for POC. Based on composite rankings, the top-performing models are IPSL-CM6A-LR, GFDL-ESM4, CMCC-ESM2, UKESM1-0-LL, and CNRM-ESM2-1. This work underscores the importance of multi-model evaluation for identifying model strengths and guiding future improvements in biogeochemical (BGC) model development, particularly in the context of understanding and projecting Southern Ocean biogeochemistry under climate change.
Status: open (until 05 Sep 2025)
RC1: 'Comment on egusphere-2025-2633', Anonymous Referee #1, 02 Sep 2025
Review of "Evaluating the performance of CMIP6 models in simulating
Southern Ocean biogeochemistry" by Ming Cheng et al.

Scope of the manuscript, general comments and recommendation
------------------------------------------------------------

The manuscript by Cheng et al. evaluates the performance of the
biogeochemical part of CMIP6 models in reproducing Southern Ocean
biogeochemical observations. As the Southern Ocean is one of the
regions where biogeochemical models diverge most strongly, this is an
important subject for a study, especially since biogeochemical models
have become quite a bit more complex on average in the transition from
CMIP5 to CMIP6 (Seferian et al, .

The evaluation in the manuscript is performed using the typical tools
used in that type of study, namely looking at biases, correlation
etc. between model output and climatologies of observations, in the
end combining the different metrics into an overall ranking of the
models. The study is, however, untypical, in that it attempts to judge
the models not only against the 'classical' observations, for which
good climatologies are available, namely the macronutrients and
chlorophyll, but also against observations of the micronutrient iron,
estimated depths and chlorophyll levels of deep chlorophyll maxima,
where those are present, and finally the concentration of POC and even
separately the biomasses of zooplankton, detritus and bacteria. Other
'standard' observations, like satellite-based net primary production,
dissolved inorganic carbon and total alkalinity are not taken into
account.

While I think that the attempt to include new variables into the
assessment of biogeochemical models is progress, the manuscript does
not take into account the uncertain state of our knowledge in many of
the variables that the authors use. In my view the manuscript is too
uncritical of the observational database that it uses to compare the
models against, and consequently too confident in the ability to judge
model outcomes.

Here are my main criticisms concerning this point:
- Firstly, for their iron validation, the authors use the combination
of observed bottle data from Tagliabue et al. (2012). The attribution
of this dataset to GEOTRACES made in the manuscript is simply wrong:
the data is mostly a compilation of pre-GEOTRACES data of high
quality. Since the publication of this data set, a large amount of
additional data has become available through the GEOTRACES
intermediate data products, especially for the Southern Ocean. Why
has this data not been taken into account?

- For the evaluation of the depth of the deep chlorophyll maximum and
chlorophyll concentration at the maximum, the authors have chosen
the product from Copernicus, which is based on the works of Sauzede
et al. (2016). The authors mention that this dataset estimates POC
and chlorophyll using a neural network method, but do not give any
further details. Here is therefore my summary of the method: The
data set estimates the vertical distribution of particle backscatter
(which can be used as a measure of POC) from the large data base of
ARGO vertical profiles of temperature and salinity, and co-located
surface satellite estimates of particle backscatter and chlorophyll
a from MODIS. Actually, contrary to the statement made in the
manuscript, the method presented in Sauzede et al (2016) only
describes the estimation of POC profiles, NOT of chlorophyll. For
the chlorophyll estimation one should probably cite the data manual
(https://documentation.marine.copernicus.eu/QUID/CMEMS-MOB-QUID-015-010.pdf).
While this data set is unique in that it for the first time allows a
look at the vertical distribution of biological activity in the
ocean, it is not 'observations' (which is how it is repeatedly
referred to in the manuscript), but a fairly indirect
estimate. The limits of this data set and its possible errors are
not discussed at all in this manuscript, and neither are the error
estimates, which are present in the data themselves, taken into
account in the model assessment. Instead, the data set is
uncritically taken as 'truth'.

- Why is the same data set also taken for the evaluation of surface
chlorophyll and POC? As the processing of the data in the Copernicus
product involves chlorophyll and backscatter estimates from MODIS,
it would remove one possible source of error to directly use the
satellite data here. Actually at this point it should be discussed
that the standard algorithm used in satellite estimates of
chlorophyll has been questioned in the Southern Ocean by Johnson et
al. 2009 (which is cited in the manuscript); the algorithm proposed
in Johnson et al. 2009 gives on average higher values of chlorophyll
in the Southern Ocean than the standard algorithm used at that time
for SeaWiFS. I think this also holds for the GlobColour product used
in the Copernicus data set, but I have to admit that this is getting
beyond my expertise. But I think it illustrates yet another source of
uncertainty in the 'observations' that should be discussed.

- Just out of curiosity: Many model assessments also use
satellite-based estimates of net primary production. Is there a
specific reason why this was not done here?

- And finally, the authors use ONE number of how POC is distributed
over phytoplankton, zooplankton, dead organic matter and bacteria
that has been estimated for the Southern Ocean to convert the
Copernicus estimate of POC into one of phytoplankton, zooplankton,
detritus, and bacterial carbon biomass. In their tables 6 and 7 they
then judge whether models 'underestimate zooplankton' etc. But when
you actually read the paper by Yang et al. 2022, one immediately
realizes the limits of that comparison. Firstly, the paper does not
describe microzooplankton, but only the biomass of zooplankton that
can be caught in plankton nets. Secondly, the biomasses of the three
zooplankton groups studied in that paper (mesozooplankton, krill and
salps) have a large regional variability, as for example shown in
their figure 2. While the Yang paper indeed demonstrates that there is
an inverted trophic pyramid in the Southern Ocean, the actual
biomass numbers probably have a large uncertainty from sampling
bias. Taking the one biomass number for the whole Southern Ocean
obtained here then for conversion of a totally different POC
estimate into zooplankton biomass further leads to errors. To add to
that, the authors do not describe how they have combined the
estimates from the three different papers cited into one. In my view
it makes sense to investigate whether models obtain a similar
inverted trophic pyramid as described in Yang et al, but not to
write sentences like 'Most models describe integrated phytoplankton
carbon reasonably well with values comparable to observations' when
the observations are just indirect estimates of POC from Copernicus,
multiplied by one Southern Ocean estimate of the phytoplankton
carbon:POC ratio, and then not taking possible errors into
account. The whole section from line 412 to line 445 in my view
should be scrapped.

Given these criticisms I don't think the paper can be published
without quite major revisions. To make it publishable, I think
the following needs to be done:

- Extend the data set used for the comparison of modeled iron by the
data from the latest GEOTRACES intermediate data product and repeat
the comparison.

- Redo the comparison of deep chlorophyll maximum frequency and
chlorophyll levels taking into account the uncertainty of the
Copernicus data set.

- Use (at least in addition to the Copernicus data set) the direct
satellite-based estimates of chlorophyll and POC from MODIS for the
surface comparison; possibly also discuss the issue of the
chlorophyll algorithm uncertainty raised by Johnson et al, 2009.

- Either remove the comparison with the different components of POC
completely or do it properly by accounting for the error margins.

I think all these changes would probably be incompatible with the
strong focus of the paper on 'ranking' of the different models,
i.e. saying which one is 'the best', which comes second etc. Given the
uncertainty of the data sets used, which is completely neglected in
the present manuscript, I don't really think this can be done with any
confidence.

As this will require more or less a complete rewrite of the manuscript
I limit my further specific comments to the most important ones.

Specific comments
-----------------

Line 135-136: '.. we use yearly data instead, as carbon export
predominantly occurs during summer months': I don't understand the
reasoning here. If carbon export predominantly occurs in summer, does
using annual POC values not make the connection to export less
reliable?

Formula 4: The formula for root-mean-square difference is given here
correctly; but in the Taylor diagram one should use the RMSD after
correction for the mean model-data bias, otherwise the connection
between CC, SSD and RMSD that is used to construct the diagram does
not hold (Taylor 2001). Was this done here?

Line 153: 'the number of grid points..' Does that depend on the grid
resolution? Is that a problem?

Table S1: Were the calculations of CC and other statistical quantities
for chlorophyll done using log-transformed data, as is done most of
the time?

Comparison of surface nitrate and silicate: Given that the Southern
Ocean is an upwelling region, would it make sense to also check the
concentration of these nutrients in Circumpolar Deep Water against data
when trying to explain the model-data difference at the surface?

When comparing dissolved iron with the Tagliabue et al. 2012 data set,
mean bias estimates are given. Does a mean make sense in such a sparse
data set? Should one perhaps at least also have a look at the median?

In the iron comparison, the 'limited availability of
observational data' is repeatedly referred to, which is correct. But the
data is not that limited, given the GEOTRACES data that is ignored here.

Model ranking: it is unclear to me how the different statistical
quantities to judge model-'data' agreement are converted into one
ranking. Is the lowest RMSD the criterion, the highest CC?

Line 383: "DCMs are primarily driven by photoacclimation". No, not all
of them, see Cornec et al. 2021. The whole discussion of DCMs and the
factors driving them is a bit superficial.
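
The point raised under Formula 4 above can be illustrated with a small
numerical sketch. It is not the authors' method and uses purely
synthetic numbers; the function name taylor_stats and all values are
assumptions made for illustration. It shows that the relation between
CC, the standard deviations and the RMSD used to construct a Taylor
diagram (Taylor 2001) holds for the bias-corrected (centered) RMSD,
and that the uncorrected RMSD differs from it by the mean bias.

```python
import numpy as np


def taylor_stats(model, obs):
    """Return bias, standard deviations, CC, full RMSD and centered RMSD.

    Hypothetical helper for illustration only.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    bias = model.mean() - obs.mean()
    sigma_m = model.std()  # population standard deviation (ddof=0)
    sigma_o = obs.std()
    cc = np.corrcoef(model, obs)[0, 1]
    # Full RMSD, including the contribution of the mean bias
    rmsd = np.sqrt(np.mean((model - obs) ** 2))
    # Centered RMSD: means removed from both fields before differencing
    crmsd = np.sqrt(
        np.mean(((model - model.mean()) - (obs - obs.mean())) ** 2)
    )
    return {
        "bias": bias,
        "sigma_m": sigma_m,
        "sigma_o": sigma_o,
        "cc": cc,
        "rmsd": rmsd,
        "crmsd": crmsd,
    }


# Synthetic "observations" and a synthetic "model" with an added offset
rng = np.random.default_rng(0)
obs = rng.normal(1.0, 0.5, 500)
model = 0.8 * obs + rng.normal(0.0, 0.2, 500) + 0.4

s = taylor_stats(model, obs)

# Law-of-cosines relation underlying the Taylor diagram
from_diagram = np.sqrt(
    s["sigma_m"] ** 2
    + s["sigma_o"] ** 2
    - 2.0 * s["sigma_m"] * s["sigma_o"] * s["cc"]
)

print("centered RMSD:        ", s["crmsd"])
print("from CC and std devs: ", from_diagram)
print("full RMSD:            ", s["rmsd"])
```

In this sketch the centered RMSD matches the value implied by CC and
the two standard deviations, while the full RMSD is larger, because
its square equals the squared centered RMSD plus the squared mean bias.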
References
----------

Cornec, M., Claustre, H., Mignot, A., Guidi, L., Lacour, L., Poteau,
A., et al. (2021). Deep chlorophyll maxima in the global ocean:
Occurrences, drivers and characteristics. Global Biogeochemical
Cycles, 35, e2020GB006759. https://doi.org/10.1029/2020GB006759

Citation: https://doi.org/10.5194/egusphere-2025-2633-RC1