This work is distributed under the Creative Commons Attribution 4.0 License.
Sensitivity of the tropical Atlantic to vertical mixing in two ocean models (ICON-O v2.6.6 and FESOM v2.5)
Abstract. Ocean General Circulation Models still have large upper-ocean biases, e.g. in tropical sea surface temperature, possibly connected to the representation of vertical mixing. In earlier studies, the ocean vertical mixing parameterisation has usually been tuned for a specific site or only within a specific model. We present here a systematic comparison of the effects of changes in the vertical mixing scheme in two different global ocean models, ICON-O and FESOM, run at a horizontal resolution of 10 km in the tropical Atlantic. We test two commonly used vertical mixing schemes: the K-Profile Parameterisation (KPP) and the Turbulent Kinetic Energy (TKE) scheme. Additionally, we vary tuning parameters in both schemes, and test the addition of Langmuir turbulence in the TKE scheme. We show that the biases of mean sea surface temperature, subsurface temperature, subsurface currents, and mixed layer depth differ more between the two models than between runs with different mixing scheme settings within each model. For ICON-O, there is a larger difference between TKE and KPP than for FESOM. In both models, varying the tuning parameters hardly affects the pattern and magnitude of the mean state biases. For the representation of smaller-scale variability like the diurnal cycle or inertial waves, the choice of the mixing scheme can matter: the diurnally enhanced penetration of equatorial turbulence below the mixed layer is only simulated with TKE, not with KPP. However, tuning of the parameters within the mixing schemes does not lead to large improvements for these processes. We conclude that a substantial part of the upper-ocean tropical Atlantic biases is not sensitive to details of the vertical mixing scheme.
Status: final response (author comments only)
CC1: 'referee comment', Gilles Reverdin, 13 Aug 2024
The paper investigates the impact of turbulence parameterizations in ocean models, focusing on the equatorial Atlantic in 2015. Two ocean models that are also part of climate models are considered: ICON-O and FESOM. They both have intermediately high horizontal and vertical resolution (128 layers), but with very different horizontal grids and schemes. Both have a z* vertical grid (with an SSH link, but not quite the same, if I understood correctly). However, they have rather comparable near-equatorial horizontal resolution. These are forced runs (ERA5 forcing terms, mostly) with use of bulk formulae for the air-sea exchanges. Different turbulence schemes are tested (in particular in ICON-O), as well as the bulk formula formulation (to test the influence on the runs of the differences in flux formulation between the two models). The test runs are two years long, with the second year being analysed, which should be enough for the near-equatorial adjustment while avoiding the larger basin-scale adjustments that would result from the different turbulence parameterizations. Notice also that most of the changes made should modify mainly the near-surface mixing, and not so 'directly' the deeper mixing.

The bulk formulae used both imply a negative feedback towards the ERA5 atmosphere temperature in the tropical Atlantic. However, in regions where the models produce excess surface temperature (such as the south-eastern equatorial Atlantic), whether because of the other components of the heat budget (radiative…) or because of the turbulence scheme or model simulations, this would add a destabilising term in the mixed layer, and thus moderate excess near-surface stratification. In regions where the models are too cold (probably too much upwelling, or a thermocline structure not well reproduced), this would contribute some added stratifying term, and thus a much reduced MLD. I am just stating this because there could be a link between the SST bias and MLD bias structures related to the overall bulk formulation (whichever of the two is used).

The main results are that the MLD underestimation bias (and the too-low SST bias) is overall large (almost a factor of two in some areas and runs; most noticeable between 0 and 10°S in the central and western Atlantic), and although sensitive to the turbulence parameterization, not to the point of changing the main patterns (the same holds for the SST bias structure). There is nonetheless some dependency on the scheme, which is explained, and systematic differences remaining between the two models. As pointed out by the authors, some of this bias might be due to the overall thermocline structure and flow.

There is then a good discussion of high-frequency variability, in particular the diurnal warm layer (DWL), but also inertial waves. This is likely important, due to modulation and possible impact on the momentum flux into the ocean (less so for the heat and water fluxes, as they are more linear…). This made me wonder about the 'in situ' reference used for those. It is based on Argo data using (according to the Figure 1 caption) all Argo profiles from 2000 to 2022, which implies that a large part of the profiles did not start above 7 m, and thus missed most of the DWL. There is also a question of when in the day the Argo profiles arrive at the sea surface, which is often not homogeneous across the tropical Atlantic or through the day.
Are the authors sure that there is no systematic difference due to the in-situ Argo data reference, and by how much? (And could there be a spatial structure in it due to the distribution of the timing of Argo profiles through the day?) At least, this is not consistent with what is presented from the models, which use a surface reference for density (a density criterion is used; a toy example of such a criterion is appended at the end of this comment). In some ways, it could have been preferable to use a late-night value from the models to compare with the Argo climatology (and to always take the reference near 7 m, or another fixed depth), whereas the investigation of the DWL and the diurnal cycle requires another analysis (as is done). For SST, it is HadISST which is used as a reference. It would be important to state whether it is the daily average SST which is used for the comparison (or something else).

There are also interesting results on near-equatorial mixing and the time of day of maximum diffusivity, and on deep-cycle turbulence, with suggestions that some of the TKE (or KPP) runs perform more satisfactorily than the others. For these investigations, other data sets are used, which seem appropriate, as well as for the off-equatorial DWL (well, for deep-cycle turbulence it is less so, based on Figures 13 and 14).

Overall, would it be fair to say that, somehow, we have two models with rather large systematic large-scale biases (as also seen in Figure 7) that would not change much over a period as short as the 1 to 2 years of the tests, which mostly tackle the surface mixing (although some minimum values also impact the subsurface terms)? Maybe that could be a reason for the differences between the overall results and what is found in Deppenmeier et al. (2020), who investigated the c_k dependency of the bias in NEMO (larger c_k leading to surface cooling and subsurface warming, and less SST bias). Altogether, it is a rather interesting study worth publishing.

Minor comments:

For TKE they use Pr = 6.6 Ri, which I find large (I believe that it is 1 in NEMO, but have not checked). Why this choice? (A sketch of this formulation is given below.)

l. 372: 'to a large extent on the surface velocity of the ocean' ('to a large extent' may be a little too strong; more so for energy than for heat/water). Afterwards, I understood the point made about the relative direction of wind and currents, and thus the difference between the equatorial and off-equatorial situations, but the impact there is larger for energy/wind power than for heat/water (and it is not so clearly separated, according to a recent paper, Hans et al. (2024)). The authors are likely aware of the Hans et al. paper (JGR Oceans, in press), which carefully evaluates from data the structure of the DWL and its diurnal jet along the equatorial Atlantic, and could complement what is discussed in the paper.

Figure 2: I would write instead: "Annual mean mixed layer depth in 2015 (correct?) for the different simulations of FESOM and ICON-O relative to the Argo climatology (for 2000-2022?) presented in the top left panel." (Or "difference in annual MLD…", but not "between FESOM and ICON-O runs".)

The captions of Figures 7 and 8 are incorrect: the panels show the SST difference (except for the Argo one). Figure 7 caption: over which latitudinal band are the 2015 Argo profiles (the reference for the other panels) averaged? Does this averaging scale have an impact (or not) on the anomalies presented in the other panels for the different model runs? Altogether, some figure captions are not very detailed (and information has to be retrieved from the body of the paper to figure out what they correspond to).
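On the Prandtl number question above, for concreteness: in TKE-type closures the mixing coefficients are typically of the form sketched below. This is a hedged sketch only; the factor 6.6 is taken from the comment above, while the clipping bounds are assumptions based on common NEMO-style implementations and not necessarily the paper's exact choice.

$$ K_m = c_k\, l_k \sqrt{e}, \qquad K_\rho = \frac{K_m}{\mathrm{Pr}}, \qquad \mathrm{Pr} = \max\!\left(1,\ \min\!\left(\mathrm{Pr}_{\max},\ 6.6\,\mathrm{Ri}\right)\right), \qquad \mathrm{Ri} = \frac{N^2}{S^2}, $$

where $e$ is the turbulent kinetic energy, $l_k$ a mixing length, $N$ the buoyancy frequency, and $S$ the vertical shear. With the factor 6.6, the diffusivity $K_\rho$ already drops below the viscosity $K_m$ for $\mathrm{Ri} > 1/6.6 \approx 0.15$, which is presumably why the value is flagged as large here.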
Another example of a caption lacking detail is Figure 11, in which no specific domain is given for the model runs, nor where the observations were collected.

Figure 10: I understand what is attempted, but looking at it I have a hard time convincing myself of what is said in the paper. In this case, is it important to show all the panels? I can imagine many reasons, not necessarily relevant for the overall conclusions, why the model runs don't reproduce the special event found in the data.

Figure 12: I understood afterwards (reading the paper) the choice of days 120-138 of year 2014. On the other hand, these dates are rather close to the beginning of the test simulations, and the results could be sensitive to that. The results are very different between the runs, for example the lower (and not daily?) modulation in I_KPP_01 to 03. I did not fully understand what the lesson from this is. Why is there this 5-day modulation in these three runs and not in the others? Would they have had some 'instability' waves, for example, that are not present in the other runs? And from those panels, how does one see what is expected? (The last sentence of the caption is a bit vague in that respect, and not informative.)

Fig. 15: Is what is shown the result in ICON-O of using the alternative bulk formulae (and with comparison to the observed MLD, as in Fig. 2)? Or is it instead the difference between the two sets of runs, which the caption would suggest?

Figure 17: I assume that only the third panel from the left is with the alternate bulk-formula forcing in ICON-O. I would remove 'Effect of exchanging…' from the title and be more specific about which runs are presented.
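Returning to the MLD density criterion discussed near the top of this comment, here is the toy example referred to there: a minimal, hypothetical threshold-based MLD computation. The 0.03 kg m^-3 threshold and the 10 m reference depth follow common practice for Argo-based climatologies and are assumptions, not necessarily the paper's choices.

```python
import numpy as np

def mld_density_threshold(depth, sigma0, ref_depth=10.0, dsigma=0.03):
    """Shallowest depth [m] where the potential density anomaly sigma0
    exceeds its value at ref_depth by dsigma [kg m^-3]; NaN if never."""
    sigma_ref = np.interp(ref_depth, depth, sigma0)
    exceed = np.where((depth >= ref_depth) & (sigma0 >= sigma_ref + dsigma))[0]
    return depth[exceed[0]] if exceed.size else np.nan

# Toy profile: a well-mixed layer over a linearly stratified interior.
z = np.arange(0.0, 200.0, 1.0)                    # depth [m]
sigma = 23.0 + 0.002 * np.maximum(z - 30.0, 0.0)  # sigma0 [kg m^-3]
print(mld_density_threshold(z, sigma))            # -> 45.0 for this profile
```

With ref_depth = 0 instead, a daytime diurnal warm layer enters the reference value and can make the diagnosed MLD much shallower, which is exactly the model-versus-Argo inconsistency raised above.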
Citation: https://doi.org/10.5194/egusphere-2024-2281-CC1
RC1: 'Comment on egusphere-2024-2281', Anonymous Referee #1, 03 Sep 2024
Summary
Using OMIP-type simulations with two OGCMs, ICON-O and FESOM, the authors investigate the influence of the vertical mixing parameterization on the simulation of the tropical Atlantic, with both the KPP and TKE schemes tested with several parameter settings. While the models do show sensitivity to the choice of mixing scheme and parameter setting, the choice of model has a larger impact. Furthermore, all simulations show relatively similar bias patterns, including a weaker than observed equatorial SST gradient, which suggests that the vertical mixing scheme may not be the major cause of these biases.
The manuscript is well written and presents a thorough analysis of the performance of the two models. One concern I have is that one year of simulation (the year 2015) may not be sufficient to reliably assess model performance. I presume the high-resolution simulations are too expensive to conduct long-term integrations, but in that case one may wonder if simulations with lower resolution but longer integration time would have been more suitable. Itemized comments follow below.
Major Comments
1) It is not clear whether one year is long enough to reliably assess model performance. Uncertainty in both the observations and the single-realization simulations could be comparable to the biases you are trying to examine. If long-term OMIP simulations at lower resolution are available for ICON-O and FESOM you could check how representative a single year is by comparing the bias of individual years with that of the long-term mean. Alternatively, you could also look at OMIP simulations from other modeling centers to get a rough idea.
2) In the introduction, you refer to previous studies suggesting that the biases in CGCMs and their corresponding OMIP simulations are similar. But the biases you show in the ATL3 region are actually rather atypical, with temperatures lower than observed and the minimum occurring one month early. In many CMIP6 models, on the other hand, the SSTs in the ATL3 are too warm and the minimum is reached too late.
The discrepancy could be due to the high model resolution, as stated by the authors, but this cannot be assessed without a corresponding CGCM simulation. Do the authors have such simulations available?
3) Related to comment 2: How large are the biases in ICON-O and FESOM compared to those seen in typical CMIP6 models? A quick look at the AWI-CM-1-1-MR piControl simulation in the CMIP6 archive, which uses FESOM, suggests that SSTs are about 2K too cold in the western equatorial Atlantic and 2K too warm in the east. Again, it would be instructive to compare the biases in the two OGCMs with their corresponding CGCM simulations.
4) Prigent and Farneti (2024) have recently examined the performance of OMIP simulations in the tropical Atlantic. It should be instructive to compare with their results. Are the biases you see in your high resolution simulation similar to the OMIP simulations?
5) SST biases in the equatorial Atlantic have a strong seasonality, with the most severe bias occurring in JJA. While the seasonality of this bias can be partly inferred from Figs. 5 and 6, it would be helpful to see a longitude-time section of the equatorial SST bias as well.
6) In their conclusions the authors state that vertical mixing can only explain a limited amount of the biases seen in CGCMs, and that biases in atmosphere-ocean interaction likely play an important role. An alternative hypothesis, however, is that AGCM biases are a major source (e.g., Richter and Xie 2008; Wahl et al. 2011; Richter et al. 2012; Voldoire et al. 2019). The results shown here may be consistent with this hypothesis. The authors should discuss this.
7) What do the authors conclude about the prospect of reducing tropical Atlantic biases?
Minor Comments
1) l. 31: The study by Song et al. (2015) could also be cited here.
2) l. 99: Please define EVP.
3) Figures 7 and 8: The depth of the thermocline should be indicated in all panels.
4) Figure 11: The values of the vertical axes should be -0.05 instead of -0.5.
5) Section 6: How can the bulk formula affect SST and the EUC, but not MLD?
References
Prigent, A. and R. Farneti, 2024: An assessment of equatorial Atlantic interannual variability in Ocean Model Intercomparison Project (OMIP) simulations. Ocean Sci., 20, 1067–1086, https://doi.org/10.5194/os-20-1067-2024.
Richter, I., S.-P. Xie, A. T. Wittenberg, and Y. Masumoto, 2012: Tropical Atlantic biases and their relation to surface wind stress and terrestrial precipitation. Climate Dyn., 38, 985–1001, doi:10.1007/s00382-011-1038-9.
Song, Z., S.-K. Lee, C. Wang, B. Kirtman, and F. Qiao, 2015: Contributions of the atmosphere–land and ocean–sea ice model components to the tropical Atlantic SST bias in CESM1. Ocean Modell., 96, 280–296, doi:10.1016/j.ocemod.2015.09.008.
Voldoire, A., E. Exarchou, E. Sanchez-Gomez, T. Demissie, A.-L. Deppenmeier, C. Frauen, K. Goubanova, W. Hazeleger, N. Keenlyside, S. Koseki, C. Prodhomme, J. Shonk, T. Toniazzo, and A.-K. Traoré, 2019: Role of wind stress in driving SST biases in the tropical Atlantic. Climate Dyn., 53(5), 3481–3504, doi:10.1007/s00382-019-04717-0.
Wahl, S., M. Latif, W. Park, and N. Keenlyside, 2011: On the tropical Atlantic SST warm bias in the Kiel climate model. Climate Dyn., 36, 891–906, doi:10.1007/s00382-009-0690-9.
Citation: https://doi.org/10.5194/egusphere-2024-2281-RC1
RC2: 'Comment on egusphere-2024-2281', Anonymous Referee #2, 12 Sep 2024
Summary:
The authors compare different mixing schemes in two global models to systematically address the effects of using different prescriptions of mixing on the mean state and seasonal variability of temperature and other state variables in the tropical Atlantic. In short, while variability between models is more significant than variability between mixing schemes, the authors find that mixing schemes can affect the representation of smaller-scale phenomena; for instance, only the TKE scheme reproduces diurnally varying deep cycle turbulence. This provides more generalized insight compared to previous studies that focused on single locations with only one model. I think this work would be of interest to modelers and be a good fit for GMD.
Major comments:
- The model resolution in the Atlantic is 10-13 km (and 50 km outside the tropical Atlantic for FESOM). I agree that the resolutions in the region of interest are comparable and don't suspect that this is a problem, but I think some discussion should be added on this, i.e., whether there is sufficient resolution to resolve the spatial variability in this region. While mesoscale features like tropical instability waves are probably resolved, smaller-scale filaments, etc., might not be, and it is unclear how this would influence mixing.
- The current set of metrics used focuses primarily on comparing long-term mean values. While there is some comparison of seasonal variability (for SST), I think it might be important to state how well the models recreate the seasonal cycle, particularly because of strong variability associated with the equatorial current system. For example, section 4.3 discusses how well models represent the mean equatorial current, but does not include information on how well the seasonal variability is represented (even though we know from observations that it is significant). This might also provide insight into the physical mechanisms. (Or, based on the minimal influence on the seasonal cycle of SST, maybe it is similar for all runs. In any case, I think this should be discussed.)
- I think there should be some numerical quantitative results included in the text. While quantitative results can be deduced from the figures, I think that is not easy for some readers to do. I pointed out a few places that I think quantitative results would improve the text in my line-by-line comments.
Line-by-line minor comments and suggestions:
L1 – I would reword to not state “e.g.” in the abstract. At minimum there should be a comma.
L27-33 – This section maybe can be shortened or removed… The main focus of the paper is mixing, not fixing the atmospheric or ocean parameters, right?
L48-56 – Move 1 paragraph earlier? – because this is a main focus of the manuscript
L62 – delete “for example”
L65 – “specific region” instead of “specific bias”?
L68-69 – Strange syntax. Maybe say something like “Previous studies typically only use a single model and thus it is unclear whether those results are universally applicable”?
L79 – I prefer “in the present study” to “this study”. It is less ambiguous.
L83 – remove “e.g.”
L99 – spell out what EVP is
L100 – What are the boundaries of the "equatorial Atlantic" with 13 km resolution rather than 50 km? Stating that would strengthen this argument.
L101 – move this to after you discuss the ICON-O resolution.
L115-116 – “we agreed on” – strange syntax, please reword
L122-125 – This should be elaborated on. I think the reason is that, with coarser resolution, smaller-scale instances of shear instability are averaged out, and thus mixing occurs at apparently higher Ri (see the sketch below).
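To spell that point out (an illustration of the reasoning suggested above, not an argument taken from the paper): with

$$ \mathrm{Ri} = \frac{N^2}{S^2}, \qquad S^2 = \left(\partial_z u\right)^2 + \left(\partial_z v\right)^2, $$

the shear computed from horizontally averaged velocities satisfies $(\partial_z \bar{u})^2 = \left(\overline{\partial_z u}\right)^2 \le \overline{(\partial_z u)^2}$ (Jensen's inequality), so the Ri diagnosed from coarse-grid mean fields is biased high relative to the local Ri at which shear instability actually occurs. Schemes tuned on coarse grids therefore need to allow mixing at Ri values above the canonical critical value of 1/4.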
L129 – Eliminate “we want to”. You are making that comparison now, it is not a future research focus.
L131-144 – This is a very important section and provides motivation for your work. In fact, as I was reading through, I was wondering if there would be any difference if all parameters were the same in both models. Perhaps it would be a good idea to move this earlier to emphasize that point.
L206+ - I’m having a hard time seeing some of the patterns in the figures that you state in the text when scrolling back and forth between the text and figures. I think labeling individual figure panels and referencing them in the text might help make this section more readable.
Another question – for some of the runs, it is clear in Figure 2 that the bias in the equatorial region is significantly different from areas to the north and south. That may be outside the domain of study, but might be something to discuss since it is so obvious in the figures.
L213 – Add justification for averaging between 4N and 4S. I think this is just because of the cold tongue location.
L237-238 – “considerably stronger”. I agree with the result itself, but in this (and other places), I think it would be insightful to include quantitative results. For example, saying 50% stronger (or however much it is) would give some better insight into the differences and I think be a more useful result.
L247 – ITCZ
L245-269 (and earlier..) – Clarification question: Models typically have a warm subsurface bias in the Atlantic cold tongues. Figures 4-6 show that other than in the far eastern Atlantic, SSTs are much colder in all model runs than the observations. Would be helpful to add a sentence or two to clearly state that typical model biases are strongly depth dependent. Maybe this should go earlier in the text (line 64??)
L261 – Why? I think here and in other places, as a reader I would appreciate some speculation on what is causing some of the differences between the runs.
L276 – Remove “actually” – it sounds like you are expressing surprise, but I think we would expect similar results for both models
This is another place where I think numerical, quantitative results would be helpful (yes, they can be deduced from the figures, but I think it is better if they are in the text for readers to easily see).
L283 – Mention why you specifically discuss 23W – because that’s where the PIRATA mooring is
L289-300 – Important information, but maybe this should be in the introduction?
L314-315 – "between 0.1 and 0.2 seems best" – Agreed, but different c_k values produce the most realistic results in ICON-O versus FESOM. It might be helpful to state in the text the specific values that work best for each model.
L350 – “the high stratification band is weaker” – Is this true? From Fig 10 it looks to me that it is thinner, but not necessarily weaker.
L362 – “several reasons” – If it is just those two then say that.
L370-372 – True, but the atmospheric variations have a larger impact on the fluxes than the diurnal jet/ocean current, because the wind varies more. Maybe reword this so as not to overstate the impact.
Fig 11 – I like the comparison between obs and the models. But I think the differences between model runs (e.g. discussed starting at L380) are very difficult to see other than in the top center plot. Perhaps a change in the color scale would make it easier to see these changes.
L381 – At different points in the text you use "parametrization", "parameterisation" and "parameterization". All might be acceptable, but be consistent.
L390 – might be worth mentioning that DCT is diurnally-varying here at the start
L398 – Not sure I completely agree. There are periods in Fig. 12 where it looks like FESOM does not show DCT.
L400-402 – I think more needs to be added to reconcile these points. The FESOM runs seem to best represent DCT, but the ICON runs (including KPP, which poorly represents DCT) resolve the downward propagation. Aren't these related, so shouldn't the same runs represent both well? In other words, doesn't this imply that the model is reproducing DCT in the FESOM cases for the wrong reason/physics? You state later (and Fig. 14 very clearly shows) that the actual K values are closer to the observations in the FESOM cases, so maybe that has something to do with it, but I still think more should be said here.
L407 – Again, it’s more complicated than that. I don’t think there’s a single run that accurately represents all elements of DCT. So I would avoid using “satisfying” or similar terms unless you can quantify what that means.
L425 – move this to line 422? Might be better to first say what is different as a result of different bulk formulae, since I think that is the more important point
On another note, I think it would be helpful to explain a bit more about which terms in the bulk formulae might cause these differences, referring again to Appendix A (see the sketch after the next point).
Another question: I understand that what you're doing is a sensitivity experiment, but isn't it typically unreasonable to replace bulk formulae with those from another model that was tuned to different formulae? Maybe it is worth reiterating that this is a sensitivity test and I_TKE_02_FBF is not meant to produce "realistic" results. (Or I may be wrong here; in that case, please clarify.)
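To make the two preceding points more concrete, here is a minimal, hypothetical constant-coefficient bulk-formula sketch for the turbulent heat fluxes. All coefficient values are illustrative assumptions, not those of ICON-O, FESOM, or the paper. It also illustrates the damping toward the prescribed atmospheric state raised in CC1: a warmer SST under fixed forcing yields a more strongly cooling flux.

```python
import numpy as np

RHO_AIR = 1.2    # air density [kg m^-3] (assumed)
CP_AIR = 1004.0  # specific heat of air [J kg^-1 K^-1]
LV = 2.5e6       # latent heat of vaporisation [J kg^-1]
C_H = 1.2e-3     # sensible-heat transfer coefficient (assumed constant)
C_E = 1.3e-3     # latent-heat transfer coefficient (assumed constant)

def q_sat(temp_k):
    """Approximate saturation specific humidity over water [kg/kg]."""
    t_c = temp_k - 273.15
    e_sat = 610.94 * np.exp(17.625 * t_c / (t_c + 243.04))  # Magnus form [Pa]
    return 0.622 * e_sat / 101325.0

def turbulent_heat_flux(sst_k, t_air_k, q_air, wind_speed):
    """Sensible + latent heat flux into the ocean [W m^-2]; positive warms it."""
    q_sens = RHO_AIR * CP_AIR * C_H * wind_speed * (t_air_k - sst_k)
    q_lat = RHO_AIR * LV * C_E * wind_speed * (q_air - 0.98 * q_sat(sst_k))
    return q_sens + q_lat

# Same forcing, SST 1 K warmer -> more negative (cooling) flux: the negative
# feedback toward the prescribed ERA5 atmosphere discussed in CC1.
print(turbulent_heat_flux(300.0, 299.0, 0.017, 6.0))
print(turbulent_heat_flux(301.0, 299.0, 0.017, 6.0))
```

Differences between two models' bulk schemes then come down to choices such as the transfer coefficients (constant versus stability-dependent), the saturation-humidity formula, whether surface currents are subtracted from the wind, and the salinity factor (0.98 here), any of which can shift SST without necessarily changing the MLD pattern.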
L442 – break up into multiple sentences
L477 – this sentence should be quantitative, especially considering the result is different from some past research
L489 – “biases are not sensitive”
General comments for the discussion and conclusion-
1) Overall I think the authors have done a nice job contextualizing their work with previous studies.
2) A key conclusion is that the differences between models >> the differences between vertical mixing schemes within models. It would be helpful to have some sort of quantitative measure of how much these differences are. This might make it easier to also compare to any previous studies, i.e., Did the previous studies really show a greater difference between mixing schemes? Or did they show quantitatively the same effect and just perceived it as greater because they only looked at 1 model/process/variable? I think that’s an important distinction to make.
3) I’d also like to see a bit of speculation on what physics are different between the models that cause the large differences. To be fair, this is touched on in a few places earlier, but I think a cohesive discussion would be useful.
Citation: https://doi.org/10.5194/egusphere-2024-2281-RC2
AC1: 'Comment on egusphere-2024-2281', Swantje Bastin, 16 Nov 2024
We would like to thank the two anonymous referees and Gilles Reverdin for their reviews of our manuscript and their helpful comments and suggestions. For our point-by-point replies to all three comments (CC1, RC1, and RC2) please see the attached pdf.