A multi-model analysis of the decadal prediction skill for the North Atlantic ocean heat content
Abstract. Decadal predictions can skillfully forecast the upper ocean temperature in many regions of the world. The North Atlantic in particular shows promising results when it comes to high predictive skill of Ocean Heat Content (OHC). Nevertheless, important regional differences exist across Decadal Prediction Systems, which are explored in this multi-model analysis. Differences are also found in their respective uninitialized historical ensembles, which points to large uncertainties in the externally forced signals. We analyze eight CMIP6 climate models with comparable ensembles of decadal predictions and historical simulations to document their differences in upper OHC skill, and to investigate whether intrinsic model characteristics, such as key mean state biases in the local forcing from the atmosphere or the local stratification, can influence the relative predictive roles of external forcings and internal variability. Particular attention is given to the Labrador Sea and its surroundings, since this is found to be a region where the upper OHC has low observational uncertainties, yet high inter-model spread in the upper OHC prediction skill of decadal predictions and historical experiments. Benchmarking mean state properties of the local surface fluxes and stratification against observations, both strongly linked with the simulated upper OHC skill for the historical ensembles, suggests that their multi-model mean provides the most realistic estimate of the true forced signal.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-1569', Iuliia Polkova, 09 Aug 2024
Review of the manuscript “A multi-model analysis of the decadal prediction skill for the North Atlantic ocean heat content” by Carmo-Costa et al.
The study describes a multi-model analysis of the decadal prediction skill for the upper ocean heat content. The authors analyze the multi-model spread in the prediction skill of the upper 700 m ocean heat content, further focusing on the Labrador Sea and its surrounding areas, where they observe the largest model uncertainties among the regions of the North Atlantic. They then investigate the processes involved, to clarify the sources of these uncertainties and possibly improve predictions. Overall, the study is a worthy research contribution; however, the manuscript needs to be further improved before publication, as it currently lacks important details on the methods and the description of some results; the NAO part in particular is confusing. Also, some figures need improvement, so that one can better follow the conclusions that are based on them. I thus suggest major revision. Detailed comments are below.
Abstract:
- L5-10: In the abstract, the authors describe what their study intends to investigate (“We analyze eight CMIP6 … to investigate if intrinsic model characteristics … can influence …”). Instead of stating the aim, I suggest summarizing the results of the study and whether the authors’ intents were achieved by the current analysis.
- L10-12: The last sentence only describes the historical ensembles. Basically, this is the only sentence about results in the abstract, and it has nothing to do with the decadal prediction experiments, which should be the main focus of the study according to the title. Given that the title of the paper focuses on decadal prediction skill, does the statement in the last sentence of the abstract also hold for the prediction ensembles?
Introduction:
- L15: The exact “decades” and the exact level that has a trend need to be specified. Moreover, the first two papers cited are not about the warming trend in observational datasets but about reanalyses, which are not the same thing and thus need to be named correctly. The same applies to the second sentence (L17-19): the papers cited are about reanalyses and not about observational datasets. Reanalyses are as much based on models as on observations; if they did not introduce their underlying model and assimilation-method effects into the final products, we would have more coherence between different reanalyses. Thus, it is not correct to equate them with observational datasets.
- L20-23: The previous paragraph described a warming trend in the North Atlantic Ocean Heat Content (NA OHC); this paragraph begins by describing a cooling trend but does not specify that the cited papers speak about the sea surface temperature (SST) trend in the subpolar North Atlantic (SPNA). The fact that the focus has now changed from OHC to SST needs to be specified, or one has to cite appropriate papers that describe a cooling trend in the SPNA OHC.
- L28: Specify the “recent OHC variability” period. Imagine this paper being read 20 years from now. Will the current period description still be “recent”?
- L36: The link to Duchez et al leads to “page not found”, and if it is from a newsletter, it might not be an appropriate citation. Links to cited papers need to be double-checked. I came across at least 4 of them (starting with https://doi.org/https://doi.org...) that lead to “page not found”.
- L58-61: This statement is not accurate. There are multiple studies from the decadal prediction community, cited a few lines below, which show that the NA SST and the NA SPG are predictable up to decadal timescales and maybe even longer. The NA region is the most distinctive in its predictability due to initialization among all regions on the globe. The authors basically cancel this knowledge from previous studies in this paragraph without providing any evidence. To avoid this confusion, one needs to specify the exact regions described in the Langehaug et al paper, where prediction systems indeed still have a lot of trouble, namely the Norwegian Sea, the Inflow region and the eastern SPG, and not generalize the results of that study to the whole North Atlantic basin. By the way, the link provided to their paper is not working.
- L64: In this context, the link to the Polkova et al study should be a different one, namely: Polkova, I., Brune, S., Kadow, C., Romanova, V., Gollan, G., Baehr, J., et al. (2019). Initialization and ensemble generation for decadal climate predictions: A comparison of different methods. Journal of Advances in Modeling Earth Systems, 11, 149–172. https://doi.org/10.1029/2018MS001439. This is where Polkova et al 2019 analyzed the NA SST and NA OHC skill for an “individual decadal prediction system” (as stated in L66). The paper that is cited does not fit here, because in it Polkova et al 2023 analyze the NA SPG skill from the WMO DCP ensemble of twelve CMIP models; it would be more appropriate to mention it in L77.
- L88: Do the authors have any evidence for stating that anomaly initialization “became more popular”? E.g., in the WMO operational decadal prediction set, which relies on CMIP6-based models, only 4 models (out of 12) use anomaly initialization. Anomaly initialization is an alternative, work-around method; it has been tested by some research groups, but there is no evidence that it is getting more popular. One could compare and cite a change from CMIP5 to CMIP6 decadal predictions to support the statement that more systems became anomaly-initialized. From my experience, I do not observe that it has “become more popular”.
- L94-101: Are those questions actually answered? Q1 has already been answered in the Introduction. The paragraph in L101-111 suddenly reduces the scope of the study from the NA skill to that in the Labrador Sea; the authors need to introduce why they focus on the Labrador Sea all of a sudden. Q2: What method is used to answer it? Q3: Name the local drivers and preconditions that will be considered in the analysis. Could the question part in L94-101 be combined with the content part in L101-111? Otherwise, they duplicate each other.
- L110: The discussion “in light of previous studies” turned out to be very thin. L380-430: of the 6 summary points, 4 mention other studies only briefly, without much discussion. Moreover, the 4th section (L378) is no longer named Discussion.
- Overall, the introduction is lengthy and diffuse. Until L90 I do not know where it is heading. Some of the description can be condensed substantially; for instance, it is not necessary to introduce in great detail the difference between predictions and historical simulations, as this has been done in many previous papers. The paragraph in L57-66 states there is no skill in the NA, while the paragraphs in L68-83 try to prove the opposite. The authors need to specify earlier in the introduction where they are going with all this; otherwise it is not clear which parts of this very extensive introduction about many topics are relevant for the current study. E.g., the paragraph L41-43 could be such a place. The “main aim” (L91-92) could come earlier.
Methods:
- L128: Please rephrase: it is not clear what exactly the exception is with EC-Earth3. 10 members were requested and 10 members were provided; where is the exception?
- L130: “Two models contributed with fewer than 10 members to the experiments:” This contradicts the previous sentence in L124-125. If there are two exceptions, the following statement does not hold: “A total of 8 AOGCMs fulfilled all the selection criteria.”
- L132: Specify the difference in resolution in the text.
- L136: I suggest renaming the subsection to “Verification datasets”, as the “three ocean reanalysis” products (L138) are not “observational references” but data assimilation products, i.e., a blend of model results and observations. Please adjust the text accordingly, e.g., L150, 205, 321, etc. Also, it is not clear whether the EN4 data are actually the EN4 analysis dataset based on Optimal Interpolation. This is also unclear further on in the analysis. L137: Which EN4 data are actually used, the original profiles or the profiles interpolated to the gridded dataset? This needs to be specified.
- L180: Is there any reason for choosing the single lead years 2, 5 and 10 and not multi-year averages, which were proposed and used in previous decadal prediction studies to reduce the noise in the calculation of the prediction skill (see the sketch after this list)?
- Why does Table 1 not contain information about the atmospheric model and the atmosphere initialization, as it is probably relevant for the NAO part of the study?
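For concreteness, here is a minimal sketch of the multi-year averaging mentioned in the L180 comment above; array names and shapes are assumptions, and this is not code from the manuscript or from any of the prediction systems:

```python
import numpy as np

def acc_multiyear(hindcast, obs, first_fy=2, last_fy=5):
    """ACC for a multi-year forecast average (e.g. forecast years 2-5).

    hindcast : (n_starts, n_leads) ensemble-mean anomalies per start date
    obs      : (n_starts,) observed anomalies averaged over the matching
               verification window for each start date
    """
    # Average forecast years first_fy..last_fy (1-based lead years) to
    # damp interannual noise before computing the correlation.
    fc = hindcast[:, first_fy - 1:last_fy].mean(axis=1)
    fc = fc - fc.mean()
    ob = obs - obs.mean()
    return np.sum(fc * ob) / np.sqrt(np.sum(fc**2) * np.sum(ob**2))
```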
Results:
- L190: “This initial skill loss might …” There is some disconnect with the previous sentence, which speaks about increased skill. The MPI-ESM model also shows an increase of skill in the LS. Overall, this improvement seems very minor. To provide a quantitative estimate of the improvement/loss of skill, could the authors provide the percentage of grid cells in the region that show higher/lower skill?
- L191: From the figure, it does not look like “historical ensembles show comparatively higher ACC values than the DCPP at FY2”. On the contrary, the historical simulations have a larger area of “no skill”. Some models have higher correlation in the eastern NA in HIST, but again only some. To support the statement in L191, a map of the skill score or of the correlation difference with the significance level should be provided; that is a standard plot in similar intercomparison studies.
- L192: I suggest changing “it seems to reflect” to “it might reflect … the issue that has already been reported by …”.
- Figure 2: The color palette is a bit unfortunate: the ranges 0.1-0.3 and 0.3-0.5 are hardly distinguishable, and the range 0.6-1 is not in use.
- Figure 3 is not very informative. The arguments in the paragraph (L210-219) about “cooling trends” and any “multi-annual modulations” are difficult to recognize and follow from this figure. L210-211: Is this a trend or multi-annual variability? L217: Is the “evolution flat”, or is the figure flat? Maybe a different arrangement of the panels, one that does not stretch the timeseries, could help to better convey the authors’ message. Also, one could normalize the timeseries and move the original figure to the supplementary material.
- L226: Please elaborate in the text: what does “(1) an unforced origin for the observed trend” mean?
- Figure 5: Residual correlation, as in Smith et al 2019 (https://doi.org/10.1038/s41612-019-0071-y), might be more appropriate to separate skill due to internal variability from that due to external forcing (a minimal sketch of this diagnostic follows this list).
- L238: Or there are cancelling signals in the analyzed box, as it also includes the Irminger Current and part of the North Atlantic Current. From Figure 1, it follows that, apart from one model, all historical experiments have skill in the Labrador Sea.
- L257: What is the difference between “the local OHC skill and ultimately their forecast skill”? Please elaborate in the text.
- Figure 6: Why are there two “depth” labels on the y-axis?
- L294, 298-303: Can one compare Figures 7 and 8? They seem to show different things: Figure 7 shows the ensemble spread and Figure 8 shows shifts of the centers of action with lead year. Is Figure 8 plotted based on the ensemble mean, or also based on the ensemble members, as in Figure 7?
- L303-306: The conclusion about “centres of action appear to be unaffected by the forecast drift” is confusing. Has Figure 8 been diagnosed based on the drift? Even then, Figures 7 and 8 still represent different things (ensemble spread vs. drift). Or does Figure 8 reflect the temporal evolution plus drift, in which case speaking about drift is not appropriate at all?
- L304: “full-field initialization does not correct the position of the simulated centers of action in models” What is the physical mechanism by which ocean initialization should correct the NAO centers of action? The original hypothesis was that the NAO drives the LS variability, not the other way around. Why is atmospheric initialization not mentioned here, given that these prediction systems are also initialized in the atmosphere? Also, this conclusion (L304-306) cannot be made, because the DCPs are not compared here with respect to the original data that were assimilated into the respective prediction systems. Note that all of these systems are initialized from different datasets and with different assimilation/initialization methods, and not necessarily from the verification dataset used here. Earlier, the authors mentioned that ERA5 also has its centers close “to the East” (L299); CMCC-CM2-SR5 is initialized in the atmosphere with ERA-Interim and ERA5. If this is how the CMCC model wants the centers of action to be (in HIST) and how the initialization suggests (ERA reanalyses), why should they be located somewhere else?
- L321: Specify which observations are meant here, or is it the ERA5 reanalysis?
- L324: I am missing the bridge in this analysis between ocean initialization based on “full-field initialization” and “more realistic forcing of the NAO”. There is no word about atmospheric initialization; curiously, it is not even mentioned in Table 1.
- Figure 11: The caption text about the sensitivity of the stratification should be in the main text, not in the caption. Instead, the range can be explained in the caption, with a value of 0 meaning less stratified and 1.5 more stratified, or similar. In the legend the green star is not filled; what does this mean? I suggest another sensitivity test with the box confined only to the LS, as e.g. in Menary et al 2016 (https://doi.org/10.1002/2016GL070906), because in the very first figure almost all the models have skill in the LS. Thus, the analysis in Figure 11 contradicts Figure 1, with half of the models now suggested to have no skill in the LS. The same region as in this study is termed the western part of the SPG in Hermanson et al 2014 (https://doi.org/10.1002/2014GL060420). Maybe the authors need to reconsider naming the region the “LS”, or recalculate the analysis for a smaller region that focuses only on the LS.
- L360: “In models that have stronger climatological surface heat fluxes in the LS”. Name those models.
- L360-363: The sentence is not clear. The ACC in NorCPM is highly anti-correlated with EN4, likely suggesting the opposite trend (as follows from Figure 3). What does this have to do with the “lower percentage of the observed OHC700 variance”? There are too many indirect interpretations in the last paragraphs of the results section. Please make sure to refer to the figures and analysis, so that the reader can follow the authors’ line of thought.
- L375, L410: In Figure 11, the multi-model mean ACC value is less than 0.5; it seems to be about 0.4, which means even less explained variance than claimed in the study (25%). The MRI model has a skill of 0.5.
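As an illustration of the residual-correlation diagnostic suggested for Figure 5 above, here is a minimal sketch in the spirit of Smith et al. (2019): an estimate of the forced signal (e.g., the multi-model mean of the uninitialized simulations) is regressed out of both the forecast and the observations, and the residuals are then correlated. The variable names are placeholders, not the authors’ code:

```python
import numpy as np

def residual_acc(forecast, obs, forced):
    """Correlate forecast and observations after removing the forced signal.

    forecast, obs, forced : 1-D anomaly series over the verification years;
    `forced` is an estimate of the externally forced signal.
    """
    def residual(x, f):
        x = x - x.mean()
        f = f - f.mean()
        beta = np.sum(x * f) / np.sum(f**2)  # least-squares regression slope
        return x - beta * f                  # residual = internal variability
    rf = residual(np.asarray(forecast), np.asarray(forced))
    ro = residual(np.asarray(obs), np.asarray(forced))
    return np.sum(rf * ro) / np.sqrt(np.sum(rf**2) * np.sum(ro**2))
```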
Conclusions and Final Remarks
- L383: From where does it follow that the observational uncertainties in the Labrador Sea are low?
- L390: The subpolar North Atlantic is not the region where models show no skill in Figure 1. The SPG is usually defined from 45N (or 50N) northward, while the region where the ACC has negative values in this analysis already starts at 30N (or 35N) in most of the models. It is more the region of the Gulf Stream path and of the Gulf Stream separation in many models. It would be more meaningful to specify the coordinates, latitudinal bands, etc., rather than naming regions incorrectly.
- L391: “It is unclear how much of this low skill is due to the large local observational uncertainties.” What exactly is meant by this sentence? That the verification dataset is uncertain and so is the skill estimate, or that the initial conditions are uncertain and thus the predictions diverge too much from the true initial state? All of this might hold, and not only that: model biases, initialization issues and limits of predictability are also reasons for low prediction skill. E.g., in seasonal predictions, deficiencies in predicting the position of the atmospheric jet stream are an issue; we have the same thing in the ocean in decadal predictions, with the currents’ pathways (Gulf Stream separation and pathway). In this respect, it is not clear to me why only this one skill-limiting reason (“local observational uncertainties”) is mentioned in the summary.
- L399: “using multi-model approaches” for what purpose exactly? Operational climate predictions are carried out at national centers with a single model. From Figure 5, most of the models have decadal prediction skill that is higher than that of the historical simulations (L397-398). A skill score for Figure 1 could show the quantitative difference between initialized and uninitialized simulations (a minimal sketch of such a score follows this list). As it is now, the conclusion is not convincing.
- L405: Consider discussing the paper by Hegerl et al 2021 (https://doi.org/10.3389/fclim.2021.678109), which shows that models simulating a more realistic SPG stratification have higher SST prediction skill than those simulating a less realistic stratification.
- L413: Some models are initialized only in the ocean (e.g., NorCPM), others are also initialized in the atmosphere (e.g., HadGEM3). Information about this is not given in the manuscript. How does this difference across models (initializing or not initializing the atmosphere, full-field vs. anomaly) affect, or how could it affect, their performance?
- L420-421: From Figure 1, it does not follow that the MPI-ESM model is among the best performing “in the whole North Atlantic”; its skill is clearly lower than that of other models.
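To make the L399 point concrete, here is one hedged sketch of a mean-squared-error skill score (MSSS) that would quantify the added value of initialization over the historical ensemble; the input names are assumptions:

```python
import numpy as np

def msss(pred_init, pred_hist, obs):
    """MSSS of the initialized predictions against the historical baseline.

    All inputs are 1-D anomaly series over the same verification years.
    MSSS > 0 means the initialized forecast beats the uninitialized one.
    """
    mse_init = np.mean((np.asarray(pred_init) - np.asarray(obs)) ** 2)
    mse_hist = np.mean((np.asarray(pred_hist) - np.asarray(obs)) ** 2)
    return 1.0 - mse_init / mse_hist
```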
---
- Minor:
L29: Rephrase “help overcome”
L81: “and” not “or”.
L211: There is some problem with this sentence.
L348: “Understanding uncertainties and predictability in the externally forced LS OHC700”
L360: “(Figure 11c)”
L390: “negative skill” not “negative skill score”. No “skill scores” have been shown in the manuscript.
L393: “inter-model differences in terms of the skill spread”. It is necessary to be precise about what differences are meant.
Citation: https://doi.org/10.5194/egusphere-2024-1569-RC1
CC1: 'Reviewer Comment on egusphere-2024-1569', Didier Swingedouw, 29 Aug 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-1569/egusphere-2024-1569-CC1-supplement.pdf
RC2: 'Comment on egusphere-2024-1569', Didier Swingedouw, 18 Sep 2024
This paper analyses the variations of the ocean heat content averaged over the upper 700 m of the ocean (OHC700) in an area surrounding the Labrador Sea region for the period 1970-2014. For this purpose, it uses the available observational data and two multi-model sets of climate simulations, one with only the external forcing (historical simulations) and the other with decadal hindcasts starting from observation-based estimates. The analysis shows a very wide range of responses in the models, especially for the historical simulations. The authors estimate the skill of the different systems in reproducing the OHC700 in the decadal predictions and historical simulations, and find an interesting link between this skill and the capability of the models to reproduce the observed mean state of the stratification and ocean heat fluxes in the Labrador area.
This is an interesting and well-written paper. The analysis presented is impressive given how difficult it is to deal with so much climate model data. The interpretation of the results is wise and useful, even though no definitive conclusions can be drawn from this type of multi-model analysis. At the very least, it presents an interesting intercomparison of the ability of present-day models to reproduce heat storage in the Labrador Sea area, and a few interesting predictors that might be of use for observational constraint approaches.
I therefore think this paper is suitable for publication. I mainly have some comments that might help strengthen the demonstration and possibly improve the interpretation of the results.
- Line 35-40: Here the authors are mixing discussions about the subpolar gyre and the wider North Atlantic, and about ocean heat content and SST. It might be worth being a bit more specific in the description of those papers.
- Line 61: A reference after “forecast range” might be useful to support this claim.
- Line 202: The LS, as represented in Figure 2, does not entirely correspond to the Labrador Sea but extends far to the east, including the Irminger Sea for instance. In this respect, the agreement between observation-based datasets is not that clear to the east (cf. Figure A.1), while the good agreement is taken as a reason to focus on this region in line 204. Please clarify. Have the authors tried a more confined region?
- Line 249-253: Ocean stratification and heat fluxes are two variables clearly linked in the convection region. If the halocline is too strong, convection is not allowed and heat fluxes can lead to sea-ice formation. It might be worth stating this coupling between the two variables (maybe in the discussion).
- Line 266: It is said in line 155 that density is computed with a reference at 1000 m (sigma_1), while in Figure 6 the caption talks about a reference to the surface. Given that the numbers in Figure 6 are larger than 28, I assume this is actually sigma_1. This choice is surprising given that the authors are focusing on the very upper layer. I think it might be better to consider sigma_0, as stated in the caption (which is not what is shown); see the sketch after this list for the difference.
- Line 291-296: Why are the observations not shown in Figure 7?
- Line 395-400: The use of the residual ACC (Scaife & Smith 2018) might be interesting as well. I wonder whether this would work for a complex quantity like OHC700, especially given the complexity of its forced response. A discussion of this aspect might be interesting here, I think.
- Line 412-416: I have the feeling that this aspect has not been much developed in the results section, so this discussion seems to come a bit out of the blue. It may be useful to add a few points on this in the results section.
- Line 431: Yes, the omission of advective processes is clearly a gap in this paper, but I understand that it is far from easy to obtain those quantities for such a large ensemble of simulations. What about citing Ortega et al. (2015), which also discussed this type of processes in detail? Plus a typo at “mechanisms”.
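To illustrate the sigma_0 vs sigma_1 point from the Line 266 comment above, a minimal sketch using the TEOS-10 gsw package; the profile values are made up, and the point is that sigma_1 (referenced to 1000 dbar) exceeds 28 kg m-3 for typical Labrador Sea water, while sigma_0 stays below it:

```python
import gsw  # TEOS-10 Gibbs SeaWater toolbox

# Hypothetical Labrador Sea profile point (values are illustrative only)
SP, t, p, lon, lat = 34.85, 3.5, 50.0, -55.0, 58.0  # psu, degC, dbar, deg
SA = gsw.SA_from_SP(SP, p, lon, lat)  # Absolute Salinity
CT = gsw.CT_from_t(SA, t, p)          # Conservative Temperature

print(gsw.sigma0(SA, CT))  # potential density anomaly ref. 0 dbar, roughly 27.8
print(gsw.sigma1(SA, CT))  # potential density anomaly ref. 1000 dbar, roughly 32.3
```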
Citation: https://doi.org/10.5194/egusphere-2024-1569-RC2