This work is distributed under the Creative Commons Attribution 4.0 License.
Catalogue of Strong Nonlinear Surprises in ocean, sea-ice, and atmospheric variables in CMIP6
Abstract. The Coupled Model Intercomparison Project Phase 6 (CMIP6) archive was analysed for the occurrence of Strong Nonlinear Surprises (SNS) in future climate-change projections. To this end, we built an automated detection algorithm to identify SNS in a reproducible manner. Two different types of SNS were defined: abrupt changes measured over decadal timescales, and slower state transitions too large to be explained by the forcing without invoking strong internal feedbacks in the climate system. Data of 54 models were analysed for five shared socio-economic pathways for ocean, sea ice, and atmospheric variables. The algorithm isolates regions of at least 10⁶ km² and utilizes stringent criteria to select SNS. In total 73 SNS were found, divided into 11 categories, of which 4 apply to abrupt change and 7 to state transitions. Of the identified SNS, 45 % relate to sea-ice cover, 19 % to ocean currents, 29 % to mixed layer depth, and 7 % to atmospheric systems like the Intertropical Convergence Zone. For each category, probability density functions for time windows of maximal change indicate SNS occurring earlier and at lower global temperature rise than assessed in previous reviews, in particular those associated with winter Arctic sea-ice disappearance, northern North Atlantic winter mixed layer collapse, and the subsequent transition of the Atlantic Meridional Overturning Circulation (AMOC) to a weak state in which the cell associated with North Atlantic Deep Water has vanished. This catalogue emphasizes the possibility of SNS already below 2 °C of global warming, even more than previous assessments based on CMIP5 data.
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-2039', Anonymous Referee #1, 12 Jul 2025
- RC2: 'Comment on egusphere-2025-2039', Anonymous Referee #2, 24 Sep 2025
The manuscript submitted by Angevaar and Drijfhout proposes a cataloguing protocol for identifying strong nonlinear surprises (SNS) in the CMIP6 database. The manuscript describes the method relatively simply (most of the details are in the supplementary material) and then describes the 11 identified categories in succession. This work is clearly useful to the community, as it proposes an objective and unified tool for working on nonlinear events of the climate system, and gives a first overview of the main findings. Yet, because of the global and general approach, it necessarily remains a bit too general, and has difficulty escaping the classical caveat of a relatively lengthy qualitative description from which the reader struggles to come away with a clear idea (this reader, at least). Other than that, the paper is very well written. Figures are clear and clean. They are sometimes perhaps a bit too simple; see my comments below.
Because I think this paper may eventually be an important milestone for the TP community, I recommend major revisions following some of the suggestions below.
1. General comment on the SNS: two types of SNS are introduced (abstract, introduction and methodology). Do the authors consider these two to be exhaustive of nonlinear surprises? If not, what other cases might be considered? How were these selected? If yes, how and why?
Related to that: how do you ensure that "slower transitions" are abrupt or decadal, as claimed in the abstract?
2. General comment on the main message of the text, which also appears at the beginning of the discussion section and at the end of the abstract: it is a bit catastrophist. How many events per simulated year? How dependent are these conclusions on the detection tool itself (did you test it on CMIP5)? How realistic are these findings, given the models' biases in terms of mean state and spread in terms of climate sensitivity? How can the risk be estimated?
3. Cataloguing: it is interesting that specific events in specific models are described. Yet although some of them are illustrated, most of the time by a single time series, many others are not, and in general the behaviour of other related variables discussed in the text is not shown. This makes the various paragraphs describing the various SNS difficult to read and unconvincing, the reader being expected to take the authors at their word. I am not sure how to handle that, as once again I believe that a bit of in-depth discussion of each of the SNS is interesting; otherwise we would have no more than a methodology paper. Perhaps adding a few related variables to the figures illustrating the various cases would help?
4. Also, it is not easy for the reader to follow which models show which type of SNS and whether some models seem to show more SNS, or a cascade of SNS. The authors sometimes discuss such cascades here and there in the text (SSS related to MLD, for example), but this is not systematic and not recalled anywhere synthetically. Would it be possible to improve that? Something along the lines of Fig. 16 and the related comment in the discussion section, but for models, perhaps?
Minor comments
- 52: I don’t see much machine learning in the detection protocol. Please rephrase.
- 55-56 "truly": I don't see any assessment of the realism or robustness of the findings. Please clarify what you mean by "truly" here.
- 69: capital C missing (cataloguing)
- 96: historical run (why singular?) combined with the scenarios: this step requires a bit more detail. How did you pair members? Could the protocol find some artificial SNS because of the way scenario members are generated in 2014 and paired with historical members?
Around l. 105 and following: regions selection:
I suggest using bullet points to describe the four approaches, for more clarity in the reading.
A bit of explanation of why criteria 2-4 exist (which may look a bit redundant or similar at first sight) would be welcome.
- 125 and following. It is a pity that 6 criteria have to be listed with 3 specific lines for 3 variables. I suggest ranking, and perhaps changing a bit the order in which, the criteria are presented: (iii) should appear as a sub-criterion of (ii), I guess, and (iv) and (v) as sub-criteria of (vi), no? Or list (vi) before (iv) and (v) so that the list goes from the more general to the more specific. Also, I find that (iv) and (v) are not really justified, as (vi) does not say anything about the structure of the data. The fact that the data have a different structure is the developer's business, and should not appear in this list of general criteria, I think.
Clarifying this list either through reducing it or ranking it would strongly enhance the impact of the method.
- 151: how is this normalization performed?
- 156 and following: I would write the number of events concerned by each category in the title of each subsection (A = etc.)
- 161: polar amplification -> perhaps cite XX to be more complete?
- 164-165: please clarify this sentence and explain better which types of events are in category A and which are in a. It is the first time these letters are mentioned in the text, I think. Furthermore, the beginning of the paragraph discusses the cases in general and I am not sure why CanESM5 is singled out here (see l. 165 "In this model").
L 182: the logical link between this "outlier" and what precedes is unclear to me. The previous sentence was describing positive feedbacks favouring sea-ice melt. The two examples that follow rather concern impacts of sea-ice SNS, don't they?
- 200 and following: do you want to speak of SNS cascades?
- 215 and following (but applying to most cases): is there something in the way sea ice is represented, or a mean-state bias, that could explain the specific behaviour of these few models?
- 227 and following: I wonder if this example is well placed. Shouldn’t it appear rather in a section focusing on MLD SNS?
- 245: wasn't criterion (v) defined specifically for this purpose?
- 253-254: this allusion to Swingedouw et al 2021 is largely a repetition of what precedes I think. Remove the sentence?
L 257: not clear to me why the GISS model is suddenly specifically cited here.
- 310-311 “this site is not particularly known for deep convection”: but how is it in this model? This relates to my general comment on the models mean state.
Around l. 315: I acknowledge the discussion on the models' systematic mean-state biases. I think this is very useful and could appear in other instances in the manuscript.
- 321: Comparison to Swingedouw et al 2021: we don’t really know who to believe. All this is relative to the detection tool
- 325: add a “probably” in this sentence
- 330: "is" (or "was") missing before "identified".
- 346: "in CMIP6". Are you really able to generalize that much? "in some CMIP6 models" rather?
L 400 and following: the monsoon system is often described as having a high potential for TP. Could the ITCZ transition be a precursor of, or linked to, such a TP?
L 410: is this bias only present in one member of the IPSL model? If not, why would the transition be linked to biases then?
L 427 and following: given all the differences described here, plus the argument on the size of the region (which maybe should be added here): can one really speak of an increase? I think this is a bit overstated
L 441 and following: I suggest repeating which physical component of the climate system the lettered categories relate to.
L493-494: sorry I don’t understand (or don’t know) this notation, please explain.
Citation: https://doi.org/10.5194/egusphere-2025-2039-RC2
RC1: 'Comment on egusphere-2025-2039', Anonymous Referee #1, 12 Jul 2025
General comments
This manuscript describes a catalogue of Strong Nonlinear Surprises (SNS) in ocean, sea ice and atmospheric variables in CMIP6. The authors expanded on the methodology of a previous assessment on CMIP5 by Drijfhout et al. (2015) by automating the detection of SNS and including an algorithm to combine grid cells into spatially connected regions with SNS. They have a set of 6 categories of SNS, including abrupt changes and state transitions.
The developed method substantially improves the previous method, specifically by automating and including a spatial algorithm. The algorithm performs very well, and the authors are able to successfully capture large SNS in the data. The results are of great interest and highly valuable to the community. The results lead to new insights and have a high potential to stimulate further research and discussion within the field on abrupt dynamics in the climate system. The manuscript could benefit from a clearer description of the methods, careful framing of the results, and a reorganized and more substantive discussion.
Major comments
1.
Could the authors please clarify in the methods section how exactly the regions are determined. Specific points to consider here are the following.
Thresholding is used to select different regions. How are these initial regions created/selected? Specifically:
- Is the percentage threshold based on the very first and last values of the time series, or on a smoothed time series/average over n years? This could make a difference for variables with high variability.
- For the third region-finding approach, what is the reasoning for multiplying the percentage scores?
- In the third phase, formal criteria are applied to the selected regions. Does this merge regions of the same region-finding method, or does it merge any regions regardless of the region-finding approach? If so, will this lead to "smoothing out" of SNS events?
- Also, what is the point of having higher thresholds if the different types of regions are merged? Why does it not work to only use the lowest threshold of T = 85 %?
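To make the merging question concrete, here is a minimal sketch of the kind of threshold-plus-connected-components step we are asking about (hypothetical scores, grid, and thresholds; this is our reading of the method, not the authors' code):

```python
import numpy as np
from scipy import ndimage

def label_regions(score, threshold):
    """Label spatially connected grid cells whose change score exceeds
    `threshold`. The 4-connectivity is an assumption on our part."""
    mask = score >= threshold
    labels, n_regions = ndimage.label(mask)
    return labels, n_regions

# Hypothetical per-cell change scores on a 2-degree grid.
score = np.random.default_rng(0).random((90, 180))

# Two regions that are distinct at T = 0.95 can already be one connected
# region at T = 0.85, so merging regions found at different thresholds
# could smooth distinct SNS events into one -- hence the questions above.
labels_85, n_85 = label_regions(score, 0.85)
labels_95, n_95 = label_regions(score, 0.95)
```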
2.
It is not fully clear what choices the authors made in arriving at the 6 different SNS categories and how to interpret them, i.e. in what ways they are similar or different. What is, for example, the difference in interpretation between categories i and ii? Type ii is towards the end of the time series, and it can therefore be less robustly tested whether the change is persistent. Should this then be interpreted differently from a "real" abrupt change event i? Please give a short explanation of what the authors regard as a state transition/new state (criteria iv to vi).
In addition, the manuscript would benefit from more robust reasoning for the different categories and differentiation between abrupt changes and state transitions. Categories iii to vi are concerned with state transitions instead of abrupt shifts. However, when looking at the detected time series, the SNS often seem abrupt (e.g. sea ice “A” and “a” both change abruptly relative to the timescale of their normal dynamics). What is the motivation for separating these? With regards to the criteria of category iii, can the authors explain why they decided on this criterion instead of using vi with an extra requirement of a minimum surface area? Currently, the results sections are divided into abrupt shifts and state transitions for the same systems. Without a clear reasoning on the difference between the two, perhaps the authors can merge the sections for each physical system instead of having this distinction.
3.
Throughout the sections discussing the SNS results, the authors make statements about the mechanisms or forcings of the identified SNS without discussing how they arrived at this conclusion. Can the authors please substantiate the claims they make on this, whether it is based on analyzing the data of multiple variables at the SNS or on literature. We suggest that claims like “forced by”, “caused by”, “leads to”, “driven by” need to be backed up with either references or a note on what is observed in related variables around the SNS.
An (incomplete) list of points where this occurs is given below, and the manuscript would benefit from a thorough check of the whole results section on whether the claims are substantiated.
4.
It would be good if the computation of the CDFs was added to the methods section, instead of only being explained and discussed in the discussion. The results of the global CDFs could then be placed near the end of the results section. This would improve the structure and readability a lot.
Figures 16 and 17 are informative, showing the distributions of global warming at which the SNS occurred. The second panel of Figure 16 shows that there are very few simulations above 6 degrees of warming. The authors currently use a cut-off of 11 degrees, but maybe this should be lowered to 6 degrees. The high-temperature region draws a lot of attention while not being informative due to the very high uncertainty. Moreover, the color palette puts a very strong focus on the SSP585 scenario due to the bright color. In Figure 17, the CDFs of all categories are shown. However, some categories contain just one model simulation. This makes the CDF highly uncertain. Maybe only the CDFs with more than e.g. 5 detected SNS could be shown, or those with more simulations could be separated by a different line style.
Furthermore, in the introduction (line 62), it is stated that PDFs are used to give the likelihood of maximum change. Can the authors explain or provide a reference to why a single simulation can statistically give a likelihood? In Figure 3, the global warming level at the point of maximum change in the SNS is used instead of PDFs. What is the reasoning for not using the same method in both cases? For Figure 3, one could take the global warming level at e.g. the midpoint of the PDF instead.
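To make the statistical concern explicit: an empirical CDF built from the detected events is only as informative as the number of events behind it, and with a single simulation it collapses to one step. A minimal sketch with hypothetical warming levels:

```python
import numpy as np

def empirical_cdf(gwl_at_sns):
    """Empirical CDF of the global warming levels (degC) at which SNS were
    detected. With n = 1 this is a single step, not a usable likelihood."""
    x = np.sort(np.asarray(gwl_at_sns, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p

x, p = empirical_cdf([1.8, 2.4, 3.1, 3.3, 4.0])  # hypothetical values
```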
5.
The discussion contains important points, but it requires an improved structure. A large part of the discussion section is currently occupied with methodology and new results (how to obtain the CDFs and the global warming levels). The manuscript would benefit from moving this part to the methodology and results sections (as described in major comment 4).
The discussion is currently very brief (when the CDF results are not considered). It would be good if the authors linked back to some of the points they mention in the introduction, like the distinction between abrupt shifts and tipping points, and how their results fit into this. In addition, it would be valuable if some discussion on the individual physical subsystems was added, placing their results in the wider literature context including a discussion on future research directions for specific systems.
Lastly, the main conclusion the authors draw is that the number of SNS events rises until a global warming level of 6 °C is reached, where it stabilizes, even though few simulations reach such high temperatures. It is a little unclear how to interpret this rise given that there are fewer simulations at high warming levels. We would recommend rephrasing this conclusion such that it is better supported by the previous discussion.
Specific comments
The authors clearly describe how they distinguish between the terms “abrupt changes/shifts” and “SNS”. They also include a short discussion about the potential harmful effects of the tipping points concept. The strength of the statements regarding the tipping point controversy does not reflect the content of the paper. We believe it is important to discuss the distinction between tipping points and abrupt shifts, but the way it is currently framed might distract from the goal of the paper.
The authors mention in the introduction that they search for events that are “truly surprising”. What is this exactly according to the authors?
It is generally well-argued in the introduction that the goal is to detect large and abrupt changes in the data. However, the authors also argue they want to limit the total detected amount of these events. What is the reasoning behind that? Instead of being guided by the quantity of SNS in the data, the goal now seems to only find the largest changes instead of all large events. This needs more justification.
In the introduction, the authors reference the use of machine learning (line 52). How did the authors make use of machine learning? In the methods section, there is no reference to a machine learning method.
The paragraph at line 85 lists all scanned variables. Why is only one atmospheric variable used? In the introduction it seems that atmospheric variables are also a focus point (also at line 500), which does not come back strongly in the rest of the manuscript.
Why do the authors combine historical data with SSP scenarios? Is this to gather enough statistics for e.g. the Diptest? If so, please mention this in the methods.
Why is global warming calculated with respect to the average temperature from 1850-1880 instead of the preindustrial temperatures from preindustrial control simulations?
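For concreteness, the baseline choice in question amounts to something like the following xarray sketch (hypothetical file name; a sketch of our reading, not the authors' code), as opposed to differencing against a piControl climatology:

```python
import numpy as np
import xarray as xr

# Hypothetical concatenated historical + scenario file.
tas = xr.open_dataset("tas_Amon_model_hist-ssp.nc")["tas"]

# Area-weighted global-mean, annual-mean surface air temperature.
weights = np.cos(np.deg2rad(tas.lat))
gmst = tas.weighted(weights).mean(("lat", "lon")).resample(time="YS").mean()

# Warming relative to the 1850-1880 mean of the same forced run, rather
# than relative to the preindustrial control simulation.
gwl = gmst - gmst.sel(time=slice("1850", "1880")).mean()
```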
Around line 95 the authors mention they only look at yearly averages (except for mixed-layer depth). Some other variables, like sea ice extent, also depend heavily on the season. Summer and winter sea ice likely disappears at different forcing levels. Could the authors analyze summer and winter sea ice separately? By averaging year-round, abrupt changes in summer sea ice are likely missed.
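As an illustration of the seasonal split we have in mind, a short xarray sketch (hypothetical file and variable names; DJF straddling the year boundary is glossed over here):

```python
import xarray as xr

# Hypothetical monthly sea-ice concentration file.
siconc = xr.open_dataset("siconc_SImon_model_ssp.nc")["siconc"]

# Annual means of winter (DJF) and summer (JJA) sea ice separately,
# instead of a single year-round average that can mask summer ice loss.
winter = siconc.sel(time=siconc["time.season"] == "DJF").resample(time="YS").mean()
summer = siconc.sel(time=siconc["time.season"] == "JJA").resample(time="YS").mean()
```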
The category of each detected SNS is often denoted by a letter (either lowercase or uppercase). This is difficult to keep track of. We suggest writing it out instead throughout the whole manuscript since it is difficult to remember every category (e.g. “Abrupt shift in NH sea ice” instead of “category a”).
In figure 3, the abbreviations are not yet explained. The figure shows different locations for abrupt changes versus state transitions (for example, MLD and sea ice). Could this be clarified? Furthermore, the order in Fig 3 does not correspond to the order in which the systems are discussed in the results section. It would help to align this for clarity.
It is not clear what the difference is between section 3.1 and section 3.2. According to the formal criteria they are indeed divided into different categories, but are they really physically different from each other? When looking at the time series in Figure A3, the loss of sea ice is also abrupt, even though they are treated as state transitions instead of abrupt shifts. How big is the overlap between models in sections 3.1 and 3.2?
At line 199-200, the authors mention that the thresholds are reached earlier in CMIP6 with reference to Figure 5. However, this figure does not relate to this statement. It would be good to mention the new temperature range in CMIP6 for this comparison since this is not explicitly mentioned.
Line 258: Over what region is this temperature impact measured? Over the area where the mixed layer collapses?
Line 262: Please add an explicit reference with whom the authors agree.
3.4 and 3.5: In both sections 3.4 and 3.5, changes in the subpolar gyre are discussed. In the first paragraph of 3.4, the authors explain that they do not find any abrupt shifts in SPG convection, but later they do discuss such changes. This is confusing and requires clarification. Furthermore, the comparison with the results of Swingedouw et al. (2021) is framed in 3.4 as if the results do not match, while in 3.5 many of the same models are found to exhibit SNS, only as state transitions instead of abrupt changes (see also major comment 2). Because of the large differences between methods and definitions, we suggest that the authors do not make this statement as strongly. Especially regarding the large area threshold used, smaller scale abrupt shifts (on the order of e.g. the Labrador sea) cannot be found, making a precise comparison nearly impossible. Why did the authors not consider a smaller area threshold for this system, given the scale of the processes relevant for convection?
Line 291: What are these larger regions? How are these obtained?
Line 356: In the text, a comparison is made between this manuscript and other articles with reference to Figure 11. However, this figure does not contain any comparisons; this should be added to the figure.
Line 397: Can a reference be added?
Line 403-404: Why are the transitions associated with model bias? In what sense does the double ITCZ become less pronounced?
In the first paragraph of the discussion, it is mentioned that there is a large increase in number of SNS between this assessment and Drijfhout et al. (2015). How is this statement supported? Both assessments used different methodologies and criteria. Using an automated algorithm could likely have increased the number of detected SNS. It would be interesting to see how many events would be detected if some of the CMIP5 variables were re-analyzed with the new methodology (although this would require substantial work and therefore is not a request to the authors).
Line 475: What is meant by the small bump? It is not clear where in the figure this is visible (there is, however, a small bump at 0 degrees?).
In the discussion, global warming thresholds are given for each category of SNS. Could this be summarized in a table?
Figure 17: What is meant by “maximally changing”? Why does the temperature of the bottom figure range from 0 to 5, while the upper one ranges from 0 to 17?
The authors mention at the end of the introduction that they will compare their results to the assessment of Terpstra et al. (2025). However, in the discussion they do not compare much of the results apart from stating that the CMIP6 thresholds are lower than in CMIP5. The authors could also make a comparison between frequency/thresholds between this manuscript and Terpstra et al. (2025). Even though both use different scenarios, and indeed one-on-one comparison is not possible, would it be possible to go into a bit more detail in the comparison?
Although not strictly necessary, it would be very interesting to have figures with both the time series and spatial extent (like e.g. figures 4, 5, 6) available for all SNS in an online supplement/repository if it does not require too much effort from the authors.
Technical corrections
Add consistent numbering format (e.g. line 324 “seven” and “3”)
Line 22: “Ref” should be the actual reference
Line 56: Remove extra brackets around citation
Line 68: Sentence not starting with a capital letter.
Lines 88-89: clarify what the difference is between msftyz and msftmz since now they both have the same full name (or state they are the same)
Line 96: maybe mention nominal resolution explicitly of Gaussian N90 grid for non-expert readers.
Line 99: TAS is written upper case here, but with lower case at line 90.
Line 131: “Generally, i and vi are generic criteria”. These are types/categories, not criteria.
Line 132 – 133: missing words in this sentence
Line 173: What does “Its similarity” point to? The abrupt change or abrupt shift in the previous sentence?
Line 188: This only occurs in one model, so remove “typically”
Line 206: The abbreviation of ppt is mentioned here but afterwards it is only used in the figures. Maybe this sentence can be removed.
Line 252: Suggestion: "SSS decreasing the surface density" --> "freshening"
Line 254-256: Check the grammar of this sentence. What do the authors mean exactly by “in terms of atmospheric cooling”?
Line 266-267: NorESM2-MM and NorESM2-LM are mentioned in these two lines. Shouldn’t these both be NorESM2-MM?
Line 272: remove comma between number and unit.
Line 284: “unlikely whether” is not clear, maybe rephrase this sentence.
Line 291: “looking to” --> “looking at”
Lines 308-313: There is some repetition in mentioned locations of the transitions in different models.
Line 412: “…also work in Nature” --> “…also are present in nature”
Page 4, footnote 1: “that” --> “than”
In figure 12, the regions of SNS are shown for both SST and SSH. Is it correct that for both variables the regions are exactly the same?
Figure 16: Unit of degree Celsius is not displayed correctly in the pdf.