Status: this preprint is open for discussion and under review for Nonlinear Processes in Geophysics (NPG).
Predicting the distance of the AMOC to its tipping point using CNNs
Francesco Guardamagna, Sacha Sinet, and Henk A. Dijkstra
Abstract. The Atlantic Meridional Overturning Circulation (AMOC) is an important tipping element of the climate system, with the potential to undergo an abrupt transition from its present strong state to a weak state. Such a collapse would have severe global consequences, including regional cooling, sea-level rise, altered precipitation patterns, and cascading impacts on other climate tipping elements. Both statistical and physics-based early warning signals (EWS) of an approaching AMOC tipping event have been proposed. Here, we introduce a convolutional neural network (CNN)–based framework designed to predict the distance of an AMOC state to its tipping point under imposed freshwater flux forcing. We first evaluate the CNN model using simulations from the Earth System Model of Intermediate Complexity CLIMBER-X. We then test its generalization capabilities by applying the CNN model, trained on CLIMBER-X data, to the AMOC tipping trajectory obtained recently in the Community Earth System Model (CESM). Explainable AI methods are used to identify the spatiotemporal features most relevant to the predictions. Our results demonstrate the potential of deep learning to provide reliable estimates of the distance to the AMOC tipping point and generalize across models of varying complexity.
The Atlantic meridional overturning circulation (AMOC) is known to exhibit multiple equilibria and tipping points, which can be observed in climate model simulations where a freshwater flux is applied in the North Atlantic. In this draft, the authors propose inferring the accumulated freshwater flux applied in the North Atlantic from the patterns of sea surface temperature, sea surface salinity and salinity cross-section at 35°S in the Atlantic Ocean. This inference uses data from CLIMBER-X (an earth system model of intermediate complexity) and CESM (a low-resolution climate model). The accumulated freshwater flux is normalized in each model by an estimate of the total accumulated freshwater flux required to trigger a tipping point. Here, the method employs convolutional neural networks (CNNs). The results obtained are discussed using a sensitivity analysis.
The draft presents original research; however, significant work is needed to make it publishable. Specifically, the scientific question and the research hypotheses are not clearly presented in the introduction. The results presented are overly technical and lack synthesis. The sensitivity analysis fails to reveal consistent patterns. The conclusion and discussion are underdeveloped.
Research hypotheses
The data from only two models are used. How representative are these two models? Are their outputs consistent with those of other models concerning the AMOC tipping and its SST and salinity signatures?
Is the accumulated freshwater flux leading to a tipping AMOC well estimated in the simulations? The manuscript lacks clear explanations of how the AMOC collapse or tipping is defined or estimated. Would it be valuable to incorporate some uncertainties?
Data from hosing experiments may differ significantly from the real world. For instance, hosing experiments often involve a shallowing of the mixed layer in the hosing region, whereas the real-world freshwater flux from melting ice sheets has a different pattern. This limitation needs to be discussed, as the machine learning method developed may fail when applied to observational data.
Methods and baseline
A linear regression model is given as a baseline. However, given the high dimensionality of the input data, the linear regression model may overfit. I suggest using a ridge regression or a random forest model instead.
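The overfitting concern can be illustrated with a minimal sketch on synthetic data. Everything below is an assumption made for illustration (the sample sizes, the sparse "true" weights, the penalty `alpha`, and the helper names); it is not the authors' LR baseline or their data. With more features than samples, unregularized least squares can fit the training set essentially exactly, while the ridge penalty shrinks the weights and no longer interpolates the noise.

```python
import numpy as np

def fit_least_squares(X, y):
    """Unregularized baseline: minimum-norm least-squares solution."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression: solve (X^T X + alpha I) w = X^T y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ y)

def rmse(X, y, w):
    """Root-mean-square error of the linear prediction X @ w."""
    return float(np.sqrt(np.mean((X @ w - y) ** 2)))

# Synthetic, hypothetical setup: far more features than training samples,
# mimicking flattened gridded fields used as regression inputs.
rng = np.random.default_rng(0)
n_train, n_test, n_features = 40, 200, 400
w_true = np.zeros(n_features)
w_true[:5] = 1.0  # only a few features carry signal (assumption)

X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = X_train @ w_true + 0.1 * rng.normal(size=n_train)
y_test = X_test @ w_true + 0.1 * rng.normal(size=n_test)

w_ols = fit_least_squares(X_train, y_train)
w_ridge = fit_ridge(X_train, y_train, alpha=10.0)  # alpha chosen arbitrarily

# Columns: model, train RMSE, test RMSE, weight norm
for name, w in [("least squares", w_ols), ("ridge", w_ridge)]:
    print(name, rmse(X_train, y_train, w), rmse(X_test, y_test, w),
          float(np.linalg.norm(w)))
```

In practice the penalty would be selected on the validation trajectories rather than fixed by hand, and the same train/validation/test split as for the CNN would be used so that the comparison remains fair.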
Another point concerns the normalization. L243: "CESM inputs are normalized using the mean and standard deviation of the CLIMBER-X training data". In climate science, model data often show large differences, as each model has its own biases. I understand that the mean state of CLIMBER-X was removed from the CESM data to define anomalies. If this is correct, this approach would emphasize the difference between CESM and CLIMBER-X rather than the distance to tipping. A better approach might be to use anomalies defined with respect to a reference period in each model. Additionally, dividing by the standard deviation of CLIMBER-X could lead to high variability in the CESM data if the standard deviation of CLIMBER-X is much smaller than that of CESM. Can the authors compare the standard deviations of the two models?
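The two normalization choices can be compared in a small sketch. The arrays below are synthetic stand-ins with invented means and spreads, not CLIMBER-X or CESM output, and the 100-step reference period is an arbitrary assumption. The sketch only shows the mechanism: standardizing one model with another model's statistics leaves an offset set by the inter-model bias and rescales the variance by the ratio of the standard deviations, whereas a per-model reference period removes both.

```python
import numpy as np

def standardize(field, ref_mean, ref_std):
    """Standardize a (time, grid point) array with the given reference statistics."""
    return (field - ref_mean) / ref_std

rng = np.random.default_rng(1)

# Hypothetical SST-like series (time x grid points) for two models with
# different mean states and different variability (values are invented).
model_a = rng.normal(loc=15.0, scale=0.5, size=(500, 64))  # "training" model
model_b = rng.normal(loc=17.0, scale=1.5, size=(500, 64))  # "target" model

# Statistics of the training model, per grid point.
a_mean, a_std = model_a.mean(axis=0), model_a.std(axis=0)

# Statistics of the target model over its own reference period
# (here the first 100 time steps, an arbitrary choice).
reference = model_b[:100]
b_mean, b_std = reference.mean(axis=0), reference.std(axis=0)

b_cross = standardize(model_b, a_mean, a_std)  # other model's statistics
b_own = standardize(model_b, b_mean, b_std)    # own reference period

print("cross-model:", float(b_cross.mean()), float(b_cross.std()))
print("per-model:  ", float(b_own.mean()), float(b_own.std()))
```

Reporting the two standard deviations side by side, as requested above, would show directly how large the rescaling is for each input field.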
Selection of results discussed
The authors should explicitly refer to each figure panel, line and symbol to help readers verify statements and hypotheses. Currently, it is difficult to follow what is being discussed. Are the claims based on figures, hypotheses or suggestions? The results are presented as an exhaustive list of figures, which is overwhelming. I suggest reducing the number of figures. For instance, Figs. 6, 7, 8, 9 and 10 can be reduced to show the sensitivity analysis for a single freshwater forcing and CLIMBER-X, with the other results summarized in the text. The same remark applies to Appendices B and C, which could be removed or significantly condensed. For Appendix C, graphs might be more effective than tables for visual clarity.
The unclear presentation and difficult readability made the review challenging, so I primarily focused on the first 3 sections for minor comments.
Description of method
The training, validation and testing strategy concerning the CESM results is unclear and needs to be clarified.
Introduction and conclusion
The introduction needs to better explain the experiments used in the paper, and their limitations. The conclusion needs to provide comparisons with related recent findings, suggest future perspectives and discuss limitations.
Minor comments
Introduction:
L37-38: "First, it requires prior knowledge about the freshwater forcing rate, which is a significant constraint for real-world applications." The term "freshwater forcing" is ambiguous. Clarify whether it refers to a freshwater flux imposed in hosing experiments, or to the climatological freshwater forcing from the surface water budget (e.g., precipitation minus evaporation and runoff).
L38-30: "the model must be trained on data from the simulation itself, extending up to 100 years before the onset of collapse." Please specify the total number of years used and the physical fields or time series used in the training data.
L39-41: "as the model relies solely on one-dimensional indices, such as the AMOC strength at 26°N as input, it precludes the application of explainable AI techniques". Why do the authors argue that AI techniques cannot be applied to indices? Provide justification or revise the statement.
L68-69: "matches many aspects of state-of-the-art CMIP6 models across diverse forcings and boundary conditions (Willeit et al. (2022a))." Can the authors be more specific and describe which processes have been validated and for what types of experiments?
L74: "the AMOC collapses once the freshwater forcing reaches F_H^C = 0.22 Sv". How was the threshold determined? The manuscript does not provide sufficient explanation. Also define AMOC collapse; this term is used loosely in the literature. Specify the criteria for collapse (threshold, or detection of a bifurcation). I suggest that the authors show and illustrate the results supporting this value.
L78: "The Community Earth System Model (CESM) is a fully coupled GCM." Clarify the differences between CESM and CLIMBER-X. What processes are included in each model?
L86-87: "Under this forcing, van Westen et al. (2024a) estimated that the AMOC reaches its tipping point at model year 1758, when the freshwater input into the North Atlantic reaches F_H^E = 0.53 Sv." How was the tipping point estimated? Is saying that the AMOC collapsed equivalent to saying that the AMOC reached a tipping point? What does the model year 1758 refer to? Does the simulation start at year 0? Provide context for the timeline.
L112-113 and Fig. 1: "The CNN is trained using Sea Surface Temperature (SST) and Sea Surface Salinity (SSS) fields across the Atlantic Ocean (spanning from 90°N to 35°S)". Define the Atlantic Ocean boundaries. Most definitions do not extend beyond 80°N (e.g., Fram Strait). The Arctic is included (see Fig. 1); justify this choice. Why do the authors choose to use SST and SSS as input? Why not include subsurface ocean data, which may provide additional predictive skill?
L114: "the full-depth salinity profile at 35°S". Do the authors mean that they used the cross-section at 35°S in the Atlantic Ocean in depth-longitude space? Justify the choice of this latitude and its relevance to AMOC dynamics and the tipping point.
L121: "where F_H(t) denotes the freshwater flux value". Define freshwater flux. Does it refer to an additional freshwater flux (e.g., hosing), or is it the actual total freshwater flux (precipitation minus evaporation plus runoff)? Is it the same as the forcing rate given at L129 and L130? Also clarify the units of both terms.
L129-130: "For all forcing rates, d_F(t) is defined with respect to a freshwater flux at tipping F_H(t_p) = F_H^C = 0.22 Sv, which corresponds to the tipping point identified for the slowest forcing experiment". How was the tipping point identified? Why use the same F_H(t_p) for all forcing rates? Would the results differ if F_H(t_p) varied? A further question: would the results be different if d_F(t) were defined as (t-t_0)/(t_p-t), where t_0 is the time of the initial conditions?
L130: Why is the unit of the freshwater forcing given in Sv yr⁻¹? Clarify the units. A freshwater flux is typically expressed in Sv. Is the forcing a rate of change?
L135: "then evaluated on the trajectory excluded from both training and validation". Define trajectory. Does it refer to a time series from a single simulation? Please specify the data used for evaluation.
L161-162: "The reported predictions are the median across 20 independent training trials; variability across trials is negligible and therefore not shown." Justify the need for multiple trials, and reduce the number of trials if the results are effectively deterministic.
L157-159: "The LR model is trained using the same input variables and target (d_F), again following the procedure outlined in Section 2.4." The LR model may overfit when using such high-dimensional input. Did the authors try to apply regularization or to reduce the dimensionality of the input? If not, address this limitation and switch to a more robust baseline.
L172-173: "corresponding to a prediction uncertainty of 961 years (9.61 × 10⁻³ Sv if we express the error in terms of freshwater forcing)". Clarify the calculation. How was the value of 961 years derived? How does this translate to 9.61 × 10⁻³ Sv?
L179-180: "with very low percentage errors relative to the total span of the test simulations." Can the authors explain this more clearly? What is the length of the test simulation?
L188-189: "the LR provides skillful predictions". Define skillful. Is this based on a statistical metric?
L213-215: "The primary advantage of the CNN lies in its generalization capability. When trained on CLIMBER-X data and evaluated on the more complex CESM model, the LR model fails to provide reliable predictions, whereas the CNN demonstrates robust generalization performance (see Section 3.2)." The claim about CESM results is premature, as these results have not yet been presented at this point in the manuscript.
L219-220: "For r_F = 10⁻⁴ Sv yr⁻¹, the collapse of the AMOC is initiated approximately 200 years before the system reaches its actual tipping point, marking the onset of a regime shift". Which figure illustrates this result? Define AMOC collapse. Explain "initiated". Does this refer to the start of a decline?
Appendix B: Reduce the text and figures to focus on key results. Improve the explanations to highlight the most important findings.
Fig. 3, legend: There is no need to explain what a boxplot is; this is common knowledge. However, I suggest keeping the definition of extremes and error bars.
Fig. 3: I suggest using a consistent color for models using the same inputs. For instance, show the CNN using SST only as a yellow boxplot and the LR using SST only as a yellow triangle.
Fig. 3: What is shown here? Is it the evaluation of the model on the test simulation, with the six other simulations used for training and validation?
L246-251: The authors explained at L226-232 that CESM data were only used for evaluation. Why, then, are the hyperparameters modified using CESM data?
L248: "the first 430 years are discarded to remove transient effects". Define "transient effects".
L263-268: "Despite these measures, some variability remains in the validation result" and "In what follows, we present results obtained with the best-performing configuration on the CESM test set (last 880 years)." The large variability in the results suggests that there is a large uncertainty in the inference. I suggest quantifying this uncertainty.
Fig. 4: Where is the training data in panel (a)? How is the tipping detected in panel (a)?
L277: Here 500 trials are mentioned, while 50 trials are mentioned at L255. Why 500? Clarify the purpose of such a large number of trials.
L344-347: Here, and more generally in this subsection, can the authors refer to the figure and panel for each statement?
L374: "SV relevance maps". Explain the acronym SV.
L390: "the magnitude is quantified using Sen's slope estimator." Justify the choice of Sen's slope estimator over a linear trend.
L409-410: "First, we note that patterns are consistent with those reported by Stouffer et al. (2006), who conducted an inter-comparison of several EMICs, including an earlier version of the CLIMBER-X model used here." Update the discussion to include more recent papers for context.
Sections 4.2 and 4.3: Refer to figure panels when describing results to improve readability. Maybe focus on only one freshwater forcing, and show the sensitivity for the SST, SSS and S35S for CLIMBER-X only.
The relevance scores in Figs. 6, 7, 8, 9 and 10 are quite noisy and suggest that the CNN may not be learning physically consistent features. Discuss the implications for the skill of the CNN. Can this be linked to the large variability obtained?
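Regarding the comment on L390, the difference between Sen's slope and an ordinary linear trend can be made concrete with a minimal sketch on a synthetic series. The series, the outlier and the helper names are invented for illustration and are not taken from the manuscript's relevance maps. Sen's slope is the median of all pairwise slopes, so a single outlier barely moves it, whereas the least-squares slope is pulled toward the outlier.

```python
import numpy as np

def sens_slope(t, y):
    """Sen's slope: median of all pairwise slopes (y_j - y_i) / (t_j - t_i), i < j."""
    i, j = np.triu_indices(len(t), k=1)
    return float(np.median((y[j] - y[i]) / (t[j] - t[i])))

def ols_slope(t, y):
    """Slope of the ordinary least-squares linear fit."""
    return float(np.polyfit(t, y, 1)[0])

# Synthetic series with an exact linear trend of 0.02 per time step.
t = np.arange(50, dtype=float)
y = 0.02 * t

# Same series with one large outlier at the final time step.
y_outlier = y.copy()
y_outlier[-1] += 5.0

print("clean series:  ", sens_slope(t, y), ols_slope(t, y))
print("with outlier:  ", sens_slope(t, y_outlier), ols_slope(t, y_outlier))
```

SciPy provides a library implementation of the same estimator as `scipy.stats.theilslopes`. If robustness to noisy relevance scores is the authors' motivation, stating it explicitly would answer the question above.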
Model code and software
Code for reproducing the results of the paper "Predicting the Distance of the AMOC to Its Tipping Point Using CNNs", Francesco Guardamagna, https://doi.org/10.5281/zenodo.19369578
The Atlantic Meridional Overturning Circulation (AMOC) is an ocean current that redistributes heat in the Atlantic Ocean and may abruptly weaken under climate change, with impacts including cooling in Europe and sea-level rise along the North American East Coast. Freshwater input from melting ice into the North Atlantic can push the system toward a tipping point. We introduce an artificial intelligence-based method to estimate the distance to this tipping point in terms of freshwater forcing.
L600: One reference is missing.