Quantitative Comparison of Causal Inference Methods for Climate Tipping Points
Abstract. Causal inference methods present a statistical approach to the analysis and reconstruction of dynamical systems as observed in nature or in experiments. Climate tipping points are likely present in several core components of the Earth system, such as the Greenland ice sheet or the Atlantic Meridional Overturning Circulation (AMOC), and are characterized by an abrupt and irreversible degradation under sustained global temperatures above their corresponding thresholds. Causal inference methods may provide a promising way to study the interactions of climate tipping elements, which are currently highly uncertain due to limitations in model-based approaches. However, the data-driven analysis of climate tipping elements presents several challenges, e.g., with regard to nonlinearity, delayed effects and confoundedness. In this study, we quantify the accuracy of three commonly used multivariate causal inference methods with regard to these challenges and find unique advantages of each method: the Liang–Kleeman Information Flow (LKIF) is preferable in simple settings with limited data availability, the Peter–Clark Momentary Conditional Independence (PCMCI) algorithm provides the most control, e.g., to integrate expert knowledge, and Granger Causality for State Space Models (GCSS) is advantageous for large datasets and delayed interactions. In general, data sampling intervals should be aligned with the interaction delays, and the inclusion of a confounder (such as global temperatures) is crucial to deal with the nonlinear response to (climate) forcing. Based on these findings, and given their data masking capabilities, we apply the LKIF and PCMCI methods to reanalysis data to detect tipping point interactions between the AMOC and Arctic summer sea ice, which imply a bidirectional stabilizing interaction, in agreement with physical mechanisms. Our results therefore contribute robust evidence to the study of interactions of the AMOC and the cryosphere.
Status: open (until 12 Feb 2026)
- RC1: 'Review of egusphere-2025-6258', Anonymous Referee #1, 12 Jan 2026
- RC2: 'Comment on egusphere-2025-6258', Anonymous Referee #2, 19 Jan 2026
The authors compare multiple causal inference methods for their applicability to climate tipping points. They do this by applying all methods to different datasets generated from a network of nodes with tipping dynamics and comparing the dependence of the results on certain data parameters. Next, they apply two of the three methods to climate data to study the interaction between the AMOC and Arctic summer sea ice.
This is an important paper for establishing which methods are most suitable for which types of data. However, since the results may depend on the methodological choices made, these choices need to be discussed more clearly and argued for thoroughly. My main comments are given below, and I attach a PDF with my specific comments.
Main comments:
- Please mention the data parameters you vary more clearly and discuss why these are relevant. Also, the description of the generation of the data can be improved (e.g. we link variables through a network, where at each node variability is modelled by an SDE), and I suggest including the delay and explicit time dependence directly. The networks given in the appendix need to be justified up to some level, and some discussion of how sensitive the results are to the choices made would be valuable. Furthermore, how sensitive are your results to parameters like the noise size? It would be valuable to discuss the limitations of the chosen parameter settings and take those into account for the recommendations. And please specify earlier whether or not some variables undergo tipping in the time series you consider.
- The discussion and explanation of the three methods need to be improved. Currently, LKIF and GCSS in particular are not clear to me; e.g., the equations given are not explained (what are X, A, …?). It would be valuable to discuss the choices that can be made when implementing each method and how much the results depend on them.
- There is great value in including a case study; however, I have two concerns relating to the discussed example.
- Firstly, temperature is included as a potential common driver, but then the causal interaction is found to go in the other direction with multiple lags. Have you checked the results when excluding temperature, and the sensitivity of the results to the method parameters? And can you discuss the physical realism of the network, as it states that the AMOC is the main driver of sea ice with temperature only following? Also, please clarify in the main text that you have considered other temperature datasets.
- And secondly, it would be valuable to include an example where GCSS can be used, e.g. using climate model data. This would allow for a more robust comparison and verification of whether GCSS performs as well on real data. Furthermore, in your recommendations PCMCI comes out worst, while here you deem it more reliable. Can you reconcile these conclusions?
In addition to the above comments, it would be valuable to check the structure of the paper, specifically:
- In the introduction you go causal – tipping elements – causal, while I think tipping – causal would be more insightful.
- Throughout the paper there is a lot of repetition. Up to some level this is ok, but some parts are mentioned more than four times. I suggest reducing repetition throughout. E.g., in the introduction you already discuss a lot of results, which I suggest leaving for later, instead focusing on the reasoning for why this is a valuable exercise.
- Lots of paragraphs consist of only one sentence, which I found distracting while reading. Please check for coherent paragraphs.
- The order in which you explain the methods is opposite to what you show in the results figures. I suggest making this consistent.
Model code and software
Quantitative Comparison of Causal Inference Methods for Climate Tipping Points (Software) Niki Lohmann and Nico Wunderling https://doi.org/10.5281/zenodo.17864597
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 209 | 144 | 16 | 369 | 72 | 71 |
Review: Quantitative Comparison of Causal Inference Methods for Climate Tipping Points
General Comments:
In this work, the authors conduct a quantitative investigation into the reliability and robustness of three multivariate causal inference methods: the Liang–Kleeman Information Flow (LKIF), the Peter–Clark Momentary Conditional Independence (PCMCI) algorithm, and Granger Causality for State Space Models (GCSS).
This is done in the context of studying the interactions of climate tipping elements in various facets of the Earth system which pose specific operational challenges. Through the quantitative metric of choice (Matthews Correlation Coefficient; MCC), the authors showcase unique advantages for each method, while also identifying three general principles for addressing nonlinear responses, delayed effects, and confounders during the application of these causal methods to climate tipping points. The use of MCC is natural and justified, as it considers balanced ratios of the confusion matrix in binary classification.
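For readers less familiar with the metric, it may help to recall the standard definition (general background, not taken from the manuscript): for a binary confusion matrix with entries TP, TN, FP and FN,

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)\,(TP+FN)\,(TN+FP)\,(TN+FN)}} \in [-1, 1],$$

where +1 indicates perfect agreement between detected and true causal links, 0 corresponds to chance-level performance, and −1 to complete disagreement; for a 2×2 table, $|\mathrm{MCC}| = \sqrt{\chi^2/n}$, which connects the score to the chi-squared statistic mentioned in the specific comment on Section 2.3 below.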
Following a preliminary study on synthetic data generated by a network of differential equations, they apply LKIF and PCMCI, based on their recommendations, on reanalysis data to detect tipping point interactions between Atlantic Meridional Overturning Circulation (AMOC) and Arctic summer sea ice (ASSI), confirming established physical mechanisms (bidirectional stabilizing interactions) beyond confounding influences (Arctic temperatures).
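To make the synthetic-data setup concrete, the following is a minimal Euler–Maruyama sketch of a coupled tipping-element SDE network with delayed linear couplings and additive noise. All parameter values, the network topology, the delays, and the noise scale below are illustrative placeholders, not the settings of the paper; the actual configurations are those given in the paper's appendices and the Zenodo code linked above under "Model code and software".

```python
import numpy as np

# Minimal Euler-Maruyama sketch of a coupled tipping-element network:
# each node i follows cusp-type dynamics
#   dx_i = (-x_i^3 + x_i + c_i + sum_j d_ij * (x_{j, t - tau_ij} + 1) / 2) dt + sigma dW_i.
# All values below are illustrative placeholders, not the paper's settings.

rng = np.random.default_rng(0)

n_nodes = 4
dt = 0.01                 # integration time step
n_steps = 200_000         # number of integration steps
sigma = 0.1               # additive noise scale (uniform across nodes here)
c = np.array([0.0, 0.1, 0.0, -0.1])            # node-wise control/forcing parameter
d = np.zeros((n_nodes, n_nodes))               # coupling matrix, d[i, j]: link j -> i
d[1, 0], d[2, 1], d[3, 2] = 0.2, 0.15, 0.2
tau = np.zeros((n_nodes, n_nodes), dtype=int)  # interaction delays (in steps)
tau[1, 0], tau[2, 1], tau[3, 2] = 500, 1000, 500

max_tau = int(tau.max())
# Pre-history and initial states at the lower stable state x = -1,
# so that the coupling term (x_j + 1) / 2 vanishes for untipped drivers.
x = np.full((n_steps + max_tau, n_nodes), -1.0)

for t in range(max_tau, n_steps + max_tau - 1):
    for i in range(n_nodes):
        coupling = sum(
            d[i, j] * (x[t - tau[i, j], j] + 1.0) / 2.0
            for j in range(n_nodes) if d[i, j] != 0.0
        )
        drift = -x[t, i] ** 3 + x[t, i] + c[i] + coupling
        x[t + 1, i] = x[t, i] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()

series = x[max_tau:]  # multivariate time series to feed into the causal inference methods
```

Feeding `series` (possibly subsampled so that the sampling interval matches the interaction delays, in the spirit of Recommendation 1) into LKIF, PCMCI, or GCSS then allows the reconstructed links to be scored against the known coupling matrix `d` via the MCC.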
This study is a welcome addition to both the climate tipping literature and to the causal inference community. The structure of the paper is sound, transitioning from a synthetic-data investigation to a realistic application to climate tipping point interactions between AMOC and ASSI. Physically consistent results are derived in the latter study, both in terms of state-space causality and temporal causal influence regions, by applying the recommendations derived from the former experiment.
My general assessment is that this is a well-written paper overall, with the authors presenting their methodology and results succinctly and clearly, and it should be of interest to the relevant researchers. The results are put in context, well interpreted, and presented without drawing strong conclusions. This work fits into the scientific scope of NPG. My recommendation is that it can be published in NPG following some major revisions and clarifications, as well as some minor corrections and adjustments.
Specific Comments:
(Format: p.##, l.## - Page number, line number | Section/Appendix/Figure/Table ##)
p.2, l.32—44 – In terms of references, the authors appropriately cite most relevant works in the associated fields throughout the manuscript. But while the authors succinctly explain climate tipping points and provide constructive and relevant examples here, I would recommend that they note, either implicitly or explicitly, that they essentially describe bifurcation-generated tipping here (with a hint towards rate-induced tipping when referring to effects across time scales), with other regime-switching driving mechanisms (internal variability and rate-limited tipping) also being possible [1].
[1] https://arxiv.org/abs/1103.0169
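For context only (standard normal-form material, not taken from the manuscript or from the exact notation of [1]): bifurcation-induced tipping corresponds to the loss of a stable state at a saddle-node (fold) bifurcation, e.g.

$$\dot{x} = \mu + x^{2},$$

where the stable equilibrium $x_- = -\sqrt{-\mu}$ collides with the unstable one and disappears once the forcing parameter $\mu$ crosses zero; noise-induced and rate-induced tipping, by contrast, can occur without the bifurcation threshold itself being crossed.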
p.7, l.178—179 – I would recommend elaborating a bit more on the details of how the time lag analysis is calculated in LKIF (“Time lag analysis is implemented by shifting any single input time series by a given number of time steps.”) and whether the adopted approach is operationally consistent with the approaches in PCMCI and GCSS. These details can be added in the appendix if preferred.
Section 2.3 – I would recommend adding a very brief paragraph here providing an interpretation of MCC and its values for people not familiar with the metric (e.g., maximum values and zero values, relation to the chi-squared statistic or other scores for intuition, etc.), which would also help with the self-containedness of the work.
p. 9, l. 223—227 (and p.21, l.481—483 by extension) – As a quick clarification, how are the results affected by a different heuristic choice of the time step Δt to account for causal delays? Specifically, how are the results in Panel (b), Figure 2 affected by implicitly changing the signal-to-noise ratio of each relation? A quick note here would also help with empirically elaborating on Recommendation 1 in Section 4.
p.10, l. 239—240 – Indeed, since GCSS assesses causal relationships by projecting the latent state process (cause) onto the space spanned by the infinite past of the observation variables with and without the effect, it provides higher explanatory power with the autoregressive part resolving the presence of time lags. Elaborating a bit more on this argument will make the justification slightly more rigorous.
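As background for this point (a standard formulation, not necessarily the manuscript's exact implementation): the Granger measure from a candidate cause $Y$ to an effect $X$, conditional on the remaining variables $Z$, compares the prediction-error (innovation) covariance of $X$ with and without the past of $Y$,

$$\mathcal{F}_{Y \to X \mid Z} = \ln \frac{\det \Sigma^{(\mathrm{R})}_{xx}}{\det \Sigma^{(\mathrm{F})}_{xx}},$$

where $\Sigma^{(\mathrm{R})}_{xx}$ and $\Sigma^{(\mathrm{F})}_{xx}$ are the innovation covariances of the reduced (excluding $Y$) and full models. In the state-space variant these covariances come from the Kalman filters of the fitted state-space models rather than from a finite-order VAR, which is what allows dependence on an arbitrarily long past, and hence lagged effects, to be captured.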
p. 14, l. 325 – A quick note on why GCSS cannot be implemented here in a straightforward manner due to implementation details would be welcome (the small number of samples available for this experiment is also a valid limitation, based on the results of Fig. 2a). I do note the additional clarification in p. 17, l. 412—413, which is more than enough, but it does come towards the end of the case study.
p.14, l. 341—343 – Is the choice of including cells above the 66th percentile based on a specific heuristic in the associated literature? How sensitive are the results to this choice?
Figure 5 and p.15—16, l. 369—389 – I would recommend a clarification of the results in Figure 5 and the associated text: the colour of the causal arrow from ASSI to AMOC indicates a stabilizing/negative effect at a one-month delay, but at a five-month delay there is a destabilizing/positive link instead. While that is the weaker link, as the text notes, adding two independently coloured arrows instead of a single one coloured with respect to the stronger link would remove any ambiguity or confusion. I would recommend the same for the causal effect from ASSI to the Arctic temperatures (adding three arrows). That way, the coefficient of each link can also be superimposed next to the corresponding arrow for clarity. If the authors choose to implement these changes, they can also apply them to Figure F1 for consistency.
Table C1 – For the synthetic data experiments, has a larger (or non-uniform) noise level been tested (though not so large as to break the required assumption of stationarity through the linear couplings)? Also, I would recommend adding the variable symbol next to the parameter name in the first column, which would also clarify the use of uniform coupling strengths (not considering the choice of sign), e.g., “Noise scale (σ)”.
Appendices D, E, and F – Just wanted to note that these are a very nice addition to the text, further illuminating the implementation intricacies behind the causal methods utilized in terms of different variants, operational complexity, and robustness to observational noise and different reanalysis approaches, respectively.
Technical Corrections:
(Format: p.#, l.# - Page number, line number)