the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Evolution of causal relationships under climate change: controls of Net Primary Productivity in the North Altantic Subpolar Gyre
Abstract. Understanding how climate change affects marine primary productivity requires examining the evolving causal relationships between physical and biogeochemical processes. We applied the PCMCI+ causal discovery algorithm to investigate how the mechanisms controlling Net Primary Productivity (NPP) in the North Atlantic Subpolar Gyre evolve under different climate scenarios across five Earth System Models. Using 100-year sliding windows, we compare causal relationships in future scenarios against pre-industrial conditions, focusing on the roles of mixed layer nutrients, vertical mixing and horizontal transport. Our analysis reveals three main categories of relationship evolution: the emergence of links, the disappearance of links, and changes in link strengths. For example, while the link between stratification and NPP emerges under climate change in CanESM5-CanOE, it strengthens in CMCC-ESM2 and remains stable with moderate to high intensities in other models. At the end of the 21st century, the spread between models regarding the effect of stratification on NPP is reduced compared to pre- industrial conditions, suggesting a reduction in inter-model uncertainty. However, the transport and vertical mixing controls on the supply of nutrients to the mixed layer exhibit a more diverse evolution among the ESMs studied. The CMCC-ESM2 model has a strengthening of the relationships between winter vertical mixing and nutrients, while IPSL-CM6A-LR and CanESM5CanOE show weakening of these relationships. Furthermore, the evolution of the link between nutrient supply to the mixed layer for NPP exhibits a large variability between models. These divergent pathways reveal that the dynamics of nutrients has uncertain evolution between models. Lastly, model-specific dynamics are also observed, such as the strengthening of the link between horizontal transport and the nutrient content of the mixed layer in IPSL-CM6A-LR. Together with the decreasing strength of the vertical mixing/nutrients link, this suggests the presence of compensation mechanisms and a shift from vertical mixing dominance to enhanced horizontal transport control over the course of the scenario. These findings offer mechanistic insights into the dynamics of ESMs, specifically in the evolving relationships between physical and biogeochemical processes that shape the projections of NPP and nutrients. The causality-based approach identifies mechanisms that traditional analyses miss, offering a novel framework for model intercomparison and understanding ecosystem responses to climate change.
- Preprint
(1518 KB) - Metadata XML
-
Supplement
(1146 KB) - BibTeX
- EndNote
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2025-4680', Anonymous Referee #1, 11 Dec 2025
-
AC1: 'Reply on RC1', Germain Bénard, 27 Jan 2026
Response to Reviewer Comments
We thank the reviewer for their constructive evaluation of our manuscript. Below, we address each comment in detail.
Major Methodological Comments
"This paper aims to investigate the evolution of mechanisms that control NPP in the North Atlantic Subpolar Gyre under three emission scenarios for five ESMs using causal analysis in a sliding window approach. It builds on a previous study by the same authors in which model differences of said mechanisms were analysed with causal analysis on piControl runs, i.e. under absence of anthropogenic forcing.
The paper is well-structured and the question of causal stationarity of drivers of North Atlantic NPP under global warming in different ESMs is relevant and interesting. The sliding window approach is sensible to address this question, with possible shortcomings discussed, and the findings, although limited to a small number of ESMs, give insights into inter-model differences and non-stationarities in the relationships between physical and biogeochemical oceanic variables.
The Discussion section is well written, the Limitations as well as the advantages of the presented approach are contextualised appropriately, and the relationship to other methodologies such as emergent constraints are interesting and timely. The Conclusion offers a good summary and outlook."
Comment 1:
"According to your Methods description, your causal analysis consists in applying the causal discovery algorithm PCMCI+. To my knowledge, causal discovery is only really suited to obtain the qualitative causal model; the actual causal effect strength needs to be estimated in a subsequent causal effect estimation (see Runge et al. Nat Comms 2015). It is unclear to me, whether you actually perform such a causal effect estimation or whether you simply use the coefficients of the causal discovery / PCMCI+ algorithm as values to describe causal effect strength."
Response:
We appreciate the reviewer raising this important methodological point. We acknowledge that, in principle, a separate causal effect estimation step (e.g., using LinearMediation in Tigramite) would provide more robust path coefficients derived through linear regression of each variable on all its parent variables.
However, our primary objective is to observe and track the evolution of causal relationships under climate change, rather than to precisely quantify the exact value of each causal link. In this context, implementing a formal causal effect estimation presents several challenges. First, our analysis specifically focuses on tracking the emergence and disappearance of causal links across sliding windows. Links that are significant in some windows may become non-significant in others. The formal causal effect estimation procedures assume causal sufficiency—that all relevant confounders are included and that the causal structure is stable. This assumption cannot be verified across all time steps for links that appear and disappear throughout the analysis period.
Second, our causal networks contain many contemporaneous links, which are formally undirected. While we can often orient these links through the construction of our signals (e.g., nutrients considered in March and NPP considered in May, implying nutrient → NPP) or through expert knowledge, there remain cases where the direction of the link between two variables is difficult to establish with assurance (using only PCMCI+ output). This ambiguity further complicates the application of formal causal effect estimation methods.
By using the cross-MCI coefficients from PCMCI+, even when fewer conditional independence tests are performed for non-significant links, we obtain a consistent measure that allows us to track the evolution of relationship strengths, including periods where causality attenuates or disappears. This approach provides a coherent picture of how biogeochemical interactions evolve under climate change, which aligns with our objective of observing evolution rather than establishing precise quantitative estimates.
We will add a clarification in the Perspectives that for studies aiming to establish a precise causal scheme rather than tracking evolution, the causal effect estimation step would be recommended:
"We use the cross-MCI coefficients from PCMCI+ as indicators of causal link strength. Our objective is to track the evolution of causal relationships rather than to precisely quantify causal effects. Formal causal effect estimation methods (e.g., LinearMediation, Runge et al. 2015) could provide more robust path coefficients but their application is complicated by the non-stationary nature of our analysis: links emerge and disappear across sliding windows. Additionally, the prevalence of contemporaneous links, which are formally undirected, introduces ambiguity that we resolve through signal construction and expert knowledge, though some uncertainty remains. For future studies aiming to establish precise causal schemes rather than tracking temporal evolution, formal causal effect estimation would be recommended."
Comment 2:
"Your Methods are very brief on how you determine statistical significance of the Links and it is a bit unclear to me; significance of Links vs significance of changes in Links are also not clearly discernible to me. This makes it difficult for me to interpret Tables 1-3, and the lack of significance of the vast majority of points in Figure 3. Further, are the Intensity-values in the tables obtained as averages of all Link strengths or do you leave the non-significant links out of the calculation, treating them as having disappeared? In Figure 1, it is unclear to me what you mean with 'the average minimum significant strength' of 0.31 - was this measured across the different links (some of which may be considerably stronger than others) and what is the reasoning behind using this as a threshold for all? And once more, those links that were not marked as significant, but still shown in the Figure, how would you interpret those?"
Response:
We thank the reviewer for highlighting this lack of clarity. We will expand the Methods section to address these points. When referring to the significance of individual links, we mean the PCMCI+ significance as determined by the algorithm's False Discovery Rate control. We propose to use the term "PCMCI+ significance" consistently throughout the manuscript to avoid confusion. In contrast, the significance of changes in links refers to whether link strengths have changed from pre-industrial conditions, which we assess using two complementary tests: the Cramér-von Mises test for distributional changes and t-tests for shifts in mean values. Additionally, we compare scenario values to the pre-industrial range (minimum to maximum observed).
The intensity values reported in the tables are computed as averages over all 35 sliding windows at the end of the scenario period, including non-significant links. We chose this approach because non-significant links still have estimated strength values that contribute to understanding the overall evolution. However, we report the number of PCMCI+ significant points separately in the "Sig." column to allow readers to assess robustness.
The 0.31 threshold represents the average of the minimum PCMCI+ significant link strengths detected across all runs (i.e., across all sliding windows and all links). For each window, we recorded the weakest link that was still statistically significant; 0.31 is the mean of these values. We use this as a display threshold in Figure 1 to show only links that are, on average, strong enough to be detected as significant. We acknowledge this threshold aggregates across different link types, and we will clarify this reasoning in the caption: "The 0.31 threshold represents the average of the minimum PCMCI+ significant link strengths detected across all runs".
Links shown but not marked as significant should be interpreted cautiously—they represent estimated relationships that did not pass the FDR correction, possibly due to insufficient signal-to-noise ratio or genuine absence of causal effect. We include them in figures to show the complete evolution, but readers should focus on PCMCI+ significant links for robust conclusions.
Comment 3:
"I would like to positively highlight the sensitivity tests you performed. The application of the sliding window analysis to the piControl data is important as a reference. Here I noticed that the spread of link strengths you find in the unforced data is considerable in Figures 2 and 3. In case of the link between stratification and NPP even the sign of the causal effect in the piControl runs is not consistent across windows. It would be good to address this in the Discussion and think through potential reasons and implications."
Response:
We thank the reviewer for this constructive observation. The considerable spread in pre-industrial link strengths, including sign changes for the stratification-NPP link, reflects a combination of internal climate variability and methodological factors.
A key factor explaining this spread is that these links are predominantly non-significant in the pre-industrial simulations. When links are non-significant, fewer MCI tests are performed by PCMCI+, making the estimated strength values more susceptible to variation. Consequently, interpreting the detailed structure of this pre-industrial range is difficult. The main information to retain from these results is that in one case the link strength fluctuates around 0 and remains non-significant, while in the other case it fluctuates around 0.1 and is also predominantly non-significant. This establishes a baseline against which human-induced climate changes must be compared: only changes in link strengths that clearly exceed this natural variability range and become PCMCI+ significant can be confidently attributed to anthropogenic forcing.
We propose adding to the Discussion:
"The considerable spread in link strengths observed in pre-industrial simulations, including sign reversals for the stratification-NPP relationship, reflects both internal climate variability and methodological factors. When links are non-significant, fewer conditional independence tests are performed, making strength estimates more variable. The key information from these pre-industrial results is that the links fluctuate around zero or around a small positive/negative value, and that they are predominantly non-significant."
Comment 4:
"Your tables contain a lot of information and are generally difficult to read. Each Table would benefit from a title that makes it clear at first sight which link it describes, this should then easily be identifiable in the according Figures of the causal graphs. It is unclear to me why you provide nutrient-resolved information in Tables 1 and 3, but flatten the nutrient-level information in Table 2 and instead provide ranges. This inconsistency is a bit confusing. In general, I would encourage rethinking which information here is really the most important and how you can make the main takeaways in terms of changes in links under the different scenarios easily accessible to the reader; possibly you could move the full tables to the Supplementary and provide simplified versions for the main text."
Response:
We appreciate this feedback on table clarity. We will implement the following improvements. First, each table will receive a clear title indicating the causal link it describes: Table 1 will be titled "Nutrient → NPP relationships", Table 2 "Stratification → NPP relationships", and Table 3 "MLD → Nutrient relationships".
Regarding the harmonization of Table 2, to avoid confusion we will report a single value representing the mean across nutrients, and we will specify this clearly in the caption. This approach maintains correctness while avoiding potential confusion from the range presentation.
Rather than creating simplified summary tables, we will improve readability of the existing tables by using descriptive terms alongside numerical values. For instance, we will indicate "mostly significant (32/35)" or "outside pre-industrial range (28/35)" to make the main takeaways immediately accessible to readers while preserving the detailed numerical information.
Comment 5:
"Figures 2+3: Please add model and nutrient to the Figure, for example as a title, and please include the full link, not only one variable, so it is clear at first sight what the respective Figure is describing. There are no networks shown (neither in the main text nor in the Supplement) that contain the variable 'no3'. This would be good to add. Further, it would be good to visually set aside also the points that are identical between all three rows, i.e. that span the historical runs only, like you did with the last 35 that only span scenarios. I also suggest adding a panel that shows GMT over time for the three scenarios as a reference."
Response:
We thank the reviewer for these practical suggestions to improve figure clarity. We will enhance all figure panel titles to include the model name, nutrient, and full link description (e.g., "CanESM5-CanOE: NO₃ → NPP" instead of just "no3 - SSP126"). We will also add causal network figures containing the NO₃ variable to the Supplementary Materials.
To distinguish the historical period from scenario-specific points, we will add a vertical line marking the transition between the period common to all scenarios and the diverging scenario periods. This visual marker will clearly delineate where the three scenarios begin to differ.
Regarding the suggestion to add a GMT panel, we have decided not to include temperature evolution alongside link strength evolution for readability reasons. Displaying the evolution of a variable together with the evolution of a link between two variables on the same figure risks creating confusion, as these represent fundamentally different quantities. We believe the vertical line marking scenario divergence will provide sufficient temporal reference.
Comment 6:
"Your causal networks consist of contemporaneous and, hence, undirected links; when discussing your results, however, you do describe causal influences in a directional manner. I assume that this interpretation of the links as oriented is based on expert knowledge; it would benefit the paper if you described the reasoning behind this. The same holds for the choice of timescales of the analysis; if all links you find are contemporaneous, this suggests that the time resolution chosen might be too coarse to resolve lagged links, which would more finely describe the actual physical processes. What was your reasoning behind this choice and might there be potential for more fine-grained analysis in the future?"
Response:
The reviewer raises an important point about the interpretation of contemporaneous links. We will clarify both issues in the revised manuscript. The directionality of contemporaneous links is primarily established through the construction of our signals: for instance, nutrients are considered in March while NPP is considered in May, which by construction implies a nutrient → NPP direction. This temporal ordering in our data processing naturally orients many links. This interpretation is then confirmed and complemented by expert knowledge from physical oceanography: stratification constrains vertical nutrient supply affecting NPP, MLD controls nutrient entrainment depth, and nutrients limit phytoplankton growth.
Regarding timescale choice, our annual resolution was specifically selected to study interannual variability, which is the focus of this work. This timescale captures the dominant interannual variability of spring blooms and winter preconditioning in the subpolar North Atlantic. Different research questions would call for different temporal resolutions: sub-annual (monthly) analysis could reveal lagged relationships (e.g. arising from ocean circulation), while decadal analysis could address longer-term variability patterns (e.g. AMOC variability). Each timescale would illuminate different aspects of the physical-biogeochemical system. We acknowledge that our annual resolution may be too coarse to resolve the sequential chain of processes, and we will discuss this as an avenue for future research.
We propose adding to the Methods:
"Contemporaneous links in PCMCI+ are formally undirected. We orient them primarily through signal construction: nutrients are measured in March and NPP in May, establishing by design a “nutrient → NPP” direction. This temporal ordering is confirmed by expert knowledge. The annual timescale was chosen to study interannual variability."
Comment 7:
"Throughout, please unify nomenclature: You use 'Intensity', 'Link intensity', and 'Link strength' interchangeably as far as I can tell. In Figure captions, please make sure to always add the abbreviations of variables that are used in the Figures to the variable names."
Response:
We will standardize the terminology throughout the manuscript, using "link strength" consistently, and ensure all figure captions include complete variable abbreviation legends. We will perform a thorough review of the manuscript to eliminate inconsistencies.
Comment 8:
"For the sake of reproducibility, please provide your code and the necessary aggregated time series to run it in a public repository."
Response:
We will create a public repository on github containing the analysis code used for PCMCI+ application and sliding window analysis, aggregated time series data for each model and scenario, and scripts to reproduce the main figures. The repository link will be added to the Code and Data Availability section.
Minor CommentsL 7-9:
"I suggest to re-formulate this sentence from 'Our analysis reveals three main categories of relationship evolution' to 'Our analysis detects three main types of relationship evolution'."
Response: We will adopt this suggestion. The revised sentence will read: "Our analysis detects three main types of relationship evolution: the emergence of links, the disappearance of links, and changes in link strengths."
Section 2.3:
"Please add a short description of the piControl runs, specifically also their respective length."
Response: We will add the following information to Section 2.3:
"Pre-industrial control (piControl) simulations represent stable climate conditions without anthropogenic forcing. The piControl data used in this study correspond to the last 500 years of each model run."
L 187-188:
"'the proportion of points outside the pre-industrial distribution' would correctly be 'the number of points…'; you could consider alternatively giving the proportion in %"
Response: We will correct this to "the number of points" and add percentages where appropriate for easier interpretation (e.g., "19/35 points (54%) outside the pre-industrial distribution").
L 345-347:
"Do I understand correctly that the external forcing narrows the uncertainty of this link? If this interpretation is accurate, it might make sense to specify this here, acknowledging that future ocean productivity might once more behave differently under emission scenarios with less forcing, such as Zero Emission Commitment scenarios, which could be interesting for future analysis."
Response: Yes, the reviewer's interpretation is correct. We will expand this section to clarify:
"We can suppose that this convergence may not persist under scenarios with reduced or reversed forcing, such as Zero Emission Commitment scenarios, where the stratification-NPP relationship might have greater inter-model variability."
L 409-411: Observational Data Availability
"As your final conclusion, you present the suggestion to conduct similar analysis on observational data, also stating that data availability may be limited. I agree that the study of observations is always desirable - however, often not possible for the reason you describe. If you could add a sentence or two on the actual situation, i.e. what variables would actually be available, how long and in what time resolution those records are available etc. and if you can derive any recommendations for measurement series from your study design and findings, that would be helpful."
Response: We acknowledge the importance of having information on data availability for future studies. However, the suggested compilation would be a study on its own and is beyond the scope of the present one.
We trust these responses and proposed revisions address the reviewer's concerns. We are grateful for her/his careful reading and constructive feedback, which will improve the manuscript.
Citation: https://doi.org/10.5194/egusphere-2025-4680-AC1
-
AC1: 'Reply on RC1', Germain Bénard, 27 Jan 2026
-
RC2: 'Comment on egusphere-2025-4680', Anonymous Referee #2, 12 Dec 2025
The paper “Evolution of causal relationships under climate change: controls of Net Primary Productivity in the North Atlantic Subpolar Gyre” by Bénard et al. uses a causal discovery algorithm to investigate how causal drivers of the Net Primary Productivity in the North Atlantic Subpolar Gyre evolve with time and especially under climate change. It compares causal graphs and causal effects between pre industrial and future time periods, under three different emission scenarios and across five Earth System Models.
The paper is well-written and clearly structured. The findings (e.g., large variability between models for some links) support the relevance of the scientific question. The limitations and discussion section effectively highlights the significance and implications of the results.
I would first like to raise a few points, which are as follows:- I could not find a link to the code used for the analysis, only to the Tigramite repository. For reproducibility, it would be helpful to have access to this code.
- It seems like the causal strength/intensity you are referring to is the value of the causal effect (l71: “PCMCI+ enables the quantification of causal relationships”). However, this is not what PCMCI(+) outputs with that color bar, but rather the cross-MCI value, which is a test statistic that sort of represents the algorithm's confidence that the link exists, but this is limited by the quality of the algorithm (see https://github.com/jakobrunge/tigramite/blob/master/tutorials/causal_discovery/tigramite_tutorial_pcmciplus.ipynb). It is not directly comparable between links because some links undergo more tests than others. To determine the value of a causal effect, one needs to perform causal effect estimation with a known graph (here it would be the one obtained using PCMCI+) (https://github.com/jakobrunge/tigramite/blob/master/tutorials/causal_effect_estimation/).
- Only stationarity is discussed as an assumption required for the use of PCMCI+. A (short) description of other assumptions and of the conditional independence test used would be valuable (causal sufficiency, faithfulness, (non)linearity of relationships influencing the choice of test, etc.), as these can influence the results.
- On Fig1 and the corresponding supplementary figures, the meaning of the double arrows ‘↔’ is not given, though it seems important. Does it mean that the algorithm found effects from X to Y and from Y to X? The lags then might differ? I strongly recommend not using double arrows in that case because in causal graphs they are usually used to represent hidden confounding. Moreover, the lags of causal effects are not discussed nor appear on the figures. This might be intentional, but should then be justified as it is also a parameter that could change with climate change, and because the defined maximum lag (3 years) sets a limit beyond which an effect can disappear from the analysis (but could for example exist at a longer lag).
- A more extensive description of how you assess the significance of links would be very helpful. You say at l163 “The statistical significance of changes is assessed using both the Cramer-von Mises test (Anderson, 1962) for distributional changes and t-tests (Kim, 2015) for shifts in mean values.” Could you link this sentence to the figures? What are you measuring with distributional changes as opposed to shift in mean values? For example on Fig2 I am not sure to understand what the boxes with values of the statistic and pvalue refer to. You might be referring to the distributional change between the grey and red histograms. It could help the reader to have a description of what these values are and how they are computed in the caption.
I also listed some more specific comments:
- l75 It is not clear to me what you refer to with “within a predefined conceptual scheme linking the physical and biochemicals variables”. You could clarify or reformulate.
- l84 “base on Granger causality” is ambiguous, I would suggest to reformulate the sentence as: “PCMCI+ is a causal discovery algorithm for time series that relies on conditional independence tests grounded in Granger-causal principles. It identifies both contemporaneous and time-lagged relationships between multiple variables.”
- l88 Suggestion for something a bit more accurate to describe PCMCI+: “The algorithm proceeds in two main steps: (1) the PC (Peter & Clark) step, which iteratively tests conditional independence between variables to eliminate spurious links and obtain a set of candidate parent variables; and (2) the MCI (Momentary Conditional Independence) step, which evaluates each candidate link by testing conditional independence while conditioning on the parents of the target variable and a relevant subset of the parents of the driver.”
- l140 “The graphs of causal relationships will be explored separately for each nutrient (nitrate, dissolved iron, silicate and phosphate).” Why do you choose not to include all nutrients in the same causal graph? Depending on the relationships between these nutrients, one could identify effects that are actually mediated effects. The comparison between nutrients is then not straightforward. It may nonetheless be justifiable to separate nutrients depending on expert knowledge. It would be good to explain the rationale behind this choice in more detail.
- l157 “We examined the performance of PCMCI+ according to the length of time-series with auto-regressive vectors.” What do you mean here? Did you simulate some data and applied PCMCI+ using different window lengths in your sensitivity analysis? If so, I would suggest to do a sensitivity analysis directly on the data used for the study, as it might have different characteristics than those of simulated auto-regressive vectors and the results might strongly depend on the window size.
- l159 I don’t understand what you mean with 'estimation errors in link strength', especially what the ground truth is (i.e., what the error is measured against). You could add a short explanation.
- l162 It would be very valuable to describe the pre-industrial control simulation, especially the years covered. For example, how do you obtain at l. 175 “one causal graph obtained under pre-industrial conditions” and “the mean causal graph representative for each SSP scenario”?
- l176 “ …the mean causal graphs representative for each SSP scenario. The latter are derived from the 35 last sliding windows”. I would suggest to add a slightly more extensive explanation on how the “mean causal graph” is obtained. Are ”link strengths” simply averaged over each graph?
- It would be clearer to use one terminology throughout the paper for the quantification of causal effects (and not strength, intensity, etc). Although at present it is not the value of the causal effect that is derived (see general comments), if you later perform causal effect estimation, I recommend defining the causal effect in your setting (see e.g. https://github.com/jakobrunge/tigramite/blob/master/tutorials/causal_effect_estimation/tigramite_tutorial_general_causal_effect_analysis.ipynb) and choosing only one word to refer to it.
And finally, some minor technical corrections:
- typo in the title for Atlantic
- l176 causal graphs —> causal graph
- Fig1 The arrows are a little too small (especially arrow heads).
- l252 5x10-3 —> 5x10-3; same l256 to be consistent with the rest of the text
- l355 point missing after ‘framework’
Citation: https://doi.org/10.5194/egusphere-2025-4680-RC2 -
AC2: 'Reply on RC2', Germain Bénard, 27 Jan 2026
Response to Reviewer Comments
We thank the reviewer for their constructive feedback on our manuscript. Their comments have helped us identify important clarifications needed to improve the paper. Below, we address each point in detail.
Main Comments
"The paper “Evolution of causal relationships under climate change: controls of Net Primary Productivity in the North Atlantic Subpolar Gyre” by Bénard et al. uses a causal discovery algorithm to investigate how causal drivers of the Net Primary Productivity in the North Atlantic Subpolar Gyre evolve with time and especially under climate change. It compares causal graphs and causal effects between pre industrial and future time periods, under three different emission scenarios and across five Earth System Models.
The paper is well-written and clearly structured. The findings (e.g., large variability between models for some links) support the relevance of the scientific question. The limitations and discussion section effectively highlights the significance and implications of the results."
Comment 1:
"I could not find a link to the code used for the analysis, only to the Tigramite repository. For reproducibility, it would be helpful to have access to this code."
Response:
We will create a public repository on github containing the analysis code used for PCMCI+ application and sliding window analysis, aggregated time series data for each model and scenario, and scripts to reproduce the main figures. The repository link will be added to the Code and Data Availability section.
Comment 2: Cross-MCI Value vs Causal Effect Estimation
"It seems like the causal strength/intensity you are referring to is the value of the causal effect (l71: 'PCMCI+ enables the quantification of causal relationships'). However, this is not what PCMCI(+) outputs with that color bar, but rather the cross-MCI value, which is a test statistic that sort of represents the algorithm's confidence that the link exists, but this is limited by the quality of the algorithm. It is not directly comparable between links because some links undergo more tests than others. To determine the value of a causal effect, one needs to perform causal effect estimation with a known graph."
Response:
We thank the reviewer for this important clarification. We acknowledge that we are indeed using cross-MCI values rather than formally estimated causal effects, and we will correct the terminology.
Our primary objective is to observe and track the evolution of causal relationships under climate change, rather than to precisely quantify the exact value of each causal effect. In this context, implementing formal causal effect estimation presents several challenges. First, our analysis focuses on tracking the emergence and disappearance of causal links across sliding windows. Links that are significant in some windows may become non-significant in others. Formal causal effect estimation procedures assume causal sufficiency—that all relevant confounders are included and that the causal structure is stable—which cannot be verified across all time steps for links that appear and disappear throughout the analysis.
Second, our causal networks contain many contemporaneous links, which are formally undirected. While we orient these links through signal construction (e.g., nutrients considered in March and NPP considered in May, implying nutrients →NPP) and expert knowledge, some ambiguity remains in certain cases, not discussed in this article, complicating formal causal effect estimation.
By using the cross-MCI coefficients, even when fewer conditional independence tests are performed for non-significant links, we obtain a consistent measure to track the evolution of relationship strengths, including periods where causality attenuates or disappears. We will clarify in the Methods that we use cross-MCI values as indicators of link strength, and mention in the Perspectives that for studies aiming to establish precise causal effect estimates rather than tracking evolution, formal causal effect estimation would be recommended.
"We use the cross-MCI coefficients from PCMCI+ as indicators of causal link strength. Our objective is to track the evolution of causal relationships rather than to precisely quantify causal effects. Formal causal effect estimation methods (e.g., LinearMediation Runge et al. 2015) could provide more robust path coefficients but their application is complicated by the non-stationary nature of our analysis: links emerge and disappear across sliding windows. Additionally, the prevalence of contemporaneous links, which are formally undirected, introduces ambiguity that we resolve through signal construction and expert knowledge, though some uncertainty remains. For future studies aiming to establish precise causal schemes rather than tracking temporal evolution, formal causal effect estimation would be recommended."
Comment 3:
"Only stationarity is discussed as an assumption required for the use of PCMCI+. A (short) description of other assumptions and of the conditional independence test used would be valuable (causal sufficiency, faithfulness, (non)linearity of relationships influencing the choice of test, etc.), as these can influence the results."
Response:
We agree that a more complete description of PCMCI+ assumptions would be valuable, especially on the linearity and the choice of test. We will add a paragraph to the Methods section describing the choice of conditional independence test. We use ParCorr (partial correlation), which assumes linear relationships between variables. This choice is appropriate for our study given the focus on large-scale climate-biogeochemical interactions where linear approximations are commonly used, though we acknowledge that non-linear relationships may exist at finer scales:
"We use the ParCorr (partial correlation) conditional independence test, which assumes linear relationships between variables. This choice is appropriate for our study given the focus on large-scale climate-biogeochemical interactions at annual timescales, where linear approximations are commonly employed in climate science. We acknowledge that non-linear relationships may exist at finer spatial and temporal scales, and that alternative tests could be explored in future work."Comment 4:
"On Fig1 and the corresponding supplementary figures, the meaning of the double arrows '↔' is not given, though it seems important. Does it mean that the algorithm found effects from X to Y and from Y to X? The lags then might differ? I strongly recommend not using double arrows in that case because in causal graphs they are usually used to represent hidden confounding. Moreover, the lags of causal effects are not discussed nor appear on the figures. This might be intentional, but should then be justified as it is also a parameter that could change with climate change, and because the defined maximum lag (3 years) sets a limit beyond which an effect can disappear from the analysis."
Response:
We thank the reviewer for raising this important point. The double arrows (↔) in our figures represent contemporaneous links (lag = 0), which PCMCI+ identifies as undirected because the algorithm cannot determine directionality at the same time step. We acknowledge that this notation can be confusing given its standard use for hidden confounding in causal graphs. We will revise the figures to use a different notation for contemporaneous links, clearly distinguishing them from lagged directed links, and add an explicit explanation in the figure captions.
Concerning the lags, the vast majority of significant links in our analysis are contemporaneous (lag = 0), which is why we did not discuss lag evolution. This prevalence of contemporaneous links reflects our choice of annual resolution to study interannual variability. Sub-annual analysis could reveal lagged relationships arising from oceanic circulation. We will add a discussion of this point, noting that the maximum lag of 3 years was chosen to capture inter annual variability without decennal or intra annual variability, but that finer temporal resolution would complement the analysis of these physical-biogeochemical processes.
"The maximum lag of 3 years was chosen to focus on interannual variability while filtering out both intra-annual fluctuations and decadal oscillations. This timescale is well-suited to capture the year-to-year dynamics of spring bloom preconditioning and nutrient replenishment cycles in the subpolar North Atlantic. Complementary analyses at finer (monthly) or coarser (decadal) temporal resolutions could reveal additional dynamics and complement our study."
Comment 5:"A more extensive description of how you assess the significance of links would be very helpful. You say at l163 'The statistical significance of changes is assessed using both the Cramer-von Mises test (Anderson, 1962) for distributional changes and t-tests (Kim, 2015) for shifts in mean values.' Could you link this sentence to the figures? What are you measuring with distributional changes as opposed to shift in mean values? For example on Fig2 I am not sure to understand what the boxes with values of the statistic and pvalue refer to."
Response:
We will expand the Methods section and figure captions to clarify these points. There are two distinct types of significance in our analysis. First, PCMCI+ significance refers to whether a link is statistically significant within a given sliding window, as determined by the algorithm's False Discovery Rate control. This is indicated by green (significant) versus red (non-significant) points in Figures 2 and 3.
Second, significance of changes refers to whether the distribution of link strengths under climate scenarios differs from pre-industrial conditions. The Cramér-von Mises test assesses whether the overall distribution has changed (comparing the grey histogram representing pre-industrial values to the red histogram representing end-of-scenario values). The t-test specifically assesses whether the mean value has shifted. The boxes in the figures display the Cramér-von Mises statistic and p-value, testing whether the scenario distribution significantly differs from the pre-industrial distribution. We will add this explanation to the figure captions and Methods section.
"To assess a change in causal relationships under climate scenarios compared to pre-industrial conditions, we employ two complementary statistical tests. The Cramér-von Mises test evaluates how the overall distribution of link strengths differs between pre-industrial and scenario periods, detecting changes in shape or spread. The t-test specifically assesses whether the mean link strength has shifted. Both tests compare the distribution of link strengths from the 35 last sliding windows of each scenario against the full distribution obtained from the pre-industrial control simulation."
Specific CommentsL75:
"It is not clear to me what you refer to with 'within a predefined conceptual scheme linking the physical and biochemicals variables'. You could clarify or reformulate."
Response: We will reformulate this sentence to clarify that we selected a specific set of variables based on established oceanographic understanding of the processes controlling NPP in the subpolar North Atlantic: "Our methodological approach consists in analyzing the evolution of causal relationships among a set of physical and biogeochemical variables selected based on their known roles in controlling NPP variability in the Eastern part of the North Atlantic Subpolar Gyre."
L84:
"'based on Granger causality' is ambiguous, I would suggest to reformulate the sentence as: 'PCMCI+ is a causal discovery algorithm for time series that relies on conditional independence tests grounded in Granger-causal principles. It identifies both contemporaneous and time-lagged relationships between multiple variables.'"
Response: We thank the reviewer for this more accurate formulation and will adopt it in the revised manuscript.
L88:
"Suggestion for something a bit more accurate to describe PCMCI+: 'The algorithm proceeds in two main steps: (1) the PC (Peter & Clark) step, which iteratively tests conditional independence between variables to eliminate spurious links and obtain a set of candidate parent variables; and (2) the MCI (Momentary Conditional Independence) step, which evaluates each candidate link by testing conditional independence while conditioning on the parents of the target variable and a relevant subset of the parents of the driver.'"
Response: We appreciate this more precise description and will incorporate it into the Methods section.
L140:
"'The graphs of causal relationships will be explored separately for each nutrient (nitrate, dissolved iron, silicate and phosphate).' Why do you choose not to include all nutrients in the same causal graph? Depending on the relationships between these nutrients, one could identify effects that are actually mediated effects. The comparison between nutrients is then not straightforward. It may nonetheless be justifiable to separate nutrients depending on expert knowledge."
Response: This is a valid concern. We chose to analyze nutrients separately for several reasons. First, including all nutrients simultaneously would substantially increase the dimensionality of the problem, reducing PCMCI+ performance. Second, our focus is on understanding how each nutrient's relationship with physical drivers and NPP evolves, rather than on nutrient-nutrient interactions. Third, nutrients co-vary a lot while not being causally linked to each other in a direct mechanistic sense at the scales we analyze. Considering the three nutrients all at once “falsifies” the causal graph obtained by having strong parent variables not really causally linked.
L157:
"'We examined the performance of PCMCI+ according to the length of time-series with auto-regressive vectors.' What do you mean here? Did you simulate some data and applied PCMCI+ using different window lengths in your sensitivity analysis? If so, I would suggest to do a sensitivity analysis directly on the data used for the study."
Response: Indeed, we tested different window lengths on simulated data and not on model data as we needed the ground truth (the true causal effect within the data) to assess the performance of PCMCI+. We also tested different window lengths with model data, but we can’t assess the performance of PCMCI+. The only information that can be observed is that the minimum causal link strength is lower when we have larger windows, an information also obtained with simulated data.
L159:
"I don't understand what you mean with 'estimation errors in link strength', especially what the ground truth is (i.e., what the error is measured against)."
Response: We will reformulate this sentence to clarify our meaning. In our sensitivity analysis, we compared the cross-MCI values obtained with different window lengths against the true value of the causal link. For standardized variables (zero mean, unit standard deviation), the cross-MCI value approximates the regression coefficient of the linear relationship between variables. Our tests showed that shorter time series produce more variable cross-MCI estimates, with deviations from the reference value increasing substantially below 100 years. While we acknowledge the reviewer's point that cross-MCI values are not formally estimated causal effects, they nonetheless provide consistent and reasonably accurate estimates of linear relationship strengths for standardized data. We will clarify this in the revised text: "sensitivity analyses showed that shorter time series produce less stable cross-MCI estimates, with increased deviation from reference values obtained with longer windows, justifying our choice of 100-year windows."
L162:
"It would be very valuable to describe the pre-industrial control simulation, especially the years covered. For example, how do you obtain at l. 175 'one causal graph obtained under pre-industrial conditions' and 'the mean causal graph representative for each SSP scenario'?"
& L176:
"'…the mean causal graphs representative for each SSP scenario. The latter are derived from the 35 last sliding windows'. I would suggest to add a slightly more extensive explanation on how the 'mean causal graph' is obtained. Are 'link strengths' simply averaged over each graph?"
Response: We will add detailed information about the piControl simulations: "Pre-industrial control (piControl) simulations represent stable climate conditions without anthropogenic forcing. The piControl runs used in this study are 500 years long." The pre-industrial causal graph in Figure 1 represents the mean over all sliding windows applied to the piControl simulation, providing a reference for unforced variability.
The mean causal graph is obtained by averaging the cross-MCI values for each link across the 35 sliding windows. We will clarify this in the text: "The mean causal graph is computed by averaging the link strength for each link across the 35 last sliding windows."
Terminology consistency
"It would be clearer to use one terminology throughout the paper for the quantification of causal effects (and not strength, intensity, etc)."
Response: We agree and will standardize the terminology throughout the manuscript, using "link strength" consistently to refer to cross-MCI values. We will also clarify that these are cross-MCI values rather than formally estimated causal effects.
Minor Technical Corrections
We thank the reviewer for catching these errors and will correct them all:
- Typo in the title: "Altantic" → "Atlantic"
- L176: "causal graphs" → "causal graph"
- Fig1: We will increase the size of arrows and arrow heads for better visibility
- L252 and L256: Correct superscript formatting for 5×10⁻³
- L355: Add missing point after "framework"
We trust these responses and proposed revisions address the reviewer's concerns. We are grateful for their detailed and constructive feedback, which will significantly improve the clarity and rigor of the manuscript.
Citation: https://doi.org/10.5194/egusphere-2025-4680-AC2
Viewed
| HTML | XML | Total | Supplement | BibTeX | EndNote | |
|---|---|---|---|---|---|---|
| 236 | 68 | 32 | 336 | 44 | 18 | 15 |
- HTML: 236
- PDF: 68
- XML: 32
- Total: 336
- Supplement: 44
- BibTeX: 18
- EndNote: 15
Viewed (geographical distribution)
| Country | # | Views | % |
|---|
| Total: | 0 |
| HTML: | 0 |
| PDF: | 0 |
| XML: | 0 |
- 1
This paper aims to investigate the evolution of mechanisms that control NPP in the North Atlantic Subpolar Gyre under three emission scenarios for five ESMs using causal analysis in a sliding window approach. It builds on a previous study by the same authors in which model differences of said mechanisms were analysed with causal analysis on piControl runs, i.e. under absence of anthropogenic forcing.
The paper is well-structured and the question of causal stationarity of drivers of North Atlantic NPP under global warming in different ESMs is relevant and interesting. The sliding window approach is sensible to address this question, with possible shortcomings discussed, and the findings, although limited to a small number of ESMs, give insights into inter-model differences and non-stationarities in the relationships between physical and biogeochemical oceanic variables.
The Discussion section is well written, the Limitations as well as the advantages of the presented approach are contextualised appropriately, and the relationship to other methodologies such as emergent constraints are interesting and timely. The Conclusion offers a good summary and outlook.
However, I have a few general methodological concerns and questions that should be addressed:
Furthermore, please find below a few specific minor comments and suggestions:
L 7-9: I suggest to re-formulate this sentence from “Our analysis reveals three main categories of relationship evolution” to “Our analysis detects three main types of relationship evolution".
Section 2.3: Please add a short description of the piControl runs, specifically also their respective length.
L 187-188: “the proportion of points outside the pre-industrial distribution” would correctly be “the number of points…”; you could consider alternatively giving the proportion in %
L 345-347: Do I understand correctly that the external forcing narrows the uncertainty of this link? If this interpretation is accurate, it might make sense to specify this here, acknowledging that future ocean productivity might once more behave differently under emission scenarios with less forcing, such as Zero Emission Commitment scenarios, which could be interesting for future analysis.
L 409-411: As your final conclusion, you present the suggestion to conduct similar analysis on observational data, also stating that data availability may be limited. I agree that the study of observations is always desirable - however, often not possible for the reason you describe. If you could add a sentence or two on the actual situation, i.e. what variables would actually be available, how long and in what time resolution those records are available etc. and if you can derive any recommendations for measurement series from your study design and findings, that would be helpful.