Deficient ocean–atmosphere feedbacks constrain seasonal NAO prediction

Kolstad, Erik W.

doi:10.5194/egusphere-2025-5075

Preprints

17 Oct 2025

Abstract. As the North Atlantic Oscillation (NAO) accounts for a dominant share of wintertime weather variability across the North Atlantic basin, it is a coveted target for seasonal prediction. Yet dynamical forecast systems continue to exhibit limited skill, in part due to deficiencies in representing ocean–atmosphere feedbacks. Here, mediation analysis – a statistical framework from causal inference – is applied to identify and quantify feedback pathways linking late-autumn North Atlantic sea surface temperature (SST) anomalies to the subsequent winter NAO. This approach is attractive because it is straightforward to apply, easy to interpret, and can be used directly on observation-derived data like reanalyses without requiring idealised model perturbation experiments.

The analysis reveals a physically coherent feedback sequence. Anomalous November SST patterns promote the gradual formation of a surface-pressure dipole rotated clockwise relative to the canonical NAO structure. This dipole induces advection anomalies in the western North Atlantic, which in turn modulate surface fluxes in the Subpolar Gyre and lower-tropospheric baroclinicity in the storm-track entry region east of Newfoundland. These changes nudge the NAO, which, once established, feeds back onto the fluxes and baroclinicity, reinforcing the anomaly and sustaining the circulation pattern.

A central finding is that a state-of-the-art seasonal prediction system fails to capture these feedback mechanisms. The baroclinicity pathway, the process through which changes in eddy growth reinforce the circulation anomaly, is particularly deficient, accounting for only 2 % of the lagged SST–NAO correlation in SEAS5 compared with 44 % in the ERA5 reanalysis. This misrepresentation is likely a fundamental barrier to improved NAO forecast skill.
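The percentages above describe the indirect (mediated) share of the total SST–NAO effect. A minimal sketch of that decomposition follows, assuming the classic single-mediator product-of-coefficients estimator on standardized variables. The manuscript's exact estimator and significance testing may differ; `mediation` and `ols_slopes` are hypothetical helpers, and the data are synthetic:

```python
import numpy as np


def standardize(v: np.ndarray) -> np.ndarray:
    """Zero mean, unit variance (population standard deviation)."""
    return (v - v.mean()) / v.std()


def ols_slopes(y: np.ndarray, *regressors: np.ndarray) -> np.ndarray:
    """Least-squares slopes of y on the regressors; the intercept is fitted but dropped."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


def mediation(x: np.ndarray, z: np.ndarray, y: np.ndarray) -> dict:
    """Single-mediator decomposition X -> Z -> Y (hypothetical helper, not the paper's code)."""
    x, z, y = standardize(x), standardize(z), standardize(y)
    (total,) = ols_slopes(y, x)         # c: equals corr(X, Y) for standardized data
    (alpha,) = ols_slopes(z, x)         # path X -> Z
    direct, beta = ols_slopes(y, x, z)  # c' and path Z -> Y, fitted jointly
    indirect = alpha * beta
    return {"total": total, "direct": direct, "indirect": indirect,
            "share": indirect / total}


# Synthetic illustration only: the labels are analogies, not real data.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)                        # e.g. a November SST index
z = 0.6 * x + rng.standard_normal(200)              # e.g. a mediator such as baroclinicity
y = 0.5 * z + 0.1 * x + rng.standard_normal(200)    # e.g. the winter NAO
res = mediation(x, z, y)
print(res)
```

For ordinary least squares the identity total = direct + indirect holds exactly. With standardized variables the total effect equals the X–Y correlation, which is why an indirect effect can be quoted as a percentage of the lagged correlation.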

More broadly, the results demonstrate the potential of mediation analysis as a diagnostic tool for disentangling coupled feedbacks directly from observations, evaluating their representation in models, and guiding targeted improvements that could enhance seasonal prediction of the NAO.

Received: 14 Oct 2025 – Discussion started: 17 Oct 2025

Status: final response (author comments only)

RC1:
'Comment on egusphere-2025-5075', Anonymous Referee #1, 27 Oct 2025

This article analyses the relationships between N Atlantic SST, heat fluxes and the North Atlantic Oscillation in a seasonal forecast system. The questions it asks are well thought out and the study is timely, relevant and of interest to many readers of this journal. However, while it has the potential to be excellent on all counts, I have had to mark it 'fair' for scientific content as it stands, because if I understand correctly how the analysis has been done, then there is a potentially large error in the analysis method related to the use of ensemble means as explained below.
MAJOR POINTS:
A) I remain unconvinced about the use of October hindcasts. November starts are normally used for DJF forecasts, so why not use the forecasts that are relevant to the problem? We should at least be reassured that November forecasts show similar, if perhaps weaker, errors.
B) The analysis is novel, relevant and interesting but there is a serious flaw. The analysis is carried out entirely on ensemble means (L186) and then compared to the observations (L325, L360 and throughout). This comparison is not valid. A simple example can illustrate why: assume for example that the NAO is entirely formed from unpredictable variability or 'noise'. In this case there would be no ensemble mean signal and no regressions between the modelled variables. However, the observational analysis will still show relationships, albeit from unpredictable 'noise'. In reality the difference will be less extreme as the NAO contains predictable and unpredictable components but the presented analysis would only be valid if the NAO is formed from entirely predictable variability. Fortunately, the problem is easily corrected as it simply needs to be redone on ensemble members. I hope this can be done as I still think this has the potential to be a very useful contribution but it is essential before publication.
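The reviewer's argument can be illustrated with a toy calculation on synthetic data. The signal amplitude, ensemble size and number of years are arbitrary assumptions, not SEAS5 values. Each member's NAO is a predictable response to a predictor plus independent noise, and the single observed record behaves like one more member:

```python
import numpy as np


def corr(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.corrcoef(a, b)[0, 1])


def ensmean_vs_member_corr(n_years=40, n_members=25, signal=0.3, seed=0):
    """Correlation of a predictor with a toy NAO, from the ensemble mean vs from individual members."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_years)                    # predictor, e.g. an SST index
    noise = rng.standard_normal((n_members, n_years))   # unpredictable internal variability
    nao = signal * x + noise                            # each row is one member (or one "observed" world)
    r_ensmean = corr(x, nao.mean(axis=0))               # noise largely averaged out
    r_members = corr(np.tile(x, n_members), nao.ravel())  # pooled individual members
    return r_ensmean, r_members


r_ensmean, r_members = ensmean_vs_member_corr()
print(f"ensemble mean: {r_ensmean:.2f}, pooled members: {r_members:.2f}")
```

The population values for these settings are about 0.3/√(0.09 + 1/25) ≈ 0.83 for the ensemble mean but only 0.3/√1.09 ≈ 0.29 for individual members. Statistics computed on ensemble means are therefore not comparable with those from a single realisation such as ERA5.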
MINOR POINTS:
The article seems to be overly positive about empirical forecast methods. Several of the examples cited have not performed well after publication in real out-of-sample cases. This is often the case with such methods, which have frequently been inadvertently tuned to non-causal relationships in sections of the past observational record. Please therefore refine the language to better represent this, for example by saying "...achieved potentially useful levels of skill (but note the comments below about real time forecast skill)..." and at L34: "often appear to outperform", as this is not really outperforming if based on noncausal factors.
L45: Suggest "high surface NAO" as some studies claim NAO skill from high level circulation fields that is not reflected in surface NAO predictions
L46: Baker et al 2024 reported similar levels of skill for the NAO from later generations of forecasts and similar ranking of systems so a better phrasing here would be "However, there is a wide range of performance between systems and system upgrades have not significantly improved overall skill". Please also remove comments about reducing skill as the reported changes are not significant.
L110: typo "aa"
L138: I did not understand why this implies 'many pathways'
Sec3.1: why is this particular system (ECMWF SEAS5) used? Is it because it has lower skill than some of the others (c.f. Sec 4.1) and so useful to detect errors? If so please say this.
L201-205: How are anomalies calculated in SEAS5 and ERA5?
P7 line 1: This seems odd, as there are only N values to start with, so by definition there are many repeats and the samples are not independent. This will reduce spread and affect results like those in Fig. 5. Is there a simple inflation of spread that can be done to correct and compensate for this?
L268: what is the mean bias in the NAO?
L290: typo 'gyrefor'
L297: please state if this represents a positive feedback
L384: robust
L394, L405: grammar at the start of these sentences, please reword
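Regarding the comment on P7 line 1: assuming the procedure there is a standard bootstrap that resamples the N years with replacement, each resample contains on average only 1 − (1 − 1/N)^N ≈ 63 % of the distinct years. The sketch below checks this numerically; N = 40 is an arbitrary illustrative value, not the study's sample size:

```python
import numpy as np


def mean_unique_fraction(n: int, n_boot: int = 10_000, seed: int = 0) -> float:
    """Average fraction of distinct indices in bootstrap resamples of size n (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    idx = np.sort(rng.integers(0, n, size=(n_boot, n)), axis=1)
    # In a sorted row, each change of value marks a new distinct index.
    n_unique = 1 + np.count_nonzero(np.diff(idx, axis=1), axis=1)
    return float(n_unique.mean() / n)


def expected_unique_fraction(n: int) -> float:
    """Closed form: each original index is missed with probability (1 - 1/n)**n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


n_years = 40  # illustrative only
print(mean_unique_fraction(n_years), expected_unique_fraction(n_years))
```

A related, well-known small-sample effect is that the bootstrap variance of a sample mean is smaller than the usual unbiased estimate by the factor (N − 1)/N, which is one reason bootstrap spread can be too narrow when N is small.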

Citation: https://doi.org/10.5194/egusphere-2025-5075-RC1
- AC1:
  'Reply to RC1', Erik Kolstad, 28 Oct 2025
  
  I thank the reviewer for their thoughtful and constructive comments, which will be very helpful in improving the manuscript. I am encouraged by the reviewer’s assessment that the study is timely and relevant. Below I address the two major points raised; detailed revisions and responses to minor comments will be provided in the formal response.
  Major point A: Use of October hindcasts
  
  The motivation for using the October initialisations was to ensure sufficient variation in the November SST states. In the November runs, SSTs are nearly identical across ensemble members due to inertia, which I reckoned would reduce the usefulness of the data for mediation analysis. Nevertheless, I agree that November forecasts are operationally the most relevant for DJF predictions. I have therefore decided to base the analysis on the November initialisations. As stated in the paper, these results confirm that the main conclusions are not sensitive to this choice. The biases are somewhat smaller, but the mediation remains weak. For the November runs, the anomaly correlation coefficient between the modelled and observed NAO is 0.29 (p = 0.06), which is consistent with the findings of Baker et al. (2024), while for the October runs it was non-significant. Within SEAS5, the SST–NAO correlation is 0.19, smaller than for the October runs, indicating that the errors are not reduced as hypothesised.
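As a reference for the anomaly correlation coefficient quoted above, here is a minimal sketch. It assumes (this is not confirmed in the replies) that each dataset's anomalies are taken relative to its own climatology over the hindcast years, which also bears on the first reviewer's question about how anomalies are calculated. Array shapes and function names are illustrative:

```python
import numpy as np


def anomalies(series: np.ndarray) -> np.ndarray:
    """Departures from the time mean over the hindcast years (the dataset's own climatology)."""
    return series - series.mean()


def nao_acc(model_nao: np.ndarray, obs_nao: np.ndarray) -> float:
    """ACC between the ensemble-mean model NAO, shape (n_members, n_years), and the observed NAO, shape (n_years,)."""
    model_anom = anomalies(model_nao.mean(axis=0))
    obs_anom = anomalies(obs_nao)
    return float(np.sum(model_anom * obs_anom)
                 / np.sqrt(np.sum(model_anom ** 2) * np.sum(obs_anom ** 2)))


# Synthetic example: a weak predictable signal, member noise and a constant offset.
rng = np.random.default_rng(2)
obs = rng.standard_normal(36)
model = 0.3 * obs + rng.standard_normal((25, 36)) + 1.5  # the offset mimics a mean NAO bias
r = nao_acc(model, obs)
print(round(r, 2))
```

Because each series is centred on its own mean, a constant NAO bias in the model leaves the ACC unchanged.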
  Major point B: Use of ensemble means
  
  I fully agree with the reviewer that comparing ensemble-mean relationships to observational relationships is problematic, as the ensemble mean effectively filters out much of the internal variability. I had in fact originally used individual ensemble members but switched to ensemble means for simplicity. Following the reviewer’s suggestion, I have now repeated the analysis using all ensemble members. The results are broadly consistent with those based on ensemble means, though the relationship between the SST–NAO correlation and the indirect effect (Fig. 5) strengthens for SEAS5. In my opinion this reinforces the conclusion that feedback mechanisms play a central role in determining NAO predictability.
  Additionally, I have discovered a data-handling error affecting the Eady growth rate (EGR) calculations for the October runs. After correcting this issue, the indirect effect through baroclinicity increased and spatially it now more closely resembles the ERA5 result. As in ERA5, baroclinicity emerges as the dominant mediator (higher indirect effect than for the surface fluxes). I am grateful that the reviewer’s comments prompted this revision, which has clarified the results and improved the manuscript.
  
  Citation: https://doi.org/10.5194/egusphere-2025-5075-AC1
  - AC2: 'Reply on AC1', Erik Kolstad, 30 Oct 2025
    
    I have identified and corrected a data-handling error in the processing of the model data. This issue affected the calculation of the indirect (mediated) effects, particularly for the baroclinicity (Eady growth rate) pathway.
    In the corrected analysis, the model performs slightly better in reproducing the indirect effect – especially through the baroclinicity pathway. Broadly speaking, the main conclusions of the study stand, but the numerical results have been revised.
    Following the editor’s suggestion, I have sent the corrected manuscript directly to the editor, who will share it with the remaining reviewer(s). This ensures that the remaining review is based on the corrected version, while keeping the discussion open and transparent for all readers.
    
    Citation: https://doi.org/10.5194/egusphere-2025-5075-AC2
AC3: 'Correction of data handling error', Erik Kolstad, 05 Nov 2025

I identified a data-handling error in the processing of the model data, related to the incorrect chronological sorting of years when concatenating files. I have now corrected this issue and re-run the analysis. Overall the results have changed slightly, but the main conclusions remain qualitatively the same. The PDF file contains new figures and summarises the updates to the findings.
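The replies do not describe the sorting error beyond the sentence above. As a generic safeguard (a hypothetical helper, not the author's code), per-file chunks can be concatenated and then reordered by an explicit year coordinate, with a check for duplicates, before any statistics are computed:

```python
import numpy as np


def concat_chronologically(chunks):
    """Concatenate (years, values) chunks along time and return them in chronological order.

    years is 1-D; values has time as its first axis. The input order of the chunks is irrelevant.
    """
    years = np.concatenate([np.asarray(y) for y, _ in chunks])
    values = np.concatenate([np.asarray(v) for _, v in chunks])
    order = np.argsort(years, kind="stable")
    years, values = years[order], values[order]
    if np.any(np.diff(years) <= 0):
        raise ValueError("duplicate years after concatenation")
    return years, values


# Hypothetical chunks read out of order, e.g. from separate files.
chunks = [(np.array([1993, 1994]), np.array([0.1, -0.4])),
          (np.array([1981, 1982]), np.array([1.2, 0.3]))]
years, values = concat_chronologically(chunks)
print(years, values)
```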

Citation: https://doi.org/10.5194/egusphere-2025-5075-AC3
RC2:
'Comment on egusphere-2025-5075', Anonymous Referee #2, 17 Nov 2025
This paper uses a statistical causal framework called mediation analysis to diagnose atmosphere-ocean feedbacks associated with NAO predictability and to compare them between ERA5 and the seasonal prediction system SEAS. It finds that surface heat fluxes and Eady growth rate mediate the effect of November SST on DJF NAO, which the paper refers to as an “indirect effect”. These indirect effects are found to be substantially weaker in SEAS than ERA5.
I find this to be an interesting and well-written paper, albeit with some potentially important interpretation issues in its current form. My chief concerns are that the autocorrelation of the NAO is not considered, that much of the results hinge on the potentially coincidental correlation between the DJF NAO and the particular November SST pattern studied, and that nothing about the analysis demonstrates a causal role of subpolar heat fluxes in the SST-to-NAO feedback. I foresee that these issues could be addressed with some additional analyses and more careful wording in a round of major revisions.
Please note that this review is based on an updated version of the manuscript, provided by the editor, in which the data handling error was corrected and where individual ensemble members were used instead of ensemble means.
Major Comments:
When the effects X -> Z -> Y are discussed, X -> Y -> Z is considered as an alternative, which motivates Figs. 3b, 3e, 4b, 4e. This is all fine and well, but the paper does not consider the alternative that Y -> X and Y -> Z. That Y -> X may seem a bit silly when Y is DJF NAO and X is November SST, but not necessarily if the autocorrelation of the NAO is considered. To rule this out, I think it is necessary (1) to consider the correlation between November NAO and DJF NAO and/or (2) to investigate the sub-seasonality of the X -> Y relationship. I can see in Kolstad and O’Reilly that this effect is largest in February, which certainly helps to address this concern, but I still think more discussion of this is needed in this paper.
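Option (1) could be complemented by a partial correlation. As a hedged illustration with synthetic data (hypothetical variable names; not an analysis from the manuscript), both the November SST index and the DJF NAO below are driven only by a persistent November NAO. Their raw correlation is therefore substantial, while the partial correlation controlling for the November NAO is close to zero:

```python
import numpy as np


def residualize(v: np.ndarray, control: np.ndarray) -> np.ndarray:
    """Residual of v after least-squares regression on control (with intercept)."""
    design = np.column_stack([np.ones_like(control), control])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def partial_corr(x: np.ndarray, y: np.ndarray, control: np.ndarray) -> float:
    """Correlation of x and y with the linear influence of control removed from both."""
    return float(np.corrcoef(residualize(x, control), residualize(y, control))[0, 1])


rng = np.random.default_rng(0)
n = 500                                                  # large only to keep sampling noise small
nao_nov = rng.standard_normal(n)                         # persisting November NAO
sst_nov = 0.7 * nao_nov + 0.5 * rng.standard_normal(n)   # SST pattern forced by the NAO
nao_djf = 0.5 * nao_nov + rng.standard_normal(n)         # DJF NAO linked only via persistence
raw = float(np.corrcoef(sst_nov, nao_djf)[0, 1])
partial = partial_corr(sst_nov, nao_djf, nao_nov)
print(f"raw: {raw:.2f}, partial: {partial:.2f}")
```

With a realistic number of winters, the partial correlation would need its own significance test.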

A related concern is that the mediation effect αβ could simply be the coincidental agreement between α and β. This is easy to imagine, because β is strong (everything co-varies with the NAO, albeit usually with NAO as the causal driver). Then any α that are large by coincidence (and of the same sign) will show as a strong indirect effect. My concern is that the correlation between the NAO and the November SST pattern is at least partially coincidental (combined with choices made to maximize this correlation, as discussed in the text). Then the SEAS analysis used for comparison is at a disadvantage in terms of correlations (compared to ERA5) throughout the rest of the analysis, because the ERA5 SST pattern was chosen, rather than whatever November SST pattern would maximize the correlation in SEAS. To address this, I think the SEAS analysis should at minimum be repeated with the November SST pattern most correlated with the DJF NAO within the model.

Even after the above two statistical issues are addressed, I still don’t think it’s possible to fully conclude that DJF subpolar heat fluxes play a causal role in mediating the relationship between November SSTs and the DJF NAO. The reason is that heat fluxes in this region are strongly correlated with the NAO, where this primarily represents the reverse causality (i.e., NAO -> subpolar heat fluxes). This means that ANY other causal pathway that relates the November SST pattern to the DJF NAO will also show up as a strong mediation effect in the heat fluxes. I think this requires much more careful discussion throughout the manuscript as well as softening of the conclusions relating the role of subpolar heat fluxes. This all of course applies to Eady growth rate as well, but there the underlying physical explanations in the manuscript make more sense.
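The reviewer's concern can be made concrete with a toy example (synthetic data, arbitrary coefficients, hypothetical helper names). Here the SST index X affects the NAO Y only directly, and the flux-like variable Z merely responds to Y. A single-mediator product-of-coefficients decomposition still attributes a large indirect share to Z:

```python
import numpy as np


def ols_slopes(y: np.ndarray, *regressors: np.ndarray) -> np.ndarray:
    """Least-squares slopes of y on the regressors; the intercept is fitted but dropped."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


def indirect_share(x: np.ndarray, z: np.ndarray, y: np.ndarray) -> float:
    """alpha * beta / total for the single-mediator path X -> Z -> Y."""
    (total,) = ols_slopes(y, x)
    (alpha,) = ols_slopes(z, x)
    _, beta = ols_slopes(y, x, z)
    return float(alpha * beta / total)


rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)                    # "November SST"
y = 0.4 * x + rng.standard_normal(n)          # "DJF NAO": direct effect of X only
z = 0.9 * y + 0.5 * rng.standard_normal(n)    # "heat flux": driven by the NAO, no effect on it
share = indirect_share(x, z, y)
print(round(share, 2))
```

The population value of the indirect share in this construction is about 0.76 even though Z has no causal effect on Y, because Z inherits part of Y's variance. A large αβ is therefore consistent with, but not proof of, a mediating role.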

References to the subpolar gyre, or in some cases even just “the Gyre”, are vague. When discussing the heat flux biases and heat flux mediation plots, the subpolar gyre was referring to a small northern part of the Gyre near Iceland (most egregiously on L. 273). Then the references to the gyre were to a location further south when the Eady growth rate results were discussed. Additionally, there are unlabeled emphasis boxes on several figures, which are not in the same place across figures. References to the subpolar gyre should be made with a map of the subpolar gyre streamfunction in mind, and the paper needs clearer labeling of which regions are being referred to where (especially the averaging regions used in Figure 5).

While the term “suppression” makes sense, and it seems that “inconsistent mediation” comes from the literature, I think the term “correct mediation” is misleading, because it’s also possible for there to be other effects (i.e., in completely other problems) where the correct effect (i.e., in reality) is one of inconsistent mediation or suppression. Is there another term that could be used for this? This terminology sometimes comes off as overconfident about the true direction of the various effects in reality (e.g., on lines 373-375).

Line Comments:
Figure 1: The figure caption should specify that the SEAS values are from the ensemble mean

72-79: After the preceding discussion about S2N paradox, I thought this paragraph could benefit from distinguishing between what has been found based on observations and what has been found based on models.

102-106: Of these 3 extensions, it seems to me like the first has already been done by Kolstad and O’Reilly (2024) and deserves less emphasis

218: “appropriate latitude-based weighting” is not specific enough, because there are two common choices, one where the data is cos(lat) weighted and one where the data is sqrt(cos(lat)) weighted such that the covariance matrix is area weighted. See discussion at https://climatedataguide.ucar.edu/climate-tools/empirical-orthogonal-function-eof-analysis-and-rotated-eof-analysis and in Baldwin et al. 2009 (https://doi.org/10.1175/2008JCLI2147.1).
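To make the two conventions concrete, here is a minimal NumPy sketch of EOF analysis via SVD. The function name and synthetic field are hypothetical, and nothing here indicates which convention the manuscript uses. Multiplying the data by √cos φ makes the covariance matrix area weighted, so the EOF variances sum to the area-weighted total variance. Multiplying by cos φ instead weights each grid point's variance by cos² φ:

```python
import numpy as np


def weighted_eofs(field: np.ndarray, lats_deg: np.ndarray, weighting: str = "sqrt_cos"):
    """EOFs of an anomaly field with shape (n_time, n_lat, n_lon) via SVD of the weighted data.

    Returns (eofs, pcs, variances); eofs are patterns of the weighted field.
    """
    w = np.cos(np.deg2rad(lats_deg))
    if weighting == "sqrt_cos":
        w = np.sqrt(w)  # squared weights equal cos(lat): area-weighted covariance
    elif weighting != "cos":
        raise ValueError("weighting must be 'sqrt_cos' or 'cos'")
    n_time = field.shape[0]
    weighted = (field * w[None, :, None]).reshape(n_time, -1)
    u, s, vt = np.linalg.svd(weighted, full_matrices=False)
    variances = s ** 2 / (n_time - 1)  # variance explained by each mode
    pcs = u * s                        # principal component time series
    eofs = vt.reshape((-1,) + field.shape[1:])
    return eofs, pcs, variances


rng = np.random.default_rng(0)
f = rng.standard_normal((50, 5, 8))
f -= f.mean(axis=0)                    # anomalies about the time mean
lats = np.linspace(20.0, 80.0, 5)
eofs, pcs, var = weighted_eofs(f, lats)
area_var = (np.cos(np.deg2rad(lats))[:, None] * f.var(axis=0, ddof=1)).sum()
print(np.isclose(var.sum(), area_var))
```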

229: Just checking, this is still the SST field and not the surface temperature field, right? Surface temperature has sea-ice surface temperature, which can be much colder than the freezing point, whereas SST should be no less than the freezing point of seawater (approx. -1.8°C).

302: “Its close resemblance to the indirect-effect pattern in the Subpolar Gyre underscores the feedback nature of this coupling” – It’s not a feedback on the NAO, because the NAO is forcing the heat fluxes, not the other way around. So maybe the heat fluxes of the same sign can be said to reinforce the heat fluxes, but this wording seems to be implying a feedback on the NAO, which cannot be diagnosed from the sign of the heat fluxes alone

Figure 2: Caption is for 6 panels instead of 8, 2 missing.

268: High west of Gibraltar = Azores High

Figure 3, 4: It’s important to note somewhere that the boxes added for emphasis are not in the same place across figures

322: “broadly negative” is not correct. Approximately just as much positive as negative.

332: “strengthens horizontal temperature gradients and reduces lower-tropospheric stability” – have you checked this, or is this an inference?

339: “unmistakable” is a bit strong. They look similar in pattern, but different in amplitude. Keep in mind that there hasn’t been any statistical test of the amplitude difference, just the sign

368: typo in p-value?

416-420: A larger feedback of the ocean is not necessarily all about ocean resolution. See for example Czaja et al. 2019 (https://doi.org/10.1007/s40641-019-00148-5) and Wills et al. 2024 (https://doi.org/10.1029/2023MS004123) on the role of atmospheric resolution
Citation: https://doi.org/10.5194/egusphere-2025-5075-RC2

Viewed

Total article views: 584 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
397	152	35	584	15	15

Views and downloads (calculated since 17 Oct 2025)

Month	HTML	PDF	XML	Total
Oct 2025	232	56	12	300
Nov 2025	133	57	15	205
Dec 2025	32	39	8	79

Viewed (geographical distribution)

Total article views: 594 (including HTML, PDF, and XML) Thereof 594 with geography defined and 0 with unknown origin.

Latest update: 19 Dec 2025

Short summary

I investigated why predicting winter weather over the North Atlantic remains difficult by studying how autumn ocean conditions influence the atmosphere. Using a method called mediation analysis, I uncovered a sequence of feedbacks linking sea surface temperatures to changes in winds and storm tracks. These feedbacks are poorly captured in current forecast models, which helps explain their limited skill and points to ways they can be improved.

