This work is distributed under the Creative Commons Attribution 4.0 License.
Signal, noise and skill in sub-seasonal forecasts: the role of teleconnections
Abstract. A set of relaxation experiments with a forecast model is used to explore the influence of tropical and stratospheric teleconnections on forecast skill, on the variability of the forecast ensemble mean (EM) and on the ensemble spread (ES) in the wintertime Northern Hemisphere at sub-seasonal timescales. The influence is diagnosed by comparing the relaxation experiments, which relax the temperature and wind fields in specific regions towards observed values, with the free-running (control) experiment. During weeks 3–6 the tropical relaxation increases the forecast skill for sea level pressure (SLP) mostly south of 50° N, but also over the North Atlantic, Northern Europe and eastern Canada. The stratospheric relaxation improves the skill mostly in high latitudes, over Europe and over the North Atlantic. Skill improvements are considerably smaller for surface temperature and total precipitation, suggesting a smaller role of the teleconnections in their predictability. The increases in skill are generally associated with increased variability of EM, considered to represent the predictable signal, and reduced ES, representing noise. However, this does not happen in all areas where the skill is increased. In high latitudes, where the stratospheric impacts are strongest, the EM variability in the stratospheric relaxation experiments does not increase consistently with the increases in skill, implying that EM does not reflect the predictable signal there. We estimate that the ensemble size available in the experiments (11 members) is not enough to extract the signal from the noise, and that larger ensembles (typically 20–50 members or even more, depending on area and variable) are required to study sub-seasonal predictability associated with the teleconnections in mid- and high latitudes, including windows of forecast opportunity.
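The abstract equates signal with the variability of the ensemble mean and noise with the ensemble spread. As a hedged illustration (a sketch, not the authors' method), assume each member is a shared predictable signal plus independent noise of equal variance. Under that assumption the EM variance still contains a residual noise term of about var_noise/N, and one simple criterion for "enough members", assumed here and not necessarily the one used in the paper, is that the signal variance exceeds this residual. The names `signal_noise_estimates` and `min_members` and the synthetic data are hypothetical.

```python
import math

import numpy as np


def signal_noise_estimates(ens: np.ndarray) -> dict:
    """Split ensemble anomalies into signal and noise variance estimates.

    ens: array of shape (n_starts, n_members) holding anomalies at one grid
    point and lead time (hypothetical layout, one row per start date).
    Assumes member = signal + independent noise with equal noise variance.
    """
    _, n_members = ens.shape
    em = ens.mean(axis=1)                        # ensemble mean for each start date
    var_em = em.var(ddof=1)                      # EM variance across start dates
    var_noise = ens.var(axis=1, ddof=1).mean()   # mean within-ensemble variance (spread)
    # Under the additive model E[var_em] = var_signal + var_noise / n_members,
    # so the residual noise term is subtracted (clipped at zero).
    var_signal = max(var_em - var_noise / n_members, 0.0)
    return {"var_em": var_em, "var_noise": var_noise, "var_signal": var_signal}


def min_members(var_signal: float, var_noise: float) -> int:
    """Smallest integer N with var_signal > var_noise / N, i.e. the signal
    variance exceeds the residual noise variance left in an N-member mean."""
    return math.floor(var_noise / var_signal) + 1


# Synthetic check: true signal variance 1, true noise variance 4, 11 members.
rng = np.random.default_rng(0)
n_starts, n_members = 20000, 11
signal = rng.normal(0.0, 1.0, size=(n_starts, 1))
ens = signal + rng.normal(0.0, 2.0, size=(n_starts, n_members))
est = signal_noise_estimates(ens)
print({k: round(float(v), 2) for k, v in est.items()})
print(min_members(1.0, 20.0))  # → 21
```

In this synthetic case the raw EM variance should come out near 1 + 4/11 rather than 1, and a noise-to-signal variance ratio of 20 would, under the same assumed criterion, call for at least 21 members.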
Competing interests: Dr. Amy Butler is a co-editor of WCD.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-2556', Anonymous Referee #1, 03 Aug 2025
Review of "Signal, noise and skill in sub-seasonal forecasts: the role of teleconnections" by Karpechko et al.
This study uses a set of ensemble relaxation experiments to explore the relationship between tropical and stratospheric teleconnections, forecast skill, and signal-to-noise relationships. Relaxing either the tropics or the stratosphere increases the forecast skill for SLP, and to a lesser degree for T2m and precipitation, in many regions; these effects are mostly consistent with previous work. The novel part is that the study then tries to diagnose whether the increase is associated with a signal in the ensemble mean, with a reduction in the ensemble spread, or both. While in many regions the answer is "both", there are numerous exceptions (including the Northern Europe signal in SLP under stratospheric nudging, where the ensemble-mean signal is weak and most of the skill increase comes from a reduction in ensemble spread). The authors then diagnose how big an ensemble is needed before it is possible to reliably extract signal from noise, and find that larger ensembles than those used in this study would be needed to identify sub-seasonal predictability; this last part is where I think the study could be improved the most.
Overall, the required revisions could be relatively minor if the authors decide to tone down the statements I found most objectionable, or more major if they disagree with my assessment and provide additional evidence supporting their statements. Either way, revisions are needed before I consider the final version.
There are three major comments that are somewhat related to one another and concern how to interpret the signal-to-noise metrics presented in this paper:
1a. As alluded to above, I think the conclusions drawn from the analysis on the minimal ensemble size are likely overstated. I am particularly bothered by lines 23-24 in the abstract and 71-73 in the introduction. The discussion section (lines 513-518) is a little more careful, but even there I think the wording can be refined.
The minimal ensemble size derived in this paper holds for the S2N definition and the perfect-model definition used here. But there are other ways of extracting subseasonal signals from forecast ensembles, and skill can be demonstrated with much smaller ensembles in many situations.
Using long hindcasts we can extract teleconnection signals from the tropics using <5 ensemble members (e.g. Stan et al 2022). Domeisen et al 2020 (already cited) also showed that <5 members is enough to extract signals from the stratosphere for many models. Both of these studies use long hindcasts from several models, and demonstrate some skill at representing teleconnections using far fewer members, even as the skill will of course increase as ensemble sizes increase. I think the authors' results are demonstrating that signal exceeds noise only for ensemble sizes larger than 20, and such a signal to noise analysis is essential for deciding on ensemble size of real-time operational forecasts. But real-time forecasts use 50 members or more at least for IFS, so it would seem that operational forecasts are already large enough to extract signals in most regions. It would seem that rewording the text in the three locations noted above would be enough to resolve this issue, unless the authors disagree with me in which case additional work is needed.
1b. A related issue is that equations 12 and 13 hold only in the limit where Control has no skill. If I understand equations 12 and 13 correctly, the residual skill of CTRL in weeks 5–6 will lead to an overestimate of the minimal ensemble size, because of the nonzero σ² in CTRL. Is there a way to account for this effect in the derivation of equation 13, or at least to quantify how important it might be?
1c. An alternative way of thinking about "perfect model" and signal to noise is the ratio of predictable components (RPC) from Smith and Scaife 2018 (already cited). This definition seems to be more robust to ensemble size and can identify S2N issues with relatively small ensembles (see figure 1 of Smith and Scaife and figure S17 of Garfinkel et al 2024; already cited), though bigger ensemble sizes certainly help. I hate to add yet another metric to this already comprehensive paper, but I think the authors need to compute RPC if they really think their statements in the three locations outlined above are correct. Otherwise, the statements in the abstract and at the end of the discussion about minimum ensemble size need to be tied to the specific method of ascertaining signal to noise used here. On a related note, it isn't clear to me whether RPC and S2N metrics are actually the same thing, or even closely related, despite the fact that they both use similar terminology; hence the closing paragraph on lines 535-540 seems overly speculative at the moment.
(Given the fact that STRAT nudging is increasing skill in Northern Europe despite not increasing EM variability, I strongly suspect there is an RPC>1 issue in this region. This is likely to be similar to the RPC>1 issue shown by Garfinkel et al 2024 for this model in polar cap height)
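The RPC mentioned in comment 1c is commonly estimated as RPC ≈ r / (σ_sig / σ_tot), where r is the correlation between the ensemble mean and the observations, σ_sig is the standard deviation of the ensemble mean and σ_tot is the standard deviation of individual members. The minimal sketch below applies that estimator to synthetic data; the function name `rpc_estimate`, the array layout and the data are assumptions for illustration, not the manuscript's or the cited papers' code. Values well above 1 are what is usually read as a model underestimating predictability.

```python
import numpy as np


def rpc_estimate(ens: np.ndarray, obs: np.ndarray) -> float:
    """Ratio of predictable components, estimated as r / (sigma_sig / sigma_tot).

    ens: (n_starts, n_members) forecast anomalies (hypothetical layout);
    obs: (n_starts,) observed anomalies for the same start dates.
    """
    em = ens.mean(axis=1)
    r = np.corrcoef(em, obs)[0, 1]   # ensemble-mean vs observation correlation
    sigma_sig = em.std(ddof=1)       # variability of the ensemble mean (model signal)
    sigma_tot = ens.std(ddof=1)      # total variability of all members pooled
    return float(r / (sigma_sig / sigma_tot))


rng = np.random.default_rng(1)
n_starts, n_members = 20000, 50
signal = rng.normal(0.0, 1.0, size=n_starts)
obs = signal + rng.normal(0.0, 2.0, size=n_starts)

# Perfect model: members share the observed signal amplitude.
perfect = signal[:, None] + rng.normal(0.0, 2.0, size=(n_starts, n_members))
# Model whose predictable signal is too weak (30 % of the observed amplitude).
weak = 0.3 * signal[:, None] + rng.normal(0.0, 2.0, size=(n_starts, n_members))

rpc_perfect = rpc_estimate(perfect, obs)
rpc_weak = rpc_estimate(weak, obs)
print(round(rpc_perfect, 2), round(rpc_weak, 2))
```

With a finite ensemble this estimator sits somewhat below 1 even for the perfect model (here it should land near 1/(1 + 4/50) ≈ 0.93), because the ensemble mean still carries residual noise, while the weak-signal model should give a value well above 1.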
Minor comments
Line 19/20: an additional possibility is that the model isn't fully utilizing the predictable signal, or possibly is misrepresenting the predictable signal.
Line 44: missing word in "some state-of-art can capture"
Line 58: I suggest adding Stan et al 2022
Table 1: is there tapering for the stratospheric nudging below
Line 139: the "(\rho)" belongs two words earlier in the sentence
Line 380-381: is it possible to provide a more physically meaningful interpretation? For example, is there overly strong downward coupling from the stratosphere to Northern Europe in control?
Figure 8 and similar other figures: suggest masking regions without skill with a different color than white, since white is used for topography.
Stan, Cristiana, Cheng Zheng, Edmund Kar-Man Chang, Daniela I. V. Domeisen, Chaim I. Garfinkel, Andrea M. Jenney, Hyemi Kim et al. "Advances in the prediction of MJO teleconnections in the S2S forecast systems." Bulletin of the American Meteorological Society 103, no. 6 (2022): E1426-E1447.
Citation: https://doi.org/10.5194/egusphere-2025-2556-RC1
RC2: 'Comment on egusphere-2025-2556', Anonymous Referee #2, 15 Aug 2025
Review of "Signal, noise and skill in sub-seasonal forecasts: the role of teleconnections"
The paper uses a series of forecast-model temperature-nudging experiments to investigate how atmospheric teleconnections from the stratosphere and the tropics influence forecast skill at subseasonal timescales. Specifically, the study examines whether the increase in forecast skill in the relaxation experiments is reflected in the variability of the ensemble mean, in its spread, or in both, by separating the (predictable) signal from the (unpredictable) noise. Results show that in the stratospheric relaxation experiments the increase in skill at high latitudes is not reflected in an increase in ensemble-mean variability. The authors conclude that extracting signal from noise requires a larger ensemble than the one used in this study.
Overall, the study performs a comprehensive analysis to extract signal from noise and understand subseasonal predictability skill and its sources. While the work is concise and well-written, I was not convinced about the reliability of the signal-noise model presented in this study. Therefore, a major revision is required in order to address several major issues (as described in detail below).
Main comments:
1. The study examines how nudging the tropical and stratospheric temperature and wind fields influences the forecast skill, ensemble-mean variability and ensemble spread at subseasonal timescales. It raises the question of whether the variances of the ensemble mean and of the ensemble spread are valid representations of the predictable (signal) and unpredictable (noise) variances of the model. However, as the authors themselves say, “in general these are not the same things”. I am not yet convinced that these definitions reliably represent what they are intended to represent. In particular, the signal (EM) is defined as the deviation from the hindcast climatology (eq. 3), so its variance (as well as the ensemble spread variance) represents the model’s variability, whereas the anomaly correlation coefficient (ACC) is defined with ERA5 climatology as the reference. This ACC is later compared to STN; however, are the two comparable? It would be good if the authors justified this approach and explained why we should expect STN to be directly comparable to ACC.
2. By the same logic, a high false alarm rate, for example, may lead to an increased signal-to-noise ratio, since STN is based on the hindcast climatologies, even though it is not an indication of good skill. STN would therefore be sensitive to false alarms in the model, and this could suggest another explanation for high skill in regions with low ACC.
3. In this study, ‘signal’ is defined based on anomalies from the hindcast climatology. This definition means that a large variance of SLP anomalies will contribute to an increased signal-to-noise ratio (and possibly improved skill), while a smaller variance may not. However, if a model predicts conditions that are similar to climatology, with variance comparable to the climatological variance, what would STN represent in that case? A related question: what does the high STN in the CTRL experiment in weeks 1–2 (Figure 3) represent? For example, is it close to 1 because of the small ensemble spread, because of a large variance of EM, or both?
4. Is it possible that ensemble agreement (reduced ES variance) captures only certain types of skill improvement, e.g., following SSW events? Could the STN model be a good reflection of the skill for such episodes rather than for all initializations? Analyzing specific episodes, e.g. in a case study, could be a useful way to test whether teleconnection-based skill really coincides with an STN increase. This may help to clarify these concerns and to justify the STN model approach for skill.
5. Fig. 4 suggests that the actual skill is a function of the “perfect model” skill only in the CTRL run and at short lead times. In the relaxation experiments this relation does not appear to be as linear as for CTRL. The interpretation offered is that the STN ratio does not reflect the improved skill in the relaxation experiments; however, the authors do not provide an alternative explanation for this outcome. This may also relate to the previous questions: does the STN model, as defined here, accurately represent skill changes?
6. Fig. 3 shows that STN is largest over the subtropical ocean basins, the Pacific and the western Atlantic. This is not surprising, as these are the storm track regions, where the variance of many atmospheric variables reaches its peak on daily and weekly timescales (see Fig. 2 in Chang et al., 2002). I think this goes back to the definition of STN and the question of whether it is right to represent the “signal” as the variance of SLP anomalies from hindcast climatologies (eq. 3). Please make sure that STN, as defined, is not simply an indication of the regions with the largest variance in winter.
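Comment 5 contrasts actual skill with "perfect model" skill. The manuscript's exact definition is not shown on this page; a common way to estimate perfect-model correlation, assumed here for illustration only, is to treat each member in turn as pseudo-observations, correlate it with the mean of the remaining members, and average over members. The function names `actual_skill` and `perfect_model_skill` and the synthetic data are hypothetical.

```python
import numpy as np


def actual_skill(ens: np.ndarray, obs: np.ndarray) -> float:
    """Correlation between the ensemble mean and observations across start dates."""
    return float(np.corrcoef(ens.mean(axis=1), obs)[0, 1])


def perfect_model_skill(ens: np.ndarray) -> float:
    """Leave-one-out perfect-model correlation: each member in turn plays the
    role of the observations and is correlated with the mean of the others."""
    n_members = ens.shape[1]
    total = ens.sum(axis=1)
    corrs = []
    for m in range(n_members):
        rest_mean = (total - ens[:, m]) / (n_members - 1)  # mean of the other members
        corrs.append(np.corrcoef(rest_mean, ens[:, m])[0, 1])
    return float(np.mean(corrs))


# Synthetic, statistically consistent model: signal variance 1, noise variance 4.
rng = np.random.default_rng(2)
n_starts, n_members = 20000, 11
signal = rng.normal(0.0, 1.0, size=n_starts)
ens = signal[:, None] + rng.normal(0.0, 2.0, size=(n_starts, n_members))
obs = signal + rng.normal(0.0, 2.0, size=n_starts)
print(round(actual_skill(ens, obs), 3), round(perfect_model_skill(ens), 3))
```

When the model is statistically consistent with the observations, the two numbers nearly coincide (the small gap comes from using N − 1 rather than N members in the leave-one-out mean); systematic departures of actual skill from perfect-model skill are the kind of signal a scatter of one against the other can reveal.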
Reference:
Chang, E. K., Lee, S., & Swanson, K. L. (2002). Storm track dynamics. Journal of Climate, 15(16), 2163-2183. https://doi.org/10.1175/1520-0442(2002)015<02163:STD>2.0.CO;2
Minor/technical comments:
Line 122: The variability -> the variance?
Line 245: the authors mention that the mean downward coupling is usually small over the Pacific region (Dai et al., 2023), and therefore it is not clear how relaxation of the stratosphere contributes to the increased PNA skill. First, Dai et al. analyzed sudden stratospheric warming events and therefore focused only on specific episodes. Second, is it possible that STRAT overestimates the Pacific response to stratospheric variability, and the PNA response in particular? This can easily be examined by comparing with the free-running CTRL.
Line 260: “actual correlation skill” -> perhaps rephrase the term “actual”, e.g., model’s skill
Line 235: using -> used
Line 500: “A not less interesting question” – rephrase.
Line 500: “these skill increases” –> this skill
Line 540: “We propose that some conclusions regarding the predictability of the extratropical troposphere … might need to be revised in the future when larger ensemble sizes required to correctly separate the signal and noise in the models will become available” is too general. Can the authors specify what these “some conclusions” are?
Citation: https://doi.org/10.5194/egusphere-2025-2556-RC2
AC1: 'Authors' responses to Referees comments', Alexey Karpechko, 01 Sep 2025