A spread-versus-error framework to reliably quantify the potential for subseasonal windows of forecast opportunity
Abstract. Mid-latitude forecast skill at subseasonal timescales often depends on 'windows of opportunity' that may be opened by slowly varying modes such as ENSO, the MJO or stratospheric variability. Most previous work has focused on the predictability of ensemble-mean states, with less attention paid to the reliability of such forecasts and how it relates to ensemble spread, which directly reflects intrinsic forecast uncertainty. Here, we introduce a spread-versus-error framework based on the Spread-Reliability Slope (SRS) to quantify whether fluctuations in ensemble spread provide reliable information about variations in forecast error. Using ECMWF S2S forecasts and ERA5 reanalysis data, aided by idealised toy-model experiments, we show that reliability is controlled by at least three intertwined factors: sampling error, the magnitude of physically driven spread variability and model fidelity in representing that variability. Regions such as northern Europe, the mid-east Pacific, and the tropical west Pacific exhibit robustly high SRS values (≈ 0.6 or greater for 50-member ensembles), consistent with strong modulation by slowly varying teleconnections. In contrast, areas like eastern Canada show little or no reliability, even for 100-member ensembles, reflecting limited low-frequency modulation of forecast uncertainty. We further demonstrate two practical implications: (i) a simple variance rescaling yields a post-processed 'corrected spread' that enforces reliability and may help to bridge ensemble output with user needs; and (ii) time averaging effectively boosts ensemble size, allowing even 10-member ensembles to achieve reliability of spread fluctuations comparable to larger ensembles. Finally, we discuss possible links to the signal-to-noise paradox and emphasise that adequate representation of ensemble spread variability is crucial for exploiting subseasonal windows of opportunity.
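For concreteness, a minimal numerical sketch of a spread-versus-error slope of the kind the abstract describes (the binning choices, function names and toy setup are my illustrative assumptions, not the manuscript's actual SRS definition):

    import numpy as np

    def spread_reliability_slope(spread, error, n_bins=10):
        """Slope of a regression of binned RMSE on binned spread (~1 = reliable)."""
        order = np.argsort(spread)
        bins = np.array_split(order, n_bins)              # equal-population bins
        mean_spread = np.array([spread[b].mean() for b in bins])
        rmse = np.array([np.sqrt(np.mean(error[b] ** 2)) for b in bins])
        return np.polyfit(mean_spread, rmse, 1)[0]        # least-squares slope

    # Toy test: a statistically consistent ensemble should give a slope near 1.
    rng = np.random.default_rng(0)
    n_fc, n_ens = 500, 50
    sigma = rng.uniform(0.5, 2.0, n_fc)                   # flow-dependent uncertainty
    members = rng.normal(0.0, sigma[:, None], size=(n_fc, n_ens))
    truth = rng.normal(0.0, sigma)                        # drawn from the same distribution
    spread = members.std(axis=1, ddof=1)
    error = members.mean(axis=1) - truth
    print(spread_reliability_slope(spread, error))        # approximately 1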
Competing interests: At least one of the (co-)authors is a member of the editorial board of Weather and Climate Dynamics.
The manuscript “A spread-versus-error framework to reliably quantify the potential for subseasonal windows of forecast opportunity” by Rupp et al. explores the relationship between ensemble spread and forecast error in sub-seasonal ensemble forecasts (days 14-46) from the ECMWF system and in a statistical toy model. The authors propose an approach, based on the spread-error relationship, to identify regions where variations in ensemble spread correlate with variations in forecast error, and demonstrate, using a simple statistical model, that the spread-error relationship can be degraded by insufficient sampling, a lack of physical processes that modulate predictability, and model deficiencies.
The paper provides several interesting ideas, in particular exploring the connection between intra-forecast and inter-forecast variability of the spread, and illustrating several critical issues of sub-seasonal forecasting (such as under-sampling) using the toy model. I have no doubt that the paper should be published in WCD. However, I ask the authors to clarify several critical points before publication.
Specific points:
L61-64: Are these assertions supported by research, or are they your hypothesis? If the former, a reference is needed. If this is your hypothesis, please state so clearly.
L113: Please provide the full reference for Leutbecher et al.
L114-115: “A comparison between the IFS model and the CNRM model further shows qualitatively robust patterns (discussed in Section 6).” Robust patterns of what? Also, more information about the CNRM data used is needed.
L115-116: It is quite difficult to comprehend what exactly “forecast spread reliability is influenced by the potential for windows of opportunity” means. I am not sure which definition of “reliability” the authors are using. A reliable ensemble forecast system (or any other forecast system that provides probabilistic forecasts) is one whose predicted probabilities correspond to the observed frequencies; this is what a reliability diagram illustrates. It would help if the authors provided the definition of reliability they are using. In addition, what is the difference between “windows of opportunity” and “potential for windows of opportunity”? “Opportunity” and “potential” sound synonymous to me.
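To illustrate the textbook notion of reliability invoked here (predicted probabilities matching observed frequencies), a minimal sketch of a reliability-diagram computation; all names are illustrative, not from the manuscript:

    import numpy as np

    def reliability_curve(prob, outcome, n_bins=10):
        """Mean forecast probability vs. observed frequency per probability bin."""
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        idx = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)
        p_mean, o_freq = [], []
        for i in range(n_bins):
            in_bin = idx == i
            if in_bin.any():                              # skip empty bins
                p_mean.append(prob[in_bin].mean())
                o_freq.append(outcome[in_bin].mean())
        # a reliable system lies on the diagonal p_mean == o_freq
        return np.array(p_mean), np.array(o_freq)

    # Synthetic, perfectly reliable case: events occur with the stated probability.
    rng = np.random.default_rng(0)
    p = rng.uniform(0.0, 1.0, 10000)
    y = (rng.uniform(0.0, 1.0, 10000) < p).astype(float)
    print(*reliability_curve(p, y))                       # two nearly identical vectors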
L125-127: “However, if the ensemble size is small, sampling errors will be relatively large. In such a case, some forecast/time step with, e.g., low spread, could be also associated with comparably large error, as the spread is simply underestimated due to sampling error.” You assume that, in this case, spread is not a good predictor of accuracy, but has this been studied? Also, how do you define whether the ensemble size is small or not? The size you are using (50 members at least) does not sound small to me.
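As a point of reference, a small Monte Carlo sketch (my own toy Gaussian setup, not the authors') of how the sampling noise in the ensemble spread shrinks with ensemble size; even 50 members leave roughly 10% relative scatter:

    import numpy as np

    rng = np.random.default_rng(1)
    for n in (10, 50, 100):
        draws = rng.normal(0.0, 1.0, size=(20000, n))     # true spread = 1
        spread = draws.std(axis=1, ddof=1)
        # empirical relative scatter of the spread vs. the Gaussian theory value
        print(n, round(spread.std(), 3), round(1 / np.sqrt(2 * (n - 1)), 3))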
Figure 2: Have you tried plotting only the “inter” component of your variance separation, rather than showing daily spread and error, which are mostly noise?
Figure 2 caption: “Red dashed line”, not “Orange dashed line”.
L151: How do you define “anomaly”? Figure 2 shows only positive values. For anomalies I would expect both positive (above climatology) and negative (below climatology) values.
L175: Do you assume that the ensemble mean is well represented in the toy model, or do you also assume it is well represented in operational forecasts? Is this assumption justified?
L242: Does your assumption hold? I understand that, as you under-sample the forecast distribution, the variability of the spread will in general increase. However, I believe that the variability of the ensemble mean would also increase, leading to increased error. Why would this not be the case?
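The concern can be made concrete with a toy Gaussian experiment (my own sketch, not the manuscript's setup): sub-sampling inflates both the variability of the spread and the sampling error of the ensemble mean (truth taken here as the distribution mean, so only the sampling contribution to the error is shown):

    import numpy as np

    rng = np.random.default_rng(2)
    full = rng.normal(0.0, 1.0, size=(5000, 1000))        # truth = distribution mean = 0
    for n in (10, 50, 100, 1000):
        sub = full[:, :n]                                 # under-sample to n members
        spread_scatter = sub.std(axis=1, ddof=1).std()    # variability of the spread
        mean_rmse = np.sqrt((sub.mean(axis=1) ** 2).mean())  # sampling error of the mean
        print(n, round(spread_scatter, 3), round(mean_rmse, 3))
    # both columns grow as n shrinks: under-sampling affects the spread variability
    # AND the ensemble-mean error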
L251: If the error is overestimated, then how can this lead to a lower error?
L235-255: I cannot understand your explanation for the decreased SRS in experiment (b), and I am not sure that you can explain it without analysing the variability of the ensemble mean.
L262-270: Do you mean that an ensemble larger than 100 members would be required to capture the spread-error relationship in the case shown in panel (c)? Have you tested this with your toy model?
L271: “intrincic” -> “intrinsic”
L289-290: Can you be more specific about which effects are unsystematic? I understand that an insufficient number of cases leads to unsystematic effects, but can, for example, a small ensemble size also lead to unsystematic effects, or does it always lead to a decreased SRS?
L324-329: Can you provide equations for the inter- and intra-forecast variability?
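For example, one possible form based on the law of total variance (my notation, not necessarily the authors'): with $s_{t,\tau}$ the ensemble spread of the forecast initialised at time $t$, valid at lead time $\tau$ within the forecast window,

$$ \mathrm{Var}_{t,\tau}\left(s_{t,\tau}\right) = \underbrace{\mathrm{Var}_t\left(\overline{s}_t\right)}_{\text{inter-forecast}} + \underbrace{\frac{1}{T}\sum_t \mathrm{Var}_\tau\left(s_{t,\tau}\right)}_{\text{intra-forecast}}, $$

where $\overline{s}_t$ is the lead-time mean of the spread within forecast $t$ and $T$ is the number of forecasts. Explicit equations of this kind would make the decomposition unambiguous.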
L341: I do not know what the journal's policy is, but I would prefer to see the definition of the theoretical sampling error estimate in the text rather than in a figure caption.
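For reference, the standard Gaussian result (an assumption on my part; the manuscript's estimate may differ): the sample standard deviation $\hat{s}$ of an $N$-member ensemble drawn from a Gaussian with true spread $\sigma$ satisfies

$$ \frac{\mathrm{SD}(\hat{s})}{\sigma} \approx \frac{1}{\sqrt{2(N-1)}}, $$

i.e. about 10% for $N = 50$ and about 7% for $N = 100$.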
L351-352: I presume you refer to Figure 4d? It would be nice to explicitly refer to this figure in the text, for clarity.
L388-389: It took me a while to figure out that you are using different colour scales for Figs. 9b and 9d. I suggest using the same scale, because you are making a point about the smallness of the anomalies in Fig. 9d, which cannot be seen with the present scales.