This work is distributed under the Creative Commons Attribution 4.0 License.
A spread-versus-error framework to reliably quantify the potential for subseasonal windows of forecast opportunity
Abstract. Mid-latitude forecast skill at subseasonal timescales often depends on 'windows of opportunity' that may be opened by slowly varying modes such as ENSO, the MJO or stratospheric variability. Most previous work has focused on the predictability of ensemble-mean states, with less attention paid to the reliability of such forecasts and how it relates to ensemble spread, which directly reflects intrinsic forecast uncertainty. Here, we introduce a spread-versus-error framework based on the Spread-Reliability Slope (SRS) to quantify whether fluctuations in ensemble spread provide reliable information about variations in forecast error. Using ECMWF S2S forecasts and ERA5 reanalysis data, aided by idealised toy-model experiments, we show that reliability is controlled by at least three intertwined factors: sampling error, the magnitude of physically driven spread variability and model fidelity in representing that variability. Regions such as northern Europe, the mid-east Pacific, and the tropical west Pacific exhibit robustly high SRS values (≈ 0.6 or greater for 50-member ensembles), consistent with robust modulation by slowly varying teleconnections. In contrast, areas like eastern Canada show little or no reliability, even for 100-member ensembles, reflecting limited low-frequency modulation of forecast uncertainty. We further demonstrate two practical implications: (i) a simple variance rescaling yields a post-processed 'corrected spread' that enforces reliability and may help to bridge ensemble output with user needs; and (ii) time averaging effectively boosts ensemble size, allowing even 10-member ensembles to achieve reliability of spread fluctuations comparable to larger ensembles. Finally, we discuss possible links to the signal-to-noise paradox and emphasize that adequate representation of ensemble spread variability is crucial for exploiting subseasonal windows of opportunity.
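The abstract's central quantity, the Spread-Reliability Slope (SRS), can be illustrated with a small synthetic sketch. This is my own reconstruction of a plausible spread-versus-error diagnostic — bin forecasts by ensemble spread and regress binned error on binned spread — not the paper's actual definition or code; all names and numbers below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the "true" forecast uncertainty sigma_t varies from
# case to case; a window of opportunity corresponds to a low-sigma_t case.
n_fc, n_ens = 500, 50
sigma_t = rng.uniform(0.5, 2.0, n_fc)

# Members and the verifying truth are drawn from the same distribution,
# so the ensemble is perfectly reliable by construction.
members = rng.normal(0.0, sigma_t[:, None], (n_fc, n_ens))
truth = rng.normal(0.0, sigma_t)

spread = members.std(axis=1, ddof=1)   # ensemble spread per forecast
error = members.mean(axis=1) - truth   # ensemble-mean error per forecast

# Bin forecasts by spread, compute RMSE per bin, and take the slope of
# binned RMSE against binned spread as a spread-reliability slope.
edges = np.quantile(spread, np.linspace(0.0, 1.0, 11))
idx = np.clip(np.digitize(spread, edges) - 1, 0, 9)
bin_spread = np.array([spread[idx == k].mean() for k in range(10)])
bin_rmse = np.array([np.sqrt(np.mean(error[idx == k] ** 2)) for k in range(10)])
srs = np.polyfit(bin_spread, bin_rmse, 1)[0]
print(f"SRS = {srs:.2f}")  # near 1 for this idealised, reliable ensemble
```

For a perfectly reliable ensemble, as constructed here, binned RMSE grows roughly one-to-one with spread; sampling noise in the 50-member spread estimates flattens the slope slightly (regression dilution), one of the sampling effects the paper investigates with its toy model.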
Competing interests: At least one of the (co-)authors is a member of the editorial board of Weather and Climate Dynamics.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-4925', Anonymous Referee #1, 28 Nov 2025
AC1: 'Reply on RC1', Philip Rupp, 25 Mar 2026
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-4925/egusphere-2025-4925-AC1-supplement.pdf
RC2: 'Comment on egusphere-2025-4925', Tim Woollings, 09 Jan 2026
This is a nice paper which investigates the potential of ensemble spread to provide useful indication of likely forecast error on S2S timescales. The methods are novel and varied, the application sound and the results should prove useful to the forecasting community. I recommend publication after considering the comments below.
1. Fig 1 shows dominant regions of mean spread over the North Pacific and Atlantic. The shaded regions of large variability in spread lie on the flanks of these, so can the variability in spread be interpreted as (predominantly) north-south shifts of the usual regions of high spread aligning with the jets / storm tracks?
2. All days with lead times 14-46 days are combined in lots of the analysis here. Two thoughts on this: 1) Given the high day-to-day autocorrelation, the number of independent samples will be much less than it seems. Does this need to be taken into account anywhere? Perhaps the binning limits the impact of this. 2) The examples shown (eg fig 2a) show that the ensemble spread has saturated by day 14, which is good. This seems to be a necessary condition, as otherwise there might be a trivial link between spread and error as both are related to lead time. Can the authors confirm that saturation by day 14 is seen everywhere, not just at the couple of points shown?
3. The raw data fig 2b suggests low error values for the largest spread values (>25000), which goes against the overall relationship. Is this common or just a feature of this location?
4. Fig 3 plots the slope for every NH point, with hatching marking points where the slope is not significantly different from zero. Does this mean that at all the non-hatched points the correlation between spread and error is significant (accounting for autocorrelation)?
5. I’m not sure I understand the pink shading in fig 4 - this doesn’t seem to agree with the binned data shown by the black dots. In particular some of the dots with large values of spread and error seem to have very large spread values compared to the shading - is that right?
6. The SRS maps in fig 6 are interesting. Eg it looks like the model spread is considerably more reliable for the northern centre of the NAO than the southern centre. Is this consistent with any other literature?
7. The relation to inter-over-intra variability in fig 8 is interesting. Can this be taken further back, eg to basic variances of the real atmosphere such as shown for different frequency bands in Blackmon et al (1984), and others?
8. Section 5 could be rounded off with a summary number - eg what fraction of spatial variance in SRS is explained by the perfect model test?
9. The whole paper is framed around ‘windows of opportunity’, ie the low-spread end of the spectrum. Is there any interest in the high-spread end of the spectrum (walls of adversity perhaps…?)? The method uses a linear fit across the whole range of spread - do the results reflect the high-spread end of the relationship as much as the low-spread end?
10. There are several new results given in the Conclusions & Discussion section, which are important enough to make the abstract. Consider moving these into the main paper.
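The autocorrelation concern raised in point 2 can be made concrete with a standard rule of thumb (my own illustration, not part of the paper): n samples with lag-1 autocorrelation r carry roughly n(1-r)/(1+r) independent samples' worth of information.

```python
import numpy as np

def effective_sample_size(x):
    """Rule-of-thumb effective sample size n * (1 - r) / (1 + r),
    with r the estimated lag-1 autocorrelation."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    r = np.corrcoef(x[:-1], x[1:])[0, 1]
    r = max(r, 0.0)  # guard against small negative sample estimates
    return len(x) * (1.0 - r) / (1.0 + r)

# Example: an AR(1) series with lag-1 autocorrelation 0.8, mimicking a
# strongly autocorrelated day-to-day quantity (hypothetical numbers).
rng = np.random.default_rng(1)
n, r_true = 1000, 0.8
x = np.zeros(n)
for t in range(1, n):
    x[t] = r_true * x[t - 1] + rng.normal()
n_eff = effective_sample_size(x)
print(round(n_eff))  # roughly n * (1 - 0.8) / (1 + 0.8), about a ninth of n
```

Under these assumed numbers, the 33 daily values per forecast (days 14-46) would count as only a handful of independent samples, consistent with the reviewer's suggestion that binning may mitigate, but not remove, the effect.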
Minor:
- SRS of 0.6 is given as a summary figure in the abstract which is a nice idea but might be hard to interpret without knowing more about what SRS is.
- line 28: I would say that the whole ensemble is the ‘actual prediction’, not just the ensemble mean.
- line 43: ‘areas occasionally associated with anomalously low spread’ are highlighted here, but could it also be occasionally high spread?
- line 132: ‘potential’ windows of opportunity?
- line 397-8: ref to support this statement.
- line 451: consider linking to https://doi.org/10.48550/arXiv.2411.17694 on signal-noise issues in subseasonal forecasts.
Typos:
- line 80: Ref style
- line 108: forecasts
- line 220: considerably
- line 271: intrinsic
- fig 6 caption: black line rather than grey?
- line 335: not essentially
- fig 10 caption: check line colours
Citation: https://doi.org/10.5194/egusphere-2025-4925-RC2
AC2: 'Reply on RC2', Philip Rupp, 25 Mar 2026
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-4925/egusphere-2025-4925-AC2-supplement.pdf
The manuscript “A spread-versus-error framework to reliably quantify the potential for subseasonal windows of forecast opportunity” by Rupp et al. explores the relationship between ensemble spread and forecast error in sub-seasonal ensemble forecasts (days 14-46) from the ECMWF system and in a statistical toy model. The authors propose an approach, based on the spread-error relationship, to identify regions where variations in ensemble spread correlate with variations in forecast error, and demonstrate, using a simple statistical model, that the spread-error relationship can be degraded by insufficient sampling, a lack of physical processes that modulate predictability, and model deficiencies.
The paper provides several interesting ideas, in particular exploring the connection between intra-forecast and inter-forecast variability of the spread, and illustrating several critical issues of sub-seasonal forecasting (such as under-sampling) using the toy model. I have no doubt that the paper should be published in WCD. However, I ask the authors to clarify several critical points before publication.
Major points.
Specific points:
L61-64: Are these assertions supported by research, or is it your hypothesis? If the former, a reference is needed. If it is your hypothesis, please be clear about it.
L113: Provide full reference for Leutbecher et al.
L114-115: “A comparison between the IFS model and the CNRM model further shows qualitatively robust patterns (discussed in Section 6).” Robust patterns of what? Also, more information about the used CNRM data is needed.
L115-116: It is quite difficult to comprehend what exactly “forecast spread reliability is influenced by the potential for windows of opportunity” means. I am not sure which definition of “reliability” the authors are using. A reliable ensemble forecast system (or any other forecast system that provides probabilistic forecasts) is one whose predicted probabilities correspond to the observed frequencies; this is what a reliability diagram illustrates. It would help if the authors provided the definition of reliability they are using. In addition, what is the difference between “windows of opportunity” and “potential for windows of opportunity”? “Opportunity” and “potential” sound synonymous to me.
L125-127: “However, if the ensemble size is small, sampling errors will be relatively large. In such a case, some forecast/time step with, e.g., low spread, could be also associated with comparably large error, as the spread is simply underestimated due to sampling error.” You assume that spread is not a good predictor for accuracy, but has this been studied? Also, how to define whether the ensemble size is small or not? The size you are using (50 members at least) does not sound small to me.
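One way to make "small" quantitative, offered here as an illustrative aside rather than something taken from the manuscript: for a Gaussian ensemble of size n, the sample standard deviation fluctuates with relative sampling error of roughly 1/sqrt(2(n-1)), about 10% for n = 50. Spread fluctuations are then only detectable if the physically driven spread variability exceeds this floor.

```python
import numpy as np

# Monte-Carlo check of the relative sampling error of ensemble spread:
# for a Gaussian ensemble of size n (unit variance), the sample standard
# deviation has sampling error of roughly 1 / sqrt(2 * (n - 1)).
rng = np.random.default_rng(2)

for n in (10, 50, 100):
    s = rng.normal(size=(20000, n)).std(axis=1, ddof=1)
    empirical = s.std()                  # Monte-Carlo estimate
    theory = 1.0 / np.sqrt(2 * (n - 1))  # Gaussian approximation
    print(f"n={n:3d}  empirical={empirical:.3f}  theory={theory:.3f}")
```

By this yardstick a 50-member ensemble only resolves spread fluctuations clearly larger than about 10% of the mean spread, which gives one concrete criterion for whether an ensemble is "small" for estimating spread variations.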
Figure 2: Have you tried plotting only the “inter” component of your variance separation, rather than showing daily spread and error, which are mostly noise?
Figure 2 caption: “Red dashed line” not “Orange dashed line”
L151: How do you define “anomaly”? Figure 2 shows only positive values. For anomalies I would expect both positive (above climatology) and negative (below climatology) values.
L175: Do you assume that ensemble mean is well represented in the toy model, or do you also assume it is well represented in operational forecasts? Is this assumption justified?
L242: Does your assumption hold? I understand that, as you under-sample the forecast distribution, the variability of the spread will in general increase. However, I believe that the variability of the ensemble mean would also increase, leading to increased error. Why would this not be the case?
L251: If the error is overestimated, then how can this lead to a lower error?
L235-255: I cannot understand your explanations for decreased SRS in experiment (b), and I am not sure that you can explain it without analysing variability of ensemble mean.
L262-270: Do you mean that a larger ensemble size than 100 members would be required to capture the spread-error relationship in the case shown in panel “c”? Have you tested this with your toy model?
L271: “intrincic” -> ” intrinsic”
L289-290: Can you be more specific about which effects are unsystematic? I understand that insufficient number of cases leads to unsystematic effects, but can for example small sample size lead to unsystematic effects, or does it always lead to decreased SRS?
L324-329: Can you provide equations for the inter- and intra- variability?
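For what it is worth, one plausible form of the requested equations is sketched below. This is my reconstruction from the terms "inter-forecast" and "intra-forecast" variability, not the authors' actual definitions: with s_{f,t} the ensemble spread of forecast f = 1, ..., F at day t = 1, ..., T,

```latex
% \bar{s}_f is the time-mean spread of forecast f; \bar{\bar{s}} the grand mean.
\bar{s}_f = \frac{1}{T} \sum_{t=1}^{T} s_{f,t}, \qquad
\bar{\bar{s}} = \frac{1}{F} \sum_{f=1}^{F} \bar{s}_f,
% inter-forecast variability: variance across forecasts of the time-mean spread
\sigma^2_{\mathrm{inter}} = \frac{1}{F} \sum_{f=1}^{F}
    \left( \bar{s}_f - \bar{\bar{s}} \right)^2,
% intra-forecast variability: mean within-forecast variance of the spread
\sigma^2_{\mathrm{intra}} = \frac{1}{F} \sum_{f=1}^{F}
    \frac{1}{T} \sum_{t=1}^{T} \left( s_{f,t} - \bar{s}_f \right)^2.
```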
L341: I do not know what the journal’s policy is, but I would prefer to see the definition of the theoretical sampling error estimate in the text rather than in figure captions.
L351-352: I presume you refer to Figure 4d? It would be nice to explicitly refer to this figure in the text, for clarity.
L388-389: It took me a while to figure out that you are using different colour scales for Figs. 9b and 9d. I suggest using the same scale because you are making the point about the smallness of the anomalies in Fig. 9d, which cannot be seen with the present scales.