A large-scale evaluation of available subseasonal precipitation forecast products over the contiguous United States
Abstract. Accurate precipitation predictions at the subseasonal timescale (beyond a week but within a season) could benefit a range of human activities, but are highly challenging to achieve. Research efforts have been made through multi-agency and international collaborations, resulting in numerous forecast products such as those included in the Subseasonal Consortium (SubC) and the Subseasonal-to-Seasonal (S2S) Prediction Project. However, a unified and comprehensive evaluation of the full suite of hindcast datasets from these efforts remains limited, partly due to inconsistencies in hindcast frequency and data periods across products. In this study, we employ the full suite of nineteen precipitation hindcast datasets from the SubC and S2S projects over the contiguous United States (CONUS). The hindcast datasets are temporally aggregated into weekly values and are assessed against a reference dataset derived from the Parameter-elevation Regressions on Independent Slopes Model (PRISM). Overall and seasonal evaluations are carried out using statistical metrics including percentage bias (PBIAS), anomaly correlation coefficient (ACC), and continuous ranked probability score (CRPS). Furthermore, we adopt a baseline-referenced skillfulness approach that accounts for differences in hindcast initialization, frequency, and data periods for a relatively fair comparison among the employed hindcast datasets. Our results indicate widespread overestimations in winter and spring across most hindcast datasets, while underestimations are more likely to be observed in summer and autumn. Predictive accuracy generally declines over forecast lead time and remains marginal beyond week three. Notable variations in predictive skill are observed across regions, seasons, and lead times, with no single hindcast dataset consistently outperforming others. 
In summary, this work provides valuable references for both forecast end-users and model developers, and highlights the need for context-specific selection of available subseasonal forecast products for downstream applications.
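For readers unfamiliar with the verification metrics named in the abstract, here is a minimal numpy sketch of PBIAS and ACC as commonly defined (the function names and toy inputs are mine, not the authors'; this is an illustration, not the paper's implementation):

```python
import numpy as np

def pbias(sim, obs):
    """Percentage bias (PBIAS): positive values indicate overestimation."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 100.0 * (sim - obs).sum() / obs.sum()

def acc(sim, obs, clim):
    """Anomaly correlation coefficient (ACC): correlation of simulated and
    observed anomalies relative to a common climatology."""
    sa = np.asarray(sim, dtype=float) - clim
    oa = np.asarray(obs, dtype=float) - clim
    return (sa * oa).sum() / np.sqrt((sa ** 2).sum() * (oa ** 2).sum())
```

ACC rewards getting the anomaly pattern right regardless of a constant bias, which is why the paper reports it alongside PBIAS.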
This is an exhaustive analysis of the subseasonal forecasts produced by a number of models that participated in two different model intercomparison campaigns, S2S and SubC. While I think this article should be published, it is more of a time capsule of simulations performed nearly a decade ago, with an experimental design that was somewhat constrained by the computational limitations of that time. The number of inter- and intra-model ensemble members is quite small compared to what would be feasible today at these spatial resolutions. In that sense, I am not sure how much this work reflects current model capabilities, which extend to convection-resolving scales and much larger ensemble sizes. With that in mind, a few additional comments:
a. Why was the choice made to regrid everything to 0.25 degrees? This resolution is native to neither PRISM nor the models. I suspect much of the "noise" or striation seen over the western mountainous regions (in Figures 4 and 7, for example) is an artifact of remapping the model outputs from roughly 100 km to 25 km, which appears to be a direct interpolation without consideration of terrain effects or other features.
b. It is remarkable that beyond two weeks all of the variability (I am assuming a mix of intra- and inter-model variability) collapses into approximately the same geographic bias pattern (Fig. 5, for example). This seems to suggest that the deterministic setup of these models drives them toward a climatological 'state' once the influence of the initial conditions fades, in the absence of external perturbations (such as those applied in climate models). Do all of these models use the same SSTs, and how often are the SSTs updated?
c. Have you looked at the intra-model variability for models with larger ensemble sizes (for example, BOM or CNRM)? Within a single model's ensemble, are the members' biases relative to PRISM consistent with each other?
d. Applying CRPS across different models with different initializations may not be appropriate. Applied to a single model's ensemble, it helps characterize that model's variability and predictability; across different models, it is much less useful, because it is nearly impossible to disentangle what makes these ensembles collapse.
e. Analyzing these model predictions under specific initial-condition constraints (for example, ENSO phase) could be carried out separately to add value to the publication. My guess is that, while improving overall predictability at these time scales remains challenging, it may be possible to identify confined temporal and spatial domains that are more predictable under such special initial conditions.
f. The link to the ECMWF site for data does not work.
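To make comment (a) concrete: as a toy illustration (the values and grid spacing below are hypothetical), plain linear interpolation from a ~100 km grid to 25 km merely smears a coarse-grid orographic precipitation peak across the fine cells; it cannot introduce the elevation-dependent structure present in PRISM, and mismatches of this kind can show up as striation when differencing the two.

```python
import numpy as np

# Hypothetical 1-D transect across a mountain ridge on a ~100 km grid
coarse_x = np.array([0.0, 100.0, 200.0, 300.0])   # distance, km
coarse_p = np.array([5.0, 30.0, 8.0, 6.0])        # precipitation, mm/week

# Terrain-blind remap to a ~25 km grid: plain linear interpolation.
# The orographic peak is spread over neighbouring fine cells; no new
# elevation-dependent detail is created by the interpolation itself.
fine_x = np.arange(0.0, 301.0, 25.0)
fine_p = np.interp(fine_x, coarse_x, coarse_p)
```

Conservative or terrain-aware regridding would behave differently near sharp gradients, which is the crux of the question above.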
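On comment (d): the CRPS for a single forecast-observation pair is typically estimated from the ensemble as E|X - y| - (1/2)E|X - X'|. A minimal sketch (the function name is mine; I do not know how the authors computed it):

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS estimate for one forecast from ensemble members:
    E|X - y| - 0.5 * E|X - X'|. Smaller is better; for a single member
    it reduces to the absolute error |x - y|."""
    x = np.asarray(members, dtype=float)
    term1 = np.abs(x - obs).mean()
    term2 = 0.5 * np.abs(x[:, None] - x[None, :]).mean()
    return term1 - term2
```

Because this estimator is sensitive to ensemble size, comparing raw CRPS values across models with very different ensemble sizes and initializations conflates sampling effects with genuine skill, which is the concern raised in comment (d).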