A large-scale evaluation of available subseasonal precipitation forecast products over the contiguous United States
Abstract. Accurate precipitation predictions at the subseasonal timescale (beyond a week but within a season) could benefit a range of human activities, but are highly challenging to achieve. Research efforts have been made through multi-agency and international collaborations, resulting in numerous forecast products such as those included in the Subseasonal Consortium (SubC) and the Subseasonal-to-Seasonal (S2S) Prediction Project. However, a unified and comprehensive evaluation of the full suite of hindcast datasets from these efforts remains limited, partly due to inconsistencies in hindcast frequency and data periods across products. In this study, we employ the full suite of nineteen precipitation hindcast datasets from the SubC and S2S projects over the contiguous United States (CONUS). The hindcast datasets are temporally aggregated into weekly values and are assessed against a reference dataset derived from the Parameter-elevation Regressions on Independent Slopes Model (PRISM). Overall and seasonal evaluations are carried out using statistical metrics including percentage bias (PBIAS), anomaly correlation coefficient (ACC), and continuous ranked probability score (CRPS). Furthermore, we adopt a baseline-referenced skillfulness approach that accounts for differences in hindcast initialization, frequency, and data periods for a relatively fair comparison among the employed hindcast datasets. Our results indicate widespread overestimations in winter and spring across most hindcast datasets, while underestimations are more likely to be observed in summer and autumn. Predictive accuracy generally declines over forecast lead time and remains marginal beyond week three. Notable variations in predictive skill are observed across regions, seasons, and lead times, with no single hindcast dataset consistently outperforming others. 
In summary, this work provides valuable references for both forecast end-users and model developers, and highlights the need for context-specific selection of available subseasonal forecast products for downstream applications.
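For readers unfamiliar with the verification metrics named in the abstract, here is a minimal numpy sketch of PBIAS and ACC as commonly defined (the function names and toy inputs are mine, not the authors'; this is an illustration, not the paper's implementation):

```python
import numpy as np

def pbias(sim, obs):
    """Percentage bias (PBIAS): positive values indicate overestimation."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 100.0 * (sim - obs).sum() / obs.sum()

def acc(sim, obs, clim):
    """Anomaly correlation coefficient (ACC): correlation of simulated and
    observed anomalies relative to a common climatology."""
    sa = np.asarray(sim, dtype=float) - clim
    oa = np.asarray(obs, dtype=float) - clim
    return (sa * oa).sum() / np.sqrt((sa ** 2).sum() * (oa ** 2).sum())
```

ACC rewards getting the anomaly pattern right regardless of a constant bias, which is why the paper reports it alongside PBIAS.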
This is an exhaustive analysis of the subseasonal forecasts produced by a number of models that participated in two different model intercomparison campaigns, S2S and SubC. While I think this article should be published, it is more of a time capsule of simulations performed nearly a decade ago, with an experimental design that was somewhat constrained by the computational limitations of that time. The number of inter- and intra-model ensemble members is quite small compared to what would be feasible today at these spatial resolutions. In that sense, I am not sure how much this work reflects current model capabilities, which extend to convection-resolving scales and much larger ensemble sizes. With that in mind, a few additional comments:
a. Why was the choice made to regrid everything to 0.25 degrees? This resolution is native to neither PRISM nor the models. I suspect much of the "noise" or striation seen over the western mountainous regions (in Figures 4 and 7, for example) is an artifact of remapping the model outputs from roughly 100 km to 25 km, which appears to be a direct interpolation without consideration of terrain effects or other features.
b. It is remarkable that beyond two weeks all of the variability (I am assuming a mix of intra- and inter-model variability) collapses into approximately the same geographic bias pattern (Fig. 5, for example). This seems to suggest that the deterministic setup of these models drives them toward a climatological 'state' once the influence of the initial conditions fades, in the absence of external perturbations (such as those applied in climate models). Do all of these models use the same SSTs, and how often are the SSTs updated?
c. Have you looked at the intra-model variability for models with larger ensemble sizes (for example, BOM or CNRM)? Within a single model's ensemble, are the members' biases relative to PRISM consistent with each other?
d. Applying CRPS across different models with different initializations may not be appropriate. Applied to a single model's ensemble, it helps characterize that model's variability and predictability; across different models, it is much less useful, because it is nearly impossible to disentangle what makes these ensembles collapse.
e. Analyzing these model predictions under specific initial-condition constraints (for example, ENSO phase) could be carried out separately to add value to the publication. My guess is that, while improving overall predictability at these time scales remains challenging, it may be possible to identify confined temporal and spatial domains that are more predictable under such special initial conditions.
f. The link to the ECMWF site for data does not work.
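To make comment (a) concrete: as a toy illustration (the values and grid spacing below are hypothetical), plain linear interpolation from a ~100 km grid to 25 km merely smears a coarse-grid orographic precipitation peak across the fine cells; it cannot introduce the elevation-dependent structure present in PRISM, and mismatches of this kind can show up as striation when differencing the two.

```python
import numpy as np

# Hypothetical 1-D transect across a mountain ridge on a ~100 km grid
coarse_x = np.array([0.0, 100.0, 200.0, 300.0])   # distance, km
coarse_p = np.array([5.0, 30.0, 8.0, 6.0])        # precipitation, mm/week

# Terrain-blind remap to a ~25 km grid: plain linear interpolation.
# The orographic peak is spread over neighbouring fine cells; no new
# elevation-dependent detail is created by the interpolation itself.
fine_x = np.arange(0.0, 301.0, 25.0)
fine_p = np.interp(fine_x, coarse_x, coarse_p)
```

Conservative or terrain-aware regridding would behave differently near sharp gradients, which is the crux of the question above.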
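On comment (d): the CRPS for a single forecast-observation pair is typically estimated from the ensemble as E|X - y| - (1/2)E|X - X'|. A minimal sketch (the function name is mine; I do not know how the authors computed it):

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS estimate for one forecast from ensemble members:
    E|X - y| - 0.5 * E|X - X'|. Smaller is better; for a single member
    it reduces to the absolute error |x - y|."""
    x = np.asarray(members, dtype=float)
    term1 = np.abs(x - obs).mean()
    term2 = 0.5 * np.abs(x[:, None] - x[None, :]).mean()
    return term1 - term2
```

Because this estimator is sensitive to ensemble size, comparing raw CRPS values across models with very different ensemble sizes and initializations conflates sampling effects with genuine skill, which is the concern raised in comment (d).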