This work is distributed under the Creative Commons Attribution 4.0 License.
Evaluating Extreme Precipitation Forecasts: A Threshold-Weighted, Spatial Verification Approach for Comparing an AI Weather Prediction Model Against a High-Resolution NWP Model
Abstract. Recent advances in AI-based weather prediction have led to the development of artificial intelligence weather prediction (AIWP) models with competitive forecast skill compared to traditional NWP models, but with substantially reduced computational cost. There is a strong need for appropriate methods to evaluate their ability to predict extreme weather events, particularly when spatial coherence is important, and grid resolutions differ between models.
We introduce a verification framework that combines spatial verification methods and proper scoring rules. Specifically, the framework extends the High-Resolution Assessment (HiRA) approach with threshold-weighted scoring rules. It enables user-oriented evaluation consistent with how forecasts may be interpreted by operational meteorologists or used in simple post-processing systems. The method supports targeted evaluation of extreme events by allowing flexible weighting of the relative importance of different decision thresholds. We demonstrate this framework by evaluating 32 months of precipitation forecasts from an AIWP model and a high-resolution NWP model. Our results show that model rankings are sensitive to the choice of neighbourhood size. Increasing the neighbourhood size has a greater impact on scores evaluating extreme-event performance for the high-resolution NWP model than for the AIWP model. At equivalent neighbourhood sizes, the high-resolution NWP model only outperformed the AIWP model in predicting extreme precipitation events at short lead times. We also demonstrate how this approach can be extended to evaluate discrimination ability in predicting heavy precipitation. We find that the high-resolution NWP model had superior discrimination ability at short lead times, while the AIWP model had slightly better discrimination ability from a lead time of 24 hours onwards.
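To make the scoring idea concrete, here is a minimal illustrative sketch (our own, not the authors' implementation and not the API of any verification package; all names are hypothetical) of a threshold-weighted CRPS for a neighbourhood pseudo-ensemble, computed from the threshold (Brier score) decomposition with a weight function that emphasises the decision thresholds of interest:

```python
import numpy as np

def tw_crps(members, obs, thresholds, weights):
    """Threshold-weighted CRPS via its threshold decomposition.

    members: 1-D array of pseudo-ensemble values, e.g. the forecast grid
        points in a neighbourhood around the observation site (HiRA-style).
    obs: observed value at the site.
    thresholds: increasing 1-D grid of decision thresholds t.
    weights: non-negative weight w(t) on each threshold; putting mass on
        high thresholds targets extreme-event performance.
    """
    # Empirical forecast CDF of the pseudo-ensemble at each threshold.
    F = (members[:, None] <= thresholds[None, :]).mean(axis=0)
    # Step-function "CDF" of the observation: 1{obs <= t}.
    H = (obs <= thresholds).astype(float)
    # Trapezoidal integration of the weighted squared CDF difference.
    g = weights * (F - H) ** 2
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(thresholds)))
```

Setting the weights to 1 everywhere recovers an (approximate) unweighted CRPS on the threshold grid, while an indicator weight such as w(t) = 1{t >= some heavy-precipitation threshold} restricts the evaluation to the extremes.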
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-5796', Anonymous Referee #1, 16 Mar 2026
- RC2: 'Comment on egusphere-2025-5796', Anonymous Referee #2, 20 Mar 2026
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2025-5796/egusphere-2025-5796-RC2-supplement.pdf
- CEC1: 'Comment on egusphere-2025-5796 - No compliance with the policy of the journal', Juan Antonio Añel, 25 Mar 2026
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
First, I would like to note that in the preprint of your manuscript you do not provide important information regarding the deposit of some code and data for your manuscript. You have provided information about a repository (https://zenodo.org/records/17667747) to the editors internally. However, such information must be public, and should be in your manuscript, and therefore I am making it public here.
Second, the Code and Data Availability section in your manuscript does not provide a repository for the GraphCast and High-Resolution Rapid Refresh models, which you use in your work. Additionally, to access the data, you have linked sites that are not trusted long-term archival repositories, and therefore are not acceptable according to the policy of the journal.
We cannot accept this; it is forbidden by our policy, and your manuscript should never have been accepted for Discussions or peer review given such lack of compliance. Our policy clearly states that all the code and data necessary to replicate a manuscript must be published openly and freely to anyone before submission. The GMD review and publication process depends on reviewers and community commentators being able to access, during the discussion phase, the code and data on which a manuscript depends, and on ensuring the provenance and replicability of the published papers for years after their publication. We cannot have manuscripts under discussion that do not comply with our policy.
Therefore, we are granting you a short time to solve this situation. You have to reply to this comment in a prompt manner with the information for the repositories containing all the models, code and data that you use to produce and replicate your manuscript. The reply must include the link and permanent identifier (e.g. DOI). Also, any future version of your manuscript must include the modified section with the new information. The 'Code and Data Availability’ section must also be modified to cite the new repository locations, and corresponding references added to the bibliography.
Additionally, I see that two reviewers have already posted comments on your manuscript. I ask you to refrain from addressing the comments of any reviewer until the situation regarding the compliance of your manuscript with the Code and Data policy of the journal is clarified and solved.
I must note that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-5796-CEC1
- AC1: 'Reply on CEC1', Nicholas Loveday, 01 Apr 2026
Dear Juan,
Would the following statement meet your needs? We believe it should, judging by other papers published in GMD that used the same or similar data.
"One-minute ASOS data was retrieved from https://mesonet.agron.iastate.edu/request/asos/1min.phtml which contains an archive of data provided by the National Climatic Data Center. HRRRv4 uses the Weather Research and Forecasting (WRF) model v3.9.1, which is available at https://www2.mmm.ucar.edu/wrf/users/download/get_source.html (National Center for Atmospheric Research, 2025), with the namelist provided at https://rapidrefresh.noaa.gov/hrrr/wrf.nl.txt (National Oceanic and Atmospheric Administration, 2025) and is also available from https://hrrrzarr.s3.amazonaws.com/index.html. GraphCast-GFS is from NOAA’s Open Data Dissemination (NODD) program https://doi.org/10.1175/BAMS-D-24-0057.1 and can be retrieved from https://noaa-oar-mlwp-data.s3.amazonaws.com/index.html. ERA5 data is available in the Copernicus data store (doi.org/10.24381/cds.adbb2d47) and from https://console.cloud.google.com/storage/browser/weatherbench2/data/era5.
The verification measures and statistical tests used in this paper (e.g., twCRPS) were implemented in the scores package (https://doi.org/10.5281/zenodo.18638494). All code to reproduce the results and figures in this paper is available at https://zenodo.org/records/17667747 . "
Other notes to the editor:
- If it is required to meet the data availability requirements, we can try to put the subset of the GraphCast-GFS, HRRR, and observations data that we used for the paper on Zenodo. We believe this would address any remaining issues with data availability in our statement above. Could you please confirm if we need to do this and if it would address your concerns?
- We will update the scores Zenodo link and the paper code Zenodo link to the correct versions when we resubmit a revised manuscript.
Thanks,
Nick
Citation: https://doi.org/10.5194/egusphere-2025-5796-AC1
- CEC2: 'Reply on AC1', Juan Antonio Añel, 01 Apr 2026
Dear authors,
Many thanks for the reply. Unfortunately, your proposed solution does not address the outstanding issues which I pointed out in my previous comment. We must insist that you publish all the code and data openly, and reply to this comment with the information about them. It is not enough to correct it in a revised version of your manuscript. The information requested is necessary for the Discussions stage and peer review.
The new text that you propose continues to cite multiple sites that are not suitable for long-term storage of assets linked to the publication of a paper. Only the two Zenodo repositories that you have mentioned are acceptable. Namely, the iastate.edu, ucar.edu, noaa.gov, and amazonaws.com sites, and the sites linked in the BAMS paper you cite, are not acceptable. They do not fulfil GMD’s requirements for a persistent data archive because:
- They do not appear to have a published policy for data preservation over many years or decades (some flexibility exists over the precise length of preservation, but the policy must exist).
- They do not appear to have a published mechanism for preventing authors from unilaterally removing material. Archives must have a policy which makes removal of materials only possible in exceptional circumstances and subject to an independent curatorial decision.
- They do not appear to issue a persistent identifier such as a DOI or Handle for each precise dataset.
If for any of them we have missed a published policy which does in fact address this matter satisfactorily, please post a response linking to it. If you have any questions about this issue, please post them in a reply.
I must insist that if you do not fix this problem, we cannot continue with the peer-review process or accept your manuscript for publication in GMD.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-5796-CEC2
- AC2: 'Reply on CEC2', Nicholas Loveday, 08 Apr 2026
Dear Juan,
Before I upload large amounts of data to Zenodo, could you please let me know if the following will be sufficient?
- Put all observations used on Zenodo.
- Put the entire subset of GraphCast-GFS and HRRR data required to reproduce the results on Zenodo.
I think that the other data and code meet the requirements already.
- ERA5 data is already on the Copernicus data store (doi.org/10.24381/cds.adbb2d47)
- Scores code (that I implemented for this work) is on Zenodo.
- Code to reproduce all data-wrangling, calculations, and plotting is on Zenodo. I will update this when I respond to the reviewers' feedback.
Could you please let me know if this is sufficient? If it is, I will notify you when the data is uploaded to Zenodo.
Regards,
Nick
Citation: https://doi.org/10.5194/egusphere-2025-5796-AC2
Viewed
Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 174 | 0 | 3 | 177 | 0 | 0 |
This article contributes to the discussion on the performance of AI models for weather forecasting, with a particular focus on their ability to predict extreme precipitation events. The methodology incorporates several novel ideas in verification, including spatial verification using a neighbourhood pseudo‑ensemble, a threshold‑weighted CRPS, and a decomposition of the CRPS using post‑processing.
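For readers unfamiliar with the neighbourhood pseudo-ensemble idea mentioned above, a minimal sketch (our own illustration, not the paper's code; the function name and arguments are hypothetical) is to treat the forecast grid values in a box around a site as if they were ensemble members:

```python
import numpy as np

def neighbourhood_pseudo_ensemble(field, i, j, half_width):
    """Collect the (up to (2*half_width + 1)**2) forecast grid values in a
    square box centred on grid point (i, j), clipped at the domain edges,
    and return them as a flat array of pseudo-ensemble members."""
    block = field[max(i - half_width, 0): i + half_width + 1,
                  max(j - half_width, 0): j + half_width + 1]
    return block.ravel()
```

Probabilistic scores such as the CRPS or Brier score can then be applied to this pseudo-ensemble exactly as they would be to a true ensemble; increasing `half_width` corresponds to the larger neighbourhood sizes whose effect on model rankings the paper examines.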
The paper reads very well overall. The data and the verification methodology are generally well described, and the figures are clear and easy to read. However, I found myself going back and forth between Figures 3, 4, and 7 to compare the results. Perhaps the authors could find a way to keep the results from Figure 3 visible in Figures 4 and 7 (and those from Figure 4 in Figure 7). This would make it easier to follow the presentation of the results.
While I find the study very interesting, I would encourage the authors to add a couple of discussion points:
Minor comments:
References:
Ben Bouallegue et al. (2026), SEEPS4ALL: an open dataset for the verification of daily precipitation forecasts using station climate statistics, https://doi.org/10.5194/essd-18-713-2026
Jin et al. (2025), WeatherReal: A Benchmark Based on In-Situ Observations for Evaluating Weather Models, https://doi.org/10.48550/arXiv.2409.09371
Siegert, S. (2017), Simplifying and generalising Murphy's Brier score decomposition. Q.J.R. Meteorol. Soc., 143: 1178-1183. https://doi.org/10.1002/qj.2985
Theis, S.E., Hense, A. and Damrath, U. (2005), Probabilistic precipitation forecasts from a deterministic model: a pragmatic approach. Met. Apps, 12: 257-268. https://doi.org/10.1017/S1350482705001763