the Creative Commons Attribution 4.0 License.
A General Comprehensive Evaluation Method for Cross-Scale Precipitation Forecasts
Abstract. As numerical forecasts become increasingly refined, two problems have grown more prominent: the score distortion caused by dividing precipitation into thresholds, which affects both traditional and improved scoring methods for precipitation forecasts, and the growing subjective risk introduced by the scale setting of neighbourhood spatial verification methods. To address these problems, this study develops a general comprehensive evaluation method (GCEM) for cross-scale precipitation forecasts that directly analyses the proximity of forecast and observed precipitation. In addition to its core element, the precipitation forecast accuracy score (PAS) index, the GCEM also includes score indices for insufficient precipitation forecasts, excessive precipitation forecasts, precipitation forecast biases and clear/rainy forecasts. Because the PAS neither distinguishes precipitation magnitudes nor delimits an area of influence, it constitutes a fair and objective scoring formula that is suitable for evaluating both general and extreme precipitation events. The PAS can be used to calculate the accuracy of numerical models or quantitative precipitation forecasts, enabling quantitative evaluation of the comprehensive capability of various refined precipitation forecasting products. Based on the GCEM, comparative experiments between the PAS and the threat score (TS) are conducted for two typical precipitation weather processes. The results show that the PAS aligns with subjective expectations much better than the TS does, indicating that the PAS is more reasonable than the TS. In addition, other GCEM indices are used to analyse the range and extent of both insufficient and excessive precipitation forecasts, as well as the precipitation forecast ability, in the two weather processes. These indices not only provide overall scores for individual cases, as the TS does, but also offer two-dimensional score distribution plots, which comprehensively reflect the performance and characteristics of precipitation forecasts. Both theoretical analysis and practical application demonstrate that the GCEM has distinct advantages over the mainstream precipitation forecast verification methods, as well as potential for wider adoption.
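The threshold-related score distortion mentioned in the abstract can be illustrated with the conventional threat score, TS = hits / (hits + misses + false alarms), computed from a contingency table at a chosen precipitation threshold. The sketch below is only an illustration of that standard definition: the function name `threat_score`, the sample values and the thresholds are assumptions made for this example, and it does not implement the PAS, whose formula is not given on this page.

```python
import math


def threat_score(forecast, observed, threshold):
    """Conventional threat score (TS) for one precipitation threshold.

    forecast, observed: equal-length sequences of precipitation amounts (mm).
    threshold: amount (mm) at or above which an event is counted.
    """
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        f_event = f >= threshold
        o_event = o >= threshold
        if f_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif f_event:
            false_alarms += 1
    denom = hits + misses + false_alarms
    # TS is undefined when no event was forecast or observed.
    return hits / denom if denom else math.nan


# Hypothetical sample: every forecast is within 1 mm of the observation.
observed = [9.9, 10.1, 25.0, 0.0]
forecast = [10.1, 9.9, 24.0, 0.0]

ts_at_10mm = threat_score(forecast, observed, 10.0)  # 1 hit, 1 miss, 1 false alarm
ts_at_5mm = threat_score(forecast, observed, 5.0)    # 3 hits, no misses or false alarms
print(ts_at_10mm, ts_at_5mm)
```

In this made-up sample the forecasts are uniformly close to the observations, yet the TS drops from 1.0 at a 5 mm threshold to 1/3 at a 10 mm threshold, purely because two values straddle the threshold. This is the kind of sensitivity that a proximity-based score is meant to avoid.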
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2023-2613', Anonymous Referee #1, 03 Jan 2024
General comments
In this article, a new evaluation method named GCEM is introduced for assessing the accuracy of precipitation forecast. This method employs a scoring function that produces more continuous scores compared to the traditional threat score (TS). The authors present GCEM as a general method suitable for cross-scale precipitation forecasts, including both general and extreme precipitation events. The advantages of GCEM, in terms of providing more continuous scores and more spatial information, are demonstrated by analyzing the distribution of several relevant indices in two typical cases. While GCEM offers an alternative to TS with a simple and concise formula, the authors should provide detailed insights into understanding the practicality and limitations of the new method, especially concerning extreme events. Furthermore, additional comparisons with other advanced methods are necessary to substantiate the effectiveness of GCEM. Overall, the manuscript is well written and makes a valuable contribution to the field of precipitation forecast evaluation. Therefore, I would recommend the article for publication after minor revisions according to the specific comments below.
Specific comments
P1 L17: Short summary: “This method does not utilize the traditional contingency table-based classification verification, can replace the threat score (TS), equitable threat score (ETS) and other skill score methods ...”. What do the other skill score methods refer to? How does GCEM compare with other modern forecast evaluation methods? Spatial distributions of other advanced indices or overall scores could help demonstrate the advantages of GCEM.
P14 L304, L306, L309, L312: I suggest that the authors provide a rationale for choosing a coefficient of 0.6 under the given condition. Is PAS equal to 1 or 0.6 when u=0 and x=0? Further clarification on the extent of overall PAS value’s sensitivity to the coefficient would help understand the robustness of GCEM and its applicability to light rain.
P18 L371: The comparison of the two typical cases is clear but not enough to generalize the effectiveness of the general method across different types of precipitation events, geographical areas, or other meteorological conditions. I suggest examining the method in a broader selection of cases covering a range of conditions, especially for extreme precipitation (e.g., extreme rainfall event over Henan in July 2021). Meanwhile, analyzing results from different forecast models, if available, could be beneficial. This approach would allow us to discuss how GCEM can help identify specific problems in each model that TS cannot.
P20 L430: “…, which differs from subjective judgement” How do the authors define the subjective judgement (whether it be from forecasters, the public, or model developers)? Why is TS different from subjective judgment, perhaps due to the double penalty issue? The authors are encouraged to further explain how GCEM addresses and overcomes the limitations of TS in these cases. To enhance understanding, a comparison of GCEM with other advanced methods that avoid the double penalty issues would be valuable.
Technical corrections
P3 L52: Short duration heavy rainfall often leads to …
P10 L214: … its core scoring function is as follows.
P14 L301: The sentence is confusing.
P15 L313 writes “when 0<u<10” while L315 Eq. 9 writes “0<=u<10”. Please check whether it is correct.
P35 Fig. 4: The figure title can be more concrete by indicating representations of different styles (solid/dashed) of lines.
P36 Fig. 5: The same issue as in Fig. 4.
Citation: https://doi.org/10.5194/egusphere-2023-2613-RC1
AC2: 'Reply on RC1', Anning Huang, 14 Mar 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-2613/egusphere-2023-2613-AC2-supplement.pdf
-
RC2: 'Comment on egusphere-2023-2613', Anonymous Referee #2, 06 Jan 2024
Comments on the manuscript entitled “A General Comprehensive Evaluation Method for Cross-Scale Precipitation Forecasts” by Zhang et al. submitted to GMD.
General comments:
The authors propose a novel precipitation evaluation method, the Precipitation Accuracy Score (PAS), which diverges from the traditional Threat Score (TS). PAS uses continuous functions to measure precipitation forecast accuracy more precisely, constituting a notable advancement in the evaluation of precipitation forecasts. The study falls within the scope of GMD. However, the manuscript is hard to follow due to vague expressions and the quality of the English writing. A substantial revision is imperative to enhance clarity and facilitate comprehension before the manuscript can be deemed suitable for publication in GMD. Specific comments are as follows:
Specific comments:
L63: Clarify why conventional scoring methods fail to reflect model performance improvements. Specify the methods encompassed within the "traditional scoring methods" category.
L141: Replace "Yang Dong et al., 2017" with "Yang et al., 2017."
L219-220: Elaborate on the meaning of "When the observed precipitation is not forecasted, PAS = 0."
Given the definitions of IPS score, EPS score, and IEPS score in Eqs. 2-4, reconsider the terminology. Generally, a higher value indicates a better score, while a near-zero value signifies a poorer score. The authors may consider replacing "score" with "index" or "deviation" for consistency.
L233: Replace “0<u<x” with “0<u≤x” in Eq. (3)?
L239: Suggest merging Case "x=u" with either Case "0≤x≤u" or "0<u≤x" in Eq. (4). Moreover, IPS and EPS seem redundant given IEPS, as both IPS and EPS are included in IEPS.
L246-249: Given the definition of PASN, the interval of x and u in Eq. (2-4) may change accordingly, e.g. replace “0<u≤x” with “0.1<u≤x”? PASN includes PAS|ux0.1 when u ≥ 0.1 or x ≥ 0.1. Please clearly define the intervals to distinguish PASN from PAS|ux0.1.
L285-286: The authors state in L285-286 that x represents observed precipitation and u stands for forecasted precipitation, which contradicts the information in L216-217. Could the authors confirm whether x and u are intentionally interchanged to assess the symmetry of PAS in this context?
L306: Does PAS=0.6PAS|u->0 or PAS=0.6PAS|x->0 in L306?
L380-381: For Fig. 6, kindly label the location of Hunan, Jiangxi, and Zhejiang Provinces or provide the longitude and latitude range.
L414-416: The forecasted precipitation was interpolated onto observed grid points. Therefore, the forecasted and observed data should share the same grid points. How should "forecasted data on the grid point nearest to the observed grid point" be understood? Did the authors employ a neighborhood verification method? Please clarify.
L427: What is meant by "traditional scores" in L427? Tables 8 and 9 only display TS and PAS. Is the PAS from Eq. (1) considered a traditional score?
L425-436: Tables 8 and 9 depict contrasting results for PAS and TS in evaluating two precipitation cases. Is the difference attributed to the distinct definitions of PAS and TS or to the application of a neighborhood verification method in the PAS calculation? Case 2019 has more scattered precipitation than case 2020. Compared to the point-to-point method, neighborhood verification may significantly raise the skill score for scattered cases.
L441-443: The manuscript mentions "leading to a monotonous increase in scores". What is it referring to?
L443-446: The statement, "The PAS assigns objective scores based on the proximity of the forecast to the observation, making it more reliable for precipitation evaluation than the TS," raises skepticism. Why is PAS considered more reliable than TS for precipitation evaluation? Is it because PAS is objective while TS is more subjective? If so, why expect the PAS evaluation to align with the subjective judgment mentioned in L434-436?
L462: How were the PASC scores for cases 1 and 2 calculated? Were they obtained by computing the area-mean PASC?
L473-474: For international readers' clarity, please mark the locations of Anhui, Zhejiang, Jiangxi, Hunan, and Hebei on the figure.
The manuscript's English writing style makes it challenging to follow. Many expressions, such as "general comprehensive evaluation method" in the title, "core element of the precipitation forecast accuracy score index" in the abstract, and "the scoring areas in Zhejiang exhibit alternatively distributed high and low scores" in L467, sound awkward. Consider revising the manuscript for clarity and coherence.
Citation: https://doi.org/10.5194/egusphere-2023-2613-RC2
AC1: 'Reply on RC2', Anning Huang, 14 Mar 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-2613/egusphere-2023-2613-AC1-supplement.pdf
Cited
1 citation as recorded by Crossref.
Bing Zhang
Mingjian Zeng
Zhengkun Qin
Couhua Liu
Wenru Shi
Kefeng Zhu
Chunlei Gu
Jialing Zhou