This work is distributed under the Creative Commons Attribution 4.0 License.
Smoothing and spatial verification of global fields
Abstract. Forecast verification plays a crucial role in the development cycle of operational numerical weather prediction models. At the same time, verification remains a challenge, as the traditionally used non-spatial forecast quality metrics exhibit certain drawbacks, with new spatial metrics being developed to address these problems. Some of these new metrics are based on smoothing, one example being the widely used Fraction Skill Score (FSS) and its many derivatives. However, while the FSS has been used by many researchers in limited-area domains, there are, as yet, no examples of it being used in a global domain. The issue is due to the increased computational complexity of smoothing in a global domain, with its inherent spherical geometry and non-equidistant and/or irregular grids. At the same time, there clearly exists a need for spatial metrics that could be used in the global domain, as operational global models continue to be developed and improved along with new machine-learning-based models. Here, we present two new methodologies for smoothing in a global domain that are potentially fast enough to make the smoothing of high-resolution global fields feasible. Both approaches also consider the variability of grid point area sizes and can handle missing data appropriately. This, in turn, makes the calculation of smoothing-based metrics, such as the FSS and its derivatives, in a global domain possible, which we demonstrate by evaluating the performance of operational high-resolution global precipitation forecasts provided by the European Centre for Medium-Range Weather Forecasts.
Status: closed
RC1: 'Comment on egusphere-2025-1525', Anonymous Referee #1, 19 Jun 2025
The manuscript presents two strategies for neighborhood-based smoothing on high-resolution grids on the spherical surface: a k-d tree range search and an "advanced-front" style overlap detection that reuses intermediate steps. While the topic is important and the ideas are promising, several gaps need to be closed before publication.
1. Clarify grid and geometry assumptions: When discussing the grid used in the model, please include detailed definitions, such as the grid data format used in the paper. Does it require connectivity information or not? One way to state this would be: "We use only one grid point per face, without connectivity information, and for each grid face we assume the face area is known." When talking about the spherical surface, indicate whether it is a unit sphere or a sphere with the Earth's radius. Also, how did you handle the spherical geometry, given that the grid-point areas are assumed to be provided?
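To illustrate the kind of convention the referee is asking for: if the grid model is indeed "one point per face with a known area, no connectivity", the spherical geometry can be handled by mapping latitude/longitude to 3-D Cartesian coordinates on a unit sphere and converting great-circle radii to chord lengths. This is a hypothetical sketch; the function names are ours, not the paper's, and the unit-sphere convention is an assumption.

```python
import numpy as np

def latlon_to_unit_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude (degrees) to Cartesian points on a unit sphere."""
    lat = np.radians(np.asarray(lat_deg, dtype=np.float64))
    lon = np.radians(np.asarray(lon_deg, dtype=np.float64))
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

def chord_from_arc(radius_km, earth_radius_km=6371.0):
    """Convert a great-circle distance to the equivalent 3-D chord length
    on a unit sphere: chord = 2 * sin(arc / (2 R))."""
    return 2.0 * np.sin(radius_km / earth_radius_km / 2.0)
```

With such a convention stated explicitly, Euclidean range searches on the unit-sphere coordinates are exactly equivalent to spherical-cap membership tests.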
2. Consider re-using the k-d tree during overlap pre-processing: The manuscript treats the two algorithms as mutually exclusive: the k-d tree for "small/medium" radii and overlap detection for "large" radii. But the overlap method already needs to enumerate every neighbourhood during its one-time table build, which costs O(n) neighbourhood queries. Intuitively, then, building the O(n log n) k-d tree for this overlap step would be beneficial, and it can speed up the neighbour search as well; since the cap-membership logic is identical in both schemes, the previously built structure can be reused. A discussion of such a hybrid method would therefore be helpful. Even if you decide not to pursue the hybrid method, a few sentences explaining its limits would make the discussion more comprehensive. Finally, benchmark experiments showing when the k-d tree algorithm is preferable and when overlap detection is would be beneficial as well.
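The shared "cap-membership" primitive the referee refers to can be sketched with SciPy's k-d tree: a great-circle radius r corresponds to a Euclidean chord of 2 sin(r/2R) on unit-sphere coordinates, so one range query per grid point yields the neighbourhood lists that the overlap method's table build would also need. This is an illustrative helper under assumed conventions (unit sphere, spherical Earth of radius 6371 km), not the paper's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def cap_members(lat_deg, lon_deg, radius_km, earth_radius_km=6371.0):
    """For each grid point, return the indices of all points lying within a
    spherical cap of the given great-circle radius (hypothetical helper)."""
    lat = np.radians(np.asarray(lat_deg, dtype=np.float64))
    lon = np.radians(np.asarray(lon_deg, dtype=np.float64))
    # Unit-sphere Cartesian coordinates of all grid points.
    xyz = np.column_stack((np.cos(lat) * np.cos(lon),
                           np.cos(lat) * np.sin(lon),
                           np.sin(lat)))
    tree = cKDTree(xyz)                                  # O(n log n) build
    chord = 2.0 * np.sin(radius_km / earth_radius_km / 2.0)
    # Euclidean range search with the chord radius == spherical-cap test.
    return tree.query_ball_point(xyz, chord)
```

A hybrid scheme could call this once during the overlap method's pre-processing instead of a separate neighbourhood enumeration, reusing the same tree for any subsequent small-radius queries.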
3. More general numerical-error analysis is required for a broader audience. The statement that the overlap method deviates by "< 0.01 mm / 6 h" lacks context; some readers do not work with precipitation. Add a paragraph that (i) defines the brute-force/k-d tree result as the reference and (ii) explains why that is the correct ground truth for round-off studies. Provide the absolute, relative, and maximum errors across representative radii instead of the single figure "< 0.01 mm / 6 h". An error-growth plot versus traversal depth (0, 2000, 4000, ..., 20000 hops) would also be helpful.
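The unit-free error summary the referee asks for amounts to comparing the fast result against the brute-force reference field. A minimal sketch (the function name and output keys are ours):

```python
import numpy as np

def error_summary(field_fast, field_reference):
    """Summarize the deviation of a fast smoothing result from a
    brute-force reference field, independently of physical units."""
    fast = np.asarray(field_fast, dtype=np.float64)
    ref = np.asarray(field_reference, dtype=np.float64)
    abs_err = np.abs(fast - ref)
    # Relative error only where the reference is nonzero.
    rel_err = np.where(ref != 0.0, abs_err / np.abs(np.where(ref != 0.0, ref, 1.0)), 0.0)
    return {"max_abs": float(abs_err.max()),
            "mean_abs": float(abs_err.mean()),
            "max_rel": float(rel_err.max())}
```

Reporting these three numbers for each representative radius would give readers outside precipitation verification a direct sense of the round-off behaviour.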
4. Fuller details of the experimental setup would be helpful: What kinds of grids are used in the experiments, and why do they represent common grid types? How do these grids reflect the nature of the spherical surface? And when carrying out the experiments, do you consistently use float64? A clear definition of the experimental setup would help greatly.
Technical corrections:
p. 8, Fig. 5 caption: add CPU model and numeric precision.
Throughout, when stating runtimes (“single core”, “ten cores”), clarify whether these refer to physical cores or hardware threads, and whether hyper‑threading was enabled. This will help readers reproduce the scaling results.
Citation: https://doi.org/10.5194/egusphere-2025-1525-RC1
RC2: 'Comment on egusphere-2025-1525', Anonymous Referee #2, 03 Jul 2025
Speeding up computationally expensive algorithms is highly desirable, and the analysis of the algorithmic options and their computational costs is very interesting and useful. The "need for speed" when crunching through 5 km global grids is undeniable. However, I did not feel the paper had the right title. The paper argues for a lot of complexity; unfortunately, I was not convinced of its necessity. This is perhaps because of the focus on precipitation, which, at a global scale, is a very tricky prospect.
From a scientific/philosophical perspective I have a number of comments to make.
1. My philosophical point concerns the whole reference to smoothing. You start by showing the smoothing of precipitation fields, but for the FSS the precipitation fields are NOT smoothed. It is the process of thresholding and computing fractions over increasingly large neighbourhoods that provides the smoothing. I would not use these methods to coarse-grain a raw field; I would probably regrid using a conserving regridder.
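The distinction the referee draws (threshold first, then compute neighbourhood fractions) can be made concrete with the textbook FSS on a regular grid. This is a generic illustration, not the paper's spherical implementation; square windows via `uniform_filter` stand in for the spherical-cap neighbourhoods.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fss(forecast, observed, threshold, window):
    """Fraction Skill Score on a regular grid: threshold both fields to
    binary event masks, compute event fractions over square windows of
    the given size, then compare the two fraction fields."""
    f_frac = uniform_filter((np.asarray(forecast) >= threshold).astype(np.float64),
                            size=window, mode="constant")
    o_frac = uniform_filter((np.asarray(observed) >= threshold).astype(np.float64),
                            size=window, mode="constant")
    mse = np.mean((f_frac - o_frac) ** 2)
    mse_ref = np.mean(f_frac ** 2) + np.mean(o_frac ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan
```

Note that the raw fields never pass through a smoother; only the binary event masks are averaged over neighbourhoods, which is the referee's point.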
2. In lat-lon space these grids are regular, and the new cube-sphere global grids are also regular, in physical spacing too. There is nothing wrong with performing global verification on a regular lat-lon grid. The issue comes with aggregation and interpretation, and here you are mixing apples and oranges when combining grid points at 45N and 15N. This is where real distances become more problematic. Forget about anything north of 45N or so: we have no reliable gridded rainfall analyses that are any good there. Fundamentally, I have problems with computing a single score for global precipitation spanning large regions, e.g. NH, TR and SH, but even smaller regions: Europe has an extremely heterogeneous precipitation climate. As papers on this subject, e.g. on SEEPS (Rodwell et al., 2010; Haiden et al., 2012; North et al., 2022), demonstrated, the precipitation climatology globally varies so much, even at the same latitude, that some form of local climatology must be used to verify precipitation so as not to fall foul of false skill (à la Hamill and Juras, 2006).
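The referee's 45N-versus-15N point can be quantified in one line: on a regular lat-lon grid, the zonal spacing shrinks with cos(latitude), so a grid box at 15N is roughly 1.37 times wider than one at 45N. A sketch (the function name and the 0.25 degree spacing are illustrative assumptions):

```python
import numpy as np

def zonal_spacing_km(lat_deg, dlon_deg=0.25, earth_radius_km=6371.0):
    """East-west width of a regular lat-lon grid box at a given latitude."""
    return earth_radius_km * np.radians(dlon_deg) * np.cos(np.radians(lat_deg))

# Ratio of grid-box widths at 15N and 45N: cos(15 deg) / cos(45 deg) ~ 1.37
ratio = zonal_spacing_km(15.0) / zonal_spacing_km(45.0)
```

Any aggregate score combining such points without area or distance weighting implicitly treats unequal real-world footprints as equal.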
3. The authors introduce the CSSS. This feels to me like a bit of an afterthought. If I understand this correctly, this is not relying on neighbourhoods or thresholds but does require the raw field to be smoothed. What would be the benefit of this over just regridding using a conserving regridder? Is it speed?
4. A general sense of dissatisfaction, I am led to believe, at not being able to interpret/translate back what the FSS means, in terms of how to improve the model, was a primary motivation behind moving to the localised version of the FSS (Woodhams et al.). And whilst Woodhams et al. and Mittermaier may disagree on the size of the neighbourhoods used, the latter also demonstrated that high scores do not necessarily indicate skill, because persistence scores even higher (over the Maritime Continent). That study has flaws, as you point out, but my counterargument would be that if one is not trying to aggregate over large regions (and there are very good reasons why one shouldn't without accounting for climatology), then the question of the underlying grid becomes far less important. Furthermore, the FSS may be popular because it is easy to compute, but that does not mean it is helpful in differentiating skill, as Antonio and Aitchison (2024) also recently demonstrated.
From a verification "best practice" perspective, I am reluctant to see studies on global precipitation forecast verification published which do not account for the peculiarities of global precipitation. Basing such a study primarily on a metric that is increasingly exposed as having undesirable properties, in terms of truly discerning model improvements, is another concern.
I would like the authors to think about the following:
- Why do you want to smooth? Is it really smoothing you're after? The title does not describe the paper and I feel the use of the word smoothing is misleading somehow.
- Why not produce a fast LFSS? This would enable the use of a gridded climatology and address a fundamental issue we have.
- The CSSS has potentially some merit, but right now I can't see it. It isn't explained well enough, and in the context of the CSSS, some comparison or reasoning is needed for why regridding doesn't do the same job. If the aim is to identify a skilful spatial scale of some kind, adaptations can be made on the grid much more easily (and probably more efficiently) before aggregating over a region (if someone REALLY wants an aggregate!). The latitudinal size adjustment is constant after all, right? I would strongly urge looking at the Antonio and Aitchison paper.
Citation: https://doi.org/10.5194/egusphere-2025-1525-RC2
AC1: 'Responses to reviewers comments', Gregor Skok, 29 Aug 2025
Model code and software
Smoothing on the Sphere: Python software package snapshot. Gregor Skok. https://doi.org/10.5281/zenodo.15100264
Viewed
Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.
- HTML: 722
- PDF: 0
- XML: 3
- Total: 725
- BibTeX: 0
- EndNote: 0