the Creative Commons Attribution 4.0 License.
Spread/Error relationship and spatial error structure of precipitation ensemble nowcasting: Comparison of STEPS and generative AI
Abstract. The predictability of the generative AI-based nowcasting model LDCast is evaluated over Belgium, together with the pysteps implementation of the nowcasting algorithm STEPS. Neither STEPS nor LDCast were fine-tuned for the Belgian region, so both models are evaluated under conditions in which they will most likely be used in practice at national weather offices. STEPS and LDCast are slightly underdispersive, but the ensemble spread provides an estimation of the error at almost all scales. Both models adapt the properties of their ensembles to the type of event, either convective or stratiform. The spatial scores of the STEPS and LDCast ensembles are compared with those of surrogate ensembles, revealing that both STEPS and LDCast have very little ability to spatially localise the error of the ensemble mean. This suggests that the content of STEPS and LDCast ensembles is informative in terms of statistics, but not in terms of dynamics.
Status: open (until 24 Jun 2026)
- RC1: 'Comment on egusphere-2026-1460', Anonymous Referee #1, 02 Jun 2026
- RC2: 'Comment on egusphere-2026-1460', Anonymous Referee #2, 09 Jun 2026
The paper compares two ensemble nowcasting algorithms with regard to their spatial error structure and the relationship between ensemble spread and nowcast error. The well-established classical STEPS method and the modern generative AI approach LDCast are analyzed across different lead times and spatial scales of the forecast, using metrics based on spectral variance and error as well as covariance matrix eigenvalues. The main innovation is the comparison against surrogate ensembles derived by the MAAFT method described in the appendix. The analysis is conducted thoroughly, and the results appear sound and particularly interesting from both an academic and an operational point of view. Most of the following specific comments aim to elicit clearer explanations or further discussion on particular points; I therefore recommend that the paper be published with minor revisions.
Specific comments:
(1) The introduction does not make the objective of this research entirely clear. In line 45, the authors state that the central question of the paper is to characterize the information contained in the ensembles generated by the nowcasting algorithms. Moreover, the authors prematurely present their results in this section. I recommend formulating clear research questions instead, and omitting the results here.
(2) Several of the diagnostic tools utilized in this paper would benefit from additional context. In particular, the eigenvalue method based on the ensemble covariance matrix requires further explanation. Did the authors develop this method specifically for this study, or are there existing applications of this approach for similar problems in the literature?
(3) To my knowledge, the original LDCast method was trained using rainfall rates (the MeteoSwiss RZC product). In contrast, this study uses 5-minute rainfall sums resulting from a rain-gauge adjustment using Kriging with External Drift. Previous research has demonstrated that rain-gauge adjustment via geostatistics tends to smooth spatial rainfall fields. While this may not significantly impact the overall results, a brief discussion regarding this potential effect should be included in the manuscript.
(4) The use of surrogate ensembles based on MAAFT is central to reaching one of the paper's main conclusions: namely, that neither STEPS nor LDCast ensembles contain dynamical information regarding the spatial localization of the error. The authors should incorporate an explanation of this method into Section 3, rather than relegating it almost entirely to the appendix.
(5) It would be highly valuable to include the authors' perspective on how the results of this paper might translate to other AI nowcasting methods. It seems unlikely that alternative approaches, such as DGMR, would be capable of providing true dynamic forecast uncertainty as defined in this study. The authors are encouraged to elaborate on this point in the discussion.
(6) The same applies to retraining. Do the authors expect to obtain significantly different results when LDCast is retrained using the Belgian RADCLIM dataset? Furthermore, what are the implications of the chosen training strategy on the expected results? To my knowledge, LDCast was trained using radar crops of different sizes while also applying data augmentation techniques such as random rotations and flipping.
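Comment (4) turns on what a surrogate ensemble preserves and what it discards. As a point of reference only, the sketch below builds a surrogate by plain Fourier phase randomisation. This is a much simpler relative of MAAFT, not the procedure from the manuscript's appendix: it keeps the 2D power spectrum of a field while scrambling where the structures sit, and it does not preserve the distribution of rainfall intensities. The function name `phase_randomised_surrogate` and the toy field are assumptions made for illustration.

```python
import numpy as np

def phase_randomised_surrogate(field, rng):
    """Return a real field with the same 2D power spectrum as `field`
    but with randomised Fourier phases (illustrative, not MAAFT)."""
    spectrum = np.fft.fft2(field)
    amplitude = np.abs(spectrum)
    # Phases taken from the FFT of real white noise are antisymmetric,
    # so the modified spectrum stays Hermitian and the inverse is real.
    noise = rng.standard_normal(field.shape)
    random_phase = np.angle(np.fft.fft2(noise))
    # Keep the original phase of the zero-frequency term so the
    # domain mean is not flipped in sign.
    random_phase[0, 0] = np.angle(spectrum[0, 0])
    surrogate = np.fft.ifft2(amplitude * np.exp(1j * random_phase))
    return surrogate.real

# Toy usage on a synthetic, non-negative field (hypothetical data).
rng = np.random.default_rng(0)
toy_field = rng.standard_normal((32, 32)) ** 2
toy_surrogate = phase_randomised_surrogate(toy_field, rng)
```

An ensemble of such surrogates shares the spectral statistics of the original field but carries no information on where the error is located, which is the property that makes surrogates a useful baseline for spatial scores.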
Minor technical corrections:
Line 26: The first occurrence of LDCast needs a reference
Line 74: Clarify BPS method
Line 77: Replace RMI with Royal Meteorological Institute of Belgium (RMI)
Citation: https://doi.org/10.5194/egusphere-2026-1460-RC2
Viewed
- HTML: 240
- PDF: 66
- XML: 26
- Total: 332
- Supplement: 35
- BibTeX: 17
- EndNote: 16
This paper looks at the scale dependence of spread and skill in two ensemble radar nowcasting systems: STEPS, a conventional Lagrangian extrapolation nowcaster that includes stochastic variability in a scale-dependent way, and LDCast, a generative AI model. The paper uses a variety of diagnostics to evaluate the performance as a function of scale and lead time. The overall conclusion for both models seems to be that they capture the spatial variance spectrum of the observations, but show no skill in predicting the phase on scales that are not predicted by the ensemble mean. The results are interesting, though not entirely unexpected given the information available to the models. The analysis is well done, but the paper was difficult to read, mainly because it uses unfamiliar diagnostics without much explanation. I have therefore recommended minor revisions, mostly to improve the clarity of the text.
Minor comments:
1. The title refers to spread/error and spatial error structures, but mainly uses spectra to examine scale dependence. I would suggest changing the title to more closely match the main results as listed in the introduction.
2. Spread and skill are indeed discussed in section 3.1, but never mentioned again. Could the diagnostics in the later sections be used to explain the features that were pointed out in Fig. 2?
3. I am not familiar with looking at the eigenvalues of the covariance across ensemble members. Could the authors supply references to previous applications of this method or, if it is novel, give some more motivation for what questions it can answer and why it will be useful here? I am tempted to interpret it by analogy to PCA, in which case it might be interesting to look at the spatial patterns (analogous to EOFs), especially for STEPS, where a few leading eigenvalues dominate the response.
4. The discussion of 2D versus 3D turbulence around line 159 is not entirely correct. The mesoscale is not like 3D turbulence except in that it appears to share the -5/3 energy spectrum. It is this, not the dimensionality, that matters for the rate of error growth, as shown by Rotunno and Snyder (2008) using the Lorenz (1969) model.
5. The decrease at small scales with time that is noted at line 185 appears to be true only for LDCast.
6. The introduction to section 3.3 should include more explanation of the MAAFT ensembles. It is noted that they have the same distribution of rainfall intensities and residual vectors with the same power spectra, but the reader has to go to the appendix to find out how they are different, and then work out for themselves if this is an appropriate reference ensemble for the questions considered here. The SPEC ensemble is also an interesting comparison and should be introduced properly.
7. Regarding the discussion of phase around line 225, isn't this behaviour produced by design in STEPS?
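The PCA analogy raised in comment 3 can be made concrete. The sketch below is an illustration and not the authors' diagnostic: it computes the eigenvalues of the covariance of ensemble anomalies together with the associated spatial patterns (EOF-like) through a single SVD. The array layout `(n_members, ny, nx)`, the function name `ensemble_eofs`, and the toy ensemble are assumptions.

```python
import numpy as np

def ensemble_eofs(ensemble):
    """Eigenvalues (descending) and spatial patterns of the ensemble covariance.

    ensemble: array of shape (n_members, ny, nx) (assumed layout).
    """
    n_members = ensemble.shape[0]
    flat = ensemble.reshape(n_members, -1)
    # Anomalies with respect to the ensemble mean at each grid point.
    anomalies = flat - flat.mean(axis=0)
    # The squared singular values of the anomaly matrix, divided by
    # (n_members - 1), are the non-zero eigenvalues of the covariance.
    _, singular_values, vt = np.linalg.svd(anomalies, full_matrices=False)
    eigenvalues = singular_values**2 / (n_members - 1)
    patterns = vt.reshape((-1,) + ensemble.shape[1:])
    return eigenvalues, patterns

# Toy ensemble: one shared pattern with member-dependent amplitude
# plus weak independent noise (hypothetical data).
rng = np.random.default_rng(1)
x = np.arange(16)
shared = np.tile(np.sin(2 * np.pi * x / 16), (16, 1))
members = np.array([a * shared for a in rng.standard_normal(10)])
members += 0.01 * rng.standard_normal(members.shape)

eigenvalues, patterns = ensemble_eofs(members)
explained = eigenvalues / eigenvalues.sum()
```

In this toy case the first entry of `explained` is close to one and `patterns[0]` recovers the shared structure up to sign, which is the kind of behaviour the comment suggests inspecting for STEPS, where a few leading eigenvalues dominate.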