Spread/Error relationship and spatial error structure of precipitation ensemble nowcasting: Comparison of STEPS and generative AI

Bonte, Martin; De Cruz, Lesley; Debal, Fabian; Vannitsem, Stéphane

doi:10.5194/egusphere-2026-1460

Preprints

23 Apr 2026

Abstract. The predictability of the generative AI-based nowcasting model LDCast is evaluated over Belgium, together with the pysteps implementation of the nowcasting algorithm STEPS. Neither STEPS nor LDCast was fine-tuned for the Belgian region, so both models are evaluated under the conditions in which they are most likely to be used in practice at national weather offices. STEPS and LDCast are slightly underdispersive, but the ensemble spread provides an estimate of the error at almost all scales. Both models adapt the properties of their ensembles to the type of event, either convective or stratiform. The spatial scores of the STEPS and LDCast ensembles are compared with those of surrogate ensembles, revealing that both STEPS and LDCast have very little ability to spatially localise the error of the ensemble mean. This suggests that the content of STEPS and LDCast ensembles is informative in terms of statistics, but not in terms of dynamics.
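The spread/error comparison behind the statement that both models are slightly underdispersive can be illustrated with a minimal Python/NumPy sketch. It assumes forecasts stored as arrays of shape (members, ny, nx) and applies the common finite-ensemble factor sqrt((M+1)/M); the function name `spread_skill` is hypothetical, and this is not the authors' exact procedure, which also resolves the comparison by scale. A spread/RMSE ratio below 1 indicates underdispersion.

```python
import numpy as np


def spread_skill(ensemble, obs):
    """Return (corrected spread, RMSE of the ensemble mean) for one forecast.

    ensemble: array of shape (members, ny, nx); obs: array of shape (ny, nx).
    """
    m = ensemble.shape[0]
    ens_mean = ensemble.mean(axis=0)
    # Error of the ensemble mean, averaged over the domain
    rmse = np.sqrt(np.mean((ens_mean - obs) ** 2))
    # Unbiased member variance at each pixel, averaged over the domain
    mean_var = np.mean(ensemble.var(axis=0, ddof=1))
    # For a statistically consistent ensemble of size m, the expected squared
    # error of the mean is (m + 1) / m times the expected ensemble variance
    spread = np.sqrt(mean_var * (m + 1) / m)
    return spread, rmse


rng = np.random.default_rng(0)
obs = rng.standard_normal((100, 100))
consistent = rng.standard_normal((20, 100, 100))      # same distribution as obs
narrow = 0.5 * rng.standard_normal((20, 100, 100))    # too little spread
for name, ens in [("consistent", consistent), ("underdispersive", narrow)]:
    spread, rmse = spread_skill(ens, obs)
    print(name, round(spread / rmse, 2))
```

The toy "consistent" ensemble should give a ratio close to 1 and the "narrow" one a ratio close to 0.5.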

Received: 16 Mar 2026 – Discussion started: 23 Apr 2026

Status: final response (author comments only)

RC1:
'Comment on egusphere-2026-1460', Anonymous Referee #1, 02 Jun 2026

This paper looks at the scale dependence of spread and skill in two ensemble radar nowcasting systems: STEPS, a conventional Lagrangian extrapolation nowcaster that includes stochastic variability in a scale-dependent way, and LDCast, a generative AI model. The paper uses a variety of diagnostics to evaluate the performance as a function of scale and lead time. The overall conclusion for both models seems to be that they capture the spatial variance spectrum of the observations but show no skill in predicting the phase on scales that are not predicted by the ensemble mean. The results are interesting, though not entirely unexpected given the information available to the models. The analysis is well done, but the paper was difficult to read, mainly because it uses unfamiliar diagnostics without much explanation. I have therefore recommended minor revisions, mostly to improve the clarity of the text.
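Scale dependence of spread and error is typically assessed with radially averaged power spectra. The Python/NumPy sketch below assumes a square domain, integer wavenumbers in cycles per domain, and hypothetical function names; it is one plausible way to compute a spread spectrum (members minus ensemble mean) alongside an error spectrum (observation minus ensemble mean), not necessarily the paper's implementation.

```python
import numpy as np


def radial_power_spectrum(field):
    """Radially averaged power spectrum of a square 2D field.

    Returns wavenumbers k = 1..n//2 (cycles per domain) and the mean power
    in each ring of |k| rounded to the nearest integer.
    """
    n, n_x = field.shape
    if n != n_x:
        raise ValueError("expected a square field")
    power = np.abs(np.fft.fft2(field)) ** 2 / field.size
    freq = np.fft.fftfreq(n) * n  # integer wavenumbers, negatives included
    k_ring = np.rint(np.hypot(freq[:, None], freq[None, :])).astype(int).ravel()
    sums = np.bincount(k_ring, weights=power.ravel())
    counts = np.bincount(k_ring)
    k = np.arange(1, n // 2 + 1)  # skip k = 0 (domain mean)
    return k, sums[k] / counts[k]


def spread_error_spectra(ensemble, obs):
    """Scale-by-scale ensemble spread and ensemble-mean error.

    ensemble: (members, n, n); obs: (n, n).
    """
    ens_mean = ensemble.mean(axis=0)
    k, error = radial_power_spectrum(obs - ens_mean)
    # Average the anomaly spectrum over members
    spread = np.mean([radial_power_spectrum(member - ens_mean)[1]
                      for member in ensemble], axis=0)
    return k, spread, error
```

Where the spread curve lies below the error curve at a given wavenumber, the ensemble is underdispersive at that scale.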
Minor comments:
1. The title refers to spread/error and spatial error structures, but the paper mainly uses spectra to examine scale dependence. I would suggest changing the title to match more closely the main results listed in the introduction.
2. Spread and skill are indeed discussed in section 3.1, but never mentioned again. Could the diagnostics in the later sections be used to explain the features that were pointed out in Fig. 2?
3. I am not familiar with looking at the eigenvalues of the covariance across ensemble members. Could the authors supply references to previous applications of this method, or, if it is novel, give more motivation for what questions it can answer and why it will be useful here? I am tempted to interpret it by analogy to PCA, in which case it might be interesting to look at the spatial patterns (analogous to EOFs), especially for STEPS, where a few leading eigenvalues dominate the response.
4. The discussion of 2D versus 3D turbulence around line 159 is not entirely correct. The mesoscale is not like 3D turbulence except in that it appears to share the -5/3 energy spectrum. It is this, not the dimensionality, that matters for the rate of error growth, as shown by Rotunno and Snyder (2008) using the Lorenz (1969) model.
5. The decrease at small scales with time that is noted at line 185 appears to be only true for LDCast.
6. The introduction to section 3.3 should include more explanation of the MAAFT ensembles. It is noted that they have the same distribution of rainfall intensities and residual vectors with the same power spectra, but the reader has to go to the appendix to find out how they are different, and then work out for themselves if this is an appropriate reference ensemble for the questions considered here. The SPEC ensemble is also an interesting comparison and should be introduced properly.
7. Regarding the discussion of phase around line 225, isn't this behaviour produced by design in STEPS?
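Comment 3 reads the covariance eigenvalues by analogy with PCA. Under that reading, and assuming the diagnostic is the eigenvalue spectrum of the sample covariance of the ensemble members (the paper's exact definition may differ), both the eigenvalues and the EOF-like spatial patterns the referee suggests examining come out of a singular value decomposition of the member anomalies. The non-zero eigenvalues of the N x N spatial covariance and of the M x M member Gram matrix coincide, so at most M - 1 of them are non-zero. The function name below is hypothetical.

```python
import numpy as np


def ensemble_eigen(ensemble):
    """Eigenvalues and EOF-like patterns of an ensemble's sample covariance.

    ensemble: array (members, ny, nx). Returns eigenvalues in decreasing order
    (at most members - 1 are non-zero) and patterns of shape (k, ny, nx).
    """
    m = ensemble.shape[0]
    flat = ensemble.reshape(m, -1)
    anomalies = flat - flat.mean(axis=0)
    # Singular values s of the anomaly matrix give covariance eigenvalues
    # s**2 / (m - 1); rows of vt are the corresponding spatial patterns.
    _, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eigvals = s ** 2 / (m - 1)
    patterns = vt.reshape((-1,) + ensemble.shape[1:])
    return eigvals, patterns


rng = np.random.default_rng(2)
ens = rng.standard_normal((10, 32, 32))
eigvals, patterns = ensemble_eigen(ens)
explained = eigvals / eigvals.sum()  # fraction of total variance per mode
```

A response dominated by a few leading eigenvalues, as noted for STEPS, would show up as the first entries of `explained` accounting for most of the variance, and `patterns[0]` could then be mapped like an EOF.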
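The argument in comment 4 can be made concrete with the heuristic eddy-turnover estimate associated with Lorenz (1969). In this simplified sketch (not the full model analysed by Rotunno and Snyder, 2008), errors are assumed to move up one octave in scale per eddy turnover time tau(k), and the kinetic energy spectrum is assumed to be a power law with slope -p:

```latex
\tau(k) \sim \left[k^{3}E(k)\right]^{-1/2}, \qquad
E(k) \propto k^{-p} \;\Longrightarrow\; \tau(k) \propto k^{(p-3)/2}

T_N = \sum_{n=0}^{N-1} \tau\!\left(2^{n}k_0\right)
    = \tau(k_0)\sum_{n=0}^{N-1} 2^{\,n(p-3)/2}

p = \tfrac{5}{3}: \quad T_N < \frac{\tau(k_0)}{1 - 2^{-2/3}} \approx 2.7\,\tau(k_0)
\qquad\qquad
p = 3: \quad T_N = N\,\tau(k_0) \to \infty \ \text{as}\ N \to \infty
```

With p = 5/3 the series converges, so errors starting at arbitrarily small scales contaminate wavenumber k_0 within a finite time of about 2.7 tau(k_0); with p = 3 every octave costs the same time and T_N grows without bound. Under these assumptions it is the spectral slope, not the dimensionality of the flow, that sets this behaviour.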

Citation: https://doi.org/10.5194/egusphere-2026-1460-RC1
- AC1: 'Reply on RC1', Martin Bonte, 08 Jul 2026
  
  Thank you very much for your comment. Please find the full reply in the attached file.
  
  Citation: https://doi.org/10.5194/egusphere-2026-1460-AC1
RC2:
'Comment on egusphere-2026-1460', Anonymous Referee #2, 09 Jun 2026

The paper compares two different ensemble nowcasting algorithms regarding their spatial error structure and the relationship between ensemble spread and nowcast error. On the one hand, the well-established classical STEPS method and, on the other hand, the modern generative AI approach LDCast are analyzed across different lead times and spatial scales of the forecast using metrics based on spectral variance, error, as well as covariance matrix eigenvalues. The main innovation is the comparison against surrogate ensembles derived by the MAAFT method described in the appendix. The analysis is conducted thoroughly, and the results appear sound and particularly interesting from both an academic and operational point of view. Most of the following specific comments aim to elicit clearer explanations or further discussion on particular points; therefore, I recommend the paper be published with minor revisions.
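The MAAFT construction itself is defined in the paper's appendix and is not reproduced here. For orientation, the classical amplitude-adjusted Fourier transform (AAFT) surrogate, assumed here to be the starting point that the MAAFT name refers to, keeps a field's value distribution exactly and its power spectrum approximately while randomising Fourier phases, which destroys the spatial placement of features. The Python/NumPy sketch below implements plain AAFT for a single 2D field; it is not MAAFT, and the function name is hypothetical.

```python
import numpy as np


def aaft_surrogate(field, rng):
    """Classical AAFT surrogate of a 2D field (not the paper's MAAFT)."""
    shape = field.shape
    values = field.ravel()
    # 1. Gaussianise: sorted Gaussian noise placed in the rank order of the data
    ranks = np.argsort(np.argsort(values))
    gauss = np.sort(rng.standard_normal(values.size))[ranks].reshape(shape)
    # 2. Randomise Fourier phases, keeping amplitudes. Phases taken from the FFT
    #    of real white noise are antisymmetric in k, so the inverse transform
    #    is real up to rounding error.
    phases = np.angle(np.fft.fft2(rng.standard_normal(shape)))
    shuffled = np.fft.ifft2(np.fft.fft2(gauss) * np.exp(1j * phases)).real
    # 3. Rescale: original values placed in the rank order of the shuffled field
    new_ranks = np.argsort(np.argsort(shuffled.ravel()))
    return np.sort(values)[new_ranks].reshape(shape)


rng = np.random.default_rng(3)
# Toy rain-like field: many zeros and a skewed positive tail
field = np.maximum(rng.standard_normal((64, 64)) - 1.0, 0.0)
surrogates = np.stack([aaft_surrogate(field, rng) for _ in range(10)])
```

Drawing several surrogates from one field gives an ensemble with realistic intensities and spectra but no knowledge of where errors occur, broadly the role a reference ensemble plays when testing whether an ensemble localises its error.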

Specific comments:
(1) The introduction does not make the objective of this research entirely clear. In line 45, the authors state that the central question of the paper is to characterize the information contained in the ensembles generated by the nowcasting algorithms. Moreover, the authors prematurely present their results in this section. I recommend formulating clear research questions instead, and omitting the results here.
(2) Several of the diagnostic tools utilized in this paper would benefit from additional context. In particular, the eigenvalue method based on the ensemble covariance matrix requires further explanation. Did the authors develop this method specifically for this study, or are there existing applications of this approach for similar problems in the literature?
(3) To my knowledge, the original LDCast method was trained using rainfall rates (the MeteoSwiss RZC product). In contrast, this study uses 5-minute rainfall sums resulting from a rain-gauge adjustment using Kriging with External Drift. Previous research has demonstrated that rain-gauge adjustment via geostatistics tends to smooth spatial rainfall fields. While this may not significantly impact the overall results, a brief discussion regarding this potential effect should be included in the manuscript.
(4) The use of surrogate ensembles based on MAAFT is central to reaching one of the paper's main conclusions: namely, that neither STEPS nor LDCast ensembles contain dynamical information regarding the spatial localization of the error. The authors should incorporate an explanation of this method into Section 3, rather than relegating it almost entirely to the appendix.
(5) It would be highly valuable to include the authors' perspective on how the results of this paper might translate to other AI nowcasting methods. It seems unlikely that alternative approaches, such as DGMR, would be capable of providing true dynamic forecast uncertainty as defined in this study. The authors are encouraged to elaborate on this point in the discussion.
(6) The same applies to retraining. Do the authors expect to obtain significantly different results if LDCast were retrained on the Belgian RADCLIM dataset? Furthermore, what are the implications of the chosen training strategy for the expected results? To my knowledge, LDCast was trained using radar crops of different sizes while also applying data augmentation techniques such as random rotations and flipping.
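The crop-and-rotate augmentation mentioned in comment (6) can be sketched as follows, assuming rotations by multiples of 90 degrees and a single horizontal flip; LDCast's actual training code may use different choices, and the function name is hypothetical. The key property is that one geometric transform is drawn per sequence and applied to every frame, so the apparent motion is rotated consistently rather than scrambled.

```python
import numpy as np


def augment_sequence(seq, crop, rng):
    """Random crop, 90-degree rotation and flip of a (time, ny, nx) sequence.

    The same transform is applied to all frames of the sequence.
    """
    t, h, w = seq.shape
    if crop > h or crop > w:
        raise ValueError("crop larger than the input frames")
    i = rng.integers(0, h - crop + 1)
    j = rng.integers(0, w - crop + 1)
    out = seq[:, i:i + crop, j:j + crop]
    # Rotate in the spatial plane only (axes 1 and 2), never along time
    out = np.rot90(out, k=int(rng.integers(4)), axes=(1, 2))
    if rng.random() < 0.5:
        out = out[:, :, ::-1]  # horizontal flip
    return np.ascontiguousarray(out)
```

Because every frame receives the same crop window, rotation and flip, a sequence of identical frames stays identical after augmentation.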

Minor technical corrections:
Line 26: The first occurrence of LDCast needs a reference

Line 74: Clarify BPS method

Line 77: Replace RMI with Royal Meteorological Institute of Belgium (RMI)

Citation: https://doi.org/10.5194/egusphere-2026-1460-RC2
- AC2: 'Reply on RC2', Martin Bonte, 08 Jul 2026
  
  Thank you very much for your comment. Please find the full reply in the attached file.
  
  Citation: https://doi.org/10.5194/egusphere-2026-1460-AC2

Supplement

https://doi.org/10.5194/egusphere-2026-1460-supplement

Viewed

Total article views: 349 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
246	74	29	349	36	21	20

Views and downloads (calculated since 23 Apr 2026)

Month	HTML	PDF	XML	Total
Apr 2026	97	30	7	134
May 2026	107	23	14	144
Jun 2026	36	13	5	54
Jul 2026	6	8	3	17

Cumulative views and downloads (calculated since 23 Apr 2026)

Month	HTML	PDF	XML	Total
Apr 2026	97	30	7	134
May 2026	204	53	21	278
Jun 2026	240	66	26	332
Jul 2026	246	74	29	349

Viewed (geographical distribution)

Total article views: 340 (including HTML, PDF, and XML), of which 340 have a defined geographic origin and 0 an unknown origin.

Latest update: 16 Jul 2026

Short summary

The predictability of the generative AI-based nowcasting model LDCast is evaluated over Belgium, together with the pysteps implementation of the nowcasting algorithm STEPS. The ensembles of both models appear to estimate the size of the error correctly through their spread, but fail to represent its spatial structure. The analysis is carried out for two dynamically different types of events (convective and stratiform), showing how the models adapt their ensembles to the situation.
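The distinction between estimating the size of the error and representing it spatially can be illustrated with a simple proxy: the correlation, across grid points, between the ensemble standard deviation and the absolute error of the ensemble mean. This is not the paper's spatial score (which compares against surrogate ensembles); it is a minimal Python/NumPy sketch with a hypothetical function name and toy data.

```python
import numpy as np


def spread_error_correlation(ensemble, obs):
    """Pearson correlation, across grid points, between the ensemble standard
    deviation and the absolute error of the ensemble mean.

    ensemble: (members, ny, nx); obs: (ny, nx).
    """
    ens_mean = ensemble.mean(axis=0)
    spread = ensemble.std(axis=0, ddof=1).ravel()
    abs_err = np.abs(obs - ens_mean).ravel()
    return np.corrcoef(spread, abs_err)[0, 1]


rng = np.random.default_rng(5)
sigma = np.linspace(0.1, 2.0, 64 * 64).reshape(64, 64)  # spatially varying spread
ens = sigma * rng.standard_normal((30, 64, 64))
# Observation drawn with the same spatial spread pattern: error is localised
informative = spread_error_correlation(ens, sigma * rng.standard_normal((64, 64)))
# Observation whose error pattern is unrelated to the ensemble's spread map
uninformative = spread_error_correlation(ens, rng.standard_normal((64, 64)))
```

An ensemble can match the domain-averaged error size in both toy cases while only the first yields a clearly positive correlation, i.e. localises where the error occurs.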

