the Creative Commons Attribution 4.0 License.
Predicting Forecast Errors with Diffusion Model for Uncertainty Quantification in Wind Speed Nowcasting
Abstract. Weather forecasts are inherently uncertain due to the chaotic nature of the atmosphere and unavoidable errors. Ensemble forecasting is the established approach for quantifying the uncertainty. However, it is both computationally expensive and inherently prone to under-dispersion, as it simulates multiple atmospheric trajectories with a finite number of members. In this study, we propose a novel paradigm that achieves uncertainty quantification by directly predicting forecast errors, thereby bypassing the need to simulate multiple trajectories. We employ a denoising diffusion probabilistic model for this task, as its generative capabilities are well-suited for learning high-dimensional distributions. By stochastically sampling from the learned distribution and adding the generated errors to the physics-based nowcast, an ensemble nowcast can be constructed efficiently without the need for perturbation generation or parallel model running. The proposed approach is applied to 10-meter wind speed nowcast, which is important but has received relatively limited attention in diffusion-based weather forecasting studies. Results show that the diffusion model effectively captures the spatial structure and probabilistic characteristics of forecast errors, leading to improved deterministic accuracy and a well-calibrated ensemble. In addition, different noise schedules for the diffusion process are systematically evaluated. The results indicate that the Cosine schedule provides the most reliable performance for uncertainty prediction, offering practical guidance for configuring diffusion models in weather forecasting applications.
Status: open (until 16 Jun 2026)
- RC1: 'Comment on egusphere-2026-1438', Anonymous Referee #1, 17 May 2026
- RC2: 'Comment on egusphere-2026-1438', Anonymous Referee #2, 27 May 2026
Manuscript Title: Predicting Forecast Errors with Diffusion Model for Uncertainty Quantification in Wind Speed Nowcasting
Authors: Yanwei Zhu, Aitor Atencia, Markus Dabernig, Yong Wang, Shuyan Zhou
Journal: Geoscientific Model Development (GMD) / EGUsphere
Recommendation: Major Revision
General Comments
The manuscript introduces a paradigm for uncertainty quantification in short-term weather forecasting, specifically focusing on high-resolution, 10-meter wind speed nowcasting. Traditional uncertainty quantification relies heavily on dynamical ensemble prediction systems (EPS) that simulate multiple atmospheric trajectories by perturbing initial conditions or physical parameterizations. While effective, EPS is computationally intensive for real-time applications and frequently suffers from under-dispersion. To overcome these constraints, the authors propose bypassing the generation of multiple physical trajectories altogether. Instead, they leverage a generative artificial intelligence framework—specifically a Denoising Diffusion Probabilistic Model (DDPM)—to directly model and predict the full conditional distribution of forecast errors, using a deterministic physical forecast as the conditioning background.
Data preprocessing includes a logarithmic transformation of wind speed data, which successfully converts skewed wind speed errors into a near-Gaussian distribution and mathematically restricts the reconstructed wind speed fields to positive values. A core contribution of the paper is the empirical evaluation of three distinct noise scheduling frameworks (Linear, Cosine, and Sigmoid) within the diffusion process. The experimental results demonstrate that the choice of the noise scheduler is critical for structural and statistical fidelity. The Cosine schedule outperforms the alternatives across multiple verification metrics, achieving optimal error scores and well-calibrated histograms. Furthermore, spatial power spectrum analysis confirms that the samples generated via the Cosine schedule preserve the correct kinetic energy cascade and spatial structures of the wind fields.
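To make the comparison of noise schedules concrete, the three families can be sketched as below. This is a minimal illustration only: the function names, the number of steps `T`, and the parameter values (`beta_start`, `beta_end`, the cosine offset `s`, and the sigmoid range `lo`/`hi`) are commonly used defaults assumed here, not values taken from the manuscript.

```python
import math


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Variances beta_1..beta_T spaced linearly (assumed default range)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def cosine_betas(T, s=0.008, max_beta=0.999):
    """Betas derived from a squared-cosine curve for the cumulative signal level."""
    def alpha_bar(t):
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2

    # beta_t = 1 - alpha_bar(t) / alpha_bar(t - 1), clipped to stay below 1
    return [min(1.0 - alpha_bar(i + 1) / alpha_bar(i), max_beta) for i in range(T)]


def sigmoid_betas(T, beta_start=1e-4, beta_end=0.02, lo=-6.0, hi=6.0):
    """Betas following a logistic curve between beta_start and beta_end."""
    betas = []
    for i in range(T):
        z = lo + (hi - lo) * i / (T - 1)
        betas.append(beta_start + (beta_end - beta_start) / (1.0 + math.exp(-z)))
    return betas


def alpha_bars(betas):
    """Cumulative product of (1 - beta_t): share of signal variance left at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


if __name__ == "__main__":
    T = 1000
    for name, fn in [("linear", linear_betas),
                     ("cosine", cosine_betas),
                     ("sigmoid", sigmoid_betas)]:
        ab = alpha_bars(fn(T))
        print(f"{name:8s} alpha_bar at T/2 = {ab[T // 2 - 1]:.3f}")
```

With these assumed settings, the cosine schedule keeps far more of the signal at the midpoint of the forward process than the linear one, which is one plausible reason for differences in the generated spatial structure; whether this explains the results reported in the manuscript would have to be checked against the authors' actual configuration.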
The manuscript is well-structured, the physical treatment of data via logarithmic transformation is sound, and the comparison of noise schedules (Linear, Cosine, and Sigmoid) provides valuable empirical insights for the atmospheric modeling community. However, there are critical limitations regarding the temporal coverage of the dataset, the severe seasonal bias in the verification phase, the lack of traditional statistical baselines, mathematical inconsistencies in the description of the diffusion framework, and several data visualization/structural clarity issues that must be addressed before publication.
Major Concerns (with line-by-line references)
- (Lines 85–87): The total historical record used spans from 1 October 2021 to 30 June 2023. This represents less than two full years of atmospheric data. Since deep generative models like DMs typically benefit from a substantial and diverse set of samples to properly map high-dimensional chaotic fields, a dataset shorter than two years may increase the risk of overfitting to the specific intra-annual anomalies of those two cycles, potentially limiting the model's capacity to generalize to long-term climate variability. It would be highly beneficial if the authors could provide a brief justification as to why a dataset shorter than 24 months is sufficient for high-resolution (1 km) spatial error generation, or alternatively, consider expanding the training set by incorporating additional years of archival data from the SIVA system.
- (Lines 88–90 and Line 167): The authors state that the dataset was split chronologically, leaving the "remainder" for testing (which corresponds strictly to June 2023, as confirmed in Line 167). As defined in Lines 85–86, the spatial domain covers East China (115.35°E–122.33°E, 29.88°N–35.81°N). In this region, June is climatologically dominated by a meteorological regime characterized by stationary front dynamics. Evaluating the DDPM's performance exclusively during this single month could introduce a seasonal bias. Unless a specific seasonal behavior is being targeted (which should then be explicitly mentioned in the manuscript), it is recommended to evaluate the model across a broader range of seasonal conditions. Given the relatively small size of the dataset, a comprehensive evaluation might be challenging; however, adopting a validation strategy that ensures cross-seasonal representation would greatly help to demonstrate the model's generalizability across different atmospheric regimes.
- (Lines 68–70 and Lines 159–161): In Lines 68–70, the authors state that, unlike conventional statistical post-processing methods, their proposed model explicitly learns the full conditional distribution of errors. Later, in Lines 159–161, they explain the lack of a reference ensemble by noting that the operational SIVA system is purely deterministic. While this clarifies the absence of a dynamical ensemble comparison, it would be highly valuable to include a reference baseline for probabilistic verification. To more clearly demonstrate the added value and skill of the conditional estimation provided by the DDPM, the model could be benchmarked against standard reference baselines. Even a simple climatological distribution or a persistence-based error model would serve as an insightful benchmark to quantify the actual statistical improvement and contextualize the computational investment required by the diffusion architecture.
- (Lines 105–106): In Lines 105–106, the text states that $\mu_\theta(x_t, t)$ and $\sigma^2$ are the mean and variance of the distribution at the previous step. While the explicit formulation to obtain $\mu_\theta$ is provided in Equation 3, it would be helpful if the authors could explicitly define how $\sigma^2$ is handled or computed.
- (Line 109 and Equation 4): In Line 109, the network is introduced as $\epsilon_\theta(x_t, t)$, yet in Equation 4, it appears as $\epsilon_\theta(x_t, y, t)$. To avoid ambiguity, the authors should clarify whether the conditional variable $y$ should be consistently included in the notation throughout the text.
- (Figure 1 / Section 2.3): The flowchart in Figure 1, which outlines the overall framework of the model, can be somewhat difficult to follow, even when cross-referenced with the text explanation in Section 2.3. To further enhance its clarity and make it more intuitive for the reader, it is suggested to refine the representation of the data flow. Specifically, explicitly linking the visual blocks to the mathematical variables used in the text, such as the physical background ($y$), the forward/reverse processes ($x_t$), and the final output, would significantly improve the diagram's transparency. Explicitly referencing the corresponding equation numbers (Equations 1–4) within the flowchart components would also greatly assist the community in understanding and reproducing the exact methodology.
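For reference, the two notation points raised above (the handling of $\sigma^2$ and the consistent inclusion of $y$) can be read against the standard conditional DDPM formulation (Ho et al., 2020). The block below is a sketch of that textbook form under assumed notation, with $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$; the manuscript's Equations 1–4 may differ in detail.

```latex
\begin{align}
q(x_t \mid x_{t-1}) &= \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right) \\
p_\theta(x_{t-1} \mid x_t, y) &= \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, y, t),\ \sigma_t^2 \mathbf{I}\right) \\
\mu_\theta(x_t, y, t) &= \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, y, t)\right) \\
\mathcal{L} &= \mathbb{E}_{x_0,\,\epsilon,\,t}\left[\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ y,\ t\right)\right\rVert^2\right]
\end{align}
```

In this formulation the reverse-process variance is usually not learned but fixed, either as $\sigma_t^2 = \beta_t$ or as $\sigma_t^2 = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t$. Stating which choice (if either) was used would resolve the ambiguity noted for Lines 105–106.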
Minor Concerns
- (Figure 6 / Section 4.1): In Section 4.1, specific conclusions are drawn based on the spatial patterns shown in Figure 6. However, the current color palette utilizes very soft gradients and smooth transitions, which can make it challenging for the reader to visually discern the differences highlighted in the text. It is suggested that the authors consider changing the color scale of Figure 6 to a high-contrast or perceptually uniform divergent palette to visually enhance and further substantiate the claims made in the manuscript.
- (Supplementary Material / Text Structure): The manuscript contains a prolonged text explanation discussing a specific figure that has been placed entirely in the Supplementary Material. When a figure requires such an extensive textual breakdown and is central to supporting the core arguments, forcing the reader to search for it in a separate document can disrupt the flow of reading. To make the paper more self-contained and streamline the reading process, it is recommended to move this specific figure from the supplementary material into the main manuscript, embedding it close to its corresponding text discussion.
Citation: https://doi.org/10.5194/egusphere-2026-1438-RC2
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 279 | 83 | 22 | 384 | 36 | 14 | 20 |
Review of the manuscript entitled “Predicting Forecast Errors with Diffusion Model for Uncertainty Quantification in Wind Speed Nowcasting”
Authors: Yanwei Zhu, Aitor Atencia, Markus Dabernig, Yong Wang, Shuyan Zhou
Suggestion: reconsidered after major revisions
General comments:
The manuscript describes and evaluates an ensemble nowcasting system constructed with a denoising diffusion probabilistic model (DDPM). Although diffusion models (DMs) have recently been studied intensively in meteorology and weather forecasting, their ability to simulate wind uncertainty has received less attention. Thus, the presented results can be very informative for those who would like to build a complex probabilistic nowcasting system based on a diffusion model. The authors succeeded in demonstrating, with both statistical evaluation and case studies, that such a model can be used for nowcasting wind speed. The DDPM itself and the methods used for its evaluation are generally well described and justified, with some exceptions (commented on below). It is also to be appreciated that the performance of several noise schedules (Linear, Cosine, Sigmoid) was thoroughly tested and analyzed. Still, there are several aspects where the model could be improved, above all concerning extreme events, as is admitted in the conclusion of the manuscript. The authors claim several times that one of the biggest advantages of DMs over other AI/ML techniques is the maintenance of physical consistency. However, the evaluation is done only with respect to their own analyses and reference forecasts; there are no comparisons of the DDPM outputs with the performance of AI models based on different principles. Thus, these statements need some clarification. Also, the presented case studies could be analyzed and discussed in more detail.
The manuscript can be considered as an interesting pilot study, introducing the application of DDPM in wind nowcasting. Despite the above-mentioned weaknesses and some deficiencies in the presentation of the schemes and results, the manuscript exhibits an overall good quality and after corrections and clarifications, it could be appropriate for publication.
Specific comments:
Lines 80-85: “The learning objective of DDPM is the error of SIVA nowcast, defined as the difference between forecast and the corresponding analysis field.”
R: Usually, an objective analysis also contains errors, above all at grid points lying far from observations, in complex terrain, etc. How is this in the case of the SIVA nowcast, and how much could the analysis error influence the DDPM training and the magnitude of the forecast errors? Can you estimate the accuracy of the analyses used (e.g. have you done cross-validation)? Can you comment on this?
85: Fig. S1 in the supplementary material shows the domain only very schematically. There is no information about the orography of the area (e.g. are there also high mountains?), which can be important for wind nowcasting. Also, it is rather unusual that the domain is presented in the supplementary material rather than as one of the figures of the manuscript.
85-90: „The data was spilt into nonoverlapping parts for training (1 October 2021‒30 April 2023), validation (1 May 2023‒31 May 2023), and testing (remainder).“
R: Later, in the conclusion of the manuscript (lines 350-355), you mention that the DDPM forecasts are still too smooth to reproduce some extreme events. Is this perhaps related to the rather short (~1.5-year) training period? Was there any particular reason for choosing a period of this length?
100-105: In Section 3.1, the meaning of the variables „q“ and „p“, which are defined by Equations (1) and (2), is not explicitly stated, though one could guess that these are probability distributions. Also, „I“ is not explained (identity matrix?). I would recommend specifying these, for greater clarity.
Figure 1: The figure might help the reader to understand the training process, but I do not find it sufficiently self-explanatory, and it takes some time to interpret. Hence I would propose improving this figure and relating it better to the corresponding text in 3.2 and the equations in 3.1. For example, one should easily find where the training starts (this is the creation of „Errors“ in the upper left corner of the figure). You could highlight this step somehow (e.g. with a letter or number) and refer to it in the text of 3.2. The same could be done for other parts of the DDPM framework, the cycle of the sampling process, etc., so that one can understand in detail what is depicted in the figure. There is also a 3D graph (maybe a distribution function, a product of the Denoiser?) in the middle of the right-hand side of the figure, which has no name or explanation. Please amend that as well.
150: „For verification, the SIVA forecast errors are designated as the reference for evaluating the DDPM, while the analysis fields serve as benchmark for the ensemble nowcasts.“
R: Did you use every grid point of the SIVA domain and the analysis for verification? This is not clear, as in the caption of Fig. 2 you denote observations as „Truth“, while in the caption of Fig. 4 the analysis field is denoted „Truth“. The Metrics description should clarify where you use point observations and where the nowcasting system's analysis for verification, and why. It must also be taken into account that the analysis can itself contain errors, so it can be considered only an approximate description of the real state of the atmosphere.
155-160: „Given the negligible sensitivity of key verification metrics to ensemble size, an ensemble with 16 members was adopted to ensure statistical robustness while maintaining computational tractability.“
R: Can you support your statement that the key verification metrics are not sensitive to ensemble size? Have you verified that? In the introduction (lines 35-45) you mention that „The ensemble often fails to fully characterize the true probability distribution“ or „However, a finite number of members cannot fully represent the true distribution, which inevitably leads to under-dispersion.“ This seemingly contradicts your statement in the Evaluation Metrics part. How did you choose the 16 members then? Upon which criterion?
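One way such a sensitivity check could be documented is sketched below. It computes the empirical ensemble CRPS for several ensemble sizes on synthetic data, in both the standard form and the "fair" form that corrects for finite ensemble size. The synthetic setup (members and observations drawn from the same standard normal distribution) and all function names are assumptions made for illustration; they are not the authors' verification code or data.

```python
import random


def crps_ensemble(members, obs, fair=False):
    """Empirical CRPS of one ensemble forecast for one scalar observation.

    CRPS = mean|x_i - y| - 0.5 * (sum over i, j of |x_i - x_j|) / D,
    with D = M * M (standard) or D = M * (M - 1) (fair, needs M >= 2).
    """
    m = len(members)
    if fair and m < 2:
        raise ValueError("fair CRPS needs at least two members")
    skill = sum(abs(x - obs) for x in members) / m
    pair_sum = sum(abs(a - b) for a in members for b in members)
    denom = m * (m - 1) if fair else m * m
    return skill - 0.5 * pair_sum / denom


def mean_crps_by_size(sizes, n_cases=2000, seed=0):
    """Average standard and fair CRPS over synthetic, perfectly reliable cases."""
    rng = random.Random(seed)
    results = {}
    for m in sizes:
        total_std = total_fair = 0.0
        for _ in range(n_cases):
            obs = rng.gauss(0.0, 1.0)                      # synthetic "truth"
            ens = [rng.gauss(0.0, 1.0) for _ in range(m)]  # synthetic members
            total_std += crps_ensemble(ens, obs)
            total_fair += crps_ensemble(ens, obs, fair=True)
        results[m] = (total_std / n_cases, total_fair / n_cases)
    return results


if __name__ == "__main__":
    for m, (std, fair) in mean_crps_by_size((4, 8, 16, 32)).items():
        print(f"M={m:2d}  CRPS={std:.3f}  fair CRPS={fair:.3f}")
```

In this idealized setting the standard CRPS decreases with ensemble size while the fair CRPS stays roughly constant, so a table of this kind, computed on the actual test data, would show directly how much of the score at 16 members is a size effect.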
165-170: „The agreement of both the joint and marginal probability distributions with the benchmark (Fig. 2) demonstrates that the errors predicted by DDPM are physically consistent and statistically robust.“
R: Why do you think that if the statistical distribution of the evaluated model’s forecasts and forecast errors matches the benchmark (which is considered to be physically consistent), then the model must be physically consistent, too? Can you prove it? Or can you cite examples showing that, if physical consistency were not fulfilled (e.g. using different approaches), the forecast error distribution would be different?
205-210: First sentence of 4.2: „Evaluations of the generated errors reveal that DDPM captures the physical characteristics of forecast errors, thereby learning more than just their statistics.“
R: As for 165-170, I am not convinced that this was really shown, as there was no comparison with other methods from which we would expect “only” the learning of statistics.
255-265 and a part of the Caption in Figure 7:
R: The discussion of the Brier Score and Brier Skill Score should be moved to 3.3 (Evaluation Metrics), together with the note on the use of the climatological probability in the BSS calculation. You could also mention why the climatological probability was chosen as the reference. Although this is standard practice, it may have some implications for the verification of rare events, especially since you used only the training dataset (~1.5 years).
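For readers less familiar with these scores, the definitions in question can be sketched as follows. This is a minimal illustration, assuming that the forecast probability is taken as the fraction of ensemble members at or above the threshold and that the reference is a single climatological frequency; the function names and the toy numbers are hypothetical and not taken from the manuscript.

```python
def exceedance_probability(members, threshold):
    """Forecast probability: fraction of members at or above the threshold."""
    return sum(1 for x in members if x >= threshold) / len(members)


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, climatology):
    """BSS = 1 - BS / BS_ref, with a constant climatological probability as reference."""
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score([climatology] * len(outcomes), outcomes)
    if bs_ref == 0.0:
        raise ValueError("reference Brier score is zero; BSS is undefined")
    return 1.0 - bs / bs_ref


if __name__ == "__main__":
    # Toy case: four grid points, event = wind speed >= 10.8 m/s (values are made up)
    ensembles = [[11.2, 12.0, 9.5, 11.0], [3.0, 4.1, 2.2, 3.7],
                 [8.0, 11.5, 7.9, 9.0], [5.0, 5.5, 4.8, 6.1]]
    outcomes = [1, 0, 0, 0]
    probs = [exceedance_probability(e, 10.8) for e in ensembles]
    print(probs, brier_skill_score(probs, outcomes, climatology=0.25))
```

Because the reference score shrinks as the event becomes rarer, the BSS for a high threshold is sensitive to how well the climatological frequency is estimated, which is the concern with a training period of about 1.5 years.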
285-295: Figure S2: You devote quite a lot of description (one paragraph) to the output of Fig. S2 (Probability diagram of wind ensemble nowcasting in different lead time for three schedules with threshold 1 m s⁻¹). It would be fair to present it as one of the standard figures of the manuscript, e.g. as Fig. 9a. The other diagram, for the higher threshold of 10.8 m s⁻¹, could be Fig. 9b.
295: You probably refer erroneously to Fig. S2, while the diagram in the current Fig. 9 (for the threshold 10.8 m s⁻¹) is not referenced in the text at all. As mentioned above, consider moving Fig. S2 to Fig. 9a and denoting the current Figure 9 as Fig. 9b.
315-325 and Figure 10: You mention a comparison with “truth” or “Ground Truth”, although this is probably only the analysis of your deterministic nowcasting system, and it may contain errors, as I have mentioned earlier. It would probably be better to denote it as Analysis.
Figure 10: There is a typo in the caption of the figure: instead of “Ground Turth” it should read “Ground Truth” or, even better, “Analysis”.
Figure 10: “(c) Ensemble Mean forecast”
R: I do not understand why you did not show the forecasts of the ensemble maxima, not even in the supplementary material. Yet this is a parameter that is often used in weather forecasting, especially for severe weather warnings. You even note the “smoothing effect on the ensemble mean” in the text below the figure (line 330). How can we know that the “strong wind band” considered a false alarm (319-320), which appears in the deterministic nowcast (REF) in Fig. 10b, would not be reproduced by some of the DDPM ensemble members? Only a comparison with the ensemble maximum of wind could exclude that. In the supplementary Fig. S3 one can see that certain members (e.g. 9, 10, 12) show a signal for such a wind band in that area, although less pronounced than in REF.
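The difference between the two summaries is easy to illustrate. The sketch below computes the grid-point ensemble mean and ensemble maximum for a toy ensemble in which a single member contains a strong-wind feature; the field values, the 2×2 grid, and the function names are made up for illustration and do not come from the manuscript.

```python
def ensemble_mean_and_max(members):
    """Grid-point mean and maximum over a list of equally shaped 2-D fields."""
    n = len(members)
    rows, cols = len(members[0]), len(members[0][0])
    mean = [[sum(m[i][j] for m in members) / n for j in range(cols)]
            for i in range(rows)]
    maximum = [[max(m[i][j] for m in members) for j in range(cols)]
               for i in range(rows)]
    return mean, maximum


def exceeds(field, threshold):
    """Boolean mask of grid points at or above the threshold."""
    return [[value >= threshold for value in row] for row in field]


if __name__ == "__main__":
    calm = [[4.0, 4.0], [4.0, 4.0]]
    windy = [[4.0, 12.0], [4.0, 4.0]]  # one member with a strong-wind cell
    mean, maximum = ensemble_mean_and_max([calm, calm, calm, windy])
    print(exceeds(mean, 10.8))     # the mean smooths the feature away
    print(exceeds(maximum, 10.8))  # the maximum retains it
```

A feature present in only a few members disappears from the mean but remains in the maximum, which is why the ensemble maximum (or an exceedance probability map) is needed to judge whether the false-alarm wind band is really absent from the DDPM ensemble.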
330-335 “The smoothing effect on the ensemble mean was also reduced, and the probabilistic forecast shows finer and more accurate details.”
R: It would be nice to mention an example. In addition, it would be perhaps noteworthy to highlight as an interesting feature that there is an area of reduced wind speed on the right edge of the domain (nearly in the middle, over the sea), visible in all outputs valid for the +1h time (Figure 11). Although it vanishes in both analysis and reference as the wind strengthens in time, it remains in the DDPM forecast until +6 hour, which suggests that the DDPM can exhibit certain “inertia” in some cases, even against its reference.
350-355 “However, for extreme events, the model still suffers from the issue of excessive smoothness. This limitation may stem from the use of a basic diffusion model architecture.”
R: Is there not an additional problem in that you used a relatively short (~1.5-year) training period? See my previous comment on lines 85-90.
370-375 “… while maintaining computational efficiency and physical consistency.”
R: Can you specify how computationally efficient the DDPM is, e.g. compared with a traditional nowcast (your reference nowcast or other nowcasts/ensembles)?
Review criteria:
R: Yes, the question of using AI/ML methods and DMs for high-resolution nowcasting systems is a very relevant topic in current weather forecasting.
R: Yes, several (use of DMs for wind nowcasting, analysis of results with respect to different schedules, etc.).
R: Yes, the use of DMs in wind nowcasting as presented is advantageous due to its computational efficiency and its ability to estimate forecast errors directly. This could enable the construction of forecast ensembles, which would be difficult and computationally very demanding with classic NWP approaches at such high spatial and temporal resolution.
R: Mostly yes, see the specific comments above for the exceptions.
R: Mostly yes, see the specific comments above for the exceptions. Some doubts concern the physical consistency of the produced forecast errors and the results of the case study.
R: I think that a reviewer could hardly verify this in reasonable time. But from the description of the model and methods and upon the claim in the “Code and data availability” (all the links were available by 17 May 2026) I believe that most of the results, if not all, are reproducible.
R: Yes.
R: Yes.
R: Yes.
R: Mostly yes, see some suggestions how to improve it in the specific comments.
R: Yes.
R: I missed the description of some variables (though probably obvious for experts in DM modeling area) and asked for amendment. See the specific comments above.
R: I made suggestions for some rearranging of the figures (Fig. S2, Fig. 9, etc.) – see the specific comments.
R: Yes.
R: There are links to code and data availability after the conclusion, which might be sufficient for readers who are highly interested in testing the DDPM. I suggested moving certain figures from the supplementary material to the main part of the manuscript, as these were mentioned and discussed in the text and seemed to be important (see the specific comments). As a reader, I would personally prefer the supplementary material to contain only figures that are for “further reading”, as a kind of appendix, and that are less important and not discussed in the main part of the text. Having to search for both “standard” figures and supplementary figures while reading is a little bit disturbing to me.