the Creative Commons Attribution 4.0 License.
Impact of reduced non-Gaussianity on analysis and forecast accuracy by assimilating every-30-second radar observation with ensemble Kalman filter: idealized experiments of deep convection
Abstract. This study investigates the impact of very high frequency data assimilation on analysis and forecast accuracy with the local ensemble transform Kalman filter for idealized deep convection. Previous studies showed that assimilating data from the Phased Array Weather Radar (PAWR) every 30 seconds alleviates the problem of a strongly non-Gaussian error probability distribution caused by the rapid nonlinear evolution of deep convection in real-world cases. This study aims to better understand the pure impact of the non-Gaussian distribution and performs perfect-model observing system simulation experiments with radar reflectivity every 5 minutes and every 30 seconds. The idealized experimental settings have a unique advantage in verifying unobserved variables, whose accuracy was unclear in the previous studies with real-world data. The results show that assimilating data every 30 seconds contributes to a significant improvement in analysis accuracy, particularly for vertical velocity associated with strong convection, although the impact on forecast accuracy is limited. We also find a significant reduction in the non-Gaussianity of the first-guess ensemble. The impact of assimilation frequency on reducing non-Gaussianity is enhanced when uncertainty in the background wind or stability is included in the initial ensemble perturbation.
Competing interests: At least one of the (co-)authors is a member of the editorial board of Nonlinear Processes in Geophysics.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-2543', Wei Han, 10 Jul 2025
This paper examines the effect of assimilating high-frequency radar observations on analysis and forecast accuracy in convection-permitting numerical weather prediction. The authors conduct idealized experiments using the local ensemble transform Kalman filter (LETKF) and find that, compared to a 5-minute assimilation interval, assimilating radar reflectivity every 30 seconds significantly reduces non-Gaussianity in the background error distribution and improves analysis accuracy, especially for vertical velocity. However, it does not significantly improve precipitation forecasts. Additionally, the study offers several insights into the initial perturbation scheme. The paper is well-organized but could be improved by addressing the following issues:
Possible typographical and grammatical errors:
- Line 112: ‘assimilation or Doppler velocity is also not considered’ should be changed to ‘assimilation of Doppler velocity is also not considered.’
- The caption for Table 1 should appear above the table, not below it.
- Line 140: ‘figures 1a-1c’ should be corrected to ‘figures 3a-3c.’
- The description for the sub-figures in the third row (e.g., Figures 4, 9, and 10) is not clearly presented.
- Line 221: ‘5min-3D’ should be replaced with ‘5MIN-3D’ for consistency.
- In the caption for Figure 9, either the color of the contours should be changed to purple, or the word ‘Purple’ should be changed to ‘Red’ to match the figure.
General recommendations:
- Given the design of these idealized experiments, 100 ensemble members are sufficient to reduce sampling error to a small degree. However, it would be beneficial to include a discussion on the impact of ensemble size on sampling error, or at least cite relevant previous studies in this area.
- It is reasonable to disable temporal localization for the 5MIN-4D case to ensure a fair comparison with the 30SEC case. However, it should be noted in the discussion that assimilating observations every 30 seconds is not currently practical in real-world operational systems. The aim of this study is to explore the underlying relationship between assimilation frequency and non-Gaussianity in an idealized setting, rather than to propose a practical assimilation strategy.
- The impact of assimilation frequency in idealized experiments has been previously discussed by [1].
- If possible, please provide a theoretical conclusion regarding the effect of assimilation frequency under the EnKF framework for non-Gaussian problems.
- This study shows that increasing assimilation frequency improves the analysis state but does not significantly improve forecast performance. Is this a coincidental result, or have similar findings been reported in other studies?
Reference:
[1] He, Huan, et al. "Impacts of assimilation frequency on ensemble Kalman filter data assimilation and imbalances." Journal of Advances in Modeling Earth Systems 12.10 (2020): e2020MS002187.
Citation: https://doi.org/10.5194/egusphere-2025-2543-RC1
AC1: 'Reply on RC1', Arata Amemiya, 18 Jul 2025
>This paper examines the effect of assimilating high-frequency radar observations on analysis and forecast accuracy in convection-permitting numerical weather prediction. The authors conduct idealized experiments using the local ensemble transform Kalman filter (LETKF) and find that, compared to a 5-minute assimilation interval, assimilating radar reflectivity every 30 seconds significantly reduces non-Gaussianity in the background error distribution and improves analysis accuracy, especially for vertical velocity. However, it does not significantly improve precipitation forecasts. Additionally, the study offers several insights into the initial perturbation scheme. The paper is well-organized but could be improved by addressing the following issues:
We thank Dr. Wei Han very much for the referee comments. We answer each comment in the following.
>Possible typographical and grammatical errors:
>Line 112: ‘assimilation or Doppler velocity is also not considered’ should be changed to ‘assimilation of Doppler velocity is also not considered.’
>The caption for Table 1 should appear above the table, not below it.
>Line 140: ‘figures 1a-1c’ should be corrected to ‘figures 3a-3c.’
>The description for the sub-figures in the third row (e.g., Figures 4, 9, and 10) is not clearly presented.
>Line 221: ‘5min-3D’ should be replaced with ‘5MIN-3D’ for consistency.
>In the caption for Figure 9, either the color of the contours should be changed to purple, or the word ‘Purple’ should be changed to ‘Red’ to match the figure.
Thank you for pointing them out.
Regarding line 140 'figures 1a-1c', we meant 'figures 2a and 2c', the horizontal and vertical cross sections of the nature run at the time.
Regarding the inconsistency between the caption and the image of Figure 9, we found that we had applied the contour color and interval settings used for Fig. 4, although we intended to change them as written in the caption. We will revise Figs. 9 and 10. We will also revise Figs. 4, 9, and 10 to show the color map title (KL div.) clearly. The image files for the revised Figs. 4, 9, and 10 are attached to my next comment.
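For readers unfamiliar with the quantity named in the color map title, the KL divergence measures how far a distribution departs from a reference, here a Gaussian. This thread does not state the estimator used for the figures, so the following is only a minimal sketch under stated assumptions: a histogram estimate compared against a Gaussian with the same sample mean and standard deviation. The function name `kl_from_gaussian`, the bin count, and the sample data are hypothetical.

```python
import math
import numpy as np

def kl_from_gaussian(samples, n_bins=20):
    """Histogram estimate of KL(p || q), where p is the ensemble histogram
    and q is a Gaussian with the same sample mean and standard deviation.
    (Assumed estimator; not taken from the paper.)"""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    std = samples.std(ddof=1)
    counts, edges = np.histogram(samples, bins=n_bins)
    p = counts / counts.sum()
    # Gaussian probability mass in each bin, from differences of the CDF
    cdf = np.array([0.5 * (1.0 + math.erf((e - mean) / (std * math.sqrt(2.0))))
                    for e in edges])
    q = np.diff(cdf)
    q = np.maximum(q / q.sum(), 1e-300)  # renormalize over the histogram range
    mask = p > 0                          # empty bins contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
gaussian_members = rng.normal(0.0, 1.0, size=100)
bimodal_members = np.concatenate([rng.normal(-3.0, 0.5, size=50),
                                  rng.normal(3.0, 0.5, size=50)])
kl_gaussian = kl_from_gaussian(gaussian_members)
kl_bimodal = kl_from_gaussian(bimodal_members)
```

In this toy setting, a bimodal 100-member sample yields a larger value than a sample drawn from a Gaussian, which is the sense in which a larger KL divergence indicates stronger non-Gaussianity.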
For the rest, we will correct them as suggested in the next revised version.
>General recommendations:
>Given the design of these idealized experiments, 100 ensemble members are sufficient to reduce sampling error to a small degree. However, it would be beneficial to include a discussion on the impact of ensemble size on sampling error, or at least cite relevant previous studies in this area.
We considered that using 100 ensemble members is sufficient given the small localization scale. We set the horizontal and vertical localization length scales to 4 km and 2 km, respectively, while the horizontal and vertical grid spacings were 1 km and 200 m, which correspond to effective resolutions of at least 4 km and 800 m, respectively (according to https://glossary.ametsoc.org/wiki/Model_resolution). A rough estimate of the effective degrees of freedom of a localized ensemble background field is therefore 2x2x5=20 for one variable. We therefore considered the ensemble size of 100 to be larger than, or at least comparable to, the effective degrees of freedom of the localized background error.
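The reply does not spell out how the factors 2, 2, and 5 arise. One reading that reproduces them, offered here only as an assumption, is that the localized region spans twice the localization length scale in each direction and that this span is divided by the effective resolution. The helper name `resolved_scales` and the `span_factor` parameter are hypothetical.

```python
def resolved_scales(loc_scale_m, resolution_m, span_factor=2.0):
    """Number of resolvable scales across a localized region.
    Assumes the region spans span_factor * loc_scale_m (an assumption,
    not stated in the reply)."""
    return span_factor * loc_scale_m / resolution_m

n_x = resolved_scales(4000.0, 4000.0)  # horizontal: 4 km scale, 4 km resolution
n_y = resolved_scales(4000.0, 4000.0)
n_z = resolved_scales(2000.0, 800.0)   # vertical: 2 km scale, 800 m resolution
dof_per_variable = n_x * n_y * n_z

ensemble_size = 100
print(n_x, n_y, n_z, dof_per_variable, ensemble_size >= dof_per_variable)
```

Under this assumption the product is 20 per variable, which matches the estimate quoted in the reply and is well below the ensemble size of 100.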
>It is reasonable to disable temporal localization for the 5MIN-4D case to ensure a fair comparison with the 30SEC case. However, it should be noted in the discussion that assimilating observations every 30 seconds is not currently practical in real-world operational systems. The aim of this study is to explore the underlying relationship between assimilation frequency and non-Gaussianity in an idealized setting, rather than to propose a practical assimilation strategy.
We have mentioned this in lines 51-53 of the introduction. We will also add the following sentence after line 129 of Section 2.3.
"Although this choice might not be practical, we prioritize exploring the underlying relationship between assimilation frequency and non-Gaussianity in an idealized setting."
>The impact of assimilation frequency in idealized experiments has been previously discussed by [1].
>If possible, please provide a theoretical conclusion regarding the effect of assimilation frequency under the EnKF framework for non-Gaussian problems.
Thank you for introducing us to the related previous study. This study focuses on timescales between 30 seconds and 5 minutes, which are much shorter than the 1 to 6 hours discussed in the previous study. The process that adjusts imbalance is not gravity waves but acoustic waves and possibly moist convection. It is therefore not straightforward to compare the findings of this study with those of the previous study, but it can be said that the more frequent (30-second) assimilation in this study was shown to be advantageous in improving the accuracy of the analysis and the subsequent forecast (the next first guess), as shown in Fig. 4. This is thought to be due to the smaller analysis increment at each step, which causes less nonlinear error growth. I will add the article [1] to the reference list and this discussion to Section 5.
>This study shows that increasing assimilation frequency improves the analysis state but does not significantly improve forecast performance. Is this a coincidental result, or have similar findings been reported in other studies?
We believe the same result has not been reported in other studies, as few existing studies have addressed this topic at time scales as short as a 30-second assimilation cycle and a 30-minute forecast. However, we consider that this result can be interpreted by analogy with a similar issue at longer time scales, on which there is a consensus. In general, the impact of an improved initial condition from data assimilation is dominant at earlier forecast times and is overwhelmed by the impact of boundary conditions at later forecast times (mentioned in [2], for example). We therefore expect that the accuracy of the 30-minute forecast is controlled more by the larger-scale atmospheric fields, which are not significantly constrained by data assimilation with a small localization scale.
Reference:
[1] He, Huan, et al. "Impacts of assimilation frequency on ensemble Kalman filter data assimilation and imbalances." Journal of Advances in Modeling Earth Systems 12.10 (2020): e2020MS002187.
[2] Clark, Peter, et al. "Convection-permitting models: A step-change in rainfall forecasting." Meteorological Applications 23.2 (2016): 165-181.
Citation: https://doi.org/10.5194/egusphere-2025-2543-AC1
AC2: 'Reply on AC1', Arata Amemiya, 18 Jul 2025
RC2: 'Comment on egusphere-2025-2543', Zheqi Shen, 26 Jul 2025
The study investigates the impact of high-frequency radar data assimilation on analysis and forecasting accuracy, which is a topic of significant scientific and practical importance. The experimental design is rational and rigorous. By conducting idealized experiments, the authors successfully eliminate complex interfering factors present in real-world applications, thereby enhancing the credibility of their findings. The results demonstrate that assimilating radar data every 30 seconds can significantly reduce non-Gaussianity and improve the analysis accuracy of vertical velocity. These findings provide valuable insights for future research on radar data assimilation. Overall, the paper addresses a meaningful topic, features a well-designed experiment, and presents reliable results. It is recommended for publication. I suggest a minor revision.
The other reviewer, Dr. Wei Han, has already pointed out some details regarding the figures and several important general opinions, with which I fully agree. Here, I would like to add some of my personal concerns.
- Presentation of Assimilation Results: The discussion of the assimilation results, such as in Figures 3, 4, 9, and 10, only shows the final assimilation at 00:50:00. While a single assimilation can demonstrate the improvement effects of different schemes, completely ignoring the entire assimilation process seems inappropriate. I suggest using time series of some metric (such as RMSE or spread) or showing errors and spread at several different moments to illustrate how the assimilation gradually takes effect and reaches stability.
- Terminology in EnKF Context: I feel that the term "first-guess" is more commonly used in variational assimilation. In the context of EnKF assimilation, "prior" might be more suitable. This is just a personal suggestion.
- Temporal Localization (Line 129): I am not quite familiar with the term "temporal localization." Does it equate to the description of using different weights for observations at different times? The 5MIN-4D scheme not only uses ten times the amount of data compared to 5MIN-3D but also assigns all data from different times to the 5th minute without increasing the standard deviation of observational errors. I think the current description is not detailed enough and should be improved for the 5MIN-4D scheme.
- Introduction to LETKF: Although LETKF is a very well-known method, I believe it is necessary to briefly introduce LETKF in the section on the assimilation system, especially how the Gaussian assumption is embedded in its algorithm.
- Inflation Setting (Line 75): I agree with the no-inflation setting, as inflation under different assimilation frequencies can significantly affect the spread. I think a discussion on the possible impact of inflation on the conclusions in practical scenarios could be added to the conclusion section.
- Figure Presentation: Figures 4, 9, and 10 contain very rich information, requiring repeated reading between the text and the figures. I suggest adding the names of the experiments to the titles of subplots (a), (b), and (c) to facilitate reading. Moreover, the contour lines in Figures 9 and 10 are too thin and light in color, making them hard to see. They need to be improved. The shading information also needs to be displayed in softer colors or with increased transparency.
- Clarification on Perturbations (Line 210): The description of perturbations is not clear. It appears that there are 10 background wind profiles (or background thermal profiles) perturbed, resulting in 100 members. This seems to be combined with the initial perturbation scheme in Section 2.4. What is unclear to me is whether the 10 perturbations are superimposed on the 100 perturbations from Section 2.4, or whether the 10 perturbations from Section 2.4 are combined with these 10. It is necessary to clarify how the 100 members are generated, rather than just stating the conclusion: "Both of those 10 sets include one true profile, indicating only 10 members have the correct background wind or stability profile. The other 90 members are biased and are expected to have significant errors in the evolution at a convective scale."
Citation: https://doi.org/10.5194/egusphere-2025-2543-RC2
AC3: 'Reply on RC2', Arata Amemiya, 12 Aug 2025
> The study investigates the impact of high-frequency radar data assimilation on analysis and forecasting accuracy, which is a topic of significant scientific and practical importance. The experimental design is rational and rigorous. By conducting idealized experiments, the authors successfully eliminate complex interfering factors present in real-world applications, thereby enhancing the credibility of their findings. The results demonstrate that assimilating radar data every 30 seconds can significantly reduce non-Gaussianity and improve the analysis accuracy of vertical velocity. These findings provide valuable insights for future research on radar data assimilation. Overall, the paper addresses a meaningful topic, features a well-designed experiment, and presents reliable results. It is recommended for publication. I suggest a minor revision.
> The other reviewer, Dr. Wei Han, has already pointed out some details regarding the figures and several important general opinions, with which I fully agree. Here, I would like to add some of my personal concerns.
We thank Dr. Zheqi Shen very much for the referee comments. We answer each comment in the following.
> Presentation of Assimilation Results: The discussion of the assimilation results, such as in Figures 3, 4, 9, and 10, only shows the final assimilation at 00:50:00. While a single assimilation can demonstrate the improvement effects of different schemes, completely ignoring the entire assimilation process seems inappropriate. I suggest using time series of some metric (such as RMSE or spread) or showing errors and spread at several different moments to illustrate how the assimilation gradually takes effect and reaches stability.
We agree on the importance of confirming the evolution of the metrics over the data assimilation cycle. However, in this experiment, the deep convection develops rapidly, which makes the concept of stability difficult to apply, because the number of assimilated observations, the maximum reflectivity, and the area of high reflectivity all evolve with time. The figure in the attached file "supplement_figureA_AC3.pdf" shows the evolution of the domain-averaged RMSE and spread in reflectivity, the number of assimilated observations, and the maximum analysis vertical velocity in the 5MIN-3D case. The number of assimilated observations peaks at 20 minutes, as 'no precipitation' observations are assimilated to remove artificial convection triggered at random locations by the initial random perturbation in some members. As the area of high reflectivity expands, the number of observations increases again after about 50 minutes, and the RMSE and spread keep increasing. The maximum analysis-mean vertical velocity reaches its peak value of around 40 m/s at 50 minutes. Therefore, we focus on the analysis at this time, considering that the data assimilation has run long enough for the effect of the initial adjustment to be negligible, although the metrics do not show convergence.
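The exact definitions behind the domain-averaged RMSE and spread in the supplementary figure are not given in this thread. A minimal sketch of the conventional definitions, assuming the RMSE is computed for the ensemble mean against the nature run and the spread is the root of the domain-mean ensemble variance, could look as follows. The function name `rmse_and_spread` and the array layout are hypothetical.

```python
import numpy as np

def rmse_and_spread(ensemble, truth):
    """ensemble: array of shape (n_members, n_points); truth: (n_points,).
    Returns the domain-averaged RMSE of the ensemble mean and the
    domain-averaged ensemble spread (assumed conventional definitions)."""
    ensemble = np.asarray(ensemble, dtype=float)
    truth = np.asarray(truth, dtype=float)
    mean = ensemble.mean(axis=0)
    rmse = np.sqrt(np.mean((mean - truth) ** 2))
    spread = np.sqrt(np.mean(ensemble.var(axis=0, ddof=1)))
    return float(rmse), float(spread)

# Toy example: two members, two grid points
rmse, spread = rmse_and_spread([[1.0, 2.0], [3.0, 4.0]], [2.0, 2.0])
```

Evaluating such a function at every analysis time gives the kind of time series the referee asked for; comparable RMSE and spread would suggest a reasonably calibrated ensemble.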
> Terminology in EnKF Context: I feel that the term "first-guess" is more commonly used in variational assimilation. In the context of EnKF assimilation, "prior" might be more suitable. This is just a personal suggestion.
We understand that "prior" is more common in some groups in the EnKF community, while other papers use "first-guess". As this study addresses the issue of non-Gaussianity, which was discussed in earlier studies such as Ruiz et al. 2021 that used "first-guess", we have decided to use "first-guess" in this paper for consistency.
> Temporal Localization (Line 129): I am not quite familiar with the term "temporal localization." Does it equate to the description of using different weights for observations at different times? The 5MIN-4D scheme not only uses ten times the amount of data compared to 5MIN-3D but also assigns all data from different times to the 5th minute without increasing the standard deviation of observational errors. I think the current description is not detailed enough and should be improved for the 5MIN-4D scheme.
The idea of temporal localization is to impose a weighting factor that is a function of the time difference within the assimilation window, effectively changing the relative observation error. As you pointed out, in the 5MIN-4D case, when temporal localization is not used, all the observations are assimilated with the same prescribed observation error standard deviation. We will revise the paragraph at lines 124-129 to add a clearer and more detailed description, taking into account your comment and that of the other referee.
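The weighting described above can be sketched as follows. This is an illustration only: the Gaussian form of the weight, the 120-second time scale, and the observation error value are assumptions, not settings taken from the paper. It uses the common convention that dividing the observation error variance by a weight between 0 and 1 inflates the error standard deviation by the inverse square root of that weight.

```python
import numpy as np

def temporal_weight(dt_seconds, time_scale_seconds):
    """Hypothetical Gaussian weight in (0, 1] that decays with the time
    difference between an observation and the analysis time."""
    dt = np.asarray(dt_seconds, dtype=float)
    return np.exp(-0.5 * (dt / time_scale_seconds) ** 2)

def effective_obs_error_std(sigma_o, weight):
    """Dividing the error variance by the weight inflates the standard
    deviation by 1 / sqrt(weight)."""
    return sigma_o / np.sqrt(weight)

# Ten 30-second observation slots in a 5-minute window,
# measured as time differences from the analysis time.
dt = np.arange(0, 300, 30)
sigma_o = 5.0  # illustrative value only

with_localization = effective_obs_error_std(sigma_o, temporal_weight(dt, 120.0))
without_localization = effective_obs_error_std(sigma_o, np.ones(dt.shape))
```

With the weight switched off, as in the 5MIN-4D case, every slot keeps the same prescribed error standard deviation; with it switched on, observations farther from the analysis time are given progressively less influence.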
> Introduction to LETKF: Although LETKF is a very well-known method, I believe it is necessary to briefly introduce LETKF in the section on the assimilation system, especially how the Gaussian assumption is embedded in its algorithm.
We agree with your suggestion. We will add the paragraph to introduce LETKF after the first paragraph of Section 2.3. As it contains many equations, it is attached to this comment as "supplement_text_AC3.pdf".
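The attached supplement is not reproduced in this thread. As a rough illustration of where the Gaussian assumption enters, the sketch below implements a textbook ensemble transform Kalman filter update without localization, assuming a linear observation operator; LETKF applies an update of this kind independently at each grid point using nearby observations. It is not the SCALE-LETKF code, and the function name `etkf_analysis` and its arguments are hypothetical. The update uses only the ensemble mean and covariance, which is optimal only when the errors are Gaussian.

```python
import numpy as np

def etkf_analysis(Xb, yo, H, R):
    """Textbook ETKF update (no localization, no inflation).
    Xb: (n_state, k) background ensemble, yo: (p,) observations,
    H: (p, n_state) linear observation operator, R: (p, p) obs error cov."""
    k = Xb.shape[1]
    xb_mean = Xb.mean(axis=1, keepdims=True)
    Xp = Xb - xb_mean                        # background perturbations
    Yb = H @ Xb
    yb_mean = Yb.mean(axis=1, keepdims=True)
    Yp = Yb - yb_mean                        # perturbations in obs space
    C = Yp.T @ np.linalg.inv(R)              # (k, p)
    A = (k - 1) * np.eye(k) + C @ Yp         # inverse analysis cov in ensemble space
    evals, evecs = np.linalg.eigh(A)
    Pa_tilde = evecs @ np.diag(1.0 / evals) @ evecs.T
    W = evecs @ np.diag(np.sqrt((k - 1) / evals)) @ evecs.T  # symmetric square root
    w_mean = Pa_tilde @ C @ (np.asarray(yo, dtype=float).reshape(-1, 1) - yb_mean)
    return xb_mean + Xp @ (w_mean + W)       # (n_state, k) analysis ensemble

# Scalar check against the Kalman filter formulas with sample statistics
Xa = etkf_analysis(np.array([[0.0, 1.0, 2.0, 3.0]]),
                   np.array([3.0]), np.array([[1.0]]), np.array([[1.0]]))
```

In the scalar check, the background sample variance is 5/3 and the observation error variance is 1, so the Kalman gain is 5/8; the analysis mean and variance of `Xa` agree with the Kalman filter values computed from those sample statistics.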
> Inflation Setting (Line 75): I agree with the no-inflation setting, as inflation under different assimilation frequencies can significantly affect the spread. I think a discussion on the possible impact of inflation on the conclusions in practical scenarios could be added to the conclusion section.
We will add a discussion in Section 5.
> Figure Presentation: Figures 4, 9, and 10 contain very rich information, requiring repeated reading between the text and the figures. I suggest adding the names of the experiments to the titles of subplots (a), (b), and (c) to facilitate reading. Moreover, the contour lines in Figures 9 and 10 are too thin and light in color, making them hard to see. They need to be improved. The shading information also needs to be displayed in softer colors or with increased transparency.
We have revised Figures 4, 9, and 10, using a lighter colormap for the shading. Please see the attached file "supplement_figureB_AC3.pdf" and confirm whether the revised figures look clear.
> Clarification on Perturbations (Line 210): The description of perturbations is not clear. It appears that there are 10 background wind profiles (or background thermal profiles) perturbed, resulting in 100 members. This seems to be combined with the initial perturbation scheme in Section 2.4. What is unclear to me is whether the 10 perturbations are superimposed on the 100 perturbations from Section 2.4, or whether the 10 perturbations from Section 2.4 are combined with these 10. It is necessary to clarify how the 100 members are generated, rather than just stating the conclusion: "Both of those 10 sets include one true profile, indicating only 10 members have the correct background wind or stability profile. The other 90 members are biased and are expected to have significant errors in the evolution at a convective scale."
The 100 perturbations described in Section 2.4 are imposed in the same way. We will add the following sentence in the first paragraph of Section 4.1.
"The random perturbation described in Section 2.4 (Table 1) is imposed on each member in the same way as before."
Model code and software
SCALE-LETKF Arata Amemiya et al. https://doi.org/10.5281/zenodo.13906038
SCALE Team SCALE https://scale.riken.jp/