the Creative Commons Attribution 4.0 License.
Multigrid Beta Filter for Faster Computation of Ensemble Covariance Localization
Abstract. This study applies a multigrid beta filter (MGBF) for covariance localization in ensemble-variational (EnVar) data assimilation instead of the conventional recursive filter (RF) to achieve faster computation in a large number of processors. The parallelization efficiency of the MGBF is higher than that of the RF because all-to-all communication to change the computational region of each processor is not necessary. However, the MGBF-based localization additionally requires horizontal variable exchange between processors; its computational cost is proportional to the number of grid points and to the ensemble size, and is generally more expensive than the RF. In this study, we implement the MGBF-based localization both for the single-scale localization and for the scale-dependent localization in the regional atmospheric EnVar data assimilation system. In addition, we clarify that applying a coarser filter grid and omitting filtering except for the coarsest resolution make the computation of the MGBF-based localization several times faster than that of the RF-based one without significantly changing the EnVar analysis.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-1866', Anonymous Referee #1, 09 Jun 2025
Major comments:
In the original MGBF design (Purser et al., 2022), the filter is applied hierarchically across multiple resolutions (g₁, g₂, …, gₙ), with each level contributing to the final covariance operator. This multiscale construction is central to MGBF's ability to approximate broad localization functions and capture anisotropic or spatially inhomogeneous structures. The process involves adjoint and direct filtering at each grid level (see Eq. 18 and Purser et al., MWR 2022, p. 722), and the results are additively combined (Eqs. 16–17), ensuring smoothness, self-adjointness, and scalability.
In contrast, the present manuscript adopts a significant simplification: filtering is applied only at the coarsest filter grid, with no filtering at finer levels. This is a clear deviation from the original formulation, and although the authors mention it is for computational efficiency (Lines 99 and 306), the implications of this choice are not adequately discussed. Specifically, the manuscript should examine:
- How this approximation affects the effective shape of the localization function, especially for short localization length scales (e.g., 20 km);
- Whether it risks degraded performance (e.g., loss of sharpness or spurious correlations) in such cases;
- Whether the approximation is acceptable only in certain regimes, such as large-scale SDL with long localization radii, or whether it generalizes more broadly.
Clarifying these points would help readers understand the trade-offs and limitations of this modified implementation.
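The concern about short length scales can be illustrated with a small 1-D sketch. It is not the manuscript's MGBF code: it assumes a Gaussian localization function in place of the beta filter, linear interpolation from the filter grid to the analysis grid, and a source point located on a filter-grid node. The function names and the parameters `length`, `ratio`, and `n_coarse` are hypothetical. Distances are measured in units of the analysis-grid spacing.

```python
import math


def gaussian(x, length):
    """Target localization function (a Gaussian is assumed for illustration)."""
    return math.exp(-0.5 * (x / length) ** 2)


def coarse_only_response(length, ratio, n_coarse):
    """Response on the fine grid to a source at the central coarse node.

    Weights are defined only on the coarse grid (spacing = ratio fine cells)
    and are linearly interpolated to the fine grid.
    """
    centre = n_coarse // 2
    coarse = [gaussian((b - centre) * ratio, length) for b in range(n_coarse)]
    n_fine = (n_coarse - 1) * ratio + 1
    response = []
    for i in range(n_fine):
        b, r = divmod(i, ratio)
        if r == 0:
            response.append(coarse[b])          # fine point sits on a coarse node
        else:
            w = r / ratio
            response.append((1.0 - w) * coarse[b] + w * coarse[b + 1])
    return response


def max_shape_error(length, ratio, n_coarse=21):
    """Largest departure of the interpolated response from the target Gaussian."""
    centre_fine = (n_coarse // 2) * ratio
    response = coarse_only_response(length, ratio, n_coarse)
    return max(abs(v - gaussian(i - centre_fine, length))
               for i, v in enumerate(response))


if __name__ == "__main__":
    for length in (2.0, 5.0, 20.0):
        print(length, max_shape_error(length, ratio=4))
```

Under these assumptions the shape error grows as the length scale shrinks relative to the filter-grid spacing, which is the regime the questions above ask the authors to examine.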
Minor comments:
Line 55: The current title of Section 2.1, "Ensemble-variational (EnVar) data assimilation", does not reflect the fact that this subsection includes a detailed mathematical formulation of scale-dependent localization (SDL) as applied in the GSI-based 3DEnVar system. In particular, Eqs. (3) and (4) describe the decomposition of ensemble perturbations across multiple spatial scales and the corresponding block-structured localization matrix. Since SDL is a significant methodological feature of the paper, both in terms of formulation and in experimental comparisons (e.g., RFSDL vs. MGBF04SDL), I recommend updating the subsection title to something more precise, such as “2.1 Ensemble-variational (EnVar) data assimilation with scale-dependent localization”.
In Line 105, the manuscript states that interpolations are performed “from g₁ to the analysis grid g₀.” Since g₁ is referred to as the “finest filter grid,” it may be misinterpreted as having equal or higher resolution than g₀. However, based on Table 2, g₁ can in fact be coarser than the analysis grid (e.g., in MGBF03–04). I suggest the authors clarify the resolution relationship between g₀ and g₁ to avoid potential confusion.
In Line 137, the authors mention that the analysis grid resolution is twice as coarse as the FV3LAM model grid (i.e., 6 km vs. 3 km), but do not provide any justification or discussion of this design choice. Since this resolution difference could affect the representativeness or accuracy of the ensemble background error representation, localization, and filter application (especially given the role of multigrid interpolation in MGBF), it would be helpful if the authors could clarify:
- The rationale for using a coarser analysis grid (e.g., computational efficiency, memory constraints, etc.),
- Whether this design introduces any limitations or trade-offs in terms of representativeness or localization sharpness,
- And whether the MGBF design is sensitive to the resolution mismatch between the filter grid and the model grid.
Table 2: The symbol “–” appears in several columns (e.g., "Number of the finest filter grids", "Weight of (g₁, g₂, g₃, g₄)", filter specifications), but its exact meaning is not defined. It is unclear whether “–” indicates “not applicable,” “not used,” “same as previous case,” or “no filtering applied.” To improve clarity and reproducibility, I suggest the authors include a footnote or caption line in Table 2 to explicitly define what “–” represents in each context.
Lines 248–250 and elsewhere: The sentence beginning with “Nevertheless, the difference from RF…” is grammatically correct, but a bit hard to follow due to its length and repeated comparative structure. With multiple experiments and color-coded references mentioned together, the logical comparison becomes difficult to parse.
I suggest breaking it into two simpler sentences or rephrasing it for clarity. For example:
“MGBF04σ showed a smaller deviation from RF than MGBF04. Similarly, MGBF04σSDL was closer to RFSDL than MGBF04SDL.”
In fact, similar long and repetitive sentence constructions appear in several other places in the manuscript. I recommend that the authors go through the manuscript to revise such sentences for improved readability and flow.
Figure 9: there seems to be a mismatch between the panel labels and their descriptions in the caption. Based on the plotted content, panels (a) and (b) appear to show RMSE and bias for temperature, while (c) and (d) show RMSE and bias for horizontal wind. However, the caption currently states that (a, c) are temperature and (b, d) are wind, which appears to be incorrect.
Lines 305-306: The sentence “… and showed how to prevent the computational problem found in applying it” reads a bit awkwardly. The phrase “prevent the computational problem” is not the best fit here, since the issue already occurred during implementation.
Citation: https://doi.org/10.5194/egusphere-2025-1866-RC1
AC4: 'Reply on RC1', Sho Yokota, 26 Jun 2025
RC2: 'Comment on egusphere-2025-1866', Benjamin Ménétrier, 10 Jun 2025
Major comments:
As mentioned by referee #1 in his/her major comment, this paper is a restrictive application of the full MGBF method. With only one filtering level left, equation 10 seems similar to the NICAS method I developed independently in 2020 (https://doi.org/10.5281/zenodo.4058620, and mentioned in equation 10 of https://doi.org/10.5194/gmd-15-7859-2022). However, the NICAS method can use adaptive unstructured subgrids, handle complex boundaries, and produce inhomogeneous and anisotropic localization functions.
This kind of explicit convolution method on coarse subgrids is computationally efficient when the localization length-scale is large compared to the analysis grid cell size, since the subgrid can be coarse. However, I agree with referee #1 that it can become very expensive for smaller localization length-scales, because in this case a fine subgrid must be kept to maintain the localization function sharpness.
Another issue properly handled in the NICAS method and missing here is the localization normalization (i.e. diagonal coefficients of the localization matrix should all be equal to one). Figure 4 suggests that the MGBF method is perfectly normalized with all curves going to 1 at zero separation. However, I believe this is true only if the observation is located on a coarse grid node. Indeed, even if the continuous function B_p(x) is normalized (as mentioned after equation 11), the discrete low-resolution filters F_{BF} might not be, and even if they were, the final interpolation to the analysis grid would break this normalization. Only an outer diagonal scaling matrix taking all the operators (filters and interpolations) into account can ensure a proper normalization.
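The normalization argument can be checked numerically with a small 1-D sketch. It is an illustration only, not the MGBF or NICAS implementation: it assumes the localization takes the form L = D F Dᵀ, with F a Gaussian correlation matrix on the coarse grid (already with unit diagonal) and D a linear interpolation to the fine grid. All function names and the values of `n_coarse`, `ratio`, and `length` are hypothetical.

```python
import math


def coarse_correlation(n_coarse, spacing, length):
    """Gaussian correlation matrix on the coarse grid; its diagonal is exactly 1."""
    return [[math.exp(-0.5 * ((a - b) * spacing / length) ** 2)
             for b in range(n_coarse)] for a in range(n_coarse)]


def interpolation_rows(n_coarse, ratio):
    """Rows of the linear interpolation D (coarse to fine), stored sparsely."""
    n_fine = (n_coarse - 1) * ratio + 1
    rows = []
    for i in range(n_fine):
        b, r = divmod(i, ratio)
        if r == 0:
            rows.append({b: 1.0})               # fine point on a coarse node
        else:
            w = r / ratio
            rows.append({b: 1.0 - w, b + 1: w})
    return rows


def localization_diagonal(rows, corr):
    """Diagonal of L = D F D^T, i.e. the implied variance at each fine point."""
    return [sum(wa * wb * corr[a][b]
                for a, wa in row.items() for b, wb in row.items())
            for row in rows]


def scaling_factors(diagonal):
    """Entries of the outer diagonal matrix N such that N L N has unit diagonal."""
    return [1.0 / math.sqrt(d) for d in diagonal]


if __name__ == "__main__":
    ratio = 4
    corr = coarse_correlation(n_coarse=6, spacing=ratio, length=6.0)
    diag = localization_diagonal(interpolation_rows(6, ratio), corr)
    print([round(d, 3) for d in diag[:ratio + 1]])
```

In this sketch the diagonal equals one at coarse nodes and falls below one between them, for example to 0.5 + 0.5 F₀₁ at the midpoint of two nodes. Multiplying L on both sides by the factors from `scaling_factors` restores a unit diagonal, which corresponds to the outer diagonal scaling described above.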
Minor comments:
In section 2.1, equation (2) is already an approximation of the general 3DEnVar formulation. Indeed, the authors are using the same 3D localization matrix for all the auto- and cross-localization blocks between different analysis variables. This method is sometimes referred to as "Mark Buehner's trick" (used in https://doi.org/10.1175/2009MWR3157.1 and clearly described in section 3.4.2. of https://doi.org/10.1002/qj.2325). It assumes that all the analysis variables have roughly the same error correlation length-scale. Whether this assumption holds here or not, I think it should be mentioned.
In equation (10) of section 2.3, the rightmost interpolation operator (D from g1 to gt) is actually not required if only one grid and one scale are used, as DD^T = I. If several grids are needed (e.g. g2 and g4 as in experiment MGBF03SDL), this interpolation operator is required to combine the scales with operator E, but the destination grid should be the finest grid used (here g2), not necessarily g1.
Finally, I think that the experiments with slightly reduced length-scales (with a sigma suffix) are not really necessary. As shown in https://doi.org/10.1175/MWR-D-22-0255.1, the analysis quality is not very sensitive to the localization length-scale, as long as this length-scale is good enough. Given all the other uncertainties about the localization function shape and the fact that it should actually be anisotropic and inhomogeneous, the optimization of the localization length-scale does not seem really relevant here. Removing it (or better keeping it and removing the non-sigma case) would make the article a bit lighter and easier to read.
Citation: https://doi.org/10.5194/egusphere-2025-1866-RC2
AC5: 'Reply on RC2', Sho Yokota, 26 Jun 2025
CEC1: 'Comment on egusphere-2025-1866 - No compliance with the policy of the journal', Juan Antonio Añel, 20 Jun 2025
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your code on GitHub. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Therefore, the current situation with your manuscript is irregular. Please publish your code in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy.
Also, you have stored the observational data in a service from a commercial provider. We cannot accept this. You must store the data in one of the repositories listed in our policy, and share here the link and permanent identifier (e.g. DOI) for it.
I must note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Also, you must include a modified 'Code and Data Availability' section in a potentially reviewed manuscript, containing the links and permanent identifiers of the new repositories.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-1866-CEC1
AC1: 'Reply on CEC1', Sho Yokota, 24 Jun 2025
Dear Prof. Añel,
Thank you for confirming our manuscript.
Just before the submission, we copied our code archived on Github to Zenodo. The DOI is https://doi.org/10.5281/zenodo.15193112. I'm sorry that the DOI was not in the 'Code and Data Availability' section.
The observational data (and also initial and lateral boundary data) are stored in NOAA's High Performance Storage System (HPSS) archives. Based on your comment and the paper published in the past ( https://gmd.copernicus.org/articles/15/6891/2022/#section6 ), it is probably suitable to write the HPSS archives in the 'Code and Data Availability' section.
Considering these, we will revise the 'Code and Data Availability' section in the manuscript as follows.
"ICs, LBCs, and observation data used in this study are obtained from NOAA's High Performance Storage System (HPSS) archives. The RRFS system used in this study is obtained from https://doi.org/10.5281/zenodo.15193112."
If further revision is needed, could you let us know?
Best regards,
Sho Yokota
Citation: https://doi.org/10.5194/egusphere-2025-1866-AC1
CEC2: 'Reply on AC1', Juan Antonio Añel, 24 Jun 2025
Dear authors,
We could accept that you do not store the data in a valid repository if a justification for it exists. If no valid justification exists, then you must store them correctly. For example, a valid reason would be if the size of the dataset is several TB. Therefore, what is preventing you from sharing the data in one of the repositories listed in our policy?
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-1866-CEC2
AC2: 'Reply on CEC2', Sho Yokota, 24 Jun 2025
Dear Prof. Añel,
Part of the observation data in the HPSS archives is restricted under the policy at https://www.nco.ncep.noaa.gov/pmb/docs/restricted_data/, so we cannot provide a DOI for it. The initial and lateral boundary data in the HPSS archives are much larger than 100 GB. Even in this case, should we provide the DOI only for unrestricted observation data in the HPSS archives?
Sho Yokota
Citation: https://doi.org/10.5194/egusphere-2025-1866-AC2
CEC3: 'Reply on AC2', Juan Antonio Añel, 25 Jun 2025
Dear authors,
For the restricted data, we can grant you an exception to our policy, so that you do not have to share them.
For the unrestricted data, the issue is not that you must provide a DOI, but that NOAA servers are not valid repositories to deposit data for scientific publication. Therefore, you must copy the data to one of the suitable repositories listed in our policy. 100 GB is a perfectly reasonable size for this and does not pose a limitation. For example, you can split your data across several Zenodo repositories, each of which can be up to 50 GB in size.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-1866-CEC3
AC3: 'Reply on CEC3', Sho Yokota, 26 Jun 2025
Dear Prof. Añel,
Thank you for granting an exception to the policy for the restricted observation data. I will upload the initial and lateral boundary data and the unrestricted observation data on Zenodo, and revise the 'Code and Data Availability' section as follows.
"ICs, LBCs, and unrestricted observation data used in this study are obtained from https://doi.org/XX.XXXX/zenodo.XXXXXXXX. The RRFS system used in this study is obtained from https://doi.org/10.5281/zenodo.15193112."
Sho Yokota
Citation: https://doi.org/10.5194/egusphere-2025-1866-AC3
AC6: 'Reply on AC3', Sho Yokota, 30 Jun 2025
Dear Prof. Añel,
We uploaded the initial and lateral boundary data and the unrestricted observation data on Zenodo. We will revise the 'Code and Data Availability' section as follows.
"ICs, LBCs, and unrestricted observation data used in this study are obtained from https://doi.org/10.5281/zenodo.15744386, https://doi.org/10.5281/zenodo.15747449, and https://doi.org/10.5281/zenodo.15747476. The RRFS system used in this study is obtained from https://doi.org/10.5281/zenodo.15193112."
Sho Yokota
Citation: https://doi.org/10.5194/egusphere-2025-1866-AC6
CEC4: 'Reply on AC6', Juan Antonio Añel, 30 Jun 2025
Dear authors,
Many thanks. We can consider the current version of your manuscript in compliance with the code and data policy of the journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2025-1866-CEC4
Data sets
NOAA Rapid Refresh (RAP), NOAA, https://registry.opendata.aws/noaa-rap/
Model code and software
Rapid Refresh Forecast System (RRFS), Sho Yokota, https://doi.org/10.5281/zenodo.15193112
Viewed
Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.
- HTML: 727
- PDF: 0
- XML: 12
- Total: 739
- BibTeX: 0
- EndNote: 0