Multigrid Beta Filter for Faster Computation of Ensemble Covariance Localization

Yokota, Sho; Rancic, Miodrag; Lei, Ting; Purser, R. James; De Pondeca, Manuel S. F. V.

doi:10.22541/essoar.172499883.39847608/v2

Preprints

12 May 2025

Abstract. This study applies a multigrid beta filter (MGBF) for covariance localization in ensemble-variational (EnVar) data assimilation instead of the conventional recursive filter (RF) to achieve faster computation on a large number of processors. The parallelization efficiency of the MGBF is higher than that of the RF because all-to-all communication to change the computational region of each processor is not necessary. However, the MGBF-based localization additionally requires horizontal variable exchange between processors; its computational cost is proportional to the number of grid points and to the ensemble size, and is generally more expensive than the RF. In this study, we implement the MGBF-based localization both for the single-scale localization and for the scale-dependent localization in the regional atmospheric EnVar data assimilation system. In addition, we clarify that applying a coarser filter grid and omitting filtering except for the coarsest resolution make the computation of the MGBF-based localization several times faster than that of the RF-based one without significantly changing the EnVar analysis.

Received: 18 Apr 2025 – Discussion started: 12 May 2025

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

27 Oct 2025

Multigrid beta filter for faster computation of ensemble covariance localization

Sho Yokota, Miodrag Rancic, Ting Lei, R. James Purser, and Manuel S. F. V. De Pondeca

Geosci. Model Dev., 18, 7815–7829, https://doi.org/10.5194/gmd-18-7815-2025, 2025

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1866', Anonymous Referee #1, 09 Jun 2025
Major comments:
      In the original MGBF design (Purser et al., 2022), the filter is applied hierarchically across multiple resolutions (g₁, g₂, …, gₙ), with each level contributing to the final covariance operator. This multiscale construction is central to MGBF's ability to approximate broad localization functions and capture anisotropic or spatially inhomogeneous structures. The process involves adjoint and direct filtering at each grid level (see Eq. 18 and Purser et al., MWR 2022, p. 722), and the results are additively combined (Eqs. 16–17), ensuring smoothness, self-adjointness, and scalability.
    In contrast, the present manuscript adopts a significant simplification: filtering is applied only at the coarsest filter grid, with no filtering at finer levels. This is a clear deviation from the original formulation, and although the authors mention it is for computational efficiency (Lines 99 and 306), the implications of this choice are not adequately discussed. Specifically, the manuscript should examine:
How this approximation affects the effective shape of the localization function, especially for short localization length scales (e.g., 20 km);

Whether it risks degraded performance (e.g., loss of sharpness or spurious correlations) in such cases;

Whether the approximation is acceptable only in certain regimes, such as large-scale SDL with long localization radii, or whether it generalizes more broadly.

Clarifying these points would help readers understand the trade-offs and limitations of this modified implementation.
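
As an editorial illustration of the shape question raised above, the following is a minimal 1-D sketch, not the manuscript's MGBF. It assumes a Gaussian stand-in for the localization function, a hypothetical coarse filter-grid spacing of 48 km, and linear interpolation to a 6 km analysis grid. It compares the response to an impulse on a coarse node, sampled on the coarse grid and then interpolated, with the exact function, for a short (20 km) and a long (200 km) length scale.

```python
import math


def gaussian(d_km, sigma_km):
    """Gaussian stand-in for the localization function (an assumption)."""
    return math.exp(-0.5 * (d_km / sigma_km) ** 2)


def coarse_then_interp(d_km, sigma_km, h_km):
    """Response at distance d from an impulse on a coarse node when the
    function is only known on coarse nodes (spacing h) and is linearly
    interpolated in between."""
    left = math.floor(d_km / h_km)
    w = d_km / h_km - left
    return ((1.0 - w) * gaussian(left * h_km, sigma_km)
            + w * gaussian((left + 1) * h_km, sigma_km))


def max_shape_error(sigma_km, h_km, dx_km=6.0, n_points=200):
    """Largest absolute difference from the exact function on the fine grid."""
    return max(
        abs(coarse_then_interp(i * dx_km, sigma_km, h_km)
            - gaussian(i * dx_km, sigma_km))
        for i in range(n_points)
    )


# Hypothetical values: 48 km coarse spacing, 20 km and 200 km length scales.
err_short = max_shape_error(sigma_km=20.0, h_km=48.0)
err_long = max_shape_error(sigma_km=200.0, h_km=48.0)
print(f"max shape error, 20 km scale:  {err_short:.3f}")
print(f"max shape error, 200 km scale: {err_long:.3f}")
```

In this toy setting the interpolated function departs from the exact one far more for the short length scale than for the long one, which is the regime dependence the referee asks the authors to discuss.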

Minor comments:

Line 55, The current title of Section 2.1, "Ensemble-variational (EnVar) data assimilation", does not reflect the fact that this subsection includes a detailed mathematical formulation of scale-dependent localization (SDL) as applied in the GSI-based 3DEnVar system. In particular, Eqs. (3) and (4) describe the decomposition of ensemble perturbations across multiple spatial scales and the corresponding block-structured localization matrix.
      Since SDL is a significant methodological feature of the paper, both in terms of formulation and in experimental comparisons (e.g., RFSDL vs. MGBF04SDL), I recommend updating the subsection title to something more precise, such as “2.1 Ensemble-variational (EnVar) data assimilation with scale-dependent localization”.
In Line 105, the manuscript states that interpolations are performed “from g₁ to the analysis grid g₀.” Since g₁ is referred to as the “finest filter grid,” it may be misinterpreted as having equal or higher resolution than g₀. However, based on Table 2, g₁ can in fact be coarser than the analysis grid (e.g., in MGBF03–04). I suggest the authors clarify the resolution relationship between g₀ and g₁ to avoid potential confusion.

In Line 137, the authors mention that the analysis grid resolution is twice as coarse as the FV3LAM model grid (i.e., 6 km vs. 3 km), but do not provide any justification or discussion of this design choice. Since this resolution difference could affect the representativeness or accuracy of the ensemble background error representation, localization, and filter application (especially given the role of multigrid interpolation in MGBF), it would be helpful if the authors could clarify:
The rationale for using a coarser analysis grid (e.g., computational efficiency, memory constraints, etc.),

Whether this design introduces any limitations or trade-offs in terms of representativeness or localization sharpness,

And whether the MGBF design is sensitive to the resolution mismatch between the filter grid and the model grid.

Table 2: The symbol “–” appears in several columns (e.g., "Number of the finest filter grids", "Weight of (g₁, g₂, g₃, g₄)", filter specifications), but its exact meaning is not defined. It is unclear whether “–” indicates “not applicable,” “not used,” “same as previous case,” or “no filtering applied.” To improve clarity and reproducibility, I suggest the authors include a footnote or caption line in Table 2 to explicitly define what “–” represents in each context.
Lines 248–250 and elsewhere: The sentence beginning with “Nevertheless, the difference from RF…” is grammatically correct, but a bit hard to follow due to its length and repeated comparative structure. With multiple experiments and color-coded references mentioned together, the logical comparison becomes difficult to parse.
I suggest breaking it into two simpler sentences or rephrasing it for clarity. For example:
“MGBF04σ showed a smaller deviation from RF than MGBF04. Similarly, MGBF04σSDL was closer to RFSDL than MGBF04SDL.”
In fact, similar long and repetitive sentence constructions appear in several other places in the manuscript. I recommend that the authors go through the manuscript to revise such sentences for improved readability and flow.
Figure 9: there seems to be a mismatch between the panel labels and their descriptions in the caption. Based on the plotted content, panels (a) and (b) appear to show RMSE and bias for temperature, while (c) and (d) show RMSE and bias for horizontal wind. However, the caption currently states that (a, c) are temperature and (b, d) are wind, which appears to be incorrect.
Lines 305-306: The sentence “… and showed how to prevent the computational problem found in applying it” reads a bit awkwardly. The phrase “prevent the computational problem” is not the best fit here, since the issue already occurred during implementation.
Citation: https://doi.org/10.5194/egusphere-2025-1866-RC1
- AC4: 'Reply on RC1', Sho Yokota, 26 Jun 2025
  
  Dear reviewer #1
  Thank you for carefully reading our manuscript and giving useful comments. The response is attached in the supplement.
  Best regards,
  Sho Yokota
  
  Citation: https://doi.org/10.5194/egusphere-2025-1866-AC4
RC2:
'Comment on egusphere-2025-1866', Benjamin Ménétrier, 10 Jun 2025

Major comments:
As mentioned by referee #1 in his/her major comment, this paper is a restrictive application of the full MGBF method. With only one filtering level left, equation 10 seems similar to the NICAS method I developed independently in 2020 (https://doi.org/10.5281/zenodo.4058620, and mentioned in equation 10 of https://doi.org/10.5194/gmd-15-7859-2022). However, the NICAS method can use adaptive unstructured subgrids, handle complex boundaries, and produce inhomogeneous and anisotropic localization functions.
This kind of explicit convolution method on coarse subgrids is computationally efficient when the localization length-scale is large compared to the analysis grid cell size, since the subgrid can be coarse. However, I agree with referee #1 that it can become very expensive for smaller localization length-scales, because in this case a fine subgrid must be kept to maintain the localization function sharpness.
Another issue properly handled in the NICAS method and missing here is the localization normalization (i.e. diagonal coefficients of the localization matrix should all be equal to one). Figure 4 suggests that the MGBF method is perfectly normalized with all curves going to 1 at zero separation. However, I believe this is true only if the observation is located on a coarse grid node. Indeed, even if the continuous function B_p(x) is normalized (as mentioned after equation 11), the discrete low-resolution filters F_{BF} might not be, and even if they were, the final interpolation to the analysis grid would break this normalization. Only an outer diagonal scaling matrix taking all the operators (filters and interpolations) into account can ensure a proper normalization.
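
The normalization argument above can be illustrated with a minimal 1-D sketch. This is an editorial illustration under stated assumptions, not the manuscript's or NICAS's implementation: the coarse-grid operator C is taken to be an exactly normalized Gaussian correlation matrix, P is linear interpolation from coarse nodes to an analysis point, and the 48 km spacing and 100 km length scale are hypothetical. The diagonal entry of P C Pᵀ is then 1 on a coarse node but below 1 between nodes, and an outer diagonal scaling restores it.

```python
import math


def corr(d_km, sigma_km):
    """Normalized Gaussian correlation on the coarse grid (an assumption)."""
    return math.exp(-0.5 * (d_km / sigma_km) ** 2)


def interp_weights(x_km, h_km, n_coarse):
    """Row of the linear interpolation operator P for a point at x."""
    j = min(int(x_km // h_km), n_coarse - 2)
    w = x_km / h_km - j
    row = [0.0] * n_coarse
    row[j] = 1.0 - w
    row[j + 1] = w
    return row


def implied_variance(x_km, h_km, sigma_km, n_coarse=8):
    """Diagonal entry (P C P^T)_xx of the implied localization matrix."""
    p = interp_weights(x_km, h_km, n_coarse)
    return sum(
        p[a] * p[b] * corr((a - b) * h_km, sigma_km)
        for a in range(n_coarse)
        for b in range(n_coarse)
    )


h_km, sigma_km = 48.0, 100.0  # hypothetical spacing and length scale
on_node = implied_variance(96.0, h_km, sigma_km)    # exactly on a coarse node
mid_cell = implied_variance(120.0, h_km, sigma_km)  # halfway between nodes

# Outer diagonal scaling N = diag(P C P^T)^(-1/2), applied as N (P C P^T) N.
scale = 1.0 / math.sqrt(mid_cell)
renormalized = scale * mid_cell * scale

print(f"on node:      {on_node:.4f}")
print(f"mid cell:     {mid_cell:.4f}")
print(f"renormalized: {renormalized:.4f}")
```

The scaling factor depends on the position of the analysis point relative to the coarse nodes, which is why it has to account for the interpolation as well as the filter.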
Minor comments:
In section 2.1, equation (2) is already an approximation of the general 3DEnVar formulation. Indeed, the authors are using the same 3D localization matrix for all the auto- and cross-localization blocks between different analysis variables. This method is sometimes referred to as "Mark Buehner's trick" (used in https://doi.org/10.1175/2009MWR3157.1 and clearly described in section 3.4.2. of https://doi.org/10.1002/qj.2325). It assumes that all the analysis variables have roughly the same error correlation length-scale. Whether this assumption holds here or not, I think it should be mentioned.
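
The shared-localization assumption described above can be written compactly. The notation here is introduced for illustration only and is not taken from the manuscript: n_v is the number of analysis variables, L is the single 3D localization matrix on the analysis grid, L_{ij} is the localization block between variables i and j, and 1 is a vector of ones of length n_v.

```latex
\mathbf{L}_{\mathrm{full}}
= \begin{pmatrix}
\mathbf{L}_{11} & \cdots & \mathbf{L}_{1 n_v} \\
\vdots & \ddots & \vdots \\
\mathbf{L}_{n_v 1} & \cdots & \mathbf{L}_{n_v n_v}
\end{pmatrix}
\;\approx\;
\left( \mathbf{1}\mathbf{1}^{\mathrm{T}} \right) \otimes \mathbf{L},
\qquad \text{i.e. } \mathbf{L}_{ij} = \mathbf{L} \ \text{for all } i, j .
```

Under this form every auto- and cross-variable block uses the same length scale, which is the assumption the referee asks the authors to state.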
In equation (10) of section 2.3, the rightmost interpolation operator (D from g1 to gt) is actually not required if only one grid and one scale are used, as DD^T = I. If several grids are needed (e.g. g2 and g4 as in experiment MGBF03SDL), this interpolation operator is required to combine the scales with operator E, but the destination grid should be the finest grid used (here g2), not necessarily g1.
Finally, I think that the experiments with slightly reduced length-scales (with a sigma suffix) are not really necessary. As shown in https://doi.org/10.1175/MWR-D-22-0255.1, the analysis quality is not very sensitive to the localization length-scale, as long as this length-scale is good enough. Given all the other uncertainties about the localization function shape and the fact that it should actually be anisotropic and inhomogeneous, the optimization of the localization length-scale does not seem really relevant here. Removing it (or better keeping it and removing the non-sigma case) would make the article a bit lighter and easier to read.

Citation: https://doi.org/10.5194/egusphere-2025-1866-RC2
- AC5: 'Reply on RC2', Sho Yokota, 26 Jun 2025
  
  Dear Dr. Benjamin Ménétrier,
  Thank you for carefully reading our manuscript and giving useful comments. The response is attached in the supplement.
  Best regards,
  Sho Yokota
  
  Citation: https://doi.org/10.5194/egusphere-2025-1866-AC5
CEC1:
'Comment on egusphere-2025-1866 - No compliance with the policy of the journal', Juan Antonio Añel, 20 Jun 2025

Dear authors,

Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".

https://www.geoscientific-model-development.net/policies/code_and_data_policy.html

You have archived your code on GitHub. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Therefore, the current situation with your manuscript is irregular. Please, publish your code in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible, as we can not accept manuscripts in Discussions that do not comply with our policy.
Also, for the observational data, you have stored it in service from a commercial provider. We can not accept this. You must store the data in one of the repositories listed in our policy, and share here the link and permanent identifier (e.g. DOI) for it.
I must note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Also, you must include a modified 'Code and Data Availability' section in a potentially reviewed manuscript, containing the links and permanent identifiers of the new repositories.
Juan A. Añel

Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2025-1866-CEC1
- AC1:
  'Reply on CEC1', Sho Yokota, 24 Jun 2025
  
  Dear Prof. Añel,
  Thank you for confirming our manuscript.
  Just before the submission, we copied our code archived on Github to Zenodo. The DOI is https://doi.org/10.5281/zenodo.15193112. I'm sorry that the DOI was not in the 'Code and Data Availability' section.
  The observational data (and also initial and lateral boundary data) are stored in NOAA's High Performance Storage System (HPSS) archives. Based on your comment and the paper published in the past ( https://gmd.copernicus.org/articles/15/6891/2022/#section6 ), it is probably suitable to write the HPSS archives in the 'Code and Data Availability' section.
  Considering these, we will revise the 'Code and Data Availability' section in the manuscript as follows.
  "ICs, LBCs, and observation data used in this study are obtained from NOAA's High Performance Storage System (HPSS) archives. The RRFS system used in this study is obtained from https://doi.org/10.5281/zenodo.15193112."
  If we need to revise it in addition, could you let us know?
  Best regards,
  Sho Yokota
  
  Citation: https://doi.org/10.5194/egusphere-2025-1866-AC1
  - CEC2: 'Reply on AC1', Juan Antonio Añel, 24 Jun 2025
    
    Dear authors,
    We could accept that you do not store the data in a valid repository if a justification exists for it. If no valid justification exists, then you must store them correctly. For example, a valid reason would be if the size of the dataset is several TB. Therefore, what is preventing you from sharing the data in one of the repositories listed in our policy?
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2025-1866-CEC2
    
    AC2: 'Reply on CEC2', Sho Yokota, 24 Jun 2025
    
    Dear Prof. Añel,
    Part of the observation data in the HPSS archives is restricted under the policy at https://www.nco.ncep.noaa.gov/pmb/docs/restricted_data/, so we cannot provide a DOI for it. The initial and lateral boundary data in the HPSS archives are much larger than 100 GB. Even in this case, should we provide a DOI only for the unrestricted observation data in the HPSS archives?
    Sho Yokota
    
    Citation: https://doi.org/10.5194/egusphere-2025-1866-AC2
    
    CEC3: 'Reply on AC2', Juan Antonio Añel, 25 Jun 2025
    
    Dear authors,
    For the restricted data, we can grant you an exception to our policy, so that you do not have to share them.
    For the unrestricted data, the issue is not that you must provide a DOI, but that NOAA servers are not valid repositories for depositing data for scientific publication. Therefore, you must copy the data to one of the suitable repositories listed in our policy. 100 GB is a perfectly reasonable size for this and does not pose a limitation. For example, you can split your data across several Zenodo repositories, each of which can be up to 50 GB in size.
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2025-1866-CEC3
    
    AC3: 'Reply on CEC3', Sho Yokota, 26 Jun 2025
    
    Dear Prof. Añel,
    Thank you for granting an exception to the policy for the restricted observation data. I will upload the initial and lateral boundary data and the unrestricted observation data on Zenodo, and revise the 'Code and Data Availability' section as follows.
    "ICs, LBCs, and unrestricted observation data used in this study are obtained from https://doi.org/XX.XXXX/zenodo.XXXXXXXX. The RRFS system used in this study is obtained from https://doi.org/10.5281/zenodo.15193112."
    Sho Yokota
    
    Citation: https://doi.org/10.5194/egusphere-2025-1866-AC3
    
    AC6: 'Reply on AC3', Sho Yokota, 30 Jun 2025
    
    Dear Prof. Añel,
    We uploaded the initial and lateral boundary data and the unrestricted observation data on Zenodo. We will revise the 'Code and Data Availability' section as follows.
    "ICs, LBCs, and unrestricted observation data used in this study are obtained from https://doi.org/10.5281/zenodo.15744386, https://doi.org/10.5281/zenodo.15747449, and https://doi.org/10.5281/zenodo.15747476. The RRFS system used in this study is obtained from https://doi.org/10.5281/zenodo.15193112."
    Sho Yokota
    
    Citation: https://doi.org/10.5194/egusphere-2025-1866-AC6
    
    CEC4: 'Reply on AC6', Juan Antonio Añel, 30 Jun 2025
    
    Dear authors,
    Many thanks. We can consider the current version of your manuscript in compliance with the code and data policy of the journal.
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2025-1866-CEC4

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-1866', Anonymous Referee #1, 09 Jun 2025
Major comments:
      In the original MGBF design (Purser et al., 2022), the filter is applied hierarchically across multiple resolutions (g₁, g₂, …, gₙ), with each level contributing to the final covariance operator. This multiscale construction is central to MGBF's ability to approximate broad localization functions and capture anisotropic or spatially inhomogeneous structures. The process involves adjoint and direct filtering at each grid level (see Eq. 18 and Purser et al., MWR 2022, p. 722), and the results are additively combined (Eqs. 16–17), ensuring smoothness, self-adjointness, and scalability.
    In contrast, the present manuscript adopts a significant simplification: filtering is applied only at the coarsest filter grid, with no filtering at finer levels. This is a clear deviation from the original formulation, and although the authors mention it is for computational efficiency (Lines 99 and 306), the implications of this choice are not adequately discussed. Specifically, the manuscript should examine:
How this approximation affects the effective shape of the localization function, especially for short localization length scales (e.g., 20 km);

Whether it risks degraded performance (e.g., loss of sharpness or spurious correlations) in such cases;

Whether the approximation is acceptable only in certain regimes, such as large-scale SDL with long localization radii, or whether it generalizes more broadly.

Clarifying these points would help readers understand the trade-offs and limitations of this modified implementation.

Minor comments:

Line 55, The current title of Section 2.1, "Ensemble-variational (EnVar) data assimilation", does not reflect the fact that this subsection includes a detailed mathematical formulation of scale-dependent localization (SDL) as applied in the GSI-based 3DEnVar system. In particular, Eqs. (3) and (4) describe the decomposition of ensemble perturbations across multiple spatial scales and the corresponding block-structured localization matrix.
      Since SDL is a significant methodological feature of the paper, both in terms of formulation and in experimental comparisons (e.g., RFSDL vs. MGBF04SDL), I recommend updating the subsection title to something more precise, such as “2.1 Ensemble-variational (EnVar) data assimilation with scale-dependent localization”.
In Line 105, the manuscript states that interpolations are performed “from g₁ to the analysis grid g₀.” Since g₁ is referred to as the “finest filter grid,” it may be misinterpreted as having equal or higher resolution than g₀. However, based on Table 2, g₁ can in fact be coarser than the analysis grid (e.g., in MGBF03–04). I suggest the authors clarify the resolution relationship between g₀ and g₁ to avoid potential confusion.

In Line 137, the authors mention that the analysis grid resolution is twice as coarse as the FV3LAM model grid (i.e., 6 km vs. 3 km), but do not provide any justification or discussion of this design choice. Since this resolution difference could affect the representativeness or accuracy of of the ensemble background error representation, localization, and filter application (especially given the role of multigrid interpolation in MGBF), it would be helpful if the authors could clarify:
The rationale for using a coarser analysis grid (e.g., computational efficiency, memory constraints, etc.),

Whether this design introduces any limitations or trade-offs in terms of representativeness or localization sharpness,

And whether the MGBF design is sensitive to the resolution mismatch between the filter grid and the model grid.

Table 2: The symbol “–” appears in several columns (e.g., "Number of the finest filter grids", "Weight of (g₁, g₂, g₃, g₄)", filter specifications), but its exact meaning is not defined. It is unclear whether “–” indicates “not applicable,” “not used,” “same as previous case,” or “no filtering applied.” To improve clarity and reproducibility, I suggest the authors include a footnote or caption line in Table 2 to explicitly define what “–” represents in each context.
Lines 248–250 and elsewhere：The sentence beginning with “Nevertheless, the difference from RF…” is grammatically correct, but a bit hard to follow due to its length and repeated comparative structure. With multiple experiments and color-coded references mentioned together, the logical comparison becomes difficult to parse.
I suggest breaking it into two simpler sentences or rephrasing it for clarity. For example:
“MGBF04σ showed a smaller deviation from RF than MGBF04. Similarly, MGBF04σSDL was closer to RFSDL than MGBF04SDL.”
In fact, similar long and repetitive sentence constructions appear in several other places in the manuscript. I recommend that the authors go through the manuscript to revise such sentences for improved readability and flow.
Figure 9: there seems to be a mismatch between the panel labels and their descriptions in the caption. Based on the plotted content, panels (a) and (b) appear to show RMSE and bias for temperature, while (c) and (d) show RMSE and bias for horizontal wind. However, the caption currently states that (a, c) are temperature and (b, d) are wind, which appears to be incorrect.
Lines 305-306: The sentence “… and showed how to prevent the computational problem found in applying it” reads a bit awkwardly. The phrase “prevent the computational problem” is not the best fit here, since the issue already occurred during implementation.
Citation: https://doi.org/10.5194/egusphere-2025-1866-RC1
- AC4: 'Reply on RC1', Sho Yokota, 26 Jun 2025
  
  Dear reviewer #1
  Thank you for carefully reading our manuscript and giving useful comments. The response is attached in the supplement.
  Best regards,
  Sho Yokota
  
  Citation: https://doi.org/10.5194/egusphere-2025-1866-AC4
RC2:
'Comment on egusphere-2025-1866', Benjamin Ménétrier, 10 Jun 2025

Major comments:
As mentioned by referee #1 in his/her major comment, this paper is a restrictive application of the full MGBF method. With only one filtering level left, equation 10 seems similar to the NICAS method I developed independently in 2020 (https://doi.org/10.5281/zenodo.4058620, and mentioned in equation 10 of https://doi.org/10.5194/gmd-15-7859-2022). However, the NICAS method can use adaptive unstructured subgrids, handle complex boundaries, and produce inhomogeneous and anisotropic localization functions.
This kind of explicit convolution method on coarse subgrids is computationally efficient when the localization length-scale is large compared to the analysis grid cell size, since the subgrid can be coarse. However, I agree with referee #1 that it can become very expensive for smaller localization length-scales, because in this case a fine subgrid must be kept to maintain the localization function sharpness.
Another issue properly handled in the NICAS method and missing here is the localization normalization (i.e. diagonal coefficients of the localization matrix should all be equal to one). Figure 4 suggests that the MGBF method is perfectly normalized with all curves going to 1 at zero separation. However, I believe this is true only if the observation is located on a coarse grid node. Indeed, even if the continuous function B_p(x) is normalized (as mentioned after equation 11), the discrete low-resolution filters F_{BF} might not be, and even if they were, the final interpolation to then analysis grid would break this normalization. Only an outer diagonal scaling matrix taking all the operators (filters and interpolations) into account can ensure a proper normalization.
Minor comments:
In section 2.1, equation (2) is already an approximation of the general 3DEnVar formulation. Indeed, the authors are using the same 3D localization matrix for all the auto- and cross-localization blocks between different analysis variables. This method is sometimes referred to as "Mark Buehner's trick" (used in https://doi.org/10.1175/2009MWR3157.1 and clearly described in section 3.4.2. of https://doi.org/10.1002/qj.2325). It assumes that all the analysis variables have roughly the same error correlation length-scale. Whether this assumption holds here or not, I think it should be mentioned.
In equation (10) of section 2.3, the rightmost interpolation operator (D from g1 to gt) is actually not required if only one grid and one scale are used, as DD^T = I. If several grids are needed (e.g. g2 and g4 as in experiment MGBF03SDL), this interpolation operator is required to combine the scales with operator E, but the destination grid should be the finest grid used (here g2), not necessarily g1.
Finally, I think that the experiments with slightly reduced length-scales (with a sigma suffix) are not really necessary. As shown in https://doi.org/10.1175/MWR-D-22-0255.1, the analysis quality is not very sensitive to the localization length-scale, as long as this length-scale is good enough. Given all the other uncertainties about the localization function shape and the fact that it should actually be anisotropic and inhomogeneous, the optimization of the localization length-scale does not seem really relevant here. Removing it (or better keeping it and removing the non-sigma case) would make the article a bit lighter and easier to read.

Citation: https://doi.org/10.5194/egusphere-2025-1866-RC2
- AC5: 'Reply on RC2', Sho Yokota, 26 Jun 2025
  
  Dear Dr. Benjamin Ménétrier,
  Thank you for carefully reading our manuscript and giving useful comments. The response is attached in the supplement.
  Best regards,
  Sho Yokota
  
  Citation: https://doi.org/10.5194/egusphere-2025-1866-AC5
CEC1:
'Comment on egusphere-2025-1866 - No compliance with the policy of the journal', Juan Antonio Añel, 20 Jun 2025

Dear authors,

Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".

https://www.geoscientific-model-development.net/policies/code_and_data_policy.html

You have archived your code on GitHub. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other long-term archival and publishing alternatives, such as Zenodo. Therefore, the current situation with your manuscript is irregular. Please, publish your code in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible, as we can not accept manuscripts in Discussions that do not comply with our policy.
Also, for the observational data, you have stored it in service from a commercial provider. We can not accept this. You must store the data in one of the repositories listed in our policy, and share here the link and permanent identifier (e.g. DOI) for it.
I must note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Also, you must include a modified 'Code and Data Availability' section in a potentially reviewed manuscript, containing the links and permanent identifiers of the new repositories.
Juan A. Añel

Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2025-1866-CEC1
- AC1:
  'Reply on CEC1', Sho Yokota, 24 Jun 2025
  
  Dear Prof. Añel,
  Thank you for confirming our manuscript.
  Just before the submission, we copied our code archived on Github to Zenodo. The DOI is https://doi.org/10.5281/zenodo.15193112. I'm sorry that the DOI was not in the 'Code and Data Availability' section.
  The observational data (and also initial and lateral boundary data) are stored in NOAA's High Performance Storage System (HPSS) archives. Based on your comment and the paper published in the past ( https://gmd.copernicus.org/articles/15/6891/2022/#section6 ), it is probably suitable to write the HPSS archives in the 'Code and Data Availability' section.
  Considering these, we will revise the 'Code and Data Availability' section in the manuscript as follows.
  "ICs, LBCs, and observation data used in this study are obtained from NOAA's High Performance Storage System (HPSS) archives. The RRFS system used in this study is obtained from https://doi.org/10.5281/zenodo.15193112."
  If we need to revise it in addition, could you let us know?
  Best regards,
  Sho Yokota
  
  Citation: https://doi.org/10.5194/egusphere-2025-1866-AC1
  - CEC2: 'Reply on AC1', Juan Antonio Añel, 24 Jun 2025
    
    Dear authors,
    We could admit that you do not store the data in a valid repository if it exist a justification for it. If no valid justification exists for it, then you must store them correctly. For example, a valid reason would be if the size of the dataset is of several TB. Therefore, what is preventing you or sharing the data in one of the repositories listed in our policy?
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2025-1866-CEC2
    
    AC2: 'Reply on CEC2', Sho Yokota, 24 Jun 2025
    
    Dear Prof. Añel,
    A part of observation data in the HPSS archives are the restricted data with the policy: https://www.nco.ncep.noaa.gov/pmb/docs/restricted_data/, so we cannot provide the DOI. The initial and lateral boundary data in the HPSS archives are much larger than 100GB. Even in this case, should we provide the DOI only for unrestricted observation data in the HPSS archives?
    Sho Yokota
    
    Citation: https://doi.org/10.5194/egusphere-2025-1866-AC2
    
    CEC3: 'Reply on AC2', Juan Antonio Añel, 25 Jun 2025
    
    Dear authors,
    For the restricted data, we can grant you an exception to our policy, so that you do not have to share them.
    For the unrestricted data, the issue is not that you must provide a DOI, but that NOAA servers are not valid repositories for depositing data for scientific publication. Therefore, you must copy the data to one of the suitable repositories listed in our policy. 100 GB is a perfectly reasonable size for this and does not pose a limitation. For example, you can split your data across several Zenodo repositories, each of which can be up to 50 GB in size.
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2025-1866-CEC3
    
    AC3: 'Reply on CEC3', Sho Yokota, 26 Jun 2025
    
    Dear Prof. Añel,
    Thank you for granting an exception to the policy for the restricted observation data. I will upload the initial and lateral boundary data and the unrestricted observation data to Zenodo, and revise the 'Code and Data Availability' section as follows.
    "ICs, LBCs, and unrestricted observation data used in this study are obtained from https://doi.org/XX.XXXX/zenodo.XXXXXXXX. The RRFS system used in this study is obtained from https://doi.org/10.5281/zenodo.15193112."
    Sho Yokota
    
    Citation: https://doi.org/10.5194/egusphere-2025-1866-AC3
    
    AC6: 'Reply on AC3', Sho Yokota, 30 Jun 2025
    
    Dear Prof. Añel,
    We have uploaded the initial and lateral boundary data and the unrestricted observation data to Zenodo. We will revise the 'Code and Data Availability' section as follows.
    "ICs, LBCs, and unrestricted observation data used in this study are obtained from https://doi.org/10.5281/zenodo.15744386, https://doi.org/10.5281/zenodo.15747449, and https://doi.org/10.5281/zenodo.15747476. The RRFS system used in this study is obtained from https://doi.org/10.5281/zenodo.15193112."
    Sho Yokota
    
    Citation: https://doi.org/10.5194/egusphere-2025-1866-AC6
    
    CEC4: 'Reply on AC6', Juan Antonio Añel, 30 Jun 2025
    
    Dear authors,
    Many thanks. We can consider the current version of your manuscript in compliance with the code and data policy of the journal.
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2025-1866-CEC4

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Sho Yokota on behalf of the Authors (07 Aug 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (15 Aug 2025) by Guoqing Ge

RR by Anonymous Referee #1 (28 Aug 2025)

RR by Benjamin Ménétrier (31 Aug 2025)

ED: Publish subject to minor revisions (review by editor) (11 Sep 2025) by Guoqing Ge

AR by Sho Yokota on behalf of the Authors (19 Sep 2025) Author's response Author's tracked changes Manuscript

ED: Publish as is (20 Sep 2025) by Guoqing Ge

AR by Sho Yokota on behalf of the Authors (26 Sep 2025) Manuscript

Journal article(s) based on this preprint

27 Oct 2025

Multigrid beta filter for faster computation of ensemble covariance localization

Sho Yokota, Miodrag Rancic, Ting Lei, R. James Purser, and Manuel S. F. V. De Pondeca

Geosci. Model Dev., 18, 7815–7829, https://doi.org/10.5194/gmd-18-7815-2025, 2025

Data sets

NOAA Rapid Refresh (RAP) NOAA https://registry.opendata.aws/noaa-rap/

Model code and software

Rapid Refresh Forecast System (RRFS) Sho Yokota https://doi.org/10.5281/zenodo.15193112

Viewed

Since the preprint corresponding to this journal article was posted outside of Copernicus Publications, the preprint-related metrics are limited to HTML views.

Total article views: 2,073 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
2,061	0	12	2,073	0	0

Views and downloads (calculated since 12 May 2025)

Month	HTML	XML	Total
May 2025	67	0	67
Jun 2025	165	10	175
Jul 2025	108	2	110
Aug 2025	353	0	353
Sep 2025	1,300	0	1,300
Oct 2025	68	0	68
Nov 2025	0
Dec 2025	0
Jan 2026	0
Feb 2026	0
Mar 2026	0
Apr 2026	0

Latest update: 11 Apr 2026

Short summary

Covariance localization, which mitigates the sampling error of ensemble-based forecast error covariances, is one of the main computational components of ensemble-variational data assimilation for the atmosphere. This study shows that localization based on the multigrid beta filter is several times faster than localization based on the conventional recursive filter, without significantly changing the analysis, if a coarser filter grid is applied and filtering is omitted at all resolutions except the coarsest.

