This work is distributed under the Creative Commons Attribution 4.0 License.
Localization in the mapping particle filter
Abstract. Data assimilation involves sequential inference in geophysical systems with nonlinear dynamics and observational operators. Particle filters are a promising approach for data assimilation because they are able to represent non-Gaussian densities.
The mapping particle filter incorporates Stein variational gradient descent to produce a particle flow that transforms state vectors from the prior to the posterior density by minimizing the Kullback-Leibler divergence between them. However, for applications in geophysical systems, challenges persist in high dimensions, where sample covariance underestimation leads to filter divergence. This work proposes two localization methods: in the first, a local kernel function is defined and the particle flow remains global; in the second, given a localization radius, the state vector is physically partitioned and local mappings are performed at each grid point. Gaussian and Gaussian-mixture prior densities are evaluated. The performance of the proposed Local Mapping Particle Filters (LMPFs) is assessed in synthetic experiments. Observations are produced with a two-scale Lorenz-96 system, while a single-scale Lorenz-96 is used as a surrogate model, introducing model error in the inference. The methods are evaluated with full and partial observations and with different linear and nonlinear observational operators. The LMPFs with Gaussian mixtures perform similarly to Gaussian filters such as the ETKF and LETKF in most cases, and in some scenarios they provide competitive performance in terms of analysis accuracy.
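For readers less familiar with this machinery, the following is a minimal sketch of one Stein variational gradient descent step of the kind the mapping particle filter builds on. The RBF kernel, the bandwidth h, and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(X, h):
    """RBF kernel matrix and its gradients with respect to the first argument.

    X : (N, d) array of particles; h : bandwidth.
    Returns K (N, N) with K[j, i] = k(x_j, x_i), and gradK (N, N, d) with
    gradK[j, i] = d k(x_j, x_i) / d x_j.
    """
    diff = X[:, None, :] - X[None, :, :]          # diff[j, i] = x_j - x_i
    K = np.exp(-np.sum(diff**2, axis=-1) / h)
    gradK = -2.0 / h * diff * K[..., None]
    return K, gradK

def svgd_step(X, grad_log_post, h=1.0, eps=0.1):
    """One SVGD pseudo-time step moving particles toward the posterior.

    grad_log_post : callable returning the (N, d) gradients of log p(x)
    evaluated at each particle.
    """
    N = X.shape[0]
    K, gradK = rbf_kernel(X, h)
    # Attraction (kernel-weighted posterior gradient) plus repulsion
    # (sum of kernel gradients), averaged over the ensemble.
    phi = (K @ grad_log_post(X) + gradK.sum(axis=0)) / N
    return X + eps * phi
```

For a Gaussian posterior N(mu, P), for example, grad_log_post could be `lambda X: (mu - X) @ np.linalg.inv(P)` (with symmetric P).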
Competing interests: At least one of the (co-)authors is a member of the editorial board of Nonlinear Processes in Geophysics.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-2420', Peter Jan van Leeuwen, 20 Jul 2025
This is a well-written paper on the highly relevant subject of nonlinear filtering for the geosciences. Specifically, the paper focuses on so-called Particle Flow Filters, which have recently been introduced in the geosciences and have the potential to provide an accurate solution for nonlinear high-dimensional applications. However, there are several issues that need attention. The first is the treatment of a particle flow filter as a special case of a particle filter. This is quite misleading. A particle filter is defined via likelihood weighting, and the effort is concentrated on making these weights more equal. A particle flow filter is an iterative algorithm to move particles from the prior to the posterior via a flow, such that likelihood weighting is not present. It is more an iterative MCMC method. Seeing it as a special particle filter will confuse readers. More important, however, is the slight misrepresentation of some of the existing literature and the omission of many relevant papers that appeared in the last 5 years. Furthermore, the particle flow filters are not optimized as they perhaps should be, and the conclusions are overly optimistic on the performance compared to an LETKF. For these reasons, I suggest a major (but could also be considered minor) revision along the lines suggested below.
Abstract: It would be misleading to call the mapping particle filter a particle filter. This might seem a contradiction, but a particle filter is defined by likelihood weights, which are not present in a particle flow filter. A particle flow filter is an iterative ensemble method, perhaps better called an iterative MCMC method, and likelihood weights are not involved by construction. I suggest removing the confusing statement referring to particle filters.
32: A 4DVar is not a linear DA method, but an iterative nonlinear DA method. It is true that it can struggle when nonlinearity is strong, but that doesn't make it a linear DA method!
42: Localization in particle filters was introduced in Bengtsson, T., Snyder, C. and Nychka, D. (2003) Toward a nonlinear ensemble filter for high-dimensional systems. Journal of Geophysical Research, 108, 8775–8785, and independently in van Leeuwen, P.J. (2003) Nonlinear ensemble data assimilation for the ocean. In seminar Recent Developments in Data Assimilation for Atmosphere and Ocean. 8–12 September 2003, ECMWF, Reading, UK.
43: Jittering is used to rejuvenate particles after a tempering step. Please correct the wording.
76: A particle flow filter is not a special particle filter, because it does not use likelihood weighting, see point 1 above.
61: Some recent literature is missing. The stochastic version of the Particle Flow Filter is unbiased at any ensemble size, and solves many of the issues mentioned; see e.g. Gallego, V. and Insua, D. R. (2020) Stochastic gradient MCMC with repulsive forces. arXiv preprint arXiv:1812.00071, https://arxiv.org/abs/1812.00071; Leviyev, A., Chen, J., Wang, Y., Ghattas, O. and Zimmerman, A. (2022) A stochastic Stein variational Newton method. arXiv:2204.09039, https://doi.org/10.48550/arXiv.2204.09039; Ma, Y.-A., Chen, T. and Fox, E. (2015) A complete recipe for stochastic gradient MCMC. In Advances in Neural Information Processing Systems (eds. C. Cortes, N. Lawrence, D. Lee, M. Sugiyama and R. Garnett), vol. 28, Curran Associates, Inc., https://doi.org/10.48550/arXiv.1506.04696.
79: The description of Hu and Van Leeuwen is incorrect. They use a preconditioning matrix in the flow to speed up convergence, and they choose a localized prior covariance for this matrix. The result is that the prior covariance matrix is cancelled by this preconditioning matrix. Nothing special is done to the likelihood. Note also that the preconditioning matrix does not have to be very accurate; a rough localization will do. This is a distinct difference from LEnKFs, in which accurate localization is crucial. (PS: the method is also applied to a full atmospheric model in Hu, C.-C., P. J. van Leeuwen, and J. L. Anderson (2024) An implementation of the particle flow filter in an atmospheric model, Monthly Weather Rev., doi: 10.1175/MWR-D-24-0006.1.)
93: Subrahmanya et al. also minimize the KL divergence, but formulate the flow field using the Fokker-Planck (FP) equation, and then propose approximations to come up with solutions for high-dimensional systems. The main difference with Pulido and Van Leeuwen is that they do not use an RKHS.
Eq. (5): This flow, together with Eq. (4), does not conserve the physical dimension of the state and hence is inconsistent. We did this wrong in 2019, apologies. Assume the state contains temperature measured in K. Then the flow has physical dimension K^{-1} (assuming the kernel has no physical dimension, such as the one used in Eq. (6)), which is inconsistent with Eq. (4). The preconditioning with a covariance matrix of the state as explored by Hu and Van Leeuwen is one way to solve this issue.
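To make the dimensional argument concrete, here is a sketch of the bookkeeping under the stated assumptions (a state in kelvin and a dimensionless kernel); the notation loosely follows the SVGD literature rather than the manuscript's equations.

```latex
% Dimensional check of the kernelized flow (sketch; notation assumed).
\begin{align*}
  \frac{dx}{ds} &= \phi(x), \qquad
  \phi(x) = \frac{1}{N}\sum_{j=1}^{N}
    \Big[ k(x_j, x)\,\nabla_{x_j}\log p(x_j) + \nabla_{x_j} k(x_j, x) \Big],\\
  [\phi] &= [\nabla\log p] = \mathrm{K}^{-1}
    \quad\text{while}\quad \Big[\tfrac{dx}{ds}\Big] = \mathrm{K}
    \quad\Rightarrow\quad \text{inconsistent};\\
  \phi_B(x) &= \frac{1}{N}\sum_{j=1}^{N}
    B\,\Big[ k(x_j, x)\,\nabla_{x_j}\log p(x_j) + \nabla_{x_j} k(x_j, x) \Big],
  \qquad [B] = \mathrm{K}^{2} \;\Rightarrow\; [\phi_B] = \mathrm{K}.
\end{align*}
```

Preconditioning with a covariance-like matrix B thus restores the units of the flow, consistent with the remedy suggested in the comment above.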
147: Note that if the model is stochastic with additive Gaussian errors, Eq. (9) is not an approximation, see Pulido and Van Leeuwen. It might be good to point that out.
160: The beta localization comes close to the methodology implemented in Hu et al. Monthly Weather Rev., doi: 10.1175/MWR-D-24-0006.1.
Eq. (10): note the difference between independence and uncorrelatedness. Perhaps rephrase the sentence above this.
Eq. 11: Is there a reason to change the nabla notation?
205: Please finish the sentence.
233: In the beta algorithm, assume one grid point is completely updated, and we move to the next point. Will that point use the updated first grid point value, or the original first grid point value? The former will lead to smoother global fields, but makes the result depend on the order in which the grid points are updated. One could imagine a mixture between alpha and beta, where beta is used over the whole field at each iteration. Hu et al. Monthly Weather Rev., doi: 10.1175/MWR-D-24-0006.1 use another local updating scheme. It might be something to discuss; it would help future users of these methods.
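To make the ordering question concrete, here is a schematic contrast between the two update disciplines, with a hypothetical local_flow(x, i) standing in for one local mapping update of grid point i; neither variant is claimed to match the manuscript's beta algorithm.

```python
import numpy as np

def jacobi_sweep(x, local_flow):
    """Every grid point is updated from the same pre-sweep state:
    order-independent, but neighbours do not see each other's updates."""
    x_old = x.copy()
    return np.array([local_flow(x_old, i) for i in range(x.size)])

def gauss_seidel_sweep(x, local_flow):
    """Each grid point sees the already-updated values of earlier points:
    tends to produce smoother global fields, but the result depends on
    the order in which the grid points are visited."""
    x_new = x.copy()
    for i in range(x.size):
        x_new[i] = local_flow(x_new, i)
    return x_new
```

The mixture suggested above would correspond to applying a Jacobi-style sweep of the beta update over the whole field at each flow iteration.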
Eq. (23): what does the index LS mean? I assume Large Scale?
263: Notation: M means something different in section 3.0 compared to here.
324: Particle filters -> particle flow filters
Experiments: It doesn't make sense to tune methods on lowest RMSE because ensemble spread is as important. For instance, in operational weather prediction centers both are optimized. If not corrected in the experiments, please provide some discussion.
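As one concrete way to act on this, a common diagnostic is the spread-skill comparison sketched below, assuming an analysis ensemble array of shape (time, member, variable); for a reliable ensemble the two numbers should be of comparable size, so both enter the tuning.

```python
import numpy as np

def spread_skill(ensemble, truth):
    """Time-mean RMSE of the ensemble mean versus time-mean ensemble spread.

    ensemble : (T, N, d) array of analysis ensembles; truth : (T, d) array.
    """
    mean = ensemble.mean(axis=1)
    rmse = np.sqrt(((mean - truth) ** 2).mean())
    spread = np.sqrt(ensemble.var(axis=1, ddof=1).mean())
    return rmse, spread
```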
Experiments: Can the authors provide a rule of thumb for choosing the hyperparameters?
Experiments: Can the authors provide a rough comparison of computational costs compared to the LETKF?
456: I would not say that the LMPFs are 'highly competitive' compared to an LETKF. They tend to be worse, with similar performance at best. The spread is typically underestimated, which is not a good sign.
469: Convergence results of the MPF rely on keeping the kernel a fixed function of its two arguments, and changing the covariance matrix does not fulfil this criterion. 'this matrix must evolve with pseudo time' is incorrect. Please change the text and mention this issue.
474: A step in this direction is Hu et al Monthly Weather Rev., doi: 10.1175/MWR-D-24-0006.1.
Citation: https://doi.org/10.5194/egusphere-2025-2420-RC1
RC2: 'Comment on egusphere-2025-2420', Alban Farchi, 24 Jul 2025
In this manuscript, the authors propose to use localisation to make the mapping particle filter viable in high-dimensional geophysical systems. Two localisation techniques are derived and presented. The resulting algorithms are tested using twin experiments with a low-order system. The manuscript is overall rather well written, although it could benefit from some reformulations here and there. The topic is highly relevant and I am glad to see some concrete results as I remember discussing the methodology with some of the co-authors about five years ago. Nevertheless, I have some major concerns about the manuscript.
General / major comments
------------------------
On the methodology
Overall, I have the impression that the presentation of the localisation techniques could be improved given the following remarks.
- From what I understood, the introduction of \tilde{x} instead of x is not fundamental (meaning that the expressions would still be valid by replacing \tilde{x} by x) but practical (because of the localisation, many entries of the covariance matrices are zero and hence the matrix-vector products can be computed using only a subset of the state; see the sketch after this list). If this is indeed the case, I think that this should be explained at the end of the section. The authors could perhaps even consider presenting a version without \tilde{x} and only mention at the end of the section that the algebra can easily be optimised by selecting the non-zero entries in the matrices and vectors. I have the feeling that it would make the section easier to understand.
- In the section, a lot of modifications to the original equations are presented, but without writing the modified equations. While this helps keep the manuscript short and focused, it makes it harder to follow. Therefore, I think that it would make sense to present the finalised expressions, perhaps in an appendix, in one of the two cases for the prior density (Gaussian or Gaussian mixture).
- I think that a discussion about the algorithmic complexity of the algorithm is missing.
- Somewhere in the text, it should be mentioned that beta-localisation can only be applied in the case of local observations (i.e. observations which have a well-defined location in physical space).
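A minimal illustration of the subsetting point in the first item above: with a banded (localised) covariance matrix, one row of a matrix-vector product only touches state entries within the localisation radius. The function below is a hypothetical sketch for a periodic one-dimensional domain, not the manuscript's code.

```python
import numpy as np

def local_matvec_row(B, v, i, radius):
    """Row i of B @ v using only entries within the localisation radius.

    If B is banded with bandwidth <= radius, this equals (B @ v)[i]
    exactly, at a cost independent of the full state dimension.
    """
    n = v.size
    idx = [(i + k) % n for k in range(-radius, radius + 1)]  # periodic domain
    return B[i, idx] @ v[idx]
```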
On the numerical experiments
- It would be interesting, before section 4, to make a summary of the tuning parameters for each filter. Unless I missed it, it is not specified whether the parameters are tuned separately for each experiment or once and for all experiments. Furthermore, I don't understand why the localisation radius is not included in the set of tuning parameters. In my opinion, using a fixed value of the localisation radius favours one algorithm over the others in a quite arbitrary way.
- 2.2k cycles for the Lorenz system is really small and makes the results really sensitive to the specific choice of random seed. I would highly recommend using at least 10k cycles (100k would be better) and repeating each experiment N times with N different random seeds (from which you could get error bars for the results; see the sketch after this list). This sensitivity is really visible in some figures which look "noisy" (e.g. fig 7), and error bars could help distinguish which differences are significant or not.
- Sometimes, you mention that one or the other of the filters did not converge. Could you be more explicit about what you mean here, for both the (L)MPF and for the (L)ETKF?
- Finally, even if it is already a good step to make a comparison with the optimally tuned (L)ETKF, it should be noted that the (L)ETKF is not designed to handle non-linearity and model error. Therefore, in the non-linear cases, the EnKF cannot really be considered state-of-the-art and should in principle be replaced by iterative EnKFs, and in all cases, there should be a treatment for model error (even a basic one) in the EnKFs. I can understand that making a full comparison could be out of scope of the present work, but I think that these points should be at least mentioned.
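A minimal sketch of the seed-averaging suggested above; run_experiment is a hypothetical callable returning the per-cycle analysis RMSE, and the standard error over seeds supplies the error bars.

```python
import numpy as np

def rmse_with_error_bars(run_experiment, n_seeds=10, n_cycles=10_000):
    """Mean time-averaged analysis RMSE over repeated experiments, plus
    the standard error over seeds (the error bar for the plots)."""
    scores = np.array([run_experiment(seed, n_cycles).mean()
                       for seed in range(n_seeds)])
    return scores.mean(), scores.std(ddof=1) / np.sqrt(n_seeds)
```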
Minor / technical comments
--------------------------
L 4: "the Stein variational gradient descents" -> "the Stein variational gradient descent"?
L 5: "aiming to minimize the Kullback-Leibler divergence" I find this sentence not entirely clear: who is "aiming to" and between what and what is the KL divergence computed?
L 9: "Gaussian and Gaussian mixtures are evaluated as a prior density." This formulation seems a bit weird. In addition, I am not entirely sure that this information is essential in the abstract.
L 32: "such as four-dimensional variational" 4D-Var is not a linear data assimilation method! Here (and in the following sentences) you are mixing non-Gaussianity and non-linearity. It is true that non-linearity does induce non-Gaussianity, but the two effects should remain distinguished in my opinion.
L 42: I think that there is a typo in the name of Poterjoy.
L 42-44: Localisation, tempering, and jittering techniques have different objectives. Therefore, I find it a bit weird to see them in the same sentence (even though it is true that they all ultimately aim at making the particle filter work).
L 46: "The MPF is a novel data assimilation approach" While I agree that the MPF is more recent than other DA algorithms, it has already been introduced 6 years ago, based on the SVGD proposed 9 years ago. Can we still say it is "novel"?
L 79-83: Can you briefly describe here how your method differs from that of Hu and van Leeuwen, 2021?
L 84-86: The outline is missing at the end of the introduction.
L 108: "that sample" -> "that samples"
L 121: please avoid nested parentheses in citations.
L 124: "Kullback-Leibler Divergence DKL" between what and what?
Before Eqs. (8) and (9): It would be nice to give the expression of the associated priors, no?
L 158: "global banded prior covariance matrix" wouldn't it be more appropriate to say "localised prior covariance matrix"?
L 194: A punctuation sign is missing at the end of the sentence.
L 205: the sentence is not finished.
Table 1: "Long-scale" -> "Large-scale"?
Eq. 22: N_{SU} should be N_{SS}, right? Also it would be good to define the acronyms LS and SS.
L 270: As it is, R corresponds to a variance (and not a standard deviation). I would therefore advise using a "squared" symbol (e.g. \sigma^2...).
L 279-281: Why not use the surrogate model to initialise the particles? This would make the algorithm completely independent of the true model.
L 289-291: It is in my opinion not clear here that you are referring to the experiments in section 4.4.
L 307: please specify between what and what you compute the RMSE (analysis ensemble mean and truth I imagine).
Figure 1 is nice and beautiful, but I am not sure that it is absolutely necessary. If you choose to keep it, I would strongly recommend replacing the "jet" colorbar with a perceptually uniform colormap (see the explanations here: https://matplotlib.org/stable/users/explain/colors/colormaps.html).
L 349: "which is optimum for the ensemble Kalman filter." This is true only for 20 ensemble members. For 50 ensemble members, given the fact that the ETKF performs approximately on par with the LETKF, I suspect that the optimal radius would be the largest one.
L 394: I find it a bit misleading to speak of the "global case" when l=18. Indeed, even if all the variables are included in the local domains, the Gaspari-Cohn tapering does make localisation different from the global case.
L 401: Can you give the number of positive (and neutral) Lyapunov exponents?
Figure 8 could be shown in log-log scale.
L 425-428: Usually, this issue can be solved by applying random rotations to the analysis perturbations.
L 452: "requires the prior density function to be declared beforehand" Do you mean that the parametrisation of the prior density should be chosen beforehand? But isn't it the case for most DA methods?
L 469: "this matrix must evolve with pseudo time" I am not sure to understand what is meant here.About the name of the low-order model: the model known as "Lorenz 1996" (Lorenz and Emanuel, 1998, https://doi.org/10.1175/1520-0469(1998)055%3C0399:OSFSWO%3E2.0.CO;2) does not include a two-scale version. The two scale version was only proposed later on (Lorenz 2005, http://dx.doi.org/10.1175/JAS3430.1) and should be rigorously named "Lorenz 2005-III" (as it is the third model presented in this article). A good compromise could be to use the following names: "one-scale Lorenz model" (referring to the Lorenz 1996 model) and "two-scale Lorenz model" (referring to the Lorenz 2005-III model). Also, please do not forget the citations.
The format of cross-references to equations, algorithms, etc. are inconsistent throughout the manuscript, which makes it harder to follow. I strongly recommend the use of the latex package "cleveref" which automatically handles them.
Citation: https://doi.org/10.5194/egusphere-2025-2420-RC2
EC1: 'Comment on egusphere-2025-2420', Olivier Talagrand, 25 Jul 2025
In view of the comments submitted by the referees, who both express a positive opinion on the scientific quality of the paper, I suggest the authors start preparing their responses as well as writing a revised version of their paper, taking into account the referees' comments.
Citation: https://doi.org/10.5194/egusphere-2025-2420-EC1
Juan Martin Guerrieri
Manuel Arturo Pulido
Takemasa Miyoshi
Arata Amemiya
Juan José Ruiz
This work extends the Mapping Particle Filter to account for local dependencies. Two localization methods are tested: a global particle flow with local kernels, and iterative local mappings based on correlation radius. Using a two-scale Lorenz-96 truth and a one-scale forecast model, experiments with full/partial and linear/nonlinear observations show Root Mean Square Error (RMSE) reductions using localized Gaussian mixture priors, achieving competitive performance against Gaussian filters.