This work is distributed under the Creative Commons Attribution 4.0 License.
Bridging classical data assimilation and optimal transport
Abstract. Because optimal transport acts as displacement interpolation in physical space rather than as interpolation in value space, it can potentially avoid double penalty errors. As such it provides a very attractive metric for comparing nonnegative physical fields – the Wasserstein distance – which could further be used in data assimilation for the geosciences. The algorithmic and numerical implementations of such a distance are, however, not straightforward. Moreover, its theoretical formulation within typical data assimilation problems faces conceptual challenges, resulting in scarce contributions on the topic in the literature.
We formulate the problem in a way that offers a unified view of both classical data assimilation and optimal transport. The resulting OTDA framework accounts for both classical sources of prior error, background and observation, together with a Wasserstein barycentre between states that stand for this background and observation. We show that the hybrid OTDA analysis can be decomposed as a simpler OTDA problem involving a single Wasserstein distance, followed by a Wasserstein barycentre problem which ignores the prior errors and can be seen as a McCann interpolant. We also propose a less enlightening but straightforward solution to the full OTDA problem, which includes the derivation of its analysis error covariance matrix. Thanks to these theoretical developments, we are able to extend the classical 3DVar/BLUE paradigm at the core of most classical data assimilation schemes. The resulting formalism is very flexible and can account for sparse, noisy observations and non-Gaussian error statistics. It is illustrated by simple one- and two-dimensional examples that show the richness of the new types of analysis offered by this unification.
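The abstract's contrast between value-space interpolation and displacement interpolation can be illustrated in one dimension, where the optimal transport map pairs equal quantiles of the two fields. The following is a minimal sketch, not the paper's algorithm: the grid, bump shapes, and quantile-based midpoint construction are illustrative assumptions.

```python
import numpy as np

# Two unit-mass "fields": Gaussian bumps centred at 0.3 and 0.7
x = np.linspace(0.0, 1.0, 200)

def bump(c):
    b = np.exp(-0.5 * ((x - c) / 0.05) ** 2)
    return b / b.sum()

p, q = bump(0.3), bump(0.7)

# Value-space midpoint: two half-amplitude bumps (the double-penalty picture)
euclid_mid = 0.5 * (p + q)

# Displacement (McCann) midpoint: in 1D the optimal map pairs equal
# quantiles, so we average the inverse CDFs and re-histogram the mass
u = (np.arange(2000) + 0.5) / 2000          # quantile levels in (0, 1)
inv_cdf_p = np.interp(u, np.cumsum(p), x)   # approximate inverse CDFs
inv_cdf_q = np.interp(u, np.cumsum(q), x)
mid_support = 0.5 * (inv_cdf_p + inv_cdf_q)
wass_mid, _ = np.histogram(mid_support, bins=x.size, range=(0.0, 1.0))
wass_mid = wass_mid / wass_mid.sum()

# The displacement midpoint is a single bump near x = 0.5, while the
# value-space midpoint keeps both original bumps at 0.3 and 0.7
print(x[np.argmax(wass_mid)])
```

The value-space average leaves nearly no mass at the centre of the domain, which is exactly the bimodal artefact the displacement interpolation avoids.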
Status: closed

RC1: 'Comment on egusphere-2023-2755', Anonymous Referee #1, 17 Feb 2024
MS egusphere-2023-2755
Title: Bridging classical data assimilation and optimal transport
Authors: Bocquet et al.
Summary
The authors propose a modification of 3DVar in which a variant of the Wasserstein distance is used in place of the classic quadratic error function for both the observational and background errors. The authors argue that this could potentially improve results, as certain problems with the quadratic error function (including what is known as the "double penalty error") are thus avoided. The authors motivate the detailed form of the cost function and discuss possible approaches to carry out the 3DVar minimisation in order to obtain the analysis.
General Comments
The paper discusses a very interesting and pertinent problem and proposes a solution that is certainly worth looking at further. The continued use of quadratic error functionals is somewhat anachronistic, given that the main motivations for using them are simplicity of analytic computations and an assumption of Gaussian errors. The first argument carries less weight in the age of supercomputers, and the second was always known to be wrong except in (important) special cases.
I cannot, however, recommend the publication of the paper in its present form and believe that the paper requires a major revision. This is due to the following major concerns (see below for a few more minor concerns).
(1) I find the first section very confusing. The authors are discussing problems related to the so-called double penalty error and the non-overlap of functions (or distributions) that appear in data assimilation. It is not clear, however, at what level the authors are working. More specifically, it seems at first that the authors want to work at the level of probability distributions for state variables. Later, however, it turns out that they want to work at the level of state variables directly, yet focus on those that represent meteorological fields which essentially have the character of distributions. This, however, has to be clear from the very beginning.
(2) I understand what the authors label as the first (of two) weaknesses of classical DA, which is often termed the double penalty error. This problem, however, is ultimately an issue resulting from a mismatch between the employed distance and the smoothness of meteorological fields. If the correct metric is selected depending on the smoothness of the meteorological fields, there is no double penalty problem. Labelling this as a problem of "classical DA", however, implies that classical DA only uses the mean square error, which is not correct (error covariances are an important part of the error functional and have a strong influence on whether the double penalty problem occurs or not).
(3) Related to the previous question, I do NOT understand what the authors label as the second weakness of classical DA. In fact, the last paragraph of Sec. 1.1 hardly makes any sense when it talks about "overlap in space and time" between background and observations. The material seems to draw on intuition coming from the Bayes rule, but that applies to probability densities; the 3DVar analysis is an operator-convex combination of meteorological fields, which is something completely different.
(4) As far as I can see, the technique can only assimilate meteorological fields that essentially behave like distributions, which is clearly a major restriction. I believe the assumption that the fields be positive is not enough (after all, by choice of origin any meteorological field can be assumed to have nonnegative values). All the examples mentioned by the authors are extensive quantities. Is there an issue with applying the approach to an intensive quantity such as, say, the temperature?
Minor Comments
I have only a few minor comments at this point, but I believe that quite a few more might pop up once the major issues above have been clarified.
(a) It is not clear how the authors deal with comparing fields that do not have the same mass. Also, it is not clear what the cumbersome result in Fig. 5 has to do with the assumption of whether or not the two fields have the same mass.
(b) The concept of the entropy regularisation is not clear. It is not even clear why this renders the problem convex or at least uniquely solvable.
(c) There are other ways to measure distances between meteorological fields that avoid or alleviate the double penalty error (depending on the smoothness of the fields). Have the authors compared their Wasserstein approach with other metrics, also given that the entropy-regularised Wasserstein distances are not easy to calculate and optimise?
(d) On p. 6 the authors claim that "our problem is not subject to the curse of dimensionality". Although this is true, the curse of dimensionality (in the sense implied by the discussion here) is not actually a concern in 3DVar either, which is what the method should be compared with. So this remark is misleading.
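For context on points (b) and (c): entropy regularisation adds a term ε Σ_ij P_ij log P_ij to the linear transport objective, which makes the objective strictly convex in the coupling P (hence uniquely solvable) and computable by Sinkhorn's alternating scalings. A minimal sketch, in which the grid, cost, and value of ε are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

def sinkhorn(p, q, C, eps=0.01, n_iter=500):
    """Entropy-regularised OT: min <C,P> + eps * sum(P log P) over couplings
    of p and q; strict convexity gives a unique optimum."""
    K = np.exp(-C / eps)             # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (K.T @ u)            # alternate scalings to match marginals
        u = p / (K @ v)
    P = u[:, None] * K * v[None, :]  # optimal coupling
    return np.sum(P * C)             # regularised transport cost

x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2   # squared-distance cost
p = np.exp(-((x - 0.3) / 0.1) ** 2); p /= p.sum()
q = np.exp(-((x - 0.7) / 0.1) ** 2); q /= q.sum()
print(sinkhorn(p, q, C))             # ~ (0.7 - 0.3)^2 plus a small entropic bias
```

Since the two bumps have the same shape, the unregularised squared 2-Wasserstein distance is just the squared shift, 0.16; the entropic term adds a small bias on top.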
Citation: https://doi.org/10.5194/egusphere-2023-2755-RC1
AC1: 'Reply on RC1', Marc Bocquet, 26 Feb 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-2755/egusphere-2023-2755-AC1-supplement.pdf

RC2: 'Comment on egusphere-2023-2755', Anonymous Referee #2, 09 Mar 2024
The paper proposes a different objective function for the data assimilation problem, a mixed sum of the classic cost function and Wasserstein metrics. Under assumptions of strict convexity and linearity, the problem is convex. The paper then derives the optimality conditions for the new objective function by combining convex-optimization analysis with the duality associated with the entropy-regularized Wasserstein metric. This is further supported by several 1D and 2D data assimilation problems.
- My biggest complaint is the paper title, which I think does not match the content of the paper. Based on the title, it sounds like the paper provides a theoretical connection between classical data assimilation and OT; however, it reads more like an application of OT theory to data assimilation problems. I suggest the authors change the title to a more descriptive one.
- The assumption in equation (11) is rather strong: H is linear, and all the costs are convex. That means the paper only deals with log-concave distributions, which excludes the multimodal distributions that are challenging to handle. The baseline DA problem (17) is strictly convex and can be solved easily. Even if one is worried about overfitting the noise, many good existing methods handle this.
- Between (17) and (18), new variables x^b and x^o are introduced together with Wasserstein metrics, turning the problem from strongly convex into a mixed one. The motivation is not very clearly stated. The significantly increased computational cost associated with (18) has to be supported by very strong reasons. For example, what properties do we gain by combining these two different cost functions?
- Section 2.4 has too many details of the derivation that are standard steps in convex optimization. I suggest moving much of it to an appendix and only stating the main formulae.
- The numerical examples in Sections 3 & 4 are a bit too simple. Of course, 1D and 2D OT are not so costly. However, when the dimension becomes large, the extra two terms in (18) become increasingly cumbersome, and the computational cost becomes prohibitive. This work targets geoscience applications, which often have high-dimensional state spaces.
Overall, I feel the paper title is too big of a summary for the paper, and the numerical examples, on the other hand, are elementary. While I can relate the formulation from (6) to (7), which is very neat and has a clear mathematical intuition, the hybrid sum in (18) seems to be a "cocktail" of two different metrics. Further understanding is necessary even if the authors don't plan on proving any mathematical properties.
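Schematically, the hybrid objective the referee calls a "cocktail" combines quadratic prior penalties with Wasserstein couplings. In a notation assumed here purely for illustration (this is a sketch consistent with the abstract's description, not the paper's exact Eq. (18)), it has the shape

```latex
J(\mathbf{x}, \mathbf{x}^{\mathrm{b}}, \mathbf{x}^{\mathrm{o}})
  = \tfrac{1}{2}\,\lVert \mathbf{x}^{\mathrm{b}} - \mathbf{x}_{\mathrm{b}} \rVert^2_{\mathbf{B}^{-1}}
  + \tfrac{1}{2}\,\lVert \mathbf{y} - \mathbf{H}\,\mathbf{x}^{\mathrm{o}} \rVert^2_{\mathbf{R}^{-1}}
  + \lambda_{\mathrm{b}}\, W_2^2\!\left(\mathbf{x}, \mathbf{x}^{\mathrm{b}}\right)
  + \lambda_{\mathrm{o}}\, W_2^2\!\left(\mathbf{x}, \mathbf{x}^{\mathrm{o}}\right)
```

where the first two terms are the familiar 3DVar/BLUE penalties on auxiliary states, and the Wasserstein terms place the analysis x at a weighted barycentre between them.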
Citation: https://doi.org/10.5194/egusphere-2023-2755-RC2
AC2: 'Reply on RC2', Marc Bocquet, 13 Mar 2024
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-2755/egusphere-2023-2755-AC2-supplement.pdf

EC1: 'Comment on egusphere-2023-2755', Olivier Talagrand, 13 Mar 2024
I thank the authors for their prompt response to the referees' comments, and I look forward to receiving the announced revised version of their paper. They may append to that new version any further comments they may have on the referees' reports or on the paper. I intend to submit the revised version to the two referees.
Citation: https://doi.org/10.5194/egusphere-2023-2755-EC1
Viewed
- HTML: 372
- PDF: 190
- XML: 32
- Total: 594
- BibTeX: 30
- EndNote: 23