Bridging classical data assimilation and optimal transport

Bocquet, Marc; Vanderbecken, Pierre J.; Farchi, Alban; Dumont Le Brazidec, Joffrey; Roustan, Yelva

doi: https://doi.org/10.5194/egusphere-2023-2755

Preprints

05 Dec 2023

Abstract. Because optimal transport acts as displacement interpolation in physical space rather than as interpolation in value space, it can potentially avoid double penalty errors. As such, it provides a very attractive metric for comparing non-negative physical fields, the Wasserstein distance, which could further be used in data assimilation for the geosciences. The algorithmic and numerical implementations of such a distance are, however, not straightforward. Moreover, its theoretical formulation within typical data assimilation problems faces conceptual challenges, resulting in scarce contributions on the topic in the literature.

We formulate the problem in a way that offers a unified view on both classical data assimilation and optimal transport. The resulting OTDA framework accounts for the classical sources of prior errors, background and observation, together with a Wasserstein barycentre between states that stand for this background and observation. We show that the hybrid OTDA analysis can be decomposed as a simpler OTDA problem involving a single Wasserstein distance, followed by a Wasserstein barycentre problem which ignores the prior errors and can be seen as a McCann interpolant. We also propose a less enlightening but straightforward solution to the full OTDA problem, which includes the derivation of its analysis error covariance matrix. Thanks to these theoretical developments, we are able to extend the classical 3D-Var/BLUE paradigm at the core of most classical data assimilation schemes. The resulting formalism is very flexible and can account for sparse, noisy observations and non-Gaussian error statistics. It is illustrated by simple one- and two-dimensional examples that show the richness of the new types of analysis offered by this unification.
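The contrast the abstract draws between interpolation in value space and displacement interpolation can be illustrated with a minimal Python sketch. It is not taken from the preprint: the particle representation, the function names and the toy numbers are all assumptions made for illustration. In one dimension with a convex transport cost, the optimal map pairs sorted particles, so the McCann interpolant reduces to moving each particle along a straight line.

```python
def value_interpolation(field_a, field_b, t):
    """Pointwise (value-space) blend of two fields on the same grid."""
    return [(1 - t) * a + t * b for a, b in zip(field_a, field_b)]


def displacement_interpolation(xs, ys, t):
    """McCann interpolant between two 1D clouds of equal-weight particles.

    Assumes both clouds hold the same number of particles; in 1D with a
    convex cost the optimal transport map pairs the sorted positions.
    """
    return [(1 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]


# Hypothetical data: the same narrow feature seen near x = 2 and near x = 8.
background_particles = [1.9, 2.0, 2.1]
observation_particles = [7.9, 8.0, 8.1]
midway = displacement_interpolation(background_particles,
                                    observation_particles, 0.5)

# The same two features as gridded fields, blended in value space.
grid = list(range(11))
background_field = [1.0 if g == 2 else 0.0 for g in grid]
observation_field = [1.0 if g == 8 else 0.0 for g in grid]
blend = value_interpolation(background_field, observation_field, 0.5)
```

Here `midway` is a single feature centred near x = 5, whereas `blend` keeps two half-amplitude features at x = 2 and x = 8 and nothing in between.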

Received: 20 Nov 2023 – Discussion started: 05 Dec 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Download & links

Preprint (PDF, 1386 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

12 Jul 2024

Bridging classical data assimilation and optimal transport: the 3D-Var case

Marc Bocquet, Pierre J. Vanderbecken, Alban Farchi, Joffrey Dumont Le Brazidec, and Yelva Roustan

Nonlin. Processes Geophys., 31, 335–357, https://doi.org/10.5194/npg-31-335-2024, 2024

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2023-2755', Anonymous Referee #1, 17 Feb 2024

MS egusphere-2023-2755

Title: Bridging classical data assimilation and optimal transport

Authors: Bocquet et al
Summary
The authors propose a modification of 3D-Var in which instead of using a classic quadratic error function in both the observational and background error, a variant of the Wasserstein distance is used. The authors argue that this could potentially improve results as certain problems with the quadratic error function (including what is known as the ``double penalty error'') are thus avoided. The authors motivate the detailed form of the cost function and discuss possible approaches to carry out the 3D-Var minimisation in order to obtain the analysis.

General Comments
The paper discusses a very interesting and pertinent problem and proposes a solution that is certainly worth looking at further. The continued use of a quadratic error functional is somewhat anachronistic, given that the main motivations for using it are the simplicity of analytic computations and an assumption of Gaussian errors. The first argument carries less weight in the age of supercomputers, and the second was always known to be wrong except in (important) special cases.
I cannot, however, recommend the publication of the paper in its present form and believe that the paper requires a major revision. This is due to the following major concerns (see below for a few more minor concerns).
(1) I find the first section very confusing. The authors are discussing problems related to the so-called double penalty error and the non-overlap of functions (or distributions) that appear in data assimilation. It is not clear however on what level the authors are working. More specifically, it seems first that the authors want to work on the level of probability distributions for state variables. Later however it turns out that they want to work on the level of state variables directly, yet focus on those that represent meteorological fields which essentially have the character of distributions. This, however, has to be clear from the very beginning.
(2) I understand what the authors label as the first (of two) weaknesses of classical DA, which is often termed the double penalty error. This problem, however, is ultimately an issue resulting from a mismatch between the employed distance and the smoothness of meteorological fields. If the correct metric is selected depending on the smoothness of the meteorological fields, there is no double penalty problem. Labelling this as a problem of ``classical DA'', however, implies that classical DA only uses the mean square error, which is not correct (error covariances are an important part of the error functional and have a strong influence on whether the double penalty problem occurs or not).
(3) Related to the previous question, I do NOT understand what the authors label as the second weakness of classical DA. In fact, the last paragraph of Sec. 1.1 hardly makes any sense when it talks about ``overlap in space and time'' between background and observations. The material seems to draw on intuition coming from the Bayes rule but that applies to probability densities; the 3D-Var analysis is an operator-convex combination of meteorological fields which is something completely different.
(4) As far as I can see, the technique can only assimilate meteorological fields that essentially behave like distributions, which is clearly a major restriction. I believe the assumption that the fields be positive is not enough (after all, by choice of origin any meteorological field can be assumed to have nonnegative values). All the examples mentioned by the authors are extensive quantities. Is there an issue with applying the approach to an intensive quantity such as, say, the temperature?
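For readers unfamiliar with the double penalty error discussed in the concerns above, the following minimal sketch may help. It is an editorial illustration rather than anything taken from the manuscript or the report: it assumes unit-mass fields on a regular 1D grid and a plain squared error with identity weighting (the referee rightly notes that error covariances change this picture), and every name in it is hypothetical.

```python
def squared_error(field_a, field_b):
    """Sum of squared pointwise differences (identity error covariance)."""
    return sum((a - b) ** 2 for a, b in zip(field_a, field_b))


def wasserstein1_1d(field_a, field_b, dx=1.0):
    """W1 distance between two equal-mass 1D fields on a regular grid.

    In 1D, W1 equals the integral of the absolute difference of the
    cumulative distributions.
    """
    total, cum_a, cum_b = 0.0, 0.0, 0.0
    for a, b in zip(field_a, field_b):
        cum_a += a
        cum_b += b
        total += abs(cum_a - cum_b) * dx
    return total


def bump(centre, n=12):
    """Unit-mass feature located in a single grid cell."""
    return [1.0 if i == centre else 0.0 for i in range(n)]


truth = bump(3)
no_feature = [0.0] * 12

se_near = squared_error(truth, bump(4))      # feature displaced by 1 cell
se_far = squared_error(truth, bump(9))       # feature displaced by 6 cells
se_none = squared_error(truth, no_feature)   # feature missing altogether

w1_near = wasserstein1_1d(truth, bump(4))
w1_far = wasserstein1_1d(truth, bump(9))
```

With these toy fields the squared error is the same for the small and the large displacement and is larger than for the forecast with no feature at all (the displaced feature is penalised once as a miss and once as a false alarm), while the 1D Wasserstein distance grows with the displacement.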

Minor Comments
I have only a few minor comments at this point, but I believe that quite a few more might pop up once the major issues above have been clarified.
(a) It is not clear how the authors deal with comparing fields that do not have the same mass. Also, it is not clear what the cumbersome result in Fig. 5 has to do with the assumption whether or not the two fields have the same mass.
(b) The concept of the entropy regularisation is not clear. It is not even clear why this renders the problem convex or at least uniquely solvable.
(c) There are other ways to measure distances between meteorological fields that avoid or alleviate the double penalty error (depending on the smoothness of the fields). Have the authors compared their Wasserstein approach with other metrics, also given that the entropy regularised Wasserstein distances are not easy to calculate and optimise?
(d) On pg 6 the authors claim that ``our problem is not subject to the curse of dimensionality''. Although this is true, the curse of dimensionality (in the sense implied by the discussion here) is not actually a concern in 3D-Var either which is what the method should be compared with. So this remark is misleading.
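As background to comment (b), entropy regularisation adds a strictly convex term to the linear transport cost, so the regularised problem has a unique minimiser, which can be computed by the Sinkhorn iteration. The sketch below is a generic textbook version in plain Python. Whether the preprint uses exactly this solver is not stated on this page, and the histograms, the cost and the value of `eps` are assumptions chosen for illustration.

```python
import math


def sinkhorn(a, b, cost, eps, n_iter=500):
    """Entropy-regularised optimal transport plan between histograms a and b.

    Alternately rescales the rows and columns of the Gibbs kernel
    K = exp(-cost / eps) until both marginal constraints are met.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical equal-mass histograms on five points of [0, 1].
x = [i / 4 for i in range(5)]
cost = [[(xi - xj) ** 2 for xj in x] for xi in x]
a = [0.4, 0.3, 0.1, 0.1, 0.1]
b = [0.1, 0.1, 0.1, 0.3, 0.4]

plan = sinkhorn(a, b, cost, eps=0.25)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(5)) for j in range(5)]
transport_cost = sum(plan[i][j] * cost[i][j]
                     for i in range(5) for j in range(5))
```

The returned plan has strictly positive entries, which is the blurring effect of the regularisation; this sketch assumes equal total mass and does not address comment (a) on fields of unequal mass.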

Citation: https://doi.org/10.5194/egusphere-2023-2755-RC1
- AC1: 'Reply on RC1', Marc Bocquet, 26 Feb 2024
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-2755/egusphere-2023-2755-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2023-2755-AC1
RC2: 'Comment on egusphere-2023-2755', Anonymous Referee #2, 09 Mar 2024
The paper proposes a different objective function for the data assimilation problem, which is a mixed sum between the classic cost function and the Wasserstein metrics. With assumptions of strict convexity and linearity, the problem is convex. The paper then derives the optimality conditions for the new objective function by combining convex optimization analysis and the duality associated with the Wasserstein metric with entropy regularization. It is further supported by several 1D and 2D data assimilation problems.
My biggest complaint is the paper title. I think the title does not match the content of the paper. Based on the title, it sounds as if the paper provides a theoretical connection between classical data assimilation and OT. However, it is more like introducing an application of OT theory for data assimilation problems. I suggest the authors change the title of this paper to a more descriptive one.

The assumption in equation (11) is rather strong. H is linear, and all the costs are convex. That means the paper only deals with log-concave distributions, which does not apply to the multi-modal distributions that are challenging to handle. The baseline DA (17) is a strictly convex problem that can be solved easily. Even if one is worried about noise overfitting, many good methods already exist to handle this.
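To make the remark about the strictly convex baseline concrete, here is a minimal scalar sketch of the textbook 3D-Var/BLUE analysis. Equation (17) of the preprint is not reproduced on this page, so the quadratic cost below is the standard form rather than the authors' exact expression, and all variable names and numbers are assumptions.

```python
def blue_analysis(xb, b_var, y, r_var, h):
    """Scalar BLUE / 3D-Var analysis for a linear observation operator h.

    Minimises J(x) = (x - xb)^2 / (2 b_var) + (y - h x)^2 / (2 r_var).
    Returns the analysis state and the analysis error variance.
    """
    gain = b_var * h / (h * h * b_var + r_var)
    xa = xb + gain * (y - h * xb)
    pa = (1.0 - gain * h) * b_var
    return xa, pa


def cost_gradient(x, xb, b_var, y, r_var, h):
    """Derivative of the quadratic cost J with respect to x."""
    return (x - xb) / b_var - h * (y - h * x) / r_var


# Hypothetical numbers: an uncertain background and a more precise observation.
xb, b_var = 1.0, 4.0
y, r_var, h = 3.0, 1.0, 1.0

xa, pa = blue_analysis(xb, b_var, y, r_var, h)
grad_at_xa = cost_gradient(xa, xb, b_var, y, r_var, h)
```

The gradient vanishes at the analysis, which lies between the background and the observation and is weighted towards the more precise of the two; because the cost is strictly convex, this stationary point is the unique minimiser.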

Between (17) and (18), there are new introductions of x^b and x^o with Wasserstein metrics, turning the problem from strongly convex to a mixed problem. The motivation is not very clearly stated. The significantly increased computational cost associated with (18) has to be supported by very strong reasons. For example, what properties can we achieve by combining these two different cost functions?

Section 2.4 has too many details about the derivation that are standard steps in convex optimization. I suggest putting many in an appendix and only stating the main formula.

The numerical examples in Sections 3 & 4 are a bit too simple. Of course, 1D and 2D OT are not so costly. However, when the dimension becomes large, the extra two terms in (18) become increasingly cumbersome, and the computational cost becomes prohibitive. This work is intended for geoscience applications, which often have a high-dimensional state space.

Overall, I feel the paper title is too big of a summary for the paper, and the numerical examples, on the other hand, are elementary. While I can relate the formulation from (6) to (7), which is very neat and has a clear mathematical intuition, the hybrid sum in (18) seems to be a "cocktail" of two different metrics. Further understanding is necessary even if the authors don't plan on proving any mathematical properties.
Citation: https://doi.org/10.5194/egusphere-2023-2755-RC2
- AC2: 'Reply on RC2', Marc Bocquet, 13 Mar 2024
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-2755/egusphere-2023-2755-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2023-2755-AC2
EC1: 'Comment on egusphere-2023-2755', Olivier Talagrand, 13 Mar 2024

I thank the authors for their prompt response to the referees' comments, and I look forward to receiving the announced revised version of their paper. They may join to that new version any further comments they may have on the referees' reports or on the paper. I intend to submit the revised version to the two referees.

Citation: https://doi.org/10.5194/egusphere-2023-2755-EC1

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Marc Bocquet on behalf of the Authors (25 Mar 2024): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (29 Mar 2024) by Olivier Talagrand

RR by Anonymous Referee #2 (02 Apr 2024)

RR by Anonymous Referee #1 (29 Apr 2024)

Suggestions for revision or reasons for rejection

MS egusphere-2023-2755, revision 1
Title: Bridging classical data assimilation and optimal transport: The 3DVAR case
Authors: Bocquet et al

General Comments

The authors have addressed most of the comments, but from my point of view several issues remain. I leave it to the editor to decide whether these require another revision.

(1) Related to my previous comment 1 and the authors' response, I agree that the manuscript ultimately clarifies that the theory applies to fields and not probability distributions. But in the original version at least, this is not stated until Sec. 1.4 on pg 5. The new MS makes this somewhat clearer although I personally believe it should be said earlier and more prominently.

(2) Related to my previous comments 2 and 4c and what types of distances are able to ``cope'' with ``distortions'', I believe it depends on what one means by ``to cope'' and by ``distortions'', and I also believe that there is no simple binary answer. The authors do not analyse this important issue in any depth, which I feel is the main shortcoming of this paper.

(3) Related to previous comment (4a), it is still not clear to me how the authors deal with comparing fields that do not have the same mass.

(4) Related to previous comment (4b), I now see that the regularised problem is convex but as a matter of fact, the original problem is a linear program and thus convex, albeit potentially not strictly convex. This should be clarified. In addition, it cannot hurt to comment on why the KL approach is appropriate, given that it strongly penalises fields that are rather localised with respect to the measure \nu. Dealing with strongly localised fields was, as far as I understand, a motivation for the authors to propose the optimal transport methodology in the first place.
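The distinction drawn in point (4) can be written out in generic discrete notation. The symbols below (histograms $a$ and $b$, cost matrix $C$, transport plan $P$, regularisation strength $\varepsilon$ and reference measure $\nu$) follow common textbook usage and are assumptions; the manuscript's exact formulation may differ.

```latex
% Unregularised discrete optimal transport: a linear program in P,
% hence convex but not strictly convex.
W(a, b) = \min_{P \in U(a, b)} \sum_{i,j} C_{ij} P_{ij},
\qquad
U(a, b) = \{ P \ge 0 : P \mathbf{1} = a, \; P^{\top} \mathbf{1} = b \}.

% Entropy-regularised variant: strictly convex in P for \varepsilon > 0,
% so the minimiser is unique.
W_{\varepsilon}(a, b) = \min_{P \in U(a, b)} \sum_{i,j} C_{ij} P_{ij}
  + \varepsilon \, \mathrm{KL}(P \,\|\, \nu),
\qquad
\mathrm{KL}(P \,\|\, \nu) = \sum_{i,j}
  \left( P_{ij} \ln \frac{P_{ij}}{\nu_{ij}} - P_{ij} + \nu_{ij} \right).
```

The KL term grows as $P$ concentrates relative to $\nu$, which is the penalty on strongly localised fields that the referee refers to.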

ED: Reconsider after major revisions (further review by editor and referees) (07 May 2024) by Olivier Talagrand

AR by Marc Bocquet on behalf of the Authors (17 May 2024): Author's response, Author's tracked changes, Manuscript

ED: Publish subject to technical corrections (21 May 2024) by Olivier Talagrand

AR by Marc Bocquet on behalf of the Authors (22 May 2024): Author's response, Manuscript

Viewed

Total article views: 631 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
399	198	34	631	31	24

Views and downloads (calculated since 05 Dec 2023)

Month	HTML	PDF	XML	Total
Dec 2023	124	58	8	190
Jan 2024	59	16	4	79
Feb 2024	73	27	4	104
Mar 2024	50	27	8	85
Apr 2024	17	24	4	45
May 2024	32	26	3	61
Jun 2024	41	15	2	58
Jul 2024	3	5	1	9

Viewed (geographical distribution)

Total article views: 623 (including HTML, PDF, and XML) Thereof 623 with geography defined and 0 with unknown origin.

Latest update: 12 Jul 2024

Short summary

A novel approach, the Optimal Transport Data Assimilation, is introduced to merge data assimilation and optimal transport concepts. By leveraging optimal transport's displacement interpolation in space, it minimises mislocation errors within data assimilation applied to physical fields, such as water vapour, hydrometeors, chemical species, etc. Its richness and flexibility are showcased through one- and two-dimensional illustrations.