the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Technical note: Posterior Uncertainty Estimation via a Monte Carlo Procedure Specialized for Data Assimilation
Abstract. Through the Bayesian lens of data assimilation, uncertainty on model parameters is traditionally quantified through the posterior covariance matrix. However, in modern settings involving highdimensional and computationally expensive forward models, posterior covariance knowledge must be relaxed to deterministic or stochastic approximations. In the carbon flux inversion literature, Chevallier et al. (2007) proposed a stochastic method capable of approximating posterior variances of linear functionals of the model parameters that is particularly wellsuited for largescale Earthsystem data assimilation tasks. This note formalizes this algorithm and clarifies its properties. We provide a formal statement of the algorithm, demonstrate why it converges to the desired posterior variance quantity of interest, and provide additional uncertainty quantification allowing incorporation of the Monte Carlo sampling uncertainty into the method's Bayesian credible intervals. The methodology is demonstrated using toy simulations and a realistic carbon flux inversion observing system simulation experiment.
 Preprint
(2603 KB)  Metadata XML
 BibTeX
 EndNote
Status: final response (author comments only)

RC1: 'Comment on egusphere20232675', Anonymous Referee #1, 25 Jan 2024
In this technical note the authors present an analysis of a Monte Carlo method proposed by Chevallier et al. (2007) for approximating posterior variances of functionals of trace gas flux fields. The method involves drawing unconditional samples from both the observation distribution and the prior flux distribution, and substituting those in to the optimal flux estimation procedure in order to generate samples of the flux. They provide an analysis of the method, showing that these samples have the same covariance as the true posterior variance.
The method has been around for a long time without being formally checked and it is good that the authors have analysed its properties. The analysis in the note has been done well and thoroughly. What the authors may not know is that the method, or something very similar to it, has already been used in the geostatistics field as a strategy for turning unconditional random field simulations into conditional simulations; see Chapter 2, Section 3.6.2 of Cressie (1993), or Chapter 7, Section 7.3.1 of Chiles and Delfiner (2012), who attributed the method to Matheron. In the geostatistics setting, one has a random field Y(.) observed at locations s_1, ..., s_n, and conditional simulations are desired for one or more unobserved locations. For illustration, consider a single unknown location s*. In the method, an unconditional simulation is made of (Y_NS(s_1), ..., Y_NS(s_n))', where Y_NS(.) is another spatial field with identical covariance to Y(.) but independent of it. Using this simulation, the kriging prediction of Y_NS(s*) is calculated; this can be shown to have the same variance as the kriging prediction of Y(s*). When Y(.) is Gaussian, it can be shown that the predictor, being linear, is also Gaussian. More details can be found in the references.
To map this to the data assimilation case and also to the notation of the paper: a prediction is sought of c from the observations y, which are related through E(y) = A_\mu c. The joint distribution in question is z = (y',c')', where c is unobserved, and due to the Gaussian assumption for y and c the optimal (MAP) prediction of c given y is equivalent to the kriging prediction. Unlike the original method due to Matheron, in both Chevallier et al. (2007) and the present note the joint distribution over the unconditional simulations, (y_k', c_k')', is not exactly the same as (y',c')'. However, the method is essentially the same: unconditional simulations y_k and c_k with appropriate covariances are substituted into the kriging predictor for c (which is computed using gradientbased methods rather than analytically).
I think it is very important that the authors draw the connections to this prior work, even if it was rediscovered independently by Chevallier et al. (2007). This may also point towards generalisations of the method, because the original method was not constructed under a Gaussian assumption but it can still be used to estimate prediction covariances.
Beyond this major point, I think the authors have done a good job in documenting the method and in giving practical guidance for its use. My main criticism is that the mathematical framework used is too strongly tied to flux inversion rather than a broader class of data assimulation methods as promised in the title and abstract. I discuss this further in my detailed comments below. Similarly, the circumstances under which the method are useful are not really spelled out explicitly; this would also increase the applicability of the method and the note.
Detailed comments:
 In Section 2.1:
 As I wrote above, I think here it is not as clear here as it could be that the procedure is not limited to flux inversion, but can work for any system for which (i) the mean of the observations are a linear function of the unknown parameters, (ii) the prior and observation distributions are Gaussian, and (iii) the posterior mode/posterior mean can be computed efficiently. Currently, the section is very tailored towards flux inversion, which I think does the generality of the method a disservice. An example of this is the use of the scaling factor parameterisation c o \mu, commonly used in flux inversion, which must be carried through the derivations but which has essentially nothing to do with the underlying Monte Carlo procedure. I think this section could be substantially simplified and made more general at the same time, and the details of the parameterisation could be deferred to Section 3.
 On the topic of generality, the simplification B = b^2 I_m is introduced here that are not necessary for the mathematical results. Can this be left until section 3?
 The situation in which the Monte Carlo method is applicable is the one where the posterior mean can be computed efficiently. In the flux inversion case, this arises due to the equivalence of the posterior mean and mode due to the Gaussian assumption, so that efficient gradient based methods that target the posterior mode can be used to compute the posterior mean (which one would typically compute analytically). The feasibility of this computation is critical to whether it's worth using this method, but as it stands this doesn't come out clearly in the note.
 In Section 2.2:
 For the unconditional simulation the authors choose y_k ~ N(A\mu,R) and, as the authors show, this gives the appropriate covariance to the samples c_MAP^k . However, Chevallier et al. (2007) took the mean of y_k to be y, and as a consequence had E(c_MAP^k) = \alpha. In that case, the distribution of c_MAP^k is precisely equal to the posterior distribution of c given y, which is elegant. In practice, any value can be chosen for the mean of y. There’s no need to change this choice, but the difference to Chevallier and the freedom of this choice should be noted.
 Since it follows immediately from linearity, it may be worthing stating here that c_MAP^k is Gaussian distributed, rather than deferring this to Section 2.3.
References
Chevallier, F., Breon, F.M., and Rayner, P. J. (2007). Contribution of the Orbiting Carbon Observatory to the estimation of CO2 sources and sinks: Theoretical study in a variational data assimilation framework. Journal of Geophysical Research: Atmospheres, 112(D9).
Chiles, J.P. and Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty, 2nd edition. John Wiley & Sons, New York, NY.
Cressie, N. (1993). Statistics for Spatial Data, rev. edn. John Wiley & Sons, New York, NY.Citation: https://doi.org/10.5194/egusphere20232675RC1  In Section 2.1:

RC2: 'Comment on egusphere20232675', Anonymous Referee #2, 19 Feb 2024
Review on “Technical Note: Posterior Uncertainty Estimation via a Monte Carlo Procedure Specialized for Data Assimilation”
In this technical note, the method used by Chevalier et al., 2007, in which the posterior uncertainty estimate is approximated by an ensemble of 5 perturbed model inversions, is formalized. The article further details the derivation on how the method provides information about an “uncertainty of the uncertainty” estimate. The article demonstrates the applicability to a simple two dimensional model to proof the general concept and to an OSSE in the field of carbon flux estimation.
I congratulate the authors to a well written article. The mathematical formulation appears sound and the description of the method is proper.
My main concern is the strong focus on CO2 flux inversions. The method’s derivation does not require the link to CO2 at all, thus I would appreciate a more general description of the method. E.g., in line 98 ff., the observations are described as X_{CO2} retrievals although any observations should be feasible for the DA application. I suggest to rewrite the section 2 by avoiding the link to CO2 flux inversions. Further, a detailed discussion on the properties of the method is needed. For example, the 4Dvar inversion method is described as it is important for the MC method. From what I can see, any inversion method should be applicable in this context. Further, the requirements of the method remain unclear. Linear models and observation operators are applied in the demonstrator simulations as well as in the derivation of the method. In many field of application (e.g., NWP, air quality), this linear model is not applicable and the use of tangentlinear models is established. Would the method be also applicable in these areas? Finally, I suggest to add a discussion on the ensemble type to which the method can be applied. E.g., does the method also work for multimodel or multiparameter ensembles, which may contain more sources of uncertainty than only the flux scaling factors.
I recommend publication after the addition of some critical discussion on the previous points. Further, see below for some detailed comments.
General comments:
 please add a short description of particle filter methods, that provide uncertainty estimates without the need for linear model or Gaussian error statistic, in the introduction
Specific comments:
 line: 24: put “infeasible” to the end of the sentence
 line 39: change “(Vasco et al., 1993)” to “Vasco et al. (1993)”; please revise the article and change other intext citations accordingly
 line 68: The second statement on the UQ method (i.e., its independence on the prior biases) appears unproofed. Please include a dedicated paragraph to discuss the algorithms properties.
 line 339: This is not clear to me. The derivation of the MC algorithm aims at defining a formula for the posterior uncertainty estimation. Why can’t this estimate be used?
 line 367: The uncertainty obtained with the ensemble approach is an estimate of the posterior uncertainty. Thus, this statement is not clear to me. Why should the uncertainty of the approach be less than the posterior?.
 line 375: please define [x]_i is element I of vector x
 line 382: change [ ]_j to [ ]_i
 table 3: the number of ensemble members compared to the dimension of the problem is unrealistically large. Please provide a discussion on how the size of the ensemble improves the estimate of the MAP and its uncertainty. Specifically, how does the error in your estimate grow when taking only 10100 samples of the prior PDF?
Citation: https://doi.org/10.5194/egusphere20232675RC2
Model code and software
GEOSChem Adjoint D. Henze et al. http://wiki.seas.harvard.edu/geoschem/index.php/GEOSChem_Adjoint
Viewed
HTML  XML  Total  BibTeX  EndNote  

134  31  11  176  5  4 
 HTML: 134
 PDF: 31
 XML: 11
 Total: 176
 BibTeX: 5
 EndNote: 4
Viewed (geographical distribution)
Country  #  Views  % 

Total:  0 
HTML:  0 
PDF:  0 
XML:  0 
 1