the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Technical note: Posterior Uncertainty Estimation via a Monte Carlo Procedure Specialized for Data Assimilation
Abstract. Through the Bayesian lens of data assimilation, uncertainty on model parameters is traditionally quantified through the posterior covariance matrix. However, in modern settings involving highdimensional and computationally expensive forward models, posterior covariance knowledge must be relaxed to deterministic or stochastic approximations. In the carbon flux inversion literature, Chevallier et al. (2007) proposed a stochastic method capable of approximating posterior variances of linear functionals of the model parameters that is particularly wellsuited for largescale Earthsystem data assimilation tasks. This note formalizes this algorithm and clarifies its properties. We provide a formal statement of the algorithm, demonstrate why it converges to the desired posterior variance quantity of interest, and provide additional uncertainty quantification allowing incorporation of the Monte Carlo sampling uncertainty into the method's Bayesian credible intervals. The methodology is demonstrated using toy simulations and a realistic carbon flux inversion observing system simulation experiment.

Notice on discussion status
The requested preprint has a corresponding peerreviewed final revised paper. You are encouraged to refer to the final revised version.

Preprint
(2603 KB)

The requested preprint has a corresponding peerreviewed final revised paper. You are encouraged to refer to the final revised version.
 Preprint
(2603 KB)  Metadata XML
 BibTeX
 EndNote
 Final revised paper
Journal article(s) based on this preprint
Interactive discussion
Status: closed

RC1: 'Comment on egusphere20232675', Anonymous Referee #1, 25 Jan 2024
In this technical note the authors present an analysis of a Monte Carlo method proposed by Chevallier et al. (2007) for approximating posterior variances of functionals of trace gas flux fields. The method involves drawing unconditional samples from both the observation distribution and the prior flux distribution, and substituting those in to the optimal flux estimation procedure in order to generate samples of the flux. They provide an analysis of the method, showing that these samples have the same covariance as the true posterior variance.
The method has been around for a long time without being formally checked and it is good that the authors have analysed its properties. The analysis in the note has been done well and thoroughly. What the authors may not know is that the method, or something very similar to it, has already been used in the geostatistics field as a strategy for turning unconditional random field simulations into conditional simulations; see Chapter 2, Section 3.6.2 of Cressie (1993), or Chapter 7, Section 7.3.1 of Chiles and Delfiner (2012), who attributed the method to Matheron. In the geostatistics setting, one has a random field Y(.) observed at locations s_1, ..., s_n, and conditional simulations are desired for one or more unobserved locations. For illustration, consider a single unknown location s*. In the method, an unconditional simulation is made of (Y_NS(s_1), ..., Y_NS(s_n))', where Y_NS(.) is another spatial field with identical covariance to Y(.) but independent of it. Using this simulation, the kriging prediction of Y_NS(s*) is calculated; this can be shown to have the same variance as the kriging prediction of Y(s*). When Y(.) is Gaussian, it can be shown that the predictor, being linear, is also Gaussian. More details can be found in the references.
To map this to the data assimilation case and also to the notation of the paper: a prediction is sought of c from the observations y, which are related through E(y) = A_\mu c. The joint distribution in question is z = (y',c')', where c is unobserved, and due to the Gaussian assumption for y and c the optimal (MAP) prediction of c given y is equivalent to the kriging prediction. Unlike the original method due to Matheron, in both Chevallier et al. (2007) and the present note the joint distribution over the unconditional simulations, (y_k', c_k')', is not exactly the same as (y',c')'. However, the method is essentially the same: unconditional simulations y_k and c_k with appropriate covariances are substituted into the kriging predictor for c (which is computed using gradientbased methods rather than analytically).
I think it is very important that the authors draw the connections to this prior work, even if it was rediscovered independently by Chevallier et al. (2007). This may also point towards generalisations of the method, because the original method was not constructed under a Gaussian assumption but it can still be used to estimate prediction covariances.
Beyond this major point, I think the authors have done a good job in documenting the method and in giving practical guidance for its use. My main criticism is that the mathematical framework used is too strongly tied to flux inversion rather than a broader class of data assimulation methods as promised in the title and abstract. I discuss this further in my detailed comments below. Similarly, the circumstances under which the method are useful are not really spelled out explicitly; this would also increase the applicability of the method and the note.
Detailed comments:
 In Section 2.1:
 As I wrote above, I think here it is not as clear here as it could be that the procedure is not limited to flux inversion, but can work for any system for which (i) the mean of the observations are a linear function of the unknown parameters, (ii) the prior and observation distributions are Gaussian, and (iii) the posterior mode/posterior mean can be computed efficiently. Currently, the section is very tailored towards flux inversion, which I think does the generality of the method a disservice. An example of this is the use of the scaling factor parameterisation c o \mu, commonly used in flux inversion, which must be carried through the derivations but which has essentially nothing to do with the underlying Monte Carlo procedure. I think this section could be substantially simplified and made more general at the same time, and the details of the parameterisation could be deferred to Section 3.
 On the topic of generality, the simplification B = b^2 I_m is introduced here that are not necessary for the mathematical results. Can this be left until section 3?
 The situation in which the Monte Carlo method is applicable is the one where the posterior mean can be computed efficiently. In the flux inversion case, this arises due to the equivalence of the posterior mean and mode due to the Gaussian assumption, so that efficient gradient based methods that target the posterior mode can be used to compute the posterior mean (which one would typically compute analytically). The feasibility of this computation is critical to whether it's worth using this method, but as it stands this doesn't come out clearly in the note.
 In Section 2.2:
 For the unconditional simulation the authors choose y_k ~ N(A\mu,R) and, as the authors show, this gives the appropriate covariance to the samples c_MAP^k . However, Chevallier et al. (2007) took the mean of y_k to be y, and as a consequence had E(c_MAP^k) = \alpha. In that case, the distribution of c_MAP^k is precisely equal to the posterior distribution of c given y, which is elegant. In practice, any value can be chosen for the mean of y. There’s no need to change this choice, but the difference to Chevallier and the freedom of this choice should be noted.
 Since it follows immediately from linearity, it may be worthing stating here that c_MAP^k is Gaussian distributed, rather than deferring this to Section 2.3.
References
Chevallier, F., Breon, F.M., and Rayner, P. J. (2007). Contribution of the Orbiting Carbon Observatory to the estimation of CO2 sources and sinks: Theoretical study in a variational data assimilation framework. Journal of Geophysical Research: Atmospheres, 112(D9).
Chiles, J.P. and Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty, 2nd edition. John Wiley & Sons, New York, NY.
Cressie, N. (1993). Statistics for Spatial Data, rev. edn. John Wiley & Sons, New York, NY.Citation: https://doi.org/10.5194/egusphere20232675RC1 
AC1: 'Reply on RC1', Michael Stanley, 01 Apr 2024
Dear Referee 1,
Thank you for your review and detailed comments. We found the spatial statistics connection particularly illuminating. Please see below for a pointbypoint response. Your original comments are in bold/italics with our responses immediately following.
What the authors may not know is that the method, or something very similar to it, has already been used in the geostatistics field as a strategy for turning unconditional random field simulations into conditional simulations… This may also point towards generalisations of the method, because the original method was not constructed under a Gaussian assumption but it can still be used to estimate prediction covariances.
Again, thank you for making this connection! This sampling method appears to be very similar to the one we discuss. In particular, both methods rely upon a covariance equivalence between the distribution of the conditional distribution of interest and the simulator to which one has access. As such, we have added your provided geostatistics references in Section 2.2, after articulating the necessity of the covariance equivalence. We also added a note in the Conclusion to articulate your point about this connection providing possible generalization beyond the Gaussian/linear assumptions. This is a good point, but we add that demonstrating the covariance equivalence would be the key to making this connection work in general. The Gaussian/linear assumptions allow us to prove this equivalence in this paper, but it is unclear how the proof would proceed otherwise.
My main criticism is that the mathematical framework used is too strongly tied to flux inversion rather than a broader class of data assim[i]lation methods as promised in the title and abstract…Currently, the section is very tailored towards flux inversion, which I think does the generality of the method a disservice… An example of this is the use of the scaling factor parameterisation c o \mu, commonly used in flux inversion, which must be carried through the derivations but which has essentially nothing to do with the underlying Monte Carlo procedure. I think this section could be substantially simplified and made more general at the same time, and the details of the parameterisation could be deferred to Section 3.
We agree that the original mathematical setup in Section 2 was too narrowly focused on the scaling factor specifics of the carbon flux inversion application. As such, we have reformulated Section 2 and the lowdimensional numerical example in Section 3 to not include the scaling factors. This reformulation has streamlined the mathematical presentation in Section 2 and defers the scaling factor formulation to Section 3, as suggested.
Regarding the applicability of this method to a broader class of data assimilation methods, we have adjusted the title and language throughout to specify that we are analyzing this method for 4DVar data assimilation. Of course, understanding the broader applicability of this method to other data assimilation methods would be interesting, but we believe it is beyond the scope of this technical note.
On the topic of generality, the simplification B = b^2 I_m is introduced here that are not necessary for the mathematical results. Can this be left until section 3?
Following the broader presentation adjustment of Section 2, we have moved this particular simplification to Section 3 for the carbon flux inversion example which uses this simplification.
The situation in which the Monte Carlo method is applicable is the one where the posterior mean can be computed efficiently. In the flux inversion case, this arises due to the equivalence of the posterior mean and mode due to the Gaussian assumption, so that efficient gradient based methods that target the posterior mode can be used to compute the posterior mean (which one would typically compute analytically). The feasibility of this computation is critical to whether it's worth using this method, but as it stands this doesn't come out clearly in the note.
To address the point of method assumptions and potential generalizations, we have added Section 2.2.1 (Considerations for when f is nonlinear and other critical assumptions). This section articulates the efficient computation assumption above in addition to the other Gaussian/linear assumptions and their necessity in our results. Regarding the linearity assumption, we provide two potential strategies to apply this method when the forward model is nonlinear.
For the unconditional simulation the authors choose y_k ~ N(A\mu,R) and, as the authors show, this gives the appropriate covariance to the samples c_MAP^k . However, Chevallier et al. (2007) took the mean of y_k to be y, and as a consequence had E(c_MAP^k) = \alpha. In that case, the distribution of c_MAP^k is precisely equal to the posterior distribution of c given y, which is elegant. In practice, any value can be chosen for the mean of y. There’s no need to change this choice, but the difference to Chevallier and the freedom of this choice should be noted.
Chevallier et al. (2007) indeed choose the mean of the unconditional observation distribution to be the “true” unperturbed observation (i.e., the forward model evaluated at the true state). We have added mention of this original choice in Section 2.2, but also make the point that since the the goal is to sample from a procedure with the correct covariance structure and this choice does not impact the covariance of the MAP estimator, this choice does not affect the validity of the uncertainties.
Since it follows immediately from linearity, it may be worthing stating here that c_MAP^k is Gaussian distributed, rather than deferring this to Section 2.3.
We have moved this point that the MAP estimator is Gaussian to Section 2.2, directly after its definition (Equation 11).
We thank you again for your feedback which helped us improve the manuscript!
Sincerely,
Michael Stanley, Mikael Kuusela, Brendan Byrne and Junjie Liu
Citation: https://doi.org/10.5194/egusphere20232675AC1
 In Section 2.1:

RC2: 'Comment on egusphere20232675', Anonymous Referee #2, 19 Feb 2024
Review on “Technical Note: Posterior Uncertainty Estimation via a Monte Carlo Procedure Specialized for Data Assimilation”
In this technical note, the method used by Chevalier et al., 2007, in which the posterior uncertainty estimate is approximated by an ensemble of 5 perturbed model inversions, is formalized. The article further details the derivation on how the method provides information about an “uncertainty of the uncertainty” estimate. The article demonstrates the applicability to a simple two dimensional model to proof the general concept and to an OSSE in the field of carbon flux estimation.
I congratulate the authors to a well written article. The mathematical formulation appears sound and the description of the method is proper.
My main concern is the strong focus on CO2 flux inversions. The method’s derivation does not require the link to CO2 at all, thus I would appreciate a more general description of the method. E.g., in line 98 ff., the observations are described as X_{CO2} retrievals although any observations should be feasible for the DA application. I suggest to rewrite the section 2 by avoiding the link to CO2 flux inversions. Further, a detailed discussion on the properties of the method is needed. For example, the 4Dvar inversion method is described as it is important for the MC method. From what I can see, any inversion method should be applicable in this context. Further, the requirements of the method remain unclear. Linear models and observation operators are applied in the demonstrator simulations as well as in the derivation of the method. In many field of application (e.g., NWP, air quality), this linear model is not applicable and the use of tangentlinear models is established. Would the method be also applicable in these areas? Finally, I suggest to add a discussion on the ensemble type to which the method can be applied. E.g., does the method also work for multimodel or multiparameter ensembles, which may contain more sources of uncertainty than only the flux scaling factors.
I recommend publication after the addition of some critical discussion on the previous points. Further, see below for some detailed comments.
General comments:
 please add a short description of particle filter methods, that provide uncertainty estimates without the need for linear model or Gaussian error statistic, in the introduction
Specific comments:
 line: 24: put “infeasible” to the end of the sentence
 line 39: change “(Vasco et al., 1993)” to “Vasco et al. (1993)”; please revise the article and change other intext citations accordingly
 line 68: The second statement on the UQ method (i.e., its independence on the prior biases) appears unproofed. Please include a dedicated paragraph to discuss the algorithms properties.
 line 339: This is not clear to me. The derivation of the MC algorithm aims at defining a formula for the posterior uncertainty estimation. Why can’t this estimate be used?
 line 367: The uncertainty obtained with the ensemble approach is an estimate of the posterior uncertainty. Thus, this statement is not clear to me. Why should the uncertainty of the approach be less than the posterior?.
 line 375: please define [x]_i is element I of vector x
 line 382: change [ ]_j to [ ]_i
 table 3: the number of ensemble members compared to the dimension of the problem is unrealistically large. Please provide a discussion on how the size of the ensemble improves the estimate of the MAP and its uncertainty. Specifically, how does the error in your estimate grow when taking only 10100 samples of the prior PDF?
Citation: https://doi.org/10.5194/egusphere20232675RC2 
AC2: 'Reply on RC2', Michael Stanley, 01 Apr 2024
Dear Referee 2,
Thank you for your review and detailed comments. We greatly appreciate your comments on the framing of the method and your suggestion to add more discussion on the method’s properties. Please see below for a pointbypoint response. Your original comments are in bold/italics with our responses immediately following.
My main concern is the strong focus on CO2 flux inversions. The method’s derivation does not require the link to CO2 at all, thus I would appreciate a more general description of the method. E.g., in line 98 ff., the observations are described as X_{CO2} retrievals although any observations should be feasible for the DA application. I suggest to rewrite the section 2 by avoiding the link to CO2 flux inversions.
We agree that the mathematical presentation in Section 2 was too strongly linked to the original carbon flux application. As such, we have reformulated Section 2 to a more general presentation by removing the setup based on scaling factors and references to the variables only in the carbon flux inversion context (e.g., the description of the observation vector). The rewrite has substantially streamlined the mathematical presentation and sharpened our main points.
Further, a detailed discussion on the properties of the method is needed. For example, the 4Dvar inversion method is described as it is important for the MC method. From what I can see, any inversion method should be applicable in this context. Further, the requirements of the method remain unclear. Linear models and observation operators are applied in the demonstrator simulations as well as in the derivation of the method. In many field of application (e.g., NWP, air quality), this linear model is not applicable and the use of tangentlinear models is established. Would the method be also applicable in these areas?
While it is possible that this Monte Carlo method is applicable to other data assimilation methods, we have changed the title and language throughout to specify that this technical note is discussing its applicability to 4DVar data assimilation. However, this broader applicability is interesting and we have added a note about it in the conclusion. Within that context, we agree that the original version of this technical note lacked sufficient discussion around the method assumptions and properties. As such, we have added Section 2.2.1 (Considerations for when f is nonlinear and other critical assumptions) to address these points. In particular, we discuss some options of how this method can still be applicable when the forward model is nonlinear. We additionally included a discussion regarding the covariance equivalence property that must hold for this method to work and how there are potential generalizations that can be made via a similar sampling method from the geostatistics literature.
Finally, I suggest to add a discussion on the ensemble type to which the method can be applied. E.g., does the method also work for multimodel or multiparameter ensembles, which may contain more sources of uncertainty than only the flux scaling factors.
The applicability of this method to other ensemble types to account for other types of uncertainty is a great question. Although we believe it is beyond the scope of this technical note to thoroughly answer that question, we have added a brief note addressing it in the conclusion.
[P]lease add a short description of particle filter methods, that provide uncertainty estimates without the need for linear model or Gaussian error statistic, in the introduction.
Since we have amended this technical note to specifically address the scenario of UQ with 4DVar data assimilation, we have addressed the possibility of relaxing the linear and Gaussian assumptions both by the inclusion of Section 2.2.1, and the connection to the geostatistics literature in Section 2.2. However, since we agree that mentioning particle filtering methods nicely rounds out the landscape of UQ methods for data assimilation, we have added a brief note in the Introduction on their utility in relaxing model and error assumptions.
[L]ine: 24: put “infeasible” to the end of the sentence.
This has been fixed.
[L]ine 39: change “(Vasco et al., 1993)” to “Vasco et al. (1993)”; please revise the article and change other intext citations accordingly.
We have made this change and done a scan of the paper to look for other similar instances.
[L]ine 68: The second statement on the UQ method (i.e., its independence on the prior biases) appears unproofed. Please include a dedicated paragraph to discuss the algorithms properties.
We note in the derivation of Equation 12 that the choice of the prior mean does not affect the covariance of the MAP estimator, which is the primary equivalence required to make the method work correctly. This point is made explicitly in line 152 and is seen by the proof of the covariance equivalence.
[L]ine 339: This is not clear to me. The derivation of the MC algorithm aims at defining a formula for the posterior uncertainty estimation. Why can’t this estimate be used?
The Monte Carlo procedure only provides an estimate of the posterior variance. The true posterior variance is unknown due to the forward model being only implicitly defined via a computational simulator (so it is for example not possible to evaluate Eq. (7) directly).
[L]ine 367: The uncertainty obtained with the ensemble approach is an estimate of the posterior uncertainty. Thus, this statement is not clear to me. Why should the uncertainty of the approach be less than the posterior?
Since there is uncertainty in our estimate of the posterior uncertainty from the Monte Carlo procedure, we ideally want to sample enough ensemble elements such that the uncertainty from the Monte Carlo procedure is sufficiently smaller than the actual posterior uncertainty. Otherwise, it is impossible to distinguish if the length of the resulting interval is primarily coming from the real posterior variance or the Monte Carlo uncertainty.
line 375: please define [x]_i is element I of vector x/line 382: change [ ]_j to [ ]_i
These changes have been made.
Table 3: the number of ensemble members compared to the dimension of the problem is unrealistically large. Please provide a discussion on how the size of the ensemble improves the estimate of the MAP and its uncertainty. Specifically, how does the error in your estimate grow when taking only 10100 samples of the prior PDF?
We have addressed this in two ways. One, we have updated the experiment to have several orders of magnitude fewer samples to reduce the difference between the parameter dimension and the ensemble size. Two, we have included a new figure (Figure 1) showing how the discrepancy between the estimated covariance matrix and the true covariance matrix diminishes as a function of ensemble size.
We thank you again for your feedback which helped us improve the manuscript!
Sincerely,
Michael Stanley, Mikael Kuusela, Brendan Byrne and Junjie Liu
Citation: https://doi.org/10.5194/egusphere20232675AC2

AC2: 'Reply on RC2', Michael Stanley, 01 Apr 2024
Interactive discussion
Status: closed

RC1: 'Comment on egusphere20232675', Anonymous Referee #1, 25 Jan 2024
In this technical note the authors present an analysis of a Monte Carlo method proposed by Chevallier et al. (2007) for approximating posterior variances of functionals of trace gas flux fields. The method involves drawing unconditional samples from both the observation distribution and the prior flux distribution, and substituting those in to the optimal flux estimation procedure in order to generate samples of the flux. They provide an analysis of the method, showing that these samples have the same covariance as the true posterior variance.
The method has been around for a long time without being formally checked and it is good that the authors have analysed its properties. The analysis in the note has been done well and thoroughly. What the authors may not know is that the method, or something very similar to it, has already been used in the geostatistics field as a strategy for turning unconditional random field simulations into conditional simulations; see Chapter 2, Section 3.6.2 of Cressie (1993), or Chapter 7, Section 7.3.1 of Chiles and Delfiner (2012), who attributed the method to Matheron. In the geostatistics setting, one has a random field Y(.) observed at locations s_1, ..., s_n, and conditional simulations are desired for one or more unobserved locations. For illustration, consider a single unknown location s*. In the method, an unconditional simulation is made of (Y_NS(s_1), ..., Y_NS(s_n))', where Y_NS(.) is another spatial field with identical covariance to Y(.) but independent of it. Using this simulation, the kriging prediction of Y_NS(s*) is calculated; this can be shown to have the same variance as the kriging prediction of Y(s*). When Y(.) is Gaussian, it can be shown that the predictor, being linear, is also Gaussian. More details can be found in the references.
To map this to the data assimilation case and also to the notation of the paper: a prediction is sought of c from the observations y, which are related through E(y) = A_\mu c. The joint distribution in question is z = (y',c')', where c is unobserved, and due to the Gaussian assumption for y and c the optimal (MAP) prediction of c given y is equivalent to the kriging prediction. Unlike the original method due to Matheron, in both Chevallier et al. (2007) and the present note the joint distribution over the unconditional simulations, (y_k', c_k')', is not exactly the same as (y',c')'. However, the method is essentially the same: unconditional simulations y_k and c_k with appropriate covariances are substituted into the kriging predictor for c (which is computed using gradientbased methods rather than analytically).
I think it is very important that the authors draw the connections to this prior work, even if it was rediscovered independently by Chevallier et al. (2007). This may also point towards generalisations of the method, because the original method was not constructed under a Gaussian assumption but it can still be used to estimate prediction covariances.
Beyond this major point, I think the authors have done a good job in documenting the method and in giving practical guidance for its use. My main criticism is that the mathematical framework used is too strongly tied to flux inversion rather than a broader class of data assimulation methods as promised in the title and abstract. I discuss this further in my detailed comments below. Similarly, the circumstances under which the method are useful are not really spelled out explicitly; this would also increase the applicability of the method and the note.
Detailed comments:
 In Section 2.1:
 As I wrote above, I think here it is not as clear here as it could be that the procedure is not limited to flux inversion, but can work for any system for which (i) the mean of the observations are a linear function of the unknown parameters, (ii) the prior and observation distributions are Gaussian, and (iii) the posterior mode/posterior mean can be computed efficiently. Currently, the section is very tailored towards flux inversion, which I think does the generality of the method a disservice. An example of this is the use of the scaling factor parameterisation c o \mu, commonly used in flux inversion, which must be carried through the derivations but which has essentially nothing to do with the underlying Monte Carlo procedure. I think this section could be substantially simplified and made more general at the same time, and the details of the parameterisation could be deferred to Section 3.
 On the topic of generality, the simplification B = b^2 I_m is introduced here that are not necessary for the mathematical results. Can this be left until section 3?
 The situation in which the Monte Carlo method is applicable is the one where the posterior mean can be computed efficiently. In the flux inversion case, this arises due to the equivalence of the posterior mean and mode due to the Gaussian assumption, so that efficient gradient based methods that target the posterior mode can be used to compute the posterior mean (which one would typically compute analytically). The feasibility of this computation is critical to whether it's worth using this method, but as it stands this doesn't come out clearly in the note.
 In Section 2.2:
 For the unconditional simulation the authors choose y_k ~ N(A\mu,R) and, as the authors show, this gives the appropriate covariance to the samples c_MAP^k . However, Chevallier et al. (2007) took the mean of y_k to be y, and as a consequence had E(c_MAP^k) = \alpha. In that case, the distribution of c_MAP^k is precisely equal to the posterior distribution of c given y, which is elegant. In practice, any value can be chosen for the mean of y. There’s no need to change this choice, but the difference to Chevallier and the freedom of this choice should be noted.
 Since it follows immediately from linearity, it may be worthing stating here that c_MAP^k is Gaussian distributed, rather than deferring this to Section 2.3.
References
Chevallier, F., Breon, F.M., and Rayner, P. J. (2007). Contribution of the Orbiting Carbon Observatory to the estimation of CO2 sources and sinks: Theoretical study in a variational data assimilation framework. Journal of Geophysical Research: Atmospheres, 112(D9).
Chiles, J.P. and Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty, 2nd edition. John Wiley & Sons, New York, NY.
Cressie, N. (1993). Statistics for Spatial Data, rev. edn. John Wiley & Sons, New York, NY.Citation: https://doi.org/10.5194/egusphere20232675RC1 
AC1: 'Reply on RC1', Michael Stanley, 01 Apr 2024
Dear Referee 1,
Thank you for your review and detailed comments. We found the spatial statistics connection particularly illuminating. Please see below for a pointbypoint response. Your original comments are in bold/italics with our responses immediately following.
What the authors may not know is that the method, or something very similar to it, has already been used in the geostatistics field as a strategy for turning unconditional random field simulations into conditional simulations… This may also point towards generalisations of the method, because the original method was not constructed under a Gaussian assumption but it can still be used to estimate prediction covariances.
Again, thank you for making this connection! This sampling method appears to be very similar to the one we discuss. In particular, both methods rely upon a covariance equivalence between the distribution of the conditional distribution of interest and the simulator to which one has access. As such, we have added your provided geostatistics references in Section 2.2, after articulating the necessity of the covariance equivalence. We also added a note in the Conclusion to articulate your point about this connection providing possible generalization beyond the Gaussian/linear assumptions. This is a good point, but we add that demonstrating the covariance equivalence would be the key to making this connection work in general. The Gaussian/linear assumptions allow us to prove this equivalence in this paper, but it is unclear how the proof would proceed otherwise.
My main criticism is that the mathematical framework used is too strongly tied to flux inversion rather than a broader class of data assim[i]lation methods as promised in the title and abstract…Currently, the section is very tailored towards flux inversion, which I think does the generality of the method a disservice… An example of this is the use of the scaling factor parameterisation c o \mu, commonly used in flux inversion, which must be carried through the derivations but which has essentially nothing to do with the underlying Monte Carlo procedure. I think this section could be substantially simplified and made more general at the same time, and the details of the parameterisation could be deferred to Section 3.
We agree that the original mathematical setup in Section 2 was too narrowly focused on the scaling factor specifics of the carbon flux inversion application. As such, we have reformulated Section 2 and the lowdimensional numerical example in Section 3 to not include the scaling factors. This reformulation has streamlined the mathematical presentation in Section 2 and defers the scaling factor formulation to Section 3, as suggested.
Regarding the applicability of this method to a broader class of data assimilation methods, we have adjusted the title and language throughout to specify that we are analyzing this method for 4DVar data assimilation. Of course, understanding the broader applicability of this method to other data assimilation methods would be interesting, but we believe it is beyond the scope of this technical note.
On the topic of generality, the simplification B = b^2 I_m is introduced here that are not necessary for the mathematical results. Can this be left until section 3?
Following the broader presentation adjustment of Section 2, we have moved this particular simplification to Section 3 for the carbon flux inversion example which uses this simplification.
The situation in which the Monte Carlo method is applicable is the one where the posterior mean can be computed efficiently. In the flux inversion case, this arises due to the equivalence of the posterior mean and mode due to the Gaussian assumption, so that efficient gradient based methods that target the posterior mode can be used to compute the posterior mean (which one would typically compute analytically). The feasibility of this computation is critical to whether it's worth using this method, but as it stands this doesn't come out clearly in the note.
To address the point of method assumptions and potential generalizations, we have added Section 2.2.1 (Considerations for when f is nonlinear and other critical assumptions). This section articulates the efficient computation assumption above in addition to the other Gaussian/linear assumptions and their necessity in our results. Regarding the linearity assumption, we provide two potential strategies to apply this method when the forward model is nonlinear.
For the unconditional simulation the authors choose y_k ~ N(A\mu,R) and, as the authors show, this gives the appropriate covariance to the samples c_MAP^k . However, Chevallier et al. (2007) took the mean of y_k to be y, and as a consequence had E(c_MAP^k) = \alpha. In that case, the distribution of c_MAP^k is precisely equal to the posterior distribution of c given y, which is elegant. In practice, any value can be chosen for the mean of y. There’s no need to change this choice, but the difference to Chevallier and the freedom of this choice should be noted.
Chevallier et al. (2007) indeed choose the mean of the unconditional observation distribution to be the “true” unperturbed observation (i.e., the forward model evaluated at the true state). We have added mention of this original choice in Section 2.2, but also make the point that since the the goal is to sample from a procedure with the correct covariance structure and this choice does not impact the covariance of the MAP estimator, this choice does not affect the validity of the uncertainties.
Since it follows immediately from linearity, it may be worthing stating here that c_MAP^k is Gaussian distributed, rather than deferring this to Section 2.3.
We have moved this point that the MAP estimator is Gaussian to Section 2.2, directly after its definition (Equation 11).
We thank you again for your feedback which helped us improve the manuscript!
Sincerely,
Michael Stanley, Mikael Kuusela, Brendan Byrne and Junjie Liu
Citation: https://doi.org/10.5194/egusphere20232675AC1
 In Section 2.1:

RC2: 'Comment on egusphere20232675', Anonymous Referee #2, 19 Feb 2024
Review on “Technical Note: Posterior Uncertainty Estimation via a Monte Carlo Procedure Specialized for Data Assimilation”
In this technical note, the method used by Chevalier et al., 2007, in which the posterior uncertainty estimate is approximated by an ensemble of 5 perturbed model inversions, is formalized. The article further details the derivation on how the method provides information about an “uncertainty of the uncertainty” estimate. The article demonstrates the applicability to a simple two dimensional model to proof the general concept and to an OSSE in the field of carbon flux estimation.
I congratulate the authors to a well written article. The mathematical formulation appears sound and the description of the method is proper.
My main concern is the strong focus on CO2 flux inversions. The method’s derivation does not require the link to CO2 at all, thus I would appreciate a more general description of the method. E.g., in line 98 ff., the observations are described as X_{CO2} retrievals although any observations should be feasible for the DA application. I suggest to rewrite the section 2 by avoiding the link to CO2 flux inversions. Further, a detailed discussion on the properties of the method is needed. For example, the 4Dvar inversion method is described as it is important for the MC method. From what I can see, any inversion method should be applicable in this context. Further, the requirements of the method remain unclear. Linear models and observation operators are applied in the demonstrator simulations as well as in the derivation of the method. In many field of application (e.g., NWP, air quality), this linear model is not applicable and the use of tangentlinear models is established. Would the method be also applicable in these areas? Finally, I suggest to add a discussion on the ensemble type to which the method can be applied. E.g., does the method also work for multimodel or multiparameter ensembles, which may contain more sources of uncertainty than only the flux scaling factors.
I recommend publication after the addition of some critical discussion on the previous points. Further, see below for some detailed comments.
General comments:
 please add a short description of particle filter methods, that provide uncertainty estimates without the need for linear model or Gaussian error statistic, in the introduction
Specific comments:
 line: 24: put “infeasible” to the end of the sentence
 line 39: change “(Vasco et al., 1993)” to “Vasco et al. (1993)”; please revise the article and change other intext citations accordingly
 line 68: The second statement on the UQ method (i.e., its independence on the prior biases) appears unproofed. Please include a dedicated paragraph to discuss the algorithms properties.
 line 339: This is not clear to me. The derivation of the MC algorithm aims at defining a formula for the posterior uncertainty estimation. Why can’t this estimate be used?
 line 367: The uncertainty obtained with the ensemble approach is an estimate of the posterior uncertainty. Thus, this statement is not clear to me. Why should the uncertainty of the approach be less than the posterior?.
 line 375: please define [x]_i is element I of vector x
 line 382: change [ ]_j to [ ]_i
 table 3: the number of ensemble members compared to the dimension of the problem is unrealistically large. Please provide a discussion on how the size of the ensemble improves the estimate of the MAP and its uncertainty. Specifically, how does the error in your estimate grow when taking only 10100 samples of the prior PDF?
Citation: https://doi.org/10.5194/egusphere20232675RC2 
AC2: 'Reply on RC2', Michael Stanley, 01 Apr 2024
Dear Referee 2,
Thank you for your review and detailed comments. We greatly appreciate your comments on the framing of the method and your suggestion to add more discussion on the method’s properties. Please see below for a pointbypoint response. Your original comments are in bold/italics with our responses immediately following.
My main concern is the strong focus on CO2 flux inversions. The method’s derivation does not require the link to CO2 at all, thus I would appreciate a more general description of the method. E.g., in line 98 ff., the observations are described as X_{CO2} retrievals although any observations should be feasible for the DA application. I suggest to rewrite the section 2 by avoiding the link to CO2 flux inversions.
We agree that the mathematical presentation in Section 2 was too strongly linked to the original carbon flux application. As such, we have reformulated Section 2 to a more general presentation by removing the setup based on scaling factors and references to the variables only in the carbon flux inversion context (e.g., the description of the observation vector). The rewrite has substantially streamlined the mathematical presentation and sharpened our main points.
Further, a detailed discussion on the properties of the method is needed. For example, the 4Dvar inversion method is described as it is important for the MC method. From what I can see, any inversion method should be applicable in this context. Further, the requirements of the method remain unclear. Linear models and observation operators are applied in the demonstrator simulations as well as in the derivation of the method. In many field of application (e.g., NWP, air quality), this linear model is not applicable and the use of tangentlinear models is established. Would the method be also applicable in these areas?
While it is possible that this Monte Carlo method is applicable to other data assimilation methods, we have changed the title and language throughout to specify that this technical note is discussing its applicability to 4DVar data assimilation. However, this broader applicability is interesting and we have added a note about it in the conclusion. Within that context, we agree that the original version of this technical note lacked sufficient discussion around the method assumptions and properties. As such, we have added Section 2.2.1 (Considerations for when f is nonlinear and other critical assumptions) to address these points. In particular, we discuss some options of how this method can still be applicable when the forward model is nonlinear. We additionally included a discussion regarding the covariance equivalence property that must hold for this method to work and how there are potential generalizations that can be made via a similar sampling method from the geostatistics literature.
Finally, I suggest to add a discussion on the ensemble type to which the method can be applied. E.g., does the method also work for multimodel or multiparameter ensembles, which may contain more sources of uncertainty than only the flux scaling factors.
The applicability of this method to other ensemble types to account for other types of uncertainty is a great question. Although we believe it is beyond the scope of this technical note to thoroughly answer that question, we have added a brief note addressing it in the conclusion.
[P]lease add a short description of particle filter methods, that provide uncertainty estimates without the need for linear model or Gaussian error statistic, in the introduction.
Since we have amended this technical note to specifically address the scenario of UQ with 4DVar data assimilation, we have addressed the possibility of relaxing the linear and Gaussian assumptions both by the inclusion of Section 2.2.1, and the connection to the geostatistics literature in Section 2.2. However, since we agree that mentioning particle filtering methods nicely rounds out the landscape of UQ methods for data assimilation, we have added a brief note in the Introduction on their utility in relaxing model and error assumptions.
[L]ine: 24: put “infeasible” to the end of the sentence.
This has been fixed.
[L]ine 39: change “(Vasco et al., 1993)” to “Vasco et al. (1993)”; please revise the article and change other intext citations accordingly.
We have made this change and done a scan of the paper to look for other similar instances.
[L]ine 68: The second statement on the UQ method (i.e., its independence on the prior biases) appears unproofed. Please include a dedicated paragraph to discuss the algorithms properties.
We note in the derivation of Equation 12 that the choice of the prior mean does not affect the covariance of the MAP estimator, which is the primary equivalence required to make the method work correctly. This point is made explicitly in line 152 and is seen by the proof of the covariance equivalence.
[L]ine 339: This is not clear to me. The derivation of the MC algorithm aims at defining a formula for the posterior uncertainty estimation. Why can’t this estimate be used?
The Monte Carlo procedure only provides an estimate of the posterior variance. The true posterior variance is unknown due to the forward model being only implicitly defined via a computational simulator (so it is for example not possible to evaluate Eq. (7) directly).
[L]ine 367: The uncertainty obtained with the ensemble approach is an estimate of the posterior uncertainty. Thus, this statement is not clear to me. Why should the uncertainty of the approach be less than the posterior?
Since there is uncertainty in our estimate of the posterior uncertainty from the Monte Carlo procedure, we ideally want to sample enough ensemble elements such that the uncertainty from the Monte Carlo procedure is sufficiently smaller than the actual posterior uncertainty. Otherwise, it is impossible to distinguish if the length of the resulting interval is primarily coming from the real posterior variance or the Monte Carlo uncertainty.
line 375: please define [x]_i is element I of vector x/line 382: change [ ]_j to [ ]_i
These changes have been made.
Table 3: the number of ensemble members compared to the dimension of the problem is unrealistically large. Please provide a discussion on how the size of the ensemble improves the estimate of the MAP and its uncertainty. Specifically, how does the error in your estimate grow when taking only 10100 samples of the prior PDF?
We have addressed this in two ways. One, we have updated the experiment to have several orders of magnitude fewer samples to reduce the difference between the parameter dimension and the ensemble size. Two, we have included a new figure (Figure 1) showing how the discrepancy between the estimated covariance matrix and the true covariance matrix diminishes as a function of ensemble size.
We thank you again for your feedback which helped us improve the manuscript!
Sincerely,
Michael Stanley, Mikael Kuusela, Brendan Byrne and Junjie Liu
Citation: https://doi.org/10.5194/egusphere20232675AC2

AC2: 'Reply on RC2', Michael Stanley, 01 Apr 2024
Peer review completion
Journal article(s) based on this preprint
Model code and software
GEOSChem Adjoint D. Henze et al. http://wiki.seas.harvard.edu/geoschem/index.php/GEOSChem_Adjoint
Viewed
HTML  XML  Total  BibTeX  EndNote  

314  84  37  435  25  21 
 HTML: 314
 PDF: 84
 XML: 37
 Total: 435
 BibTeX: 25
 EndNote: 21
Viewed (geographical distribution)
Country  #  Views  % 

Total:  0 
HTML:  0 
PDF:  0 
XML:  0 
 1
Mikael Kuusela
Brendan Byrne
Junjie Liu
The requested preprint has a corresponding peerreviewed final revised paper. You are encouraged to refer to the final revised version.
 Preprint
(2603 KB)  Metadata XML