the Creative Commons Attribution 4.0 License.
Extending Ensemble Kalman Filter Algorithms to Assimilate Observations with an Unknown Time Offset
Elia Gorokhovsky
Jeffrey L. Anderson
Abstract. Data assimilation (DA), the statistical combination of computer models with measurements, is applied in a variety of scientific fields involving forecasting of dynamical systems, most prominently in atmospheric and ocean sciences. The existence of misreported or unknown observation times (time error) poses a unique and interesting problem for DA. Mapping observations to incorrect times causes bias in the prior state and affects assimilation. Algorithms that can improve the performance of ensemble Kalman filter DA in the presence of observing time error are described. Algorithms that can estimate the distribution of time error are also developed. These algorithms are then combined to produce extensions to ensemble Kalman filters that can both estimate and correct for observation time errors. A low-order dynamical system is used to evaluate the performance of these methods for a range of magnitudes of observation time error. The most successful algorithms must explicitly account for the nonlinearity in the evolution of the prediction model.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint
Elia Gorokhovsky and Jeffrey L. Anderson
Interactive discussion
Status: closed

RC1: 'Comment on egusphere2022536', Anonymous Referee #1, 16 Aug 2022
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2022/egusphere2022536/egusphere2022536RC1supplement.pdf

RC2: 'Comment on egusphere2022536', Anonymous Referee #2, 20 Aug 2022
This manuscript explores data assimilation in the presence of observation time errors (such errors are present in handwritten data used in historical reanalyses). The authors tackle an interesting question and present intriguing solutions. The presentation is at times unclear and can be improved. I suggest that the authors make the revisions suggested below and resubmit.
Major comments:
1. This paper makes numerous assumptions which are not clearly stated. Please go through the manuscript and clearly state all assumptions. Some examples are described below (and also in reviewer #1’s comments).
2. The notation in this manuscript is at times inconsistent and not all notation is defined. Again, some specifics are given below, but I have not documented all cases.
3. This one is a bit more open-ended. In this manuscript you incorporate time errors through a correction to the ensemble mean and variance of the prior estimate of the observations. Could you instead update your ensemble members directly (rather than their summary statistics)? For example, instead of using y_n = h(x_n(t_i)) could you use y_n = h(x_n(t_i + e_i,n)) where e_i,n is drawn from a distribution of time errors? How would this compare to your current method in terms of computational cost and ease of implementation?
Minor comments:
— L 75: Why is it helpful to think of t^a_k as being equal to t_k * P? I agree this is true (assuming t_0=t^a_0=0, which is reasonable), but the statement is confusing to me.
— L 77: “where linear interpolation is used to compute \chi between the discrete times”. I would rephrase this. You may use linear interpolation to compute \chi at non-integer multiples of the time step, but this is an approximation (unless the dynamics are linear). Something like the following would be more correct: “The kth observation is y_k \sim N(\chi(t_k^o), R). In general \chi need not be linear and since \chi is modeled only at discrete time steps we do not necessarily know \chi exactly at any non-integer multiple of the time step. In this study we make the assumption that the time steps are small enough so that the dynamics are approximately linear between two adjacent time steps. Note that without this assumption the performance of an ensemble Kalman filter may not be very good anyway. In practice, we use linear interpolation to compute \chi between the discrete times {t_i}.”
— L 77: I would explicitly state that your forward operator/observation operator is the identity. (Also noted by reviewer #1).
— L 81: The notation for time offset here is different from the notation in L76. I suggest making the notation consistent.
— L 83: I suggest defining “ensemble prior estimate of observations” sooner since it may not be clear to all readers. I see that you define it in equation (2).
— L 110: Is \tau known, or is the distribution of \tau known?
— L 120: Please explain the equation for the distance.
— Eq. (4): Define the variables used in this equation. I think it is worth writing out what you are doing here in a few steps (you can condense some of the algebra below if you need more space). As I see it, you are looking for the conditional probability that \epsilon^t = \tau given that the difference (which is a random variable itself, call it D) is equal to d. From conditional probability this is proportional to the probability that \epsilon^t = \tau and D=d, or equivalently, \epsilon^x = d - \tau \nu. What you have here is somewhat confusing because \epsilon^t and \epsilon^x are independent, but conditioned on D=d they are not at all independent.
— Eq. (5): Remind the reader that you are assuming a normal distribution.
— L 129: If I understand correctly you introduce this term so that you can complete the square and simplify the expression. The word “absorbed” is used in different contexts here and in L 126. Consider using a different word here.
— L 141: This assumption is tricky with the time offset, but perhaps it is okay with your assumption of linearity between two adjacent time steps?
— L 146: Observation error generally means the difference between the observation and the truth, but that is not how I understand \epsilon^p. Please explain.
— Eq. (12): See comment about Eq. (4)
— Eq. (15): I don’t follow this equation. Please explain and be specific about your notation.
— L 171: Why do you make the choice to use a smaller timestep?
— L 175: Is 100 time steps enough to reach a statistically steady state?
— Eq. (17): “.” is used instead of “,”
— Fig. 1: Check colors for consistency with text. Also check that they are colorblind-friendly.
Citation: https://doi.org/10.5194/egusphere2022536RC2

AC1: 'Comment on egusphere2022536', Elia Gorokhovsky, 19 Oct 2022
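
The alternative raised in Referee #2's major comment 3, drawing a time offset for each ensemble member and evaluating y_n = h(x_n(t_i + e_i,n)), might be sketched roughly as follows. This is only an illustration and not the method used in the manuscript. The function name `sample_time_offset_obs`, the array layout, and the zero-mean Gaussian offset distribution are assumptions made for the example. Linear interpolation between stored model times and an identity observation operator follow the referee's remarks on L 77.

```python
import numpy as np


def sample_time_offset_obs(traj, times, t_obs, offset_sd, rng, h=lambda x: x):
    """Prior observation estimates with a per-member time offset (hypothetical helper).

    traj      : array (n_members, n_times, n_state), ensemble states at `times`
    times     : increasing 1-D array of model times
    t_obs     : reported observation time
    offset_sd : standard deviation of the ASSUMED Gaussian time-error distribution
    h         : observation operator; identity by default
    """
    n_members = traj.shape[0]
    # One time offset e_n per ensemble member (assumed N(0, offset_sd^2)).
    offsets = rng.normal(0.0, offset_sd, size=n_members)
    # Keep evaluation times inside the stored trajectory window.
    t_eval = np.clip(t_obs + offsets, times[0], times[-1])
    # Index of the model time at the left end of each bracketing interval.
    idx = np.searchsorted(times, t_eval, side="right") - 1
    idx = np.clip(idx, 0, len(times) - 2)
    # Linear interpolation weight within the bracketing interval.
    w = (t_eval - times[idx]) / (times[idx + 1] - times[idx])
    members = np.arange(n_members)
    x_left = traj[members, idx]
    x_right = traj[members, idx + 1]
    x_interp = (1.0 - w)[:, None] * x_left + w[:, None] * x_right
    # Apply the observation operator member by member.
    y = np.array([h(x) for x in x_interp])
    return y, offsets
```

Under these assumptions the extra cost is one interpolation per member per observation, but each member's trajectory must be stored over a window around the reported time. The interpolation is exact only when the dynamics are linear between adjacent time steps.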
Viewed
HTML: 274
PDF: 85
XML: 13
Total: 372
BibTeX: 6
EndNote: 6