Extending Ensemble Kalman Filter Algorithms to Assimilate Observations with an Unknown Time Offset

Gorokhovsky, Elia; Anderson, Jeffrey L.

doi:https://doi.org/10.5194/egusphere-2022-536

Preprints

12 Jul 2022

Abstract. Data assimilation (DA), the statistical combination of computer models with measurements, is applied in a variety of scientific fields involving forecasting of dynamical systems, most prominently in atmospheric and ocean sciences. The existence of misreported or unknown observation times (time error) poses a unique and interesting problem for DA. Mapping observations to incorrect times causes bias in the prior state and affects assimilation. Algorithms that can improve the performance of ensemble Kalman filter DA in the presence of observing time error are described. Algorithms that can estimate the distribution of time error are also developed. These algorithms are then combined to produce extensions to ensemble Kalman filters that can both estimate and correct for observation time errors. A low-order dynamical system is used to evaluate the performance of these methods for a range of magnitudes of observation time error. The most successful algorithms must explicitly account for the nonlinearity in the evolution of the prediction model.

Received: 23 Jun 2022 – Discussion started: 12 Jul 2022

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

07 Feb 2023

Extending ensemble Kalman filter algorithms to assimilate observations with an unknown time offset

Elia Gorokhovsky and Jeffrey L. Anderson

Nonlin. Processes Geophys., 30, 37–47, https://doi.org/10.5194/npg-30-37-2023, 2023

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2022-536', Anonymous Referee #1, 16 Aug 2022

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2022/egusphere-2022-536/egusphere-2022-536-RC1-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2022-536-RC1

RC2: 'Comment on egusphere-2022-536', Anonymous Referee #2, 20 Aug 2022

This manuscript explores data assimilation in the presence of observation time errors (such errors are present in handwritten data used in historical reanalyses). The authors tackle an interesting question and present intriguing solutions. The presentation is at times unclear and can be improved. I suggest that the authors make the revisions suggested below and resubmit.

Major comments:

1. This paper makes numerous assumptions which are not clearly stated. Please go through the manuscript and clearly state all assumptions. Some examples are described below (and also in reviewer #1’s comments).

2. The notation in this manuscript is at times inconsistent and not all notation is defined. Again, some specifics are given below, but I have not documented all cases.

3. This one is a bit more open-ended. In this manuscript you incorporate time errors through a correction to the ensemble mean and variance of the prior estimate of the observations. Could you instead update your ensemble members directly (rather than their summary statistics)? For example, instead of using y_n = h(x_n(t_i)) could you use y_n = h(x_n(t_i + e_i,n)) where e_i,n is drawn from a distribution of time errors? How would this compare to your current method in terms of computational cost and ease of implementation?
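
The alternative raised in major comment 3 can be sketched as follows. This is a minimal illustration, not the authors' method. It assumes that each member's trajectory is stored on a common time grid, that time offsets are drawn from a zero-mean normal distribution, and that states between grid times are obtained by linear interpolation. The function name `sample_offset_obs_prior` and all of its parameters are hypothetical.

```python
import numpy as np


def sample_offset_obs_prior(times, traj, t_report, offset_std, rng, h=lambda x: x):
    """Prior observation ensemble y_n = h(x_n(t_report + e_n)).

    times:      (n_times,) increasing model output times (assumed stored)
    traj:       (n_members, n_times, n_state) member trajectories
    t_report:   reported observation time
    offset_std: standard deviation of the assumed N(0, offset_std^2) time error
    h:          forward operator; the identity by default (an assumption)
    """
    n_members, _, n_state = traj.shape

    # One time offset per ensemble member (assumed Gaussian).
    offsets = rng.normal(0.0, offset_std, size=n_members)

    # Keep evaluation times inside the stored window.
    t_eval = np.clip(t_report + offsets, times[0], times[-1])

    # Linearly interpolate each member's state to its own perturbed time.
    states = np.empty((n_members, n_state))
    for n in range(n_members):
        for j in range(n_state):
            states[n, j] = np.interp(t_eval[n], times, traj[n, :, j])

    # Apply the forward operator member by member.
    y = np.array([h(s) for s in states])
    return y, offsets
```

Under these assumptions the sketch needs one interpolation per member and state component, and no additional model integrations as long as the trajectory around the reported time is retained.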

Minor comments:

— L 75: Why is it helpful to think of t^a_k as being equal to t_k * P? I agree this is true (assuming t_0=t^a_0=0, which is reasonable), but the statement is confusing to me.

— L 77: “where linear interpolation is used to compute \chi between the discrete times”. I would rephrase this. You may use linear interpolation to compute \chi at non-integer multiples of the time step, but this is an approximation (unless the dynamics are linear). Something like the following would be more correct: “The kth observation is y_k \sim N(\chi(t_k^o), R). In general \chi need not be linear and since \chi is modeled only at discrete time steps we do not necessarily know \chi exactly at any non-integer multiple of the time step. In this study we make the assumption that the time steps are small enough so that the dynamics are approximately linear between two adjacent time steps. Note that without this assumption the performance of an ensemble Kalman filter may not be very good anyway. In practice, we use linear interpolation to compute \chi between the discrete times {t_i}.”

— L 77: I would explicitly state that your forward operator/observation operator is the identity. (Also noted by reviewer #1).

— L 81: The notation for time offset here is different from the notation in L76. I suggest making the notation consistent.

— L 83: I suggest defining “ensemble prior estimate of observations” sooner since it may not be clear to all readers. I see that you define it in equation (2).

— L 110: Is \tau known, or is the distribution of \tau known?

— L 120: Please explain the equation for the distance.

— Eq. (4): Define the variables used in this equation. I think it is worth writing out what you are doing here in a few steps (you can condense some of the algebra below if you need more space). As I see it, you are looking for the conditional probability that \epsilon^t = \tau given that the difference (which is a random variable itself, call it D) is equal to d. From conditional probability this is proportional to the probability that \epsilon^t = \tau and D=d, or equivalently, \epsilon^x = d - \tau \nu. What you have here is somewhat confusing because \epsilon^t and \epsilon^x are independent, but conditioned on D=d they are not at all independent.

— Eq. (5): Remind the reader that you are assuming a normal distribution.

— L 129: If I understand correctly you introduce this term so that you can complete the square and simplify the expression. The word “absorbed” is used in different contexts here and in L 126. Consider using a different word here.

— L 141: This assumption is tricky with the time offset, but perhaps it is okay with your assumption of linearity between two adjacent time steps?

— L 146: Observation error generally means the difference between the observation and the truth, but that is not how I understand \epsilon^p. Please explain.

— Eq. (12): See comment about Eq. (4)

— Eq. (15): I don’t follow this equation. Please explain and be specific about your notation.

— L 171: Why do you make the choice to use a smaller timestep?

— L 175: Is 100 time steps enough to reach a statistically steady state?

— Eq. (17): “.” is used instead of “,”

— Fig. 1: Check colors for consistency with text. Also check that they are colorblind-friendly.
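
The conditional probability described in the comment on Eq. (4) can be illustrated numerically. The sketch below is not taken from the manuscript. It assumes a scalar state, a difference d = \tau \nu + \epsilon^x with \nu treated as a known rate of change, and independent zero-mean normal priors for \epsilon^t and \epsilon^x. The standard deviations `sigma_t` and `sigma_x` and both function names are hypothetical. Under these assumptions, p(\epsilon^t = \tau | D = d) is proportional to p(\epsilon^t = \tau) p(\epsilon^x = d - \tau \nu), which is again normal after completing the square.

```python
import numpy as np


def time_offset_posterior(d, nu, sigma_t, sigma_x):
    """Closed-form normal posterior of tau given d = tau * nu + eps_x.

    Assumes tau ~ N(0, sigma_t^2) and eps_x ~ N(0, sigma_x^2), independent.
    Returns the posterior mean and variance of tau.
    """
    denom = sigma_t**2 * nu**2 + sigma_x**2
    mean = sigma_t**2 * nu * d / denom
    var = sigma_t**2 * sigma_x**2 / denom
    return mean, var


def time_offset_posterior_grid(d, nu, sigma_t, sigma_x, n=20001, width=10.0):
    """Same posterior, evaluated by brute force on a grid of tau values.

    The weights are prior(tau) * p(eps_x = d - tau * nu), normalized to sum
    to one, so no completing of the square is needed.
    """
    tau = np.linspace(-width * sigma_t, width * sigma_t, n)
    log_w = -0.5 * (tau / sigma_t) ** 2 - 0.5 * ((d - tau * nu) / sigma_x) ** 2
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    mean = np.sum(w * tau)
    var = np.sum(w * (tau - mean) ** 2)
    return mean, var
```

Comparing the two functions for the same inputs is a quick check that the closed-form expression matches the product of the two densities.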

Citation: https://doi.org/10.5194/egusphere-2022-536-RC2

AC1: 'Comment on egusphere-2022-536', Elia Gorokhovsky, 19 Oct 2022

We thank the referees for their thorough feedback on our manuscript. The attached document contains our responses to the comments made by both referees, including descriptions of changes to the manuscript in response to these comments.

Citation: https://doi.org/10.5194/egusphere-2022-536-AC1

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Elia Gorokhovsky on behalf of the Authors (19 Oct 2022) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (17 Nov 2022) by Amit Apte

RR by Anonymous Referee #1 (23 Nov 2022)

RR by Anonymous Referee #2 (12 Dec 2022)

ED: Publish subject to technical corrections (09 Jan 2023) by Amit Apte

AR by Elia Gorokhovsky on behalf of the Authors (15 Jan 2023) Author's response Manuscript

Viewed

Total article views: 372 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
274	85	13	372	6	6

Views and downloads (calculated since 12 Jul 2022)

Month	HTML	PDF	XML	Total
Jul 2022	70	21	7	98
Aug 2022	99	26	4	129
Sep 2022	18	7	0	25
Oct 2022	38	13	2	53
Nov 2022	19	4	0	23
Dec 2022	16	5	0	21
Jan 2023	10	8	0	18
Feb 2023	4	1	0	5
Mar 2023	0

Viewed (geographical distribution)

Total article views: 342 (including HTML, PDF, and XML) Thereof 342 with geography defined and 0 with unknown origin.

Latest update: 24 Mar 2023

Short summary

Older observations of the Earth system sometimes lack information about the time they were taken, posing problems for analyses of past climate. To begin to ameliorate this problem, we propose new methods of varying complexity, including methods to estimate the distribution of the offsets between true and reported observation times. The most successful method accounts for the nonlinearity in the system, but even the less expensive ones can improve data assimilation in the presence of time error.

