Observation error estimation in climate proxies with data assimilation and innovation statistics

Okazaki, Atsushi; Carrio, Diego; Dalaiden, Quentin; Harrison-Lofthouse, Jarrah; Kotsuki, Shunji; Yoshimura, Kei

doi:10.5194/egusphere-2025-1389

Preprints

08 Apr 2025

Abstract. Data assimilation (DA) has been successfully applied in paleoclimate reconstruction. DA combines model simulations and climate proxies based on their error sizes. Therefore, the error information is crucial for DA to work optimally. However, little attention has been paid to observation errors in previous studies, especially when the proxies are assimilated directly. This study assessed the feasibility of using innovation statistics, a method developed for numerical weather prediction, to estimate observation errors in climate reconstruction, and the impact of the estimated errors on reconstruction skills. For this purpose, we conducted offline-DA experiments over 1870–2000. Here, we assimilated stable water isotope records from ice cores, tree-ring cellulose, and corals. We found that the innovation statistics-based approach correctly estimated the observation errors, even with the offline-DA scheme. Although the accuracy of the estimation depended on the sample size and the accuracy of the prior error covariance, the estimation generally improved the reconstruction skills. The reconstruction skills with the estimated observation errors were comparable to those with errors defined differently. In contrast with those other methods, however, the innovation statistics-based approach offers an objective and systematic way to estimate observation errors at light computational cost. As such, the innovation statistics-based approach should contribute to improving the reconstruction skills and observation networks.
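To make the idea of innovation statistics concrete, the following is a minimal scalar sketch of a Desroziers-type estimate, in which the observation error variance is approximated by the mean product of observation-minus-analysis and observation-minus-background residuals. It is an illustration under stated assumptions, not the study's offline-DA configuration: the observation operator is the identity, errors are Gaussian and mutually independent, and all function names, sample sizes, and variances are hypothetical.

```python
import numpy as np


def desroziers_r(y, hxb, hxa):
    """Desroziers-type estimate of the observation error variance.

    Uses the mean product of observation-minus-analysis and
    observation-minus-background residuals.
    """
    d_ob = y - hxb  # innovation (observation minus background)
    d_oa = y - hxa  # residual (observation minus analysis)
    return float(np.mean(d_oa * d_ob))


def scalar_analysis(y, xb, b_var, r_var):
    """Scalar Kalman update; the observation operator is assumed to be identity."""
    gain = b_var / (b_var + r_var)
    return xb + gain * (y - xb)


def estimate_r_iteratively(y, xb, b_assumed, r_first_guess, n_iter=30):
    """Repeat analysis and diagnosis until the estimate of R settles."""
    r_var = r_first_guess
    for _ in range(n_iter):
        xa = scalar_analysis(y, xb, b_assumed, r_var)
        r_var = desroziers_r(y, xb, xa)
    return r_var


def run_toy_experiment(n_samples=200_000, b_true=1.0, r_true=0.5,
                       b_assumed=1.0, r_first_guess=2.0, seed=0):
    """Synthetic test: draw truth, background, and observations, then estimate R."""
    rng = np.random.default_rng(seed)
    truth = rng.normal(0.0, 1.0, n_samples)
    xb = truth + rng.normal(0.0, np.sqrt(b_true), n_samples)  # background
    y = truth + rng.normal(0.0, np.sqrt(r_true), n_samples)   # observations
    return estimate_r_iteratively(y, xb, b_assumed, r_first_guess)


if __name__ == "__main__":
    print("R estimate, correct prior variance:", run_toy_experiment())
    print("R estimate, prior variance too small:", run_toy_experiment(b_assumed=0.5))
```

In this toy setting the estimate approaches the true variance when the assumed prior variance is correct, and it absorbs the discrepancy when the prior variance is misspecified, which mirrors the abstract's remark that accuracy depends on the sample size and on the prior error covariance.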

Received: 24 Mar 2025 – Discussion started: 08 Apr 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

22 Oct 2025

Observation error estimation in climate proxies with data assimilation and innovation statistics

Atsushi Okazaki, Diego S. Carrió, Quentin Dalaiden, Jarrah Harrison-Lofthouse, Shunji Kotsuki, and Kei Yoshimura

Clim. Past, 21, 1801–1819, https://doi.org/10.5194/cp-21-1801-2025, 2025

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2025-1389', Lili Lei, 19 Apr 2025
Summary
This is a very interesting manuscript. The observation error variance is essential for data assimilation, but it is very hard to estimate for paleoclimate data assimilation. This manuscript applies the commonly used methods, especially the Desroziers one, to estimate the error variance for proxies. As expected, the more accurately estimated observation error variances lead to improved climate reconstruction. I have a few comments as below.
Lines 195-200, it is hard to follow the step from (18) to (19). How do the innovation statistics link to the covariance inflation? Using (17), is the numerator the same as the denominator, which gives delta = 1? Moreover, in the following discussions, the role of the inflation, especially its relation to the observation error variance, is not clearly discussed.

Lines 218-220, do you mean 136 annual mean simulations are used as ensemble priors? Are the simulations or anomalies used?

Lines 246-247, this is unclear. Do you mean the climatological mean is computed as a smoothed average over adjacent years? If yes, how many years are used to compute the climatological mean?

Lines 269-275, up to this point, it is unclear why ‘BIAS’ is designed.

Lines 355-360, with too large (small) R, small (large) inflation values are expected. It would be nice to show the estimated inflation given different R.

Figure 9, please give some potential explanations for the regions with negative skills.

Line 414, ‘remarkably’ -> ‘remarkably worse’?
Citation: https://doi.org/10.5194/egusphere-2025-1389-RC1
- AC1: 'Reply on RC1', Atsushi Okazaki, 01 Jul 2025
  
  Thank you very much for reviewing our manuscript in detail and providing us with valuable feedback. We have addressed your comments and questions point by point and proposed several changes to the manuscript. We believe these revisions will significantly enhance the quality and clarity of our work. Please see the PDF file for details.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1389-AC1
RC2: 'Comment on egusphere-2025-1389', Anonymous Referee #2, 27 Apr 2025

This paper provides an important contribution to our understanding of the uncertainty parameter (R) in paleoclimate applications of DA. As the authors note, this has often been a minor consideration for previous researchers, but it is nonetheless an important parameter. The innovation statistic application that is tested provides a useful alternative to the linear regression methods, which require significant overlap with 19th-21st century climate observations, but future tests will eventually be required to ensure that the proposed method is skillful for deep time reconstructions. The manuscript is well-structured and methodologically sound, and I recommend minor revisions prior to publication to further strengthen the clarity and accessibility of the work, as well as to understand impacts on the posterior ensemble.

Specifically, the text could be improved by additional plain language description of the innovation statistic method (see comment for Line 126) and the impact of age uncertainty (see comment for Line 554). Furthermore, an important role of R in paleoclimate studies is to properly quantify changes to the ensemble spread, and it would be beneficial to include additional analysis or commentary on how the innovation statistic impacts the ensemble range rather than just the mean.

Additionally, I’ve listed a series of minor comments which should be addressed prior to publication:
Line 126: This paragraph would benefit from expansion to provide a plain language summary of innovation statistics. The methods section is quite technical and will be difficult to follow for readers who are unfamiliar with the method.
Line 144: Could you provide additional clarification on the difference between the LETKF and an EnKF implemented with a localization radius?
Line 203: Please explicitly state whether the OSSE is equivalent to a pseudoproxy experiment, or explain the differences if not.
Line 215: Does “MIROC5” refer to “MIROC5-iso” or a different simulation?
Line 216: Why was only r1i1p1 used to create the prior? If this is the CMIP5 MIROC5 simulations, wouldn’t more ensemble members be available?
Line 221: Were proxy records filtered to span a certain amount of the 1870-2000 study period?
Line 224: What do you mean by “complementary”?
Line 224: It’s unclear why just the documentary data were used for temperature.
Line 229: Could you describe the linear interpolation method? Is this an interpolation between two grid center points? How does this work in two-dimensional space?
Line 248: What metric(s) was optimized that resulted in a half-localization scale of 8,000 km?
Line 295: Do these skill metrics consider the ensemble spread or just the ensemble mean?
Line 295: Do these skill metrics consider the spatial correlation or interannual variability? If the former, how does the innovation statistic impact interannual variability in the posterior?
Line 315: Please clarify the units within Figure 2.
Line 378: Is this because no PSM was applied?
Line 380: It is also important to note the small sample size of 3 records.
Line 523: Please clarify what the 5%-30% improvement is measured against.
Line 554: Age uncertainty is a very important consideration for deep time. Not only is the exact date uncertain, but so is the amount of time that each measurement represents, which will impact the variance and therefore the estimation of R. Given that the authors highlight deep-time applications as a key motivation, a more extensive discussion, or a small pilot analysis (e.g., assimilating non-annual records such as speleothems with age uncertainty), would strengthen the case for broader applicability.

Citation: https://doi.org/10.5194/egusphere-2025-1389-RC2
- AC2: 'Reply on RC2', Atsushi Okazaki, 01 Jul 2025
  
  Thank you very much for reviewing our manuscript in detail and providing us with valuable feedback. We have addressed your comments and questions point by point and proposed several changes to the manuscript. We believe these revisions will significantly enhance the quality and clarity of our work. Please see the PDF file for details.
  
  Citation: https://doi.org/10.5194/egusphere-2025-1389-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to minor revisions (review by editor) (14 Jul 2025) by Francesco Muschitiello

AR by Atsushi Okazaki on behalf of the Authors (07 Aug 2025) Author's response Author's tracked changes Manuscript

ED: Publish subject to technical corrections (12 Aug 2025) by Francesco Muschitiello

AR by Atsushi Okazaki on behalf of the Authors (12 Aug 2025) Manuscript

Post-review adjustments

AA – Author's adjustment | EA – Editor approval

AA by Atsushi Okazaki on behalf of the Authors (20 Oct 2025) Author's adjustment Manuscript

EA: Adjustments approved (20 Oct 2025) by Francesco Muschitiello

Viewed

Total article views: 2,725 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
2,341	302	82	2,725	93	153

Views and downloads (calculated since 08 Apr 2025)

Month	HTML	PDF	XML	Total
Apr 2025	252	40	14	306
May 2025	114	24	4	142
Jun 2025	100	10	0	110
Jul 2025	90	32	16	138
Aug 2025	244	20	0	264
Sep 2025	1,078	32	2	1,112
Oct 2025	92	18	4	114
Nov 2025	50	24	8	82
Dec 2025	62	18	4	84
Jan 2026	54	28	16	98
Feb 2026	72	10	6	88
Mar 2026	80	24	6	110
Apr 2026	47	15	0	62
May 2026	6	7	2	15

Viewed (geographical distribution)

Total article views: 2,725 (including HTML, PDF, and XML), of which 2,725 have a defined geography and 0 are of unknown origin.

Latest update: 25 May 2026

Short summary

Data assimilation (DA) has been used to reconstruct paleoclimate fields. DA integrates model simulations and climate proxies based on their error sizes. Consequently, error information is vital for DA to function optimally. This study estimated observation errors using "innovation statistics" and demonstrated that DA with the estimated errors outperformed previous studies.
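As a small worked illustration of why error information matters, the sketch below computes the actual analysis error variance of a scalar update when the assumed observation error variance differs from the true one. It assumes unbiased, independent prior and observation errors and an identity observation operator; the function name and the numbers are hypothetical and are not taken from the study.

```python
def analysis_error_variance(b_var, r_true, r_assumed):
    """Actual analysis error variance of a scalar update.

    The gain is built from the assumed observation error variance,
    while the errors themselves follow the true variances.
    """
    gain = b_var / (b_var + r_assumed)
    return (1.0 - gain) ** 2 * b_var + gain ** 2 * r_true


if __name__ == "__main__":
    b_var, r_true = 1.0, 0.5
    for r_assumed in (0.05, 0.5, 5.0):
        err = analysis_error_variance(b_var, r_true, r_assumed)
        print(f"assumed R = {r_assumed:4.2f} -> analysis error variance = {err:.3f}")
```

Under these assumptions the analysis error variance is smallest when the assumed variance equals the true one. An assumed value that is too small lets observation noise into the analysis, and one that is too large leaves prior error uncorrected.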