the Creative Commons Attribution 4.0 License.
Observation error estimation in climate proxies with data assimilation and innovation statistics
Abstract. Data assimilation (DA) has been successfully applied in paleoclimate reconstruction. DA combines model simulations and climate proxies based on their error sizes. Therefore, the error information is crucial for DA to work optimally. However, little attention has been paid to the observation errors in the previous studies, especially when the proxies are assimilated directly. This study assessed the feasibility of innovation statistics, a method developed for numerical weather prediction, for estimating observation errors in climate reconstruction and its impact on reconstruction skills. For this purpose, we conducted offline-DA experiments over 1870–2000. Here, we assimilated stable water isotope records from ice cores, tree-ring cellulose, and corals. We found that the innovation statistics-based approach correctly estimated the observation errors, even with the offline-DA scheme. Although the accuracy of the estimation depended on the sample size and accuracy of the prior error covariance, the estimation generally improved the reconstruction skills. The reconstruction skills with the estimated observation errors were comparable to those with errors defined differently. In contrast with those other methods, however, the innovation statistics-based approach offers an objective and systematic way to estimate observation errors with light computational cost. As such, the innovation statistics-based approach should contribute to improving the reconstruction skills and observation networks.
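To make the idea in the abstract concrete, the following is a minimal scalar sketch of a Desroziers-type innovation-statistics estimate, not the manuscript's implementation. It assumes a single observed variable with observation operator H = 1, Gaussian errors, and synthetic innovations; the function name `desroziers_r` and all numerical values are illustrative assumptions. The estimate iterates R = mean(d_oa · d_ob), where d_ob is the observation-minus-background innovation and d_oa the observation-minus-analysis residual.

```python
import math
import random


def desroziers_r(innovations, b_var, r_first_guess, n_iter=100):
    """Iterate a scalar Desroziers-type estimate of the observation error variance.

    innovations   : list of observation-minus-background values (d_ob)
    b_var         : assumed background (prior) error variance in observation space
    r_first_guess : starting value for the observation error variance
    """
    r_est = r_first_guess
    n = len(innovations)
    for _ in range(n_iter):
        gain = b_var / (b_var + r_est)  # scalar Kalman gain with H = 1
        # Observation-minus-analysis residual: d_oa = (1 - gain) * d_ob,
        # so mean(d_oa * d_ob) is the updated estimate of R.
        r_est = sum((1.0 - gain) * d * d for d in innovations) / n
    return r_est


# Synthetic test case (all numbers are illustrative assumptions).
rng = random.Random(0)
B_TRUE, R_TRUE, N_SAMPLES = 1.0, 0.25, 20000
innovations = [
    rng.gauss(0.0, math.sqrt(R_TRUE)) - rng.gauss(0.0, math.sqrt(B_TRUE))
    for _ in range(N_SAMPLES)
]

r_good_prior = desroziers_r(innovations, b_var=B_TRUE, r_first_guess=1.0)
r_bad_prior = desroziers_r(innovations, b_var=0.5 * B_TRUE, r_first_guess=1.0)
print(f"estimated R with accurate prior variance: {r_good_prior:.3f}")
print(f"estimated R with underestimated prior variance: {r_bad_prior:.3f}")
```

In this toy setting the estimate approaches the true value only when the prior variance is specified accurately; an underestimated prior variance pushes the missing variance into R. This is consistent with the abstract's remark that the accuracy depends on the sample size and on the prior error covariance.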
Status: open (until 03 Jun 2025)
RC1: 'Comment on egusphere-2025-1389', Lili Lei, 19 Apr 2025
Summary
This is a very interesting manuscript. The observation error variance is essential for data assimilation, but it is very hard to estimate for paleoclimate data assimilation. This manuscript applies commonly used methods, especially that of Desroziers, to estimate the error variance for proxies. As expected, more accurately estimated observation error variances lead to improved climate reconstruction. I have a few comments below.
- Lines 195-200, it is hard to follow the step from (18) to (19). How do the innovation statistics link to the covariance inflation? Using (17), is the numerator the same as the denominator, which gives delta = 1? Moreover, in the following discussions, the role of the inflation, especially its relation to the observation error variance, is not clearly discussed.
- Lines 218-220, do you mean 136 annual mean simulations are used as ensemble priors? Are the simulations or anomalies used?
- Lines 246-247, this is unclear. Do you mean the climatological mean is computed as a smoothed (running) average over adjacent years? If yes, how many years are used to compute the climatological mean?
- Lines 269-275, at this point it is still unclear why ‘BIAS’ is designed.
- Lines 355-360, with too large (small) R, small (large) inflation values are expected. It would be nice to show the estimated inflation given different R.
- Figure 9, please give some potential explanations for the regions with negative skills.
- Line 414, ‘remarkably’ -> ‘remarkably worse’?
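The relation between the assumed R and the inflation raised in the comments on Lines 195-200 and 355-360 can be illustrated with one common innovation-based estimator. This is a sketch only: the manuscript's Eqs. (17)-(19) are not reproduced here and may differ. It assumes a scalar observation, so that the innovation variance satisfies mean(d²) ≈ δ·HBHᵀ + R, and solves for the multiplicative inflation δ. The function name `innovation_inflation` and all numbers are hypothetical.

```python
def innovation_inflation(innovations, hbht, r_var):
    """Multiplicative inflation delta so that delta * HBH^T + R matches mean(d^2)."""
    mean_sq = sum(d * d for d in innovations) / len(innovations)
    return (mean_sq - r_var) / hbht


# Illustrative innovations (assumed values, not taken from the manuscript).
innovations = [1.2, -0.8, 0.5, -1.5, 0.9, -0.3, 1.1, -1.0]
HBHT = 0.5  # prior ensemble variance in observation space (assumed)

for r_var in (0.1, 0.3, 0.6):
    delta = innovation_inflation(innovations, HBHT, r_var)
    print(f"R = {r_var:.1f} -> delta = {delta:.2f}")
```

For a fixed set of innovations, a larger assumed R leaves less innovation variance to be explained by the prior, so the estimated inflation decreases, which is the behaviour anticipated in the comment on Lines 355-360.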
Citation: https://doi.org/10.5194/egusphere-2025-1389-RC1
RC2: 'Comment on egusphere-2025-1389', Anonymous Referee #2, 27 Apr 2025
This paper provides an important contribution to our understanding of the uncertainty parameter (R) in paleoclimate applications of DA. As the authors note, R has often been a minor consideration for previous researchers, but it is nonetheless an important parameter. The innovation statistic application that is tested provides a useful alternative to linear regression methods, which require significant overlap with 19th-21st century climate observations, but future tests will eventually be required to ensure that the proposed method is skillful for deep time reconstructions. The manuscript is well-structured and methodologically sound, and I recommend minor revisions prior to publication to further strengthen the clarity and accessibility of the work, as well as to understand impacts on the posterior ensemble.
Specifically, the text could be improved by additional plain language description of the innovation statistic method (see comment for Line 126) and the impact of age uncertainty (see comment for Line 554). Furthermore, an important role of R in paleoclimate studies is to properly quantify changes to the ensemble spread, and it would be beneficial to include additional analysis or commentary on how the innovation statistic impacts the ensemble range rather than just the mean.
Additionally, I’ve listed a series of minor comments which should be addressed prior to publication:
Line 126: This paragraph would benefit from expansion to provide a plain language summary of innovation statistics. The methods section is quite technical and will be difficult to follow for readers who are unfamiliar with the method.
Line 144: Could you provide additional clarification on the difference between the LETKF and an EnKF implemented with a localization radius?
Line 203: Please explicitly state whether the OSSE is equivalent to a pseudoproxy experiment, or explain the differences if not.
Line 215: Does “MIROC5” refer to “MIROC5-iso” or a different simulation?
Line 216: Why was only r1i1p1 used to create the prior? If this is the CMIP5 MIROC5 simulations, wouldn’t more ensemble members be available?
Line 221: Were proxy records filtered to span a certain portion of the 1870-2000 study period?
Line 224: What do you mean by “complementary”?
Line 224: It’s unclear why just the documentary data were used for temperature.
Line 229: Could you describe the linear interpolation method? Is this an interpolation between two grid center points? How does this work in two-dimensional space?
Line 248: What metric(s) was optimized that resulted in a half-localization scale of 8,000 km?
Line 295: Do these skill metrics consider the ensemble spread or just the ensemble mean?
Line 295: Do these skill metrics consider the spatial correlation or interannual variability? If the former, how does the innovation statistic impact interannual variability in the posterior?
Line 315: Please clarify the units within Figure 2.
Line 378: Is this because no PSM was applied?
Line 380: It is also important to note the small sample size of 3 records.
Line 523: Please clarify what the 5%-30% improvement is measured against.
Line 554: Age uncertainty is a very important consideration for deep time. Not only is the exact date uncertain, but so is the amount of time that each measurement represents, which will impact the variance and therefore the estimation of R. Given that the authors highlight deep-time applications as a key motivation, a more extensive discussion, or a small pilot analysis (e.g., assimilating non-annual records such as speleothems with age uncertainty), would strengthen the case for broader applicability.
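For readers following the comment on Line 229, one common way to carry linear interpolation into two dimensions is bilinear interpolation from the four grid centers surrounding a proxy site. The sketch below is an assumption about what such a scheme could look like, not a description of what the manuscript does. The function name `bilinear` and the example values are hypothetical, and the sketch ignores longitude wrap-around and spherical geometry.

```python
def bilinear(lon, lat, lon0, lon1, lat0, lat1, f00, f10, f01, f11):
    """Bilinearly interpolate to (lon, lat) from the four surrounding grid centers.

    f00 is the value at (lon0, lat0), f10 at (lon1, lat0),
    f01 at (lon0, lat1), and f11 at (lon1, lat1).
    """
    tx = (lon - lon0) / (lon1 - lon0)  # fractional position in longitude
    ty = (lat - lat0) / (lat1 - lat0)  # fractional position in latitude
    south = (1.0 - tx) * f00 + tx * f10  # interpolate along the southern edge
    north = (1.0 - tx) * f01 + tx * f11  # interpolate along the northern edge
    return (1.0 - ty) * south + ty * north  # then interpolate between the edges


# Hypothetical isotope values at four grid centers (illustrative only).
value = bilinear(
    lon=1.0, lat=0.5,
    lon0=0.0, lon1=2.0, lat0=0.0, lat1=1.0,
    f00=-10.0, f10=-12.0, f01=-14.0, f11=-16.0,
)
print(value)
```

Interpolating along longitude first and then along latitude gives the same result as the reverse order, and the scheme reduces to one-dimensional linear interpolation between two grid centers when the site lies on a grid line.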
Citation: https://doi.org/10.5194/egusphere-2025-1389-RC2
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
112 | 16 | 6 | 134 | 4 | 4 |