the Creative Commons Attribution 4.0 License.
Improving Ensemble Data Assimilation through Probit-space Ensemble Size Expansion for Gaussian Copulas (PESE-GC)
Abstract. Small forecast ensemble sizes (< 100) are common in the ensemble data assimilation (EnsDA) component of geophysical forecast systems, thus limiting the error-constraining power of EnsDA. This study proposes an efficient and embarrassingly parallel method to generate additional ensemble members: the Probit-space Ensemble Size Expansion for Gaussian Copulas (PESE-GC; "peace gee see"). Such members are called "virtual members". PESE-GC utilizes the users' knowledge of the marginal distributions of forecast model variables. Virtual members can be generated from any (potentially non-Gaussian) multivariate forecast distribution that has a Gaussian copula. PESE-GC's impact on EnsDA is evaluated using the 40-variable Lorenz 1996 model, several EnsDA algorithms, several observation operators, a range of EnsDA cycling intervals and a range of forecast ensemble sizes. Significant improvements to EnsDA (p < 0.01) are observed when either 1) the forecast ensemble size is small (≤20 members), 2) the user selects marginal distributions that improve the forecast model variable statistics, and/or 3) the rank histogram filter is used with non-parametric priors in high forecast spread situations. These results motivate development and testing of PESE-GC for EnsDA with high-order geophysical models.
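The idea in the abstract (transform to probit space, expand the ensemble there, transform back) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the function name `expand_ensemble`, the exponential marginals, and the toy covariance are all hypothetical, and plain Gaussian sampling is swapped in for the paper's own resampling step, which this page does not describe.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()                      # standard normal: mean 0, sd 1
_phi = np.vectorize(_STD_NORMAL.cdf)            # Gaussian CDF
_phi_inv = np.vectorize(_STD_NORMAL.inv_cdf)    # probit function

_EPS = 1e-12  # keeps probabilities strictly inside (0, 1)


def expand_ensemble(ens, cdf, inv_cdf, n_virtual, rng):
    """Append n_virtual sampled members to ens (shape: n_members x n_vars).

    cdf / inv_cdf are the user-chosen marginal CDF and its inverse
    (hypothetical arguments, applied elementwise to every variable).
    """
    # 1) Map every member to probit space: z = Phi^{-1}(F(x)).
    z = _phi_inv(np.clip(cdf(ens), _EPS, 1.0 - _EPS))
    # 2) Under a Gaussian copula, z is jointly Gaussian; estimate its moments.
    mean = z.mean(axis=0)
    cov = np.atleast_2d(np.cov(z, rowvar=False))
    # 3) ASSUMPTION: plain Gaussian sampling stands in for the paper's
    #    own resampling step, which is not described on this page.
    z_virtual = rng.multivariate_normal(mean, cov, size=n_virtual)
    # 4) Map the virtual members back to model space: x = F^{-1}(Phi(z)).
    u_virtual = np.clip(_phi(z_virtual), _EPS, 1.0 - _EPS)
    return np.vstack([ens, inv_cdf(u_virtual)])


# Toy setup: 3 variables with exponential marginals (scale is illustrative).
SCALE = 2.0


def exp_cdf(x):
    return 1.0 - np.exp(-x / SCALE)


def exp_inv_cdf(u):
    return -SCALE * np.log1p(-u)


rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.6, 0.2], [0.6, 1.0, 0.6], [0.2, 0.6, 1.0]])
z0 = rng.multivariate_normal(np.zeros(3), true_cov, size=10)
small_ens = exp_inv_cdf(_phi(z0))               # 10 members with a Gaussian copula
expanded = expand_ensemble(small_ens, exp_cdf, exp_inv_cdf, 40, rng)
print(expanded.shape)  # (50, 3)
```

The original 10 members are kept unchanged in the first rows, and the 40 sampled members respect the positive support of the assumed exponential marginals because they are produced by inverting the marginal CDF.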
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-2699', Ian Grooms, 02 Jan 2024
This paper presents a method for increasing the ensemble size during the assimilation step of ensemble DA. The method is interesting and the results show that it can improve performance. I have some suggestions for improving the clarity of the discussion though.
Major Comments:
Section 3 could be improved. It attempts to show how PESE-GC could be valuable across a wide range of ensemble DA methods and gives 3 mechanisms whereby PESE-GC could improve performance. I found the whole section to be a bit too vague though. It might help to pick a few specific algorithms and show how PESE-GC could improve each one. For example, the RHF uses a piecewise-linear approximation of the likelihood function; PESE-GC improves this representation. As another example, many EnKFs assume joint Gaussianity of the state and observation vectors, and then approximate the means and cross-covariances of this distribution using an ensemble. If the prior is accurately represented then PESE-GC leads to improved estimates of these means and covariances.
Some things I found confusing about section 3: It starts with two-step ensemble DA, but not all EnKFs operate in a two-step manner. The three mechanisms seem to overlap; e.g. the likelihood function (3.1) is related to the observation operator (3.2), and to observation space ensemble statistics (3.3). Section 4 explains performance in light of these 3 mechanisms, so I also found it confusing.
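The reviewer's EnKF example above rests on a general fact: the sampling error of an ensemble-estimated cross-covariance shrinks as the ensemble grows. A minimal sketch of that dependence follows. The true covariance of 0.5, the ensemble sizes, and the function name `cross_cov_rmse` are illustrative assumptions. The sketch uses independent draws from a known Gaussian, so it shows only the sample-size effect and is not a test of PESE-GC itself.

```python
import numpy as np


def cross_cov_rmse(n_members, true_cov, n_trials, rng):
    """RMS error of the sample cross-covariance over n_trials independent ensembles."""
    cov_matrix = np.array([[1.0, true_cov], [true_cov, 1.0]])
    errors = np.empty(n_trials)
    for t in range(n_trials):
        # Draw one ensemble of (state, observed quantity) pairs.
        ens = rng.multivariate_normal(np.zeros(2), cov_matrix, size=n_members)
        # Off-diagonal entry of the 2 x 2 sample covariance matrix.
        sample_cov = np.cov(ens, rowvar=False)[0, 1]
        errors[t] = sample_cov - true_cov
    return np.sqrt(np.mean(errors ** 2))


rng = np.random.default_rng(1)
rmse_small = cross_cov_rmse(20, 0.5, 500, rng)
rmse_large = cross_cov_rmse(200, 0.5, 500, rng)
print(f"20 members: {rmse_small:.3f}, 200 members: {rmse_large:.3f}")
```

Whether virtual members deliver a comparable reduction depends on how well the assumed marginals and copula describe the true prior, which is the condition the reviewer states.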
Minor Comments:
'Gaussian copula' is in the name of the method, but the concept of a Gaussian copula is not explained until section 5.
Line 15/16: I struggle to see how 'small forecast ensembles result in limited representation of observation likelihood functions' is a distinct concept from sampling errors. PESE-GC seems to me to be a way to mitigate certain kinds of sampling errors.
Lines 33/34: It might be more accurate to say that ensemble modulation preserves the first two moments of the ensemble, not that it assumes Gaussian statistics.
Line 218: I think 0.05 model time units is usually interpreted as 6 hours, not 1 hour.
Line 259: The fifth-order GC function is rational, not polynomial.
Line 272: Missing citation for RTPS.
I do not understand why PESE-GC has any impact on the performance of the stochastic EnKF with linear obs. With linear obs the stochastic EnKF update is entirely controlled by the ensemble covariance B, and PESE-GC does not change B (except in the PR configuration).
Line 402: 1,000,00 should probably be 1,000,000?
The GC part of PESE-GC seems to only have been used in the PR configuration, is that right? The EAKF, EnKF, and RHF all use PESE without the GC part?
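The comment on Line 259 can be made concrete by writing out the fifth-order Gaspari-Cohn localization function, here in its commonly used form from Gaspari and Cohn (1999). The function name `gaspari_cohn` and the argument name `half_width` are assumptions for illustration, and the manuscript's own localization settings are not reproduced. The outer segment contains a term proportional to 1/r, which is why the function is piecewise rational rather than polynomial.

```python
import numpy as np


def gaspari_cohn(dist, half_width):
    """Gaspari-Cohn fifth-order localization weight; zero beyond 2 * half_width.

    Returns a 1-D array, also for scalar input.
    """
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / half_width
    weight = np.zeros_like(r)

    inner = r <= 1.0
    ri = r[inner]
    # Inner segment: an ordinary fifth-order polynomial in r.
    weight[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                     - (5.0 / 3.0) * ri**2 + 1.0)

    outer = (r > 1.0) & (r < 2.0)
    ro = r[outer]
    # Outer segment: the -2/(3r) term makes the function rational.
    weight[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                     + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0
                     - 2.0 / (3.0 * ro))
    return weight


weights = gaspari_cohn([0.0, 1.0, 2.0, 3.0], half_width=1.0)
print(weights)
```

The weight is 1 at zero separation, 5/24 where the two segments meet (r = 1), and 0 at and beyond r = 2.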
Citation: https://doi.org/10.5194/egusphere-2023-2699-RC1
RC2: 'Comment on egusphere-2023-2699', Anonymous Referee #2, 27 Jan 2024
EC1: 'Comment on egusphere-2023-2699', Olivier Talagrand, 05 Feb 2024
AC1: 'Responses to Reviewers and Editor', Man-Yau Chan, 24 Mar 2024
Dear Dr Olivier Talagrand, Dr Ian Grooms, and Anonymous Reviewer,
Thank you for your thorough review, commentary, feedback and kind words.
This review process has helped me substantially improve the quality and clarity of the manuscript. I have made every effort to address all of the comments and feedback and hope that my efforts will bring this manuscript closer to being accepted for publication in NPG.
My responses to the feedback and comments are attached.
Please do not hesitate to contact me if you have further questions and concerns.
Yours Sincerely,
Man-Yau Chan
Assistant Professor
Department of Geography
The Ohio State University
Columbus, Ohio, USA
Model code and software
Code for PESE-GC Lorenz 96 study, Man-Yau Chan, https://doi.org/10.5281/zenodo.10126956
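For orientation, the 40-variable Lorenz 1996 test bed named in the abstract can be sketched in a few lines. This is not taken from the archived code listed above: the forcing F = 8, the fourth-order Runge-Kutta scheme, and the step of 0.05 model time units are common choices assumed here, and the function names are hypothetical. Under the usual convention that one model time unit corresponds to about 5 days, 0.05 units is about 6 hours, as noted in RC1.

```python
import numpy as np


def l96_tendency(x, forcing=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F on a periodic ring."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing


def rk4_step(x, dt=0.05, forcing=8.0):
    """Advance the state by one fourth-order Runge-Kutta step of length dt."""
    k1 = l96_tendency(x, forcing)
    k2 = l96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = l96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = l96_tendency(x + dt * k3, forcing)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


# Start from the unstable equilibrium x_i = F with one slightly perturbed variable.
state = np.full(40, 8.0)
state[19] += 0.01
for _ in range(4):          # 4 steps x 0.05 = 0.2 model time units
    state = rk4_step(state)
print(state.shape)
```

Four steps cover roughly one day under the 6-hour convention; a cycling experiment would interleave such forecast steps with assimilation updates.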
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
189 | 80 | 22 | 291 | 30 | 13 | 12 |