the Creative Commons Attribution 4.0 License.
Bias Correction of Climate Models using a Bayesian Hierarchical Model
Abstract. Climate models, derived from process understanding, are essential tools in the study of climate change and its wide-ranging impacts on the biosphere. Hindcast and future simulations provide comprehensive spatiotemporal estimates of climatology that are frequently employed within the environmental sciences community, although the output can be afflicted with bias that impedes direct interpretation. Bias correction approaches using observational data aim to address this challenge. However, approaches are typically criticised for not being physically justified and not considering uncertainty in the correction. These aspects are particularly important in cases where observations are sparse, such as for weather station data over Antarctica. This paper attempts to address both of these issues through the development of a novel Bayesian hierarchical model for bias prediction. The model propagates uncertainty robustly and uses latent Gaussian process distributions to capture underlying spatial covariance patterns, partially preserving the covariance structure from the climate model which is based on well-established physical laws. The Bayesian framework can handle complex modelling structures and provides an approach that is flexible and adaptable to specific areas of application, even increasing the scope of the work to data assimilation tasks more generally. Results in this paper are presented for one-dimensional simulated examples for clarity, although the method implementation has been developed to also work on multidimensional data as found in most real applications. Performance under different simulated scenarios is examined, with the method providing most value added over alternative approaches in the case of sparse observations and smooth underlying bias. 
A major benefit of the model is the robust propagation of uncertainty, which is of key importance to a range of stakeholders, from climate scientists engaged in impact studies and decision makers trying to understand the likelihood of particular scenarios to individuals involved in climate change adaptation strategies, where accurate risk assessment is required for optimal resource allocation.
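The core idea in the abstract, sparse observations and dense climate model output sharing a latent Gaussian process so that the bias can be predicted with uncertainty, can be sketched in one dimension. This is a minimal illustration and not the authors' implementation: it assumes the climate model output Z is the sum of an unbiased process Y and an independent bias process B, uses zero-mean squared-exponential kernels, and fixes the hyperparameters in the hypothetical `hp` dictionary instead of inferring them with MCMC or adding a hierarchical layer. All function and variable names are illustrative.

```python
import numpy as np

def rbf(xa, xb, variance, lengthscale):
    """Squared-exponential kernel between two 1-D arrays of locations."""
    d = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_gp(x, variance, lengthscale, rng, jitter=1e-6):
    """Draw one zero-mean GP sample at locations x (jitter for stability)."""
    k = rbf(x, x, variance, lengthscale) + jitter * np.eye(len(x))
    return np.linalg.cholesky(k) @ rng.standard_normal(len(x))

def predict_bias(x_obs, y_obs, x_clim, z_clim, x_pred, hp):
    """Posterior mean and covariance of the bias B at x_pred, given sparse
    noisy observations of Y and dense noisy climate output Z = Y + B."""
    ky = lambda a, b: rbf(a, b, hp["var_y"], hp["len_y"])
    kb = lambda a, b: rbf(a, b, hp["var_b"], hp["len_b"])
    # Joint covariance of the stacked data vector [y_obs, z_clim].
    k_oo = ky(x_obs, x_obs) + hp["noise_y"] * np.eye(len(x_obs))
    k_oc = ky(x_obs, x_clim)  # Cov(Y, Z) = k_Y because Y and B are independent
    k_cc = (ky(x_clim, x_clim) + kb(x_clim, x_clim)
            + hp["noise_z"] * np.eye(len(x_clim)))
    k_dd = np.block([[k_oo, k_oc], [k_oc.T, k_cc]])
    # Cross-covariance of B(x_pred) with the data: zero with Y, k_B with Z.
    k_pd = np.hstack([np.zeros((len(x_pred), len(x_obs))), kb(x_pred, x_clim)])
    data = np.concatenate([y_obs, z_clim])
    chol = np.linalg.cholesky(k_dd)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, data))
    mean = k_pd @ alpha
    v = np.linalg.solve(chol, k_pd.T)
    cov = kb(x_pred, x_pred) - v.T @ v
    return mean, cov

# Hypothetical fixed hyperparameters (variances, length scales, noise variances).
hp = dict(var_y=1.0, len_y=1.0, var_b=1.0, len_b=3.0, noise_y=0.01, noise_z=0.01)

rng = np.random.default_rng(0)
x_clim = np.linspace(0.0, 10.0, 60)         # dense climate model grid
x_obs = np.sort(rng.uniform(0.0, 10.0, 8))  # sparse "weather stations"

# Simulate a truth that follows the stated assumptions.
y_all = sample_gp(np.concatenate([x_obs, x_clim]), hp["var_y"], hp["len_y"], rng)
b_clim = sample_gp(x_clim, hp["var_b"], hp["len_b"], rng)
y_obs = y_all[:len(x_obs)] + np.sqrt(hp["noise_y"]) * rng.standard_normal(len(x_obs))
z_clim = (y_all[len(x_obs):] + b_clim
          + np.sqrt(hp["noise_z"]) * rng.standard_normal(len(x_clim)))

mean, cov = predict_bias(x_obs, y_obs, x_clim, z_clim, x_clim, hp)
bias_sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # pointwise uncertainty
corrected = z_clim - mean                            # bias-corrected output
```

In this sketch the posterior standard deviation `bias_sd` is expected to be smallest near the observation locations and to grow away from them, which is the sense in which uncertainty is carried through to the corrected output.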
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-2536', Anonymous Referee #1, 31 Jan 2024
The authors propose an interesting framework for bias prediction and correction. The framework models the bias in climate model outputs, which may have resulted from any approximations that were made in modeling the behavior of the system. The novelty of the bias correction framework comes from its ability to model spatial association among different locations and uncertainty estimation. The climate model outputs, observed data, and the bias are all modeled as Gaussian processes. This enables bias prediction and uncertainty estimation in these separate components, providing additional utility to the framework in real-life decision-making. The framework is evaluated on simulated data.
Strengths:
- The paper provides a clear overview of the methodology. Figure 11 is especially interesting and useful in understanding the capability of the bias correction framework.
- Experiments under different data distributions are very helpful in understanding the benefits of the bias correction framework.
- The supplementary information is useful. Section A1, especially, explained the GP modeling framework very well.
Concerns:
- The experiments and results are only on simulated data. While the experiments are beneficial in understanding how the model would perform under different data distributions, the benefits of the framework on real observed / climate model data have not been validated.
- The clarity of the paper can be improved. Addressing these concerns may make it easier to follow the discussion in the paper.
- In several places, the term “model” can be clarified. For instance, if it is the climate model or the GP model that is being discussed.
- Phrases like “spatially varying parameter of the PDF for the climate model output” are a little unclear. Does this refer to the hyper-parameters from the GP model?
- Figures 3 and 5 have not been discussed as part of the main results/discussion. It is unclear how these figures factor into the main discussion.
- It is unclear if the authors are separating the train and test sets to evaluate the performance of the GP models. Training on the test set may result in overfitting and high R2 scores.
- The abstract and introduction talk about several potential benefits of the model. In the current structuring, it is a little difficult to follow along which statements correspond with validated claims in the paper and which statements are just potential benefits / future directions. More emphasis on statements that support the results presented in the result section may strengthen the case for the proposed framework.
- Similarly, in the discussion section, it may be helpful to separate the potential directions / future work in a separate paragraph as opposed to the
Questions:
- Why is the uncertainty not being plotted for the phi z case in Figures 7 and 9?
- Is the model performance evaluated on the train set itself?
- How is the “robustness” of uncertainty propagation being evaluated?
- There may be concerns relating to the computational complexity of the framework. Since the three terms are modeled as exact GPs, the time and space complexity will grow quickly for larger datasets. Are there any variants/modifications to the framework that are being studied to overcome these challenges?
- Since simulated data is being used, are the model performances averaged over several repetitions?
- Does the proposed framework use quantile mapping along with GP modeling for bias correction?
- How many iterations were used for the MCMC sampling?
Other Comments:
- The paper can benefit from more discussion on how uncertainty estimates are impacted by bias correction.
- It is unclear to what extent the quantile mapping mitigates the bias and to what extent the proposed GP framework is useful in bias correction.
- In the abstract, it is suggested that the model provides value addition that is more than that of “alternative approaches.” A comparative analysis with the alternative approaches may help substantiate this claim.
- Appendix equations A1 to A11 were really helpful and can potentially be included in the main section
Citation: https://doi.org/10.5194/egusphere-2023-2536-RC1
RC2: 'Comment on egusphere-2023-2536', Anonymous Referee #2, 31 Jan 2024
Review comments on the “Bias Correction of Climate Models using a Bayesian Hierarchical Model” submitted to Geoscientific Model Development by Carter et al.
In the submitted manuscript, the authors proposed a Bayesian hierarchical framework for bias correction of climate model outputs. According to the authors, the proposed framework has the advantage of conserving the spatial covariance structure of the model bias. Further, the proposed technique can also consider the uncertainties during the bias correction. In the submitted manuscript, the authors spent many words explaining the concept of the proposed approach. The proposed technique, a hierarchical shared latent generating process, was tested on synthetic data against two baseline techniques: a non-hierarchical, shared latent generating process, and a non-hierarchical, non-shared latent generating process. The results showed that the proposed technique does deliver better performance than the baseline techniques in terms of preserving the spatial covariance structure of the climate model and allowing the propagation of uncertainty. Per the authors' statements, the novelty of this paper lies in applying a Bayesian hierarchical model to mitigate some of the limitations of pre-existing bias correction techniques. Despite this enhancement, the reviewer believes the submitted manuscript should undergo at least one round of major revision before potential publication for the reasons listed below:
- Manuscript Structure: The reviewer recommends a comprehensive revision of the submitted manuscript to adhere to a conventional structure, encompassing introduction, data, methodology, results, discussion, and conclusion. The current manuscript structure leads to repetitive content, with similar discussions occurring in sections such as 2.3 and later in section 3, particularly regarding the limitations of pre-existing bias correction approaches. To improve continuity, the reviewer suggests maintaining focus within each section, as the current nested content creates a discontinuous logic flow that may be challenging for readers to follow.
- Novelty: The reviewer observes that the novelty of the submitted manuscript is not adequately emphasized. In the submitted manuscript, only a handful of works related to bias correction were mentioned in the introduction section. However, none of the cited works applied Bayesian techniques for bias correction. The reviewer believes that there should be more literature available that is related to this study. Notably, the authors themselves acknowledge in section 3 that the submitted work is founded on Lima et al. (2021). However, the reviewer is left questioning the relationship between Lima et al. (2021) and the current study. To mitigate potential confusion and provide a more comprehensive background, the reviewer recommends introducing this information more explicitly. This comment echoes concerns outlined in detail in Comment #1.
- Experiment design: Aside from Comment #1, where it is suggested that the authors reorganize the manuscript, the reviewer suggests conducting experiments with the proposed technique using real climate model output. Experiments on climate model output could better highlight the strengths and further verify the effectiveness of the proposed technique.
- Methodology: It appears to the reviewer that the methodology of the submitted study should be better introduced and more clearly explained. Important information is missing, which prevents the audience from following and assessing the scientific and technical value of the proposed bias correction approach. For instance, to the reviewer’s understanding, the non-hierarchical single process model and the non-hierarchical shared process model are the benchmarks to compare with the proposed hierarchical shared process Bayes model. However, within the entire methodology section (i.e., Sections 2 and 3), the definition of “single process model” is not clearly explained (only described with one sentence in line 331). Then later in the result section, the term “single process model” suddenly appeared, which confused the reviewer. The reviewer went to Appendix A to look for an answer but did not succeed, as the formulation of the “single hierarchical model” seems undescribed. The reviewer also finds it hard to understand why the link function is necessary to transform the parameter space of the standard deviation to the same as the sample space (Section 3.2). Is that a common practice for Bayesian models? The reviewer thinks a more detailed explanation will be helpful.
Minor comment:
The time series plot in Figure 11 exclusively illustrates the performance of the proposed Bayesian framework. The reviewer suggests including the performance of benchmark techniques in the same figure to facilitate comparisons.
Citation: https://doi.org/10.5194/egusphere-2023-2536-RC2
RC3: 'Comment on egusphere-2023-2536', Anonymous Referee #3, 31 Jan 2024
Major comments:
1. I definitely agree with the other two reviewers that the manuscript is wordy and lengthy in its current shape. Specifically, there are some technical assumptions that are more or less mentioned in every single section (e.g. the sum of GPs is still a GP, the bias is assumed to be independent of the in-situ observations, etc.). I suggest the authors revise their manuscript so that there is a section that focuses on describing the assumptions they use in their method.
2. Given that the authors are about to clarify and shorten their assumptions, it is still of great interest whether the proposed method can work if the assumptions are slightly violated. Now the experiments are conducted using fully synthetic/simulated data that strictly follows the assumptions. I suggest authors prepare an example where it is not abundantly clear whether the assumptions still hold (a real data example), or at least generate synthetic data that intentionally violates the assumption. Some really strict assumptions in my opinion are:
(i) the bias is independent of the in-situ observations
(ii) the bias is time independent
For example, if the authors can still conduct a synthetic experiment where the generated bias is slightly dependent on time and on the climate model output, and run their algorithm against this case, this would make the proposed algorithm stronger. Another option for the authors is to refer to existing literature and argue that some of the assumptions hold for most cases in real applications.
3. In terms of the structure of the manuscript, I personally did not understand the contribution of the manuscript until I started reading line 290 on page 12. It would help if the authors could make this clear very early in the manuscript.
Minor comments:
1. Is "1 process" just doing a GP for Y and "2 process" doing both GP for Y and Z? It is not very clear when I read through Figure 7 and its associated experiment section.
2. Line 115: "the their" is a typo
3. I personally find that Appendix A is easier to follow than some of the text in the main body. Authors may want to consider restructuring the manuscript.
Citation: https://doi.org/10.5194/egusphere-2023-2536-RC3
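The two assumptions singled out in RC3, that a sum of GPs is still a GP and that the bias is independent of the in-situ process, can be written compactly. The notation below is illustrative and may differ from the manuscript's: $Y$ denotes the unbiased in-situ process, $B$ the bias, $Z$ the climate model output, and $m$ and $k$ the corresponding mean and covariance functions.

```latex
\begin{align}
Y(s) &\sim \mathcal{GP}\big(m_Y(s),\, k_Y(s, s')\big), \qquad
B(s) \sim \mathcal{GP}\big(m_B(s),\, k_B(s, s')\big), \qquad Y \perp B, \\
Z(s) &= Y(s) + B(s) \sim \mathcal{GP}\big(m_Y(s) + m_B(s),\; k_Y(s, s') + k_B(s, s')\big), \\
\operatorname{Cov}\big(Y(s), Z(s')\big) &= k_Y(s, s'), \qquad
\operatorname{Cov}\big(B(s), Z(s')\big) = k_B(s, s').
\end{align}
```

Under these assumptions the cross-covariances take this simple form, so observations of $Y$ and $Z$ are jointly Gaussian and inform $B$ through standard GP conditioning. If the bias depended on $Y$ or on time, as the referee's suggested stress test would impose, extra cross-covariance terms would appear and these expressions would no longer hold exactly.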
AC1: 'Author's Comment (AC) on egusphere-2023-2536 in Reply to Referee Comments', Jeremy Carter, 25 Mar 2024
Thank you for the review of the manuscript. We are very grateful for your careful and insightful comments, which have contributed to the improvement of the original manuscript. We have worked hard to incorporate the feedback into the revised manuscript and have detailed here our thoughts and any changes made for each comment individually in the attached 'egusphere-2023-2536_Authors_Response.pdf' document. We hope you find the response and changes satisfactory.
A revised manuscript has also been provided (egusphere-2023-2536_Revised_Manuscript.pdf), along with a LaTeX-diff document highlighting changes (egusphere-2023-2536_Manuscript_Latexdiff.pdf). Due to the restructuring of the manuscript, the LaTeX-diff document is difficult to follow, so a brief summary of the main changes is given in the 'egusphere-2023-2536_Authors_Response.pdf' document, along with specific replies to each comment.
Data sets
Data used in generation of results in 'Bias Correction of Climate Models using a Bayesian Hierarchical Model' J. Carter et. al J. Carter https://doi.org/10.5281/zenodo.10053531
Model code and software
Code used in generation of results in 'Bias Correction of Climate Models using a Bayesian Hierarchical Model' J. Carter et. al J. Carter https://doi.org/10.5281/zenodo.10053653
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
405 | 117 | 29 | 551 | 22 | 20 |
Cited
1 citation as recorded by Crossref.
Jeremy Daniel Carter
Erick Chacón-Montalván
Amber Leeson