ibicus: a new open-source Python package and comprehensive interface for statistical bias adjustment and evaluation in climate modelling (v1.0.1)
Fiona Raphaela Spuler, Jakob Benjamin Wessel, Edward Comyn-Platt, James Varndell, and Chiara Cagnazzo
Abstract. Statistical bias adjustment is commonly applied to climate models before using their results in impact studies. However, different methods, based on a distributional mapping between observational and model data, can change the simulated trends, as well as the spatiotemporal and inter-variable consistency of the model, and are prone to misuse if not evaluated thoroughly. Despite the importance of these fundamental issues, researchers who apply bias adjustment currently do not have the tools at hand to compare different methods or evaluate the results sufficiently to detect possible distortions. Because of this, widespread practice in statistical bias adjustment is not aligned with recommendations from the academic literature. To address the practical issues impeding this, we introduce ibicus, an open-source Python package for the implementation of eight different peer-reviewed and widely used bias adjustment methods in a common framework and their comprehensive evaluation. The evaluation framework introduced in ibicus allows the user to analyse changes to the marginal, spatiotemporal and inter-variable structure of user-defined climate indices and distributional properties, as well as any alteration of the climate change trend simulated in the model. Applying ibicus in a case study over the Mediterranean region using seven CMIP6 global circulation models, this study finds that the most appropriate bias adjustment method depends on the variable and impact studied and that even methods that aim to preserve the climate change trend can modify it. These findings highlight the importance of a use-case-specific choice of method and the need for a rigorous evaluation of results when applying statistical bias adjustment.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-1481', Anonymous Referee #1, 14 Sep 2023
I find the effort of making this tool for evaluating and inter-comparing bias correction methods important and highly relevant. The paper is well structured, and although I haven't tested the software itself, the manuscript suggests that it produces many relevant statistics and useful plots. However, the manuscript lacks important details, and in my major comments I make several suggestions for how to compare the bias correction methods, which I ask the authors to consider in a revision.
Major comments:
Implementing multiple methods in one piece of software is a good idea, as is the common framework for evaluation. One major remark I have is that some components are not tied to a particular method. One example is the treatment of dry days, where the threshold and the way dry days are bias corrected affect certain metrics in ways that have nothing to do with the method applied to the rest of the distribution; the results for your “QM” method are a case in point. I suggest detaching this component so that the same treatment is applied to all methods when making the inter-comparison; the dry-day corrections themselves can then be assessed separately. Another example is how extremes and data outside the calibration range are handled. This is often not properly defined for different methods, but it can have large consequences for indicators based on the tails of the distribution. Where possible, the same tail handling should be applied to all (empirical) methods. If this could be implemented in your software, it would become a very useful tool for assessing and finding the method best suited to a particular bias correction use case. I am not expecting you to implement this for the revision, but please think about it and add it to a discussion section on future developments.
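To make the suggestion concrete, here is a minimal sketch (my own illustration, not existing ibicus functionality) of a dry-day treatment that could be applied identically to all inputs before any method-specific adjustment; the 0.1 mm/day threshold and the function name are arbitrary assumptions:

```python
import numpy as np

def apply_dry_day_threshold(precip, threshold=0.1):
    """Set all daily precipitation values below `threshold` (mm/day) to zero.

    Applying the same rule to observations, historical model output and
    future model output before any method-specific mapping would make
    dry-day statistics comparable across methods.
    """
    precip = np.asarray(precip, dtype=float).copy()
    precip[precip < threshold] = 0.0
    return precip

# Hypothetical usage: the same treatment for all three inputs of a debiaser.
rng = np.random.default_rng(0)
obs_p = apply_dry_day_threshold(rng.gamma(0.5, 3.0, size=3650))
hist_p = apply_dry_day_threshold(rng.gamma(0.6, 3.5, size=3650))
fut_p = apply_dry_day_threshold(rng.gamma(0.6, 4.0, size=3650))
```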
I would like the authors to add some information on how a user can implement their own method in ibicus: what are the steps, and is there a guide in the software documentation? And why is it called ibicus?
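For context, the ibicus documentation describes new methods as subclasses of its `Debiaser` class; a heavily hedged sketch of what that might look like is below. The import path and the `apply_location` hook follow my reading of the documentation and should be verified there; the method itself is a toy example, not one of the eight shipped methods.

```python
import numpy as np
from ibicus.debias import Debiaser  # assumed import path; verify against the ibicus docs

class MeanShiftDebiaser(Debiaser):
    """Toy additive mean-bias correction, written only to illustrate the
    subclassing pattern described in the documentation."""

    def apply_location(self, obs, cm_hist, cm_future):
        # Per-location hook assumed from the documentation: receives 1D
        # series of observations, historical and future model runs for a
        # single grid cell and returns the adjusted future series.
        return cm_future + (np.mean(obs) - np.mean(cm_hist))
```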
The method called “Quantile Mapping” is not properly defined or named. Quantile mapping is a category of methods that includes most of the methods used in this paper. A more precise name is necessary, as well as a detailed description of how the quantile mapping is implemented, e.g. which quantiles are used, how extremes are dealt with, and especially how data outside of the calibration range are treated.
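For reference, the kind of specification asked for here can be pinned down in a few lines. Below is a generic empirical quantile mapping sketch (not the paper's implementation): the number of quantiles and the constant extrapolation beyond the calibration range are exactly the choices that need to be reported.

```python
import numpy as np

def empirical_qm(obs, cm_hist, cm_future, n_quantiles=100):
    """Map cm_future through the empirical obs/model quantile relationship.

    Values outside the calibration range are handled by np.interp's
    constant extrapolation (clamped to the first/last quantile) -- one of
    several possible tail treatments, and a choice that must be reported.
    """
    q = np.linspace(0.0, 1.0, n_quantiles)
    obs_q = np.quantile(obs, q)
    hist_q = np.quantile(cm_hist, q)
    # Rank each future value in the historical model distribution,
    # then read off the observed value at the same rank.
    return np.interp(cm_future, hist_q, obs_q)
```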
Generally, the authors need to define more clearly how each method deals with dry days, as this can have a large impact on some of the statistics. Please add some statements about this in the main text and in Table A1.
It is not clear for which future period the climate trends are calculated. Is everything done between periods in the historical range 1959 to 2005? I cannot find any other information about time periods, nor any information about the SSP scenarios used. Please clarify this point. It is also important to state something about the magnitude of the climate trend, as it gives some indication of the signal-to-noise level and of whether differences between methods are significant or not.
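One possible way to report the requested trend magnitude is a simple signal-to-noise ratio between two periods; a sketch under the assumption that annual-mean series for both periods are available (illustrative only, not from the paper):

```python
import numpy as np

def trend_signal_to_noise(annual_hist, annual_fut):
    """Ratio of the change in period means to the pooled interannual
    standard deviation -- a crude signal-to-noise measure for the
    climate change trend between two periods."""
    signal = np.mean(annual_fut) - np.mean(annual_hist)
    noise = np.sqrt(0.5 * (np.var(annual_hist, ddof=1) + np.var(annual_fut, ddof=1)))
    return signal / noise

# Hypothetical annual-mean temperature series for two 30-year periods.
rng = np.random.default_rng(0)
hist = 15.0 + rng.normal(0.0, 0.6, size=30)
fut = 16.2 + rng.normal(0.0, 0.6, size=30)
print(trend_signal_to_noise(hist, fut))
```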
Detailed comments:
L22: ”an empirical transfer function”: this could equally well be parametric, so please remove the word “empirical”.
L26: This sentence is difficult to follow because of the vague phrase “ranging from”, which lists generic reports that do not consistently use bias adjustment. Please reformulate and be more precise in your statement.
L61: please remove “empirical”.
L114: “MIdAS” according to the reference.
L150: Please clarify what is meant by “optional data information for running windows”.
L153: In which contexts are these methods “most widely used”? Can you quantify this?
L157: “Quantile Mapping”. I do not think this is a good way to describe this method in contrast to the others. They all belong to the quantile mapping family, and there is no single clear definition of what quantile mapping is, so it needs to be clearly defined. If you are referring to detrended quantile mapping (as in Table A1), you could use the abbreviation DQM instead.
L170: The case is more complicated when threshold-based indicators are used. Then it is not possible, or even desirable, to preserve the original trend.
L195: It is not clear what time periods are used, and what the “future” is for which the trends are assessed. See major comment.
L209 and 210: There are 31 and 16 years in these periods. Please check your statements.
L215: Please define “temperature” in this sentence. Is tasmin still intended, or some other temperature measure?
L222: Bias should be near zero for the calibration period, and it would be good to know if that is the case as it is a confirmation that the implementation is correct.
L232 and Figure 2 – dry days. It is necessary to explain how dry days are handled in the different methods to understand what is happening with “QM”.
L237: I do not understand the use of the word “assimilate” here. Please reformulate or explain.
Figure 3: This plot would be more efficient with model names only at the left and method names only on the top, and larger panels. If it is a direct output of the software, you can state that as it will justify the less optimal layout.
L260: Please define the time periods used and state whether an emission scenario was used. Some measure of signal to noise or significance would be good to include as well.
L268: Note that no single method attempts to preserve trends for all possible indicators; each targets one or more moments or quantiles of a distribution.
L275: Again, please define the dry-day definitions and treatment for each method, as they have large impacts on the results and should, in my opinion, not be conflated with the general method for the rest of the distribution.
Figure 6 caption: “change in the number of dry days”, right?
Table A1: last sentence in CDFt “[SSR] can be applied.” But is it applied here?
Citation: https://doi.org/10.5194/egusphere-2023-1481-RC1 - AC1: 'Reply on RC1', Jakob Wessel, 13 Nov 2023
CC1: 'Comment on egusphere-2023-1481', Richard Chandler, 29 Sep 2023
This paper makes a welcome contribution to the sometimes murky world of statistical bias correction, by providing a publicly available software tool that allows users easily to assess the effects / unintended consequences of different "correction" methods. It will be interesting to see whether this makes a substantial change to current practice.
I have just three comments:
- If I understand correctly, the tool allows users to assess the effect of bias correction methods on a limited number of threshold-based indices. Some of the visualisations are linked to specific metrics (e.g. the cumulative distribution functions for the spell lengths in Figures 4 and 5); others, such as the boxplots, are completely generic. I wonder how easy it would be to link to, say, the xclim library (https://xclim.readthedocs.io/en/stable/), which defines a whole range of other climate indices (a possible bridge is sketched after these three comments). If you could import those index definitions and provide some core visualisations, such as boxplots, for them, then ibicus would become a really powerful tool.
- Section 4.2.4 claims that bias adjustment changes the uncertainty in an ensemble. Although this kind of claim is commonly made, it is not true: the uncertainty is what it is, and it doesn't change just by massaging the data. Bias adjustment changes the variation which is a symptom of the underlying uncertainty, but that's not the same as changing the uncertainty itself! This is connected to my final point, which is ...
- Users may feel as though you've provided a tool that is specifically designed to pull the rug out from under their feet, in the sense that it will almost certainly reveal that there are problems with their chosen bias adjustment method. I am personally rather supportive of any contribution that demonstrates the problems of bias adjustment, but it would perhaps be helpful to provide some constructive suggestions for how to proceed if your software reveals major problems. One such alternative, for example, is to postprocess the entire ensemble within a statistical framework that acknowledges the discrepancies between climate models and the real world, and that aims to derive defensible uncertainty assessments for the real world on the basis of all the available information. There is a fair bit of literature on this: I have made a limited contribution myself, but other authors include Michael Goldstein, Jonathan Rougier, Phil Sansom, Christoph Buser and Claudia Tebaldi (the list goes on!).
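On the first comment in the list above, here is a rough sketch of what the suggested bridge to xclim could look like: computing an xclim-defined index on (hypothetical) bias-adjusted output, whose result could then feed generic ibicus-style visualisations. The variable construction is illustrative, and none of this is existing ibicus functionality.

```python
import numpy as np
import pandas as pd
import xarray as xr
import xclim

# Hypothetical bias-adjusted daily maximum temperature at one grid cell.
time = pd.date_range("2000-01-01", periods=3650, freq="D")
tasmax = xr.DataArray(
    20 + 10 * np.sin(2 * np.pi * np.arange(3650) / 365.25)
    + np.random.default_rng(0).normal(0, 2, 3650),
    dims="time",
    coords={"time": time},
    attrs={"units": "degC"},
)

# Annual count of days with tasmax above 30 degC, using an index
# definition maintained by xclim rather than one hard-coded elsewhere.
hot_days = xclim.atmos.tx_days_above(tasmax=tasmax, thresh="30 degC", freq="YS")
print(hot_days.values)
```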
Citation: https://doi.org/10.5194/egusphere-2023-1481-CC1 - AC3: 'Reply on CC1', Jakob Wessel, 13 Nov 2023
RC2: 'Comment on egusphere-2023-1481', Jorn Van de Velde, 02 Oct 2023
First, I would like to say that I’m impressed by the paper. Further professionalization and evaluation of bias adjustment is clearly necessary, and the authors make an important step forward by providing this software package. In general, the paper is clearly written and provides good examples and results of the code. However, there are still some major and minor remarks that I would like to see discussed and implemented in the paper.
General comments
- Implications of software like this. Further standardization (or at least standardized evaluation) clearly becomes possible through software like this, and that has some consequences. First, it allows questions to be answered about the seemingly ‘detailed’ components of methods, such as the time windows applied (e.g. seasonal vs. 90 days vs. 60 days), the number of years used for calibration and evaluation, and the number of data points selected. When a new method is implemented, these questions are often sidelined, but they could affect the final result; not to say that they do, but this should at least be evaluated with standardized tools. Second, building on one of the comments of Anonymous Referee #1, some method components are not tied to a particular method. This might be considered a slightly more philosophical note, but it is possible to consider a switch from methods as ‘packages’ to methods ‘built from a set of elements’. As elements, I consider e.g. the choice of distribution(s), the choice of dry-day treatment, the order in which steps are taken, additional post-processing steps… Software like this may thus eventually help to disentangle methods and compare their elements (and changes to these elements), even if they were not originally implemented as such (e.g. a distribution not foreseen by the original author, or an additional post-processing step applied in another software package). According to the documentation, the way the code is set up allows, to some extent, for this kind of experimentation. To conclude, software like this could in time change how we evaluate bias adjustment methods. Could you comment on this and discuss it in your paper? That would certainly further enhance the discussion/conclusions, and it might even merit a separate discussion section, as Richard Chandler also touches upon this point in his comment #3.
- To take the previous point one step further, it would be relevant to actually review and compare existing software packages. This is seriously out of scope for this paper, but it might be worth mentioning this need in the discussion/conclusion.
- Although the authors have taken the time to get acquainted with some of the important discussions in bias adjustment/statistical downscaling literature and touch upon a lot of subjects, I think there is still a lot of ground left to cover. If a reader interested in applying bias adjustment software starts from your paper, it should be possible to track down most of the papers discussing issues and steps forward. So far this is not always possible. In the specific comments, I have given some references related to topics discussed at specific points, which I think are all relevant to refer to in the paper.
- In addition to reading the paper, I also checked the documentation and tutorials. A lot of work clearly went into these, for which I would like to congratulate you. Given the amount of information available, I hope many potential users and contributors will find, apply and contribute to your package! However, please take note of the changes and additional literature suggested for the paper, and implement them in the documentation as well.
Note that 1) I agree with most of the comments of the other reviews (posted so far) and would like to see them addressed properly; only where really necessary have I repeated a comment. 2) I consider this to be minor revisions, as the software, evaluation set-up and main conclusions are coherent and scientifically sound, but reading the suggested papers might of course take some time.
Detailed comments
L19: it might be good to provide a few examples for the interested reader. See e.g. Vautard et al. (2021) or Galmarini et al. (2019) for relatively recent papers discussing model biases and the impact on agriculture, respectively.
L22: I would like to stress the comment by AR#1. There are many examples of parametric transfer functions out there.
L24: many multivariate methods also build on quantile mapping (e.g. by first applying univariate quantile mapping and then a multivariate adjustment procedure, the so-called marginal/dependence multivariate bias adjustment)
L25: with regards to multivariate methods, I had to wait until L304 and onwards for clarity on why multivariate methods were not implemented here. Although I understand the choice, it should be clear from the start, given the importance of multivariate methods (e.g. in relation to compound events).
L40 and further: I could not find clarity anywhere on the implementation of the bias adjustment methods. Did you copy-paste them from existing code, implement them yourselves, or mix the two? Did you compare results with the original code (whenever available) or contact the original authors to check against the original code? Given that small differences in code implementation can have a potentially large impact, this has to be clear from the start (especially in a journal like GMD).
L45: Did you consult Maraun et al. (2015) on the aspect of evaluation and the validation tree? They build heavily on the dimensions you mention here, and follow this up in all papers of the VALUE experiment (see e.g. Maraun et al. (2019)). Although this experiment focuses more heavily on statistical downscaling instead of bias adjustment, the latter is also accounted for and the general principles and lessons should at least be mentioned in a paper on bias adjustment evaluation.
L50: Here, you apply the standard ‘section’ titles, whereas further in the paper, you refer to sections as ‘chapters’. I prefer the former, as it is more standard.
L69: delta change is not limited to linear scaling. It is more correct to consider delta change as a general principle, where, in contrast to bias adjustment, it is not the climate model output that is adjusted but the historical time series (the principle is sketched after these detailed comments). See e.g. Olsson et al. (2009) or Willems and Vrac (2011) for papers building on this principle.
L90: given the relative importance of trend preservation in your paper and evaluation, I think this concept should be discussed more in-depth. Consider for example Ivanov et al. (2018), which do not entirely seem to agree with Maraun (2016) (which you refer to), Hagemann et al. (2011) or Casanueva et al. (2018).
Table 1: the literature concerning the bias stationarity assumption has been growing recently. In the context of evaluation, some of these papers should be referred to explicitly. Consider e.g. Dekens et al. (2017), Christensen et al. (2008), Chen et al. (2015), Hui et al. (2019), Chen et al. (2020), Wang et al. (2018), Van de Velde et al. (2022) and references therein.
L288: There is a very relevant discussion on the issue of uncertainty in Maraun and Widmann (2018). I think it would be a proper addition to your paper.
L311: François et al. (2020) (which you refer to earlier in the paragraph) should also be referenced w.r.t. the difficulties with multivariate methods, as could Van de Velde et al. (2022)
Table A1: 1) How were the experimental settings found and defined? Could you give a more expanded explanation? 2) Are the references considered to be ‘the’ references, or just ‘standard’ references? Especially for linear scaling and delta change (but also quantile mapping), much older references are available, but they are also potentially less clear on the implementation. Please clarify this. 3) Wang and Chen (2014) also further expand on ECDFM and provided the first implementation of the relative version. 4) Cannon et al. (2015) discuss that ECDFM and QDM are practically equivalent. Is this also clear from your evaluation? If not, how come? 5) QDM is at the moment one of the most commonly applied quantile mapping methods (especially in multivariate methods, see e.g. Mehrotra and Sharma (2016), Nguyen et al. (2016), Cannon (2018)). This could be discussed in relation to your evaluation.
References: please clean up your reference section. There are too many ‘book:’ and ‘publisher:’ entries in there, unless this is the current style adopted by GMD.
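As a side note on the comment on L69 above: the delta change principle can be stated in two lines. A minimal additive sketch (illustrative only, not the paper's implementation):

```python
import numpy as np

def delta_change_additive(obs, cm_hist, cm_future):
    """Delta change principle: the observed historical series is perturbed
    by the model's simulated change signal, instead of the model output
    being corrected towards observations (as in bias adjustment)."""
    return obs + (np.mean(cm_future) - np.mean(cm_hist))

# Hypothetical usage: a +1.5 degree mean change applied to observations.
obs = np.array([14.2, 15.1, 13.8, 14.9])
print(delta_change_additive(obs, cm_hist=np.full(4, 13.0), cm_future=np.full(4, 14.5)))
```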
References
Cannon, A. J. (2018). Multivariate quantile mapping bias correction: an N-dimensional probability density function transform for climate model simulations of multiple variables. Climate Dynamics, 50(1-2):31–49.
Casanueva, A. et al. (2018). Direct and component-wise bias correction of multi-variate climate indices: the percentile adjustment function diagnostic tool. Climatic Change, 147(3-4):411–425.
Chen, J. et al. (2015). Assessing the limits of bias-correcting climate model outputs for climate change impact studies. Journal of Geophysical Research: Atmospheres, 120(3):1123–1136.
Christensen, J. H. et al. (2008). On the need for bias correction of regional climate change projections of temperature and precipitation. Geophysical Research Letters, 35(20):L20709.
Dekens, L. et al. (2017). Multivariate distribution correction of climate model outputs: A generalization of quantile mapping approaches. Environmetrics, 28(6):e2454.
Galmarini, S. et al. (2019). Adjusting climate model bias for agricultural impact assessment: How to cut the mustard. Climate Services, 13.
Hagemann, S. et al. (2011). Impact of a statistical bias correction on the projected hydrological changes obtained from three GCMs and two hydrology models. Journal of Hydrometeorology, 12(4):556–578.
Hui, Y. et al. (2019). Bias nonstationarity of global climate model outputs: The role of internal climate variability and climate model sensitivity. International Journal of Climatology, 39(4):2278–2294.
Hui, Y. et al. (2020). Impacts of bias nonstationarity of climate model outputs on hydrological simulations. Hydrology Research, 51(5):925–941.
Ivanov, M. A. et al. (2018). Climate model biases and modification of the climate change signal by intensity-dependent bias correction. Journal of Climate, 31(16):6591–6610.
Maraun, D. and Widmann, M. (2018). Statistical Downscaling and Bias Correction for Climate Research. Cambridge University Press.
Maraun, D. et al. (2015). VALUE: A framework to validate downscaling approaches for climate change studies. Earth’s Future, 3(1):1–14.
Maraun, D. et al. (2019). Statistical downscaling skill under present climate conditions: A synthesis of the VALUE perfect predictor experiment. International Journal of Climatology, 39(9):3692–3703.
Mehrotra, R. and Sharma, A. (2016). A multivariate quantile-matching bias correction approach with auto- and cross-dependence across multiple time scales: Implications for downscaling. Journal of Climate, 29(10):3519–3539.
Nguyen, H. et al. (2016). Correcting for systematic biases in GCM simulations in the frequency domain. Journal of Hydrology, 538:117–126.
Olsson, J. et al. (2009). Applying climate model precipitation scenarios for urban hydrological assessment: A case study in Kalmar City, Sweden. Atmospheric Research, 92(3):364–375.
Van de Velde, J. et al. (2022). Impact of bias nonstationarity on the performance of uni- and multivariate bias-adjusting methods: a case study on data from Uccle, Belgium. Hydrology and Earth System Sciences, 26(9):2319–2344.
Note that this is one of my own papers. I think a reference fits in your paper, but please read it and apply your own judgement.
Vautard, R. et al. (2021). Evaluation of the large EURO-CORDEX regional climate model ensemble. Journal of Geophysical Research: Atmospheres, 126(17).
Wang, L. and Chen, W. (2014). Equiratio cumulative distribution function matching as an improvement to the equidistant approach in bias correction of precipitation. Atmospheric Science Letters, 15(1):1–6.
Wang, Y. et al. (2018). The stationarity of two statistical downscaling methods for precipitation under different choices of cross-validation periods. International Journal of Climatology, 38:e330–e348.
Willems, P. and Vrac, M. (2011). Statistical precipitation downscaling for small-scale hydrological impact investigations of climate change. Journal of Hydrology, 402(3-4):193–205.
Citation: https://doi.org/10.5194/egusphere-2023-1481-RC2 - AC2: 'Reply on RC2', Jakob Wessel, 13 Nov 2023