This work is distributed under the Creative Commons Attribution 4.0 License.
Multivariate adjustment of drizzle bias using machine learning in European climate projections
Abstract. Precipitation is a climate parameter of major importance for many applications, including studies of climate-change impacts. However, its simulation and projection remain difficult, largely because of its high stochasticity. In particular, climate models tend to overestimate the frequency of light rainy days while underestimating the totals of extreme observed precipitation. The tendency to overestimate the occurrence of light precipitation events is known as the 'drizzle bias'. Consequently, even though overall precipitation totals are generally well represented, the number of rainy days is often significantly biased. The present study aims to minimise the drizzle bias in model output by developing and applying two statistical approaches. In the first (thresholding), the number of rainy days is adjusted under the assumption that the relationship between observed and simulated rainy days remains constant in time. In the second, a machine learning method (Random Forests, RF) is used to develop a statistical model that relates several modelled climate variables to the observed number of wet days. The results demonstrate that the multivariate approach performs comparably to conventional thresholding when correcting sub-periods with similar climate characteristics. However, the value of RF becomes evident for periods with extreme events, marked by a significantly different frequency of rainy days; these disparities are particularly pronounced at higher temporal resolutions. Both methods are illustrated on data from three EURO-CORDEX climate models: the two approaches are trained during a calibration period and applied to a selected evaluation period.
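The thresholding idea described in the abstract — choosing a cut-off on simulated precipitation so that the wet-day count matches observations during calibration, then carrying that cut-off forward — can be sketched as follows. This is a minimal illustration, not the authors' released code; the wet-day definition of 1.0 mm and the function names are assumptions.

```python
import numpy as np

def calibrate_wet_day_threshold(obs, sim, wet_day_mm=1.0):
    """Find the simulated-precipitation threshold that reproduces the
    observed number of wet days in the calibration period.

    obs, sim : 1-D arrays of daily precipitation (mm) over the same period.
    """
    n_wet_obs = int(np.sum(obs >= wet_day_mm))
    if n_wet_obs == 0:
        return np.inf  # no observed wet days: treat every simulated day as dry
    # Sort simulated values in descending order; the threshold is the value
    # that leaves exactly n_wet_obs simulated days at or above it.
    sim_sorted = np.sort(sim)[::-1]
    return sim_sorted[n_wet_obs - 1]

def apply_threshold(sim, threshold):
    """Set simulated days below the calibrated threshold to zero (dry)."""
    return np.where(sim >= threshold, sim, 0.0)

# Toy example: the model produces more drizzle days than observed.
obs = np.array([0.0, 0.0, 5.0, 10.0, 0.0, 2.0, 0.0, 0.0, 0.0, 3.0])
sim = np.array([0.2, 1.1, 4.0, 8.0, 1.2, 1.5, 0.1, 1.3, 0.2, 2.5])
thr = calibrate_wet_day_threshold(obs, sim)
corrected = apply_threshold(sim, thr)
```

After correction the simulated wet-day count matches the observed one (4 days in this toy case); the assumption, as stated above, is that the calibrated threshold remains valid in the evaluation period.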
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
- Preprint (21908 KB)
- Supplement (26602 KB)
- Final revised paper
Journal article(s) based on this preprint
Interactive discussion
Status: closed
-
RC1: 'Comment on egusphere-2024-45', Anonymous Referee #1, 25 Mar 2024
This paper does a good job of taking the reader through bias corrections for three regional climate models which overestimate the number of rainy days and underestimate the precipitation amounts compared to annual and monthly observations. They demonstrate that random forest regression outperforms thresholding for station comparisons where the model-measurement difference exceeds 5%. The figures and discussion are clear, and I have only minor comments:
1) I had hoped for some insight into why the random forest regression succeeded when thresholding failed. Perhaps a paragraph or two in the discussion could cover next steps for learning which of the 5 feature set variables mattered most, and how to use that information for model improvement?
2) On line 152 the authors state that they used scikit-learn for the random forest model, but as far as I can see the zenodo archive contains only R code?
3) The captions for Figures 5 and 6 say that the model and rf data are red and orange, but in the figures they are green and yellow
4) For the Q-Q plot in figure 5, standard case 48 thresholding does especially poorly -- any idea why this gridcell is an outlier?
5) The column headers for Table 2 and Table 3 use different labels for the same quantities -- case 0, case 1, "standard cases", "extreme deviation case", "div > 5%" -- why not just adopt the unambiguous labels of Table 3?
6) It is stated that GridSearchCV was used to establish "the optimal set of hyper-parameters". It would be useful to state what the hyper-parameters were, along with the feature set variables
Citation: https://doi.org/10.5194/egusphere-2024-45-RC1
- AC2: 'Reply on RC1', Georgia Lazoglou, 26 Apr 2024
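Reviewer point 6 above concerns the hyper-parameters tuned with GridSearchCV. For context, a minimal sketch of how GridSearchCV is typically applied to a RandomForestRegressor is shown below; the feature matrix, target, and the specific grid of hyper-parameters are placeholders, not the values used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Placeholder data: 5 modelled climate variables as features (as in the
# paper's feature set size) and the observed number of wet days as target.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = rng.poisson(lam=10, size=120).astype(float)

# Hypothetical hyper-parameter grid; the paper's actual grid is not stated.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 3],
}

# Exhaustive search over the grid with 3-fold cross-validation.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
print(search.best_params_)
```

Reporting `search.best_params_` (and the grid searched) in the manuscript would address the reviewer's request directly.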
-
CEC1: 'Comment on egusphere-2024-45', Juan Antonio Añel, 27 Mar 2024
Dear authors,
I would like to note a minor issue in your manuscript. Currently, it contains a "Data availability" section and a "Code and data availability" section. This does not conform to the format of the manuscripts in our journal. You can have a single "Code and Data Availability" section in your manuscript with the information for both types of asset, or, as recommended in the guidelines for manuscripts, a "Code availability" and a "Data availability" section. In the latter case, all the code should be mentioned in the first one, and the data in the second one.
Please, correct this issue in any reviewed version of your manuscript.
Regards,
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2024-45-CEC1
AC3: 'Reply on CEC1', Georgia Lazoglou, 26 Apr 2024
We would like to thank the Chief Editor for taking the time to comment on our paper.
Attached is the zipped folder containing our response to the two reviews and the reply to your comment, as well as the revised manuscript with highlighted changes.
-
RC2: 'Comment on egusphere-2024-45', Anonymous Referee #2, 12 Apr 2024
Hi editor,
Please feel free to check the attachment.
- AC1: 'Reply on RC2', Georgia Lazoglou, 26 Apr 2024
Peer review completion
Journal article(s) based on this preprint
Data sets
Supplementary Material and Scripts for "Multivariate adjustment of drizzle bias using machine learning in European climate projections" Georgia Lazoglou https://doi.org/10.5281/zenodo.10468125
Model code and software
Supplementary Material and Scripts for "Multivariate adjustment of drizzle bias using machine learning in European climate projections" Georgia Lazoglou https://doi.org/10.5281/zenodo.10468125
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote
---|---|---|---|---|---|---
282 | 62 | 23 | 367 | 29 | 12 | 9
Theo Economou
Christina Anagnostopoulou
George Zittis
Anna Tzyrkalli
Pantelis Georgiades