Evaluating a hierarchy of bias correction methods for ERA5-Land SWE in northern Canada

Kanda, Neha; Fletcher, Christopher G.

doi: https://doi.org/10.5194/egusphere-2024-639

Preprints

27 Mar 2024

Abstract. Precise estimates of Snow Water Equivalent (SWE) are crucial for informed decision-making in regions like Northern Canada, where snow cover contributes significantly to springtime discharge. However, the sparsity of the existing SWE monitoring network makes it difficult to understand the distribution and variability of SWE comprehensively. Reanalysis products like ERA5-Land provide long-term continuous SWE estimates, but our evaluation identified a negative bias (-61 mm) in the estimated SWE, with the largest underestimation in high-elevation areas (>1500 m). To correct these biases, we applied four correction methods: Mean Bias Subtraction (MBS), Simple Linear Regression (SLR), Multiple Linear Regression (MLR), and Random Forest (RF). RF performed best, reducing the Root Mean Square Error (RMSE) by 78 % and the magnitude of the annual mean bias from 61.2 mm to 0.01 mm. However, RF did not produce reliable SWE estimates for unseen spatial and temporal domains because it cannot extrapolate beyond the training data.
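
The two simplest methods named in the abstract, MBS and SLR, can be illustrated with a minimal sketch. It is not the authors' implementation: the paired values below are hypothetical, the bias is assumed to be defined as model minus observation (so a negative value means underestimation), and SLR is assumed to regress observed SWE on ERA5-Land SWE, which the preprint page does not confirm. All function names are illustrative.

```python
from math import sqrt


def mean_bias(model, obs):
    """Mean of (model - obs); negative values mean the model underestimates."""
    return sum(m - o for m, o in zip(model, obs)) / len(model)


def rmse(model, obs):
    """Root mean square error between paired model and observed values."""
    return sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(model))


def mbs_correct(model, bias):
    """Mean Bias Subtraction: remove one constant bias from every value."""
    return [m - bias for m in model]


def fit_slr(model, obs):
    """Least-squares fit of obs = a + b * model; returns (a, b).

    Assumption: the regression target is observed SWE and the single
    predictor is the reanalysis SWE.
    """
    n = len(model)
    mean_x = sum(model) / n
    mean_y = sum(obs) / n
    sxx = sum((x - mean_x) ** 2 for x in model)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(model, obs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def slr_correct(model, a, b):
    """Apply the fitted line to the reanalysis values."""
    return [a + b * m for m in model]


# Hypothetical paired SWE values in mm (illustrative only, not study data).
era5l = [40.0, 80.0, 120.0, 160.0, 200.0]
obs = [90.0, 150.0, 170.0, 230.0, 260.0]

bias = mean_bias(era5l, obs)
mbs = mbs_correct(era5l, bias)
a, b = fit_slr(era5l, obs)
slr = slr_correct(era5l, a, b)

print("raw bias:", bias, "raw RMSE:", rmse(era5l, obs))
print("MBS bias:", mean_bias(mbs, obs), "MBS RMSE:", rmse(mbs, obs))
print("SLR bias:", mean_bias(slr, obs), "SLR RMSE:", rmse(slr, obs))
```

MBS removes only the constant offset, whereas the fitted line can also rescale the reanalysis values. On the data used to fit it, SLR therefore cannot have a higher RMSE than MBS.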

Received: 04 Mar 2024 – Discussion started: 27 Mar 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1: 'Comment on egusphere-2024-639', Anonymous Referee #1, 02 Apr 2024
The manuscript tests three bias correction methods of increasing complexity to bias-correct ERA5-Land over Canada north of 49°N. The latter two methods incorporate various predictors related to physiography and time. The authors use CanSWE as the in situ reference. In my opinion, the manuscript is not ready for peer-review. The text contains numerous typographical errors, colloquialisms, language and grammatical errors that could be corrected with a careful read-through by all co-authors. There are several factual errors, and the paper lacks any real physical basis for parameter selection and interpretation of results. The work itself is linked to prior work of King et al. 2020 who investigated various methods to bias correct SNODAS SWE. Please state more clearly if and how these two works are related and whether or how your work builds upon the previous study. I only skimmed the results and discussion because the paper, in its current form, is not ready for review. The manuscript provides limited new information. The analysis and interpretations are inadequate compared to what is typically published in The Cryosphere.
The authors provide some limited interpretations from a model-fitting perspective. However, I was looking for more insightful interpretations of results that link back to SWE and potential deficiencies in the ERA5L product being corrected. For example, slope, aspect, and NDVI were found to be the least important predictors in the MDI model. Are there any physical explanations for this, or is it simply a function of the model? The authors should provide a brief rationale for parameter selection in terms of SWE and the physical processes that govern it (and how this might link to the ERA5L product).
Some specific comments
Please provide links to the data used.

Data is plural – i.e. data were used

Add space before in-text citation

Review use of capitalization

L17: you provided the full term ‘Snow Water Equivalent (SWE)’ two sentences prior.
L18-19: Incorrect, snow surveys are not single point measurements. The references you cite discuss this.
L54: ‘approximately’ not ‘approx’
L69 – SD in m w.e. is SWE. Similarly, revise the description of geopotential height – just state you used the geopotential height variable.
L78: What do you mean ‘specific to each product’? Aren’t you only evaluating 1 product? Or do you mean ERA5L and the reference data?
L85 – there is an extra period
L82 – missing period at end of sentence.
L162-164 – you should consider whether there are more appropriate ways to use the data to reduce these so-called ‘sampling issues’.
Data and methods
How did you decide on your spatial domain? I have never seen ‘Northern Canada’ defined as north of 49°N. Why not just analyze all of Canada? If you are truly interested in ‘Northern Canada’ maybe use a more stringent minimum latitude.
Consider that your reference dataset includes 3 different types of measurements which have different temporal sampling frequencies. In CanSWE, observations for the two automated methods (GMON and snow pillow) are provided at a daily frequency whereas the manual snow surveys vary from weekly to once per year. You provided several supplemental figures which indirectly describe this, but you did not address it in your experimental setup.
Is 1 year a stringent enough threshold given that some sites might only have 1 observation per year?
How were the predictors selected? What was the background rationale or physical basis used to identify the predictors?
Unclear how exactly the NDVI data were used. Specifically, did you use a static layer derived from the 16-day composites or temporally varying (16-day composites)? Did you restrict it to the snow-covered period? Did you consider any of the MODIS quality flags?
Review and cite Pulliainen et al. 2020 which uses snow survey data to bias-correct the GlobSnow product at monthly scales. https://doi.org/10.1038/s41586-020-2258-0
Figure 1 – either remove the group title for elevation zones or add one for the ecoregions.
2.3 – subsection title – use ‘and’ in place of ‘/’; consider revising the subsection title
You find that longitude is important but not NDVI. Could longitude be serving as a proxy for land cover or snow type (e.g. snow class)?
Citation: https://doi.org/10.5194/egusphere-2024-639-RC1
RC2: 'Comment on egusphere-2024-639', Anonymous Referee #2, 26 Apr 2024

Comments on “Evaluating a hierarchy of bias correction methods for ERA5-Land SWE in northern Canada”
This paper compares the performance of four different bias correction methods for ERA5-Land SWE estimates. The paper evaluates mean bias subtraction, single and multiple linear regression, and a random forest model. Random forest is found to have the best performance, but it has limitations extrapolating beyond training data. The topic of bias-correcting SWE estimates is relevant and a good option for improving SWE data.
However, in my opinion, this paper is not ready or nearly ready for publication. The text contains multiple grammatical errors and formatting could be improved. For example, spaces are missing in multiple places before a source is given. Also, a new section of text should have an empty line before it or be indented. The paper would benefit from careful proofreading that would make the text easier to follow. Additionally, this manuscript seems to be quite similar to the paper by King et al. (2020), mentioned in the paper, with location of analysis being the only obvious difference between the two papers. Differences between these two papers should be discussed and any new contributions of this paper should be highlighted.

Below are a few specific comments.
L22 ‘Remotely sensed’ should not be capitalized.
Methods
This section needs some clarifications on how the actual bias corrections are done.
Equation 1 shows how mean bias is calculated not how the average difference is subtracted from the model as the text claims. Also, as the authors point out, ERA5L performs differently in different areas, so is it justified to use one mean bias value to perform the bias correction everywhere? Would it not be better to calculate different mean bias values, for example, for different elevations?
How is the single/multiple linear regression model used for bias correction? This is not explained at all in the text.
The basics of the random forest model are somewhat explained, but again a few sentences on how the actual bias correction is done would be good.
L147 Using just one letter to describe each month is a bit confusing; consider writing out the month names. Also, spring should not be capitalized.

Citation: https://doi.org/10.5194/egusphere-2024-639-RC2
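
Referee #2's suggestion of computing separate mean bias values for different elevations, instead of one value applied everywhere, can be sketched as follows. This is an illustration under stated assumptions, not a method from the manuscript: the 1500 m edge echoes the high-elevation threshold in the abstract, the 500 m edge is hypothetical, the sample values are invented, and the bias is taken as model minus observation.

```python
from collections import defaultdict


def band_of(elev_m, edges=(500.0, 1500.0)):
    """Index of the elevation band containing elev_m.

    The band edges are assumptions chosen for illustration.
    """
    return sum(elev_m >= edge for edge in edges)


def fit_band_biases(model, obs, elev):
    """Mean (model - obs) computed separately for each elevation band."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for m, o, z in zip(model, obs, elev):
        band = band_of(z)
        sums[band] += m - o
        counts[band] += 1
    return {band: sums[band] / counts[band] for band in sums}


def correct_by_band(model, elev, biases, fallback=0.0):
    """Subtract each value's band bias; bands never seen in fitting use fallback."""
    return [m - biases.get(band_of(z), fallback) for m, z in zip(model, elev)]


# Hypothetical paired SWE values (mm) and station elevations (m).
era5l = [50.0, 70.0, 100.0, 120.0, 150.0, 170.0]
obs = [60.0, 80.0, 140.0, 160.0, 270.0, 290.0]
elev = [200.0, 300.0, 800.0, 1200.0, 1600.0, 2100.0]

biases = fit_band_biases(era5l, obs, elev)
corrected = correct_by_band(era5l, elev, biases)

print("per-band bias:", dict(sorted(biases.items())))
print("corrected:", corrected)
```

In this constructed example the bias grows with elevation, so a single domain-wide value would over-correct the low band and under-correct the high band. Whether banding helps on real data depends on how many stations fall in each band.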

Viewed

Total article views: 911 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
538	336	37	911	38	56

Views and downloads (calculated since 27 Mar 2024)

Month	HTML	PDF	XML	Total
Mar 2024	30	8	3	41
Apr 2024	84	27	9	120
May 2024	34	13	2	49
Jun 2024	10	10	5	25
Jul 2024	19	9	3	31
Aug 2024	19	3	3	25
Sep 2024	20	6	0	26
Oct 2024	13	13	0	26
Nov 2024	8	6	0	14
Dec 2024	10	16	0	26
Jan 2025	8	6	0	14
Feb 2025	15	9	2	26
Mar 2025	14	12	0	26
Apr 2025	15	15	1	31
May 2025	9	7	1	17
Jun 2025	22	56	0	78
Jul 2025	24	32	1	57
Aug 2025	47	23	4	74
Sep 2025	114	27	2	143
Oct 2025	23	38	1	62

Viewed (geographical distribution)

Total article views: 926 (including HTML, PDF, and XML), of which 926 have a defined geography and 0 are of unknown origin.

Latest update: 28 Oct 2025

Short summary

For improved water management in snow-dominated regions like Northern Canada, accurate estimates of Snow Water Equivalent (SWE), a metric that quantifies the water in a snowpack, are crucial. Our study aims to improve SWE estimates that were found to be too low, particularly in the mountains. We tested four correction techniques and found Random Forest (RF) to be the most effective, significantly reducing the errors.

