This work is distributed under the Creative Commons Attribution 4.0 License.
Evaluating a hierarchy of bias correction methods for ERA5-Land SWE in northern Canada
Abstract. Accurate estimates of snow water equivalent (SWE) are crucial for informed decision-making in regions such as northern Canada, where snow cover contributes substantially to springtime discharge. However, the sparse SWE monitoring network makes it difficult to characterize the spatial distribution and variability of SWE comprehensively. Reanalysis products such as ERA5-Land provide long-term, continuous SWE estimates, but our evaluation identified a negative bias (-61 mm) in ERA5-Land SWE, with the largest underestimation at high elevations (>1500 m). To correct this bias, we applied four correction methods: Mean Bias Subtraction (MBS), Simple Linear Regression (SLR), Multiple Linear Regression (MLR), and Random Forest (RF). RF performed best, reducing the root mean square error (RMSE) by 78 % and the magnitude of the annual mean bias from 61.2 mm to 0.01 mm. However, RF did not produce reliable SWE estimates for unseen spatial and temporal domains, because it cannot extrapolate beyond the range of its training data.
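The headline figures in the abstract (mean bias, RMSE, and percentage RMSE reduction) can be computed as in the minimal sketch below. It assumes bias is defined as estimate minus observation (consistent with the negative bias reported for ERA5-Land) and that the RMSE reduction is 1 minus the ratio of corrected to raw RMSE; both definitions are assumptions, and the paired values are hypothetical, not the study's data.

```python
import math


def _check_pairs(estimated, observed):
    """Raise if the paired series are empty or have different lengths."""
    if not observed or len(estimated) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")


def mean_bias(estimated, observed):
    """Mean of (estimate - observation); a negative value means underestimation."""
    _check_pairs(estimated, observed)
    return sum(e - o for e, o in zip(estimated, observed)) / len(observed)


def rmse(estimated, observed):
    """Root mean square error of paired estimates against observations."""
    _check_pairs(estimated, observed)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / len(observed))


def rmse_reduction_percent(rmse_before, rmse_after):
    """Relative RMSE reduction in percent (78 would mean a 78 % reduction)."""
    return 100.0 * (1.0 - rmse_after / rmse_before)


# Hypothetical paired values in mm w.e.; these are NOT the study's data.
observed = [120.0, 250.0, 310.0, 90.0]
era5l = [80.0, 170.0, 220.0, 60.0]
corrected = [115.0, 240.0, 300.0, 95.0]

raw_rmse = rmse(era5l, observed)
new_rmse = rmse(corrected, observed)
print(f"raw bias {mean_bias(era5l, observed):.1f} mm, corrected bias {mean_bias(corrected, observed):.1f} mm")
print(f"RMSE reduced by {rmse_reduction_percent(raw_rmse, new_rmse):.0f} %")
```

Reporting the bias magnitude alongside its sign avoids the ambiguity of quoting both "-61 mm" and "61.2 mm" for the same quantity.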
Status: closed
RC1: 'Comment on egusphere-2024-639', Anonymous Referee #1, 02 Apr 2024
The manuscript tests three bias correction methods of increasing complexity to bias-correct ERA5-Land over Canada north of 49°N. The latter two methods incorporate various predictors related to physiography and time. The authors use CanSWE as the in situ reference. In my opinion, the manuscript is not ready for peer review. The text contains numerous typographical errors, colloquialisms, and language and grammatical errors that could be corrected with a careful read-through by all co-authors. There are several factual errors, and the paper lacks any real physical basis for parameter selection and interpretation of results. The work itself is linked to prior work by King et al. (2020), who investigated various methods to bias-correct SNODAS SWE. Please state more clearly if and how these two works are related and whether or how your work builds upon the previous study. I only skimmed the results and discussion because the paper, in its current form, is not ready for review. The manuscript provides limited new information. The analysis and interpretations are inadequate compared to what is typically published in The Cryosphere.
The authors provide some limited interpretations from a model-fitting perspective. However, I was looking for more insightful interpretations of the results that link back to SWE and to potential deficiencies in the ERA5L product being corrected. For example, slope, aspect, and NDVI were found to be the least important predictors in the MDI model. Is there a physical explanation for this, or is it simply a function of the model? The authors should provide a brief rationale for parameter selection in terms of SWE and the physical processes that govern it (and how this might link to the ERA5L product).
Some specific comments
- Please provide links to the data used.
- Data is plural – i.e. data were used
- Add space before in-text citation
- Review use of capitalization
L17: you provided the full term ‘Snow Water Equivalent (SWE)’ two sentences prior.
L18-19: Incorrect, snow surveys are not single point measurements. The references you cite discuss this.
L54: ‘approximately’, not ‘approx’
L69 – SD in m w.e. is SWE. Similarly, revise the description of geopotential height – just state you used the geopotential height variable.
L78: What do you mean ‘specific to each product’? Aren’t you only evaluating 1 product? Or do you mean ERA5L and the reference data?
L85 – there is an extra period
L82 – missing period at end of sentence.
L162-164 – you should consider whether there are more appropriate ways to use the data to reduce these so-called ‘sampling issues’.
Data and methods
How did you decide on your spatial domain? I have never seen ‘Northern Canada’ defined as north of 49°N. Why not just analyze all of Canada? If you are truly interested in ‘Northern Canada’ maybe use a more stringent minimum latitude.
Consider that your reference dataset includes 3 different types of measurements which have different temporal sampling frequencies. In CanSWE, observations for the two automated methods (GMON and snow pillow) are provided at a daily frequency whereas the manual snow surveys vary from weekly to once per year. You provided several supplemental figures which indirectly describe this, but you did not address it in your experimental setup.
Is 1 year a stringent enough threshold given that some sites might only have 1 observation per year?
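The concern about the 1-year threshold can be made concrete with a small filter. This is a hypothetical sketch: the site names, the (site, year, SWE) record layout, and the `min_years`/`min_obs` thresholds are illustrative assumptions, not the manuscript's screening procedure. It shows that a record-length criterion alone admits a single-visit snow survey alongside a daily snow pillow, whereas adding a minimum observation count separates them.

```python
from collections import defaultdict


def site_summary(records):
    """Map each site to (number of distinct years, number of observations).

    `records` is an iterable of (site_id, year, swe_mm) tuples (hypothetical layout).
    """
    years = defaultdict(set)
    n_obs = defaultdict(int)
    for site, year, _swe in records:
        years[site].add(year)
        n_obs[site] += 1
    return {site: (len(years[site]), n_obs[site]) for site in n_obs}


def select_sites(records, min_years=1, min_obs=1):
    """Keep sites meeting both a record-length and an observation-count threshold."""
    return sorted(
        site
        for site, (n_years, count) in site_summary(records).items()
        if n_years >= min_years and count >= min_obs
    )


# Hypothetical records: one snow survey with a single visit, one daily snow pillow,
# and one survey site visited once per year for three years.
records = (
    [("survey_A", 2015, 110.0)]
    + [("pillow_B", 2015, 100.0 + day) for day in range(120)]
    + [("survey_C", year, 150.0) for year in (2012, 2013, 2014)]
)

print(select_sites(records))               # a "1 year of record" rule keeps all three sites
print(select_sites(records, min_obs=3))    # drops the single-visit survey site
print(select_sites(records, min_years=3))  # only the multi-year survey site remains
```

A per-site observation count is only one option; the same summary could also be used to weight or thin the densely sampled automated sites.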
How were the predictors selected? What was the background rationale or physical basis used to identify the predictors?
Unclear how exactly the NDVI data were used. Specifically, did you use a static layer derived from the 16-day composites or temporally varying (16-day composites)? Did you restrict it to the snow-covered period? Did you consider any of the MODIS quality flags?
Review and cite Pulliainen et al. 2020 which uses snow survey data to bias-correct the GlobSnow product at monthly scales. https://doi.org/10.1038/s41586-020-2258-0
Figure 1 – either remove the group title for elevation zones or add one for the ecoregions.
2.3 – subsection title: use ‘and’ in place of ‘/’, or consider revising the subsection title.
You find that longitude is important but not NDVI. Could longitude be serving as a proxy for land cover or snow type (e.g. snow class)?
Citation: https://doi.org/10.5194/egusphere-2024-639-RC1
RC2: 'Comment on egusphere-2024-639', Anonymous Referee #2, 26 Apr 2024
Comments on “Evaluating a hierarchy of bias correction methods for ERA5-Land SWE in northern Canada”
This paper compares the performance of four bias correction methods for ERA5-Land SWE estimates: mean bias subtraction, simple and multiple linear regression, and a random forest model. Random forest is found to perform best, but it is limited in extrapolating beyond the training data. The topic of bias-correcting SWE estimates is relevant, and bias correction is a good option for improving SWE data.
However, in my opinion, this paper is not ready, or nearly ready, for publication. The text contains multiple grammatical errors, and the formatting could be improved. For example, spaces are missing in multiple places before a source is cited. Also, a new section of text should be preceded by an empty line or be indented. The paper would benefit from careful proofreading to make the text easier to follow. Additionally, this manuscript seems quite similar to the paper by King et al. (2020), mentioned in the paper, with the location of the analysis being the only obvious difference between the two. The differences between these two papers should be discussed, and any new contributions of this paper should be highlighted.
Below are a few specific comments.
L22: ‘Remotely sensed’ should not be capitalized.
Methods
This section needs some clarifications on how the actual bias corrections are done.
Equation 1 shows how the mean bias is calculated, not how the average difference is subtracted from the model, as the text claims. Also, as the authors point out, ERA5L performs differently in different areas, so is it justified to use a single mean bias value for the bias correction everywhere? Would it not be better to calculate separate mean bias values, for example, for different elevations?
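The distinction between computing the mean bias and applying it, and the suggestion to fit separate biases by elevation, can be sketched as follows. This assumes mean bias is defined as the mean of (estimate minus observation); the manuscript's Equation 1 is not reproduced here, and the band edges (500 m and 1500 m) and data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def fit_mean_bias(estimated, observed):
    """Mean bias (assumed definition): mean of (estimate - observation)."""
    return sum(e - o for e, o in zip(estimated, observed)) / len(observed)


def apply_mbs(estimated, bias):
    """Mean bias subtraction: remove the fitted bias from every estimate."""
    return [e - bias for e in estimated]


def fit_banded_bias(estimated, observed, elevation, edges):
    """Fit one mean bias per elevation band; `edges` are ascending band limits in m."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for e, o, z in zip(estimated, observed, elevation):
        band = bisect_right(edges, z)  # 0 below edges[0], len(edges) at or above edges[-1]
        sums[band] += e - o
        counts[band] += 1
    return {band: sums[band] / counts[band] for band in sums}


def apply_banded_mbs(estimated, elevation, edges, biases, fallback):
    """Subtract each band's bias; bands absent from training use `fallback`."""
    return [e - biases.get(bisect_right(edges, z), fallback)
            for e, z in zip(estimated, elevation)]


# Hypothetical data (mm w.e. and m a.s.l.): the underestimate is much larger above 1500 m.
elevation = [300, 800, 1200, 1800, 2100]
observed = [100.0, 150.0, 200.0, 400.0, 500.0]
era5l = [90.0, 140.0, 190.0, 300.0, 400.0]
edges = [500, 1500]  # hypothetical band limits

global_bias = fit_mean_bias(era5l, observed)
band_bias = fit_banded_bias(era5l, observed, elevation, edges)
print(apply_mbs(era5l, global_bias))
print(apply_banded_mbs(era5l, elevation, edges, band_bias, fallback=global_bias))
```

The toy data make the error identical within each band, so banded MBS recovers the observations exactly, while a single global bias overcorrects low sites and undercorrects high ones. Real data would not behave this neatly.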
How is the single/multiple linear regression model used for bias correction? This is not explained at all in the text.
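One common way to use regression for bias correction, assumed here because the manuscript's formulation is exactly what is being queried, is to regress observed SWE on ERA5-Land SWE (SLR), or on ERA5-Land SWE plus other predictors (MLR), at training sites, then evaluate the fitted equation at other grid cells. The sketch below uses ordinary least squares via the normal equations; elevation as the second predictor and all values are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear(features, target):
    """Ordinary least squares with intercept, via the normal equations (X^T X) b = X^T y."""
    x = [[1.0] + list(row) for row in features]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, target)) for i in range(p)]
    return solve(xtx, xty)


def predict_linear(coef, features):
    """Apply fitted coefficients [intercept, b1, b2, ...] to rows of predictors."""
    return [coef[0] + sum(b * v for b, v in zip(coef[1:], row)) for row in features]


# Hypothetical training sites: ERA5-Land SWE (mm), elevation (m) and observed SWE (mm).
era5l = [50.0, 100.0, 150.0, 200.0]
elevation = [200.0, 1000.0, 400.0, 1600.0]
obs_slr = [85.0, 160.0, 235.0, 310.0]  # built as 10 + 1.5 * ERA5-Land SWE
obs_mlr = [75.0, 175.0, 205.0, 325.0]  # built as 5 + 1.2 * SWE + 0.05 * elevation

slr = fit_linear([[s] for s in era5l], obs_slr)
mlr = fit_linear([[s, z] for s, z in zip(era5l, elevation)], obs_mlr)
print(slr)  # approximately [10, 1.5]
print(mlr)  # approximately [5, 1.2, 0.05]

# The bias correction itself: evaluate the fitted equation at a new ERA5-Land grid cell.
print(predict_linear(mlr, [[120.0, 800.0]]))  # approximately [189.0]
```

An alternative design regresses the residual (observed minus ERA5-Land) instead of observed SWE; stating which target was used would answer the reviewer's question directly.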
The basics of the random forest model are somewhat explained, but again, a few sentences on how the actual bias correction is done would be helpful.
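A plausible RF setup (an assumption, not confirmed by the manuscript) trains trees that map predictors such as ERA5-Land SWE to observed SWE, then applies them to grid cells. The sketch below uses a single depth-limited regression tree in pure Python, not a random forest and not the authors' implementation, to show why the abstract reports no extrapolation: every leaf predicts the mean of its training targets, so predictions never leave the range of those targets. A forest averages many such trees, so it is bounded in the same way.

```python
def sse(values):
    """Sum of squared deviations from the mean (the impurity a regression tree reduces)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(xs, ys, depth=0, max_depth=3):
    """Grow a one-feature regression tree by choosing splits that minimise total SSE."""
    leaf = {"value": sum(ys) / len(ys)}  # a leaf predicts the mean of its training targets
    if depth >= max_depth or len(set(xs)) < 2:
        return leaf
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    _, threshold, i = best
    left_x, left_y = zip(*pairs[:i])
    right_x, right_y = zip(*pairs[i:])
    return {
        "threshold": threshold,
        "left": build_tree(list(left_x), list(left_y), depth + 1, max_depth),
        "right": build_tree(list(right_x), list(right_y), depth + 1, max_depth),
    }


def predict(tree, x):
    """Walk from the root to a leaf and return that leaf's mean."""
    while "threshold" in tree:
        tree = tree["left"] if x < tree["threshold"] else tree["right"]
    return tree["value"]


# Hypothetical training data: observed SWE is 1.5 x ERA5-Land SWE, sampled only up to 160 mm.
era5l_train = [20.0, 40.0, 60.0, 80.0, 100.0, 120.0, 140.0, 160.0]
observed_train = [1.5 * s for s in era5l_train]
tree = build_tree(era5l_train, observed_train)

print(predict(tree, 100.0))  # inside the training range: follows the data
print(predict(tree, 400.0))  # outside it: capped at the largest leaf mean, not 1.5 * 400 = 600
```

A linear model fitted to the same data would return 600 for the second query; the tree cannot, which matches the abstract's finding for unseen spatial and temporal domains.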
L147: Using just one letter to describe each month is a bit confusing; consider writing out the month names. Also, spring should not be capitalized.
Citation: https://doi.org/10.5194/egusphere-2024-639-RC2