This work is distributed under the Creative Commons Attribution 4.0 License.
A Random Forest approach to quality-checking automatic snow-depth sensor measurements
Abstract. State-of-the-art snow-sensing technologies currently provide an unprecedented amount of data from both remote-sensing satellites and ground sensors, but their assimilation into dynamic models is bound by data quality, which is often low, especially in mountainous, high-elevation, and unattended regions where snow is the predominant land-cover feature. To maximize the value of snow-depth measurements, we developed a Random Forest classifier to automate the quality assurance/quality control (QA/QC) procedure for near-surface snow-depth measurements collected with ultrasonic sensors, with particular reference to differentiating snow cover from grass or bare ground and to detecting random errors (e.g., spikes). The model was trained and validated using a split-sample approach on an already manually classified dataset of 18 years of data from 43 sensors in Aosta Valley (north-western Italian Alps), and then further validated using 3 years of data from 27 stations across the rest of Italy (with no further training or tuning). The F1 score was used as the scoring metric, as it is the best suited to describe the performance of a model on a multi-class, imbalanced classification problem. The model proved both robust and reliable in classifying snow cover vs. grass/bare ground in Aosta Valley (F1 values above 90 %), yet less reliable in detecting rare random errors, mostly due to dataset imbalance (sample distribution: 46.46 % snow, 49.21 % grass/bare ground, 4.34 % error). No clear correlation with snow-season climatology was found in the training dataset, which further suggests the robustness of our approach. The application across the rest of Italy yielded F1 scores on the order of 90 % for snow and grass/bare ground, thus confirming results from the testing region and corroborating model robustness and reliability, again with a less skillful classification of random errors (values below 5 %). This machine-learning algorithm for data-quality assessment will provide more reliable ground-based snow data, enhancing their use in snow models.
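The paper's exact pipeline and predictors are not reproduced on this page; as a purely illustrative aside, the following minimal Python sketch (placeholder random data, hypothetical features, scikit-learn assumed) shows the general shape of the approach the abstract describes: a Random Forest trained on an imbalanced three-class problem and scored with per-class and macro-averaged F1.

```python
# Minimal sketch of the QA/QC classification approach described in the
# abstract: a Random Forest distinguishing snow, grass/bare ground, and
# random errors, scored with the F1 metric. Feature names, label encoding,
# and the imbalance-handling choice are illustrative assumptions, not the
# authors' exact setup (the paper reports an oversampling step instead).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data standing in for sensor-derived predictors
# (e.g., measured snow depth, air temperature, day of year).
n = 10_000
X = rng.normal(size=(n, 3))
# Imbalanced labels mimicking the reported class distribution:
# ~46 % snow (0), ~49 % grass/bare ground (1), ~4 % error (2).
y = rng.choice([0, 1, 2], size=n, p=[0.465, 0.492, 0.043])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# class_weight="balanced" is one common way to mitigate class imbalance.
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
# Per-class F1 (one score per class) and the macro average across classes,
# the metric the paper uses for this imbalanced multi-class problem.
print("per-class F1:", f1_score(y_test, y_pred, average=None))
print("macro F1:   ", f1_score(y_test, y_pred, average="macro"))
```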
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-656', Anonymous Referee #1, 13 Jun 2023
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2023/egusphere-2023-656/egusphere-2023-656-RC1-supplement.pdf
AC1: 'Reply on RC1', Giulia Blandini, 22 Sep 2023
We thank Reviewer 1 for their constructive comments. We are happy that the Reviewer appreciated the writing style. All requested revisions are feasible, and we will work in this direction as soon as the interactive discussion is finalized.
See our point-by-point reply in the attached pdf.
RC2: 'Comment on egusphere-2023-656', Anonymous Referee #2, 04 Sep 2023
Review of Blandini et al. 2023

General comment:
The paper is well-structured, well illustrated, and easy to read. Particular care was taken to provide neat and readable figures and explanations, which I thank and congratulate the authors for. The paper's topic is both interesting and timely, in line with the increasing availability of diverse snow-depth data, sometimes produced by non-professional networks or organizations, that could serve scientific goals provided they can be qualified. The methods proposed and the analysis of the results are sound and provide a balanced evaluation of the proposed automatic quality assessment tool. I have some suggestions that I hope will help clarify some methodological points related to metrics/evaluation, and complement the perspectives. I recommend the publication of this article provided these minor suggestions are taken into account.

Detailed comments:
- "what is the accuracy of a Random Forest classifier algorithm in automatically performing QA/QC of near-surface snow depth observations?" Although the choice of an RF classifier is well justified in the paper, it seems other AI algorithms could also be used. This could be something to explore in future work. In particular, considering snow-height measurements as a time series and not just separate features could possibly help identify spikes better; this could be done through AI algorithms incorporating memory features like recurrent networks or LSTMs (see the last point of the detailed comments).
- L149-150: "After the oversampling procedure, a sample of 1.9×10^6 over-sampled measurements was used". It is not entirely clear whether 1.9×10^6 is the size of the total training set (majority classes plus the oversampled minority class, which I believe it is) or just the oversampled minority class, which the phrase "over-sampled measurements" suggests. Could you be a bit more specific in the description?
- L162-165: it should be stated that the metrics are going to be used to characterize the performance of the RF for each class separately, and then globally to characterize the multi-class performance through the use of a macro-average. It may also be useful to explain the term macro-average to enhance readability.
- L179: which radiation is this? Incoming longwave or shortwave; reflected or upwelling from the ground? It should be specified, because it matters for the interpretation of the importance of, and relationships to, this predictor. Typically, reflected shortwave radiation could be a great help to detect snow vs. grass/ground, but I assume this is not the kind of radiation that was used.
- Both the test set and the evaluation set share years with the training set. The effect of different years on the RF performance is assessed in Fig. 7 and the related text, but it seems to me that the temporal transferability of the algorithm (i.e., transferability to other, completely unknown years) was not thoroughly tested within the split-sample procedure, though this is probably one key application of this algorithm. You very wisely discuss that your results "may point to our Random Forest being robust to different climatic regimes". Is there a particular reason why you did not choose an evaluation set enabling an evaluation of the RF model in full spatio-temporal extrapolation mode? Perhaps related to that, how sensitive would the performance of the RF algorithm be to a moderate reduction in the size of the training set, for instance the withdrawal of the 3 complete years 2018, 2020, and 2022, which would then enable an evaluation of the model in spatio-temporal extrapolation? Finally, the rationale behind the very short Section 4.3 should be described either ahead of this section in the introduction, or within the section.
- L320-324: I am not sure that the addition of more data will distinctly refine the accuracy of the "errors" class, unless a very large amount of new data is used. I would hypothesize that other strategies may pay off with respect to this issue, and these may be either explored or at least cited if you find them relevant:
  - Using other, pre-processed features could help, for instance a Delta-HS = HS(t) - HS(t-1), with HS the height of snow. This could help detect unrealistic spikes or drops in snow data, like the spikes remaining after RF treatment in Fig. A1. This hypothesis is very easy to test (a minimal sketch follows this comment).
  - Alternatively, AI algorithms suited to data series and incorporating some memory, like recurrent networks or LSTMs, could help if fed with snow-height time series or small extracts of them.
  - Finally, have you considered using webcam images from near the stations at the same elevation/aspect? These could provide simple, perhaps not completely reliable snow/no-snow information, but with errors possibly not completely correlated with the RF errors.

Edits:
- L5-6: "with particular reference to differentiate snow cover from grass or bare ground data and to detecting random errors (e.g., spikes)" -> "to detect"?
- L54: "It is clear then the necessity for a quality checking procedure, that ought to..." There seems to be a syntax issue.
- Fig. 2: adding the contours of Italy would be nice.
- L143: "end,askowleding"
- L143: "the work of (Ponziani et al., 2023) in which no clear evidence of out-performance of any strategy," It seems some words are missing.
- L162: "precision(measure of"
- L208: I guess a "." is missing before "Fig 5".
- Caption of Fig. 5: "model.In".
- Fig. 8: maybe use the same vertical scale across rows, as the amplitudes are otherwise quite hard to compare, especially in the 3rd column.
- References: there is an issue with the Avanzi et al. 2020, 2021, and 2022 references, which are always stated twice.

Citation: https://doi.org/10.5194/egusphere-2023-656-RC2
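As a purely illustrative aside (not part of the manuscript or the authors' reply), here is a minimal Python sketch of the reviewer's Delta-HS suggestion, assuming a pandas DataFrame with hypothetical station_id, time, and HS columns and an arbitrary 50 cm spike threshold:

```python
# Illustrative sketch of the reviewer's Delta-HS idea: a first-order
# difference of the snow-depth series, HS(t) - HS(t-1), used as an extra
# predictor or screening rule to flag unrealistic spikes or drops.
# Column names and the 50 cm threshold are hypothetical example values.
import pandas as pd

def add_delta_hs(df: pd.DataFrame, hs_col: str = "HS") -> pd.DataFrame:
    """Append a Delta-HS column, computed per station in time order."""
    out = df.sort_values(["station_id", "time"]).copy()
    out["delta_HS"] = out.groupby("station_id")[hs_col].diff()
    return out

# Example: flag hourly jumps larger than 50 cm as spike candidates.
data = pd.DataFrame({
    "station_id": ["A"] * 5,
    "time": pd.date_range("2022-01-01", periods=5, freq="h"),
    "HS": [40.0, 41.0, 120.0, 42.0, 43.0],  # spurious spike at the 3rd step
})
data = add_delta_hs(data)
data["spike_candidate"] = data["delta_HS"].abs() > 50
print(data[["time", "HS", "delta_HS", "spike_candidate"]])
```

The flagged rows (or the delta_HS column itself) could then be fed to the classifier as an additional feature, which is the low-cost test the reviewer proposes.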
AC2: 'Reply on RC2', Giulia Blandini, 22 Sep 2023
We appreciate the reviewer's comments and acknowledge that the recommended changes will improve the clarity of our work. We think all suggested modifications are feasible, and we will work in this direction to improve our work.
See our point-by-point reply in the attached pdf.
Authors: Giulia Blandini, Francesco Avanzi, Simone Gabellani, Denise Ponziani, Hervé Stevenin, Sara Ratto, Luca Ferraris, and Alberto Viglione