the Creative Commons Attribution 4.0 License.
Towards deep learning solutions for classification of automated snow height measurements (CleanSnow v1.0.0)
Abstract. Snow height measurements are still the backbone of any snow cover monitoring, whether based on modeling or remote sensing. These ground-based measurements are often realized with the use of ultrasonic or laser technologies. In challenging environments, such as high alpine regions, the quality of sensor measurements deteriorates quickly, especially in the presence of extreme weather conditions or ephemeral snow conditions. Moreover, the sensors by their nature measure the height of an underlying object and are therefore prone to return other information, such as the height of vegetation, in snow-free periods. Quality assessment and real-time classification of automated snow height measurements are therefore desirable in order to provide high-quality data for research and operational applications. To this end, we propose CleanSnow, a machine learning approach to automated classification of snow height measurements into a snow cover class and a class corresponding to everything else, which takes into account both the temporal context and the dependencies between snow height and other sensor measurements. We created a new dataset of manually annotated snow height measurements, which allowed us to train our models in a supervised manner as well as quantitatively evaluate our results. Through a series of experiments and ablation studies to evaluate feature importance and compare several different models, we validated our design choices and demonstrated the importance of using temporal information together with information from auxiliary sensors. CleanSnow achieved high accuracy and represents a new baseline for further research in the field. The presented approach to snow height classification finds its use in various tasks, ranging from snow modeling to climate science.
Status: final response (author comments only)
-
CEC1: 'Comment on egusphere-2024-1752', Juan Antonio Añel, 14 Aug 2024
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
First, the "Code Availability" section only contains information for CleanSnow v1.0.0; however, to perform your work you have used additional software. This is the case for MeteoIO. For MeteoIO you cite a paper published ten years ago in our journal that points to a webpage that does not comply with our current requirements for code availability. That is, it is not an acceptable repository. Regarding this, MeteoIO is published under the GPLv3 license, so you can take the code and store it in a repository that complies with our policy. Therefore, please do so, and reply to this comment with the link and DOI of its repository.
Secondly, for the "Data Availability" section: the link that you provide for a repository for the data used to train your model is not valid. It is not a trustworthy long-term repository that can be accepted for scientific publication. Therefore, you must take all the data and store it in one of the repositories that we can accept, and again, reply to this comment with the link and DOI for it. However, this is not the only problem: you use SnowPack data. For this dataset you cite a paper (Lehning et al., 1999), and although you use the SWE data, it is not possible to access it. Therefore, as for the annotated data used to train your model, you must publish the SWE data.
Also, remember that you must modify the "Code" and "Data" availability sections in any potentially reviewed version of your manuscript, so that they contain the information that you must post in reply to this comment.
I have to note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal. Please, reply to this comment with the requested information, as it must be public to make possible the Discussions stage and the review of your manuscript by any interested reader.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2024-1752-CEC1 -
AC1: 'Reply on CEC1', Jan Svoboda, 15 Aug 2024
We thank the Editor for the comments and suggestions. Please find our detailed response (including new DOIs) in the attached PDF document.
-
CEC2: 'Reply on AC1', Juan Antonio Añel, 15 Aug 2024
Dear authors,
Many thanks for addressing these outstanding issues so quickly and satisfactorily. We can now consider the current version of your manuscript in compliance with our policy.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2024-1752-CEC2
-
CC1: 'Reply on CEC1', I. Iosifescu Enescu, 17 Aug 2024
Dear Mr. Añel,
we were sorry to hear that the Authors were not permitted to deposit their data in EnviDat, the official institutional repository of WSL. EnviDat provides a DOI for all deposits, and fulfills all journal requirements listed (https://www.geoscientific-model-development.net/policies/code_and_data_policy.html#item3):
- institutional support providing reasonable confidence that the material will remain available for many years/decades
- mechanisms preventing the depositor of the material from unilaterally removing it from the archive
- mechanisms for identifying the precise version of the material referred to in a persistent way. This will usually be a DOI.
We would be happy to further discuss any additional requirements that would not force our authors to deposit their datasets in two places, resulting in two DOIs from different repositories for the same dataset.
Many thanks for your consideration,
Ionut Iosifescu (technical coordinator EnviDat)
Citation: https://doi.org/10.5194/egusphere-2024-1752-CC1 -
CEC3: 'Reply on CC1', Juan Antonio Añel, 17 Aug 2024
Dear I. Iosifescu Enescu,
Just to clarify, while the authors of this manuscript are permitted to deposit their data in Envidat, we cannot overlook the current issues that prevent it from being a trustworthy repository for scientific publication.
We welcome the efforts that Envidat could be making to become a trustworthy repository for scientific publication and that you are willing to comply with our requirements. Unfortunately, before commenting on this, I double-checked the status of Envidat in Fairsharing.org, which, as you can see, shows that Envidat could do better on several issues. Also, finding the page with Envidat's conditions and policy from the homepage took work. I found this information through Fairsharing.org, not the Envidat homepage.
Despite your claims that authors cannot remove the data, your policy (https://www.envidat.ch/#/about/policies) clearly states, "Metadata and content items may be removed at the request of the depositor." No exception to this rule is listed in your policy. Therefore, I think it is clear that Envidat does not comply with our requirement that authors must not be able to remove an item (software or data). Moreover, your policy clearly states that you can change it unilaterally at any point: "The EnviDat policies are subject to change by EnviDat at any time and without notice." We cannot trust a policy that can change by unilateral decision at any point, without it being clear on whom such a change depends or what the procedures are.
I miss some other important details. For example, for how long has EnviDat been funded? We usually require that items be submitted to repositories with funding secured for their maintenance for extended periods (minimum > 10 years, and generally more than 20 years) or a proven ability to obtain recurrent funding or commitment of organizations participating in it to fund it. Also, is there a board that decides on the data stored and its removal? It is essential to know that such decisions are not arbitrary and made by only one person but based on consensus. I have not found these details described on your webpage, and they are relevant. We would appreciate it if you could provide and publish the appropriate documents and additional information on the Envidat webpage.
We are open to accepting Envidat for the deposit of the code and data software submitted to our journal. However, the issues we have raised must be addressed.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2024-1752-CEC3 -
CC2: 'Reply on CEC3', I. Iosifescu Enescu, 19 Aug 2024
Dear Juan A. Añel,
Thank you for your additional clarifications and feedback, much appreciated! EnviDat can indeed improve in many areas, including making the policies more visible. Currently, our policies are linked in the "about" section, similarly to Zenodo. (But we can of course link the policies also directly in the side bar instead of having them in the about section - we can do this change pretty soon, as it is a minor issue.)
Overall, if I understand your suggestions correctly, the journal will accept data published in EnviDat if we will make a section/page that will declare:
1. the repository start date (EnviDat started in 2013)
2. our long-term funding security (EnviDat is a strategic initiative of WSL, and the official institutional data repository of WSL).
Also for clarification, may I clearly reiterate that depositors cannot delete datasets (and actually, in our repository, "the content items are not deleted per se, but are restricted, and therefore, no longer accessible"), as clearly declared in our policies, and I quote:
- Metadata and content items may be removed at the request of the depositor. Possible reasons for withdrawal include, but are not limited to:
- violations of WSL research integrity guidelines,
- proven copyright violation or plagiarism,
- legal requirements and proven violations,
- journal publishers' rules.
- Withdrawing metadata and content items means:
- permanent identifiers (DOIs) and permanent (DOI) URLs are retained for the entire duration of EnviDat existence,
- DOI URLs will continue to point to tombstone records, to avoid broken links from scientific citations, with a modified description explaining the reasons for withdrawal,
- the content items are not deleted per se, but are restricted, and therefore, no longer accessible.
You made it obvious that our current policy can be misunderstood and should be immediately improved - thank you for this feedback.
Consequently, we will also rephrase and simplify the policy sections about withdrawal, so that we can prevent further misunderstandings in the future. We will, effective immediately, start comparing and simplifying our policies to be as close as possible to the ones from Zenodo (see https://about.zenodo.org/policies/), since Zenodo policies are obviously acceptable for your journal. We can get the new EnviDat policies approved towards the end of September at our next UGM, so they should reach production in the early October release.
Would the implementation of all the above suggested changes influence your views on EnviDat as a trustworthy repository, therefore making it acceptable for your journal in the future? Would there be anything else we would need to improve?
Kind Regards,
Ionut Iosifescu, Technical Coordinator EnviDat
Citation: https://doi.org/10.5194/egusphere-2024-1752-CC2 -
CEC4: 'Reply on CC2', Juan Antonio Añel, 15 Oct 2024
Dear authors,
I wanted to make clear that your reply to our concerns on Envidat looks reasonable. However, because I went on holiday, I was not able to reply to your previous comment when you submitted it at the end of August.
I appreciate your efforts to bring Envidat into compliance with our policy. To that end, I think we need to better address one issue: the funding of Envidat. I think it is necessary that you provide evidence of the funding that supports Envidat, making public the amounts, the duration of the grants, and the funders. Otherwise, your claims that it is a "core" activity are no more than good intentions. In the meantime, and to properly address the issues with this manuscript, I would kindly request that you deposit a copy of your assets in another repository from the list that we currently accept. In this case, duplication across repositories will help avoid potential future problems.
Best regards,
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/egusphere-2024-1752-CEC4 -
AC4: 'Reply on CEC4', Jan Svoboda, 16 Oct 2024
Dear Editor,
the dataset has already been uploaded to Zenodo, which complies with GMD rules:
https://doi.org/10.5281/zenodo.13324736
We will make sure to reference the Zenodo DOI in our manuscript in the future.
Best Regards,
Jan Svoboda, on behalf of all co-authors
Citation: https://doi.org/10.5194/egusphere-2024-1752-AC4
-
RC1: 'Comment on egusphere-2024-1752', Anonymous Referee #1, 23 Sep 2024
Dear authors,
I'd first like to say that I found your manuscript very interesting. You were thorough in your evaluation of your model. I also like that your GitLab page encourages reproducibility and people to use your model. I took the liberty of directly annotating your manuscript. I think the content and research are very good, but the form and presentation quality could be improved so that it's easier to understand.
But here are some other more general comments:
- ML level: because you're aiming for an EGU journal, I think there might be a majority of readers who are not experts in ML. Therefore, I think that at some points, it's necessary to explain some terms (I've pointed some out in the manuscript). Be careful of the line where you become unnecessarily too technical and where you might lose some of your readers. I'd also add a section in the discussion advising people (who are unfamiliar with ML and are set on their traditional numerical models) on how to use your model.
- Consistency of terms: Be careful not to use too many names to refer to your model, sometimes CleanSnow, sometimes TCN. I'd stick to CleanSnow everywhere and use TCN only when you refer to the architecture; otherwise, it becomes very confusing. You devised a nice name for your model, so use it :) Re-read the manuscript and change it where needed.
- Cross-validation: The part about hyperparameter tuning of your model is briefly mentioned but very important. Did you do any cross-validation for this (if not, why not?)? And which hyperparameters were tuned and came out as best?
- Figures and their legend: your figure legends are generally concise and need more information. Although this is very tedious work, legends should respect a few things, such as acronyms that come up in the figure should be referred to (and generally explained) in the legend, and a reader should be able to understand the figure on its own without having to go look things up in the text. Please go over your legends again and make them more descriptive.
- Results: In your description of results (Section "Experiments"), when making statements that can be backed up by numbers in parenthesis, these numbers should be provided (such as F1 scores). There are a lot of F1 scores in your figures that can be easily used to back up your claims. Otherwise, the reader has to go look them up in the figures, and your statements seem empty. One good example where you do this is line 285, but this should be in all other results, too: "e.g., demonstrates that the model confidently classified snow (TPR = 99.4%) in contrast to the classification of snow-free ground with (TPR = 88.4%)".
- Grammatical tense: Be careful about mixing up too many tenses; sometimes, you switch from past to present without making too much sense. For the sake of consistency, try to keep the same when talking about the same things: for example, keep past tense when talking about your experiments and present tense for the results.
- Presentation of results and discussion: it seems to me that quite a lot of the results are simply repeated in the discussion, and that's not very interesting. I suggest that if a question comes up in the results, you discuss it immediately (for example, the negative effect of the solar variable). Otherwise, the reader doesn't get an explanation, reads on, forgets about it, and suddenly finds it again in the discussion. Instead of repeating results, the discussion, for example, also needs the limitations of CleanSnow.
- Repetitiveness: your text is quite long, and I think you can make it shorter by removing unnecessary repetitions. Some things to remove are repetitions of things said previously in other sections ("as previously described in ..."), which can just be a reference to a section. "As shown in Figure ..." can just be a statement with a "(Figure number)". I tried to strike out some things that jumped up to me, but I'll let you have a look.
- The problem of generalization: this comes back to the limitations of CleanSnow. You've shown that it generalizes well to stations within its training range but performs less well to those outside it. This is a normal limitation in ML but should be presented as such. CleanSnow will struggle when applied to a new station that is not within the distribution it's been trained on (which is normal because it's not like you did any transfer learning or something), but that's not a good generalization. So, I think your text needs more transparency about this limitation.
- I have an open question for you: you briefly mentioned input anomalies in your discussion. Did you notice any particular behavior for 2022 and 2023 (seeing as they're strong temperature anomalies)?
- AC2: 'Reply on RC1', Jan Svoboda, 15 Oct 2024
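For readers less familiar with the metrics that Referee #1 asks to be quoted inline (TPR and F1 score), the following is a minimal sketch of how they can be computed per class for a binary snow / not-snow classification. It is not taken from the CleanSnow code base: the function name `binary_metrics` and the label vectors are hypothetical and only illustrate the definitions.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return TPR (recall), precision and F1 for the class `positive`."""
    counts = Counter(zip(y_true, y_pred))
    # True positives: labelled positive and predicted positive.
    tp = sum(n for (t, p), n in counts.items() if t == positive and p == positive)
    # False negatives: labelled positive but predicted as something else.
    fn = sum(n for (t, p), n in counts.items() if t == positive and p != positive)
    # False positives: labelled as something else but predicted positive.
    fp = sum(n for (t, p), n in counts.items() if t != positive and p == positive)

    tpr = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"tpr": tpr, "precision": precision, "f1": f1}


# Hypothetical annotations: 1 = snow, 0 = everything else (e.g. vegetation).
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 0, 1]

snow = binary_metrics(y_true, y_pred, positive=1)
ground = binary_metrics(y_true, y_pred, positive=0)
print(f"snow:   TPR = {snow['tpr']:.1%}, F1 = {snow['f1']:.3f}")
print(f"ground: TPR = {ground['tpr']:.1%}, F1 = {ground['f1']:.3f}")
```

Reporting the metric separately for each class, as in the TPR example quoted by the referee, makes it visible when one class (here the snow-free ground) is classified less reliably than the other.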
-
RC2: 'Comment on egusphere-2024-1752', Anonymous Referee #2, 03 Oct 2024
- This study focuses on the classification of snow height measurements based on AI techniques. However, its scientific significance is not sufficiently established: why is this work important?
- What is the relationship between quality assessment and AI classification? In my view, this work doesn’t aim at improving data quality, but only at distinguishing possible anomalies among all station measurements. So how is the advance of the AI method reflected in this work?
- The structure of this article is not clear enough, please improve it and maintain some important research work. Now
- In Figure 1, how were the training and testing stations determined?
- P3, lines 70-80. These descriptions should be moved to the Introduction section.
- P3, lines 80-85. Please give the physical basis.
- Please provide a flowchart for this paper.
- How was the truth data determined?
- P6, lines 110-135. This paragraph should belong to methodology, thus, the title ‘3 Machine learning based snow cover classification’ is not suitable. This section should be method or methodology.
- P8, ‘4.1 dataset’ should be introduced in methodology section, not here.
- It is difficult for me to understand the logic and structure of this study.
Citation: https://doi.org/10.5194/egusphere-2024-1752-RC2
- AC3: 'Reply on RC2', Jan Svoboda, 15 Oct 2024
Data sets
Snow Height Classification Dataset Jan Svoboda, Marc Ruesch, David Liechti, Corinne Jones, Michele Volpi, Michael Zehnder, and Jürg Schweizer https://doi.org/10.16904/envidat.512
Model code and software
Towards deep learning solutions for classification of automated snow height measurements (CleanSnow v1.0.0) Jan Svoboda, Marc Ruesch, David Liechti, Corinne Jones, Michele Volpi, Michael Zehnder, and Jürg Schweizer https://doi.org/10.5281/zenodo.12698071
Viewed
- HTML: 384
- PDF: 121
- XML: 134
- Total: 639
- BibTeX: 14
- EndNote: 15