Towards deep learning solutions for classification of automated snow height measurements (CleanSnow v1.0.0)

Svoboda, Jan; Ruesch, Marc; Liechti, David; Jones, Corinne; Volpi, Michele; Zehnder, Michael; Schweizer, Jürg

doi:10.5194/egusphere-2024-1752

Preprints

22 Jul 2024

Abstract. Snow height measurements are still the backbone of any snow cover monitoring, whether based on modeling or remote sensing. These ground-based measurements are often made with ultrasonic or laser sensors. In challenging environments, such as high alpine regions, the quality of sensor measurements deteriorates quickly, especially under extreme weather conditions or ephemeral snow conditions. Moreover, the sensors by their nature measure the height of the underlying surface and are therefore prone to returning other information, such as the height of vegetation, during snow-free periods. Quality assessment and real-time classification of automated snow height measurements are therefore desirable in order to provide high-quality data for research and operational applications. To this end, we propose CleanSnow, a machine learning approach that classifies automated snow height measurements into two classes, snow cover and everything else, taking into account both the temporal context and the dependencies between snow height and other sensor measurements. We created a new dataset of manually annotated snow height measurements, which allowed us to train our models in a supervised manner and to evaluate our results quantitatively. Through a series of experiments and ablation studies that evaluate feature importance and compare several models, we validate our design choices and demonstrate the importance of using temporal information together with information from auxiliary sensors. CleanSnow achieves high accuracy and represents a new baseline for further research in the field. The presented approach to snow height classification can be used in a range of tasks, from snow modeling to climate science.
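To make the idea of combining temporal context with auxiliary sensors more concrete, the Python sketch below shows a single causal, dilated 1-D convolution, the basic building block of a temporal convolutional network (TCN), the architecture type named in the review discussion. It is not CleanSnow's implementation: the two input channels (snow height in m, snow surface temperature in °C), the hand-picked untrained weights and the function names are all hypothetical, and a real model would stack several trained layers.

```python
import math
from typing import List, Sequence

# Hypothetical input layout: one row per time step, one column per sensor,
# e.g. [snow height (m), snow surface temperature (degC)].
Series = Sequence[Sequence[float]]


def causal_dilated_conv1d(x: Series, weights: Sequence[Sequence[float]],
                          bias: float, dilation: int) -> List[float]:
    """Output at step t combines x[t], x[t - dilation], x[t - 2*dilation], ...

    weights[i][c] multiplies channel c at lag i * dilation. Steps before the
    start of the series count as zeros, so no future values are ever used.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    out = []
    for t in range(len(x)):
        total = bias
        for i, kernel_row in enumerate(weights):
            idx = t - i * dilation
            if idx < 0:
                continue  # implicit left zero-padding keeps the filter causal
            for c, w in enumerate(kernel_row):
                total += w * x[idx][c]
        out.append(total)
    return out


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify_snow(x: Series, weights: Sequence[Sequence[float]], bias: float,
                  dilation: int, threshold: float = 0.5) -> List[int]:
    """Label each time step 1 (snow cover) or 0 (everything else)."""
    logits = causal_dilated_conv1d(x, weights, bias, dilation)
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


if __name__ == "__main__":
    # Toy series: vegetation (~0.15 m, warm surface), then snow (cold surface).
    series = [[0.15, 15.0], [0.15, 14.0], [0.50, -5.0], [0.55, -6.0]]
    # Hand-picked weights: positive for height, negative for surface warmth.
    weights = [[20.0, -0.5], [10.0, -0.25]]
    print(classify_snow(series, weights, bias=-2.0, dilation=1))  # [0, 0, 1, 1]
```

The property that matters here is causality: the label at a given time depends only on current and past measurements, which is what makes such a layer usable for real-time classification. Stacking layers with growing dilation widens the temporal context without requiring long kernels.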

Received: 11 Jun 2024 – Discussion started: 22 Jul 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Journal article(s) based on this preprint

17 Mar 2025

Towards deep-learning solutions for classification of automated snow height measurements (CleanSnow v1.0.2)

Jan Svoboda, Marc Ruesch, David Liechti, Corinne Jones, Michele Volpi, Michael Zehnder, and Jürg Schweizer

Geosci. Model Dev., 18, 1829–1849, https://doi.org/10.5194/gmd-18-1829-2025, 2025

Interactive discussion

Status: closed

CEC1:
'Comment on egusphere-2024-1752', Juan Antonio Añel, 14 Aug 2024

Dear authors,

Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".

https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
First, the "Code Availability" section only contains information for CleanSnow v1.0.0; however, to perform your work you have used additional software, namely MeteoIO. For MeteoIO you cite a paper published ten years ago in our journal, which points to a webpage that does not comply with our current requirements for code availability; that is, it is not an acceptable repository. Regarding this, MeteoIO is published under the GPLv3 license, so you can take the code and store it in a repository that complies with our policy. Therefore, please do so, and reply to this comment with the link and DOI of its repository.
Secondly, for the "Data Availability" section: the link that you provide for a repository for the data used to train your model is not valid. It is not a trustworthy long-term repository that can be accepted for scientific publication. Therefore, you must take all the data and store it in one of the repositories that we can accept, and again, reply to this comment with the link and DOI for it. However, this is not the only problem: you use SnowPack data. For this dataset you cite a paper (Lehning et al., 1999), and although you use the SWE data, it is not possible to access them. Therefore, as for the annotated data used to train your model, you must publish the SWE data.
Also, remember that you must modify the "Code" and "Data" availability sections in any potentially reviewed version of your manuscript, so that they contain the information that you must post in reply to this comment.
I have to note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal. Please, reply to this comment with the requested information, as it must be public to make possible the Discussions stage and the review of your manuscript by any interested reader.
Juan A. Añel
Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2024-1752-CEC1
- AC1:
  'Reply on CEC1', Jan Svoboda, 15 Aug 2024
  
  We thank the Editor for the comments and suggestions. Please find our detailed response (including new DOIs) in the attached PDF document.
  
  Citation: https://doi.org/10.5194/egusphere-2024-1752-AC1
  - CEC2: 'Reply on AC1', Juan Antonio Añel, 15 Aug 2024
    
    Dear authors,
    Many thanks for addressing these outstanding issues so quickly and satisfactorily. We can now consider the current version of your manuscript in compliance with our policy.
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2024-1752-CEC2
- CC1:
  'Reply on CEC1', I. Iosifescu Enescu, 17 Aug 2024
  Dear Mr. Añel,
  we were sorry to hear that the Authors were not permitted to deposit their data in EnviDat, the official institutional repository of WSL. EnviDat provides DOIs for all deposits and fulfills all the journal requirements listed (https://www.geoscientific-model-development.net/policies/code_and_data_policy.html#item3):
  institutional support providing reasonable confidence that the material will remain available for many years/decades
  
  mechanisms preventing the depositor of the material from unilaterally removing it from the archive
  
  mechanisms for identifying the precise version of the material referred to in a persistent way. This will usually be a DOI.
  
  We would be happy to further discuss any additional requirements, so that our authors are not forced to deposit their datasets in two places, with two DOIs from different repositories for the same dataset.
  Many thanks for your consideration,
  Ionut Iosifescu (technical coordinator EnviDat)
  
  Citation: https://doi.org/10.5194/egusphere-2024-1752-CC1
  - CEC3: 'Reply on CC1', Juan Antonio Añel, 17 Aug 2024
    
    Dear I. Iosifescu Enescu,
    Just to clarify, while the authors of this manuscript are permitted to deposit their data in Envidat, we cannot overlook the current issues that prevent it from being a trustworthy repository for scientific publication.
    
    We welcome the efforts that Envidat could be making to become a trustworthy repository for scientific publication, and that you are willing to comply with our requirements. Unfortunately, before commenting on this, I double-checked the status of Envidat in Fairsharing.org, which, as you can see, shows that Envidat could do better on several issues. Also, finding the page with Envidat's conditions and policy from the homepage took work; I found this information through Fairsharing.org, not the Envidat homepage.
    
    Despite your claims that authors cannot remove the data, your policy (https://www.envidat.ch/#/about/policies) clearly states, "Metadata and content items may be removed at the request of the depositor." No exception to this rule is listed in your policy. Therefore, I think it is clear that Envidat does not comply with our requirement that authors must not be able to remove an item (software or data). Moreover, your policy clearly states that you can change it unilaterally at any point: "The EnviDat policies are subject to change by EnviDat at any time and without notice." We cannot trust a policy that can change by unilateral decision at any point, without clarity on who decides such changes or by what procedure.
    Some other important details are also missing. For example, for how long has EnviDat been funded? We usually require that items be submitted to repositories with funding secured for their maintenance for extended periods (minimum > 10 years, and generally more than 20 years), or with a proven ability to obtain recurrent funding, or a commitment from the participating organizations to fund them. Also, is there a board that decides on the data stored and its removal? It is essential to know that such decisions are not arbitrary or made by only one person, but based on consensus. I have not found these details described on your webpage, and they are relevant. We would appreciate it if you could provide and publish the appropriate documents and additional information on the Envidat webpage.
    We are open to accepting Envidat for the deposit of the code and data submitted to our journal. However, the issues we have raised must be addressed.
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2024-1752-CEC3
    
    CC2: 'Reply on CEC3', I. Iosifescu Enescu, 19 Aug 2024
    
    Dear Juan A. Añel,
    Thank you for your additional clarifications and feedback, much appreciated! EnviDat can indeed improve in many areas, including making the policies more visible. Currently, our policies are linked in the "about" section, similarly to Zenodo. (But we can of course link the policies also directly in the side bar instead of having them in the about section - we can do this change pretty soon, as it is a minor issue.)
    Overall, if I understand your suggestions correctly, the journal will accept data published in EnviDat if we create a section/page that declares:
    1. the repository start date (EnviDat started in 2013)
    2. our long-term funding security (EnviDat is a strategic initiative of WSL, and the official institutional data repository of WSL).
    Also for clarification, may I clearly reiterate that depositors cannot delete datasets (and actually, in our repository, "the content items are not deleted per se, but are restricted, and therefore, no longer accessible"), as clearly declared in our policies, and I quote:
    Metadata and content items may be removed at the request of the depositor. Possible reasons for withdrawal include, but are not limited to:
    violations of WSL research integrity guidelines,
    
    proven copyright violation or plagiarism,
    
    legal requirements and proven violations,
    
    journal publishers' rules.
    
    Withdrawing metadata and content items means:
    permanent identifiers (DOIs) and permanent (DOI) URLs are retained for the entire duration of EnviDat existence,
    
    DOI URLs will continue to point to tombstone records, to avoid broken links from scientific citations, with a modified description explaining the reasons for withdrawal,
    
    the content items are not deleted per se, but are restricted, and therefore, no longer accessible.
    
    You made it obvious that our current policy can be misunderstood and should be immediately improved - thank you for this feedback.
    Consequently, we will also rephrase and simplify the policy sections about withdrawal, so that we can prevent further misunderstandings in the future. We will, effective immediately, start comparing and simplifying our policies to be as close as possible to the ones from Zenodo (see https://about.zenodo.org/policies/), since Zenodo policies are obviously acceptable for your journal. We can get the new EnviDat policies approved towards the end of September at our next UGM, so they should reach production in the early October release.
    Would the implementation of all the above suggested changes influence your views on EnviDat as a trustworthy repository, therefore making it acceptable for your journal in the future? Would there be anything else we would need to improve?
    Kind Regards,
    Ionut Iosifescu, Technical Coordinator EnviDat
    
    Citation: https://doi.org/10.5194/egusphere-2024-1752-CC2
    
    CEC4: 'Reply on CC2', Juan Antonio Añel, 15 Oct 2024
    
    Dear authors,
    I wanted to make clear that your reply to our concerns on Envidat looks reasonable. However, because I was on holiday, I was not able to reply to your previous comment when you submitted it at the end of August.
    I appreciate your efforts to bring Envidat into compliance with our policy. For this, one issue needs to be addressed better: the funding of Envidat. I think it is necessary that you provide evidence of the funding that supports Envidat, making public the amounts, the duration of the grants and the funders. Otherwise, your claims that it is a "core" activity are no more than good intentions. In the meantime, and to properly address the issues with this manuscript, I would kindly request that you deposit a copy of your assets in another repository from the list that we currently accept. In this case, duplicating the repositories will help avoid potential future problems.
    Best regards,
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2024-1752-CEC4
    
    AC4: 'Reply on CEC4', Jan Svoboda, 16 Oct 2024
    
    Dear Editor,
    the dataset has already been uploaded to Zenodo, which complies with GMD rules:
    https://doi.org/10.5281/zenodo.13324736
    We will make sure to reference Zenodo DOI in our manuscript in the future.
    Best Regards,
    Jan Svoboda, on behalf of all co-authors
    
    Citation: https://doi.org/10.5194/egusphere-2024-1752-AC4
RC1:
'Comment on egusphere-2024-1752', Anonymous Referee #1, 23 Sep 2024
Dear authors,
I'd first like to say that I found your manuscript very interesting. You were thorough in your evaluation of your model. I also like that your GitLab page encourages reproducibility and people to use your model. I took the liberty of directly annotating your manuscript. I think the content and research are very good, but the form and presentation quality could be improved so that it's easier to understand.
But here are some other more general comments:
ML level: because you're aiming for an EGU journal, I think a majority of readers may not be experts in ML. Therefore, I think that at some points it's necessary to explain some terms (I've pointed some out in the manuscript). Be careful not to become unnecessarily technical, as you might lose some of your readers. I'd also add a section in the discussion advising people (who are unfamiliar with ML and are set on their traditional numerical models) on how to use your model.

Consistency of terms: Be careful not to use too many names to refer to your model, sometimes CleanSnow, sometimes TCN. I'd stick to CleanSnow everywhere and use TCN only when you refer to the architecture; otherwise, it becomes very confusing. You devised a nice name for your model, so use it :) Re-read the manuscript and change it where needed.

Cross-validation: The part about hyperparameter tuning of your model is briefly mentioned but very important. Did you do any cross-validation for this (if not, why not?)? And which hyperparameters were tuned and came out as best?

Figures and their legends: your figure legends are generally too concise and need more information. Although this is very tedious work, legends should respect a few rules: for example, acronyms that appear in the figure should be referred to (and generally explained) in the legend, and a reader should be able to understand the figure on its own without having to look things up in the text. Please go over your legends again and make them more descriptive.

Results: In your description of results (Section "Experiments"), when making statements that can be backed up by numbers in parentheses, these numbers should be provided (such as F1 scores). There are a lot of F1 scores in your figures that can easily be used to back up your claims. Otherwise, the reader has to look them up in the figures, and your statements seem empty. One good example where you do this is line 285, but this should apply to all other results, too: "e.g., demonstrates that the model confidently classified snow (TPR = 99.4%) in contrast to the classification of snow-free ground with (TPR = 88.4%)".
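For readers less familiar with these metrics: the per-class true positive rate (TPR, also called recall) is the fraction of a class's time steps that the model labels correctly, and F1 is the harmonic mean of precision and recall. The Python sketch below computes both for each class of a binary snow/snow-free labelling; the function name and the toy labels are illustrative only and are not taken from the manuscript or the CleanSnow code.

```python
from typing import Dict, Sequence


def class_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                  positive: int) -> Dict[str, float]:
    """TPR (recall), precision and F1 for one class treated as 'positive'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"tpr": tpr, "precision": precision, "f1": f1}


# Toy labels (1 = snow, 0 = snow-free), not results from the manuscript.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
for name, label in (("snow", 1), ("snow-free", 0)):
    print(name, class_metrics(y_true, y_pred, positive=label))
```

Note that the TPR of the snow-free class is the same quantity as the specificity (true negative rate) of the snow class, so reporting TPR for both classes, as in the quoted example, describes both error types.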

Grammatical tense: Be careful not to mix too many tenses; sometimes you switch from past to present without good reason. For the sake of consistency, try to keep the same tense when talking about the same things: for example, keep the past tense when talking about your experiments and the present tense for the results.

Presentation of results and discussion: it seems to me that quite a lot of the results are simply repeated in the discussion, and that's not very interesting. I suggest that if a question comes up in the results, you discuss it immediately (for example, the negative effect of the solar variable). Otherwise, the reader doesn't get an explanation, reads on, forgets about it, and suddenly finds it again in the discussion. Instead of repeating results, the discussion, for example, also needs the limitations of CleanSnow.

Repetitiveness: your text is quite long, and I think you can make it shorter by removing unnecessary repetitions. Some things to remove are repetitions of things said previously in other sections ("as previously described in ..."), which can just be a reference to a section. "As shown in Figure ..." can just be a statement with a "(Figure number)". I tried to strike out some things that jumped out at me, but I'll let you have a look.

The problem of generalization: this comes back to the limitations of CleanSnow. You've shown that it generalizes well to stations within its training range but performs less well to those outside it. This is a normal limitation in ML but should be presented as such. CleanSnow will struggle when applied to a new station that is not within the distribution it's been trained on (which is normal because it's not like you did any transfer learning or something), but that's not a good generalization. So, I think your text needs more transparency about this limitation.

I have an open question for you: you briefly mentioned input anomalies in your discussion. Did you notice any particular behavior for 2022 and 2023 (seeing as they're strong temperature anomalies)?
Citation: https://doi.org/10.5194/egusphere-2024-1752-RC1
- AC2: 'Reply on RC1', Jan Svoboda, 15 Oct 2024
  
  Dear reviewer,
  thank you for your valuable comments. Please find our response in the attached PDF.
  Best regards,
  Jan Svoboda, on behalf of all co-authors
  
  Citation: https://doi.org/10.5194/egusphere-2024-1752-AC2
RC2:
'Comment on egusphere-2024-1752', Anonymous Referee #2, 03 Oct 2024
This study focused on the classification of snow height measurements based on AI techniques. However, the scientific significance is not sufficiently explained: why is this work important?

What’s the relationship between quality assessment and AI classification? In my view, this work doesn’t aim at improving data quality, just at distinguishing possible anomalies in all station measurements. So how does this work reflect the advance of the AI method?

The structure of this article is not clear enough; please improve it while retaining the important research work.

In figure 1, how were the training and testing stations determined?

P3, lines 70-80. These descriptions should be moved to Section introduction.

P3, lines 80-85. Please give the physical basis.

Please provide a flowchart for this paper.

How was the ground truth determined?

P6, lines 110-135. This paragraph belongs to the methodology; thus, the title ‘3 Machine learning based snow cover classification’ is not suitable. This section should be titled ‘Method’ or ‘Methodology’.

P8, ‘4.1 dataset’ should be introduced in methodology section, not here.

It is difficult for me to understand the logic and structure of this study.
Citation: https://doi.org/10.5194/egusphere-2024-1752-RC2
- AC3: 'Reply on RC2', Jan Svoboda, 15 Oct 2024
  
  Dear reviewer,
  thank you for your valuable comments. Please find our response in the attached PDF.
  Best regards,
  Jan Svoboda, on behalf of all co-authors
  
  Citation: https://doi.org/10.5194/egusphere-2024-1752-AC3

Interactive discussion

Status: closed

CEC1:
'Comment on egusphere-2024-1752', Juan Antonio Añel, 14 Aug 2024

Dear authors,

Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".

https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
First, the "Code Availability" section only contains information for CleanSnow v1.0.0; however, to perform your work you have used additional software. It is the case of MeteoIO. For MeteoIO you cite a paper published ten years ago in our journal, that points to a webpage that does not comply with our current requirements for code availability. That is, it is not an acceptable repository. Regading this, MeteoIO is published under the GPLv3 license, so you can take the code, and store it in a repository that complies with our policy. Therefore, please, do it, and reply to this comment with the link and DOI of its repository.
Secondly, for the "Data Availability" section: the link that you provide for a repository for the data used to train your model, is not valid. It is not a trustable long-term repository that can be accepted for scientific publication. Therefore, you must take al the data and store it in one the repositories that we can accept, and again, reply to this comment with the link and DOI for it. However, this is not the only problem: you use SnowPack data. For this dataset you cite a paper (Lehning et al., 1999), and despite you use the SWE data, it is not possible to access it. Therefore, as for the annotated data used to train your model, you must publish the SWE data.
Also, remember that you must modify the "Code" and "Data" availability sections in any potentially reviewed version of your manuscript, so that they contain the information that you must post in reply to this comment.
I have to note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal. Please, reply to this comment with the requested information, as it must be public to make possible the Discussions stage and the review of your manuscript by any interested reader.
Juan A. Añel
Geosci. Model Dev. Executive Editor

Citation: https://doi.org/10.5194/egusphere-2024-1752-CEC1
- AC1:
  'Reply on CEC1', Jan Svoboda, 15 Aug 2024
  
  We thank the Editor for the comments and suggestions. Please find our detailed response (including new DOIs) in the attached PDF document.
  
  Citation: https://doi.org/10.5194/egusphere-2024-1752-AC1
  - CEC2: 'Reply on AC1', Juan Antonio Añel, 15 Aug 2024
    
    Dear authors,
    Many thanks for addressing these outstanding issues so quickly and satisfactorily. We can now consider the current version of your manuscript in compliance with our policy.
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2024-1752-CEC2
- CC1:
  'Reply on CEC1', I. Iosifescu Enescu, 17 Aug 2024
  Daer Mr. Añel,
  we were sorry to hear that the Authors were not permitted to deposit their data in EnviDat, the official institutional repository of WSL. EnviDat provides DOI for all deposits, and fulfills all journal requirements listed (https://www.geoscientific-model-development.net/policies/code_and_data_policy.html#item3):
  institutional support providing reasonable confidence that the material will remain available for many years/decades
  
  mechanisms preventing the depositor of the material from unilaterally removing it from the archive
  
  mechanisms for identifying the precise version of the material referred to in a persistent way. This will usually be a DOI.
  
  We would be happy to further discuss any additional requirements that would not force our authors to deposit their datasets in two places, having two DOI from different repositories for the same dataset.
  Many thanks for your consideration,
  Ionut Iosifescu (technical coordinator EnviDat)
  
  Citation: https://doi.org/10.5194/egusphere-2024-1752-CC1
  - CEC3: 'Reply on CC1', Juan Antonio Añel, 17 Aug 2024
    
    Dear I. Iosifescu Enescu,
    Just to clarify, while the authors of this manuscript are permitted to deposit their data in Envidat, we cannot overlook the current issues that prevent it from being a trustworthy repository for scientific publication.
    
    We welcome the efforts that Envidat could be making to become a trustable repository for scientific publication and that you are willing to comply with our requirements. Unfortunately, before commenting on this, I double-checked the status of Envidat in Fairsharing.org, which, regarding Envidat, as you can see, could make better regarding several issues. Also, finding the page with Envidat's conditions and policy from the homepage took work. I found this information through Fairsharing.org, not the Envidat homepage.
    
    Despite your claims that authors can not remove the data, your policy (https://www.envidat.ch/#/about/policies) clearly states, "Metadata and content items may be removed at the request of the depositor." No exception is listed to this rule in your policy. Therefore, I think it is clear that Envidat does not comply with our requirements that ask for the impossibility of authors to remove an item (software or data). Moreover, your policy clearly states that you can change it unilaterally at any point "The EnviDat policies are subject to change by EnviDat at any time and without notice." We can not trust a policy that can change by unilateral decision at any point, and without being clear on who depends such change or their procedures.
    I miss some other important details. For example, for how long has EnviDat been funded? We usually require that items be submitted to repositories with funding secured for their maintenance for extended periods (minimum > 10 years, and generally more than 20 years) or a proven ability to obtain recurrent funding or commitment of organizations participating in it to fund it. Also, is there a board that decides on the data stored and its removal? It is essential to know that such decisions are not arbitrary and made by only one person but based on consensus. I have not found these details described on your webpage, and they are relevant. We would appreciate it if you could provide and publish the appropriate documents and additional information on the Envidat webpage.
    We are open to accepting Envidat for the deposit of the code and data software submitted to our journal. However, the issues we have raised must be addressed.
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2024-1752-CEC3
    
    CC2: 'Reply on CEC3', I. Iosifescu Enescu, 19 Aug 2024
    
    Dear Juan A. Añel,
    Thank you for your additional clarifications and feedback, much appreciated! EnviDat can indeed improve in many areas, including making the policies more visible. Currently, our policies are linked in the "about" section, similarly to Zenodo. (But we can of course link the policies also directly in the side bar instead of having them in the about section - we can do this change pretty soon, as it is a minor issue.)
    Overall, if I understand your suggestions correctly, the journal will accept data published in EnviDat if we will make a section/page that will declare:
    1. the repository start date (EnviDat started in 2013)
    2. our long-term funding security (EnviDat is a strategic initiative of WSL, and the official institutional data repository of WSL).
    Also for clarification, may I clearly reiterate that depositors cannot delete datasets (and actually, in our repository, "the content items are not deleted per se, but are restricted, and therefore, no longer accessible"), as clearly declared in our policies, and I quote:
    Metadata and content items may be removed at the request of the depositor. Possible reasons for withdrawal include, but are not limited to:
    violations of WSL research integrity guidelines,
    
    proven copyright violation or plagiarism,
    
    legal requirements and proven violations,
    
    journal publishers' rules.
    
    Withdrawing metadata and content items means:
    permanent identifiers (DOIs) and permanent (DOI) URLs are retained for the entire duration of EnviDat existence,
    
    DOI URLs will continue to point to tombstone records, to avoid broken links from scientific citations, with a modified description explaining the reasons for withdrawal,
    
    the content items are not deleted per se, but are restricted, and therefore, no longer accessible.
    
    You made it obvious that our current policy can be misunderstood and should be immediately improved - thank you for this feedback.
    Consequently, we will also rephrase and simplify the policy sections about withdrawal, so that we can prevent further misunderstandings in the future. We will, effective immediately, start comparing and simplifying our policies to be as close as possible to the ones from Zenodo (see https://about.zenodo.org/policies/), since Zenodo policies are obviously acceptable for your journal. And we can get the new EnviDat policies approved towards end of September at our next UGM, so they should be coming to production in the early October release.
    Would the implementation of all the above suggested changes influence your views on EnviDat as a trustworthy repository, therefore making it acceptable for your journal in the future? Would there be anything else we would need to improve?
    Kind Regards,
    Ionut Iosifescu, Technical Coordinator EnviDat
    
    Citation: https://doi.org/10.5194/egusphere-2024-1752-CC2
    
    CEC4: 'Reply on CC2', Juan Antonio Añel, 15 Oct 2024
    
    Dear authors,
    I wanted to make clear that your reply to our concerns on Envidat look reasonable. However, because I went on holidays, I was not able to reply to your previous comment when you submitted it by the end of August.
    I appreciate your efforts to make Envidat in compliance with our policy. For it, I think we need to address better an issue, the funding of Envidat. I think it is necessary you provide the evidence for the funding that supports Envidat, making public the amounts, duration of the grants and founders. Otherwise, your claims that it is a "core" activity are not only good intentions. In the meantime, and to properly address the issues with this manuscript, I would kindly request you to deposit a copy of your assets in another repository from our list that we currently accept. In this case duplication of the repositories will allow to avoid potential future problems.
    Best regards,
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/egusphere-2024-1752-CEC4
    
    AC4: 'Reply on CEC4', Jan Svoboda, 16 Oct 2024
    
    Dear Editor,
    the dataset has already been uploaded to Zenodo, which complies with GMD rules:
    https://doi.org/10.5281/zenodo.13324736
    We will make sure to reference the Zenodo DOI in our manuscript in the future.
    Best Regards,
    Jan Svoboda, on behalf of all co-authors
    
    Citation: https://doi.org/10.5194/egusphere-2024-1752-AC4
RC1: 'Comment on egusphere-2024-1752', Anonymous Referee #1, 23 Sep 2024
Dear authors,
I'd first like to say that I found your manuscript very interesting. You were thorough in your evaluation of your model. I also like that your GitLab page encourages reproducibility and people to use your model. I took the liberty of directly annotating your manuscript. I think the content and research are very good, but the form and presentation quality could be improved so that it's easier to understand.
But here are some other more general comments:
ML level: because you're aiming for an EGU journal, I think a majority of readers may not be experts in ML. Therefore, at some points it's necessary to explain some terms (I've pointed some out in the manuscript). Be careful not to become unnecessarily technical, or you might lose some of your readers. I'd also add a section in the discussion advising people (who are unfamiliar with ML and are set on their traditional numerical models) on how to use your model.

Consistency of terms: Be careful not to use too many names to refer to your model, sometimes CleanSnow, sometimes TCN. I'd stick to CleanSnow everywhere and use TCN only when you refer to the architecture; otherwise, it becomes very confusing. You devised a nice name for your model, so use it :) Re-read the manuscript and change it where needed.

Cross-validation: The part about hyperparameter tuning of your model is briefly mentioned but very important. Did you do any cross-validation for this (if not, why not?)? And which hyperparameters were tuned and came out as best?

Figures and their legends: your figure legends are generally too concise and need more information. Although this is very tedious work, legends should follow a few rules: acronyms that appear in the figure should be referred to (and generally explained) in the legend, and a reader should be able to understand the figure on its own without having to look things up in the text. Please go over your legends again and make them more descriptive.

Results: In your description of results (Section "Experiments"), when making statements that can be backed up by numbers in parentheses, these numbers should be provided (such as F1 scores). There are a lot of F1 scores in your figures that can easily be used to back up your claims. Otherwise, the reader has to look them up in the figures, and your statements seem empty. One good example where you do this is line 285, but this should be done for all other results, too: "e.g., demonstrates that the model confidently classified snow (TPR = 99.4%) in contrast to the classification of snow-free ground with (TPR = 88.4%)".

Grammatical tense: Be careful about mixing too many tenses; sometimes you switch from past to present without much reason. For the sake of consistency, try to keep the same tense when talking about the same things: for example, use the past tense when talking about your experiments and the present tense for the results.

Presentation of results and discussion: it seems to me that quite a lot of the results are simply repeated in the discussion, and that's not very interesting. I suggest that if a question comes up in the results, you discuss it immediately (for example, the negative effect of the solar variable). Otherwise, the reader doesn't get an explanation, reads on, forgets about it, and suddenly finds it again in the discussion. Instead of repeating results, the discussion should, for example, address the limitations of CleanSnow.

Repetitiveness: your text is quite long, and I think you can make it shorter by removing unnecessary repetitions. Some things to remove are repetitions of things said previously in other sections ("as previously described in ..."), which can just be a reference to a section. "As shown in Figure ..." can just be a statement followed by "(Figure number)". I tried to strike out some things that jumped out at me, but I'll let you have a look.

The problem of generalization: this comes back to the limitations of CleanSnow. You've shown that it generalizes well to stations within its training range but performs less well on those outside it. This is a normal limitation in ML but should be presented as such. CleanSnow will struggle when applied to a new station that is not within the distribution it's been trained on (which is normal, because it's not like you did any transfer learning or something), but that is not good generalization. So, I think your text needs more transparency about this limitation.

I have an open question for you: you briefly mentioned input anomalies in your discussion. Did you notice any particular behavior for 2022 and 2023 (seeing as they're strong temperature anomalies)?
Citation: https://doi.org/10.5194/egusphere-2024-1752-RC1
- AC2: 'Reply on RC1', Jan Svoboda, 15 Oct 2024
  
  Dear reviewer,
  thank you for your valuable comments. Please find our response in the attached PDF.
  Best regards,
  Jan Svoboda, on behalf of all co-authors
  
  Citation: https://doi.org/10.5194/egusphere-2024-1752-AC2
RC2: 'Comment on egusphere-2024-1752', Anonymous Referee #2, 03 Oct 2024
This study focused on the classification of snow height measurements based on AI techniques. However, the scientific significance is not made sufficiently clear: why is this work important?

What’s the relationship between quality assessment and AI classification? In my view, this work doesn’t aim at improving data quality, just at distinguishing possible anomalies from all station measurements. So how does this work reflect the advance offered by the AI method?

The structure of this article is not clear enough; please improve it while retaining the important research work.

In Figure 1, how were the training and testing stations determined?

P3, lines 70-80. These descriptions should be moved to the Introduction section.

P3, lines 80-85. Please give the physical basis.

Please provide a flowchart for this paper.

How was the ground-truth data determined?

P6, lines 110-135. This paragraph belongs in the methodology; thus, the title ‘3 Machine learning based snow cover classification’ is not suitable. This section should be titled Method or Methodology.

P8, ‘4.1 dataset’ should be introduced in methodology section, not here.

It is difficult for me to understand the logic and structure of this study.
Citation: https://doi.org/10.5194/egusphere-2024-1752-RC2
- AC3: 'Reply on RC2', Jan Svoboda, 15 Oct 2024
  
  Dear reviewer,
  thank you for your valuable comments. Please find our response in the attached PDF.
  Best regards,
  Jan Svoboda, on behalf of all co-authors
  
  Citation: https://doi.org/10.5194/egusphere-2024-1752-AC3

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Jan Svoboda on behalf of the Authors (12 Nov 2024): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (28 Nov 2024) by Ludovic Räss

RR by Anonymous Referee #1 (13 Dec 2024)

ED: Publish subject to technical corrections (17 Dec 2024) by Ludovic Räss

AR by Jan Svoboda on behalf of the Authors (02 Jan 2025): Manuscript

Journal article(s) based on this preprint

17 Mar 2025

Towards deep-learning solutions for classification of automated snow height measurements (CleanSnow v1.0.2)

Jan Svoboda, Marc Ruesch, David Liechti, Corinne Jones, Michele Volpi, Michael Zehnder, and Jürg Schweizer

Geosci. Model Dev., 18, 1829–1849, https://doi.org/10.5194/gmd-18-1829-2025, 2025

Data sets

Snow Height Classification Dataset Jan Svoboda, Marc Ruesch, David Liechti, Corinne Jones, Michele Volpi, Michael Zehnder, and Jürg Schweizer https://doi.org/10.16904/envidat.512

Model code and software

Towards deep learning solutions for classification of automated snow height measurements (CleanSnow v1.0.0) Jan Svoboda, Marc Ruesch, David Liechti, Corinne Jones, Michele Volpi, Michael Zehnder, and Jürg Schweizer https://doi.org/10.5281/zenodo.12698071

Viewed

Total article views: 3,649 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,962	863	824	3,649	105	171

Views and downloads (calculated since 22 Jul 2024)

Month	HTML	PDF	XML	Total
Jul 2024	172	40	18	230
Aug 2024	314	114	36	464
Sep 2024	84	32	12	128
Oct 2024	148	52	60	260
Nov 2024	88	20	96	204
Dec 2024	28	20	102	150
Jan 2025	48	24	394	466
Feb 2025	48	22	48	118
Mar 2025	34	46	4	84
Apr 2025	24	24	4	52
May 2025	30	30	0	60
Jun 2025	30	60	2	92
Jul 2025	40	32	4	76
Aug 2025	92	22	6	120
Sep 2025	282	50	4	336
Oct 2025	40	60	0	100
Nov 2025	42	44	4	90
Dec 2025	68	42	8	118
Jan 2026	88	14	10	112
Feb 2026	102	36	2	140
Mar 2026	96	38	4	138
Apr 2026	19	30	2	51
May 2026	25	4	1	30
Jun 2026	5	1	0	6
Jul 2026	15	6	3	24

Viewed (geographical distribution)

Total article views: 3,637 (including HTML, PDF, and XML). Of these, 3,637 have a defined geography and 0 are of unknown origin.

Latest update: 31 Jul 2026

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

Accurately measuring snow height is key for modeling approaches in climate sciences, snow hydrology and avalanche forecasting. Erroneous snow height measurements often occur when the snow height is low or changes, for instance, during a snowfall in the summer. We prepare a new benchmark dataset with annotated snow height data and demonstrate how to improve the measurement quality using modern deep learning approaches. Our approach can be easily implemented into a data pipeline for snow modeling.
