Interactive Snow Avalanche Segmentation from Webcam Imagery: results, potential and limitations

Hafner, Elisabeth Doris; Kontogianni, Theodora; Caye Daudt, Rodrigo; Oberson, Lucien; Wegner, Jan Dirk; Schindler, Konrad; Bühler, Yves

doi:10.5194/egusphere-2024-498

Preprints

https://doi.org/10.5194/egusphere-2024-498


19 Mar 2024


Abstract. For many safety-related applications such as hazard mapping or road management, well documented avalanche events are crucial. Nowadays, despite research into different directions, the available data is mostly restricted to isolated locations where it is collected by observers in the field. Webcams are getting more frequent in the Alps and beyond, capturing numerous avalanche prone slopes several times a day. To complement the knowledge about avalanche occurrences, we propose to make use of this webcam imagery for avalanche mapping. For humans, avalanches are relatively easy to identify, but the manual mapping of their outlines is time intensive. Therefore, we propose to support the mapping of avalanches in images with a learned segmentation model. In interactive avalanche segmentation (IAS), a user collaborates with a deep learning model to segment the avalanche outlines, taking advantage of human expert knowledge while keeping the effort low thanks to the model's ability to delineate avalanches. The human corrections to the prediction in the form of positive clicks on the avalanche or negative clicks on the background result in avalanche outlines of good quality with little effort. Relying on IAS, we extract avalanches from the images in a flexible and efficient manner, resulting in a 90 % time saving compared to conventional manual mapping. If mounted in a stable position, the camera can be georeferenced with a mono-photogrammetry tool, allowing for exact geolocation of the avalanche outlines and subsequent use in geographical information systems (GIS). In this way all avalanches mapped in an image can be imported into a designated database, making them available for the relevant safety-related applications. For imagery, we rely on current and archive data from webcams that cover the Dischma valley near Davos, Switzerland and capture an image every 30 minutes during daytime since the winter 2019. 
Our model and the associated mapping pipeline represent an important step forward towards continuous and precise avalanche documentation, complementing existing databases and thereby providing a better base for safety-critical decisions and planning in avalanche-prone mountain regions.

Received: 20 Feb 2024 – Discussion started: 19 Mar 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.


Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

23 Aug 2024

Interactive snow avalanche segmentation from webcam imagery: results, potential, and limitations

Elisabeth D. Hafner, Theodora Kontogianni, Rodrigo Caye Daudt, Lucien Oberson, Jan Dirk Wegner, Konrad Schindler, and Yves Bühler

The Cryosphere, 18, 3807–3823, https://doi.org/10.5194/tc-18-3807-2024, 2024


Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-498', Anonymous Referee #1, 17 Apr 2024

General comments:
The manuscript presents a novel dataset using state-of-the-art methodology that will benefit the avalanche research community. I think the novelty of the research is significant enough for publication, but the clarity of the writing needs to be improved. I offer a few comments, especially on the methods section, that should enhance the clarity of the manuscript before publication. In addition, there are many syntax errors (missing commas) that affect the general comprehension and the quality of the manuscript. I give more details about these syntax errors in the specific and technical comments.

Specific comments:
The introduction is well structured and the problem is well described.
Most of the method section is well written, but some refinements are necessary to improve clarity. My first specific comment is that much of the processing, model tuning, validation and testing was done on multiple datasets. I suggest that the authors dedicate a general paragraph at the beginning of the method section that better defines the overall analysis of the paper. I spent quite some time figuring out which analysis would be done on which dataset.
I understand that you split your dataset into a train, validate and test dataset. However, the difference between validation and test was not obvious at the beginning, since both terms are used similarly in a statistical validation context. It wasn’t explained until I read the last section of the methods. I suggest you replace the word “validate” with “tuning” or something similar, because this dataset was only used to tune the hyperparameters (if I understand correctly).
The section on model architecture is very technical, which is not bad, but it might be difficult to understand for the avalanche research community. This community is going to be very interested in the dataset, but might not be very specialized in the field of IOS (like myself). Maybe you can add a few sentences to highlight the key elements of the HRNet+OCR algorithm in a broader sense, before entering into the technical details describing how you adapted the original algorithm to avalanches.
The present tense was generally used to write the method and the result sections, where usually the past tense is used. Please change to the past tense, as the calculations and analyses were made in the past. There are also a few present-tense verbs that should be in the past tense where you refer to your results.
There are many missing commas within the manuscript that affect the comprehension of the sentences. These missing commas are always at the beginning of the sentence, for example: “In our user study the participants...”, where there should be a comma between “study” and “the participants”. There are also cases where the comma is not missing. Please correct these missing commas to enhance the comprehension of the text. See the technical comments for more.
Technical corrections:
Intro:
Line 28: Missing comma after “Depending on the source,”
Line 34 : Maybe change heightens to increases.
Line 39 : In opposition instead of opposed.
Line 50 : This sentence is unclear and is missing a few commas. Maybe change it to “However, where satellite data is available, areas affected by avalanches....”
Line 52: The link to the second part of this sentence is not clear; please rephrase or add a few words after the comma to clarify the link between the canonical problem and instance segmentation.
Line 67 : Gaussians?
Line 69 : Missing comma before “but”.
Line 73 : Missing comma between “georeferencing” and “the”.
Line 75 : remove the word “make”.
Line 78 : I don't understand the words “real world application” in your objective. While I think the user study is significant and interesting, I think real world application doesn't apply to the analysis made.

Data:
Line 85 : Suggestion : “Our webcams network covered the whole Dischma valley...”
Line 86 : I think the unit needs to be separated from the value in EGU publications, like “13 km”.
Line 98 : The sentence “With steep mountains on both sides of the valley over 80% of the entire area are potential avalanche terrain” needs to be rephrased for clarity.
Line 99 : Missing comma between “settlements” and “avalanches”.
Figure 1: It is unclear to me which field of view corresponds to which camera. Please consider using polylines (transparent in the middle) instead of polygons. Add arrows to clearly show which direction the cameras are looking. I think there are too many details in the swisstopo background that are not needed. Removing some of them could enhance the clarity of the map.
Line 104 : Missing comma after “training”.
Line 105 : Missing comma after “our user study”.
Line 109 : Missing comma after “validation” and before “while”, and after “this” and “we”.
Line 113 : Missing comma “For our user study, we relied”.
Line 123 : Missing comma “validation, which” and “dataset, we”.
Line 125 : Missing comma “comparison, their”.

Methods:
Line 132 : Is it 5 pixels?
Line 133 : Missing comma “segmentation, the”.
Line 137 : Missing comma “implementation, it”.
Line 138 : Does that mean that images are randomly selected during training for a user to manually click on? Maybe add a sentence to state when the fine-tuning is done with the user input (Figure 4). Is it on every image or on randomly selected ones?
Line 142 : Missing comma “mode, the”.
Line 157 : Missing comma “the masks, we report”.
Line 159 : Unclear formulation “since the we aim for a high”.
Line 162 : Missing comma “object-level, we compare”.
Line 163 : Please change the letter t for the threshold, as t is already used to denote the time step in Figure 4.
Line 164 : Missing comma “Like Fox et al. (2023), we first”.
Line 167 : Missing comma “the matches, we compute”.
Line 173 : Missing comma “our webcam imagery, we evaluate”.
Line 180 : Is it hyperparameters?
Line 183 : Missing comma “In addition, we compare”.
Line 190 : I suggest putting “we carried out a small user study” at the beginning of the sentence.
Line 191 : Missing comma “user study, we used”.
Line 192 : Missing comma “per click, as well”, since it's an enumeration.
Line 194 : Missing comma “in UserPic, the participant”.
Line 197 : Missing comma “user study, we report”.
Line 198 : Missing comma “the NoC 20 @85, as well”, since it's an enumeration.
Line 201 : Missing comma “significant, we used”.
Results:
Line 204 : The beginning of this sentence is unclear “Evaluating on the SLF test the AvaWeb”.
Line 208 : Missing comma “baseline, all models”.
Line 209 : Missing comma “Overall, the”.
Line 212 : Missing comma “analyses, we are”.
Figure 6 : Please use the same font in the figure as in the text.
Line 217 : Missing comma “For all models, the images”.
Figure 7 : Please specify what GT is. Is it ground truth?
Line 220 : Missing comma “GroundPic, the AvaWeb” and “about 10%, while it”.
Line 224 : Missing comma “those avalanches, the IoU”.
Line 225 : Missing comma “avalanches, the AvaWeb”.
Line 226 : Unclear sentence “while this for the AvaPic and AvaMix this is the case for less than 1% of all avalanches”.
Line 227 : Missing comma “same images, which depict”.
Line 230 : Missing comma “bounding boxes, the AvaWeb”.
Line 232 : Missing comma “AvaMix, the F1”.
Line 235 : Missing comma “User Study, we loaded”.
Figure 8-9-10 : The font style is not consistent in these figures.
Line 241 : Missing comma “On average, participants”.
Line 248 : Missing comma “User study, we observed”.
Line 250-251 : Please rephrase this sentence to make it clearer, or maybe make two sentences.
Line 254 : Please rephrase this sentence to make it clearer, especially the end: “While they are not for IoU@1 and IoU@2 (t-test: p-value: > 0.05), for IoU@3 (p-value= 0.045), IoU@4 (p-value= 0.034) and IoU@5 (p-value= 0.035) they are.”
Line 257 : Replace “eachother” by “each other”.

Discussion:
Line 263 : The syntax of this sentence is problematic; maybe remove “outlines” to make it clearer, or rephrase it.
Line 265 : Missing comma “(GroundPic), but fails”.
Line 271 : Missing comma “SLF dataset, help”, and also maybe before “following”.
Line 276 : Missing comma “the avalanche, resulting in”.
Line 277 : Missing comma “But overall, the AvaWeb”.
Line 279 : Missing comma “approximately 20% lower IoU”.
Line 281 : Missing comma “imagery, which the model”.
Line 282 : Missing comma “this paper, but for”. Maybe try “this paper but, future work should consider experimenting...”
Line 284 : Missing comma “automated method, Fox et al. (2023)” and “overlap, which is”.
Line 286 : Add the word “that” in “We capture the area that the avalanche covered...”
Line 288 : Missing comma “user study, the participants”. Also, the term “user study” is sometimes written with a capital U and sometimes not; this needs to be consistent.
Line 288 : In this sentence “the best performance are as good as the simulation”, I think the past tense “were” is more appropriate.
Line 294 : Missing comma “manual mapping, using IAS”.
Line 295 : Missing comma “average size 1.75), that take less”.
Line 296 : Missing comma “new avalanches, the user”.
Line 297 : Missing comma “Hafner et al. (2023), the mean”.
Line 298 : I think the past tense (were) is more appropriate in “are within 5% of each other and all have an IoU”.
Figure 14 : Missing North arrow in the map.
Line 305 : Missing comma “the winter, leading to more”.
Line 206 : The comma should be after “however”, not “requires”.
Line 309 : Missing comma “Without that, the application”.
Line 310 : Missing comma “warning service, while all other”.
Line 316 : Missing comma “mapping avalanches IAS saves”.
Line 317 : This sentence is unclear “since the avalanches were time was recorded were rather small”.
Line 317 : Why is this sentence a paragraph of its own?

Conclusion:
Line 321 : Past tense “were” in “the predictions are simulated,”.
Line 322 : Missing comma “With IAS, a human user”.
Line 324 : Missing comma “60 minutes, increases the likelihood”.
Line 331 : Missing comma “is stable, the georeferencing”.
Line 331 : The last part of the sentence is unclear, please rephrase “like done before for webcam-based snow cover monitoring (Portenier et al., 2020)”.
Line 332 : Missing comma “In the future, existing approaches”.
Line 338 : Missing comma “more reliable, compared to the”.
Line 343 : I would remove “as is” in “The model as is may also be used to”.
Line 343 : Replace “These” by “this”.
Line 345 : Maybe add “avalanche annotations” to “thereby getting more accurate and reliable avalanche annotations in the future.”.
Line 345: Missing comma “Overall, this”.

Citation: https://doi.org/10.5194/egusphere-2024-498-RC1
- AC1: 'Reply on RC1', Elisabeth D. Hafner, 04 Jun 2024
  
  Dear anonymous reviewer,
  
  Thank you very much for your detailed feedback on our manuscript! We greatly appreciate the time spent on formulating detailed comments and suggestions. Please find below the answers to both the specific and the technical comments:
  Specific comments:
  
  We will add a general paragraph at the beginning of the method section, that introduces and describes the overall analysis of the paper better.
  
  We will define validation and test set and strive to correct sentences with ambiguous meaning.
  
  We will add a few sentences to describe the key elements of the HRNet+OCR algorithm in simple words, before entering the technical details.
  
  We will correct the tense in the method and the result section.
  
  Technical comments:
  
  Line 28: We will add the missing comma.
  
  Line 34 : We will change heightens to increases.
  
  Line 39 : We will use in opposition instead of opposed.
  
  Line 50 : We will change this sentence as proposed.
  
  Line 52: We will rephrase this sentence for clarity.
  
  Line 67 : When discs are used to encode clicks the whole area specified by the radius is given the same weight. When clicks are encoded as Gaussians the weight is distributed as a Gaussian distribution with decreasing weight from the center over the area specified by the radius. We will explain what we mean by Gaussians in the revised version of the manuscript.
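
The difference between the two click encodings described in the reply to Line 67 can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `encode_click`, the default radius and the choice `sigma = radius / 2` are assumptions made only for illustration.

```python
import math

def encode_click(height, width, row, col, radius=5, mode="disc"):
    """Return a height x width map (list of lists) encoding a single click."""
    # Assumption: the Gaussian's standard deviation is tied to the radius;
    # the reply does not state which value is actually used.
    sigma = radius / 2.0
    click_map = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            dist = math.hypot(r - row, c - col)
            if dist > radius:
                continue  # outside the area specified by the radius
            if mode == "disc":
                # Disc: every pixel inside the radius gets the same weight.
                click_map[r][c] = 1.0
            else:
                # Gaussian: weight decreases from the centre outwards.
                click_map[r][c] = math.exp(-dist ** 2 / (2 * sigma ** 2))
    return click_map

disc = encode_click(11, 11, 5, 5, radius=5, mode="disc")
gauss = encode_click(11, 11, 5, 5, radius=5, mode="gaussian")
```

In both maps the pixels beyond the radius stay at zero; only the weighting inside the radius differs.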
  
  Line 69 : We will add the missing comma.
  
  Line 73 : We will add the missing comma.
  
  Line 75 : We will remove the word “make”.
  
  Line 78 : The challenge with interactive deep learning models is that, during training, the human user interactions (clicks in our case) are modeled. In the worst case, the modeled behavior does not represent reality well, and the model is useless when used by a human. Consequently, it is essential to test whether the way the user interactions are modeled leads to a model that speeds up segmentation for a human using it. We will change this passage to clarify this for future readers.
  
  Data:
  
  Line 85 : We will change the structure of this sentence as suggested.
  
  Line 86 : We will add a space between the number and the unit.
  
  Line 98 : We will rephrase this sentence for clarity.
  
  Line 99 : We will add the missing comma.
  
  Figure 1: We will adapt this figure to improve readability and information retrieval.
  
  Line 104 : We will add the missing comma.
  
  Line 105 : We will add the missing comma.
  
  Line 109 : We will add the missing commas.
  
  Line 113 : We will add the missing comma.
  
  Line 123 : We will add the missing comma.
  
  Line 125 : We will add the missing comma.
  
  Methods:
  
  Line 132 : Yes, it is 5 pixels; we will add this information to the manuscript.
  
  Line 133 : We will add the missing comma.
  
  Line 137 : We will add the missing comma.
  
  Line 138 : We are not entirely sure we understood your question: The images are not randomly selected; instead, there is a fixed split into a training, validation, and test set (see Table 1). The random and iterative sampling strategies in this sentence refer to the simulation of user input in the form of clicks for training, validation, and testing. The simulated user clicks are partially placed in random locations and partially in the area with the largest error (see next line). There is no human clicking involved in training the model. This is also the reason we did a user study (see also the comment to line 78).
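
A minimal sketch of the error-driven part of such a click simulation, assuming binary masks stored as lists of 0/1 values. The function name `next_click`, the 4-connectivity and the rule of clicking the region pixel closest to the region's centroid are illustrative assumptions, not details confirmed by the reply; the randomly placed clicks are not shown.

```python
from collections import deque

def next_click(gt, pred):
    """Return (row, col, is_positive) for the next simulated click, or None."""
    h, w = len(gt), len(gt[0])
    seen = [[False] * w for _ in range(h)]
    largest = []
    for r0 in range(h):
        for c0 in range(w):
            if seen[r0][c0] or gt[r0][c0] == pred[r0][c0]:
                continue
            # Flood-fill one 4-connected region of errors of the same kind
            # (all false negatives or all false positives).
            kind = gt[r0][c0]
            region, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < h and 0 <= nc < w and not seen[nr][nc]
                            and gt[nr][nc] != pred[nr][nc]
                            and gt[nr][nc] == kind):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(region) > len(largest):
                largest = region
    if not largest:
        return None  # prediction already matches the ground truth
    # Assumption: click the region pixel closest to the region's centroid.
    mean_r = sum(r for r, _ in largest) / len(largest)
    mean_c = sum(c for _, c in largest) / len(largest)
    row, col = min(largest,
                   key=lambda p: (p[0] - mean_r) ** 2 + (p[1] - mean_c) ** 2)
    # Missed avalanche pixels get a positive click, false alarms a negative one.
    return row, col, gt[row][col] == 1

gt = [[0, 0, 0, 0],
      [0, 1, 1, 0],
      [0, 1, 1, 0],
      [0, 0, 0, 0]]
pred = [[1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
click = next_click(gt, pred)
```

Here the missed avalanche (four pixels) is larger than the single false-positive pixel, so the simulated click is a positive one inside the avalanche.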
  
  Line 142 : We will add the missing comma.
  
  Line 157 : We will add the missing comma.
  
  Line 159 : We will split this sentence up for better clarity: “Achieving a high IoU after few clicks makes the model most useful. Consequently, we compare the IoU at click k (for k = 1,2,....,20) averaged over all the images (mIoU@k).”.
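
The click-based metrics mentioned here (mIoU@k and NoC20@85) can be sketched as follows, assuming each image comes with a list of IoU values, one per click. The function names, the example values and the convention of reporting the click budget when the target is never reached are assumptions for illustration.

```python
def miou_at_k(iou_curves, k):
    """Mean IoU after click k (1-based), averaged over all images."""
    return sum(curve[k - 1] for curve in iou_curves) / len(iou_curves)

def noc(curve, target=0.85, max_clicks=20):
    """Number of clicks needed to reach the target IoU.

    Assumption: if the target is never reached within max_clicks,
    max_clicks is reported.
    """
    for k, value in enumerate(curve[:max_clicks], start=1):
        if value >= target:
            return k
    return max_clicks

def failure_rate(iou_curves, target=0.85, max_clicks=20):
    """Share of images that never reach the target IoU within max_clicks."""
    failures = sum(1 for c in iou_curves if max(c[:max_clicks]) < target)
    return failures / len(iou_curves)

# Hypothetical IoU-per-click curves for three images (made-up numbers).
curves = [
    [0.50, 0.70, 0.90],
    [0.60, 0.86, 0.90],
    [0.20, 0.30, 0.40],
]
mean_iou_first_click = miou_at_k(curves, 1)
clicks_needed = [noc(c) for c in curves]
never_reached = failure_rate(curves)
```

With these made-up curves, the third image never reaches 85 % IoU and therefore counts both as a failure and with the full budget of 20 clicks.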
  
  Line 162 : We will add the missing comma.
  
  Line 163 : We will change the abbreviation for threshold to T in the whole document to avoid confusion with the timestep in Figure 4.
  
  Line 164 : We will add the missing comma.
  
  Line 167 : We will add the missing comma.
  
  Line 173 : We will add the missing comma.
  
  Line 180 : Yes, it is hyperparameters; we will specify this in the text.
  
  Line 183 : We will add the missing comma.
  
  Line 190 : We will put “we carried out a small user study” at the beginning of the sentence as suggested.
  
  Line 191 : We will add the missing comma.
  
  Line 192 : We will add the missing comma.
  
  Line 194 : We will add the missing comma.
  
  Line 197 : We will add the missing comma.
  
  Line 198 : We will add the missing comma.
  
  Line 201 : We will add the missing comma.
  
  Results:
  
  Line 204 : We will adapt this sentence to improve clarity.
  
  Line 208 : We will add the missing comma.
  
  Line 209 : We will add the missing comma.
  
  Line 212 : We will add the missing comma.
  
  Figure 6 : We do not agree that the text and figures need to have exactly the same font. To improve readability and appearance we will harmonize the font in our figures.
  
  Line 217 : We will add the missing comma.
  
  Figure 7 : It is ground truth, yes. We will replace the abbreviation with “Ground truth” in Figure 7, 9 and 10.
  
  Line 220 : We will add the missing comma.
  
  Line 224 : We will add the missing comma.
  
  Line 225 : We will add the missing comma.
  
  Line 226 : We will change the sentence to: “For more than one fourth of all avalanches, the AvaWeb never reaches the NoC20@85, while for the AvaPic and AvaMix less than 1% of all avalanches never reach an IoU of 85%”
  
  Line 227 : We will add the missing comma.
  
  Line 230 : We will add the missing comma.
  
  Line 232 : We will add the missing comma.
  
  Line 235 : We will add the missing comma.
  
  Figure 8-9-10 : We do not agree that the text and figures need to have exactly the same font. To improve readability and appearance we will harmonize the font in our figures.
  
  Line 241 : We will add the missing comma.
  
  Line 248 : We will add the missing comma.
  
  Line 250-251 : We will rephrase this sentence to make it more clear, or maybe make two sentences.
  
  Line 254 : We will rephrase this sentence to “For clicks 1 to 5, where we had enough samples from all participants, we tested if the differences between the highest and the lowest mIoU are statistically significant. The differences are not significant for IoU@1 and IoU@2 (t-test: p-value > 0.05) but they are statistically significant for IoU@3 (p-value = 0.045), IoU@4 (p-value = 0.034) and IoU@5 (p-value = 0.035).”
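
The reply only states that a t-test was used. As a rough sketch of how such a comparison of per-image IoU values between two participants could be computed, the following assumes an independent two-sample Welch test, which may differ from the variant the authors applied. The sample values are made up for illustration.

```python
import math
from statistics import mean, variance

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t, via Simpson integration of the pdf."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    norm = math.exp(log_norm) / math.sqrt(df * math.pi)

    def pdf(x):
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # 2 * integral over [0, t] is the probability mass inside [-t, t].
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def welch_t_test(a, b):
    """Return (t statistic, degrees of freedom, two-sided p-value)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_sided_p(t, df)

# Hypothetical IoU@3 values of the best and the worst participant.
best = [0.80, 0.82, 0.85, 0.83]
worst = [0.70, 0.72, 0.69, 0.74]
t_stat, dof, p_value = welch_t_test(best, worst)
```

A p-value below 0.05 would be read as a statistically significant difference, matching the threshold used in the rephrased sentence.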
  
  Line 257 : We will replace “eachother” by “each other”.
  
  Discussion:
  
  Line 263 : We will remove “outlines” from the sentence.
  
  Line 265 : We will add the missing comma.
  
  Line 271 : We will add the missing comma.
  
  Line 276 : We will add the missing comma.
  
  Line 277 : We will add the missing comma.
  
  Line 279 : We will add the missing comma.
  
  Line 281 : We will add the missing comma.
  
  Line 282 : We will add the missing comma.
  
  Line 284 : We will add the missing comma.
  
  Line 286 : We will add the word “that” to this sentence.
  
  Line 288 : We will add the missing comma and we will harmonize the way “user study” is written throughout the text.
  
  Line 288 : We will change to past tense.
  
  Line 294 : We will add the missing comma.
  
  Line 295 : We will add the missing comma.
  
  Line 296 : We will add the missing comma.
  
  Line 297 : We will add the missing comma.
  
  Line 298 : We will change to “were” in this sentence
  
  Figure 14 : We will add a North arrow to the map.
  
  Line 305 : We will add the missing comma.
  
  Line 206 : We will move the comma.
  
  Line 309 : We will add the missing comma.
  
  Line 310 : We will add the missing comma.
  
  Line 316 : We will add the missing comma.
  
  Line 317 : We will change this sentence to “Compared to the traditional way of mapping avalanches, IAS saves over 90% time. We believe that the time saved may be even greater since the avalanches with a time recording were rather small (mean size 1.75) and all located in an area well known to the person mapping.” to increase clarity.
  
  Line 317 : We will merge this sentence with the above to avoid a one-sentence-paragraph.
  
  Conclusion:
  
  Line 321 : We changed the tense to were.
  
  Line 322 : We will add the missing comma.
  
  Line 324 : We will add the missing comma.
  
  Line 331 : We will add the missing comma.
  
  Line 331 : We will change to “Assuming the camera position and area captured is stable, the georeferencing can be reused for all subsequent images. In the past this has been done for webcam-based snow cover monitoring (Portenier et al., 2020).”
  
  Line 332 : We will add the missing comma.
  
  Line 338 : We will add the missing comma.
  
  Line 343 : We will remove “as is “ in “The model as is may also be used to”.
  
  Line 343 : We will replace “these” by “this”.
  
  Line 345 : We will add “avalanche annotations” to “thereby getting more accurate and reliable avalanche annotations in the future.”.
  
  Line 345: We will add the missing comma.
  
  Citation: https://doi.org/10.5194/egusphere-2024-498-AC1
CC1:
'Comment on egusphere-2024-498, Interactive Snow Avalanche Segmentation from Webcam Imagery: results, potential and limitations', Ron Simenhois, 13 May 2024

General comments:
This manuscript presents an innovative, relatively simple, cost-effective tool to dramatically improve avalanche observations while taking advantage of infrastructure already in many places. I think it is worthy of publication with a few changes.
The authors describe four different models, but all of these models have the same structure. The main difference is the dataset on which they were trained. This is not entirely clear from the manuscript. I suggest you mention it clearly for better clarity.
The point above suggests that the comparison is between the training datasets on which the models were trained. This point and the implications of the results are missing from the discussion and the conclusions. The authors are missing the opportunity to highlight and bridge their results to practical implications. The manuscript value will increase if the authors add some guides on the preferred way to train similar systems (e.g., For a system that deals with specific cameras in specific locations (like the SLF cameras), start with a pre-trained model on the COCO+LVIS dataset and then do transfer learning on the specific images from the system's cameras. For systems with many cameras, start with the base models and use a large dataset (like the AvaMix) for better generalization).
Finally, I will echo Referee #1. Please correct the typos, punctuation, and grammar inconsistencies in the manuscript.
I added specific comments in the uploaded file.

Citation: https://doi.org/10.5194/egusphere-2024-498-CC1
- AC2: 'Reply on CC1', Elisabeth D. Hafner, 04 Jun 2024
  
  Dear Ron Simenhois,
  
  thank you for taking the time to read our manuscript and for giving detailed feedback about passages that need improvement for better readability and passages that are ambiguous and need to be corrected for clarity.
  
  Please find below the answers to your comments:
  
  Different models: We will go over the manuscript and make it clear that our models (AvaWeb,..) differ in the data used to train them (and the number of epochs like noted in 3.3), but not in the model architecture.
  
  Model comparison: In our discussion we mention that we believe the coarseness of the annotations in the AvaPic prevents the model from learning all it could from such a large and diverse dataset. We expect the best model performance from training with a large dataset with fine annotations covering various perspectives, avalanche types, avalanche sizes as well as snow and illumination conditions. We will expand this section in the discussion chapter and explicitly describe the implications and recommendations for practice that we have found, picking them up again in the conclusion.
  
  We will correct the typos, punctuation, and grammar inconsistencies in the manuscript as already promised to Reviewer1.
  
  Line 32/324: We will replace the “between 10 and 60 minutes” with a general statement or leave it out altogether.
  
  Line 39: True, this would be a good place. We might introduce segmentation here or elsewhere, but definitely before the reader needs to know what it is to understand what s/he is reading.
  
  Line 62: We will remove “mask” here.
  
  Line 70: You are right, it is possible to georeference any image where enough persistent objects in the image with known coordinates are identified. We meant to emphasize that for cameras in a stable position this process can be done once and reused for all subsequent images. In contrast, each image with a unique perspective needs to be individually georeferenced, resulting in a comparably higher effort per image. We will adapt our manuscript specifying what we mean.
  
  Line 132: We will add “pixels” as a unit here.
  
  Line 159: We will remove the redundant word “the”.
  
  Line 164: Fox et al. (2023) trained the model with an Intersection Over Union (IoU) threshold of 0.2 and a confidence threshold of 0.25. But when testing they used an IoU threshold of 0.05 and a “confidence threshold which maximizes the model F1 score” (see caption to Table 2 in Fox et al., 2023).
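
To illustrate how much the bounding-box IoU threshold affects an object-level F1 score, here is a minimal sketch. The greedy one-to-one matching, the box format `(x_min, y_min, x_max, y_max)` and the example boxes are assumptions for illustration; neither Fox et al. (2023) nor the manuscript is claimed to match detections in exactly this way.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(inter_w, 0) * max(inter_h, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def f1_score(pred_boxes, gt_boxes, iou_threshold=0.05):
    """F1 after greedily matching each prediction to one ground-truth box."""
    unmatched_gt = list(gt_boxes)
    true_pos = 0
    for pred in pred_boxes:
        best = max(unmatched_gt, key=lambda g: box_iou(pred, g), default=None)
        if best is not None and box_iou(pred, best) >= iou_threshold:
            unmatched_gt.remove(best)  # each ground truth is matched once
            true_pos += 1
    false_pos = len(pred_boxes) - true_pos
    false_neg = len(unmatched_gt)
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 0.0

# Hypothetical boxes: one loosely overlapping detection, one false alarm.
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred_boxes = [(5, 5, 15, 15), (50, 50, 60, 60)]
f1_loose = f1_score(pred_boxes, gt_boxes, iou_threshold=0.05)
f1_strict = f1_score(pred_boxes, gt_boxes, iou_threshold=0.2)
```

The first prediction overlaps its ground-truth box with an IoU of 1/7, so it counts as a match at a threshold of 0.05 but not at 0.2, which changes the F1 score from 0.5 to 0.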
  
  Line 175: We will rewrite this description to make our point clearer.
  
  Line 227: You are right, it does not come as a surprise that having seen avalanches in training is beneficial to segmenting one later. We believe describing this in more detail in Sect 4 does not fit, but we will mention this in the discussion section.
  
  Line 230ff: Fox et al. (2023) state they achieve an F1 score of 64.0 ± 0.6, which we have correctly copied to Table 4. For the number in this line, we meant to compare F1 scores neglecting the standard deviation. The difference is, however, 0.12 and not 0.13 (0.64 vs. 0.76). We will correct this mistake and add the most important F1 scores to the text to avoid confusion and allow the text to stand alone.
  
  Table 5: We will add the number of images that are part of the UserPic to the caption of this table.
  
  Line 286: The IoU threshold of 5% for the bounding boxes is identical, but Fox et al. (2023)’s confidence threshold for this F1 score is unknown (see comment to line 164). We thresholded our raw predictions, which could also be called model confidence, at 0.5 (see line 153). This value was determined by analyzing mean IoU scores per click on the validation set.
  
  Consequently, the F1 scores we compare are both based on the confidence threshold the respective authors found to work best.
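
How a binarization threshold such as 0.5 might be chosen on a validation set can be sketched as follows. The candidate thresholds, the function names and the tiny example data are hypothetical; the reply only states that the value was determined from mean IoU scores per click on the validation set.

```python
def binarize(prob_map, threshold=0.5):
    """Turn raw predictions (values in [0, 1]) into a binary mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]

def mask_iou(pred, gt):
    """Intersection over union of two binary masks."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += p & g
            union += p | g
    return inter / union if union else 1.0

def best_threshold(val_probs, val_gts, candidates=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Pick the candidate threshold with the highest mean IoU on validation data."""
    def mean_iou(threshold):
        scores = [mask_iou(binarize(p, threshold), g)
                  for p, g in zip(val_probs, val_gts)]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_iou)

# Hypothetical 2 x 2 validation example (made-up values).
val_probs = [[[0.90, 0.55], [0.45, 0.10]]]
val_gts = [[[1, 1], [0, 0]]]
chosen = best_threshold(val_probs, val_gts)
```

Selecting the threshold on the validation set and reporting scores on the test set keeps the test data untouched by tuning, which is the role of the validation set that Referee #1 asked to have clarified.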
  
  Line 296/318: We will add the appropriate unit, in this case the European avalanche size scale.
  
  Line 321: We will replace prediction with segmentation.
  
  Citation: https://doi.org/10.5194/egusphere-2024-498-AC2
RC2:
'Comment on egusphere-2024-498', Anonymous Referee #2, 16 May 2024

The authors have presented a new and valuable tool for detecting and mapping avalanches from webcam imagery, which will contribute significantly to advancing avalanche research as well as public warning services. It is clear that a lot of work has been put into the study and the manuscript is worthy of publication, however some revisions are needed which will make the work more understandable to a wider audience, and not only to those involved in avalanche research or those who have interest in the application of deep learning models.

Specific comments.
Fig. 1 needs improving. It is somewhat confusing at first glance. There are 5 (maybe 6?) different coloured areas, which I assumed were the fields of view covered by the different cameras at each location, but it isn't immediately clear which area belongs to which location. Also, it isn't clear what the areas with diagonal lines correspond to.
Sect 2.2. SLF dataset: Does this dataset of annotated avalanches contain unique avalanches, or are some of the annotated avalanches simply the same avalanche captured under different light conditions, or at a different angle? If so, would this have any impact on the model specificity (for example with AvaWeb that performed well on the Webcam images but poorer on the more generalized datasets)
Sect 3.1 Model architecture: this section could be improved to be more reader-friendly to researchers/general public interested in the topic but who are not familiar with deep learning models. As someone who has some relatively basic experience of automatic avalanche segmentation but even less knowledge with deep learning, I found this section quite heavy to understand and needed to re-read some sentences or google certain terminology/abbreviations to try and follow this section. Here I would recommend additional descriptions for HRNet+OCR, perhaps elaborating on the meaning of "tensor" and "discs" (amongst others) to aid the understanding without the reader having to google or dive into the references first which disrupts the flow of the reading.
Sect 3.3 Experimental setup: what is COC+LVIS? There is no reference and needs a little more description. I struggled to understand the relation between this baseline model and the earlier mentioned HRNet+OCR and Conv1S. Could these be represented in Fig 4 for example where there is currently just a box for Deep Learning Model?
Also, when introducing the 3 additional models (AvaWeb, AvaPic, AvaMix) that have been created from training the baseline with the different datasets, it is worth re-emphasising the difference in size of training datasets, which you have mentioned in the discussion. It would be useful to have made this point in the description of the experimental setup as something to keep in mind before presenting the results.

Technical comments.
The 2 earlier referees have made some points about many missing commas, but I personally didn't really find this to be a major hurdle when trying to read the manuscript so I have relatively few comments. However, there are some typos and a few suggestions for alternative words:
L4. "becoming more frequent" instead of "getting more frequent"
L32. I prefer to say "often" rather than "oftentimes"
L33. change "an" to "a"
L75. Add "of" to "make use OF webcam"
L85. change "webcams network" to "webcam network"
L122. change "their UIBK" to "the UIBK"
L123. change "cropped" to "cropping"
L134. 2 instances of Fig.4
L182. change "evaluate on the" to "evaluate the"
L190. change "users who's" to "users whose"
L225. change one fourth to one quarter
L226. change "while this for the" to "while for"
L246. change "to a more" to "to greater"
L289. I think "exceed" is a better word to use than "beat" in a scientific publication

Citation: https://doi.org/10.5194/egusphere-2024-498-RC2
- AC3: 'Reply on RC2', Elisabeth D. Hafner, 04 Jun 2024
  
  Dear anonymous reviewer,
  thank you very much for your detailed comments and suggestions to improve the quality of our manuscript!
  Please find below the answers to your comments:
  
  Fig. 1: We will make this figure more readable by changing the way the field of view per camera is displayed.
  Sect. 2.2. The dataset includes selected avalanches twice, captured under different illumination conditions. These avalanches are of course not split between the datasets for training and testing. We believe this helps the model to become robust and independent of the illumination conditions. We see no influence on the ability of the AvaWeb to generalize better or worse to unknown view angles.
  Sect 3.1. We will restructure and expand the model description for better readability and easier understanding for the readers with little deep-learning background.
  Sect 3.3. COCO+LVIS is a combination of COCO (an image dataset for object detection) and LVIS (a large-scale instance segmentation dataset). Both are publicly available datasets with a total of 104k images and 1.6M instance-level masks. They are widely used for training, testing, and comparing models. In our case, baseline refers to the HRNet+OCR trained on this dataset. AvaWeb, AvaPic and AvaMix refer to the HRNet+OCR trained with avalanche images “on top of” COCO and LVIS. In other words, we use pretrained weights from COCO+LVIS and then fine-tune them on avalanches.
  
  HRNet+OCR is a model for semantic segmentation, initially not used for interactive object segmentation. Conv1S is the solution to feed user corrections into the HRNet+OCR without losing the advantages of the pretraining on COCO+LVIS.
  We will expand this section in the revised version of the manuscript to make it easier to understand, also for the readers with little deep-learning background. We do not think showing details of the HRNet+OCR in Figure 4 would increase understanding. We will re-emphasize the difference in the size of training datasets in Section 3.3.
  Technical comments.
  L4. We will change to "becoming more frequent" instead of "getting more frequent".
  L32. We will use "often" rather than "oftentimes".
  L33. We will change "an" to "a".
  L75. We will change to “we propose to use webcam infrastructure” according to the suggestion from reviewer 1.
  L85. We will correct "webcams network" to "webcam network".
  L122. We will change "their UIBK" to "the UIBK".
  L123. We will change "cropped" to "cropping".
  L134. We will remove one instance of Fig.4.
  L182. We will change "evaluate on the" to "evaluate the".
  L190. We will change "users who's" to "users whose".
  L225. We will change one fourth to one quarter.
  L226. We will change "while this for the" to "while for".
  L246. We will change "to a more" to "to greater".
  L289. We will replace “beat” by "exceed" to use language more appropriate for a scientific publication.
  
  Citation: https://doi.org/10.5194/egusphere-2024-498-AC3

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-498', Anonymous Referee #1, 17 Apr 2024

General comments:
The manuscript presents a novel dataset using state-of-the-art methodology that will benefit the avalanche research community. I think the novelty of the research is significant for publication, but the clarity of the writing needs to be improved. I suggest a few comments, especially in the methods section, that will enhance the clarity of the manuscript before publication. In addition, there are many syntax errors (missing commas) that affect the general comprehension and the quality of the manuscript. I give more details about these syntax errors in the specific and technical comments.

Specific comments:
The introduction is well structured and the problem is well described.
Most of the method section is well written, but some refinements are necessary to improve its clarity. My first specific comment is that a lot of processing, model tuning, validation and testing was done on multiple datasets. I suggest the authors dedicate a general paragraph at the beginning of the method section to describing the overall analysis of the paper. I spent quite some time figuring out which analysis would be done on which dataset.
I understand that you split your dataset into a train, validate and test dataset. However, the difference between validation and test was not obvious at the beginning, since both terms are used similarly in a statistical validation context. It wasn’t explained until the last section of the methods. I suggest you change the word “validate” to “tuning” or something similar, because this dataset was only used to tune the hyperparameters (if I understand correctly).
The model architecture section is very technical, which is not bad, but it might be difficult to understand for the avalanche research community. This community is going to be very interested in the dataset, but might not be very specialized in the field of IOS (like myself). Maybe you can add a few sentences to highlight the key elements of the HRNet+OCR algorithm in a broader sense, before entering into the technical details describing how you adapted the original algorithm to avalanches.
The present tense was generally used in the method and result sections, where the past tense is usually used. Please change to the past tense, as the calculations and analyses were made in the past. There are also a few instances of the present tense that should be in the past tense where you refer to your results.
There are many missing commas within the manuscript that affect the comprehension of the sentences. These missing commas are always at the beginning of the sentence, for example: “In our user study the participants...”, where there should be a comma between “study” and “the participants”. There are also cases where the comma is not missing. Please correct these missing commas to enhance the comprehension of the text. See technical comments for more.
Technical corrections:
Intro:
Line 28: Missing comma after “Depending on the source, ”
Line 34 : Maybe change heightens to increases.
Line 39 : In opposition instead of opposed.
Line 50 : This sentence is unclear, with a few missing commas. Maybe change it to “ However, where satellite data is available, areas affected by avalanches....”
Line 52: The link to the second part of this sentence is not clear. Please rephrase or add a few words after the comma to make the link between the canonical problem and instance segmentation clearer.
Line 67 : Gaussians?
Line 69 : Missing comma before “but”.
Line 73 : Missing comma between “georeferencing” and “the”.
Line 75 : remove the word “make”.
Line 78 : I don't understand the words “real world application” in your objective. While I think the user study is significant and interesting, I think real world application doesn't apply to the analysis made.

Data:
Line 85 : Suggestion : “Our webcams network covered the whole Dischma valley...”
Line 86 : I think the unit needs to be separated from the value in EGU publications, like “ 13 km”.
Line 98 : The sentence “With steep mountains on both sides of the valley over 80% of the entire area are potential avalanche terrain” needs to be rephrased for clarity.
Line 99 : Missing comma between “settlements” and “avalanches”
Figure 1: It is unclear to me which field of view corresponds to which camera. Please consider using polylines (transparent in the middle) instead of polygons. Add arrows to clearly show in which direction the cameras are looking. I think the swisstopo background contains too many details that are not needed. Removing some of them could enhance the clarity of the map.
Line 104 : Missing comma after “training”.
Line 105 : Missing comma after “our user stud”.
Line 109 : Missing comma after “validation” and before “while”, and after “this” and “we”.
Line 113 : Missing comma “For our user study, we relied”.
Line 123 : Missing comma “validation, which” and “dataset, we”.
Line 125 : Missing comma “comparison, their”.

Methods:
Line 132 : Is it 5 pixels?
Line 133 : Missing comma “segmentation, the”.
Line 137 : Missing comma “implementation, it”.
Line 138 : Does that mean that images are randomly selected during training for a user to manually click on them? Maybe add a sentence to state when the fine-tuning with the user input is made (Figure 4). Is it on every image or on randomly selected ones?
Line 142 : Missing comma “mode, the”
Line 157 : Missing comma “the masks, we report”.
Line 159 : Unclear formulation “since the we aim for a high”.
Line 162 : Missing comma “object-level, we compare”.
Line 163 : Please change the letter t for the threshold, as t is already used for the time step in Figure 4.
Line 164 : Missing comma “Like Fox et al. (2023), we first”.
Line 167 : Missing comma “the matches, we compute”.
Line 173 : Missing comma “our webcam imagery, we evaluate”
Line 180 : Is it hyperparameters?
Line 183 : Missing comma “In addition, we compare”.
Line 190 : I suggest putting “we carried out a small user study” at the beginning of the sentence.
Line 191 : Missing comma “user study, we used”.
Line 192 : Missing comma “per click, as well”, since it is an enumeration.
Line 194 : Missing comma “in UserPic, the participant”.
Line 197 : Missing comma “user study, we report”.
Line 198 : Missing comma “the NoC 20 @85, as well”, since it is an enumeration.
Line 201 : Missing comma “significant, we used”.
Results:
Line 204 : The beginning of this sentence is unclear “Evaluating on the SLF test the AvaWeb”.
Line 208 : Missing comma “baseline, all models”.
Line 209 : Missing comma “Overall, the”.
Line 212 : Missing comma “analyses, we are”.
Figure 6 : Please use the same font in the figure as in the text.
Line 217 : Missing comma “For all models, the images”.
Figure 7 : Please specify what GT is. Is it ground truth?
Line 220 : Missing comma “GroundPic, the AvaWeb”, and “about 10%, while it”.
Line 224 : Missing comma “those avalanches, the IoU”.
Line 225 : Missing comma “avalanches, the AvaWeb”.
Line 226 : Unclear sentence “while this for the AvaPic and AvaMix this is the case for less than 1% of all avalanches”.
Line 227 : Missing comma “same images, which depict”.
Line 230 : Missing comma “bounding boxes, the AvaWeb”.
Line 232 : Missing comma “AvaMix, the F1”.
Line 235 : Missing comma “User Study, we loaded”.
Figure 8-9-10 : The font style is not consistent in these figures.
Line 241 : Missing comma “On average, participants”.
Line 248 : Missing comma “User study, we observed”.
Line 250-251 : Please rephrase this sentence to make it clearer, or maybe split it into two sentences.
Line 254 : Please rephrase this sentence to make it clearer, especially the end: “While they are not for IoU@1 and IoU@2 (t-test: p-value: > 0.05), for IoU@3 (p-value= 0.045), IoU@4 (p-value= 0.034) and IoU@5 (p-value= 0.035) they are.”
Line 257 : Replace “eachother” by “each other”.

Discussion:
Line 263 : The syntax of this sentence is problematic, maybe remove “outlines” to make it more clear or rephrase it.
Line 265 : Missing comma “(GroundPic), but fails”.
Line 271 : Missing comma “SLF dataset, help”, and also maybe before “following”.
Line 276 : Missing comma “the avalanche, resulting in”.
Line 277 : Missing comma “But overall, the AvaWeb”.
Line 279 : Missing comma “approximately 20% lower IoU”.
Line 281 : Missing comma “imagery, which the model”.
Line 282 : Missing comma “this paper, but for”. Maybe try “this paper but, future work should consider experimenting...”
Line 284 : Missing comma “automated method, Fox et al. (2023)”, and “overlap, which is”.
Line 286 : Add the word “that” in “We capture the area that the avalanche covered...”
Line 288 : Missing comma “user study, the participants”. Also, the term “user study” is sometimes written with a capital U and sometimes not; this needs to be consistent.
Line 288 : In the sentence “the best performance are as good as the simulation”, I think the past tense “were” is more appropriate.
Line 294 : Missing comma “manual mapping, using IAS”.
Line 295 : Missing comma “average size 1.75), that take less”
Line 296 : Missing comma “new avalanches, the user”.
Line 297 : Missing comma “Hafner et al. (2023), the mean”.
Line 298 : I think the past tense (were) is more appropriate in “ are within 5% of each other and all have an IoU”
Figure 14 : Missing North arrow in the map.
Line 305 : Missing comma “the winter, leading to more”
Line 206 : The comma should be after “however”, not “requires”.
Line 309 : Missing comma “Without that, the application”.
Line 310 : Missing comma “warning service, while all other”.
Line 316 : Missing comma “mapping avalanches IAS saves”.
Line 317 : This sentence is unclear “since the avalanches were time was recorded were rather small”.
Line 317 : Why is this sentence a paragraph of its own?

Conclusion:
Line 321 : Past tense “were” in “the predictions are simulated,”.
Line 322 : Missing comma “With IAS, a human user”.
Line 324 : Missing comma “60 minutes, increases the likelihood”.
Line 331 : Missing comma “is stable, the georeferencing”.
Line 331 : The last part of the sentence is unclear, please rephrase “like done before for webcam-based snow cover monitoring (Portenier et al., 2020)”.
Line 332 : Missing comma “In the future, existing approaches”.
Line 338 : Missing comma “more reliable, compared to the”.
Line 343 : I would remove “as is “ in “The model as is may also be used to”.
Line 343 : Replace “These” by “this”.
Line 345 : Maybe add “avalanche annotations” to “thereby getting more accurate and reliable avalanche annotations in the future.”.
Line 345: Missing comma “Overall, this”.

Citation: https://doi.org/10.5194/egusphere-2024-498-RC1
- AC1: 'Reply on RC1', Elisabeth D. Hafner, 04 Jun 2024
  
  Dear anonymous reviewer,
  
  Thank you very much for your detailed feedback on our manuscript! We greatly appreciate the time spent on formulating detailed comments and suggestions. Please find below the answers to both specific and technical comments:
  Specific comments:
  
  We will add a general paragraph at the beginning of the method section, that introduces and describes the overall analysis of the paper better.
  
  We will define validation and test set and strive to correct sentences with ambiguous meaning.
  
  We will add a few sentences to describe the key elements of the HRNet+OCR algorithm in simple words, before entering the technical details.
  
  We will correct the tense in the method and the result section.
  
  Technical comments:
  
  Line 28: We will add the missing comma.
  
  Line 34 : We will change heightens to increases.
  
  Line 39 : We will use in opposition instead of opposed.
  
  Line 50 : We will change this sentence as proposed.
  
  Line 52: We will rephrase this sentence for clarity.
  
  Line 67 : When discs are used to encode clicks, the whole area specified by the radius is given the same weight. When clicks are encoded as Gaussians, the weight follows a Gaussian distribution, decreasing from the center over the area specified by the radius. We will explain what we mean by Gaussians in the revised version of the manuscript.
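
As a rough illustration of the two encodings described in this reply, the following Python sketch builds both click maps for a single click. The function name `encode_click`, the 11 x 11 map size, the radius value and the choice of setting the Gaussian's standard deviation to half the radius are assumptions made for this example, not details taken from the manuscript.

```python
import math


def encode_click(height, width, click_row, click_col, radius, mode="disc"):
    """Return a height x width map (nested lists) encoding one click.

    mode="disc": every pixel within `radius` of the click gets weight 1.
    mode="gaussian": the weight decays from 1 at the click towards the
    edge of the radius (sigma = radius / 2 is an illustrative assumption).
    """
    if mode not in ("disc", "gaussian"):
        raise ValueError("mode must be 'disc' or 'gaussian'")
    sigma = radius / 2.0
    click_map = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            dist = math.hypot(row - click_row, col - click_col)
            if dist > radius:
                continue  # outside the area specified by the radius
            if mode == "disc":
                click_map[row][col] = 1.0
            else:
                click_map[row][col] = math.exp(-dist ** 2 / (2 * sigma ** 2))
    return click_map


# One click in the centre of a small map, encoded both ways.
disc = encode_click(11, 11, 5, 5, radius=5, mode="disc")
gauss = encode_click(11, 11, 5, 5, radius=5, mode="gaussian")
```

In the disc map every pixel inside the radius carries the same weight, whereas in the Gaussian map the weight is highest at the click and decreases towards the edge of the radius.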
  
  Line 69 : We will add the missing comma.
  
  Line 73 : We will add the missing comma.
  
  Line 75 : We will remove the word “make”.
  
  Line 78 : The challenge with interactive deep learning models is that, during training, the human user interactions (clicks in our case) are modeled. In the worst case, the modeled behavior does not represent reality well and the model is useless when used by a human. Consequently, it is essential to test whether the way the user interactions are modeled leads to a model that speeds up segmentation for a human using it. We will change this passage to clarify this for future readers.
  
  Data:
  
  Line 85 : We will change the structure of this sentence as suggested.
  
  Line 86 : We will add a space between the number and the unit.
  
  Line 98 : We will rephrase this sentence for clarity.
  
  Line 99 : We will add the missing comma.
  
  Figure 1: We will adapt this figure to improve readability and information retrieval.
  
  Line 104 : We will add the missing comma.
  
  Line 105 : We will add the missing comma.
  
  Line 109 : We will add the missing commas.
  
  Line 113 : We will add the missing comma.
  
  Line 123 : We will add the missing comma.
  
  Line 125 : We will add the missing comma.
  
  Methods:
  
  Line 132 : Yes, it is 5 pixels; we will add this information to the manuscript.
  
  Line 133 : We will add the missing comma.
  
  Line 137 : We will add the missing comma.
  
  Line 138 : We are not entirely sure we understood your question. The images are not randomly selected; instead, there is a fixed split into a training, validation, and test set (see Table 1). The random and iterative sampling strategies in this sentence refer to the simulation of user input in the form of clicks for training, validation, and testing. The simulated user clicks are partially placed in random locations and partially in the area with the largest error (see next line). There is no human clicking involved in training the model. This is also the reason we did a user study (see also the comment on line 78).
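
The largest-error part of this click simulation could be sketched as follows. This is a simplified illustration under stated assumptions: masks are small nested lists of 0/1, error regions are 4-connected, and the click is placed at the region pixel closest to the region's centroid. The function names are hypothetical, the placement rule is not necessarily the one used in the manuscript, and the randomly placed clicks are not modelled here.

```python
from collections import deque


def largest_error_region(truth, pred):
    """Return (pixels, is_positive) for the largest 4-connected error region.

    is_positive is True for missed avalanche pixels (false negatives) and
    False for background wrongly predicted as avalanche (false positives).
    """
    height, width = len(truth), len(truth[0])
    seen = [[False] * width for _ in range(height)]
    best, best_positive = [], True
    for row in range(height):
        for col in range(width):
            if seen[row][col] or truth[row][col] == pred[row][col]:
                continue
            kind = truth[row][col]  # 1: false negative, 0: false positive
            region, queue = [], deque([(row, col)])
            seen[row][col] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and truth[ny][nx] != pred[ny][nx]
                            and truth[ny][nx] == kind):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best, best_positive = region, kind == 1
    return best, best_positive


def simulate_next_click(truth, pred):
    """Return (row, col, is_positive) for the next simulated click, or None."""
    region, is_positive = largest_error_region(truth, pred)
    if not region:
        return None  # prediction already matches the ground truth
    centre_y = sum(y for y, _ in region) / len(region)
    centre_x = sum(x for _, x in region) / len(region)
    # Assumption: click on the region pixel closest to the region centroid.
    y, x = min(region, key=lambda p: (p[0] - centre_y) ** 2 + (p[1] - centre_x) ** 2)
    return y, x, is_positive


# Ground truth: a 3 x 3 avalanche; prediction: only one false positive pixel.
truth = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
pred = [[0] * 5 for _ in range(5)]
pred[0][4] = 1
next_click = simulate_next_click(truth, pred)
```

Here the missed avalanche is the largest error region, so the simulated click is a positive click in its centre. Once the avalanche is segmented, the remaining false positive pixel would attract a negative click.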
  
  Line 142 : We will add the missing comma.
  
  Line 157 : We will add the missing comma.
  
  Line 159 : We will split this sentence up for better clarity: “Achieving a high IoU after few clicks makes the model most useful. Consequently, we compare the IoU at click k (for k = 1, 2, ..., 20) averaged over all the images (mIoU@k).”
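
For readers less familiar with these metrics, the sketch below shows one way mIoU@k and a number-of-clicks score such as NoC20@85 could be computed from per-image IoU curves. The function names and the example values are hypothetical. Counting an image that never reaches the threshold as the maximum number of clicks is a convention assumed here and may differ from the manuscript.

```python
def iou(truth_pixels, pred_pixels):
    """Intersection over union of two masks given as sets of (row, col)."""
    union = truth_pixels | pred_pixels
    if not union:
        return 1.0  # both masks empty: treated as a perfect match here
    return len(truth_pixels & pred_pixels) / len(union)


def miou_at_k(iou_curves, k):
    """Mean IoU after click k; iou_curves[i][j] is the IoU of image i after click j + 1."""
    return sum(curve[k - 1] for curve in iou_curves) / len(iou_curves)


def number_of_clicks(iou_curve, threshold=0.85, max_clicks=20):
    """Clicks needed to reach `threshold`, capped at `max_clicks` (assumed convention)."""
    for click, value in enumerate(iou_curve[:max_clicks], start=1):
        if value >= threshold:
            return click
    return max_clicks


# Hypothetical IoU curves for two images and three clicks each.
curves = [[0.50, 0.80, 0.90], [0.70, 0.86, 0.95]]
miou_1 = miou_at_k(curves, 1)
clicks_needed = [number_of_clicks(curve) for curve in curves]
```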
  
  Line 162 : We will add the missing comma.
  
  Line 163 : We will change the abbreviation for threshold to T in the whole document to avoid confusion with the timestep in Figure 4.
  
  Line 164 : We will add the missing comma.
  
  Line 167 : We will add the missing comma.
  
  Line 173 : We will add the missing comma.
  
  Line 180 : Yes, these are hyperparameters; we will specify this in the text.
  
  Line 183 : We will add the missing comma.
  
  Line 190 : We will put “we carried out a small user study” at the beginning of the sentence as suggested.
  
  Line 191 : We will add the missing comma.
  
  Line 192 : We will add the missing comma.
  
  Line 194 : We will add the missing comma.
  
  Line 197 : We will add the missing comma.
  
  Line 198 : We will add the missing comma.
  
  Line 201 : We will add the missing comma.
  
  Results:
  
  Line 204 : We will adapt this sentence to improve clarity.
  
  Line 208 : We will add the missing comma.
  
  Line 209 : We will add the missing comma.
  
  Line 212 : We will add the missing comma.
  
  Figure 6 : We do not agree that the text and figures need to have exactly the same font. To improve readability and appearance we will harmonize the font in our figures.
  
  Line 217 : We will add the missing comma.
  
  Figure 7 : It is ground truth, yes. We will replace the abbreviation with “Ground truth” in Figures 7, 9 and 10.
  
  Line 220 : We will add the missing comma.
  
  Line 224 : We will add the missing comma.
  
  Line 225 : We will add the missing comma.
  
  Line 226 : We will change the sentence to: “For more than one fourth of all avalanches, the AvaWeb never reaches the NoC20@85, while for the AvaPic and AvaMix less than 1% of all avalanches never reach an IoU of 85%”
  
  Line 227 : We will add the missing comma.
  
  Line 230 : We will add the missing comma.
  
  Line 232 : We will add the missing comma.
  
  Line 235 : We will add the missing comma.
  
  Figure 8-9-10 : We do not agree that the text and figures need to have exactly the same font. To improve readability and appearance we will harmonize the font in our figures.
  
  Line 241 : We will add the missing comma.
  
  Line 248 : We will add the missing comma.
  
  Line 250-251 : We will rephrase this sentence to make it clearer, or split it into two sentences.
  
  Line 254 : We will rephrase this sentence to “For clicks 1 to 5, where we had enough samples from all participants, we tested if the differences between the highest and the lowest mIoU are statistically significant. The differences are not significant for IoU@1 and IoU@2 (t-test: p-value > 0.05), but they are statistically significant for IoU@3 (p-value = 0.045), IoU@4 (p-value = 0.034) and IoU@5 (p-value = 0.035).”
  
  Line 257 : We will replace “eachother” by “each other”.
  
  Discussion:
  
  Line 263 : We will remove “outlines” from the sentence.
  
  Line 265 : We will add the missing comma.
  
  Line 271 : We will add the missing comma.
  
  Line 276 : We will add the missing comma.
  
  Line 277 : We will add the missing comma.
  
  Line 279 : We will add the missing comma.
  
  Line 281 : We will add the missing comma.
  
  Line 282 : We will add the missing comma.
  
  Line 284 : We will add the missing comma.
  
  Line 286 : We will add the word “that” to this sentence.
  
  Line 288 : We will add the missing comma and we will harmonize the way “user study” is written throughout the text.
  
  Line 288 : We will change to past tense.
  
  Line 294 : We will add the missing comma.
  
  Line 295 : We will add the missing comma.
  
  Line 296 : We will add the missing comma.
  
  Line 297 : We will add the missing comma.
  
  Line 298 : We will change to “were” in this sentence
  
  Figure 14 : We will add a North arrow to the map.
  
  Line 305 : We will add the missing comma.
  
  Line 206 : We will move the comma.
  
  Line 309 : We will add the missing comma.
  
  Line 310 : We will add the missing comma.
  
  Line 316 : We will add the missing comma.
  
  Line 317 : We will change this sentence to “Compared to the traditional way of mapping avalanches, IAS saves over 90% time. We believe that the time saved may be even greater since the avalanches with a time recording were rather small (mean size 1.75) and all located in an area well known to the person mapping.” to increase clarity.
  
  Line 317 : We will merge this sentence with the above to avoid a one-sentence-paragraph.
  
  Conclusion:
  
  Line 321 : We changed the tense to were.
  
  Line 322 : We will add the missing comma.
  
  Line 324 : We will add the missing comma.
  
  Line 331 : We will add the missing comma.
  
  Line 331 : We will change to “Assuming the camera position and area captured are stable, the georeferencing can be reused for all subsequent images. In the past, this has been done for webcam-based snow cover monitoring (Portenier et al., 2020).”
  
  Line 332 : We will add the missing comma.
  
  Line 338 : We will add the missing comma.
  
  Line 343 : We will remove “as is “ in “The model as is may also be used to”.
  
  Line 343 : We will replace “these” by “this”.
  
  Line 345 : We will add “avalanche annotations” to “thereby getting more accurate and reliable avalanche annotations in the future.”.
  
  Line 345: We will add the missing comma.
  
  Citation: https://doi.org/10.5194/egusphere-2024-498-AC1
CC1:
'Comment on egusphere-2024-498, Interactive Snow Avalanche Segmentation from Webcam Imagery: results, potential and limitations', Ron Simenhois, 13 May 2024

General comments:
This manuscript presents an innovative, relatively simple, cost-effective tool to dramatically improve avalanche observations while taking advantage of infrastructure already in many places. I think it is worthy of publication with a few changes.
The authors describe four different models, but all of these models have the same structure. The main difference is the dataset on which they were trained. This is not entirely clear from the manuscript. I suggest you mention it clearly for better clarity.
The point above suggests that the comparison is between the training datasets on which the models were trained. This point and the implications of the results are missing from the discussion and the conclusions. The authors are missing the opportunity to highlight and bridge their results to practical implications. The manuscript value will increase if the authors add some guides on the preferred way to train similar systems (e.g., For a system that deals with specific cameras in specific locations (like the SLF cameras), start with a pre-trained model on the COCO+LVIS dataset and then do transfer learning on the specific images from the system's cameras. For systems with many cameras, start with the base models and use a large dataset (like the AvaMix) for better generalization).
Finally, I will echo Referee #1's comments. Please correct the typos, punctuation, and grammar inconsistencies in the manuscript.
I added specific comments in the uploaded file.

Citation: https://doi.org/10.5194/egusphere-2024-498-CC1
- AC2: 'Reply on CC1', Elisabeth D. Hafner, 04 Jun 2024
  
  Dear Ron Simenhois,
  
  thank you for taking the time to read our manuscript and for giving detailed feedback about passages that need improvement for better readability and passages that are ambiguous and need to be corrected for clarity.
  
  Please find below the answers to your comments:
  
  Different models: We will go over the manuscript and make it clear that our models (AvaWeb, etc.) differ in the data used to train them (and the number of epochs, as noted in Sect. 3.3), but not in the model architecture.
  
  Model comparison: In our discussion we mention that we believe the coarseness of the annotations in the AvaPic prevents the model from learning all it could from such a large and diverse dataset. We expect the best model performance from training with a large dataset with fine annotations covering various perspectives, avalanche types, avalanche sizes as well as snow and illumination conditions. We will expand this section in the discussion chapter and explicitly describe the implications and recommendations for practice that we have found, picking them up again in the conclusion.
  
  We will correct the typos, punctuation, and grammar inconsistencies in the manuscript as already promised to Reviewer 1.
  
  Line 32/324: We will replace the “between 10 and 60 minutes” with a general statement or leave it out altogether.
  
  Line 39: True, this would be a good place. We might introduce segmentation here or elsewhere, but definitely before the reader needs to know what it is to understand what s/he is reading.
  
  Line 62: We will remove “mask” here.
  
  Line 70: You are right, it is possible to georeference any image where enough persistent objects in the image with known coordinates are identified. We meant to emphasize that for cameras in a stable position this process can be done once and reused for all subsequent images. In contrast, each image with a unique perspective needs to be individually georeferenced, resulting in a comparably higher effort per image. We will adapt our manuscript specifying what we mean.
  
  Line 132: We will add “pixels” as a unit here.
  
  Line 159: We will remove the redundant word “the”.
  
  Line 164: Fox et al. (2023) trained the model with an Intersection Over Union (IoU) threshold of 0.2 and a confidence threshold of 0.25. But when testing they used an IoU threshold of 0.05 and a “confidence threshold which maximizes the model F1 score” (see caption to Table 2 in Fox et al., 2023).
  
  Line 175: We will rewrite this description to make our point clearer.
  
  Line 227: You are right, it does not come as a surprise that having seen avalanches in training is beneficial to segmenting one later. We believe describing this in more detail in Sect 4 does not fit, but we will mention this in the discussion section.
  
  Line 230ff: Fox et al. (2023) state they achieve an F1 score of 64.0 ± 0.6, which we have correctly copied to Table 4. For the number in this line, we meant to compare F1 scores neglecting the standard deviation. The difference is, however, 0.12 and not 0.13 (0.64 vs. 0.76). We will correct this mistake and add the most important F1 scores to the text to avoid confusion and allow the text to stand alone.
  
  Table 5: We will add the number of images in UserPic to the caption of this table.
  
  Line 286: We use an identical IoU threshold of 5% for the bounding boxes. Fox et al. (2023)’s confidence threshold for this F1 score is unknown (see comment to line 164). We thresholded our raw predictions, which could also be called model confidence, at 0.5 (see line 153). This value was determined by analyzing mean IoU scores per click on the validation set.
  
  Consequently, the F1 scores we compare are both based on the confidence threshold the respective authors found to work best.
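
To make the object-level comparison more concrete, the following sketch computes an F1 score from bounding boxes that count as matched when their IoU reaches a threshold of 0.05. The greedy one-to-one matching, the box format `(x_min, y_min, x_max, y_max)` and the function names are assumptions made for this illustration. They are not necessarily the procedure used by Fox et al. (2023) or in the manuscript.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    overlap_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f1_from_boxes(truth_boxes, pred_boxes, iou_threshold=0.05):
    """F1 score with greedy one-to-one matching (an assumed matching rule)."""
    unmatched = list(truth_boxes)
    true_pos = 0
    for pred in pred_boxes:
        best = max(unmatched, key=lambda t: box_iou(t, pred), default=None)
        if best is not None and box_iou(best, pred) >= iou_threshold:
            unmatched.remove(best)  # each ground-truth box is matched once
            true_pos += 1
    false_pos = len(pred_boxes) - true_pos
    false_neg = len(unmatched)
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 0.0


# Two annotated avalanches, one overlapping prediction and one spurious one.
truth_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred_boxes = [(5, 5, 15, 15), (50, 50, 60, 60)]
f1 = f1_from_boxes(truth_boxes, pred_boxes)
```

With such a low IoU threshold, the first prediction counts as a match although it overlaps the annotation only partially, which illustrates why F1 scores depend on the thresholds used.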
  
  Line 296/318: We will add the appropriate unit, in this case the European avalanche size scale.
  
  Line 321: We will replace prediction with segmentation.
  
  Citation: https://doi.org/10.5194/egusphere-2024-498-AC2
RC2:
'Comment on egusphere-2024-498', Anonymous Referee #2, 16 May 2024

The authors have presented a new and valuable tool for detecting and mapping avalanches from webcam imagery, which will contribute significantly to advancing avalanche research as well as public warning services. It is clear that a lot of work has been put into the study and the manuscript is worthy of publication, however some revisions are needed which will make the work more understandable to a wider audience, and not only to those involved in avalanche research or those who have interest in the application of deep learning models.

Specific comments.
Fig. 1 needs improving. It is somewhat confusing at first glance. There are 5 (maybe 6?) different coloured areas, which I assumed were the fields of view covered by the different cameras at each location, but it isn't immediately clear which area belongs to which location. Also, it isn't clear what the areas with diagonal lines correspond to.
Sect 2.2. SLF dataset: Does this dataset of annotated avalanches contain unique avalanches, or are some of the annotated avalanches simply the same avalanche captured under different light conditions, or at a different angle? If so, would this have any impact on the model specificity (for example with AvaWeb, which performed well on the webcam images but more poorly on the more generalized datasets)?
Sect 3.1 Model architecture: this section could be made more reader-friendly for researchers and members of the general public who are interested in the topic but not familiar with deep learning models. As someone who has some relatively basic experience of automatic avalanche segmentation but even less knowledge of deep learning, I found this section quite heavy going and needed to re-read some sentences or google certain terminology/abbreviations to follow it. Here I would recommend additional descriptions for HRNet+OCR, perhaps elaborating on the meaning of "tensor" and "discs" (amongst others), to aid understanding without the reader having to google or dive into the references first, which disrupts the flow of reading.
Sect 3.3 Experimental setup: what is COCO+LVIS? There is no reference, and it needs a little more description. I struggled to understand the relation between this baseline model and the earlier mentioned HRNet+OCR and Conv1S. Could these be represented in Fig 4, for example, where there is currently just a box for Deep Learning Model?
Also, when introducing the 3 additional models (AvaWeb, AvaPic, AvaMix) that have been created from training the baseline with the different datasets, it is worth re-emphasising the difference in size of training datasets, which you have mentioned in the discussion. It would be useful to have made this point in the description of the experimental setup as something to keep in mind before presenting the results.

Technical comments.
The 2 earlier referees have made some points about many missing commas, but I personally didn't really find this to be a major hurdle when trying to read the manuscript so I have relatively few comments. However, there are some typos and a few suggestions for alternative words:
L4. "becoming more frequent" instead of "getting more frequent"
L32. I prefer to say "often" rather than "oftentimes"
L33. change "an" to "a"
L75. Add "of" to "make use OF webcam"
L85. change "webcams network" to "webcam network"
L122. change "their UIBK" to "the UIBK"
L123. change "cropped" to "cropping"
L134. 2 instances of Fig.4
L182. change "evaluate on the" to "evaluate the"
L190. change "users who's" to "users whose"
L225. change one fourth to one quarter
L226. change "while this for the" to "while for"
L246. change "to a more" to "to greater"
L289. I think "exceed" is a better word to use than "beat" in a scientific publication

Citation: https://doi.org/10.5194/egusphere-2024-498-RC2
- AC3: 'Reply on RC2', Elisabeth D. Hafner, 04 Jun 2024
  
  Dear anonymous reviewer,
  thank you very much for your detailed comments and suggestions to improve the quality of our manuscript!
  Please find below the answers to your comments:
  
  Fig. 1: We will make this figure more readable by changing the way the field of view per camera is displayed.
  Sect. 2.2. The dataset includes selected avalanches twice, captured under different illumination conditions. These avalanches are of course not split between the datasets for training and testing. We believe this helps the model to become robust and independent of the illumination conditions. We see no influence on the ability of AvaWeb to generalize better or worse to unknown view angles.
  Sect 3.1. We will restructure and expand the model description for better readability and easier understanding for readers with little deep-learning background.
  Sect 3.3. COCO+LVIS is a combination of COCO (an image dataset for object detection) and LVIS (a large-scale instance segmentation dataset). Both are publicly available datasets with a total of 104k images and 1.6M instance-level masks. They are widely used for training, testing, and comparing models. In our case, the baseline refers to the HRNet+OCR trained on this dataset. AvaWeb, AvaPic, and AvaMix refer to the HRNet+OCR trained with avalanche images “on top of” COCO and LVIS. In other words, we use pretrained weights from COCO+LVIS and then fine-tune on avalanches.
  
  HRNet+OCR is a model for semantic segmentation, initially not used for interactive object segmentation. Conv1S is the solution to feed user corrections into the HRNet+OCR without losing the advantages of the pretraining on COCO+LVIS.
  We will expand this section in the revised version of the manuscript to make it easier to understand, also for the readers with little deep-learning background. We do not think showing details of the HRNet+OCR in Figure 4 would increase understanding. We will re-emphasize the difference in the size of training datasets in Section 3.3.
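For readers unfamiliar with the terms "discs" and "tensor" raised in the comment on Sect 3.1, the following is a minimal illustrative sketch, not the implementation used in the manuscript. It assumes that each user click is drawn as a small filled disc on an otherwise empty map the size of the image, with one map for positive clicks and one for negative clicks; stacking the two maps gives a 2 x height x width grid of numbers, which is what is meant by a tensor. The function names and the radius of 5 pixels are hypothetical choices for illustration, and the sketch stops at the click maps; how Conv1S passes them into HRNet+OCR is not shown.

```python
def disc_map(height, width, clicks, radius):
    """Binary map with 1 inside a disc of `radius` pixels around each (row, col) click."""
    grid = [[0] * width for _ in range(height)]
    for click_row, click_col in clicks:
        # Only visit the bounding box of the disc, clipped to the image borders.
        for row in range(max(0, click_row - radius), min(height, click_row + radius + 1)):
            for col in range(max(0, click_col - radius), min(width, click_col + radius + 1)):
                if (row - click_row) ** 2 + (col - click_col) ** 2 <= radius ** 2:
                    grid[row][col] = 1
    return grid


def encode_clicks(height, width, positive, negative, radius=5):
    """Stack two disc maps: index 0 holds positive clicks, index 1 negative clicks.

    The radius default is an assumption for illustration only.
    """
    return [
        disc_map(height, width, positive, radius),
        disc_map(height, width, negative, radius),
    ]


# One positive click on the avalanche and one negative click on the background.
click_maps = encode_clicks(64, 64, positive=[(10, 10)], negative=[(40, 50)])
```

In this sketch `click_maps[0][10][10]` is 1 (inside the positive disc), while `click_maps[0][40][50]` is 0 because the negative click is recorded only in the second map.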
  Technical comments.
  L4. We will change to "becoming more frequent" instead of "getting more frequent".
  L32. We will use "often" rather than "oftentimes".
  L33. We will change "an" to "a".
  L75. We will change to “we propose to use webcam infrastructure” according to the suggestion from reviewer 1.
  L85. We will correct "webcams network" to "webcam network".
  L122. We will change "their UIBK" to "the UIBK".
  L123. We will change "cropped" to "cropping".
  L134. We will remove one instance of Fig.4.
  L182. We will change "evaluate on the" to "evaluate the".
  L190. We will change "users who's" to "users whose".
  L225. We will change one fourth to one quarter.
  L226. We will change "while this for the" to "while for".
  L246. We will change "to a more" to "to greater".
  L289. We will replace “beat” by "exceed" to use language more appropriate for a scientific publication.
  
  Citation: https://doi.org/10.5194/egusphere-2024-498-AC3

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to minor revisions (review by editor) (12 Jun 2024) by Alexandre Langlois

AR by Elisabeth D. Hafner-Aeschbacher on behalf of the Authors (28 Jun 2024): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (04 Jul 2024) by Alexandre Langlois

AR by Elisabeth D. Hafner-Aeschbacher on behalf of the Authors (09 Jul 2024): Manuscript

Journal article(s) based on this preprint

23 Aug 2024

Interactive snow avalanche segmentation from webcam imagery: results, potential, and limitations

Elisabeth D. Hafner, Theodora Kontogianni, Rodrigo Caye Daudt, Lucien Oberson, Jan Dirk Wegner, Konrad Schindler, and Yves Bühler

The Cryosphere, 18, 3807–3823, https://doi.org/10.5194/tc-18-3807-2024, 2024

Viewed

Total article views: 514 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
359	124	31	514	19	31

Views and downloads (calculated since 19 Mar 2024)

Month	HTML	PDF	XML	Total
Mar 2024	59	19	4	82
Apr 2024	71	25	7	103
May 2024	86	29	8	123
Jun 2024	86	27	10	123
Jul 2024	33	18	1	52
Aug 2024	24	5	1	30
Sep 2024	0
Oct 2024	0
Nov 2024	0
Dec 2024	0
Jan 2025	0
Feb 2025	0
Mar 2025	0
Apr 2025	0
May 2025	0
Jun 2025	0
Jul 2025	0
Aug 2025	0
Sep 2025	0
Oct 2025	0
Nov 2025	0
Dec 2025	0
Jan 2026	0
Feb 2026	0
Mar 2026	1	0	1
Apr 2026	0

Viewed (geographical distribution)

Total article views: 526 (including HTML, PDF, and XML) Thereof 526 with geography defined and 0 with unknown origin.

Latest update: 11 Apr 2026

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary

For many safety-related applications such as road management, well documented avalanches are important. To enlarge the information, webcams may be used. We propose to support the mapping of avalanches from webcams with a machine learning model that interactively works together with the human. Relying on that model there is a 90 % saving of time compared to the "traditional" mapping. This gives a better base for safety-critical decisions and planning in avalanche-prone mountain regions.