Clustering simulated snow profiles to form avalanche forecast regions
Abstract. This study presents a statistical clustering method that allows avalanche forecasters to explore patterns in simulated snow profiles. The method uses fuzzy analysis clustering to group small regions into larger forecast regions by considering snow profile characteristics, spatial arrangements, and temporal trends. We developed the method, tuned parameters, and present clustering results using operational snowpack model data and human hazard assessments from the Columbia Mountains of western Canada during the 2022–23 winter season. The clustering results from simulated snow profiles closely matched actual forecast regions, effectively partitioning areas based on major patterns in avalanche hazard, such as varying danger ratings or avalanche problem types. By leveraging the uncertain predictions of fuzzy analysis clustering, this method can provide avalanche forecasters with a straightforward approach to interpreting complex snowpack model output and identifying regions of uncertainty. We provide practical and technical considerations to help integrate these methods into operational forecasting practices.
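A minimal sketch of the core idea discussed throughout the comments below, fuzzy clustering on a precomputed dissimilarity matrix: this simplified fuzzy c-medoids loop is only a stand-in for the FANNY (fuzzy analysis clustering) algorithm the study actually uses, and the toy distance matrix replaces the paper's combination of snow profile, spatial, and temporal distances. It illustrates how each subregion receives graded memberships in all clusters rather than a hard label.

```python
# Simplified fuzzy c-medoids on a dissimilarity matrix (a stand-in for FANNY).
# All data and parameter values here are illustrative, not operational.
import numpy as np

def fuzzy_cmedoids(D, k, r=1.25, iters=50, seed=0):
    """Return (memberships u of shape (n, k), medoid indices) for an n x n
    symmetric dissimilarity matrix D; r > 1 controls fuzziness."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(iters):
        d = D[:, medoids] + 1e-12                  # distances to each medoid
        u = d ** (-1.0 / (r - 1.0))                # closer medoid -> larger weight
        u /= u.sum(axis=1, keepdims=True)          # memberships sum to 1 per row
        # Move each medoid to the point minimizing the membership-weighted cost
        # (no guard against duplicate medoids, for brevity).
        new = np.array([int(np.argmin(D @ (u[:, v] ** r))) for v in range(k)])
        if np.array_equal(new, medoids):
            break
        medoids = new
    return u, medoids

# Toy example: 30 "subregions" drawn from three noisy groups.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(c, 0.5, (10, 2)) for c in ([0, 0], [4, 0], [2, 3])])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
u, medoids = fuzzy_cmedoids(D, k=3)
print(u.round(2)[:5])   # points near group boundaries show split memberships
```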
Status: closed
CC1: 'Comment on egusphere-2024-1609', Frank Techel, 17 Jul 2024
Dear Simon, Florian, and Pascal
I read with interest your preprint on "Clustering simulated snow profiles to form avalanche forecast regions".
I have three questions:
- Section 2.1: What is the size of the Columbia Mountains study area?
- L58, L352-356: Did I understand correctly that within this (rather large) study area (operationally?) only 168 grid points are used to simulate the snowpack? Or were these just the points used for the analysis? This seems like a rather drastic reduction compared to other studies recently submitted by you, such as Herla et al. (2024) (https://egusphere.copernicus.org/preprints/2024/egusphere-2024-871/egusphere-2024-871.pdf).
- And potentially relevant when interpreting the findings (as shown in Figure 10): Did forecasters have access to clustered snow-cover simulations during the investigated season? If they did, how were these used to cluster subregions into regions?
Thank you for clarifying these points.
Kind regards,
Frank Techel
Citation: https://doi.org/10.5194/egusphere-2024-1609-CC1
AC1: 'Reply on CC1', Simon Horton, 26 Jul 2024
Thank you for your interest and comments.
We can add the size of our study area (111,801 km²). You are correct that the operational model used only 168 grid points for this area, a significant reduction compared to Herla et al. (2024). That study used all grid points within the treeline elevation range in Glacier National Park (1348 km²). In contrast, the operational model splits each forecast polygon into smaller “microregion” polygons. Depending on the forecast polygon size, they were divided into 1 to 8 microregions, each typically covering 300 to 600 km². It then samples a representative grid point from each NWP model within each microregion. This sparse spatial sampling has been used in the operational model for several years to balance spatial resolution and computational costs, allowing the model to run quickly each morning (the operational domain covers 745,829 km², which is over 500 times larger than Glacier). Tests have found this sampling density captures most regional-scale patterns resolved by the NWP models (which typically have an effective resolution 5 to 7 times larger than their grid spacing). However, we are implementing finer resolution this year thanks to improved computational efficiencies.
Yes, forecasters had access to a dashboard with a prototype product that presented snowpack model clusters. While it's difficult to quantify its impact on human assessments, we should discuss this in our paper as there likely was some influence. Forecaster usage of the prototype varied between individuals and hazard situations, likely having more impact in data-sparse areas compared to data-rich areas like the Columbia Mountains. As forecasters increasingly use model-driven decision aids, it will become more challenging to use their assessments to validate models.
The prototype product analyzed the same simulated profiles, but used a different clustering method. The method calculated snow profile distances from basic summary statistics (e.g., HS, HN, presence of a weak layer, percentage of wet grains) instead of dynamic time warping. A hierarchical clustering algorithm was used, and the number of clusters selected using the within-between ratio. The domain was larger, causing subregions in the Columbia Mountains to be grouped with those in the neighboring Rocky Mountains. This means the clusters viewed by forecasters likely differed from those in this study.
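A rough Python sketch of that prototype pipeline, feature-based distances plus agglomerative clustering; the feature values are invented, and the prototype's within-between selection of the number of clusters is replaced here by a fixed cut:

```python
# Sketch of the prototype: distances from basic summary statistics, then
# hierarchical clustering. Feature set and values are illustrative only.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# One row per subregion: [HS (cm), HN (cm), weak layer present, % wet grains]
features = np.array([
    [210, 25, 1, 0],
    [195, 20, 1, 5],
    [120, 5, 0, 30],
    [115, 2, 0, 40],
], dtype=float)
# Standardize so no single statistic dominates the Euclidean distance.
z = (features - features.mean(axis=0)) / features.std(axis=0)
Z = linkage(pdist(z), method="ward")
# The prototype chose the number of clusters with the within-between ratio;
# here we simply cut the dendrogram at k = 2.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)   # e.g. [1 1 2 2]
```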
Our first clustering prototype in 2020-21 used only the temporal sequence of HN24 to determine distances between subregions, which worked surprisingly well. We then used the summary statistics method described above for the 2021-22 and 2022-23 seasons. Forecasters expressed interest in these early prototypes, motivating us to refine the methods and conduct a more rigorous study. In 2023-24, we implemented a method similar to the one described in this study, using dynamic time warping distances and fuzzy clustering.
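The HN24-based distance of that first prototype can be sketched with a plain dynamic time warping recursion; this toy is far simpler than the profile-based DTW (Herla et al.) used in the study, and the snowfall sequences are invented:

```python
# Sketch: dissimilarity between two subregions from their seasonal HN24
# (24 h new snow) sequences, via a classic O(n*m) DTW recursion.
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

hn24_a = np.array([0, 5, 20, 10, 0, 0, 15])   # subregion A, cm per day
hn24_b = np.array([0, 0, 18, 12, 5, 0, 10])   # subregion B, shifted storm
print(dtw(hn24_a, hn24_b))
```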
Citation: https://doi.org/10.5194/egusphere-2024-1609-AC1
RC1: 'Comment on egusphere-2024-1609', Bert Kruyt, 26 Jul 2024
Very relevant work that aims to formalize processes which have thus far relied on inherently non-transparent expert assessment, and as such forms a valuable complement to, and improvement of, said assessment. We are living in an interesting time where avalanche forecasting is developing rapidly due to the application of modeling techniques long shunned by (skeptical) practitioners. These authors have played a crucial role in removing some of that skepticism with previous work, and this current work is no exception. Generally, this paper is well written, clear, and shows novel methods.
One generic reservation I have with models like these, which rely on tuning parameters to match a regional dataset, is the applicability to other regions. This becomes even more of a concern when these are model chains, where several of these 'tuned' models are strung together. In this case, Mayer's model for dry slab stability (Pmax) is tuned (or trained, as it is an ML model) on snow profiles around Davos, CH. That model is then used in the authors' clustering model in Canada.
One can question if the parameter choice would have been different if the Random Forest model of Mayer 2022 had been trained on Canadian profiles (with more new snow, less wind?). Or if someone takes this clustering model and applies it to a region with a very different snow climate (e.g. Norway). Would the parameter choice still be optimal?
This is not meant as critique of the paper, but more as a general concern for the community. Answering these questions implies a lot of work of a type that is not valued as much in the scientific community as doing 'new' things is. However, for the rigour of avalanche forecasts, it is just as important. Perhaps some discussion of the applicability of this method to other areas, and the dependency on training/tuning data, would improve the paper?
Specific comments:
- Figure 4 is not clear in black and white. Whereas being readable in black and white may be too much to ask of Figures 7-10, for this figure a simple linestyle change (dotted/dashed or markers) could make it readable in black and white (for example on an eReader).
- Sect 4.4.
- Is there a reason why Beta values > 0.1 were not investigated? If so please mention/explain.
- line 210: "The number of human-assessed forecast regions changed 12 times over 107 days, with region arrangements changing on 34 days."
The intro mentions "115 days when both model and human data were available for analysis." What explains this discrepancy (107 vs. 115)?
- Fig 6: It is not clear to me how the ARI supports the choice of Beta=0.02. I see how 6a supports that, if the goal is to mimic the human assessment of the nr of regions. But 6b is unclear (to me). Intuitively I would say the ARI should be high on days where nothing changes, but you want the clusters to change only when the snowpack changes. How does ARI reflect that?
- Sect 5.3: "default fuzziness parameter r = 2": previously (Sect 4.2) r was determined/picked to be optimal at 1.25; why is it 2 here?
- Out of curiosity: did you see the clustering change as solar irradiation (and thus the difference between N and S) became larger, i.e. throughout the season?
- In general: how do you make representative profiles for a region when solar irradiation leads to big differences between aspects? Or is the clustering only done for dry (ENW) avalanche profiles?
Citation: https://doi.org/10.5194/egusphere-2024-1609-RC1
AC2: 'Reply on RC1', Simon Horton, 26 Jul 2024
We thank the reviewer for their thoughtful comments.
We appreciate and share the concern about the limitations of developing models tuned to specific datasets and are happy to discuss this further in the paper. We can elaborate on our experience tuning parameters to the larger operational domain in the 2023-24 season and suggest how others could do the same.
Unfortunately, tuning these types of model chains in the past has relied heavily on trial-and-error methods to produce realistic results. When introducing a newly developed method at the start of an operational season, we often need to adapt the code to handle unexpected midseason issues. Collaborative efforts to generalize models would be very beneficial, and we strongly support future efforts to test, validate, and apply models in different regions and contexts.
We can address the specific comments by clarifying some details in the manuscript.
In response to the comments on sequential clustering:
- We present grid search results with sequential weights between 0 and 0.1. Initially, we tested larger values, but they caused the clustering results to converge rapidly to a stable solution at the start of the season and then remain unchanged. This effect was partly observed for beta = 0.1, which we explain at the end of the section. We can add a note at the beginning of the section to clarify why we don't test beta values greater than 0.1.
- When evaluating the sequential clustering, we had fewer eligible days. The study spans 150 days (Nov 26 to Apr 24), with model data missing on 35 days. For non-sequential clustering, this leaves 115 days to analyze. However, for sequential clustering, we need to compare consecutive days, so each day of missing data prevents two comparisons (i.e., today's and tomorrow's). Based on the timing of the missing data, we had 107 consecutive day pairs to analyze.
- The critique about the ARI not clearly supporting beta = 0.02 is fair. The goal is for clusters to change when significant snowpack changes occur. We found that non-sequential clustering changed the regions too often, which would be disruptive to forecasting workflows. At the same time, forecasters had the sense they could not change the regions enough due to limited data and operational constraints. This suggests the ideal complexity for changes would be somewhere in between the two methods. The ARI shows that for beta = 0.02, the complexity of changes is midway between the non-sequential and human methods. Ultimately, selecting this parameter involved trial and error to produce favorable results, which we can explain more clearly in the manuscript (a sketch of the day-pair ARI computation follows this list).
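A small sketch of the day-pair bookkeeping and average ARI described above; the daily cluster labels are invented, and `adjusted_rand_score` is scikit-learn's ARI implementation:

```python
# Average adjusted Rand index (ARI) over consecutive-day pairs. A missing day
# removes both the comparison into it and out of it, as described above.
import numpy as np
from sklearn.metrics import adjusted_rand_score

# Daily cluster labels per subregion; None marks a day with missing model data.
days = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 2],
    None,
    [0, 1, 1, 2, 2],
    [0, 1, 1, 2, 2],
]
scores = [
    adjusted_rand_score(a, b)
    for a, b in zip(days, days[1:])
    if a is not None and b is not None
]
print(len(scores), np.mean(scores))  # 2 valid pairs out of 4 in this toy record
```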
Other comments:
- In Sect 5.3, we used a different fuzziness parameter (r = 2) to form coherent clusters. This parameter is sensitive to the distribution of values in the distance metric. The distance metric derived from counts has a larger skew towards values of 0 and 1 compared to metrics based on snow profile comparisons, so using r = 1.25 for the counts resulted in crisp clusters with membership values of 0% and 100%. To increase the fuzziness, we tested larger r values to optimize the average silhouette width, finding that r = 2.00 was optimal for the clustering method counts and r = 2.15 for human assessment counts (a toy illustration of how r controls membership crispness follows this list).
- To date, we have only applied clustering to flat field simulations and have not specifically investigated seasonal changes influenced by solar radiation. We have observed that clusters often show stronger latitudinal dependencies in the spring, as southern regions transition to spring conditions earlier, while in winter, longitudinal patterns driven by precipitation are more dominant. We can assume similar patterns might be observed across aspects. Incorporating simulations on different aspects and elevation bands will add complexity to summarizing regional-scale patterns. Although we have considered this, we do not have a clear solution or recommendation. Ultimately, if we want to apply a standard clustering method, we need to distill all the snowpack and spatial information in a region into a single numeric pairwise comparison. Future work could explore combinations of averaging snow profiles for hazard-relevant features (as in Herla et al., 2022) and quantifying spatial relationships across spatial features with distances.
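A toy illustration of the fuzziness point above, under the simplifying assumption of fixed medoids (so only the memberships, not the partition, respond to r); it mimics a 0/1-skewed count-derived distance and shows memberships softening as r grows. In practice each candidate r would be re-clustered and compared by average silhouette width:

```python
# How the fuzziness exponent r shapes memberships for a skewed distance
# distribution. Fixed medoids and toy data; not the paper's FANNY code.
import numpy as np

rng = np.random.default_rng(7)
# Toy 1-D "count" data skewed towards 0 and 1, two underlying groups.
x = np.concatenate([rng.beta(0.5, 5, 40), rng.beta(5, 0.5, 40)])
D = np.abs(x[:, None] - x[None, :])              # pairwise distances

def memberships(D, medoids, r):
    """Closed-form fuzzy memberships to fixed medoids."""
    d = D[:, medoids] + 1e-12
    w = d ** (-1.0 / (r - 1.0))
    return w / w.sum(axis=1, keepdims=True)

medoids = [int(np.argmin(x)), int(np.argmax(x))]  # one medoid per toy group
for r in [1.25, 1.5, 2.0, 2.15]:
    u = memberships(D, medoids, r)
    # Near 1.0 = crisp 0%/100% memberships; lower = fuzzier assignments.
    print(f"r={r:<5} mean max membership = {u.max(axis=1).mean():.2f}")
```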
Citation: https://doi.org/10.5194/egusphere-2024-1609-AC2
RC2: 'Comment on egusphere-2024-1609', Anonymous Referee #2, 21 Aug 2024
The manuscript "Clustering simulated snow profiles to form avalanche forecast regions" presents a very interesting method to dynamically identify areas of similar avalanche hazard. The proposed method takes care of the spatial and temporal coherence of the provided clusters, which may be a significant help for forecasters and allow for further time-based or geographic analyses using such a model. The idea itself of clustering the simulated profiles is highly relevant to help forecasters in the analysis of snow model data. Hence, I think such work is highly valuable for Copernicus journal readers. However, I have some concerns that should be addressed before publication.
The selection of hyperparameters (alpha, beta, r, and k) is not fully convincing, due to the selected metrics (maybe a better description and justification would clarify this point) and the method of optimizing each parameter independently of the others. This would not be a problem if we were convinced at the end that the model performs well in any situation. However, the final clustering results are only provided for one year, which is the year on which the parameters were optimized. This is not the state of the art for evaluating statistical/ML models. Moreover, this year seems quite specific, as one day is said to be representative of the whole winter (line 264). Thus, we do not have a representative idea of the performance of such a model, even though it seems highly promising. We know that it can reproduce, with good performance, clusterings of the past for which "the ground truth" (human clustering here) was available to optimize against, but not how such a model would perform the next year/in forecasting mode.
With the addition of results on a different dataset, this paper would be of high interest to the snow and avalanche community. An alternative solution is to demonstrate with a sensitivity analysis (combining the 4 parameters) that the results are not highly sensitive to the parameter tuning. However, I believe that an additional year, or at least some situations not in the tuning dataset, should be used.
I also detail some more specific points below.
1. Line 59: you limit the study to flat, sheltered terrain at treeline elevation. This seems an interesting selection, but the results are then compared to the human-assessed forecast (e.g., Fig 7a). This human-assessed forecast should take into account all elevations. Thus, is this really comparable? In the discussion, you underline the high impact of elevation on the clustering results (Bouchayer, 2017, line 295 and following).
2. Line 60: you state that the method used to obtain simulated profiles is of little relevance. I do not agree. I understand the idea of putting it in an appendix so as not to break the flow of the paper (which is quite easy to read). However, choices in the simulated profiles can strongly impact the results. Maybe this sentence could be modified. In particular, I noted at line 355 that you selected meteorological data from 2 models. I wonder if this could impact the results, as the climatology of meteorological data may differ between models, especially as the resolutions are very different (so the representation of mountainous areas is significantly different). I would at least be interested in a map of the selected points for each model, to ensure that there is no obvious correlation between clustering results and the choice of meteorological model.
3. Lines 100-103: Could you explain why you state that "this method closely aligns with forecasters' criteria..."?
4. Lines 109-110, "that are more likely to align... than would result from basic Euclidean distances": a distance between two areas may be defined differently than the distance between centroids; in particular, it could be defined as the minimum distance between two points, one in each area. This latter Euclidean distance would be relevant. Maybe this part of the sentence is not relevant here; you made a choice that seems simple but suited to your goal.
5. Equation 2: The notation $u^r_{iv}$ may not be clear. I suppose this is $u_{iv}$ raised to the power $r$. If so, maybe the power could be indicated more clearly, for instance with parentheses: $(u_{iv})^r$.
6. Line 130: What is the value of the threshold tolerance? Do you use a commonly used value, or how did you select it?
7. Line 143: How did you select these ranges? The range for r is especially surprising, as you say later that the default fuzziness parameter is 2 (line 261).
8. Line 159: Here you use the distance between centroids, whereas you stated at lines 109-110 that it was not relevant.
9. Line 163: Could you provide a reference and/or explain the goal of this metric? I am not sure I understand.
10. Same as 9 for lines 167-171: "high-quality" clustering is not sufficient for me to understand the specificity of this score.
11. Line 179, "An elbow in the within-between ratio... between 1.2 and 1.35": The within-between ratio decreases after 1.35, and lower is better for this ratio, so I would not have chosen values between 1.2 and 1.35 based on it.
12. Line 196, determination of the k value: Why do you select different k values based on different scores and then average them? With such a method you have no constraint on the values of the metrics (especially as these metrics are not always monotonic), whereas using, for instance, the lowest k value that is above/below thresholds for the different scores would give stronger results.
13. Lines 210-213: "XX days" (used 3 times) is not fully clear. Is this "XX times over 107 days"? If so, use "times" rather than "days" to be consistent with the first use.
14. The average adjusted Rand index is not defined. Is this the average of the adjusted Rand index over all pairs of successive days?
15. Line 213: I suggest "0.69, suggesting *more* frequent and complex changes", unless you have a justified threshold on the average adjusted Rand index to separate complex from simple changes.
16. Figure 6: Here you have both the influence of k and beta on the adjusted Rand index (ARI). If the number of clusters is higher in the automatic clustering than in the reference human clustering, the ARI can be lower even when the clustering is essentially correct, because some human regions are partitioned into several clusters that vary from day to day while their union does not vary and matches the human clusters. Why not do as for the other parameters and separate the effects of k and beta by fixing k, to obtain a relevant comparison of ARI values?
17. Figures 7 and 8, top: Please overlay the limits of the public forecast regions on the clustering maps to ease reading of the graphs.
18. Figures 7 and 8, top: You do not discuss the differences between the human clustering and the automatic clustering. For some misclassified regions, the fuzzy clustering shows the uncertainty (e.g., 54% in the south of the northern region); however, in some cases no uncertainty is shown (e.g., >90% for the south of the eastern cluster and some in the south of the central region). Taking human clustering as the reference may be one limit here that could be discussed, or maybe it is a limit of the method. Such results make me want to know more! However, I understand that such details may not be the core of this paper and should not take too much space.
19. Lines 288-289, "Furthermore, expanding this distance...": I do not understand this sentence and what you have in mind. Maybe you could elaborate?
Citation: https://doi.org/10.5194/egusphere-2024-1609-RC2
AC3: 'Reply on RC2', Simon Horton, 26 Aug 2024
We thank the reviewer for their thoughtful comments. We recognize the shared concern with RC1 regarding the method's generalizability, as it relies on tuning parameters specific to one dataset. To address this, we are considering the following options:
- Additional analysis: Applying the method with the same hyperparameters to Avalanche Canada’s snow profile simulations from the 2023-24 winter, which are now available.
- Sensitivity analysis: A closer examination of how sensitive the clustering results are to changing hyperparameters.
- Expanded discussion: Sharing our experiences with different clustering methods across western North America over the past four winters, focusing on the impact of varying geographical domains and climates.
We can address the reviewer's specific comments with minor additions and clarifications. Here are our responses to three specific points:
- #1) Flat sheltered profiles: We agree flat sheltered treeline profiles don't capture all snowpack conditions affecting avalanche hazard. However, forecasters often find they represent regional patterns for new snow and persistent weak layers fairly well, even if they underrepresent wind-drifted and wet snow problems, which are usually more elevation/aspect specific. Also, new snow and persistent weak layers strongly influence danger levels and the delineation of forecast regions. We can address the limitations of flat sheltered profiles in both the methods and discussion sections.
- #12) Determination of k value: Could the reviewer clarify how determining k from each individual score would produce a "stronger result"? Which single score would you select for an optimal solution? We find that using a single metric can be overly sensitive, whereas an ensemble average provides smoother and more consistent results over time (see the sketch after this list).
- #16) Beta grid search: We agree that holding k constant during the beta grid search would improve the analysis and will re-run it accordingly.
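A hedged sketch of the ensemble idea from #12: pick an optimal k per internal score, then average. The three scikit-learn criteria and the toy data below are stand-ins, not the scores or simulations used in the paper:

```python
# Ensemble selection of k: per-score optima averaged into one value.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

# Toy data and candidate cluster counts.
X, _ = make_blobs(n_samples=120, centers=4, random_state=3)
ks = range(2, 9)
labels = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
          for k in ks}
# Optimal k per score (higher is better, except Davies-Bouldin), then average.
best = [
    max(ks, key=lambda k: silhouette_score(X, labels[k])),
    max(ks, key=lambda k: calinski_harabasz_score(X, labels[k])),
    min(ks, key=lambda k: davies_bouldin_score(X, labels[k])),
]
print(best, round(np.mean(best)))   # per-score optima and the ensemble k
```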
Citation: https://doi.org/10.5194/egusphere-2024-1609-AC3
Data sets
Clustering simulated snow profiles to form avalanche forecast regions – Code and Data Simon Horton, Florian Herla, and Pascal Haegeli https://osf.io/4u2az/
Model code and software
Clustering simulated snow profiles to form avalanche forecast regions – Code and Data Simon Horton, Florian Herla, and Pascal Haegeli https://osf.io/4u2az/