the Creative Commons Attribution 4.0 License.
Tracking the slopes: A spatio-temporal prediction model for backcountry skiing activity in the Swiss Alps using UGC
Abstract. Backcountry skiing is a popular form of recreation in Switzerland and worldwide, yet little is known about where and when people venture outside and methods to monitor skiing behaviour are limited by the vast and remote nature of backcountry terrain. With avalanche fatalities documented each year, there is a need for spatially and temporally explicit information on the persons exposed to avalanche danger for effective risk estimations. To do so, we explored over 6'800 user-generated GPS tracks and over 9 million clicks on a ski touring website to model backcountry skiing base rates on a daily scale in 126 regions in the Swiss Alps. We linked the data to weather, snow, temporal and environmental variables to train two different spatio-temporal prediction models based on the two data sources. We found that GPS and click data describe different types of behaviour (planning and real world behaviour), yet we could demonstrate that they correlate well with a 1-day time lag (ρ = 0.61), suggesting that online activity precedes actual skiing activity. Our results show that online and real-world behaviour are driven by similar underlying factors, with temporal aspects – such as weekends and the progression of the season – playing the most important role in both datasets. However, we found differences in how certain variables influenced behaviour: people tended to click on more routes in areas of high avalanche danger during more extreme weather conditions than they actually visited, and time spent on tour planning decreased as the season progressed. Our study demonstrates the potential of user-generated data sources to model skiing activity on regional and temporally fine scales, but also sheds light on specific limitations of the different data sources in approximating backcountry skiing activity.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-2344', Anonymous Referee #1, 13 Aug 2025
This paper presents an analysis of backcountry skiing activity using GPS tracks and website click data. Machine learning is used to train predictive models and analyze feature importance. The resulting importances largely align with established literature. Deviations are in line with what can be expected from click data (i.e. planning data) and actual tracking data. The use of warning regions as analysis units makes sense even though I would have liked to see a sensitivity analysis with varying spatial resolution.
My main issue is that -- while temporal cross validation is applied -- a corresponding spatial cross-validation is missing.
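A spatial cross-validation of the kind requested here could be sketched as follows. This is only an illustration under stated assumptions: each observation carries a region identifier, whole regions are held out together, and regions are assigned to folds round-robin. The function name and the fold assignment are hypothetical and are not taken from the manuscript; a real analysis would probably group neighbouring regions into the same fold to account for spatial autocorrelation.

```python
def leave_regions_out_folds(region_ids, n_folds):
    """Yield (train_idx, test_idx) so that every region is entirely in train or test.

    region_ids: one region label per observation (hypothetical input format).
    Regions are assigned to folds round-robin in sorted order, which ignores
    spatial adjacency; this is a simplifying assumption of the sketch.
    """
    unique_regions = sorted(set(region_ids))
    fold_of_region = {r: i % n_folds for i, r in enumerate(unique_regions)}
    for fold in range(n_folds):
        test_idx = [i for i, r in enumerate(region_ids) if fold_of_region[r] == fold]
        train_idx = [i for i, r in enumerate(region_ids) if fold_of_region[r] != fold]
        yield train_idx, test_idx


# Toy example: six daily observations spread over four regions.
example_regions = ["A", "A", "B", "C", "C", "D"]
for train_idx, test_idx in leave_regions_out_folds(example_regions, n_folds=2):
    held_out = sorted({example_regions[i] for i in test_idx})
    print("held-out regions:", held_out, "test rows:", test_idx)
```

Because no region contributes rows to both sides of a split, the test score reflects how well the model transfers to regions it has not seen.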
Other issues in order of appearance include:
- 149 "Between 2013 and 2024, over 6’800 GPS tracks were uploaded by backcountry recreationists throughout all seasons except for seasons 21/22 and 22/23." ... I assume this means that the last season included in the track dataset is 23/24. Can you include information on how many individual skiers contributed GPS tracks to the database? Is the number of users per year stable? What happened to the data from seasons 21/22 and 22/23?
- 164 "Therefore, only data from 2021 onwards is included for modelling and prediction," ... so for prediction, we only have an overlap between click and track data in 20/21 and 23/24? (This seems to be confirmed by Table 2 & 3 but might be worth making explicit in the data section.)
- 209 "mean values were calculated based on the grid points that lie in an elevation band within ±100 m of the mean track elevation (track data), respectively the mean route elevation in a given region (click data)" ... Wouldn't it make sense to further limit the weather grid cells using a maximum distance to actual skiing routes?
- 272 "This approach resulted in four (nine) training runs, each cross-validated using four (nine) different seasons for the click (track) data." ... To make this sentence easier for the reader, I suggest rewording it instead of putting the track model info in brackets.
- 348 "The underlying driver for the systematic overprediction of the track model lay in the modelling process itself, as artificially balanced numbers of presence and absence points were used for training. When verified with real-life and therefore unbalanced data, the model predicted more presence than was observed." ... Please check if the use of past tense "lay" is appropriate or if present tense "lies" should be used since the model was not adjusted after the issue was discovered and all presented results are from the overpredicting model.
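The overprediction described in the last point is the expected effect of training on balanced classes and predicting on unbalanced data. One standard way to adjust for it is a prior-shift correction of the predicted probabilities. The sketch below is not something the manuscript reports doing: it assumes the classifier outputs reasonably calibrated presence probabilities, and the prevalence value in the example is hypothetical.

```python
def correct_for_prevalence(p_balanced, true_prevalence, train_prevalence=0.5):
    """Rescale a presence probability from a balanced-training model.

    p_balanced:       probability predicted by the model trained on balanced data
    true_prevalence:  assumed share of presence days in real data (hypothetical)
    train_prevalence: share of presence points in the training sample
    """
    # Reweight each class by the ratio of its real-world to its training frequency.
    pos = p_balanced * true_prevalence / train_prevalence
    neg = (1.0 - p_balanced) * (1.0 - true_prevalence) / (1.0 - train_prevalence)
    return pos / (pos + neg)


# A model score of 0.5 maps back to the assumed real-world prevalence.
print(correct_for_prevalence(0.5, true_prevalence=0.1))
```

With a corrected probability, the usual 0.5 decision threshold no longer favours presence simply because presence was oversampled during training.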
Minor issues:
- 386 "Figure 9 shows the importance for each variable for the performance of the model" ... Should probably be "importance of each variable".
- 387 "from each cross-validation seasons" ... Should probably be "season".
Citation: https://doi.org/10.5194/egusphere-2025-2344-RC1 -
RC2: 'Comment on egusphere-2025-2344', John Sykes, 21 Aug 2025
Overview
This research explores user-generated content (UGC) as a means to estimate the population size of backcountry skiers in Switzerland in order to more effectively assess base rates for population-scale risk analysis. The data come from the commercial ski touring application Skitourenguru and include GPS tracking data covering a period of roughly 10 years and click interaction data from the website covering a period of roughly 3 years. To predict the level of backcountry skiing activity across Switzerland, the authors use the forecasting regions from the public avalanche forecast as their individual study areas. Two random forest (RF) machine learning models are used to predict activity in each region based on a combination of snowpack, weather, accessibility, and temporal factors. The authors fit a separate random forest model for the GPS tracking data and the website click data and compared the output of the two approaches for estimating user activity. Due to the limited size of the GPS tracking data, the RF model was designed to classify the presence or absence of backcountry skiers on a daily timescale. The click data provided a larger data set; therefore, the RF model was designed using a regression approach for each region on a daily timescale. Results indicate a moderate to strong correlation between GPS tracking and click data, given a 1-day lag in the click data. This is a useful result, as click data are much more accessible and easier to acquire than GPS tracks and could be used as a proxy for backcountry usage.
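The lagged comparison summarised above can be illustrated with a minimal sketch. It assumes two equally long daily series for one region, and it assumes that the reported ρ is a Spearman rank correlation, which is not stated here. The function names are hypothetical and this is not the authors' code.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def lagged_spearman(clicks, tracks, lag_days):
    """Correlate clicks on day t with tracks on day t + lag_days (lag_days >= 0)."""
    if lag_days > 0:
        clicks, tracks = clicks[:-lag_days], tracks[lag_days:]
    return pearson(average_ranks(clicks), average_ranks(tracks))


# Toy series in which track counts repeat the click counts one day later.
clicks = [1, 5, 2, 8, 3, 9, 4]
tracks = [0, 1, 5, 2, 8, 3, 9]
for lag in (0, 1, 2):
    print(lag, round(lagged_spearman(clicks, tracks, lag), 3))
```

Scanning a small range of lags in this way shows at which offset the planning signal lines up best with the observed activity.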
Overall, this paper is well written, does a good job summarizing and synthesizing relevant literature from the avalanche field and from other fields, and applies novel methods to a long-standing problem in the avalanche field. The results have the potential to lead to an improvement in overall risk assessment for backcountry skiing and public communication for public avalanche forecast centers. The methods and results are presented clearly and in an approachable way given the technical analysis required to account for the spatial and temporal correlations inherent to the data. I recommend publication after minor revision.
Specific Comments
1. Intro
- The introduction provides a well written broad overview of the existing literature for estimating base usage rates of backcountry skiers. The literature encompasses a variety of techniques and identifies strengths and shortcomings of each approach.
- Line 52 to 56 - The knowledge gap is clearly identified.
- Line 61 - The research questions are well defined.
- One question is why only use the Skitourenguru app as input data? This could introduce significant bias to the data set based on the characteristics of the users of the app. Broadening the data to include multiple apps (e.g. White Risk, Strava) could provide an interesting comparison and help determine if patterns apply generally or are specific to the user group of one specific application.
3. Methods
- Line 141 to 147 - Does Skitourenguru require a paid subscription to use? Is the data from this study collected only from paid subscribers? Does the app only cover the Swiss Alps or does it also cover other areas? This type of information about the app is relevant to the sample demographics and could give a better sense of how accessible the website is to different users. For example, individuals just getting into backcountry skiing or those visiting from other regions may be less likely to pay for a paid application specific to Switzerland and therefore could be systematically excluded from the sample.
- Line 155 - Do you extract the terrain characteristics of the GPS tracks prior to representing them as a single data point? Additional terrain information such as slope incline, aspect, runout exposure, percent of track in forested areas could be meaningful for more detailed understanding of the terrain characteristics. These could still be summarized to the level of GPS track or the clicked route to preserve anonymity.
- Line 162 - Why do you assume that the increase in popularity of the website in 2021 impacts the click data only and not the GPS tracking data? I assume that this decision was made because you assume that a much larger, and potentially more representative, proportion of backcountry users are engaging with Skitourenguru after 2021. Wouldn’t the same assumption apply to the GPS tracking data?
- Line 169 to 171 - I would assume that users engaged in trip planning might click on multiple routes to compare options before selecting their destination. Do you have a way to account for the fact that the ratio of clicks to actual ski tours is likely biased heavily towards clicks? For example, you could track the number of clicks per website user and assume that each user will complete only a single ski tour on the following day.
- Line 188 to 190 - Wind speed seems like a worthwhile variable to consider because it impacts avalanche hazard conditions, snow quality, and how enjoyable the experience of being in the mountains is for the day.
- Line 195 to 197 - Characterizing the desire for skiing untracked snow simply as a potential heuristic trap seems like an oversimplification. When backcountry skiers decide to undertake the risks of traveling in avalanche terrain there has to be a reward side of the equation that justifies the personal risk. While seeking untracked snow can lead skiers to make ill-informed decisions, it is also a fundamental driver of what makes the activity worth pursuing. I think it would be worthwhile to consider the reward side of the decision-making process in selecting variables for your models to help balance out the focus on risk-oriented factors. This is illustrated in the results by the RF importance of sunshine on the number of users.
- Line 218 to 220 - Are there additional avalanche hazard characteristics from the public forecast that could be used to give a more complete picture of the avalanche conditions? I am not very familiar with the Swiss avalanche bulletin, but examples from the North American avalanche products would include avalanche problem type, potential avalanche size, and avalanche likelihood. While the danger rating provides a useful summary, these additional avalanche characteristics provide much more nuance to the current conditions, which can significantly impact backcountry skiers' terrain selection and risk assessment process.
- Line 259 to 261 - Are there local experts you could consult to verify whether absence of evidence actually implies evidence of absence? For example, you could consult local mountain guides to estimate whether the absence of track and click data matches their experience travelling in specific regions. This seems like a strong assumption given that you are using data from only one app, especially for forecast regions with only a few tracks/clicks throughout the period of record. I understand that this approach of inferred absence is necessary to make the models work for the present study, but acquiring all the sample data from a single source is a significant limitation. Maybe you could make a recommendation for how this assumption could be tested in future research.
- Line 269 to 273 - How did you select the season that was held out as testing data? This approach to splitting testing and training data makes sense given the nature of the dataset. However, the performance evaluation of the model could be highly dependent on the characteristics of the weather and snowpack from the testing data. If the snowpack depth, 24-hour new snow, etc. were outside the values in the training data, it may skew the performance metrics.
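The concern in the last point, that a held-out season may contain conditions outside the range seen in training, can be checked directly. The sketch below assumes one record per region and day, stored as a dictionary with a season label and a covariate value. The field names, season labels, and numbers are hypothetical and do not come from the manuscript.

```python
def extrapolation_report(rows, variable, season_key="season"):
    """For each held-out season, return the share of its values of `variable`
    that fall outside the min-max range of all other (training) seasons."""
    seasons = sorted({row[season_key] for row in rows})
    report = {}
    for held_out in seasons:
        train = [row[variable] for row in rows if row[season_key] != held_out]
        test = [row[variable] for row in rows if row[season_key] == held_out]
        lo, hi = min(train), max(train)
        outside = sum(1 for v in test if v < lo or v > hi)
        report[held_out] = outside / len(test)
    return report


# Toy records: three seasons with a hypothetical 24-hour new snow value in cm.
rows = [
    {"season": "S1", "new_snow_cm": 5},
    {"season": "S1", "new_snow_cm": 10},
    {"season": "S2", "new_snow_cm": 7},
    {"season": "S2", "new_snow_cm": 30},
    {"season": "S3", "new_snow_cm": 8},
    {"season": "S3", "new_snow_cm": 12},
]
print(extrapolation_report(rows, "new_snow_cm"))
```

A high share for a given season would suggest that its performance metrics partly reflect extrapolation rather than model skill.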
Results
- Line 321 and 326 - Does the skewed density of clicks and tracks to a small subset of the forecast regions justify limiting the analysis to these most populated regions? Have you considered filtering out regions that do not have a minimal threshold of track or click data to reliably estimate usage patterns?
- Figure 8 - It is pretty hard to see the observed values in panel a. Perhaps you could add a black outline to the observed area or somehow increase the contrast compared to the darker green line of the predicted values.
- Line 387 - ‘validation season.’
- Line 387 - ‘variable importance was calculated’
- Figure 10 - This is a very useful visualization of the underlying distributions and activity for the two models. The example in panel b and c clearly illustrate the spatial correlation associated with specific conditions.
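The region filter raised in the first Results comment could be as simple as the following sketch. It assumes one region label per recorded track or click; the threshold value is a free parameter, and neither it nor the function name comes from the manuscript.

```python
from collections import Counter


def regions_meeting_threshold(observation_regions, min_count):
    """Return the sorted regions that have at least `min_count` observations."""
    counts = Counter(observation_regions)
    return sorted(region for region, n in counts.items() if n >= min_count)


# Toy example: region B has a single observation and is dropped.
print(regions_meeting_threshold(["A", "A", "A", "B", "C", "C"], min_count=2))
```

Repeating the analysis for a few threshold values would show how sensitive the results are to sparsely populated regions.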
Discussion
- Line 432 - I’m not sure what you mean by GPS tracks providing limited spatial detail. In terms of providing detail on where individuals are travelling, GPS tracks are probably the best type of data available.
- Line 555 - Is there any data available from Skitourenguru about the general demographics of their user base? You are claiming a few times in the discussion that click data captures a broader set of users but there is no direct evidence about the sample characteristics from this data set.
Conclusion
- Line 565 - ‘Lastly, we found that online engagement…’
Citation: https://doi.org/10.5194/egusphere-2025-2344-RC2
Model code and software
Gitlab Repository Leonie Schäfer https://gitlab.uzh.ch/geocomp/backcountry-skiing-acitivity