Comparing High-Resolution Snow Mapping Approaches in Palsa Mires: UAS LiDAR vs. Machine Learning

Störmer, Alexander; Kumpula, Timo; Villoslada, Miguel; Korpelainen, Pasi; Schumacher, Henning; Burkhard, Benjamin

doi:https://doi.org/10.5194/egusphere-2024-2862

Preprints

https://doi.org/10.5194/egusphere-2024-2862

Preprints

08 Nov 2024

| 08 Nov 2024

Comparing High-Resolution Snow Mapping Approaches in Palsa Mires: UAS LiDAR vs. Machine Learning

Alexander Störmer, Timo Kumpula, Miguel Villoslada, Pasi Korpelainen, Henning Schumacher, and Benjamin Burkhard

Abstract. Snow cover has an important role in permafrost processes and dynamics, creating cooling and warming systems, impacting the aggradation and degradation of frozen soil. Despite theoretical, experimental, and remote sensing-based research, comprehensive understanding of small-scaled snow distribution at palsas remains limited. This study compares two approaches to generate spatially continuous, small-scale snow distribution models in palsa mires in northwestern Finland based on Digital Surface Models: a machine learning approach using the Random Forest algorithm with in-situ measured snow depth data and an Unmanned Aerial System (UAS) equipped with a Light Detection and Ranging (LiDAR) sensor. For the first time, snow distribution was recorded over a palsa using a UAS. The aim is to review which approach is more precise overall and which areas are not represented sufficiently accurate. In comparison to in-situ collected validation data, the machine learning results showed high accuracy, with a RMSE of 6.16 cm and an R² of 0.98, outperforming the LiDAR-based approach, which had an RMSE of 26.73 cm and an R² of 0.59. Random Forest models snow distribution significantly better at steep slopes and in vegetated areas. This considerable difference highlights the ability of machine learning to capture fine-scale snow distribution patterns in detail. However, our results indicate that UAS data enables the study of snow and permafrost interaction at a highly detailed level as well.

Generally, snow accumulation zones especially at steep edges of the palsas and inside cracks are recognizable, while thin snow cover occurs at exposed areas on top of the palsas. Correspondingly, areas with thicker snow cover at the edges and inside cracks act as potential warming spots, possibly leading to heavy degradation including block erosion. In contrast, areas with thinner snow cover on the exposed crown parts can act as cooling spots. They initially stabilize the frozen core under the crown parts, but then form steep edges and expose the frozen core, leading finally to even more block erosion and degradation.

Received: 12 Sep 2024 – Discussion started: 08 Nov 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 15999 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Preprint (15999 KB)

Download & links

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

22 Sep 2025

Comparing high-resolution snow mapping approaches in palsa mires: UAS lidar vs. modelling

Alexander Störmer, Timo Kumpula, Miguel Villoslada, Pasi Korpelainen, Henning Schumacher, and Benjamin Burkhard

The Cryosphere, 19, 3949–3970, https://doi.org/10.5194/tc-19-3949-2025,https://doi.org/10.5194/tc-19-3949-2025, 2025

Short summary

Alexander Störmer, Timo Kumpula, Miguel Villoslada, Pasi Korpelainen, Henning Schumacher, and Benjamin Burkhard

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-2862', Anonymous Referee #1, 03 Dec 2024
The paper “Comparing High-Resolution Snow Mapping Approaches in Palsa Mires: UAS Lidar vs Machine Learning” by A. Störmer et al. aims to quantify the accuracy and efficiency of mapping snow depth over three palsas in northern Finland, in a spatially continuous raster-based map. Specifically, they choose two methods to compare: 1) using a Lidar sensor on a drone with two acquisition dates of data (no snow and snow), and 2) modelling snow depth based solely on a digital elevation model and using the machine learning algorithm “Random Forest”. In situ data of snow depth are collected and used for training and validation. It is an interesting idea, and the need of mapping snow-depths over permafrost features is of great interest. It is also hard work, as noted by the authors in the Discussion, and the contribution of this paper will be of use for those wishing to map snow cover over terrain that has large variations over short distance, such as palsas. The conclusion was that the Random Forest model gave superior results as compared to the UAV Lidar. However, I have some major questions about the process and conclusions that must be addressed, as I question the overly optimistic result presented from the Random Forest model. The two larger issues to be addressed are below, followed by general and specific comments.
Larger issues that need to be addressed:
Why was a Digital Surface Model and not a Digital Terrain Model used for calculating the ground in the no-snow data, and how does this affect the snow-depth measurements, and even the topographic derivatives used in the RF Model?

If the authors used cross-validation and present it as the accuracy of the model, then this result is over-optimistic and the comparison of UAV-Lidar to the Random Forest model result is biased and not a fair comparison to make.

In more detail
1 - Use of a DSM to represent ground level - It appears that the authors have made a Digital Surface Model (DSM) from the Lidar point data to represent the ground, rather than create a Digital Terrain Model (DTM) from the Lidar data. The DSM represents the height of all objects on the surface, and if there are shrubs on the palsas (which is typically the case in degraded palsas), they may be 35-50 cm tall. Therefore if a DSM was used to represent the ground in August, while insitu snow-depth measurements were taken from the ground up, the reported snow-depth will be highly affected by the height of the vegetation, and this will then vary over the whole surface of the palsa. If the authors have a reason for using a DSM rather than DTM, it is not clear in the article, and it needs to be motivated. Using a DSM will result in error in the snow depth measurements as presented. To create a DTM from your existing data is not difficult. If you look at the paper by Jacobs et al., 2021, you will see reference to papers that discuss the potential errors of snow depth measurements when DSMs are used.
In addition if the DSM was used to calculate the Topographic derivatives used as input parameters to the RF model, are these derivatives valid?
2- Cross validation - As I understand what has been done, the results of snow-depth for UAV Lidar and RF Modelling have been evaluated differently. In the case of UAV Lidar, the in situ data act as a fully independent data set used for calculating RMSE and the accuracy of the snow-depth measurements. In the case of the RF Modelling, the in situ data are used for training of the model, and the validation of the model as presented (see Fig 8) seems to have been made using a 10-fold cross-validation. In any case, the latter means that the data used to create the model are also used to evaluate the model. Cross-validation is never an assessment of the resulting map accuracy but is an assessment of the fit of the model. So it is no surprise that the authors get seemingly much better results for the RF Model – the comparison is biased in the favor of the RF Model. Figure 8 shows this clearly, and to me is misleading. So the conclusion, as in the Results on Line 367/368, that the RF Model is showing its strength without high bias, I think is not valid.
The only way to fairly compare the assessments of these two would be to develop a model using in situ data from one palsa and apply the RF model developed to the other two palsas and assess the accuracy using the in situ data from those two palsas. Or, you could take insitu data from half of each palsa and developing training and accuracy datasets. (Note that if you consider taking a random selection of the insitu data for training/accuracy it is not optimal, since you will have spatial autocorrelation issues due to the proximity of the points, which is why the previous suggestions are better. )
Other general
The title: Rather than using the term “Machine Learning”, I think it would be better to refer to this as “Modelling”, because it doesn’t make sense to me to compare it to the specific algorithm that is used, but rather that you have created a model to predict snow depth.
There have been scientific articles that have mapped snow with UAV Lidar, eg, Jacobs, J.M. et al., 2021 “Snow depth mapping with unpiloted aerial system lidar observations: a case study in Durham, New Hampshire, United States” in The Cryosphere. (https://doi.org/10.5194/tc-15-1485-2021. While this may be the first paper to be published using UAV Lidar for snow on a palsa, I think that the Introduction should review and refer to articles that have generally applied UAV Lidar mapping of snow over other landscape types.
Section 2.1 is lacking a description of vegetation heights on the palsas.
The following points all refer to Section 3.1 – Data collection
Did you Post-Process the UAV Lidar data with RINEX data from a base station? If so, what was the base station (ie, source of the RINEX data)?

Parameters for the UAV flights are needed, eg, flying altitude, were cross-wise flights used? Knowing the directions of the flight lines is important because there are some Lidar measurements of 0 cm snow depth, and 50-60 cm snow depth in the insitu data, and it might be explained (possibly?) by not acquiring Lidar data in multiple angles – but I am not sure what has been done.

Line 151/152 says that GCPs were set out. Was this for both the Lidar and the RGB images? How many GCPs? And then, what was the horizontal and vertical accuracy of your data – both the Lidar and the RGB images?

Line 153 – Change orthopictures to images, since the raw images are not orthorectified yet. That’s a later step.

Line 157/158 “Structure from Motion techniques were not applied…” I do not understand why this sentence is here. If you created an orthophoto, which you say you do in the next sentence, then you have applied photogrammetric image matching (how you define SfM and if you define it differently than photogrammetric image matching determines what term you like to use). But why even say what you haven’t done? State what you have done to produce the orthophoto.

Line 164 – I think you mean snow depth rather than snow cover.

Line 166 – RTK-GPS.

It says on line 173 that there are randomized points on the edges of Puolikkoniva, but I do not see very many of these (maybe 5 at most?). In hindsight, I would guess that you would want to have made cross-wise transects on this palsa. Take this up in the Discussion if so.

Reference (in situ) data
I think you need a separate section to describe Reference data collection – either two sub-sections under 3.1 or else 3.1 for UAS data collection and 3.2 for Reference data collection. Under the reference data collection, there should be a better description regarding how the insitu snow depth measurements were made, specifically, was the GPS Z-measurement made from the ground level? Was it a yardstick, and was a level used to make sure it was normal to the surface?

For the insitu data you need at some point to say that these also may have errors and what these errors may be caused by, and how they may affect your result. Since the RF model is completely based on the insitu data, the errors of the insitu data are simply propagated, but do not affect the evaluation. For validating the Lidar data derived snow depths, the potential measurement errors of the insitu data are only accounted for in the evaluation.

Also, think about whether the section on UAS data collection is only about data collection or if you want to describe the processing of the data here – in which case you might just name it “UAS data” or “UAS data collection and processing”.

The in situ data particularly in the case of the largest palsa Puolikkoniva were run in two transects lengthwise along the palsa, but not crosswise, over the edges where the deepest accumulation of snow may have been. Therefore the values where some of the largest differences are between the Lidar and the RF Model cannot really be assessed, making the assessment incomplete – the shortcoming must be acknowledged.

Also the Lidar may measure extremes in snow-depths, while the model will not if it does not have representative data for the extremes. Therefore there will be more variability in the Lidar data, but we cannot tell which is “wrong”.

Section 3.2 – RF algorithm
The authors state on Line 189 that no explicit hyperparameters were specified. So this means that they were not analyzed, although the outcome of the model is what is being assessed as the main objective of the article. It is not difficult to assess the hyperparameters using Grid-Search or another comparable function.

Permutation mode was used for variable importance – do you know how this works? Is it a single run of the RF model? When you run PI repeatedly, do the same variables have the same importance? The random nature of RF often requires running variable importance (or in this case PI) many times (eg, 100) and taking an average. Even then, one needs to be careful with their interpretation of variable importance.

For Line 187-188 - I’m not really sure what you have done with the model and the in situ data. You state that you have split 70% training and 30% test. Is this used by RF for internal cross-validation of the model (if you split the data 70/30 in the RF model, then it is likely this is how it is being used). Is this done with replacement? If you have removed 30% of the data for independent evaluation, then you need to clearly state this, but I don’t think this is what you have done.

Line 184 – The dependent variable for your model is snow-depth.

Line 185 –“Input parameters” are mentioned here but we don’t know what they are until later. Couldn’t you refer to Table 2 here? Otherwise we are left wondering what the parameters are.

Line 189 – delete “precise” – This is a judgmental word – leave it to your results to be the judge of that.

In addition, RF models are sensitive to imbalance in the training data, and also do not extrapolate beyond the minimum and maximum snow-depth values (or whatever the target variable may be). How are your results affected by this, and how might others in the future be affected by this and what would your recommendations be to future applications of this method?

Section 3.3 –
The first sentence needs rewriting. First of all, which “collected airborne data” is referred to here? I assume it was the August DSM from Lidar that was used? It is not stated. Were these data processed differently than what was described in Section 3.1? Declare which DEM you are working with and say specifically that you are creating parameters from this. What happens if you use a DSM and create all of these topographic derivatives as parameters? Are those new derivatives valid, such as Topographic Wetness Index, if they are based on the surface elevation which includes vegetation? This must be well-motivated if the authors believe that there is a valid reason for this.

Line 210 – If a 0.3 m buffer was used were the values for any parameters averaged within this area?

Table 2 – 12 parameters were used, but 21 are in the table. Could you indicate in a way what parameters were used?

For the Discussion: When you made the insitu measurements, it was August, and the palsa had likely subsided. Renette et al., 2024 show that the difference between elevation in September (likely maximum thaw depth of the Active Layer) and April (minimum thaw) was on average 15 cm, and up to 30 cm in some areas, albeit on a taller palsa than in the study presented here. In any case, this may mean that trying to measure snow depth using a DTM from September may introduce errors if the terrain is actually elevated some cm more than this. This is hard issue to solve with UAV Lidar, since you would need to be in place to create a DTM right after snow-melt, and all snow would need to have melted. So, you need to discuss what implications this has to your results. Also, since you have RTK-GPS data, and you have measured to the ground I assume, you actually have a dataset where you could compare the Z-measurement from March to the DTM from August, and get an estimate of the difference in height between the max-thaw and min-thaw state of the palsa.

Language
It’s my feeling that some value judgement words don’t belong in a scientific article. Such as “exemplarily” on line 53.
Line 38 – deepening instead of growth. Line 58 – deeper instead of higher.
Otherwise some minor grammatical fixes once the paper is revised can be looked over.

Specific
Line 35 – it is not only bound by peatland presence but also climatic parameters
Line 69 – “Satellite data” only names the platform. What kind of satellite data are you referring to? Optical? Radar? That is the more important aspect. Similar issue is on line 74 where the sensor type should be mentioned and not just the platform which is UAS/UAV. Look through your paper for these kind of omissions.
Line 70 – change technical limitations to properties
Line 86 – the authors mention 3 methods, but the title takes up two. The third method seems to be the insitu data, but that has been used to train the RF Model, and I don’t think you are really assessing the accuracy of the method, so I would stick to the two methods.
Line 89 – delete simulation. You are just modelling.
Table 1 – the photos are rather small. Can they be made bigger. Put the date (day-month-year) of the photos in the Table text.
Line 129 – For what year or years is that the annual mean temperature?
Line 137 – For what location is that the duration of permanent snow cover?
Figure 2 – What is shown in Fig 2? It needs to be said clearly in the Fig text. Is this an average value for 1990-2020? It would be very helpful to know what the climate conditions were for the years in which you acquired the snow data. Was it a very snowy year? Windy in the days before you visited? Warm temperatures so that the snow melted some? Knowing these conditions can help us to explain any differences between the various results, particularly if the model is solely based on the DEM. I see you mention this on Line 401/402.
Line 141 – Write which day the data were acquired. If you cannot fit it reasonably in the text, because it was different dates for different palsas, I suggest you put it in Table 1 – dates for image and Lidar acquisition.
Several of the Figures have such small text that they are difficult to read. Eg Fig 3.
Section 3 – Is August the season for maximum thaw? It’s not September? Does Verdonen et al. 2023 state that August is the max ALT? If it is August, I think you should more specifically say the end of August. If you aren’t sure or don’t have a reference to back it up, then maybe it is more reasonable to say that the end of August is near max ALT.
Line 231 – 240 feel like they belong in the section describing the RF model.
Line 231/232 – Was the 10-fold cross-validation done when creating the initial RF model, or was this something that was done afterwards and used as the “validation” data presented in Figure 8? If it is the latter, you cannot say that it was used to reduce over-fitting in the model? There is an option in Random Forest to use cross-validation to create the model, and that is one tool of several to reduce over-fitting. Other ways to reduce over-fitting is to limit tree depth, -- by the way, in Section 3.2 you mention target node depth, but I don’t see in the caret package what that refers to. Is it “maxdepth”? In that case I suggest you name the parameter in parentheses.
Line 236/237 – What are “the initially calculated values”? You are using the insitu data to train a RF model and then evaluating the model based on a cross-validation that using that same insitu data. See my point #2 under “Larger issues”.
Line 273/274 – “Only a few narrow structures with significantly higher snow can be recognized based on the UAS LiDAR data” – I do not know what this sentence is about.
Line 281 and Fig 7 and Table 3 – I don’t think we need to see all 3 model runs, just the best one.
Line 285 – rather confusing that it is stated that Elevation was removed, and now it is important. Also Fig 7 text is impossible to read because it is so small.
Line 295 and Table 4 – these areas of “Top”, etc, could you have a figure somewhere – maybe supplemental where these areas are shown? Do we know the number of samples (n) in each group?
Line 323 also Line 346 – Fig 9?
Figure 9 – Is B (Slope in degrees) based on the DSM? Is this valid then to calculated slope based on vegetation?
Line 404/405 – I guess you are referring to reflectance of the lidar from the snow/ice surface? If so I think you should have a reference here.
Citation: https://doi.org/10.5194/egusphere-2024-2862-RC1
- AC1: 'Reply on RC1', Alexander Störmer, 28 Jan 2025
  
  Please find our response to all comments by Reviewer 1 in the attached PDF.
  
  Citation: https://doi.org/10.5194/egusphere-2024-2862-AC1
RC2:
'Comment on egusphere-2024-2862', Anonymous Referee #2, 23 Dec 2024
Summary
The authors produce a random forest (RF) model based on lidar-derived topographic predictors and point observations of snow depth. The RF model is used to create a continuous snow depth field over the three independent palsas in Finland and Sweden. It is then evaluated against the point observations and compared to UAS-lidar-derived snow depths. Finally, the authors discuss the implications of snow depth variability on permafrost dynamics at the palsas. The manuscript has well-constructed figures, relies on a unique and interesting dataset, includes an assessment of a wide range of reasonable terrain predictors of snow depth, and the methods are on the right track. However, there are some significant concerns, including manuscript organization/framing, limited lidar validation/processing concerns, model overfitting, and generally weak analysis/discussion. These are detailed below.

Lastly, please do not be overwhelmed by all of the comments! Addressing the major suggestions and proofreading the manuscript thoroughly should move this study much closer to publication. The specific comments are intended as suggestions/thoughts to help steer the revision process and are generally related to the below Major Suggestions/Comments.

Major Suggestions/Comments

Concerns with Research Objectives, Methods, and Manuscript Organization
Ln 94-99: It is my opinion that the research objectives need to be refined. The first stage of the paper should be an evaluation of lidar-based snow depth, followed by an evaluation of the RF modeling approach. Only then should the authors discuss the potential implications of the depth patterns, and this should be a smaller part of the manuscript focused in the discussion. Since the authors did not explicitly collect data to link snow depth to changes in the active layer (or ice loss/gain), the outcomes are more based on expectations and assumptions – which may be valid, but to verify and to be a focus of the manuscript would require more data. The points described on Ln 97-99 are underdeveloped and unsupported by observations.

Section 4.3 needs revision. Unless the expected errors in the lidar product are further expanded upon, these are physical observations and it is standard practice to assume that these products have uncertainties errors proportional to the sensor error (e.g., ~5 cm). This can be directly evaluated from ground observations of snow depth and was to some degree. However, the errors were much larger than expected (>20 cm, Ln 298-302), raising concerns about the processing of lidar data to produce snow depth maps. This component and the framing of the analysis are significant concerns. A section early on evaluating the lidar depth products seems necessary and considering the influence of vegetation on their accuracy explicitly (for example, examining some of the outliers in Figure 8 more closely) – addressing the concern of vegetation compression should be added here and vegetation height models produced from the summer lidar point cloud

Comparing the lidar to an RF model trained and evaluated against <200 observations directly is not appropriate. As presented, the lidar depth analysis does not add much to the manuscript – I suggest it be redone (reprocessed data, more detailed lidar depth evaluation), and/or, the work reframed to simply build the RF model using lidar terrain and snow depth point observations, then a revised analysis on how these patterns are expected to influence the palsa stability.

Random Forest Modeling Concerns
Ln 189-191: It seems like little consideration was given to the hyperparameters, and several important ones (like maximum split size, and minimum node size) are not mentioned. Please clearly state the hyperparameters used, and an optimization routine should be included to select these – not just using defaults – which are likely geared towards a much larger data set. If done correctly, this will reduce overfitting (see following concerns)

Various model runs were not clear. The predictors for model 1,2, and 3 should be explicitly stated, with the appropriate reasoning within the methods section

10-fold cross-validation is not sufficient to ensure that the model is not overfit. Each model is still trained with 90% (9/10) of all data (and the training dataset is relatively small <200 snow depths). The authors should explore the influence of fewer folds (e.g., 3-10) to assess how much model performance is degraded. For such a small dataset, around four (4) folds seems more appropriate in this case

Ln 232 – 242: This is relevant to RF model training/evaluation – I suggest moving to the random forest training section

The results shown in Table 3 (R² > 0.99, RMSE less than the expected measurement uncertainty <3 cm) suggest substantial overfitting of the random forest model

Avoid analysis based on terms like ‘potentially,’ ‘possibly,’ and ‘probably’ -- you should focus your study on explaining and describing the data you have collected its likely implications with clearly stated support

The terminology of cooling and warming spots was unclear – since these are not a standard term to my knowledge, these need to be explicitly defined early on and used consistently throughout the manuscript

Ln 71-73: UAS-lidar-based snow depth monitoring approaches/literature should be sufficiently reviewed in the introduction (& by the authors). The approaches used to produce snow-depth products do not align with standard practice (e.g., classifying the vegetation-free ground surface). See work by Avanzi et al., 2018; Harder et al., 2020; Jacobs et al., 2021.

Grammatical and sentence structure issues limit communication effectiveness in the paper. A thorough proofreading by a third party before resubmission would benefit the paper. Some specific instances of this were noted in the comments below

Minor/Technical/Grammatical Suggestions
Stick with snow depth or snow height throughout – be consistent with word choice

The use of the word ‘precision’ is questionable at times (see abstract Ln 9). Precision measures the ability for repeatable measurements. Accuracy is a better term for assessing something like a random forest model. Read through the manuscript and consider if the use of ‘precision’ is appropriate throughout

Word choice should be reviewed throughout – (e.g., Ln 78: ‘very strong changes’ could be ‘control’, Ln 212: ‘realism’)

Avoid broad terminology throughout, especially before something a term is explicitly defined (e.g., Ln 150 – was not clear what ‘input parameter data’ referred to again at Ln 181, 182, 185). I suggest defining the types of input parameters earlier on – like was done in Section 3.3

Abstract
Ln 13-15: Machine learning is used to model the snow depth spatially and relies on observations. On its own, it does not capture snow depth patterns. Consider rewording

The abstract should be a single cohesive paragraph, avoid splitting into two parts

Introduction
Ln 26, 57-58: While snow cover duration is decreasing, the suggestion that snow depth is increasing substantially in these regions is less clear. This paper suggests snowfall extremes will be reduced in the study area (https://www.nature.com/articles/s41598-021-95979-4) – can you clarify this point?

Ln 32-33: Sentence structure/clarity issues – please revise

The relevance of palsas is not addressed clearly in the introduction. Please add some sentences on their general significance, e.g., Do they stabilize permafrost? Provide habitat? Have societal relevance?

Ln 53: remove ‘exemplarily’

Ln 56-57: These points seem essential for understanding the relevance of palsas – I suggest this is moved earlier in the introduction when palsas are defined

Ln 60-61: ‘in-situ measured data’ or ‘observations’ need to be clarified. Is this temperature data? Snow depth? Other?

Ln 78-79: Wording is unclear – ‘…limits information value of satellite data…’

Ln 81:’Another’ should start a new paragraph – this section is also very short relative to the prominent role that machine learning plays in the paper. I suggest adding more detail.

Ln 90: ‘…test methods for generating detailed snow distribution maps..” should lead this section. The objectives need to be clearly stated up front

Data and Methods
Ln 141: a comprehensive dataset of what? Specify

Figure 3: Should clearly state the actual observations that were collected

Ln 151: This is the first time LiDAR is mentioned. Needs to be introduced within the introduction

Ln 157-160: very confusing. No SfM, but then orthophotos were created? That relies on photogrammetry -- but then you state point cloud densities. Are these associated with lidar or RGB orthophotos? If lidar, need to put it right after the lidar. Also, should report density per square meter as it is the standard.

Ln 162-164: Revise sentence structure for clarity

Ln 166: should be ‘by an RTK GPS system’

Ln 170: word choice - ‘optimal’

Ln 172: The sampling strategy is claimed to be randomized, though it appears observations were collected along transects with some random points. Some of these could be biased, so it would be useful to add a bit more description. There are also areas with clear gaps

Related to the previous point, the distribution of snow depth observations included in the appendix should be split by site (in my opinion)

Ln 176: It isn’t easy to make out any snow-free areas on the palsas in the imagery – can these be indicated?

Ln 184-188: To clarify, the full training set is based on only 185 observations – but increased due to the buffering? Please indicate how many unique features were actually used to train the model after the buffering. Will help the reader understand the robustness of the model

Ln 193-194: Just state the metrics were normalized 0-1, with the highest output importance set as 1

Ln 205: The removal of elevation as a predictor needs more explanation – the logic that is will ‘reduce possible overfitting’ is not apparent

Ln 206-207: wordy, what is ‘initial minimal impact’?

Ln 211: Unclear how this offers a balanced representation. The idea of taking an area is usually to remove noise, reduce the influence of sampling or geolocation errors, and to grow the training set size (taking groupings of nearby points vs. a single one - which should improve the robustness of the model). Please explain further.

Table 2: Nice table! For features like TPI (which are determined to be very important), you should be more detailed in their definition. More than ‘it combines several topographic features.’ TPI is generally just the relative elevation of a point to surrounding points within some radius (or adjacent pixels)

Ln 238-239: Be careful with wording. Correlation (strength of linear relationship) and significance (based on statistical testing) are not the same thing

Results
Section 4.1: Nice job describing results clearly and sequentially by site.

Figure 5: Nice figure, it would be useful to add annotations for areas of interest referred to in Section 4.1 on the figure (e.g., the collapsed areas)

Ln 252: When stating things like ‘slightly higher,’ specify the magnitude (is this 10cm, 20cm, 5cm?). Same as Ln 257, how much lower?

Ln 272-273: sentence clarity issue

Figure 7: Nice figure! Be sure to add more specifications on the model runs in the methods section

Ln 294: How were they separated into ‘point groups’ used to produce Table 4 – how were the different areas delineated and can these be added to the maps?

Ln 310-313: Correlation analysis results should be included as a table – this could be added to the appendix if the authors do not want to include it in the body of the paper

Discussion
Ln 318-319: Revise based on previous comments

Ln 329: warming and cooling spots need to be defined more before this point. What is a good technical definition? For example, are warming spots where the net heat flux into the ground during the winter is highest - making these areas warmer? vs. Cooling spots, where the net heat flux into the ground is lowest? We need to have a clear and more scientific definition

Ln 345: ‘Cooling spots inhibit a greater active layer thickness in summer’ – is this the technical definition? It comes across as difficult to interpret. An alternative version of this: ‘Cooling spots result in shallower active layers in summer compared to warming spots.’

Figure 8 - Nice figure. The delineations are helpful. Similar delineations would help the interpretation of results in prior figures

Section 5.2: This section should be revised thoroughly – see previous comments on lidar snow depth and RF model comparison

Ln 354: Luo and Panda studies were based on satellite remotely sensed snow cover – not sure I understand the link to UAS-lidar observations. Also, not clear what ‘not in depth post-processed data’ is. I did not understand the transition of the discussion from snow depth to snow cover

Ln 363-364: How did manual probing address the issue of vegetation? The uncertainty in these observations was never discussed

Ln 366-368: This doesn't seem like only a lidar limitation - but a measurement challenge in general. Measuring snow over dense vegetation with air voids, compression, etc.. is always challenging. New approaches to correct the lidar based on the underlying vegetation type/density/height may improve lidar snow depth products.

Ln 375-387: The discussion in this paragraph was strong, and it was easier to follow the logic. This could be an example to use when revising the discussion.

Ln 392, 407-408: Why was vegetation not removed from the summer point cloud? I do not understand why this was done in this manner. This step is critical for snow depth mapping with lidar.

Ln 395-397: There is a growing body of literature on this that would be useful to review. See Buhler 2016, 2017; Adams et al., 2018; Avanzi et al., 2018, Cho et al., 2024 (Preprint), Eker et al. 2019; Harder et al. 2020 (compares lidar and RGB)

Much of the discussion relies on findings from other studies and assumed links to snow depth observed in this study to conclude – not clear to me what value the work presented here has to understanding palsa permafrost dynamics more than point observations on a transect across one of these features would. Related to previous comments on reframing and refocusing the research objectives

Ln 412-414: A fewer number of folds should be used in the model training/validation

Ln 421, 425-426: A large number of input features are used in this model and the results as presented show nearly perfect model performance – are you suggesting others should be included? If others could make the model better, why were they not included?

Ln 423-424: Good point

Once noted challenges throughout are addressed – the discussion should be re-written to align with the updated manuscript

Conclusions
As presented, the paper is focused on the evaluation of the methods for snow depth mapping and on the predictors that control the depth distribution -- discussion into the influence of these characteristics on the thermal profiles is purely assumption based -- thus reframing the conclusion in line with the revised paper and the actual results/data presented will be critical in the revised version.
Citation: https://doi.org/10.5194/egusphere-2024-2862-RC2
- AC2: 'Reply on RC2', Alexander Störmer, 28 Jan 2025
  
  Please find our response to all comments by Reviewer 2 in the attached PDF.
  
  Citation: https://doi.org/10.5194/egusphere-2024-2862-AC2

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-2862', Anonymous Referee #1, 03 Dec 2024
The paper “Comparing High-Resolution Snow Mapping Approaches in Palsa Mires: UAS Lidar vs Machine Learning” by A. Störmer et al. aims to quantify the accuracy and efficiency of mapping snow depth over three palsas in northern Finland, in a spatially continuous raster-based map. Specifically, they choose two methods to compare: 1) using a Lidar sensor on a drone with two acquisition dates of data (no snow and snow), and 2) modelling snow depth based solely on a digital elevation model and using the machine learning algorithm “Random Forest”. In situ data of snow depth are collected and used for training and validation. It is an interesting idea, and the need of mapping snow-depths over permafrost features is of great interest. It is also hard work, as noted by the authors in the Discussion, and the contribution of this paper will be of use for those wishing to map snow cover over terrain that has large variations over short distance, such as palsas. The conclusion was that the Random Forest model gave superior results as compared to the UAV Lidar. However, I have some major questions about the process and conclusions that must be addressed, as I question the overly optimistic result presented from the Random Forest model. The two larger issues to be addressed are below, followed by general and specific comments.
Larger issues that need to be addressed:
Why was a Digital Surface Model and not a Digital Terrain Model used for calculating the ground in the no-snow data, and how does this affect the snow-depth measurements, and even the topographic derivatives used in the RF Model?

If the authors used cross-validation and present it as the accuracy of the model, then this result is over-optimistic and the comparison of UAV-Lidar to the Random Forest model result is biased and not a fair comparison to make.

In more detail
1 - Use of a DSM to represent ground level - It appears that the authors have made a Digital Surface Model (DSM) from the Lidar point data to represent the ground, rather than create a Digital Terrain Model (DTM) from the Lidar data. The DSM represents the height of all objects on the surface, and if there are shrubs on the palsas (which is typically the case in degraded palsas), they may be 35-50 cm tall. Therefore if a DSM was used to represent the ground in August, while insitu snow-depth measurements were taken from the ground up, the reported snow-depth will be highly affected by the height of the vegetation, and this will then vary over the whole surface of the palsa. If the authors have a reason for using a DSM rather than DTM, it is not clear in the article, and it needs to be motivated. Using a DSM will result in error in the snow depth measurements as presented. To create a DTM from your existing data is not difficult. If you look at the paper by Jacobs et al., 2021, you will see reference to papers that discuss the potential errors of snow depth measurements when DSMs are used.
In addition if the DSM was used to calculate the Topographic derivatives used as input parameters to the RF model, are these derivatives valid?
2- Cross validation - As I understand what has been done, the results of snow-depth for UAV Lidar and RF Modelling have been evaluated differently. In the case of UAV Lidar, the in situ data act as a fully independent data set used for calculating RMSE and the accuracy of the snow-depth measurements. In the case of the RF Modelling, the in situ data are used for training of the model, and the validation of the model as presented (see Fig 8) seems to have been made using a 10-fold cross-validation. In any case, the latter means that the data used to create the model are also used to evaluate the model. Cross-validation is never an assessment of the resulting map accuracy but is an assessment of the fit of the model. So it is no surprise that the authors get seemingly much better results for the RF Model – the comparison is biased in the favor of the RF Model. Figure 8 shows this clearly, and to me is misleading. So the conclusion, as in the Results on Line 367/368, that the RF Model is showing its strength without high bias, I think is not valid.
The only way to fairly compare the assessments of these two would be to develop a model using in situ data from one palsa and apply the RF model developed to the other two palsas and assess the accuracy using the in situ data from those two palsas. Or, you could take insitu data from half of each palsa and developing training and accuracy datasets. (Note that if you consider taking a random selection of the insitu data for training/accuracy it is not optimal, since you will have spatial autocorrelation issues due to the proximity of the points, which is why the previous suggestions are better. )
Other general
The title: Rather than using the term “Machine Learning”, I think it would be better to refer to this as “Modelling”, because it doesn’t make sense to me to compare it to the specific algorithm that is used, but rather that you have created a model to predict snow depth.
There have been scientific articles that have mapped snow with UAV Lidar, eg, Jacobs, J.M. et al., 2021 “Snow depth mapping with unpiloted aerial system lidar observations: a case study in Durham, New Hampshire, United States” in The Cryosphere. (https://doi.org/10.5194/tc-15-1485-2021. While this may be the first paper to be published using UAV Lidar for snow on a palsa, I think that the Introduction should review and refer to articles that have generally applied UAV Lidar mapping of snow over other landscape types.
Section 2.1 is lacking a description of vegetation heights on the palsas.
The following points all refer to Section 3.1 – Data collection
Did you Post-Process the UAV Lidar data with RINEX data from a base station? If so, what was the base station (ie, source of the RINEX data)?

Parameters for the UAV flights are needed, eg, flying altitude, were cross-wise flights used? Knowing the directions of the flight lines is important because there are some Lidar measurements of 0 cm snow depth, and 50-60 cm snow depth in the insitu data, and it might be explained (possibly?) by not acquiring Lidar data in multiple angles – but I am not sure what has been done.

Line 151/152 says that GCPs were set out. Was this for both the Lidar and the RGB images? How many GCPs? And then, what was the horizontal and vertical accuracy of your data – both the Lidar and the RGB images?

Line 153 – Change orthopictures to images, since the raw images are not orthorectified yet. That’s a later step.

Line 157/158 “Structure from Motion techniques were not applied…” I do not understand why this sentence is here. If you created an orthophoto, which you say you do in the next sentence, then you have applied photogrammetric image matching (how you define SfM and if you define it differently than photogrammetric image matching determines what term you like to use). But why even say what you haven’t done? State what you have done to produce the orthophoto.

Line 164 – I think you mean snow depth rather than snow cover.

Line 166 – RTK-GPS.

It says on line 173 that there are randomized points on the edges of Puolikkoniva, but I do not see very many of these (maybe 5 at most?). In hindsight, I would guess that you would want to have made cross-wise transects on this palsa. Take this up in the Discussion if so.

Reference (in situ) data
I think you need a separate section to describe Reference data collection – either two sub-sections under 3.1 or else 3.1 for UAS data collection and 3.2 for Reference data collection. Under the reference data collection, there should be a better description regarding how the insitu snow depth measurements were made, specifically, was the GPS Z-measurement made from the ground level? Was it a yardstick, and was a level used to make sure it was normal to the surface?

For the insitu data you need at some point to say that these also may have errors and what these errors may be caused by, and how they may affect your result. Since the RF model is completely based on the insitu data, the errors of the insitu data are simply propagated, but do not affect the evaluation. For validating the Lidar data derived snow depths, the potential measurement errors of the insitu data are only accounted for in the evaluation.

Also, think about whether the section on UAS data collection is only about data collection or if you want to describe the processing of the data here – in which case you might just name it “UAS data” or “UAS data collection and processing”.

The in situ data particularly in the case of the largest palsa Puolikkoniva were run in two transects lengthwise along the palsa, but not crosswise, over the edges where the deepest accumulation of snow may have been. Therefore the values where some of the largest differences are between the Lidar and the RF Model cannot really be assessed, making the assessment incomplete – the shortcoming must be acknowledged.

Also the Lidar may measure extremes in snow-depths, while the model will not if it does not have representative data for the extremes. Therefore there will be more variability in the Lidar data, but we cannot tell which is “wrong”.

Section 3.2 – RF algorithm
The authors state on Line 189 that no explicit hyperparameters were specified. So this means that they were not analyzed, although the outcome of the model is what is being assessed as the main objective of the article. It is not difficult to assess the hyperparameters using Grid-Search or another comparable function.

Permutation mode was used for variable importance – do you know how this works? Is it a single run of the RF model? When you run PI repeatedly, do the same variables have the same importance? The random nature of RF often requires running variable importance (or in this case PI) many times (eg, 100) and taking an average. Even then, one needs to be careful with their interpretation of variable importance.

For Line 187-188 - I’m not really sure what you have done with the model and the in situ data. You state that you have split 70% training and 30% test. Is this used by RF for internal cross-validation of the model (if you split the data 70/30 in the RF model, then it is likely this is how it is being used). Is this done with replacement? If you have removed 30% of the data for independent evaluation, then you need to clearly state this, but I don’t think this is what you have done.

Line 184 – The dependent variable for your model is snow-depth.

Line 185 –“Input parameters” are mentioned here but we don’t know what they are until later. Couldn’t you refer to Table 2 here? Otherwise we are left wondering what the parameters are.

Line 189 – delete “precise” – This is a judgmental word – leave it to your results to be the judge of that.

In addition, RF models are sensitive to imbalance in the training data, and also do not extrapolate beyond the minimum and maximum snow-depth values (or whatever the target variable may be). How are your results affected by this, and how might others in the future be affected by this and what would your recommendations be to future applications of this method?

Section 3.3 –
The first sentence needs rewriting. First of all, which “collected airborne data” is referred to here? I assume it was the August DSM from Lidar that was used? It is not stated. Were these data processed differently than what was described in Section 3.1? Declare which DEM you are working with and say specifically that you are creating parameters from this. What happens if you use a DSM and create all of these topographic derivatives as parameters? Are those new derivatives valid, such as Topographic Wetness Index, if they are based on the surface elevation which includes vegetation? This must be well-motivated if the authors believe that there is a valid reason for this.

Line 210 – If a 0.3 m buffer was used were the values for any parameters averaged within this area?

Table 2 – 12 parameters were used, but 21 are in the table. Could you indicate in a way what parameters were used?

For the Discussion: When you made the insitu measurements, it was August, and the palsa had likely subsided. Renette et al., 2024 show that the difference between elevation in September (likely maximum thaw depth of the Active Layer) and April (minimum thaw) was on average 15 cm, and up to 30 cm in some areas, albeit on a taller palsa than in the study presented here. In any case, this may mean that trying to measure snow depth using a DTM from September may introduce errors if the terrain is actually elevated some cm more than this. This is hard issue to solve with UAV Lidar, since you would need to be in place to create a DTM right after snow-melt, and all snow would need to have melted. So, you need to discuss what implications this has to your results. Also, since you have RTK-GPS data, and you have measured to the ground I assume, you actually have a dataset where you could compare the Z-measurement from March to the DTM from August, and get an estimate of the difference in height between the max-thaw and min-thaw state of the palsa.

Language
It’s my feeling that some value judgement words don’t belong in a scientific article. Such as “exemplarily” on line 53.
Line 38 – deepening instead of growth. Line 58 – deeper instead of higher.
Otherwise some minor grammatical fixes once the paper is revised can be looked over.

Specific
Line 35 – it is not only bound by peatland presence but also climatic parameters
Line 69 – “Satellite data” only names the platform. What kind of satellite data are you referring to? Optical? Radar? That is the more important aspect. Similar issue is on line 74 where the sensor type should be mentioned and not just the platform which is UAS/UAV. Look through your paper for these kind of omissions.
Line 70 – change technical limitations to properties
Line 86 – the authors mention 3 methods, but the title takes up two. The third method seems to be the insitu data, but that has been used to train the RF Model, and I don’t think you are really assessing the accuracy of the method, so I would stick to the two methods.
Line 89 – delete simulation. You are just modelling.
Table 1 – the photos are rather small. Can they be made bigger. Put the date (day-month-year) of the photos in the Table text.
Line 129 – For what year or years is that the annual mean temperature?
Line 137 – For what location is that the duration of permanent snow cover?
Figure 2 – What is shown in Fig 2? It needs to be said clearly in the Fig text. Is this an average value for 1990-2020? It would be very helpful to know what the climate conditions were for the years in which you acquired the snow data. Was it a very snowy year? Windy in the days before you visited? Warm temperatures so that the snow melted some? Knowing these conditions can help us to explain any differences between the various results, particularly if the model is solely based on the DEM. I see you mention this on Line 401/402.
Line 141 – Write which day the data were acquired. If you cannot fit it reasonably in the text, because it was different dates for different palsas, I suggest you put it in Table 1 – dates for image and Lidar acquisition.
Several of the Figures have such small text that they are difficult to read. Eg Fig 3.
Section 3 – Is August the season for maximum thaw? It’s not September? Does Verdonen et al. 2023 state that August is the max ALT? If it is August, I think you should more specifically say the end of August. If you aren’t sure or don’t have a reference to back it up, then maybe it is more reasonable to say that the end of August is near max ALT.
Line 231 – 240 feel like they belong in the section describing the RF model.
Line 231/232 – Was the 10-fold cross-validation done when creating the initial RF model, or was this something that was done afterwards and used as the “validation” data presented in Figure 8? If it is the latter, you cannot say that it was used to reduce over-fitting in the model? There is an option in Random Forest to use cross-validation to create the model, and that is one tool of several to reduce over-fitting. Other ways to reduce over-fitting is to limit tree depth, -- by the way, in Section 3.2 you mention target node depth, but I don’t see in the caret package what that refers to. Is it “maxdepth”? In that case I suggest you name the parameter in parentheses.
Line 236/237 – What are “the initially calculated values”? You are using the insitu data to train a RF model and then evaluating the model based on a cross-validation that using that same insitu data. See my point #2 under “Larger issues”.
Line 273/274 – “Only a few narrow structures with significantly higher snow can be recognized based on the UAS LiDAR data” – I do not know what this sentence is about.
Line 281 and Fig 7 and Table 3 – I don’t think we need to see all 3 model runs, just the best one.
Line 285 – rather confusing that it is stated that Elevation was removed, and now it is important. Also Fig 7 text is impossible to read because it is so small.
Line 295 and Table 4 – these areas of “Top”, etc, could you have a figure somewhere – maybe supplemental where these areas are shown? Do we know the number of samples (n) in each group?
Line 323 also Line 346 – Fig 9?
Figure 9 – Is B (Slope in degrees) based on the DSM? Is this valid then to calculated slope based on vegetation?
Line 404/405 – I guess you are referring to reflectance of the lidar from the snow/ice surface? If so I think you should have a reference here.
Citation: https://doi.org/10.5194/egusphere-2024-2862-RC1
- AC1: 'Reply on RC1', Alexander Störmer, 28 Jan 2025
  
  Please find our response to all comments by Reviewer 1 in the attached PDF.
  
  Citation: https://doi.org/10.5194/egusphere-2024-2862-AC1
RC2:
'Comment on egusphere-2024-2862', Anonymous Referee #2, 23 Dec 2024
Summary
The authors produce a random forest (RF) model based on lidar-derived topographic predictors and point observations of snow depth. The RF model is used to create a continuous snow depth field over the three independent palsas in Finland and Sweden. It is then evaluated against the point observations and compared to UAS-lidar-derived snow depths. Finally, the authors discuss the implications of snow depth variability on permafrost dynamics at the palsas. The manuscript has well-constructed figures, relies on a unique and interesting dataset, includes an assessment of a wide range of reasonable terrain predictors of snow depth, and the methods are on the right track. However, there are some significant concerns, including manuscript organization/framing, limited lidar validation/processing concerns, model overfitting, and generally weak analysis/discussion. These are detailed below.

Lastly, please do not be overwhelmed by all of the comments! Addressing the major suggestions and proofreading the manuscript thoroughly should move this study much closer to publication. The specific comments are intended as suggestions/thoughts to help steer the revision process and are generally related to the below Major Suggestions/Comments.

Major Suggestions/Comments

Concerns with Research Objectives, Methods, and Manuscript Organization
Ln 94-99: It is my opinion that the research objectives need to be refined. The first stage of the paper should be an evaluation of lidar-based snow depth, followed by an evaluation of the RF modeling approach. Only then should the authors discuss the potential implications of the depth patterns, and this should be a smaller part of the manuscript focused in the discussion. Since the authors did not explicitly collect data to link snow depth to changes in the active layer (or ice loss/gain), the outcomes are more based on expectations and assumptions – which may be valid, but to verify and to be a focus of the manuscript would require more data. The points described on Ln 97-99 are underdeveloped and unsupported by observations.

Section 4.3 needs revision. Unless the expected errors in the lidar product are further expanded upon, these are physical observations and it is standard practice to assume that these products have uncertainties errors proportional to the sensor error (e.g., ~5 cm). This can be directly evaluated from ground observations of snow depth and was to some degree. However, the errors were much larger than expected (>20 cm, Ln 298-302), raising concerns about the processing of lidar data to produce snow depth maps. This component and the framing of the analysis are significant concerns. A section early on evaluating the lidar depth products seems necessary and considering the influence of vegetation on their accuracy explicitly (for example, examining some of the outliers in Figure 8 more closely) – addressing the concern of vegetation compression should be added here and vegetation height models produced from the summer lidar point cloud

Comparing the lidar to an RF model trained and evaluated against <200 observations directly is not appropriate. As presented, the lidar depth analysis does not add much to the manuscript – I suggest it be redone (reprocessed data, more detailed lidar depth evaluation), and/or, the work reframed to simply build the RF model using lidar terrain and snow depth point observations, then a revised analysis on how these patterns are expected to influence the palsa stability.

Random Forest Modeling Concerns
Ln 189-191: It seems like little consideration was given to the hyperparameters, and several important ones (like maximum split size, and minimum node size) are not mentioned. Please clearly state the hyperparameters used, and an optimization routine should be included to select these – not just using defaults – which are likely geared towards a much larger data set. If done correctly, this will reduce overfitting (see following concerns)

Various model runs were not clear. The predictors for model 1,2, and 3 should be explicitly stated, with the appropriate reasoning within the methods section

10-fold cross-validation is not sufficient to ensure that the model is not overfit. Each model is still trained with 90% (9/10) of all data (and the training dataset is relatively small <200 snow depths). The authors should explore the influence of fewer folds (e.g., 3-10) to assess how much model performance is degraded. For such a small dataset, around four (4) folds seems more appropriate in this case

Ln 232 – 242: This is relevant to RF model training/evaluation – I suggest moving to the random forest training section

The results shown in Table 3 (R² > 0.99, RMSE less than the expected measurement uncertainty <3 cm) suggest substantial overfitting of the random forest model

Avoid analysis based on terms like ‘potentially,’ ‘possibly,’ and ‘probably’ -- you should focus your study on explaining and describing the data you have collected its likely implications with clearly stated support

The terminology of cooling and warming spots was unclear – since these are not a standard term to my knowledge, these need to be explicitly defined early on and used consistently throughout the manuscript

Ln 71-73: UAS-lidar-based snow depth monitoring approaches/literature should be sufficiently reviewed in the introduction (& by the authors). The approaches used to produce snow-depth products do not align with standard practice (e.g., classifying the vegetation-free ground surface). See work by Avanzi et al., 2018; Harder et al., 2020; Jacobs et al., 2021.

Grammatical and sentence structure issues limit communication effectiveness in the paper. A thorough proofreading by a third party before resubmission would benefit the paper. Some specific instances of this were noted in the comments below

Minor/Technical/Grammatical Suggestions
Stick with snow depth or snow height throughout – be consistent with word choice

The use of the word ‘precision’ is questionable at times (see abstract Ln 9). Precision measures the ability for repeatable measurements. Accuracy is a better term for assessing something like a random forest model. Read through the manuscript and consider if the use of ‘precision’ is appropriate throughout

Word choice should be reviewed throughout – (e.g., Ln 78: ‘very strong changes’ could be ‘control’, Ln 212: ‘realism’)

Avoid broad terminology throughout, especially before something a term is explicitly defined (e.g., Ln 150 – was not clear what ‘input parameter data’ referred to again at Ln 181, 182, 185). I suggest defining the types of input parameters earlier on – like was done in Section 3.3

Abstract
Ln 13-15: Machine learning is used to model the snow depth spatially and relies on observations. On its own, it does not capture snow depth patterns. Consider rewording

The abstract should be a single cohesive paragraph, avoid splitting into two parts

Introduction
Ln 26, 57-58: While snow cover duration is decreasing, the suggestion that snow depth is increasing substantially in these regions is less clear. This paper suggests snowfall extremes will be reduced in the study area (https://www.nature.com/articles/s41598-021-95979-4) – can you clarify this point?

Ln 32-33: Sentence structure/clarity issues – please revise

The relevance of palsas is not addressed clearly in the introduction. Please add some sentences on their general significance, e.g., Do they stabilize permafrost? Provide habitat? Have societal relevance?

Ln 53: remove ‘exemplarily’

Ln 56-57: These points seem essential for understanding the relevance of palsas – I suggest this is moved earlier in the introduction when palsas are defined

Ln 60-61: ‘in-situ measured data’ or ‘observations’ need to be clarified. Is this temperature data? Snow depth? Other?

Ln 78-79: Wording is unclear – ‘…limits information value of satellite data…’

Ln 81:’Another’ should start a new paragraph – this section is also very short relative to the prominent role that machine learning plays in the paper. I suggest adding more detail.

Ln 90: ‘…test methods for generating detailed snow distribution maps..” should lead this section. The objectives need to be clearly stated up front

Data and Methods
Ln 141: a comprehensive dataset of what? Specify

Figure 3: Should clearly state the actual observations that were collected

Ln 151: This is the first time LiDAR is mentioned. Needs to be introduced within the introduction

Ln 157-160: very confusing. No SfM, but then orthophotos were created? That relies on photogrammetry -- but then you state point cloud densities. Are these associated with lidar or RGB orthophotos? If lidar, need to put it right after the lidar. Also, should report density per square meter as it is the standard.

Ln 162-164: Revise sentence structure for clarity

Ln 166: should be ‘by an RTK GPS system’

Ln 170: word choice - ‘optimal’

Ln 172: The sampling strategy is claimed to be randomized, though it appears observations were collected along transects with some random points. Some of these could be biased, so it would be useful to add a bit more description. There are also areas with clear gaps

Related to the previous point, the distribution of snow depth observations included in the appendix should be split by site (in my opinion)

Ln 176: It isn’t easy to make out any snow-free areas on the palsas in the imagery – can these be indicated?

Ln 184-188: To clarify, the full training set is based on only 185 observations – but increased due to the buffering? Please indicate how many unique features were actually used to train the model after the buffering. Will help the reader understand the robustness of the model

Ln 193-194: Just state the metrics were normalized 0-1, with the highest output importance set as 1

Ln 205: The removal of elevation as a predictor needs more explanation – the logic that is will ‘reduce possible overfitting’ is not apparent

Ln 206-207: wordy, what is ‘initial minimal impact’?

Ln 211: Unclear how this offers a balanced representation. The idea of taking an area is usually to remove noise, reduce the influence of sampling or geolocation errors, and to grow the training set size (taking groupings of nearby points vs. a single one - which should improve the robustness of the model). Please explain further.

Table 2: Nice table! For features like TPI (which are determined to be very important), you should be more detailed in their definition. More than ‘it combines several topographic features.’ TPI is generally just the relative elevation of a point to surrounding points within some radius (or adjacent pixels)

Ln 238-239: Be careful with wording. Correlation (strength of linear relationship) and significance (based on statistical testing) are not the same thing

Results
Section 4.1: Nice job describing results clearly and sequentially by site.

Figure 5: Nice figure, it would be useful to add annotations for areas of interest referred to in Section 4.1 on the figure (e.g., the collapsed areas)

Ln 252: When stating things like ‘slightly higher,’ specify the magnitude (is this 10cm, 20cm, 5cm?). Same as Ln 257, how much lower?

Ln 272-273: sentence clarity issue

Figure 7: Nice figure! Be sure to add more specifications on the model runs in the methods section

Ln 294: How were they separated into ‘point groups’ used to produce Table 4 – how were the different areas delineated and can these be added to the maps?

Ln 310-313: Correlation analysis results should be included as a table – this could be added to the appendix if the authors do not want to include it in the body of the paper

Discussion
Ln 318-319: Revise based on previous comments

Ln 329: warming and cooling spots need to be defined more before this point. What is a good technical definition? For example, are warming spots where the net heat flux into the ground during the winter is highest - making these areas warmer? vs. Cooling spots, where the net heat flux into the ground is lowest? We need to have a clear and more scientific definition

Ln 345: ‘Cooling spots inhibit a greater active layer thickness in summer’ – is this the technical definition? It comes across as difficult to interpret. An alternative version of this: ‘Cooling spots result in shallower active layers in summer compared to warming spots.’

Figure 8 - Nice figure. The delineations are helpful. Similar delineations would help the interpretation of results in prior figures

Section 5.2: This section should be revised thoroughly – see previous comments on lidar snow depth and RF model comparison

Ln 354: Luo and Panda studies were based on satellite remotely sensed snow cover – not sure I understand the link to UAS-lidar observations. Also, not clear what ‘not in depth post-processed data’ is. I did not understand the transition of the discussion from snow depth to snow cover

Ln 363-364: How did manual probing address the issue of vegetation? The uncertainty in these observations was never discussed

Ln 366-368: This doesn't seem like only a lidar limitation - but a measurement challenge in general. Measuring snow over dense vegetation with air voids, compression, etc.. is always challenging. New approaches to correct the lidar based on the underlying vegetation type/density/height may improve lidar snow depth products.

Ln 375-387: The discussion in this paragraph was strong, and it was easier to follow the logic. This could be an example to use when revising the discussion.

Ln 392, 407-408: Why was vegetation not removed from the summer point cloud? I do not understand why this was done in this manner. This step is critical for snow depth mapping with lidar.

Ln 395-397: There is a growing body of literature on this that would be useful to review. See Buhler 2016, 2017; Adams et al., 2018; Avanzi et al., 2018, Cho et al., 2024 (Preprint), Eker et al. 2019; Harder et al. 2020 (compares lidar and RGB)

Much of the discussion relies on findings from other studies and assumed links to snow depth observed in this study to conclude – not clear to me what value the work presented here has to understanding palsa permafrost dynamics more than point observations on a transect across one of these features would. Related to previous comments on reframing and refocusing the research objectives

Ln 412-414: A fewer number of folds should be used in the model training/validation

Ln 421, 425-426: A large number of input features are used in this model and the results as presented show nearly perfect model performance – are you suggesting others should be included? If others could make the model better, why were they not included?

Ln 423-424: Good point

Once noted challenges throughout are addressed – the discussion should be re-written to align with the updated manuscript

Conclusions
As presented, the paper is focused on the evaluation of the methods for snow depth mapping and on the predictors that control the depth distribution -- discussion into the influence of these characteristics on the thermal profiles is purely assumption based -- thus reframing the conclusion in line with the revised paper and the actual results/data presented will be critical in the revised version.
Citation: https://doi.org/10.5194/egusphere-2024-2862-RC2
- AC2: 'Reply on RC2', Alexander Störmer, 28 Jan 2025
  
  Please find our response to all comments by Reviewer 2 in the attached PDF.
  
  Citation: https://doi.org/10.5194/egusphere-2024-2862-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Reconsider after major revisions (further review by editor and referees) (10 Feb 2025) by S. McKenzie Skiles

AR by Alexander Störmer on behalf of the Authors (24 Mar 2025) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (03 Apr 2025) by S. McKenzie Skiles

RR by Anonymous Referee #2 (17 Apr 2025)

RR by Anonymous Referee #3 (13 May 2025)

ED: Publish subject to minor revisions (review by editor) (15 May 2025) by S. McKenzie Skiles

AR by Alexander Störmer on behalf of the Authors (22 May 2025) Author's response Author's tracked changes Manuscript

ED: Publish as is (09 Jun 2025) by S. McKenzie Skiles

AR by Alexander Störmer on behalf of the Authors (11 Jun 2025)

Journal article(s) based on this preprint

22 Sep 2025

Comparing high-resolution snow mapping approaches in palsa mires: UAS lidar vs. modelling

Alexander Störmer, Timo Kumpula, Miguel Villoslada, Pasi Korpelainen, Henning Schumacher, and Benjamin Burkhard

The Cryosphere, 19, 3949–3970, https://doi.org/10.5194/tc-19-3949-2025,https://doi.org/10.5194/tc-19-3949-2025, 2025

Short summary

Alexander Störmer, Timo Kumpula, Miguel Villoslada, Pasi Korpelainen, Henning Schumacher, and Benjamin Burkhard

Viewed

Total article views: 880 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
634	124	122	880	15	31

HTML: 634
PDF: 124
XML: 122
Total: 880
BibTeX: 15
EndNote: 31

Views and downloads (calculated since 08 Nov 2024)

Month	HTML	PDF	XML	Total
Nov 2024	100	32	5	137
Dec 2024	57	24	2	83
Jan 2025	39	13	3	55
Feb 2025	17	5	0	22
Mar 2025	17	4	1	22
Apr 2025	18	8	38	64
May 2025	27	11	48	86
Jun 2025	29	9	23	61
Jul 2025	20	6	1	27
Aug 2025	122	10	0	132
Sep 2025	188	2	1	191

Cumulative views and downloads (calculated since 08 Nov 2024)

Month	HTML	PDF	XML	Total
Nov 2024	100	32	5	137
Dec 2024	57	24	2	83
Jan 2025	39	13	3	55
Feb 2025	17	5	0	22
Mar 2025	17	4	1	22
Apr 2025	18	8	38	64
May 2025	27	11	48	86
Jun 2025	29	9	23	61
Jul 2025	20	6	1	27
Aug 2025	122	10	0	132
Sep 2025	188	2	1	191

Viewed (geographical distribution)

Total article views: 914 (including HTML, PDF, and XML) Thereof 914 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 22 Sep 2025

Download

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Preprint (15999 KB)
Metadata XML

Short summary

Snow has a major impact on palsa development, yet understanding its distribution at small scale remains limited. We used LiDAR UAS and ground truth data in combination with machine learning to model snow distribution at three palsa sites. We identified extremes in snow depth corresponding to palsa topography, providing insights into the influence of snow distribution on their formation. The results demonstrate the applicability of machine learning for modeling snow distribution at a small scale.


Total:	0
HTML:	0
PDF:	0
XML:	0

Comparing High-Resolution Snow Mapping Approaches in Palsa Mires: UAS LiDAR vs. Machine Learning

Journal article(s) based on this preprint

Interactive discussion

Interactive discussion

Peer review completion

Suggestions for revision or reasons for rejection

Journal article(s) based on this preprint

Viewed

Viewed (geographical distribution)