This work is distributed under the Creative Commons Attribution 4.0 License.
Comparing High-Resolution Snow Mapping Approaches in Palsa Mires: UAS LiDAR vs. Machine Learning
Abstract. Snow cover plays an important role in permafrost processes and dynamics, exerting both cooling and warming effects that influence the aggradation and degradation of frozen ground. Despite theoretical, experimental, and remote sensing-based research, a comprehensive understanding of small-scale snow distribution on palsas remains limited. This study compares two approaches to generating spatially continuous, small-scale snow distribution models in palsa mires in northwestern Finland based on Digital Surface Models: a machine learning approach using the Random Forest algorithm with in-situ measured snow depth data, and an Unmanned Aerial System (UAS) equipped with a Light Detection and Ranging (LiDAR) sensor. For the first time, snow distribution was recorded over a palsa using a UAS. The aim is to assess which approach is more precise overall and which areas are not represented sufficiently accurately. Compared to in-situ collected validation data, the machine learning results showed high accuracy, with an RMSE of 6.16 cm and an R² of 0.98, outperforming the LiDAR-based approach, which had an RMSE of 26.73 cm and an R² of 0.59. Random Forest models snow distribution significantly better on steep slopes and in vegetated areas. This considerable difference highlights the ability of machine learning to capture fine-scale snow distribution patterns in detail. However, our results indicate that UAS data also enable the study of snow and permafrost interaction at a highly detailed level.
In general, snow accumulation zones are recognizable especially at the steep edges of the palsas and inside cracks, while thin snow cover occurs in exposed areas on top of the palsas. Correspondingly, areas with thicker snow cover at the edges and inside cracks act as potential warming spots, possibly leading to heavy degradation including block erosion. In contrast, areas with thinner snow cover on the exposed crown parts can act as cooling spots. They initially stabilize the frozen core under the crown parts, but then form steep edges and expose the frozen core, finally leading to even more block erosion and degradation.
Status: open (until 20 Dec 2024)
RC1: 'Comment on egusphere-2024-2862', Anonymous Referee #1, 03 Dec 2024
The paper “Comparing High-Resolution Snow Mapping Approaches in Palsa Mires: UAS Lidar vs Machine Learning” by A. Störmer et al. aims to quantify the accuracy and efficiency of mapping snow depth over three palsas in northern Finland in a spatially continuous, raster-based map. Specifically, they compare two methods: 1) using a Lidar sensor on a drone with two acquisition dates (snow-free and snow-covered), and 2) modelling snow depth based solely on a digital elevation model using the machine learning algorithm “Random Forest”. In situ data of snow depth are collected and used for training and validation. It is an interesting idea, and the need for mapping snow depth over permafrost features is of great interest. It is also hard work, as noted by the authors in the Discussion, and the contribution of this paper will be of use for those wishing to map snow cover over terrain that has large variations over short distances, such as palsas. The conclusion was that the Random Forest model gave superior results compared to the UAV Lidar. However, I have some major questions about the process and conclusions that must be addressed, as I question the overly optimistic result presented from the Random Forest model. The two larger issues to be addressed are below, followed by general and specific comments.
Larger issues that need to be addressed:
- Why was a Digital Surface Model and not a Digital Terrain Model used to represent the ground in the no-snow data, and how does this affect the snow-depth measurements, and even the topographic derivatives used in the RF Model?
- If the authors used cross-validation and present it as the accuracy of the model, then this result is over-optimistic, and the comparison of UAV Lidar to the Random Forest model is biased and not fair.
In more detail:
1 - Use of a DSM to represent ground level - It appears that the authors have made a Digital Surface Model (DSM) from the Lidar point data to represent the ground, rather than creating a Digital Terrain Model (DTM) from the Lidar data. The DSM represents the height of all objects on the surface, and if there are shrubs on the palsas (which is typically the case in degraded palsas), they may be 35-50 cm tall. Therefore, if a DSM was used to represent the ground in August, while in situ snow-depth measurements were taken from the ground up, the reported snow depth will be highly affected by the height of the vegetation, and this will then vary over the whole surface of the palsa. If the authors have a reason for using a DSM rather than a DTM, it is not clear in the article, and it needs to be motivated. Using a DSM will result in errors in the snow depth measurements as presented. Creating a DTM from your existing data is not difficult. If you look at the paper by Jacobs et al., 2021, you will see references to papers that discuss the potential errors of snow depth measurements when DSMs are used.
In addition, if the DSM was used to calculate the topographic derivatives used as input parameters to the RF model, are these derivatives valid? A sketch of one possible DTM workflow follows below.
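To illustrate how little extra work this would be, here is a minimal sketch of deriving a bare-earth DTM from the existing snow-free point cloud using PDAL's SMRF ground filter; the file names, grid resolution, and default filter settings are illustrative assumptions, not the authors' actual processing chain.

```python
import json
import pdal  # PDAL Python bindings

# Classify ground returns and rasterize only those points, producing a
# bare-earth DTM rather than a first-surface DSM. File names, resolution,
# and default SMRF settings are placeholders for illustration.
pipeline_spec = {
    "pipeline": [
        "palsa_august.laz",                 # hypothetical snow-free point cloud
        {"type": "filters.smrf"},           # Simple Morphological Filter: flags ground as class 2
        {"type": "filters.range",
         "limits": "Classification[2:2]"},  # keep only ground-classified points
        {"type": "writers.gdal",
         "filename": "palsa_august_dtm.tif",
         "resolution": 0.25,                # grid cell size in CRS units
         "output_type": "idw"},             # inverse-distance-weighted interpolation
    ]
}

pipeline = pdal.Pipeline(json.dumps(pipeline_spec))
pipeline.execute()
```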
2 - Cross-validation - As I understand what has been done, the results of snow depth for UAV Lidar and RF modelling have been evaluated differently. In the case of UAV Lidar, the in situ data act as a fully independent data set used for calculating RMSE and the accuracy of the snow-depth measurements. In the case of the RF modelling, the in situ data are used for training of the model, and the validation of the model as presented (see Fig 8) seems to have been made using a 10-fold cross-validation. In any case, the latter means that the data used to create the model are also used to evaluate the model. Cross-validation is never an assessment of the resulting map accuracy but is an assessment of the fit of the model. So it is no surprise that the authors get seemingly much better results for the RF Model – the comparison is biased in favor of the RF Model. Figure 8 shows this clearly, and to me it is misleading. So the conclusion, as in the Results on Line 367/368, that the RF Model is showing its strength without high bias, is in my view not valid.
The only way to fairly compare these two assessments would be to develop a model using in situ data from one palsa, apply that RF model to the other two palsas, and assess the accuracy using the in situ data from those two palsas. Or you could take in situ data from half of each palsa and develop training and accuracy datasets. (Note that taking a random selection of the in situ data for training/accuracy is not optimal, since you will have spatial autocorrelation issues due to the proximity of the points, which is why the previous suggestions are better.) A sketch of such a grouped evaluation follows below.
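As an illustration of the grouped evaluation suggested above, here is a minimal leave-one-palsa-out sketch in scikit-learn (a Python stand-in for the caret workflow used in the paper); the arrays and their sizes are hypothetical placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict

# Placeholders: 12 topographic derivatives per point, snow depth in cm,
# and a palsa ID per point -- replace with the real in situ data.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
y = rng.uniform(0, 120, size=300)
groups = np.repeat([0, 1, 2], 100)

# Each fold trains on two palsas and predicts the third, so no point
# from the evaluated palsa ever enters training.
rf = RandomForestRegressor(n_estimators=500, random_state=42)
y_pred = cross_val_predict(rf, X, y, cv=LeaveOneGroupOut(), groups=groups)

rmse = np.sqrt(mean_squared_error(y, y_pred))
print(f"leave-one-palsa-out RMSE: {rmse:.2f} cm, R2: {r2_score(y, y_pred):.2f}")
```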
Other general comments
The title: Rather than using the term “Machine Learning”, I think it would be better to refer to this as “Modelling”: it does not make sense to me to name the specific algorithm that is used; rather, the point is that you have created a model to predict snow depth.
There have been scientific articles that have mapped snow with UAV Lidar, e.g., Jacobs, J.M. et al., 2021, “Snow depth mapping with unpiloted aerial system lidar observations: a case study in Durham, New Hampshire, United States” in The Cryosphere (https://doi.org/10.5194/tc-15-1485-2021). While this may be the first paper to be published using UAV Lidar for snow on a palsa, I think that the Introduction should review and refer to articles that have applied UAV Lidar mapping of snow over other landscape types.
Section 2.1 is lacking a description of vegetation heights on the palsas.
The following points all refer to Section 3.1 – Data collection
- Did you post-process the UAV Lidar data with RINEX data from a base station? If so, what was the base station (i.e., the source of the RINEX data)?
- Parameters for the UAV flights are needed, e.g., flying altitude; were cross-wise flights used? Knowing the directions of the flight lines is important because there are some Lidar measurements of 0 cm snow depth where the in situ data show 50-60 cm, and this might (possibly?) be explained by not acquiring Lidar data from multiple angles – but I am not sure what has been done.
- Line 151/152 says that GCPs were set out. Was this for both the Lidar and the RGB images? How many GCPs? And then, what was the horizontal and vertical accuracy of your data – both the Lidar and the RGB images?
- Line 153 – Change orthopictures to images, since the raw images are not orthorectified yet. That’s a later step.
- Line 157/158 “Structure from Motion techniques were not applied…” I do not understand why this sentence is here. If you created an orthophoto, which you say you did in the next sentence, then you have applied photogrammetric image matching (how you define SfM, and whether you define it differently from photogrammetric image matching, determines which term to use). But why even say what you have not done? State what you have done to produce the orthophoto.
- Line 164 – I think you mean snow depth rather than snow cover.
- Line 166 – RTK-GPS.
- It says on line 173 that there are randomized points on the edges of Puolikkoniva, but I do not see very many of these (maybe 5 at most?). In hindsight, I would guess that you would want to have made cross-wise transects on this palsa. Take this up in the Discussion if so.
Reference (in situ) data
- I think you need a separate section to describe reference data collection – either two sub-sections under 3.1, or else 3.1 for UAS data collection and 3.2 for reference data collection. Under the reference data collection, there should be a better description of how the in situ snow depth measurements were made; specifically, was the GPS Z-measurement made from the ground level? Was a yardstick used, and was a level used to make sure it was normal to the surface?
- For the in situ data, you need at some point to say that these also may have errors, what these errors may be caused by, and how they may affect your results. Since the RF model is completely based on the in situ data, the errors of the in situ data are simply propagated but do not affect the evaluation. For validating the Lidar-derived snow depths, the potential measurement errors of the in situ data are only accounted for in the evaluation.
- Also, think about whether the section on UAS data collection is only about data collection or if you want to describe the processing of the data here – in which case you might just name it “UAS data” or “UAS data collection and processing”.
- The in situ data, particularly in the case of the largest palsa, Puolikkoniva, were collected in two transects lengthwise along the palsa, but not crosswise over the edges, where the deepest accumulation of snow may have been. Therefore the values where some of the largest differences occur between the Lidar and the RF Model cannot really be assessed, making the assessment incomplete – this shortcoming must be acknowledged.
Also, the Lidar may capture extremes in snow depth, while the model will not if it does not have representative training data for the extremes. Therefore there will be more variability in the Lidar data, but we cannot tell which is “wrong”.
Section 3.2 – RF algorithm
- The authors state on Line 189 that no explicit hyperparameters were specified. This means that they were not analyzed, although the outcome of the model is what is being assessed as the main objective of the article. It is not difficult to assess the hyperparameters using a grid search or another comparable method, as sketched below.
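To make the suggestion concrete, here is a minimal grid-search sketch with scikit-learn's GridSearchCV (again a Python stand-in for caret's tuning facilities); the parameter grid values are illustrative, not recommendations.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Illustrative grid over the RF hyperparameters that typically matter most
# for regression; the candidate values are placeholders.
param_grid = {
    "n_estimators": [250, 500, 1000],
    "max_features": [0.33, 0.5, 1.0],  # fraction of predictors tried per split
    "min_samples_leaf": [1, 3, 5],
    "max_depth": [None, 10, 20],       # limiting depth also curbs over-fitting
}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X, y)  # X, y as in the grouped-CV sketch above
print(search.best_params_, -search.best_score_)
```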
- Permutation mode was used for variable importance – do you know how this works? Is it a single run of the RF model? When you run PI repeatedly, do the same variables have the same importance? The random nature of RF often requires running variable importance (or in this case PI) many times (e.g., 100) and taking an average, as sketched below. Even then, one needs to be careful with the interpretation of variable importance.
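A minimal sketch of repeated permutation importance with scikit-learn; n_repeats controls how many times each predictor is permuted, and ideally the evaluation would use held-out data rather than the training set.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Permute each predictor 100 times and report mean +/- standard deviation,
# so run-to-run randomness does not dominate the ranking. X, y as above.
rf = RandomForestRegressor(n_estimators=500, random_state=42).fit(X, y)
result = permutation_importance(rf, X, y, n_repeats=100, random_state=42)

for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```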
- For Line 187-188 - I am not really sure what you have done with the model and the in situ data. You state that you have split the data 70% training and 30% test. Is this used by RF for internal cross-validation of the model? (If you split the data 70/30 in the RF model, then it is likely this is how it is being used.) Is this done with replacement? If you have removed 30% of the data for independent evaluation, then you need to state this clearly, but I do not think this is what you have done.
- Line 184 – The dependent variable for your model is snow depth.
- Line 185 – “Input parameters” are mentioned here but we don’t know what they are until later. Couldn’t you refer to Table 2 here? Otherwise we are left wondering what the parameters are.
- Line 189 – delete “precise” – This is a judgmental word – leave it to your results to be the judge of that.
- In addition, RF models are sensitive to imbalance in the training data, and they also do not extrapolate beyond the minimum and maximum snow-depth values (or whatever the target variable may be); see the toy demonstration below. How are your results affected by this, how might others be affected by it in the future, and what would your recommendations be for future applications of this method?
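A toy demonstration of the no-extrapolation point: a random forest trained on target values between 20 and 80 cm cannot predict outside that range, so drifts deeper than the deepest probed depth are necessarily missed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train on an identity mapping restricted to 20-80 cm, then predict
# outside that range: predictions saturate at the training extremes.
rng = np.random.default_rng(0)
x_train = rng.uniform(20, 80, size=(500, 1))
y_train = x_train.ravel()

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(x_train, y_train)
print(rf.predict(np.array([[0.0], [50.0], [120.0]])))
# ~[20, 50, 80]: inputs below 20 or above 80 are clamped to the training range
```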
Section 3.3 –
- The first sentence needs rewriting. First of all, which “collected airborne data” is referred to here? I assume it was the August DSM from Lidar that was used, but it is not stated. Were these data processed differently from what was described in Section 3.1? Declare which DEM you are working with and say specifically that you are creating parameters from it. What happens if you use a DSM and create all of these topographic derivatives as parameters? Are those new derivatives valid, such as the Topographic Wetness Index, if they are based on a surface elevation that includes vegetation? This must be well-motivated if the authors believe that there is a valid reason for it.
- Line 210 – If a 0.3 m buffer was used, were the values for any parameters averaged within this area?
- Table 2 – 12 parameters were used, but 21 are in the table. Could you indicate which parameters were used?
For the Discussion: When you made the in situ measurements, it was August, and the palsa had likely subsided. Renette et al., 2024 show that the difference in elevation between September (likely maximum thaw depth of the active layer) and April (minimum thaw) was on average 15 cm, and up to 30 cm in some areas, albeit on a taller palsa than in the study presented here. In any case, this may mean that trying to measure snow depth using a DTM from September may introduce errors if the terrain is actually elevated some centimeters higher. This is a hard issue to solve with UAV Lidar, since you would need to be in place to create a DTM right after snow-melt, and all snow would need to have melted. So, you need to discuss what implications this has for your results. Also, since you have RTK-GPS data and, I assume, you have measured to the ground, you actually have a dataset where you could compare the Z-measurement from March to the DTM from August and get an estimate of the difference in height between the max-thaw and min-thaw state of the palsa; a sketch of this comparison follows below.
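A minimal sketch of that comparison, sampling the August DTM at the March RTK-GPS point locations with rasterio; the file name, coordinates, and elevations are hypothetical placeholders.

```python
import numpy as np
import rasterio

# Placeholders: March RTK-GPS ground points (x, y in the DTM's CRS) and
# their measured ground elevations in metres.
points = [(487512.3, 7655021.8), (487530.1, 7655040.2)]
z_march = np.array([3.42, 3.18])

# Sample the August bare-earth DTM at those locations and difference the
# elevations; positive values mean the surface was higher (heaved) in March.
with rasterio.open("palsa_august_dtm.tif") as dtm:
    z_august = np.array([val[0] for val in dtm.sample(points)])

heave = z_march - z_august
print(f"mean March-August surface difference: {np.nanmean(heave):.2f} m")
```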
Language
It is my feeling that some value-judgement words do not belong in a scientific article, such as “exemplarily” on line 53.
Line 38 – deepening instead of growth. Line 58 – deeper instead of higher.
Otherwise, some minor grammatical fixes can be looked over once the paper is revised.
Specific comments
Line 35 – it is not only bound by peatland presence but also by climatic parameters.
Line 69 – “Satellite data” only names the platform. What kind of satellite data are you referring to? Optical? Radar? That is the more important aspect. A similar issue is on line 74, where the sensor type should be mentioned and not just the platform, which is UAS/UAV. Look through your paper for these kinds of omissions.
Line 70 – change technical limitations to properties
Line 86 – the authors mention 3 methods, but the title takes up two. The third method seems to be the in situ data, but that has been used to train the RF Model, and I do not think you are really assessing the accuracy of that method, so I would stick to the two methods.
Line 89 – delete simulation. You are just modelling.
Table 1 – the photos are rather small. Can they be made bigger? Put the date (day-month-year) of the photos in the Table text.
Line 129 – For what year or years is that the annual mean temperature?
Line 137 – For what location is that the duration of permanent snow cover?
Figure 2 – What is shown in Fig 2? It needs to be said clearly in the Fig text. Is this an average value for 1990-2020? It would be very helpful to know what the climate conditions were for the years in which you acquired the snow data. Was it a very snowy year? Windy in the days before you visited? Warm temperatures so that the snow melted some? Knowing these conditions can help us to explain any differences between the various results, particularly if the model is solely based on the DEM. I see you mention this on Line 401/402.
Line 141 – Write which day the data were acquired. If you cannot fit it reasonably in the text, because it was different dates for different palsas, I suggest you put it in Table 1 – dates for image and Lidar acquisition.
Several of the Figures have text so small that it is difficult to read, e.g., Fig 3.
Section 3 – Is August the season for maximum thaw? It’s not September? Does Verdonen et al. 2023 state that August is the max ALT? If it is August, I think you should more specifically say the end of August. If you aren’t sure or don’t have a reference to back it up, then maybe it is more reasonable to say that the end of August is near max ALT.
Lines 231–240 feel like they belong in the section describing the RF model.
Line 231/232 – Was the 10-fold cross-validation done when creating the initial RF model, or was this something that was done afterwards and used as the “validation” data presented in Figure 8? If it is the latter, you cannot say that it was used to reduce over-fitting in the model. There is an option in Random Forest to use cross-validation to create the model, and that is one tool of several to reduce over-fitting. Another way to reduce over-fitting is to limit tree depth – by the way, in Section 3.2 you mention target node depth, but I do not see what that refers to in the caret package. Is it “maxdepth”? In that case I suggest you name the parameter in parentheses.
Line 236/237 – What are “the initially calculated values”? You are using the in situ data to train a RF model and then evaluating the model based on a cross-validation that uses that same in situ data. See my point #2 under “Larger issues”.
Line 273/274 – “Only a few narrow structures with significantly higher snow can be recognized based on the UAS LiDAR data” – I do not know what this sentence is about.
Line 281 and Fig 7 and Table 3 – I don’t think we need to see all 3 model runs, just the best one.
Line 285 – It is rather confusing that Elevation is stated to have been removed, yet now it is important. Also, the Fig 7 text is impossible to read because it is so small.
Line 295 and Table 4 – for these areas of “Top”, etc., could you include a figure somewhere – maybe supplemental – where these areas are shown? Do we know the number of samples (n) in each group?
Line 323 also Line 346 – Fig 9?
Figure 9 – Is B (Slope in degrees) based on the DSM? Is it then valid to calculate slope based on vegetation?
Line 404/405 – I guess you are referring to reflectance of the Lidar from the snow/ice surface? If so, I think you should have a reference here.
Citation: https://doi.org/10.5194/egusphere-2024-2862-RC1