Spatial machine learning modelling reveals that soil indicators and tree type best explain shallow landslide release
Abstract. The exploration of shallow landslide susceptibility is often impaired by incomplete and imperfect landslide inventories, and by over-optimistic performance metrics linked to inadequate models. In this article, we make use of a recently published, systematically mapped event inventory containing 571 shallow landslides triggered in southern Norway and apply a total of 32 gradient boosted decision tree models to rigorously test the effects of (1) a nested vs. simple cross-validation strategy, (2) spatial vs. non-spatial models, (3) four different cross-validation sampling strategies which were applied on (4) full vs. forest-only datasets. Model evaluation shows that the spatial model with small block nested cross-validation is a suitable compromise between modelling the spatial structure of the landslide data adequately while retaining realistic predictive power. The compromise models suggest that soil thickness explains landslide probability partly, while other important explanatory factors like elevation, aspect and bedrock weatherability serve as soil indicators, illustrating a need for improved datasets for soil thickness and heterogeneity. In the forest-only models, tree type explains landslide probability best, driven by greater landslide susceptibility in deciduous forest. Presented results suggest that susceptibility mapping may be improved significantly by considering forest variables and forest-specific threshold values.
This paper evaluates the efficacy of various cross-validation partitioning methods in capturing spatial dependencies during hyperparameter tuning for landslide spatial prediction (susceptibility). Additionally, it examines the impact of integrating gradient-boosted decision trees with mixed-effect modeling to account for spatial non-stationarity in landslide susceptibility predictions. Using this approach, the authors also explore the influence of forest properties on the spatial prediction of landslide occurrence during a major 2023 rainfall event in southern Norway. SHAP values were used to explore the relative magnitude and rank of variable importance when comparing models with different spatial dependency structures and tuning approaches. Partial dependence plots of SHAP values were used to examine general landslide response–predictor trends.
Overall, there are some very good methods and process-based findings in this paper that would serve the landslide susceptibility modelling community quite well:
Methods:
Process:
The authors should be cautious not to over-interpret results beyond what their methods can strictly support, in particular how the number of predictors and correlations between them may influence the variable importance results. Furthermore, the current structure of the Results and Discussion sections makes the narrative difficult to follow. Because interpretations are interspersed throughout both sections, it is challenging to identify coherent summaries of the data or clear trends in the authors' explanations.
Major Comments
A major limitation of this paper is the authors’ tendency to overinterpret results beyond what their methods support. For example, variable importance ranking is highly sensitive to collinearity between predictors. The authors often provide conjectures on why variables, like elevation, are important without accounting for possible correlations (e.g. with precipitation and land use).
Adding a correlation analysis of the predictor variables could strengthen the findings or provide further explanation for why variable rankings differ between model setups. For example, I would guess the importance rankings of elevation and precipitation are flipped between the spatial and non-spatial models because the two predictors are highly correlated.
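Such a check is inexpensive to run. A minimal sketch using Spearman rank correlation, with hypothetical predictor samples standing in for the paper's actual raster values (the constructed elevation–precipitation relationship is an assumption for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical predictor samples standing in for the study's rasters;
# precipitation is deliberately constructed to co-vary with elevation
# (an orographic effect), as suspected in the comment above.
rng = np.random.default_rng(42)
n = 571  # number of mapped landslides in the inventory
elevation = rng.uniform(0.0, 1500.0, n)
precipitation = 50.0 + 0.05 * elevation + rng.normal(0.0, 10.0, n)
soil_thickness = rng.uniform(0.1, 2.0, n)

predictors = pd.DataFrame({
    "elevation": elevation,
    "precipitation": precipitation,
    "soil_thickness": soil_thickness,
})

# Spearman rank correlation is robust to monotonic, nonlinear relations,
# which are common between terrain and climate variables.
corr = predictors.corr(method="spearman")
print(corr.round(2))
```

Predictor pairs with |rho| above roughly 0.7 would flag candidates whose SHAP importance may be split or swapped between model setups.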
Also, adding a figure/map illustrating what the different spatial partitioning approaches look like when applied to the landslide data would help in interpreting the results, in particular to help determine whether some spatial partitioning approaches favour a more ‘locally’ (vs. globally) tuned model due to their spatially coherent groups.
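One way to produce such a figure is to assign each point to a grid block and colour the points by fold. A minimal sketch of the block-to-fold assignment, using synthetic coordinates and an assumed 20 km block size (the actual study-area geometry and partitioning will differ):

```python
import numpy as np

# Synthetic point coordinates (metres) standing in for the landslide
# and control points; the real study area geometry will differ.
rng = np.random.default_rng(0)
n = 571
x = rng.uniform(0.0, 100_000.0, n)
y = rng.uniform(0.0, 100_000.0, n)

block_size = 20_000.0  # assumed 20 km blocks
n_folds = 5

# Index each point by its grid block, then map blocks to CV folds.
bx = (x // block_size).astype(int)
by = (y // block_size).astype(int)
block_id = bx * 1000 + by          # unique id per block
fold = block_id % n_folds          # simple block-to-fold mapping

for f in range(n_folds):
    print(f"fold {f}: {np.sum(fold == f)} points")
```

Plotting x and y coloured by `fold` would show the spatially coherent groups directly; such folds force the tuned hyperparameters to generalize across, rather than within, blocks.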
Additionally, the interpretation of the results relies heavily on process-based knowledge. I think the findings would benefit greatly from a discussion that considers the spatial heterogeneity (structure) of the predictors, their potential to differ across the areas where landslides occur, and how these are captured by the different spatial partitioning approaches and modelling methods.
Minor Comments
L15. Context is needed in the abstract for why a forest-only dataset is used; it is currently missing.
L46. Be more explicit about what is meant by data-driven. E.g. a physical model can be data-driven if observational data is used for its tuning. Perhaps “empirical models” could be used instead.
L58. “almost perfect levels of landslide prediction accuracy” – I would argue there is no perfect prediction accuracy 😊. Perhaps rephrase to say these metrics are highly overoptimistic.
L62. “Machine learning” is too general here, since a decision tree structure is mentioned later in the sentence; SVMs don’t have decision trees.
L75. Brenning 2012 in IEEE IGRSS missing from references (original spatial CV paper).
L83. May want to look at Schratz et al 2019 in Ecological Modelling to add to the discussion – they similarly explore spatial partitioning schemes on hyperparameter tuning for predictions with spatial data.
L84. “The majority of recent .. studies appear to apply random or no CV”. The authors cannot state this without citing a review paper or conducting such a review themselves.
L105. The authors state they didn’t create a susceptibility prediction map because the data are event-specific. I suggest they add one to qualitatively assess the geomorphological plausibility of the prediction results, which could provide further evidence to support one method over another.
L115. Goetz et al 2015 in NHESS also assess (regionally) the influence of forest structure on landslide spatial prediction (beyond landuse).
L144-147. I would suggest removing this section. It is vague and doesn’t help explain the methods used.
L149. “The statistical and machine learning” – only ML methods were applied in this analysis, not statistical ones.
L160. “aim to establish causal relationships”, SHAP values used in the analysis represent contribution, not causation.
L176. Reference Fig. 7 here to point the reader to what this sampling of control points looks like.
L.333. A figure/map of what the sampling partition groups look like when applied to the landslide and control data would help the reader better understand the results.
L400. Figure 3. If making model comparisons, the x-axis range should be the same across the SHAP value summary plots – e.g. -0.4 to 0.75. This makes it easier for the reader to compare panels.
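A shared range can be computed once across all models and applied to every panel. A sketch with hypothetical SHAP arrays (the model names, array shapes, and padding factor are all assumptions for illustration):

```python
import numpy as np

# Hypothetical SHAP value arrays, one (samples x features) array per model.
rng = np.random.default_rng(7)
shap_by_model = {
    "spatial": rng.normal(0.0, 0.15, (300, 8)),
    "non_spatial": rng.normal(0.05, 0.20, (300, 8)),
}

# One shared x-range across all summary plots, padded slightly,
# so the panels are directly comparable.
lo = min(v.min() for v in shap_by_model.values())
hi = max(v.max() for v in shap_by_model.values())
pad = 0.05 * (hi - lo)
xlim = (lo - pad, hi + pad)
print(xlim)
```

Each matplotlib axis in the comparison figure would then receive `ax.set_xlim(*xlim)`.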
L.562. “Shallow landslides did not occur in areas with highest rainfall intensities” – the authors interpret this as possibly unreliable data. It is more likely that higher rainfall intensity does not equal higher landslide frequency; most landslides will be triggered at thresholds below the maximum rainfall intensity.
L575. Can bedrock really explain why landslides were triggered in southeastern areas, given that the SHAP value for bedrock was ~0?
L585-588. Too much conjecture. It is hard to explain how the spatial patterns of rainfall act as spatial structure proxies in non-spatial models, especially without seeing what the spatial partitions for model tuning look like. It could be that the non-spatial model relied on more global patterns, vs. the spatial methods that may have relied on more local patterns for model tuning (difficult to comment without seeing what the partitions look like).
L603. “flow accumulation threshold of 5000 m2” – we cannot make this statement from the plots; we can say there is a trend around that value based on the modelled data, but given the many predictors and likely high correlations, we have to be cautious not to over-interpret the “bumps” in the partial dependence plots.
L620-625. The authors may be misinterpreting the results. They suggest the meteorological, vegetation and soil data may be too coarse to be informative. That might be true, but these variables could also be highly correlated with elevation, tree cover or each other, and thus receive a lower ranking, or the correlated parameters may be splitting the importance. It is hard to tell without a correlation analysis.
L716. Lower ranking of other tree properties may be because they are highly correlated with tree type.
L784. “We have shown that ML results are highly sensitive” – be specific here. Only gradient boosting was assessed, so one cannot generalize that all ML methods are sensitive to CV techniques and spatial modelling choices.