How hidden variables limit the performance of shallow landslide susceptibility models
Abstract. Susceptibility mapping is critical in assessing shallow landslide hazard and sediment transport potential. Advancements in modelling techniques and the availability of high-resolution spatial data have continuously improved the performance of landslide susceptibility maps. Nevertheless, discrepancies between predicted susceptibility and observed landslide occurrence remain. In addition to shortcomings in model design and the incompleteness of landslide inventories, the accuracy and transferability of susceptibility models are critically limited by hidden variables, such as site-specific variability in soil development, that control the triggering process but are rarely available in inventories. Here we develop an extensive case study framework and apply it to two uniquely detailed inventories in order to quantify the role of hidden variables, as well as the effects of incomplete landslide inventories. The first inventory is a comprehensive regional dataset containing over 24,000 mapped landslides across 5,939 km², and the second is a field-validated dataset of 734 landslides which includes detailed documentation of hidden variables. We trained two Random Forest machine learning models using a wide range of explanatory variables, including topography, land cover, soil properties, and climate. The first model was optimized for the first dataset and achieved high predictive performance within its training domain (mean cross-validated area under the curve, AUC = 0.89). However, its accuracy decreased significantly (AUC = 0.74) when applied to the second dataset, highlighting limitations in transferability. The second model was optimized for the second dataset (AUC = 0.79). A comparison of the two models revealed that regional climatic and geologic data hindered transferability to remote regions because the relationship between available and hidden variables is not properly captured by the susceptibility model.
We further analysed the predicted susceptibility values as a function of the site-specific information collected in the second database, to quantitatively explore the role of hidden variables. The analysis suggested that variables related to (i) subsurface heterogeneity and (ii) vegetation complexity govern landslide initiation, but are rarely accounted for in susceptibility models. Specifically, the models underestimated susceptibility in poorly developed soils and areas with uniform forest layering. This study underscores the necessity of a process-based understanding grounded in field observations to capture the full complexity of landslide failure mechanisms, relevant to landslide susceptibility modelling.
This study presents an analysis of how certain variables may limit the performance of shallow landslide susceptibility modelling in Switzerland. I find the topic relevant and promising; however, improvements are necessary, particularly in the introduction and methodology sections, which require a more detailed description. Additional comments and suggestions are outlined below:
It would be helpful to provide more background information about the study area. The manuscript barely discusses Switzerland, yet it is important to explain why this country was selected, its historical context regarding landslides, and the current state of landslide inventory mapping and management. This additional context would help readers better understand the relevance and applicability of the study.
The manuscript uses predictor variables with spatial resolutions ranging from 2 m to 1000 m. This considerable difference in spatial resolution raises concerns about the consistency of the input data and its potential impact on the model's performance. The authors should clarify how these datasets were harmonized prior to the analysis (e.g., resampling method and target resolution) and discuss the possible implications of combining variables at such different spatial scales.
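To make the harmonization concern concrete, the following is a minimal sketch of the two usual directions of resampling, assuming square grids whose cell sizes differ by an integer factor. It is not the authors' workflow, which the manuscript does not describe; the function names, the toy grids, and the factor of 2 are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def upsample_nearest(coarse, factor):
    """Repeat each coarse cell `factor` times along both axes (nearest neighbour)."""
    fine = []
    for row in coarse:
        expanded = [value for value in row for _ in range(factor)]
        # Each expanded row is duplicated `factor` times to fill the fine grid.
        fine.extend([list(expanded) for _ in range(factor)])
    return fine


def aggregate_mean(fine, factor):
    """Average non-overlapping factor x factor blocks (fine grid to coarser grid)."""
    n_rows, n_cols = len(fine), len(fine[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for r in range(0, n_rows, factor):
        out_row = []
        for c in range(0, n_cols, factor):
            block = [fine[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            out_row.append(mean(block))
        coarse.append(out_row)
    return coarse


# Hypothetical coarse climate grid (2 x 2 cells) brought onto a 4 x 4 terrain grid.
precip_coarse = [[1000.0, 1200.0],
                 [1400.0, 1600.0]]
precip_fine = upsample_nearest(precip_coarse, factor=2)

# Hypothetical fine slope grid (4 x 4 cells) aggregated to the coarse grid.
slope_fine = [[10.0, 12.0, 30.0, 32.0],
              [14.0, 16.0, 34.0, 36.0],
              [20.0, 20.0, 40.0, 40.0],
              [20.0, 20.0, 40.0, 40.0]]
slope_coarse = aggregate_mean(slope_fine, factor=2)
```

The sketch illustrates why the choice matters: upsampling a coarse layer adds no information, since every fine cell inside one coarse cell receives the same value, while aggregating a fine layer smooths out local extremes that may be relevant to landslide initiation. Stating which direction and method were used for each predictor would allow readers to judge these effects.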