Multi-scale spatial validation and probability calibration of pixel-based landslide susceptibility modeling in the northern Peruvian Andes
Abstract. Landslides are recurrent geohazards in Andean regions, causing significant impacts on infrastructure and local communities. In spatially structured terrains, model reliability hinges on the definition of pseudo-absence samples and the treatment of spatial dependence during validation. This study evaluates pixel-based rotational landslide susceptibility in the province of Huancabamba (Piura, northern Peru) using a Random Forest classifier and seven conditioning factors derived from a photogrammetric digital elevation model and lithological data at 10 m resolution.
The landslide inventory consists of 25 field-mapped rotational landslides compiled from geomorphological surveys and high-resolution photogrammetric products. Pseudo-absence samples were selected outside mapped polygons using a buffered exclusion zone to reduce label uncertainty, and a balanced sampling scheme (1:1) was adopted. To obtain spatially realistic performance estimates, model evaluation was conducted using spatial block cross-validation with block sizes ranging from 600 to 1500 m. This design shows how spatial partitioning affects discrimination and calibration, as well as the model's stability across validation folds.
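For reference, the blocked fold assignment described here can be sketched in a few lines. This is an illustrative, pure-NumPy sketch of the general technique, not the authors' implementation; all function and variable names are my own:

```python
import numpy as np

def block_folds(x, y, block_size, n_folds=5, seed=0):
    """Assign each pixel to a square spatial block of side `block_size`
    (map units) and allocate whole blocks to folds, so that nearby pixels
    are never split between training and validation within a fold."""
    bx = np.floor(np.asarray(x) / block_size).astype(int)
    by = np.floor(np.asarray(y) / block_size).astype(int)
    block_id = bx * 100000 + by  # unique per grid cell for non-negative indices
    blocks = np.unique(block_id)
    rng = np.random.default_rng(seed)
    fold_of_block = rng.integers(0, n_folds, size=blocks.size)
    lookup = dict(zip(blocks.tolist(), fold_of_block.tolist()))
    return np.array([lookup[b] for b in block_id])

# Toy coordinates in metres; 900 m mirrors the block size adopted in the paper.
x = [10.0, 550.0, 1200.0, 1250.0]
y = [20.0, 580.0, 1300.0, 1350.0]
folds = block_folds(x, y, block_size=900.0)
# Pixels falling in the same 900 m block always share a fold.
```

Reporting such a fold-assignment rule explicitly would make the validation reproducible.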
Results show that discrimination performance decreases systematically as spatial block size increases, indicating that conventional random validation may overestimate predictive capacity due to spatial autocorrelation. A block size of 900 m provided a compromise between spatial independence and fold stability. Permutation importance computed under spatially independent folds identified lithology and elevation as the dominant predictors of rotational landslide occurrence, followed by aspect and topographic wetness index. Calibration metrics (Brier score and Expected Calibration Error) indicated moderate but stable reliability of susceptibility scores across spatial configurations.
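Since the Brier score and Expected Calibration Error are named but not formalised here, I note for completeness how I understand them. A minimal NumPy sketch under the assumption of equal-width probability bins for the ECE (the manuscript should state its actual binning choice):

```python
import numpy as np

def brier_score(p, y):
    """Mean squared difference between predicted probability and outcome."""
    return float(np.mean((p - y) ** 2))

def expected_calibration_error(p, y, n_bins=10):
    """Bin predictions into equal-width probability bins, compare the mean
    predicted probability with the observed frequency in each bin, and
    weight the gaps by bin occupancy."""
    bins = np.clip((p * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(p[mask].mean() - y[mask].mean())
            ece += mask.mean() * gap
    return float(ece)

p = np.array([0.9, 0.8, 0.2, 0.1])  # susceptibility scores
y = np.array([1, 1, 0, 0])          # presence / pseudo-absence labels
bs = brier_score(p, y)
ece = expected_calibration_error(p, y, n_bins=5)
```

Both metrics should be computed on the held-out spatial folds, not on the training data, for the reported values to be meaningful.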
The resulting susceptibility map shows spatial patterns consistent with the geomorphological setting and the mapped inventory, with high susceptibility concentrated in steep slopes developed over weak lithological units. These findings indicate that integrating spatial validation, calibration, and constrained sampling improves the reliability of pixel-based modelling in this Andean setting.
The paper assesses landslide susceptibility based on 25 rotational landslides in a relatively small study area (~32 km²) in northern Peru using a Random Forest model, with a focus on pseudo-absence selection and spatial validation. By applying and interpreting spatial block cross-validation, the author infers that conventional validation likely overestimates model performance due to spatial autocorrelation. Based on permutation-based feature importance, lithology and elevation were identified as the main drivers of slope instability. The study concludes that model performance strongly depends on the chosen validation approach, with larger validation blocks leading to lower but more realistic performance estimates (Line 622f) “highlighting the influence of spatial autocorrelation on apparent predictive capacity”. The author finally concludes that (Line 644f) “The modelling approach developed in this study is particularly suited to rotational landslides under similar geomorphological conditions. Extension to other mass movement types would require process-tailored sampling strategies and predictor selection.”
While the manuscript addresses relevant aspects of data-driven landslide susceptibility modelling, I am not fully convinced that it is ready for publication. The scope is relatively narrow, focusing on a small number of landslides (n = 25) within a limited spatial extent. In addition, several methodological decisions remain insufficiently justified or unclear. Most importantly, the study offers limited conceptual or methodological advancement, as Random Forest-based susceptibility modelling and spatial validation strategies are already established in the literature. Overall, I consider the manuscript to be at the borderline between rejection and major revision. I formally recommend major revisions, although addressing the key concerns outlined below may require substantial changes that could alter the scope and structure of the study.
Details:
1. Sampling strategy and spatial autocorrelation
The effective sample size appears to be artificially inflated (approximately 30,000 observations within ~32 km²). From the text I understand that multiple pixels from the same landslide were treated as independent presence observations. If so, this may introduce another dimension of spatial dependence and could undermine the study's central aim of addressing spatial autocorrelation. Additionally, this approach implicitly assigns (undesired?) greater weight to larger landslides, as they contribute more pixels to the model (even though magnitude/size is not part of conventional landslide susceptibility modelling).
Alternative strategies could be considered. For example, mixed-effects modelling frameworks allow explicit treatment of hierarchical structures (e.g., pixels nested within landslides via a landslide ID as a random effect). More broadly, if spatial dependence is a core topic in this study, the analysis could go beyond spatial block validation alone. There are even approaches such as spatially explicit hyperparameter tuning that tackle the topic not only during validation.
Schratz, P., Muenchow, J., Iturritxa, E., Richter, J., and Brenning, A.: Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data, Ecological Modelling, 406, 109–120, https://doi.org/10.1016/j.ecolmodel.2019.06.002, 2019.
Schlögl, M., Spiekermann, R., and Steger, S.: Towards a holistic assessment of landslide susceptibility models: insights from the Central Eastern Alps, Environ Earth Sci, 84, 113, https://doi.org/10.1007/s12665-024-12041-y, 2025.
Moreover, several studies in landslide susceptibility modelling advocate using a single representative point per landslide, preferably within the initiation zone, to better capture causal landslide conditions. Based on Figure 1, it appears that both scarp and runout areas were mapped and therefore sampled. Including runout zones may dilute the relationship between predictors and landslide initiation, potentially explaining why slope and other predictors appear unimportant while lithology and elevation dominate. These implications should be more thoroughly considered.
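A minimal illustration of both safeguards raised in this comment: grouping all pixels of a landslide under a shared ID so that whole landslides, not pixels, are assigned to folds, and, alternatively, retaining one representative cell per landslide. This is a sketch with illustrative names; the highest cell is used only as a crude stand-in for the initiation zone and is not a substitute for geomorphological judgment:

```python
import numpy as np

def grouped_folds(landslide_id, n_folds=5, seed=0):
    """Assign every pixel of a landslide to the same fold, so that no
    landslide contributes pixels to both training and validation data."""
    ids = np.unique(landslide_id)
    rng = np.random.default_rng(seed)
    fold_of_id = rng.integers(0, n_folds, size=ids.size)
    lookup = dict(zip(ids.tolist(), fold_of_id.tolist()))
    return np.array([lookup[i] for i in landslide_id])

def one_point_per_landslide(landslide_id, elevation):
    """Alternative: keep a single representative pixel per landslide;
    the highest cell is used here as a crude proxy for the scarp area."""
    keep = []
    for i in np.unique(landslide_id):
        idx = np.where(landslide_id == i)[0]
        keep.append(int(idx[np.argmax(elevation[idx])]))
    return np.array(keep)

lid = np.array([1, 1, 1, 2, 2, 3])                       # landslide IDs per pixel
elev = np.array([2400.0, 2450.0, 2380.0, 3100.0, 3150.0, 2900.0])
folds = grouped_folds(lid)
rep = one_point_per_landslide(lid, elev)
```

Either option would remove the within-landslide pseudo-replication described above; the manuscript should state which, if any, was used.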
2. Missing engagement with spatial cross-validation literature
The manuscript does not sufficiently engage with other literature on spatial cross-validation. Several contributions (e.g., Brenning; Schratz et al.; Schlögl et al.) as well as critical perspectives (e.g., Wadoux et al.) are not adequately discussed. Incorporating these studies would help contextualize the methodological choices and clarify the degree of novelty of this work. It would also allow for a more balanced discussion of when spatial cross-validation is appropriate and what its limitations are.
Wadoux, A. M. J.-C., Heuvelink, G. B. M., de Bruin, S., and Brus, D. J.: Spatial cross-validation is not the right way to evaluate map accuracy, Ecological Modelling, 457, 109692, https://doi.org/10.1016/j.ecolmodel.2021.109692, 2021.
Brenning, A.: Spatial prediction models for landslide hazards: review, comparison and evaluation, Natural Hazards and Earth System Science, 5, 853–862, 2005.
3. Justification of spatial partitioning strategy
The rationale behind the selected spatial block sizes (600–1500 m) remains unclear and should be more explicitly justified. In particular, it is not evident how these scales relate to the spatial characteristics of the mapped landslides or the underlying processes. Additionally, it may be important to clarify whether individual landslides can span multiple spatial blocks. If so, there is a possibility that different parts of the same landslide are included in both the training and validation datasets. This would compromise the independence between training and test data. The manuscript should explicitly address this issue and, if relevant, describe how such cases were handled.
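A simple diagnostic the authors could report is, for each tested block size, the number of landslides whose pixels straddle more than one block. An illustrative sketch (names are my own):

```python
import numpy as np

def landslides_spanning_blocks(x, y, landslide_id, block_size):
    """Return IDs of landslides whose pixels fall into more than one
    spatial block, i.e. candidates for train/test contamination."""
    block = (np.floor(np.asarray(x) / block_size).astype(int) * 100000
             + np.floor(np.asarray(y) / block_size).astype(int))
    landslide_id = np.asarray(landslide_id)
    spanning = []
    for i in np.unique(landslide_id):
        if np.unique(block[landslide_id == i]).size > 1:
            spanning.append(int(i))
    return spanning

# A landslide ~700 m long straddles two 600 m blocks but fits in one 1500 m block.
x = [100.0, 800.0]
y = [50.0, 50.0]
lid = [7, 7]
small = landslides_spanning_blocks(x, y, lid, 600.0)
large = landslides_spanning_blocks(x, y, lid, 1500.0)
```

Given the mean landslide size of ~16.7 ha, spanning is plausible at the smaller block sizes and should be quantified.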
Alternative spatial partitioning strategies are neither tested nor discussed, such as:
• clustering-based approaches (e.g., k-means),
• geomorphologically meaningful units (e.g., catchments),
• lithological or landscape-based stratification.
Acknowledging or comparing such alternatives would strengthen the methodological robustness.
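As an example of the first alternative, a clustering-based partition can be obtained from pixel coordinates alone. A plain Lloyd's k-means sketch, with deterministic seeding for brevity (k-means++ or an established implementation would be preferable in practice):

```python
import numpy as np

def kmeans_partition(coords, k=4, n_iter=20):
    """Partition sample coordinates into k spatially compact clusters with
    plain Lloyd's k-means; the cluster labels can then serve as CV folds."""
    step = max(1, coords.shape[0] // k)
    centers = coords[::step][:k].astype(float)       # evenly spaced seeds
    for _ in range(n_iter):
        # distance of every point to every center, then nearest-center labels
        d = np.linalg.norm(coords[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = coords[labels == j].mean(axis=0)
    return labels

# Four well-separated point clouds; the partition should recover them as folds.
rng = np.random.default_rng(1)
blobs = [rng.normal(c, 50.0, size=(30, 2)) for c in
         [(0.0, 0.0), (5000.0, 0.0), (0.0, 5000.0), (5000.0, 5000.0)]]
coords = np.vstack(blobs)
labels = kmeans_partition(coords, k=4)
```

Unlike a fixed grid, such folds adapt to the spatial distribution of the samples, which may matter in an inventory as clustered as this one.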
4. “Spatial leakage”
The term “spatial leakage” (Line 79) is introduced without a clear definition. While it is often used to describe unintended information transfer between training and validation datasets due to spatial proximity (see also the comments above), the manuscript should explicitly define how the term is used in this study.
5. Process-based considerations
The study area is small but contains relatively large, predominantly rotational landslides (mean size ~16.7 ha), which likely reflect deep-seated processes. In such cases, subsurface conditions (e.g., material properties, hydrology, weak layers) are often key controls but are not well captured by the typical surface predictors used in this study. Furthermore, as also outlined above, the manuscript does not clearly distinguish between initiation and runout zones. Including runout areas in the sampling (especially for such large landslides) may introduce noise, as these areas can occur under very different topographic and lithological conditions than the source zones. This could reduce the model's ability to identify “true” causal factors.
6. Resolution and representation of pre-failure conditions
The use of a 10 m DEM raises concerns when modelling relatively large landslides in this way. Morphometric predictors derived from such data are likely influenced by post-failure topography rather than representing pre-failure conditions. This is particularly relevant for large landslides, where the terrain has been substantially altered. As a result, the model may have limited predictive capability for identifying currently stable but susceptible areas, which is arguably the primary objective of susceptibility modelling. This limitation should be explicitly addressed and discussed in the manuscript.
Steger, S., Schmaltz, E., and Glade, T.: The (f)utility to account for pre-failure topography in data-driven landslide susceptibility modelling, Geomorphology, 354, 107041, https://doi.org/10.1016/j.geomorph.2020.107041, 2020.
Van Den Eeckhaut, M., Vanwalleghem, T., Poesen, J., Govers, G., Verstraeten, G., and Vandekerckhove, L.: Prediction of landslide susceptibility using rare events logistic regression: A case-study in the Flemish Ardennes (Belgium), Geomorphology, 76, 392–410, https://doi.org/10.1016/j.geomorph.2005.12.003, 2006.