the Creative Commons Attribution 4.0 License.
Multi-scale spatial validation and probability calibration of pixel-based landslide susceptibility modeling in the northern Peruvian Andes
Abstract. Landslides are recurrent geohazards in Andean regions, causing significant impacts on infrastructure and local communities. In spatially structured terrains, model reliability hinges on the definition of pseudo-absence samples and the treatment of spatial dependence during validation. This study evaluates pixel-based rotational landslide susceptibility in the province of Huancabamba (Piura, northern Peru) using a Random Forest classifier and seven conditioning factors derived from a photogrammetric digital elevation model and lithological data at 10 m resolution.
The landslide inventory consists of 25 field-mapped rotational landslides compiled from geomorphological surveys and high-resolution photogrammetric products. Pseudo-absence samples were selected outside mapped polygons using a buffered exclusion zone to reduce label uncertainty, and a balanced sampling scheme (1:1) was adopted. To obtain spatially realistic performance estimates, model evaluation was conducted using spatial block cross-validation with block sizes ranging from 600 to 1500 m. This provides a clear view of how spatial partitioning affects discrimination and calibration, alongside the model's stability throughout the validation folds.
Results show that discrimination performance decreases systematically as spatial block size increases, indicating that conventional random validation may overestimate predictive capacity due to spatial autocorrelation. A block size of 900 m provided a compromise between spatial independence and fold stability. Permutation importance computed under spatially independent folds identified lithology and elevation as the dominant predictors of rotational landslide occurrence, followed by aspect and topographic wetness index. Calibration metrics (Brier score and Expected Calibration Error) indicated moderate but stable reliability of susceptibility scores across spatial configurations.
The resulting susceptibility map shows spatial patterns consistent with the geomorphological setting and the mapped inventory, with high susceptibility concentrated in steep slopes developed over weak lithological units. These findings indicate that integrating spatial validation, calibration, and constrained sampling improves the reliability of pixel-based modelling in this Andean setting.
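For readers unfamiliar with the two calibration metrics named in the abstract, the following is a minimal sketch of how a Brier score and an Expected Calibration Error (ECE) can be computed from binary labels and predicted probabilities. The function names, the use of ten equal-width probability bins, and the toy values are assumptions made for illustration; the abstract does not state the binning scheme used in the study.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def expected_calibration_error(y_true, p_pred, n_bins=10):
    """Weighted mean gap between observed frequency and mean predicted
    probability, computed over equal-width probability bins (an assumption)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / n * abs(observed - predicted)
    return ece


# Hypothetical labels (1 = landslide pixel) and susceptibility scores.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1]
print(brier_score(labels, scores), expected_calibration_error(labels, scores))
```

Lower values of both metrics indicate better agreement between predicted probabilities and observed outcomes. ECE depends on the number and type of bins, so values are only comparable when the binning is held fixed.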
Status: closed
-
RC1: 'Comment on egusphere-2026-1318', Anonymous Referee #1, 09 Apr 2026
-
AC1: 'Reply on RC1', Wendy Quiroz, 02 May 2026
(1) Sampling strategy and spatial autocorrelation
We thank the reviewer for this important and comprehensive comment, which highlights key issues related to pseudo-replication, implicit size-weighting, methodological alternatives, and the representation of landslide processes.
We agree that treating pixels within landslides as independent observations may inflate the effective sample size and introduce within-landslide spatial dependence. This is an inherent limitation of pixel-based susceptibility modelling, and we will explicitly acknowledge this in the revised manuscript. For example, we will include the following clarification:
“Pixel-based sampling assumes independence among observations, although pixels within the same landslide share spatially structured environmental conditions. This may lead to pseudo-replication and should be considered when interpreting model performance.”
To mitigate this issue, we implemented complementary validation strategies that address different aspects of spatial dependence. Spatial block cross-validation introduces geographical separation between folds, reducing local spatial autocorrelation, while Leave-One-Landslide-Out cross-validation (LOLO-CV) evaluates model generalisation across landslides by excluding all pixels from each landslide during testing. While LOLO-CV ensures independence at the landslide level (i.e., no pixels from the test landslide are used in training), spatial dependence between nearby landslides may still influence model performance. In this sense, both validation strategies provide complementary perspectives rather than a single definitive estimate of model performance, which will be discussed in the revised manuscript.
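As an illustration of the leave-one-landslide-out idea described above, the following minimal sketch generates train and test index sets so that every pixel of one landslide is held out at a time. The function name and the toy landslide IDs are hypothetical. How absence pixels are assigned to folds is not stated in this reply, so the sketch covers only the grouping logic for pixels that carry a landslide ID.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices), one split per group.

    All samples that share the held-out group label go to the test set,
    so no pixel of the test landslide is ever seen during training.
    """
    for held_out in sorted(set(groups)):
        test = [i for i, g in enumerate(groups) if g == held_out]
        train = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train, test


# Hypothetical landslide ID for each presence pixel.
landslide_of_pixel = ["L1", "L1", "L2", "L3", "L3", "L3"]
for held_out, train_idx, test_idx in leave_one_group_out(landslide_of_pixel):
    print(held_out, "train:", train_idx, "test:", test_idx)
```

scikit-learn offers the same splitting logic as `LeaveOneGroupOut` in `sklearn.model_selection`; the hand-written version is shown only to make the independence guarantee explicit.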
The LOLO-CV results (mean AUC = 0.861 ± 0.147) indicate that the model is able to generalise to entirely unseen landslides. While predictive performance varies across cases, this variability is interpreted as reflecting geomorphological heterogeneity rather than methodological bias. The comparable magnitude of LOLO-CV and other validation results suggests that pseudo-replication is unlikely to fully account for model performance.
To further assess the impact of sampling strategy, we conducted sensitivity analyses. An equitable sampling scheme (fixed number of pixels per landslide) resulted in reduced performance (AUC ≈ 0.71), indicating that model performance decreases when implicit size-weighting is controlled. This suggests that polygon-based sampling introduces a moderate bias, although it does not fully explain model performance.
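A minimal sketch of such an equitable scheme is given below, under stated assumptions: each landslide contributes at most a fixed number of randomly drawn pixels, so large landslides no longer dominate the presence sample. The function name, the seed, and the per-landslide quota are hypothetical; the reply does not report the number of pixels actually drawn per landslide.

```python
import random


def equitable_sample(landslide_ids, n_per_landslide, seed=0):
    """Return sorted pixel indices with at most n_per_landslide per landslide."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_landslide = {}
    for i, ls in enumerate(landslide_ids):
        by_landslide.setdefault(ls, []).append(i)
    sample = []
    for ls in sorted(by_landslide):
        indices = by_landslide[ls]
        k = min(n_per_landslide, len(indices))  # small landslides keep all pixels
        sample.extend(rng.sample(indices, k))
    return sorted(sample)


# Hypothetical inventory: one large and one small landslide.
ids = ["A"] * 100 + ["B"] * 5
chosen = equitable_sample(ids, n_per_landslide=10)
print(len(chosen), [ids[i] for i in chosen].count("A"))
```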
We also implemented an upper-slope sampling strategy (top 30% of elevation values within each landslide polygon) as a simple elevation-based proxy to reduce the influence of lower-slope and depositional areas. This threshold is heuristic and was used only for sensitivity analysis, not as a primary modelling assumption or as a geomorphological delineation of initiation zones. The results (AUC ≈ 0.80) indicate differences in the geomorphological signal captured by upper-slope versus full-polygon sampling, highlighting the trade-off between process specificity and spatial representativeness.
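The upper-slope selection can be sketched as follows. This is an illustrative reading only: "top 30% of elevation values" is interpreted here as the highest-ranked 30% of pixels within each polygon, which is an assumption (a threshold on the elevation range would be another possible reading). The function and variable names are hypothetical.

```python
def upper_slope_pixels(pixels, top_percent=30):
    """pixels: list of (landslide_id, elevation) tuples.

    Returns the sorted indices of the highest-elevation pixels of each
    landslide, keeping the top `top_percent` by rank (at least one pixel).
    """
    by_landslide = {}
    for i, (ls, z) in enumerate(pixels):
        by_landslide.setdefault(ls, []).append((z, i))
    keep = []
    for members in by_landslide.values():
        members.sort(reverse=True)  # highest elevation first
        # Integer ceiling of top_percent % of the pixel count.
        n_keep = max(1, -(-top_percent * len(members) // 100))
        keep.extend(i for _, i in members[:n_keep])
    return sorted(keep)


# Hypothetical pixels: landslide "A" has 10 pixels, "B" has 2.
pixels = [("A", z) for z in range(10)] + [("B", 100), ("B", 200)]
print(upper_slope_pixels(pixels))
```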
In relation to landslide size, we evaluated the relationship between landslide area and predictive performance, finding a moderate negative correlation (Spearman’s ρ = −0.44, p = 0.03), which suggests that larger landslides tend to be more difficult to predict, rather than disproportionately influencing model accuracy.
We also acknowledge the reviewer’s suggestion regarding alternative modelling frameworks. Mixed-effects models provide a principled way to represent hierarchical structures (e.g., pixels nested within landslides). However, their application in this study is limited by the relatively small number of landslides (n = 25), which constrains robust estimation of group-level effects. We will clarify this limitation in the revised manuscript. Similarly, we acknowledge recent developments in spatially explicit hyperparameter tuning (e.g., Schratz et al., 2019). In this study, we adopted a fixed model configuration to avoid introducing additional sources of spatial dependence during the tuning process.
Regarding the use of a single representative point per landslide, we recognise that such approaches may better capture initiation conditions. However, they also reduce the spatial information contained within landslides. In this study, we prioritised capturing intra-landslide spatial variability, while recognising this trade-off, which we will clarify in the revised manuscript.
Finally, we agree that including both initiation and runout areas may dilute the relationship between predictors and landslide initiation processes. This may partially explain the relatively lower importance of morphometric variables such as slope compared to lithology and elevation. We will explicitly discuss this implication in the revised manuscript.
Overall, we will revise the manuscript to provide a clearer and more balanced discussion of sampling strategy, spatial dependence, and their implications for model interpretation.
(2) Engagement with spatial cross-validation literature
We thank the reviewer for this important comment, which highlights the need for a more explicit and balanced engagement with the spatial cross-validation literature.
We agree that the original manuscript did not sufficiently contextualise the methodological choices within the broader literature. In the revised version, we will expand both the Introduction and Discussion to explicitly incorporate and interpret the contributions of the cited studies.
Specifically, we will use Brenning (2005) to frame the issue of optimistic bias associated with random cross-validation in spatial prediction models, particularly in landslide susceptibility contexts. Schratz et al. (2019) will be incorporated to highlight the importance of spatially explicit evaluation strategies and to acknowledge that spatial dependence should be considered not only during validation but also during model tuning. Schlögl et al. (2025) will be discussed in relation to the need for more comprehensive and multi-perspective evaluation frameworks, emphasising that model performance should not be interpreted based on a single metric or validation scheme.
In addition, we will explicitly engage with the critical perspective of Wadoux et al. (2021), which argues that spatial cross-validation is not universally appropriate for assessing map accuracy. In particular, Wadoux et al. highlight that spatial cross-validation may not provide unbiased estimates of prediction error when the objective is to assess absolute map accuracy. In response, we will clarify that the objective of this study is not to estimate absolute predictive accuracy in a strict statistical sense, but to evaluate model generalisation under spatial dependence. Under this objective, spatially structured validation remains appropriate, although its limitations must be acknowledged.
To reflect this, we will include a clarification such as:
“Spatial cross-validation provides a more realistic estimate of model generalisation in the presence of spatial dependence, although it does not fully eliminate spatial autocorrelation effects and its suitability depends on the modelling objective (Wadoux et al., 2021).”
We will further expand the discussion to acknowledge that spatial cross-validation does not fully remove spatial dependence, may be sensitive to partitioning design, and should be interpreted as part of a broader evaluation framework rather than as a definitive measure of model performance.
Finally, we will clarify the contribution of this study within this context. While the use of Random Forest and spatial cross-validation is not novel in itself, the contribution of this work lies in the systematic comparison of validation strategies and their impact on model interpretation in a small, spatially constrained landslide dataset.
These revisions will provide a more transparent, balanced, and literature-informed methodological framing, and align the methodological discussion with the study objective of evaluating spatial generalisation under spatial dependence.
(3) Justification of spatial partitioning strategy
We thank the reviewer for this important comment and agree that the rationale for spatial partitioning required clearer justification and discussion.
In the revised manuscript, we will explicitly clarify the criteria used to define the spatial block sizes (600–1500 m). These were selected based on three complementary considerations: (i) the spatial dimensions of mapped landslides (approximately 200–900 m), ensuring that block sizes are of comparable or larger scale than individual landslide features; (ii) the need to reduce spatial dependence between training and validation data by increasing geographical separation; and (iii) the use of a progressive range of block sizes to evaluate the sensitivity of model performance to spatial partitioning scale, in line with the objective of assessing spatial generalisation under increasing levels of spatial independence.
To make this explicit, we will include a clarification such as:
“Block sizes were selected to be comparable to or larger than the spatial extent of mapped landslides and were varied systematically (600–1500 m) to assess the sensitivity of model performance to increasing spatial separation between training and validation data.”
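To make the block-size sensitivity concrete, the following is a minimal sketch of how pixels could be assigned to square spatial blocks and then to cross-validation folds for several block sizes. The grid origin, the round-robin assignment of blocks to folds, the number of folds, and all names and coordinates are assumptions of this sketch; the reply does not specify how blocks were mapped to folds.

```python
import math


def block_id(x, y, block_size, x0=0.0, y0=0.0):
    """Index (column, row) of the square block that contains point (x, y)."""
    return (math.floor((x - x0) / block_size),
            math.floor((y - y0) / block_size))


def block_folds(coords, block_size, n_folds=5):
    """Give every point a fold number so that a block never straddles folds."""
    blocks = sorted({block_id(x, y, block_size) for x, y in coords})
    # Assumption: blocks are dealt to folds round-robin in sorted order.
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    return [fold_of_block[block_id(x, y, block_size)] for x, y in coords]


# Hypothetical pixel centres in projected metres (not data from the study).
coords = [(50, 50), (850, 120), (910, 40), (1700, 1650), (2500, 300)]
for size in (600, 900, 1200, 1500):
    print(size, block_folds(coords, size, n_folds=3))
```

Larger blocks place more neighbouring pixels in the same fold, which increases the separation between training and validation data but leaves fewer distinct blocks to distribute among folds.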
We acknowledge that individual landslides may span multiple spatial blocks, which can result in different portions of the same landslide being included in both training and validation datasets under spatial block cross-validation. This may introduce partial spatial dependence and potentially optimistic bias in performance estimates, as spatially proximate observations may still share similar predictor values.
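A simple diagnostic for this situation is sketched below: it lists the landslides whose pixels fall into more than one spatial block for a given block size. The function name, the grid origin at (0, 0), and the coordinates are hypothetical. Spanning several blocks is only a necessary condition for within-landslide leakage, since leakage occurs only if those blocks are then assigned to different folds.

```python
import math


def landslides_spanning_blocks(pixels, block_size):
    """pixels: list of (landslide_id, x, y) in projected metres.

    Returns the sorted IDs of landslides whose pixels occupy more than one
    square block of side `block_size` (grid origin assumed at 0, 0).
    """
    blocks_of = {}
    for ls, x, y in pixels:
        cell = (math.floor(x / block_size), math.floor(y / block_size))
        blocks_of.setdefault(ls, set()).add(cell)
    return sorted(ls for ls, cells in blocks_of.items() if len(cells) > 1)


# Hypothetical pixels: landslide "A" crosses the x = 600 m block boundary.
pixels = [("A", 100, 100), ("A", 550, 100), ("A", 650, 100),
          ("B", 1300, 1300), ("B", 1350, 1320)]
for size in (600, 1500):
    print(size, landslides_spanning_blocks(pixels, size))
```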
To address this limitation, we used Leave-One-Landslide-Out cross-validation (LOLO-CV) as a complementary validation strategy. LOLO-CV ensures independence at the landslide level (i.e., no pixels from the test landslide are used in training), although spatial dependence between nearby landslides may still influence model performance. Compared to spatial block cross-validation, LOLO-CV provides a more conservative assessment of model generalisation across landslides, and should therefore be interpreted as a stricter, complementary evaluation rather than a replacement for spatial block-based approaches.
Model performance is similar across validation schemes (random CV ≈ 0.83–0.86; spatial block CV ≈ 0.83; LOLO-CV ≈ 0.86), suggesting broadly consistent behaviour across validation configurations. The differences between methods nevertheless highlight the influence of spatial dependence on performance estimates and reinforce that these approaches provide complementary perspectives on model performance rather than a single definitive value.
We also acknowledge the reviewer’s suggestion regarding alternative spatial partitioning strategies, including clustering-based approaches, geomorphological units (e.g., catchments), and lithological stratification. While these approaches can provide meaningful partitions, their application in this study is constrained by the relatively small study area and limited number of landslides (n = 25), which would likely result in highly unbalanced folds or reduced statistical robustness, thereby limiting their practical applicability in this specific context. We will clarify this limitation in the revised manuscript.
These revisions will provide a clearer and more transparent justification of the spatial partitioning strategy and its implications for model evaluation, and align the validation framework with the study objective of evaluating spatial generalisation under spatial dependence.
(4) Spatial leakage
We thank the reviewer for pointing out the need for a clearer and more explicit definition of the term “spatial leakage.”
In the revised manuscript, we will define spatial leakage at first use as follows:
“Spatial leakage refers to the unintended transfer of information between training and validation datasets due to spatial proximity and autocorrelation. When geographically close observations share similar predictor values, the validation data are no longer fully independent from the training data, which can lead to overly optimistic estimates of model performance.”
We will further clarify that spatial leakage can occur at two levels relevant to this study: (i) within-landslide leakage, when pixels belonging to the same landslide are split between training and validation datasets, and (ii) between-landslide leakage, when neighbouring landslides share similar environmental conditions due to spatial proximity, leading to residual dependence even when datasets are spatially separated.
We will also clarify how the validation strategies used in this study address these effects. Spatial block cross-validation reduces local spatial dependence by increasing the geographical separation between folds, while Leave-One-Landslide-Out cross-validation (LOLO-CV) eliminates within-landslide leakage by ensuring that all pixels from a given landslide are excluded from training during validation. However, neither approach fully removes spatial dependence between nearby landslides, and this limitation will be explicitly acknowledged, consistent with the study objective of evaluating model generalisation under spatial dependence rather than absolute independence.
These revisions will provide a clearer and more precise definition of spatial leakage and its implications for model evaluation, and ensure consistency with the broader validation framework described above.
(5) Process-based considerations
We thank the reviewer for this important comment, which highlights key limitations related to process representation and the interpretation of data-driven susceptibility models.
We agree that deep-seated rotational landslides are strongly influenced by subsurface conditions (e.g., material properties, hydrology, and weak layers) that are not directly captured by surface-derived predictors. This represents an inherent limitation of the modelling approach. In the revised manuscript, we will explicitly acknowledge this and include a clarification such as:
“The predictors used in this study represent surface expressions of geomorphological and geological conditions and should be interpreted as indirect proxies for deeper processes. As such, the model captures relative spatial patterns of susceptibility rather than fully representing the underlying physical mechanisms controlling landslide initiation.”
We also agree that including both initiation and runout areas in the sampling may introduce additional variability and potentially mix distinct geomorphological signals. Runout zones can occur under different topographic and lithological conditions compared to source areas, which may introduce noise into the predictor–response relationships and reduce the model’s ability to isolate causal factors, particularly under pixel-based sampling frameworks where spatial heterogeneity is explicitly represented.
This effect may partially explain the relatively lower importance of morphometric predictors such as slope, as these variables differ significantly between initiation and depositional zones. We will explicitly discuss this implication in the revised manuscript, in connection with the observed dominance of broader-scale predictors such as lithology and elevation.
To explore this issue, we conducted a sensitivity analysis using an upper-slope sampling strategy (top 30% of elevation values within each landslide polygon) as a simple proxy to reduce the influence of lower-slope and depositional areas. This resulted in AUC ≈ 0.80. However, this approach is heuristic and does not represent a geomorphologically mapped initiation zone. The results therefore indicate differences in the geomorphological signal captured by upper-slope versus full-polygon sampling, but should not be interpreted as direct evidence of process-based relationships, and are instead intended as a sensitivity test within the broader validation framework described above.
These findings highlight a trade-off between process specificity (sampling focused on upper-slope areas) and spatial representativeness (full-polygon sampling), which we will explicitly discuss in the revised manuscript, and which reflects a broader methodological trade-off inherent to pixel-based susceptibility modelling under spatial dependence.
Finally, we will clarify that polygon-based sampling represents a simplification of landslide processes and that future work would benefit from more detailed inventories that explicitly distinguish between initiation and runout zones, as well as from the integration of subsurface information where available, particularly in studies aiming to improve process-based interpretability.
(6) Resolution and pre-failure conditions
We thank the reviewer for this important comment, which highlights a key limitation in data-driven landslide susceptibility modelling.
We agree that the use of a 10 m DEM representing current topography introduces important constraints, particularly for relatively large and deep-seated landslides. Morphometric predictors derived from such data may reflect post-failure terrain rather than pre-failure conditions, as discussed by Steger et al. (2020). We will explicitly acknowledge this in the revised manuscript and include a clarification such as:
“The 10 m DEM may smooth local landslide morphology (e.g., scarps, toes, and internal structures), and morphometric variables derived from it may partially reflect post-failure terrain. This may introduce circularity, whereby the model captures characteristics of already-failed terrain rather than predisposing conditions, which is consistent with the interpretation of the model as identifying relative spatial patterns rather than strictly causal relationships.”
We also acknowledge that this limitation may affect the predictive capability of the model, particularly when extrapolating to currently stable areas. If morphometric predictors are influenced by post-failure topography, the model may have reduced ability to identify susceptible areas that have not yet undergone failure, especially when model evaluation is interpreted in terms of spatial generalisation rather than absolute predictive accuracy.
This limitation will be explicitly discussed, and we will emphasise that:
“These results should be interpreted as representing relative spatial patterns of susceptibility under present-day conditions, rather than as a direct reconstruction of pre-failure terrain or a purely process-based prediction.”
We will also incorporate Van Den Eeckhaut et al. (2006) to further support the discussion of limitations associated with terrain-based predictors in susceptibility modelling.
Additionally, this issue may influence the relative importance of predictors. For example, morphometric variables such as slope or curvature may be affected by post-failure modification, while lithology and elevation reflect broader and more stable controls. This may partially explain the observed dominance of lithology and elevation in the model, as these variables are less sensitive to local post-failure terrain alteration.
Finally, we will clarify that susceptibility maps derived from present-day DEMs should be interpreted with these limitations in mind, and that future work could benefit from incorporating pre-failure terrain reconstruction, higher-resolution data, or process-based constraints where available, particularly in studies aiming to improve predictive performance in currently stable areas.
Citation: https://doi.org/10.5194/egusphere-2026-1318-AC1
-
RC2: 'Comment on egusphere-2026-1318', Anonymous Referee #2, 19 Apr 2026
The manuscript should be rejected as its overall scientific quality is very low and does not meet the standard required for publication. Although the topic of spatial validation in landslide susceptibility modelling is potentially relevant, the study is fundamentally limited by extremely poor data quality and insufficient scientific depth. Most critically, the landslide inventory contains only 25 events within a very small study area (~32 km²), which is far from adequate to support a pixel-based machine learning model or any statistically meaningful validation. The modelling framework therefore lacks representativeness, robustness, and generalizability. The so-called “multi-scale spatial validation” is essentially a technical exercise with limited novelty, as similar block cross-validation approaches have already been widely discussed in the literature, and the manuscript fails to provide any substantive methodological advancement. In addition, the use of balanced pseudo-absence sampling further undermines the physical interpretability of the results, yet this issue is not rigorously addressed. The discussion section is largely descriptive and repetitive, lacking critical analysis, mechanism interpretation, or connection to broader hazard processes. No new insights into landslide processes, hazard mechanisms, or practical applications are provided. Figures and results mainly confirm well-known patterns (e.g., lithology and elevation dominance), offering little scientific contribution. Overall, the study is a routine application of an existing method on a very limited dataset, with weak innovation, insufficient data support, and poor scientific significance. Therefore, rejection is recommended.
Citation: https://doi.org/10.5194/egusphere-2026-1318-RC2
-
AC2: 'Reply on RC2', Wendy Quiroz, 02 May 2026
We thank the reviewer for their assessment and for highlighting concerns regarding data limitations, methodological novelty, and the overall scientific contribution of the manuscript.
We acknowledge that the study is based on a relatively small landslide inventory (n = 25) within a spatially constrained area (~32 km²). This is an inherent limitation of the study design, which we now explicitly recognise and discuss in the revised manuscript. Rather than aiming for broad generalisation, the study is framed as a detailed evaluation of modelling behaviour and validation strategies under conditions of limited data availability, which are common in many landslide-prone regions.
Regarding the use of pixel-based modelling and the associated sample size, we clarify that the objective is not to infer independent statistical observations in a strict sense, but to evaluate spatial patterns and model generalisation under spatial dependence. To address concerns related to pseudo-replication and spatial structure, we implemented complementary validation strategies (spatial block cross-validation and Leave-One-Landslide-Out cross-validation), and we explicitly discuss their implications and limitations in the revised manuscript.
We also acknowledge that the methodological components employed (e.g., Random Forest, spatial cross-validation) are not novel in isolation. The contribution of this study lies in the systematic comparison of validation strategies and their influence on model interpretation, particularly in a small and spatially structured dataset. We have clarified this positioning in the revised manuscript and strengthened the discussion to better situate our work within the existing literature.
Concerning pseudo-absence sampling, we recognise its limitations in terms of physical interpretability and now explicitly discuss the implications of balanced sampling strategies, including their effect on model behaviour and interpretation.
We further revised the Discussion section to reduce descriptive elements and improve the interpretation of results, including a clearer connection between model outputs, geomorphological context, and broader limitations of data-driven susceptibility modelling.
Finally, we highlight that the main contribution of the study lies in the systematic analysis of how validation design and spatial dependence influence the performance and interpretation of susceptibility models, particularly in contexts with limited data availability.
We believe that these revisions substantially improve the clarity, transparency, and scientific positioning of the manuscript, and we respectfully submit that the study provides a meaningful contribution within its defined scope.
Citation: https://doi.org/10.5194/egusphere-2026-1318-AC2
-
RC1: 'Comment on egusphere-2026-1318', Anonymous Referee #1, 09 Apr 2026
The paper assesses landslide susceptibility based on 25 rotational landslides in a relatively small study area (~32 km²) in northern Peru using a Random Forest model, with a focus on pseudo-absence selection and spatial validation. By applying and interpreting spatial block cross-validation, the author infers that conventional validation likely overestimates model performance due to spatial autocorrelation. Based on permutation-based feature importance, lithology and elevation were identified as the main drivers of slope instability. The study concludes that model performance strongly depends on the chosen validation approach, with larger validation blocks leading to lower but more realistic performance estimates (Line 622f) “highlighting the influence of spatial autocorrelation on apparent predictive capacity”. The author finally concludes that (Line 644f) “The modelling approach developed in this study is particularly suited to rotational landslides under similar geomorphological conditions. Extension to other mass movement types would require process-tailored sampling strategies and predictor selection.”
While the manuscript addresses relevant aspects of data-driven landslide susceptibility modelling, I am not fully convinced that it is ready for publication. The scope is relatively narrow, focusing on a small number of landslides (n = 25) within a limited spatial extent. In addition, several methodological decisions remain insufficiently justified or unclear. Most importantly, the study offers limited conceptual or methodological advancement, as Random Forest-based susceptibility modelling and spatial validation strategies are already established in the literature. Overall, I consider the manuscript to be at the borderline between rejection and major revision. I formally recommend major revisions, although addressing the key concerns outlined below may require substantial changes that could alter the scope and structure of the study.
Details:
1. Sampling strategy and spatial autocorrelation
The effective sample size appears to be artificially inflated (approximately 30,000 observations within ~32 km²). From the text I understand that multiple pixels from the same landslide were treated as independent presence observations. If so, this may introduce another dimension of spatial dependence and potentially undermine the central aim of this study of addressing spatial autocorrelation. Additionally, this approach implicitly assigns (undesired?) greater weight to larger landslides, as they contribute more pixels to the model (even though magnitude/size is not part of conventional landslide susceptibility modelling).
Alternative strategies could be considered. For example, mixed-effects modelling frameworks allow explicit treatment of hierarchical structures (e.g., pixels nested within landslides via a landslide ID as a random effect). More broadly, if spatial dependence is a core topic in this study, the analysis could go beyond spatial block validation alone. There are even approaches such as spatially explicit hyperparameter tuning that tackle the topic not only during validation.
Schratz, P., Muenchow, J., Iturritxa, E., Richter, J., and Brenning, A.: Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data, Ecological Modelling, 406, 109–120, https://doi.org/10.1016/j.ecolmodel.2019.06.002, 2019.
Schlögl, M., Spiekermann, R., and Steger, S.: Towards a holistic assessment of landslide susceptibility models: insights from the Central Eastern Alps, Environ Earth Sci, 84, 113, https://doi.org/10.1007/s12665-024-12041-y, 2025.
Moreover, several studies in landslide susceptibility modelling advocate using a single representative point per landslide, preferably within the initiation zone, to better capture causal landslide conditions. Based on Figure 1, it appears that both scarp and runout areas were mapped and therefore sampled. Including runout zones may dilute the relationship between predictors and landslide initiation, potentially explaining why slope or other predictors appear unimportant while lithology and elevation dominate. These implications should be more thoroughly considered.
2. Missing engagement with spatial cross-validation literature
The manuscript does not sufficiently engage with other literature on spatial cross-validation. Several contributions (e.g., Brenning; Schratz et al.; Schlögl et al.) as well as critical perspectives (e.g., Wadoux et al.) are not adequately discussed. Incorporating these studies would help contextualize the methodological choices and clarify the degree of novelty of this work. It would also allow for a more balanced discussion of when spatial cross-validation is appropriate and what its limitations are.
Wadoux, A. M. J.-C., Heuvelink, G. B. M., de Bruin, S., and Brus, D. J.: Spatial cross-validation is not the right way to evaluate map accuracy, Ecological Modelling, 457, 109692, https://doi.org/10.1016/j.ecolmodel.2021.109692, 2021.
Brenning, A.: Spatial prediction models for landslide hazards: review, comparison and evaluation, Natural Hazards and Earth System Science, 5, 853–862, 2005.
Schratz, P., Muenchow, J., Iturritxa, E., Richter, J., and Brenning, A.: Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data, Ecological Modelling, 406, 109–120, https://doi.org/10.1016/j.ecolmodel.2019.06.002, 2019.
Schlögl, M., Spiekermann, R., and Steger, S.: Towards a holistic assessment of landslide susceptibility models: insights from the Central Eastern Alps, Environ Earth Sci, 84, 113, https://doi.org/10.1007/s12665-024-12041-y, 2025.
3. Justification of spatial partitioning strategy
The rationale behind the selected spatial block sizes (600–1500 m) remains unclear and should be more explicitly justified. In particular, it is not evident how these scales relate to the spatial characteristics of the mapped landslides or the underlying processes. Additionally, it may be important to clarify whether individual landslides can span multiple spatial blocks. If so, there is a possibility that different parts of the same landslide are included in both the training and validation datasets. This would compromise the independence between training and test data. The manuscript should explicitly address this issue and, if relevant, describe how such cases were handled.
Alternative spatial partitioning strategies are neither tested nor discussed, such as:
• clustering-based approaches (e.g., k-means),
• geomorphologically meaningful units (e.g., catchments),
• lithological or landscape-based stratification.
Acknowledging or comparing such alternatives would strengthen the methodological robustness.
4. “Spatial leakage”
The term “spatial leakage” (Line 79) is introduced without a clear definition. While it is often used to describe unintended information transfer between training and validation datasets due to spatial proximity (see also the comments above), the manuscript should explicitly define how the term is used in this study.
5. Process-based considerations
The study area is small but contains relatively large, predominantly rotational landslides (mean size ~16.7 ha), which likely reflect deep-seated processes. In such cases, subsurface conditions (e.g., material properties, hydrology, weak layers) are often key controls but are not well captured by the typical surface predictors used in this study. Furthermore, as also outlined above, the manuscript does not clearly distinguish between initiation and runout zones. Including runout areas in the sampling (especially for such large landslides) may introduce noise, as these areas can occur under very different topographic and lithological conditions than the source zones. This could reduce the model's ability to identify “true” causal factors.
6. Resolution and representation of pre-failure conditions
The use of a 10 m DEM raises concerns when modelling relatively large landslides in this way. Morphometric predictors derived from such data are likely influenced by post-failure topography rather than representing pre-failure conditions. This is particularly relevant in studies of large landslides, where the terrain has been substantially altered. As a result, the model may have limited predictive capability for identifying currently stable but susceptible areas, arguably the primary objective of susceptibility modelling. This limitation should be explicitly addressed and discussed in the manuscript.
Steger, S., Schmaltz, E., and Glade, T.: The (f)utility to account for pre-failure topography in data-driven landslide susceptibility modelling, Geomorphology, 354, 107041, https://doi.org/10.1016/j.geomorph.2020.107041, 2020.
Van Den Eeckhaut, M., Vanwalleghem, T., Poesen, J., Govers, G., Verstraeten, G., and Vandekerckhove, L.: Prediction of landslide susceptibility using rare events logistic regression: A case-study in the Flemish Ardennes (Belgium), Geomorphology, 76, 392–410, https://doi.org/10.1016/j.geomorph.2005.12.003, 2006.
Citation: https://doi.org/10.5194/egusphere-2026-1318-RC1
AC1: 'Reply on RC1', Wendy Quiroz, 02 May 2026
(1) Sampling strategy and spatial autocorrelation
We thank the reviewer for this important and comprehensive comment, which highlights key issues related to pseudo-replication, implicit size-weighting, methodological alternatives, and the representation of landslide processes.
We agree that treating pixels within landslides as independent observations may inflate the effective sample size and introduce within-landslide spatial dependence. This is an inherent limitation of pixel-based susceptibility modelling, and we will explicitly acknowledge this in the revised manuscript. For example, we will include the following clarification:
“Pixel-based sampling assumes independence among observations, although pixels within the same landslide share spatially structured environmental conditions. This may lead to pseudo-replication and should be considered when interpreting model performance.”
To mitigate this issue, we implemented complementary validation strategies that address different aspects of spatial dependence. Spatial block cross-validation introduces geographical separation between folds, reducing local spatial autocorrelation, while Leave-One-Landslide-Out cross-validation (LOLO-CV) evaluates model generalisation across landslides by excluding all pixels from each landslide during testing. While LOLO-CV ensures independence at the landslide level (i.e., no pixels from the test landslide are used in training), spatial dependence between nearby landslides may still influence model performance. In this sense, both validation strategies provide complementary perspectives rather than a single definitive estimate of model performance, which will be discussed in the revised manuscript.
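The LOLO-CV partitioning logic described above can be illustrated with a minimal sketch. It is not the code used in the study: the function name `lolo_splits`, the toy landslide labels, and the assumption that every sample (including pseudo-absences) carries a group label are all hypothetical.

```python
from collections import defaultdict

def lolo_splits(landslide_ids):
    """Yield (held_out_id, train_idx, test_idx) for leave-one-landslide-out CV.

    landslide_ids: one group label per pixel sample. How pseudo-absence
    pixels are assigned to groups is an assumption left to the caller.
    """
    by_group = defaultdict(list)
    for idx, gid in enumerate(landslide_ids):
        by_group[gid].append(idx)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = [i for g, members in by_group.items()
                     if g != held_out for i in members]
        yield held_out, sorted(train_idx), test_idx

# Toy example: six pixels from three hypothetical landslides.
ids = ["A", "A", "B", "B", "B", "C"]
folds = list(lolo_splits(ids))
for held_out, train_idx, test_idx in folds:
    # No pixel of the held-out landslide may appear in training.
    assert all(ids[i] != held_out for i in train_idx)
```

Each fold holds out every pixel of exactly one landslide, which is what guarantees independence at the landslide level; it does nothing about proximity between neighbouring landslides.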
The LOLO-CV results (mean AUC = 0.861 ± 0.147) indicate that the model is able to generalise to entirely unseen landslides. While predictive performance varies across cases, this variability is interpreted as reflecting geomorphological heterogeneity rather than methodological bias. The comparable magnitude of LOLO-CV and other validation results suggests that pseudo-replication is unlikely to fully explain model performance.
To further assess the impact of sampling strategy, we conducted sensitivity analyses. An equitable sampling scheme (fixed number of pixels per landslide) resulted in reduced performance (AUC ≈ 0.71), indicating that model performance decreases when implicit size-weighting is controlled. This suggests that polygon-based sampling introduces a moderate bias, although it does not fully explain model performance.
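A minimal sketch of an equitable sampling scheme follows, assuming a fixed cap of pixels drawn at random from each landslide. The function name, the cap of five pixels, and the toy labels are illustrative assumptions and are not taken from the study.

```python
import random
from collections import defaultdict

def equitable_sample(landslide_ids, n_per_landslide, seed=0):
    """Return sorted pixel indices with at most n_per_landslide per landslide."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, gid in enumerate(landslide_ids):
        by_group[gid].append(idx)
    chosen = []
    for gid in sorted(by_group):
        members = by_group[gid]
        k = min(n_per_landslide, len(members))  # small landslides keep all pixels
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)

# Toy example: one large and one small hypothetical landslide.
ids = ["big"] * 100 + ["small"] * 5
sample = equitable_sample(ids, n_per_landslide=5)
counts = {g: sum(1 for i in sample if ids[i] == g) for g in set(ids)}
```

With the cap in place, both landslides contribute the same number of presence pixels, so the large landslide no longer dominates the training set by area alone.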
We also implemented an upper-slope sampling strategy (top 30% of elevation values within each landslide polygon) as a simple elevation-based proxy to reduce the influence of lower-slope and depositional areas. This threshold is heuristic and was used only for sensitivity analysis, not as a primary modelling assumption or as a geomorphological delineation of initiation zones. The results (AUC ≈ 0.80) indicate differences in the geomorphological signal captured by upper-slope versus full-polygon sampling, highlighting the trade-off between process specificity and spatial representativeness.
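The upper-slope selection can be sketched as below. This is one possible reading only: it assumes that "top 30%" means the highest 30% of pixels by elevation rank within each polygon (rounded up), which the text does not specify. The function name and the toy elevations are hypothetical.

```python
from collections import defaultdict

def upper_slope_pixels(landslide_ids, elevations, top_percent=30):
    """Return indices of the highest `top_percent` of pixels in each landslide.

    Assumption: the cut is rank-based and rounded up, with at least one
    pixel kept per landslide.
    """
    by_group = defaultdict(list)
    for idx, gid in enumerate(landslide_ids):
        by_group[gid].append(idx)
    keep = []
    for members in by_group.values():
        ranked = sorted(members, key=lambda i: elevations[i], reverse=True)
        n_keep = max(1, (len(ranked) * top_percent + 99) // 100)  # integer ceiling
        keep.extend(ranked[:n_keep])
    return sorted(keep)

# Toy example: landslide A has 10 pixels, landslide B has 4.
ids = ["A"] * 10 + ["B"] * 4
elev = [100 + i for i in range(10)] + [50, 80, 60, 70]
upper = upper_slope_pixels(ids, elev)
```

Because the cut is applied per polygon, every landslide keeps some pixels regardless of its absolute elevation, but the selection remains a heuristic proxy rather than a mapped initiation zone.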
In relation to landslide size, we evaluated the relationship between landslide area and predictive performance, finding a moderate negative correlation (Spearman’s ρ = −0.44, p = 0.03), which suggests that larger landslides tend to be more difficult to predict, rather than disproportionately influencing model accuracy.
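For reference, Spearman's ρ is the Pearson correlation of the ranked values. The sketch below computes it with the standard library only; the area and AUC numbers are invented toy values, not the study's data, and no p-value is computed.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Invented toy values: per-landslide area (ha) and per-landslide AUC.
area_ha = [2.0, 5.0, 10.0, 20.0, 40.0]
auc = [0.95, 0.90, 0.92, 0.70, 0.60]
rho = spearman_rho(area_ha, auc)
```

A negative ρ in this setting means that larger landslides tend to receive lower per-landslide scores, which is the direction of the relationship reported above.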
We also acknowledge the reviewer’s suggestion regarding alternative modelling frameworks. Mixed-effects models provide a principled way to represent hierarchical structures (e.g., pixels nested within landslides). However, their application in this study is limited by the relatively small number of landslides (n = 25), which constrains robust estimation of group-level effects. We will clarify this limitation in the revised manuscript. Similarly, we acknowledge recent developments in spatially explicit hyperparameter tuning (e.g., Schratz et al., 2019). In this study, we adopted a fixed model configuration to avoid introducing additional sources of spatial dependence during the tuning process.
Regarding the use of a single representative point per landslide, we recognise that such approaches may better capture initiation conditions. However, they also reduce the spatial information contained within landslides. In this study, we prioritised capturing intra-landslide spatial variability, while recognising this trade-off, which we will clarify in the revised manuscript.
Finally, we agree that including both initiation and runout areas may dilute the relationship between predictors and landslide initiation processes. This may partially explain the relatively lower importance of morphometric variables such as slope compared to lithology and elevation. We will explicitly discuss this implication in the revised manuscript.
Overall, we will revise the manuscript to provide a clearer and more balanced discussion of sampling strategy, spatial dependence, and their implications for model interpretation.
(2) Engagement with spatial cross-validation literature
We thank the reviewer for this important comment, which highlights the need for a more explicit and balanced engagement with the spatial cross-validation literature.
We agree that the original manuscript did not sufficiently contextualise the methodological choices within the broader literature. In the revised version, we will expand both the Introduction and Discussion to explicitly incorporate and interpret the contributions of the cited studies.
Specifically, we will use Brenning (2005) to frame the issue of optimistic bias associated with random cross-validation in spatial prediction models, particularly in landslide susceptibility contexts. Schratz et al. (2019) will be incorporated to highlight the importance of spatially explicit evaluation strategies and to acknowledge that spatial dependence should be considered not only during validation but also during model tuning. Schlögl et al. (2025) will be discussed in relation to the need for more comprehensive and multi-perspective evaluation frameworks, emphasising that model performance should not be interpreted based on a single metric or validation scheme.
In addition, we will explicitly engage with the critical perspective of Wadoux et al. (2021), which argues that spatial cross-validation is not universally appropriate for assessing map accuracy. In particular, Wadoux et al. highlight that spatial cross-validation may not provide unbiased estimates of prediction error when the objective is to assess absolute map accuracy. In response, we will clarify that the objective of this study is not to estimate absolute predictive accuracy in a strict statistical sense, but to evaluate model generalisation under spatial dependence. Under this objective, spatially structured validation remains appropriate, although its limitations must be acknowledged.
To reflect this, we will include a clarification such as:
“Spatial cross-validation provides a more realistic estimate of model generalisation in the presence of spatial dependence, although it does not fully eliminate spatial autocorrelation effects and its suitability depends on the modelling objective (Wadoux et al., 2021).”
We will further expand the discussion to acknowledge that spatial cross-validation does not fully remove spatial dependence, may be sensitive to partitioning design, and should be interpreted as part of a broader evaluation framework rather than as a definitive measure of model performance.
Finally, we will clarify the contribution of this study within this context. While the use of Random Forest and spatial cross-validation is not novel in itself, the contribution of this work lies in the systematic comparison of validation strategies and their impact on model interpretation in a small, spatially constrained landslide dataset.
These revisions will provide a more transparent, balanced, and literature-informed methodological framing, and align the methodological discussion with the study objective of evaluating spatial generalisation under spatial dependence.
(3) Justification of spatial partitioning strategy
We thank the reviewer for this important comment and agree that the rationale for spatial partitioning required clearer justification and discussion.
In the revised manuscript, we will explicitly clarify the criteria used to define the spatial block sizes (600–1500 m). These were selected based on three complementary considerations: (i) the spatial dimensions of mapped landslides (approximately 200–900 m), ensuring that block sizes are of comparable or larger scale than individual landslide features; (ii) the need to reduce spatial dependence between training and validation data by increasing geographical separation; and (iii) the use of a progressive range of block sizes to evaluate the sensitivity of model performance to spatial partitioning scale in line with the objective of assessing spatial generalisation under increasing levels of spatial independence.
To make this explicit, we will include a clarification such as:
“Block sizes were selected to be comparable to or larger than the spatial extent of mapped landslides and were varied systematically (600–1500 m) to assess the sensitivity of model performance to increasing spatial separation between training and validation data.”
We acknowledge that individual landslides may span multiple spatial blocks, which can result in different portions of the same landslide being included in both training and validation datasets under spatial block cross-validation. This may introduce partial spatial dependence and potentially optimistic bias in performance estimates, as spatially proximate observations may still share similar predictor values.
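Whether a landslide straddles several blocks can be checked directly. The sketch below assumes square blocks on a regular grid anchored at the coordinate origin and projected coordinates in metres; the grid origin, function names, and sample coordinates are hypothetical and not taken from the study.

```python
import math
from collections import defaultdict

def block_id(x, y, block_size):
    """Map projected coordinates (metres) to a square grid-block index."""
    return (math.floor(x / block_size), math.floor(y / block_size))

def straddling_landslides(samples, block_size):
    """Return {landslide_id: blocks} for landslides whose pixels fall in >1 block.

    samples: iterable of (landslide_id, x, y) tuples.
    """
    blocks = defaultdict(set)
    for gid, x, y in samples:
        blocks[gid].add(block_id(x, y, block_size))
    return {gid: b for gid, b in blocks.items() if len(b) > 1}

# Toy pixels: landslide A spans about 750 m in x, landslide B about 100 m.
samples = [
    ("A", 100.0, 100.0), ("A", 850.0, 120.0),
    ("B", 1300.0, 400.0), ("B", 1400.0, 450.0),
]
for size in (600, 900, 1500):
    print(size, sorted(straddling_landslides(samples, size)))
```

In this toy case, landslide A is split at a 600 m block size but not at 900 m or 1500 m. Whether a given landslide is split also depends on where the grid origin falls, so a larger block size reduces but does not rule out straddling.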
To address this limitation, we used Leave-One-Landslide-Out cross-validation (LOLO-CV) as a complementary validation strategy. LOLO-CV ensures independence at the landslide level (i.e., no pixels from the test landslide are used in training), although spatial dependence between nearby landslides may still influence model performance. Compared to spatial block cross-validation, LOLO-CV provides a more conservative assessment of model generalisation across landslides, and should therefore be interpreted as a stricter, complementary evaluation rather than a replacement for spatial block-based approaches.
Model performance was broadly similar across validation schemes (random CV ≈ 0.83–0.86; spatial block CV ≈ 0.83; LOLO-CV ≈ 0.86). The differences that do arise between methods reflect the influence of spatial dependence on performance estimates and reinforce that these approaches provide complementary perspectives on model performance rather than a single definitive value.
We also acknowledge the reviewer’s suggestion regarding alternative spatial partitioning strategies, including clustering-based approaches, geomorphological units (e.g., catchments), and lithological stratification. While these approaches can provide meaningful partitions, their application in this study is constrained by the relatively small study area and limited number of landslides (n = 25), which would likely result in highly unbalanced folds or reduced statistical robustness, thereby limiting their practical applicability in this specific context. We will clarify this limitation in the revised manuscript.
These revisions will provide a clearer and more transparent justification of the spatial partitioning strategy and its implications for model evaluation, and align the validation framework with the study objective of evaluating spatial generalisation under spatial dependence.
(4) Spatial leakage
We thank the reviewer for pointing out the need for a clearer and more explicit definition of the term “spatial leakage.”
In the revised manuscript, we will define spatial leakage at first use as follows:
“Spatial leakage refers to the unintended transfer of information between training and validation datasets due to spatial proximity and autocorrelation. When geographically close observations share similar predictor values, the validation data are no longer fully independent from the training data, which can lead to overly optimistic estimates of model performance.”
We will further clarify that spatial leakage can occur at two levels relevant to this study: (i) within-landslide leakage, when pixels belonging to the same landslide are split between training and validation datasets, and (ii) between-landslide leakage, when neighbouring landslides share similar environmental conditions due to spatial proximity, leading to residual dependence even when datasets are spatially separated.
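The two levels of leakage can be turned into simple diagnostics for any given train/validation split. The sketch below is illustrative only: the function names are hypothetical, and the minimum train-to-validation distance is used as a rough proxy for between-landslide proximity rather than as a measure of autocorrelation.

```python
import math

def shared_landslides(train_ids, test_ids):
    """Landslide IDs present on both sides of a split (within-landslide leakage)."""
    return set(train_ids) & set(test_ids)

def min_separation(train_xy, test_xy):
    """Smallest train-to-test distance, a rough proxy for between-landslide proximity."""
    return min(math.dist(p, q) for p in train_xy for q in test_xy)

# Toy split: landslide B has pixels on both sides.
train_ids = ["A", "A", "B"]
test_ids = ["B", "C"]
train_xy = [(0.0, 0.0), (100.0, 0.0), (500.0, 0.0)]
test_xy = [(520.0, 0.0), (2000.0, 0.0)]

leaked = shared_landslides(train_ids, test_ids)
gap_m = min_separation(train_xy, test_xy)
```

An empty `leaked` set corresponds to the landslide-level independence that LOLO-CV enforces, while a small `gap_m` signals that nearby observations may still share similar predictor values.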
We will also clarify how the validation strategies used in this study address these effects. Spatial block cross-validation reduces local spatial dependence by increasing the geographical separation between folds, while Leave-One-Landslide-Out cross-validation (LOLO-CV) eliminates within-landslide leakage by ensuring that all pixels from a given landslide are excluded from training during validation. However, neither approach fully removes spatial dependence between nearby landslides, and this limitation will be explicitly acknowledged, consistent with the study objective of evaluating model generalisation under spatial dependence rather than absolute independence.
These revisions will provide a clearer and more precise definition of spatial leakage and its implications for model evaluation, and ensure consistency with the broader validation framework described above.
(5) Process-based considerations
We thank the reviewer for this important comment, which highlights key limitations related to process representation and the interpretation of data-driven susceptibility models.
We agree that deep-seated rotational landslides are strongly influenced by subsurface conditions (e.g., material properties, hydrology, and weak layers) that are not directly captured by surface-derived predictors. This represents an inherent limitation of the modelling approach. In the revised manuscript, we will explicitly acknowledge this and include a clarification such as:
“The predictors used in this study represent surface expressions of geomorphological and geological conditions and should be interpreted as indirect proxies for deeper processes. As such, the model captures relative spatial patterns of susceptibility rather than fully representing the underlying physical mechanisms controlling landslide initiation.”
We also agree that including both initiation and runout areas in the sampling may introduce additional variability and potentially mix distinct geomorphological signals. Runout zones can occur under different topographic and lithological conditions compared to source areas, which may introduce noise into the predictor–response relationships and reduce the model’s ability to isolate causal factors, particularly under pixel-based sampling frameworks where spatial heterogeneity is explicitly represented.
This effect may partially explain the relatively lower importance of morphometric predictors such as slope, as these variables differ significantly between initiation and depositional zones. We will explicitly discuss this implication in the revised manuscript, in connection with the observed dominance of broader-scale predictors such as lithology and elevation.
To explore this issue, we conducted a sensitivity analysis using an upper-slope sampling strategy (top 30% of elevation values within each landslide polygon) as a simple proxy to reduce the influence of lower-slope and depositional areas. This resulted in AUC ≈ 0.80. However, this approach is heuristic and does not represent a geomorphologically mapped initiation zone. The results therefore indicate differences in the geomorphological signal captured by upper-slope versus full-polygon sampling, but should not be interpreted as direct evidence of process-based relationships, and are instead intended as a sensitivity test within the broader validation framework described above.
These findings highlight a trade-off between process specificity (sampling focused on upper-slope areas) and spatial representativeness (full-polygon sampling), which we will explicitly discuss in the revised manuscript, and which reflects a broader methodological trade-off inherent to pixel-based susceptibility modelling under spatial dependence.
Finally, we will clarify that polygon-based sampling represents a simplification of landslide processes and that future work would benefit from more detailed inventories that explicitly distinguish between initiation and runout zones, as well as from the integration of subsurface information where available, particularly in studies aiming to improve process-based interpretability.
(6) Resolution and pre-failure conditions
We thank the reviewer for this important comment, which highlights a key limitation in data-driven landslide susceptibility modelling.
We agree that the use of a 10 m DEM representing current topography introduces important constraints, particularly for relatively large and deep-seated landslides. Morphometric predictors derived from such data may reflect post-failure terrain rather than pre-failure conditions, as discussed by Steger et al. (2020). We will explicitly acknowledge this in the revised manuscript and include a clarification such as:
“The 10 m DEM may smooth local landslide morphology (e.g., scarps, toes, and internal structures), and morphometric variables derived from it may partially reflect post-failure terrain. This may introduce circularity, whereby the model captures characteristics of already-failed terrain rather than predisposing conditions, which is consistent with the interpretation of the model as identifying relative spatial patterns rather than strictly causal relationships.”
We also acknowledge that this limitation may affect the predictive capability of the model, particularly when extrapolating to currently stable areas. If morphometric predictors are influenced by post-failure topography, the model may have reduced ability to identify susceptible areas that have not yet undergone failure, especially when model evaluation is interpreted in terms of spatial generalisation rather than absolute predictive accuracy.
This limitation will be explicitly discussed, and we will emphasise that:
“These results should be interpreted as representing relative spatial patterns of susceptibility under present-day conditions, rather than as a direct reconstruction of pre-failure terrain or a purely process-based prediction.”
We will also incorporate Van Den Eeckhaut et al. (2006) to further support the discussion of limitations associated with terrain-based predictors in susceptibility modelling.
Additionally, this issue may influence the relative importance of predictors. For example, morphometric variables such as slope or curvature may be affected by post-failure modification, while lithology and elevation reflect broader and more stable controls. This may partially explain the observed dominance of lithology and elevation in the model, as these variables are less sensitive to local post-failure terrain alteration.
Finally, we will clarify that susceptibility maps derived from present-day DEMs should be interpreted with these limitations in mind, and that future work could benefit from incorporating pre-failure terrain reconstruction, higher-resolution data, or process-based constraints where available, particularly in studies aiming to improve predictive performance in currently stable areas.
Citation: https://doi.org/10.5194/egusphere-2026-1318-AC1
RC2: 'Comment on egusphere-2026-1318', Anonymous Referee #2, 19 Apr 2026
The manuscript should be rejected as its overall scientific quality is very low and does not meet the standard required for publication. Although the topic of spatial validation in landslide susceptibility modelling is potentially relevant, the study is fundamentally limited by extremely poor data quality and insufficient scientific depth. Most critically, the landslide inventory contains only 25 events within a very small study area (~32 km²), which is far from adequate to support a pixel-based machine learning model or any statistically meaningful validation. The modelling framework therefore lacks representativeness, robustness, and generalizability. The so-called “multi-scale spatial validation” is essentially a technical exercise with limited novelty, as similar block cross-validation approaches have already been widely discussed in the literature, and the manuscript fails to provide any substantive methodological advancement. In addition, the use of balanced pseudo-absence sampling further undermines the physical interpretability of the results, yet this issue is not rigorously addressed. The discussion section is largely descriptive and repetitive, lacking critical analysis, mechanism interpretation, or connection to broader hazard processes. No new insights into landslide processes, hazard mechanisms, or practical applications are provided. Figures and results mainly confirm well-known patterns (e.g., lithology and elevation dominance), offering little scientific contribution. Overall, the study is a routine application of an existing method on a very limited dataset, with weak innovation, insufficient data support, and poor scientific significance. Therefore, rejection is recommended.
Citation: https://doi.org/10.5194/egusphere-2026-1318-RC2
AC2: 'Reply on RC2', Wendy Quiroz, 02 May 2026
We thank the reviewer for their assessment and for highlighting concerns regarding data limitations, methodological novelty, and the overall scientific contribution of the manuscript.
We acknowledge that the study is based on a relatively small landslide inventory (n = 25) within a spatially constrained area (~32 km²). This is an inherent limitation of the study design, which we now explicitly recognise and discuss in the revised manuscript. Rather than aiming for broad generalisation, the study is framed as a detailed evaluation of modelling behaviour and validation strategies under conditions of limited data availability, which are common in many landslide-prone regions.
Regarding the use of pixel-based modelling and the associated sample size, we clarify that the objective is not to infer independent statistical observations in a strict sense, but to evaluate spatial patterns and model generalisation under spatial dependence. To address concerns related to pseudo-replication and spatial structure, we implemented complementary validation strategies (spatial block cross-validation and Leave-One-Landslide-Out cross-validation), and we explicitly discuss their implications and limitations in the revised manuscript.
We also acknowledge that the methodological components employed (e.g., Random Forest, spatial cross-validation) are not novel in isolation. The contribution of this study lies in the systematic comparison of validation strategies and their influence on model interpretation, particularly in a small and spatially structured dataset. We have clarified this positioning in the revised manuscript and strengthened the discussion to better situate our work within the existing literature.
Concerning pseudo-absence sampling, we recognise its limitations in terms of physical interpretability and now explicitly discuss the implications of balanced sampling strategies, including their effect on model behaviour and interpretation.
We further revised the Discussion section to reduce descriptive elements and improve the interpretation of results, including a clearer connection between model outputs, geomorphological context, and broader limitations of data-driven susceptibility modelling.
Finally, we highlight that the main contribution of the study lies in the systematic analysis of how validation design and spatial dependence influence the performance and interpretation of susceptibility models, particularly in contexts with limited data availability.
We believe that these revisions substantially improve the clarity, transparency, and scientific positioning of the manuscript, and we respectfully submit that the study provides a meaningful contribution within its defined scope.
Citation: https://doi.org/10.5194/egusphere-2026-1318-AC2
The paper assesses landslide susceptibility based on 25 rotational landslides in a relatively small study area (~32 km²) in northern Peru using a Random Forest model, with a focus on pseudo-absence selection and spatial validation. By applying and interpreting spatial block cross-validation, the author infers that conventional validation likely overestimates model performance due to spatial autocorrelation. Based on permutation-based feature importance, lithology and elevation were identified as the main drivers of slope instability. The study concludes that model performance strongly depends on the chosen validation approach, with larger validation blocks leading to lower but more realistic performance estimates (Line 622f) “highlighting the influence of spatial autocorrelation on apparent predictive capacity”. The author finally concludes that (Line 644f) “The modelling approach developed in this study is particularly suited to rotational landslides under similar geomorphological conditions. Extension to other mass movement types would require process-tailored sampling strategies and predictor selection.”
While the manuscript addresses relevant aspects of data-driven landslide susceptibility modelling, I am not fully convinced that it is ready for publication. The scope is relatively narrow, focusing on a small number of landslides (n = 25) within a limited spatial extent. In addition, several methodological decisions remain insufficiently justified or unclear. Most importantly, the study offers limited conceptual or methodological advancement, as Random Forest-based susceptibility modelling and spatial validation strategies are already established in the literature. Overall, I consider the manuscript to be at the borderline between rejection and major revision. I formally recommend major revisions, although addressing the key concerns outlined below may require substantial changes that could alter the scope and structure of the study.
Details:
1. Sampling strategy and spatial autocorrelation
The effective sample size appears to be artificially inflated (approximately 30,000 observations within ~32 km²). From the text I understand that multiple pixels from the same landslide were treated as independent presence observations. If so, this may introduce another dimension of spatial dependence and potentially undermine the study's central aim of addressing spatial autocorrelation. Additionally, this approach implicitly assigns (undesired?) greater weight to larger landslides, as they contribute more pixels to the model (even though magnitude/size is not part of conventional landslide susceptibility modelling).
Alternative strategies could be considered. For example, mixed-effects modelling frameworks allow explicit treatment of hierarchical structures (e.g., pixels nested within landslides via a landslide ID as a random effect). More broadly, if spatial dependence is a core topic in this study, the analysis could go beyond spatial block validation alone. There are even approaches such as spatially explicit hyperparameter tuning that tackle the topic not only during validation.
Schratz, P., Muenchow, J., Iturritxa, E., Richter, J., and Brenning, A.: Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data, Ecological Modelling, 406, 109–120, https://doi.org/10.1016/j.ecolmodel.2019.06.002, 2019.
Schlögl, M., Spiekermann, R., and Steger, S.: Towards a holistic assessment of landslide susceptibility models: insights from the Central Eastern Alps, Environ Earth Sci, 84, 113, https://doi.org/10.1007/s12665-024-12041-y, 2025.
Moreover, several studies in landslide susceptibility modelling advocate using a single representative point per landslide, preferably within the initiation zone, to better capture the conditions that caused the failure. Based on Figure 1, it appears that both scarp and runout areas were mapped and therefore sampled. Including runout zones may dilute the relationship between predictors and landslide initiation, potentially explaining why slope or other predictors appear unimportant while lithology and elevation dominate. These implications should be considered more thoroughly.
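To make the two sampling concerns above concrete, the following minimal Python sketch shows (a) per-pixel weights that give every landslide the same total weight regardless of its size, and (b) the selection of a single pixel per landslide. It is an illustration only and is not taken from the manuscript: the function names and the `landslide_ids` array (one landslide ID per presence pixel) are assumptions, and the random pick in (b) would ideally be restricted to pixels in the mapped initiation zone.

```python
import numpy as np

def equal_landslide_weights(landslide_ids):
    """Per-pixel weights such that the weights of each landslide sum to 1."""
    ids = np.asarray(landslide_ids)
    # counts[inverse] gives, for every pixel, the pixel count of its landslide
    _, inverse, counts = np.unique(ids, return_inverse=True, return_counts=True)
    return 1.0 / counts[inverse]

def one_pixel_per_landslide(landslide_ids, seed=0):
    """Index of one randomly chosen presence pixel for each landslide ID."""
    rng = np.random.default_rng(seed)
    ids = np.asarray(landslide_ids)
    picks = [rng.choice(np.flatnonzero(ids == k)) for k in np.unique(ids)]
    return np.array(picks)
```

Such weights could, for instance, be passed as sample weights when fitting the classifier, so that a large landslide no longer dominates the presence class simply because it covers more pixels.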
2. Missing engagement with spatial cross-validation literature
The manuscript does not sufficiently engage with the existing literature on spatial cross-validation. Several contributions (e.g., Brenning; Schratz et al.; Schlögl et al.) as well as critical perspectives (e.g., Wadoux et al.) are not adequately discussed. Incorporating these studies would help contextualize the methodological choices and clarify the degree of novelty of this work. It would also allow for a more balanced discussion of when spatial cross-validation is appropriate and what its limitations are.
Wadoux, A. M. J.-C., Heuvelink, G. B. M., de Bruin, S., and Brus, D. J.: Spatial cross-validation is not the right way to evaluate map accuracy, Ecological Modelling, 457, 109692, https://doi.org/10.1016/j.ecolmodel.2021.109692, 2021.
Brenning, A.: Spatial prediction models for landslide hazards: review, comparison and evaluation, Natural Hazards and Earth System Science, 5, 853–862, 2005.
Schratz, P., Muenchow, J., Iturritxa, E., Richter, J., and Brenning, A.: Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data, Ecological Modelling, 406, 109–120, https://doi.org/10.1016/j.ecolmodel.2019.06.002, 2019.
Schlögl, M., Spiekermann, R., and Steger, S.: Towards a holistic assessment of landslide susceptibility models: insights from the Central Eastern Alps, Environ Earth Sci, 84, 113, https://doi.org/10.1007/s12665-024-12041-y, 2025.
3. Justification of spatial partitioning strategy
The rationale behind the selected spatial block sizes (600–1500 m) remains unclear and should be justified more explicitly. In particular, it is not evident how these scales relate to the spatial characteristics of the mapped landslides or to the underlying processes. The manuscript should also clarify whether individual landslides can span multiple spatial blocks. If they can, different parts of the same landslide may end up in both the training and the validation datasets, which would compromise the independence between training and test data. The manuscript should explicitly address this issue and, if relevant, describe how such cases were handled.
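Whether landslides straddle block boundaries can be checked directly. The sketch below is a hypothetical illustration under simple assumptions: square blocks aligned to the coordinate origin, projected coordinates in metres, and one landslide ID per presence pixel. The function names are mine and do not refer to the authors' code. For scale, a mean landslide area of 16.7 ha corresponds to a square of roughly 409 m side length, which is not small relative to a 600 m block.

```python
import numpy as np

def block_ids(x, y, block_size):
    """(row, col) index of the square block that contains each point."""
    col = np.floor(np.asarray(x, dtype=float) / block_size).astype(int)
    row = np.floor(np.asarray(y, dtype=float) / block_size).astype(int)
    return list(zip(row.tolist(), col.tolist()))

def landslides_spanning_blocks(x, y, landslide_ids, block_size):
    """Map each landslide ID found in more than one block to its set of blocks."""
    blocks_per_slide = {}
    for lid, blk in zip(landslide_ids, block_ids(x, y, block_size)):
        blocks_per_slide.setdefault(lid, set()).add(blk)
    return {lid: blks for lid, blks in blocks_per_slide.items() if len(blks) > 1}
```

An empty result for a given block size would indicate that no mapped landslide is split across blocks. A non-empty result would indicate that folds should be built from landslide IDs (or from blocks merged along landslide outlines) rather than from blocks alone.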
Alternative spatial partitioning strategies are neither tested nor discussed, such as:
• clustering-based approaches (e.g., k-means),
• geomorphologically meaningful units (e.g., catchments),
• lithological or landscape-based stratification.
Acknowledging or comparing such alternatives would strengthen the methodological robustness.
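All of the alternatives listed above share one mechanism: every observation carries the ID of a spatial unit (a cluster, a catchment, a lithological unit), and whole units are assigned to folds. A minimal Python sketch of that mechanism follows. The function name and the `unit_ids` input are assumptions made for illustration, and the simple round-robin assignment does not balance the number of pixels per fold.

```python
import numpy as np

def group_folds(unit_ids, n_folds=5, seed=0):
    """Assign whole spatial units to folds so that no unit is split across folds."""
    rng = np.random.default_rng(seed)
    units = np.unique(unit_ids)
    shuffled = rng.permutation(units).tolist()
    # Round-robin over the shuffled units: unit i goes to fold i mod n_folds
    fold_of_unit = {u: i % n_folds for i, u in enumerate(shuffled)}
    return np.array([fold_of_unit[u] for u in np.asarray(unit_ids).tolist()])
```

The returned array holds one fold index per observation. Observations with fold index k form the test set of fold k, and all others form the training set.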
4. “Spatial leakage”
The term “spatial leakage” (Line 79) is introduced without a clear definition. While it is often used to describe unintended information transfer between training and validation datasets due to spatial proximity (see also the comments above), the manuscript should explicitly define how the term is used in this study.
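One possible way to make the term operational, offered here only as a suggestion and not as the manuscript's definition, is to report for each validation fold the distance from every test point to its nearest training point. The Python sketch below assumes projected coordinates in metres; the function name is hypothetical. It builds the full pairwise distance matrix, so it suits subsamples rather than all ~30,000 observations at once.

```python
import numpy as np

def min_train_distance(train_xy, test_xy):
    """Distance from each test point to its nearest training point."""
    train = np.asarray(train_xy, dtype=float)
    test = np.asarray(test_xy, dtype=float)
    # Pairwise coordinate differences, shape (n_test, n_train, 2)
    diff = test[:, None, :] - train[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
```

Many test points lying within a few pixel widths of a training point would indicate that spatial proximity can still carry information across the split.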
5. Process-based considerations
The study area is small but contains relatively large, predominantly rotational landslides (mean size ~16.7 ha), which likely reflect deep-seated processes. In such cases, subsurface conditions (e.g., material properties, hydrology, weak layers) are often key controls, but they are not well captured by the typical surface predictors used in this study. Furthermore, as outlined above, the manuscript does not clearly distinguish between initiation and runout zones. Including runout areas in the sampling (especially for such large landslides) may introduce noise, as these areas can occur under very different topographic and lithological conditions than the source zones. This could reduce the model's ability to identify “true” causal factors.
6. Resolution and representation of pre-failure conditions
The use of a 10 m DEM to model relatively large landslides raises concerns. Morphometric predictors derived from such data are likely to reflect post-failure topography rather than pre-failure conditions. This is particularly relevant for large landslides, where the terrain has been substantially altered. As a result, the model may have limited capability to identify currently stable but susceptible areas, which is arguably the primary objective of susceptibility modelling. This limitation should be explicitly addressed and discussed in the manuscript.
Steger, S., Schmaltz, E., and Glade, T.: The (f)utility to account for pre-failure topography in data-driven landslide susceptibility modelling, Geomorphology, 354, 107041, https://doi.org/10.1016/j.geomorph.2020.107041, 2020.
Van Den Eeckhaut, M., Vanwalleghem, T., Poesen, J., Govers, G., Verstraeten, G., and Vandekerckhove, L.: Prediction of landslide susceptibility using rare events logistic regression: A case-study in the Flemish Ardennes (Belgium), Geomorphology, 76, 392–410, https://doi.org/10.1016/j.geomorph.2005.12.003, 2006.