The dominant role of latent spatial structure in landslide susceptibility

Aristizábal, Edier; Lombardo, Luigi; Korup, Oliver

doi:10.5194/egusphere-2026-1749

Preprints

https://doi.org/10.5194/egusphere-2026-1749

Preprints

18 May 2026

| 18 May 2026

Status: this preprint is open for discussion and under review for Natural Hazards and Earth System Sciences (NHESS).

The dominant role of latent spatial structure in landslide susceptibility

Edier Aristizábal, Luigi Lombardo, and Oliver Korup

Abstract. Landslide susceptibility mapping is essential for risk management and mitigation. Traditional multivariate models have achieved high nominal accuracy in predicting landslide occurrence, but the underlying methods rarely account for spatial dependence in the input data. We test how spatial Hierarchical Generalized Linear Models (HGLMs) and spatial autoregressive models might enhance both accuracy and reliability of landslide susceptibility. We estimate the frequency of landslides in a catchment, drawing on a catalogue of 10,837 landslides and several predictors (e.g., rainfall, elevation, slope) in the northern Colombian Andes. Our HGLMs integrate the effects of spatial dependency and heterogeneity through Markov Random Field (MRF) models, i.e. Intrinsic Conditional Autoregressive (ICAR), Besag-York-Mollié (BYM), and Leroux models. Our results show that landslide frequency is significantly influenced by spatially-dependent unobserved factors, likely representing contiguous geological formations or shared soil properties, which we therefore incorporate as latent variables. Even after accounting for known covariates, the HGLMs capture spatial dependencies that non-spatial models fail to address. Incorporating spatial structure in the data improves model performance, judging from model selection metrics such as the Deviance Information Criterion (DIC) or the Watanabe–Akaike Information Criterion (WAIC). By accounting for latent spatial effects, spatial HGLMs produce smoother and more reliable susceptibility maps. This approach overcomes a key limitation of traditional models: the underestimation of landslide frequency in high-density areas where unobserved, spatially-structured factors are most influential. Our findings highlight the importance of integrating spatial dependence and heterogeneity in landslide susceptibility models to achieve enhanced predictive performance and reliability.

Received: 27 Mar 2026 – Discussion started: 18 May 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Edier Aristizábal, Luigi Lombardo, and Oliver Korup

Status: open (until 15 Aug 2026)

Post a comment Subscribe to comment alert

RC1:
'Comment on egusphere-2026-1749', Anonymous Referee #1, 27 May 2026 reply
General assessment
This manuscript addresses an important methodological question, namely the role of spatial dependence in landslide susceptibility modeling and the use of hierarchical spatial models. The comparison of non-spatial and spatial models is potentially relevant for NHESS readers, and the manuscript is clearly structured and generally well written. However, I cannot recommend publication in its current form. My concerns are conceptual as well as technical: (1) The study operates at the level of terrain units that is not appropriate for landslide susceptibility modeling, which undermines model interpretation and relevance of findings. (2) The study conflates statistical concepts, potentially over-interpreting and over-stating its findings. I therefore recommend rejection.
Major concerns
1. Terrain mapping units are inadequate for susceptibility modeling
The manuscript models landslide susceptibility using hydrographic catchments of approximately one hundrd to a few hundred km² as terrain mapping units. This support is inadequate for a process known to be governed by slope-scale conditions. Yet the predictors are aggregated to catchment means (mean slope, mean elevation, rainfall summaries), suppressing the variability that controls slope failure. This severe change-of-support problem likely dilutes predictor-response relationships by construction, resulting in inflated residual variation. In my view, this issue fundamentally undermines the manuscript’s central conclusion.
2. Over-interpretation of latent spatial effects
The manuscript repeatedly interprets latent spatial effects as evidence for contiguous lithology, soil properties, tectonic domains, or shared physical controls. However, the CAR formulation only encodes adjacency among catchments. The model demonstrates that neighboring polygons have similar residual structure; it does not identify geology or tectonic controls. No attempt is made to relate latent spatial structure to known geological or other structures in this vast mountain area, and no attempt is made to use such (even coarse) information to construct additional process-related covariates. The physical interpretation is therefore speculative and substantially overstated.
The use of three large hydrographic basins as higher-level units is particularly weakly justified. No evidence is presented that these basins correspond to meaningful geophysical groupings for slope stability, and there is no general reason to expect “tectonic domains” to align with hydrographic catchments.
3. Model interpretation and diagnostics are insufficient
The manuscript repeatedly makes strong claims (“reliable”, “accurate”, “better estimates”, “overconfident predictions”) without clearly distinguishing precision and bias of coefficient estimtors, predictive performance, and goodness of fit. Residual diagnostics also raise concerns. Pearson residuals appear extremely large (up to 20-30), suggesting major lack of fit or overdispersion (perhaps related to points 1 and 2 above), yet this is not discussed.
4. Predictor selection and process representation are insufficiently justified
The predictor set is extremely limited (mean slope, mean elevation, rainfall summary). Relevant predictors commonly used in susceptibility modeling (e.g., land cover/land use, lithology, anthropogenic disturbance such as roads, topographic wetness index, and of course local slope angle) are absent without justification. Predictor selection (correlation analysis and PCA) is insufficiently described, and the manuscript does not justify linear predictor-response assumptions despite well-known nonlinearities, particularly for slope angle.

Additional comments
Critical information on the landslide inventory and the predictor variables is missing, e.g. concerning completeness, consistency in digitizing landslides, homogeneous availability of multitemporal satellite imagery. Size distribution information would help to assess completeness for small landslides.

Clarify coefficient interpretation across models. M1 appears to use unstandardized predictors and rate ratios, whereas M2–M5 report posterior means after standardization. Table 1 therefore appears internally inconsistent and coefficients are not directly comparable.

Report confidence intervals and credible intervals consistently and prefer interpretable effect sizes, e.g. 100 m elevation difference not 1 m.

Moran’s I should be defined in Methods, not introduced implicitly in Results/Figure captions.

The Results section contains methodological material (e.g., model definitions) that belongs in Methods.

The Introduction overstates hierarchical spatial autoregressive models as the primary solution while overlooking alternative approaches (e.g., spatial smoothers, autologistic models, spatial bootstrap approaches), some of which have already been used in the landslide literature.

The Discussion does not discuss the findings in the context of the literature.

Additional minor comments
L11 "model performance" - or rather "goodness of fit" in the usual statistical terminology; "performance" suggests estimation on independent test set

L22-23 "cross-sectional" and "panel data" are terms that are not commonly used in the environmental statistics literature; prefer more descriptive alternatives? e.g. longitudinal or simply temporal or time series?

L24 TMU is not a concept from spatial statistics

L39 "model residuals" - definition not so obvious in the context of classification problems

L41 "interactions" - concept not obvious in this context; unobserved confounders are not interactions - not statistically and not otherwise

L44-51 "statistical oversight" is very strong wording, considering that most susceptibility analyses do not interpret model coefficients in detail and do not perform statistical inference.

Reply
Citation: https://doi.org/10.5194/egusphere-2026-1749-RC1
- AC1: 'Reply on RC1', Edier Vicente Aristizábal Giraldo, 06 Jun 2026 reply
  
  Major Concern 1: Terrain mapping units are inadequate for susceptibility modeling
  We respectfully disagree with this assessment. While landslides are inherently slope-scale processes, this does not preclude their analysis at coarser spatial units. The choice of analysis scale is governed by the study objectives, data availability, and the desired spatial resolution of the output map. Regional and national-scale susceptibility assessments, which necessarily operate at resolutions far coarser than the individual slope, are well established in the literature (Guzzetti et al., 2005, Geomorphology, 72:272–299; Stanley & Kirschbaum, 2017, Natural Hazards, 87:145–164; Nowicki Jessee et al., 2018, JGR: Earth Surface, 123:1351–1369). At those scales, predictors such as geology, mean slope, and climatic variables effectively serve as regional proxies for local susceptibility conditions.
  Critically, our dependent variable is not the binary presence/absence of a landslide on a slope, which would indeed require slope-unit or pixel resolution, but rather the count (and implicitly density) of landslides per catchment. This Poisson count formulation is specifically designed for aggregated data and is theoretically consistent with catchment-level predictors (Lombardo et al., 2020, Earth-Science Reviews, 209:103318). Mean catchment slope and elevation are appropriate summaries of terrain conditions that drive bulk susceptibility at this scale, a design choice analogous to using basin-wide lithology fractions or mean annual rainfall in regional hazard assessments. We will add a paragraph in the Methods section explicitly justifying the catchment unit in the context of the Poisson count model.
  
  Major Concern 2: Over-interpretation of latent spatial effects
  We partially agree and will revise the Discussion accordingly. The CAR/ICAR/BYM formulations encode spatial adjacency structure in the residuals; they do not identify specific geological or tectonic domains. Our Discussion text listing "geological characteristics, tectonic settings, and anthropogenic effects" was intended to enumerate plausible sources of unobserved spatial structure, not to assert that the model identifies them. We will rephrase this to make the distinction explicit, in line with standard practice in spatial hierarchical modelling (Besag et al., 1991; Lawson, 2018, Bayesian Disease Mapping; Morris et al., 2019, Spatial and Spatio-temporal Epidemiology, 31:100301).
  Regarding the three basins: the Atrato, Cauca, and Magdalena drainage systems correspond to distinct structural and lithological domains of the northern Colombian Andes, the Pacific-facing Western Cordillera (Atrato), the inter-Andean Cauca depression flanked by volcanic and metamorphic lithologies, and the Magdalena valley between the Central and Eastern Cordilleras, making them physically motivated grouping units for spatial heterogeneity at the upper level. We will substantiate this choice with supporting references in the revised manuscript.
  
  Major Concern 3: Model interpretation and diagnostics are insufficient
  We agree and will revise accordingly on both points.
  
  Major Concern 4: Predictor selection and process representation are insufficiently justified
  We acknowledge this concern and will expand the predictor selection section.
  We tested a broader initial set including topographic wetness index (TWI), curvature, drainage density, and NDVI, but these showed Pearson correlations |r| > 0.75 with the retained variables in the correlation and PCA screening, and were excluded to avoid multicollinearity (Reichenbach et al., 2018, Earth-Science Reviews, 180:60–91). We will report the full correlation matrix as supplementary material.
  
  Response to Additional Comments and Minor Comments
  We thank the reviewer for these constructive observations, which we consider will substantially improve the clarity and rigour of the manuscript. We commit to addressing all of them in the revised version.
  
  Reply
  
  Citation: https://doi.org/10.5194/egusphere-2026-1749-AC1
- CC1: 'Reply on RC1. i have a doubt in M3 model . code suggests it has positive Morans value but in paper it is negative.', Basit Ahad Raina, 09 Jun 2026 reply
  
  i have a doubt in M3 model . code suggests it has positive Morans value but in paper it is negative.
  
  Reply
  
  Citation: https://doi.org/10.5194/egusphere-2026-1749-CC1

Edier Aristizábal, Luigi Lombardo, and Oliver Korup

Data sets

database Edier Aristizabal https://github.com/edieraristizabal/PAPER_BHGLM

Edier Aristizábal, Luigi Lombardo, and Oliver Korup

Viewed

Total article views: 264 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
186	59	19	264	15	18

HTML: 186
PDF: 59
XML: 19
Total: 264
BibTeX: 15
EndNote: 18

Views and downloads (calculated since 18 May 2026)

Month	HTML	PDF	XML	Total
May 2026	134	42	8	184
Jun 2026	27	8	5	40
Jul 2026	25	9	6	40

Cumulative views and downloads (calculated since 18 May 2026)

Month	HTML	PDF	XML	Total
May 2026	134	42	8	184
Jun 2026	27	8	5	40
Jul 2026	25	9	6	40

Viewed (geographical distribution)

Total article views: 240 (including HTML, PDF, and XML) Thereof 240 with geography defined and 0 with unknown origin.

Country	#	Views	%

Latest update: 18 Jul 2026

Short summary

Traditional models for landslide susceptibility often ignore spatially-dependent unobserved factors. This study improves accuracy in the Colombian Andes by incorporating spatial dependence and landscape variety, which are natural in environmental data. By accounting for these hidden physical connections, we found that landslide occurrences are more clustered than once thought.


Total:	0
HTML:	0
PDF:	0
XML:	0