Towards automated identification of mass movements in spaceborne interferograms: Comparing expert mapping and deep learning approaches
Abstract. We compare the performance of domain experts and deep learning algorithms in mapping mass movements in alpine scenarios by relying on Sentinel-1 wrapped phase interferograms. First, a statistical assessment suggests that the same mass movements are not consistently delineated, with generally low intersection over union (IoU) values (0.21–0.41), reflecting the difficulty of consistently distinguishing between active/inactive and coherent/incoherent phase patterns. Second, we tested deep learning (DL) architectures and training strategies on > 1000 manually mapped coherent phase patterns to identify the best-performing model. Among the tested DL models, U-Net++ with a ResNet-18 encoder and the specific optimisations developed herein achieved the highest performance. We found an IoU of 0.61 relative to the training labels and, when compared across the ten selected case studies, DL fell within the range of inter-expert variability (mean IoU of 0.494 ± 0.045, Dice coefficient of 0.661 ± 0.041). Our results show that optimised DL approaches can detect mass-movement-related patterns in Sentinel-1 interferograms with performance in the same range as, or higher than, domain experts. DL can substantially reduce manual mapping efforts, enabling higher levels of standardisation, homogeneity and reliability in the generation of mass movement catalogues based on radar interferograms.
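For reference, the two overlap metrics quoted in the abstract can be stated compactly. Below is a minimal NumPy sketch over binary segmentation masks; the function names are ours, not from the paper, and note that the Dice coefficient is a monotone transform of IoU, D = 2·IoU/(1 + IoU).

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union (Jaccard index) of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|), i.e. 2*IoU / (1 + IoU)."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 2 * inter / total if total else 1.0
```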
Status: open (until 26 Mar 2026)
- RC1: 'Comment on egusphere-2026-375', Anonymous Referee #1, 02 Mar 2026
- RC2: 'Comment on egusphere-2026-375', Anonymous Referee #2, 06 Mar 2026. The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-375/egusphere-2026-375-RC2-supplement.pdf
The manuscript presents research on automated detection of mass movement-related phase patterns in Sentinel-1 wrapped interferograms using deep learning (DL) approaches. The authors established a custom-labelled training dataset containing over 1,000 manually mapped coherent phase patterns from four interferograms covering the canton of Valais, Switzerland, and benchmarked multiple semantic segmentation architectures. A domain-expert variability study was also conducted across ten selected case studies to quantify task-intrinsic uncertainty and contextualize DL performance. The work is methodologically thorough and of clear relevance to the landslide remote sensing community. However, several aspects of the experimental design and discussion require improvement before the manuscript can be accepted for publication. The following comments are offered to help strengthen the work:
The guiding instruction provided to participants, "Delineate the mass movement-related phase patterns that you would expect an automated mapping approach to detect on this interferogram", may unintentionally conflate two distinct cognitive tasks: geoscientific interpretation of the interferogram and subjective expectation of DL model capability. Because experts likely differ substantially in their assumptions about what automated systems can or cannot detect, a portion of the observed low inter-expert IoU (ranging from 0.18 to 0.49) may reflect inconsistency in perceived model capability rather than genuine disagreement in geomorphological judgment. This distinction is important, as it affects how the expert variability results should be interpreted and used as a benchmark for DL performance. The authors are therefore encouraged to explicitly discuss these two potential sources of disagreement in Section 4.1, and to clarify what design choices were made during the study setup to minimize this ambiguity in participant annotations.
As noted in Section 2.3.2, all training labels were produced by a single individual, and the multi-round quality control described in Section 4.2.1 was also performed exclusively by the same mapper. Given that pairwise IoU values among the six experts ranged from as low as 0.18 to 0.49, this raises a substantive concern that the DL models may have learned the idiosyncratic annotation style of one operator rather than generalizable geoscientific characteristics of mass movement phase patterns. While the authors acknowledge this limitation in Section 4.2.1, no concrete mitigation strategy is presented. The manuscript would be considerably strengthened if the authors either: (a) engaged a second annotator to independently label at least a representative subset of the training data and reported cross-annotator consistency using IoU or a comparable metric; or (b) conducted a repeat-labelling exercise by the primary mapper on a subset of scenes to quantify intra-rater reliability. Either approach would provide a more transparent characterization of the label quality and its potential influence on model behavior and generalizability.
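Either of the suggested checks reduces to computing IoU over pairs of masks for the same scene, whether the pairs come from different annotators or from repeat labelling by the same mapper. A self-contained sketch with hypothetical mask arrays (the real study would substitute the experts' delineations):

```python
import numpy as np
from itertools import combinations

def iou(a, b):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0

# Hypothetical example: six experts annotating the same interferogram tile.
rng = np.random.default_rng(0)
masks = {f"expert_{i}": rng.random((256, 256)) > 0.7 for i in range(1, 7)}

# Pairwise IoU over all expert pairs; the same function applied to two
# labelling rounds by one mapper would quantify intra-rater reliability.
scores = {(a, b): iou(masks[a], masks[b])
          for a, b in combinations(sorted(masks), 2)}
print(f"mean pairwise IoU: {np.mean(list(scores.values())):.2f}")
```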
The decision to discard all negative patches during training (Section 2.3.4), motivated by concerns about model conservatism, warrants further scrutiny in light of the precision results reported in Section 3.2.1. At an IoU threshold of t = 0.4, the model achieves a precision of approximately 64%, implying that roughly 36% of predicted detections do not correspond to true mass movement signals. By withholding negative examples entirely during training, the model may have had limited exposure to confounding signals (atmospheric artefacts, local noise patterns, or geometric distortions) that can visually resemble coherent phase patterns. Furthermore, since the training labels were produced by a single mapper and the magnitude-frequency analysis demonstrates systematic undersampling of smaller mass movements below approximately 5,600 m² (Section 2.3.2, Fig. 5), patches discarded as negative due to the absence of annotations may in fact contain unmapped or sub-threshold deformation signals, introducing a subtle but non-trivial form of label contamination into the training process. This issue is well recognized in the broader landslide prediction literature, where the definition and selection of true negative samples have been shown to substantially affect model behavior and generalizability (e.g., https://doi.org/10.1007/s10346-020-01473-9; https://doi.org/10.3390/rs15123200). The authors are therefore encouraged to investigate this potential trade-off more rigorously, either by incorporating a targeted set of hard negative samples, for example, patches dominated by atmospheric phase gradients or located in layover and shadow regions, or by conducting an ablation study comparing different positive-to-negative sampling ratios and their effect on the false positive rate. This would provide important practical guidance for users seeking to deploy the model operationally over large areas.
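The sampling-ratio ablation suggested here could be set up along the lines of the following sketch; all names and ratio values are hypothetical, not the authors' pipeline:

```python
import random

def sample_training_patches(positives, negatives, neg_ratio=0.25, seed=42):
    """Combine all positive patches with a controlled share of negatives.

    neg_ratio = negatives drawn per positive patch; an ablation would
    sweep e.g. 0.0, 0.25, 0.5 and 1.0 and compare the resulting false
    positive rates on a held-out set. Hard negatives (atmosphere-dominated
    patches, layover/shadow areas) could be passed in as `negatives`.
    """
    rng = random.Random(seed)
    n_neg = min(len(negatives), round(len(positives) * neg_ratio))
    batch = list(positives) + rng.sample(negatives, n_neg)
    rng.shuffle(batch)
    return batch
```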
The authors identify the classification of phase patterns as coherent versus incoherent as the primary driver of inter-expert disagreement, with mean IoU values rising substantially from 0.21–0.41 to 0.496–0.644 when this distinction is not enforced (Section 3.1). This finding is significant, yet its implications for the DL training design are not fully explored in the discussion. Specifically, because the boundary between coherent and incoherent signals is itself ambiguous among experts, as demonstrated empirically by the study, the decision to train the model exclusively on coherent phase patterns may introduce compounding label uncertainty at the class boundary and could constrain the model's ability to generalize across the full spectrum of mass movement signatures encountered in operational interferograms. The authors are encouraged to discuss this connection more explicitly in Section 4.2.1 or 4.2.3, and to clarify how the observed ambiguity in coherent versus incoherent classification informed the DL model's scope and how users should interpret predictions near this classification boundary when applying the model in practice.
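On the last point, one pragmatic option is to expose, rather than hide, predictions near the decision boundary. A minimal sketch, assuming the model emits per-pixel probabilities; the band thresholds are hypothetical and would need calibration:

```python
import numpy as np

def classify_with_uncertainty(prob: np.ndarray,
                              lo: float = 0.4,
                              hi: float = 0.6) -> np.ndarray:
    """Map per-pixel probabilities to {0: background, 1: mass movement,
    2: uncertain}, so that pixels in the ambiguous band are surfaced for
    expert review instead of being silently assigned to either class."""
    out = np.zeros(prob.shape, dtype=np.uint8)
    out[prob >= hi] = 1
    out[(prob > lo) & (prob < hi)] = 2
    return out
```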
Section 2.1 describes the D-InSAR processing workflow using Sentinel-1 data acquired in ascending (track A088) and descending (track D066) geometries. While the authors have appropriately leveraged both viewing geometries for training data generation, there is a growing body of literature demonstrating that three-dimensional surface displacement fields can be retrieved by fusing multiple line-of-sight InSAR datasets from complementary acquisition geometries. Such multi-geometry approaches, combining ascending and descending passes, and in some cases integrating azimuth offset tracking, can provide substantially richer kinematic information about mass movement processes, including the decomposition of horizontal and vertical displacement components that is often critical for interpreting failure mechanisms and movement styles. Given that the study already processes both ascending and descending interferograms, a brief discussion of how three-dimensional displacement retrieval could complement or extend the proposed DL detection framework would strengthen the contextual framing of the work and better position it within the current state of the art. The authors are encouraged to at minimum acknowledge these capabilities and their relevance to the broader landslide monitoring workflow in the introduction or discussion. Relevant examples from the recent literature include the references below; a simplified per-pixel inversion illustrating the idea is sketched after the list:
- https://doi.org/10.3390/rs11030241
- https://doi.org/10.1016/j.geoai.2026.100061
- https://doi.org/10.1016/j.earscirev.2014.02.005
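For concreteness, the ascending/descending fusion discussed above typically reduces, per pixel, to a small linear inversion. The following is a minimal sketch under simplifying assumptions: the north component is neglected (Sentinel-1's near-polar orbits make the LOS largely insensitive to it), the ascending pass is idealised as looking due east and the descending pass due west, and the incidence angles and sign conventions are illustrative only:

```python
import numpy as np

def decompose_east_up(d_asc, d_dsc, inc_asc_deg=39.0, inc_dsc_deg=39.0):
    """Invert two LOS displacements (positive towards the satellite, metres)
    for east and vertical components, ignoring north-south motion.

    Design matrix rows hold the [east, up] projections of each LOS:
    the ascending pass looks roughly east, the descending pass roughly west.
    """
    ta, td = np.radians(inc_asc_deg), np.radians(inc_dsc_deg)
    G = np.array([[np.sin(ta), np.cos(ta)],
                  [-np.sin(td), np.cos(td)]])
    d_east, d_up = np.linalg.solve(G, np.array([d_asc, d_dsc]))
    return d_east, d_up

# Pure subsidence projects equally into both geometries:
print(decompose_east_up(-0.008, -0.008))  # ~(0.0, -0.0103) m
```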