This work is distributed under the Creative Commons Attribution 4.0 License.
Exploring seismic mass-movement data with anomaly detection and dynamic time warping
Abstract. Catastrophic mass movements, such as rock avalanches, glacier collapses, and destructive debris flows, are typically rare events. Their detection is consequently challenging, as annotated and verified events used as training data for instrumentation and algorithm tuning are absent or limited. In this work, we explore seismic mass-movement data through the lens of anomaly detection. The idea is to screen out segments of the data that are unlikely to contain mass movements by focusing only on anomalous signals, thereby reducing the number of signals to be studied and making downstream tasks such as expert labeling and clustering of events easier. To extract anomalous signals, we design a triggering algorithm using an anomaly score computed from an isolation forest obtained from sliding windows taken from the continuous data. The extracted signals are subjected to expert labeling and/or further analyzed by dynamic time warping, a popular technique used to evaluate the dissimilarity between different types of signals. We illustrate our approach by (a) mining for seismic signals of hazardous debris flows in Switzerland's Illgraben catchment and (b) labeling seismic mass-movement data obtained from a Greenland seismometer network.
Status: open (until 06 Jan 2026)
- RC1: 'Comment on egusphere-2025-3864', Anonymous Referee #1, 29 Sep 2025
- RC2: 'Comment on egusphere-2025-3864', Andreas Karakonstantis, 13 Dec 2025
Kamper et al. present a manuscript in which they apply anomaly detection to automatically identify patterns in seismic data that may signal dangerous mass-movement events such as landslides, glacier collapses, or debris flows. They underline the gravity of analysing such movements and the necessity of reducing the time spent on data analysis to detect them, demonstrating the usefulness of their approach by mining for mass movements in Switzerland and Greenland.
The manuscript presents a critical topic, and a serious effort has been made to demonstrate the results of the approach on such events. My only concern is that some figures that provide a clearer view of the case are located mainly in the appendix (Figures D4, E1, E4), which somewhat understates the importance of the study because they are not present in the main body of the paper.
In conclusion, this manuscript should be published after minor revisions, primarily by strengthening the central part of the study by moving some figures from the Appendix into the main text.
Citation: https://doi.org/10.5194/egusphere-2025-3864-RC2
- RC3: 'Interesting approach; could benefit from more clarity', Martijn van den Ende, 15 Dec 2025
This manuscript proposes an anomaly detection method (IF), optionally supplemented with a time-series processing method (DTW), to scan seismic time-series for anomalies. While the underlying methods are not new, this is (possibly) the first application to seismic data. This paper therefore brings the IF/DTW algorithms to the field of seismology.
This was an interesting manuscript to read, though it required a lot of focus from my side to fully understand everything the authors wrote down. I think that the majority of readers will not dedicate as much focus, and I fear that most of them would not fully appreciate the efforts that the authors put into this work. In my comments below I include several suggestions that could make this work more accessible to a non-computer science audience.
Introduction:
I took the liberty to read the comments from the other reviewers, who posted their reports before me. I agree with the point made by Reviewer #1 that the referencing is a bit light, though I don’t think that this is a crucial point. I personally don’t like extensive introductions for the sake of just citing a lot of different studies, but in this case there seems to be a bit of a gap between the first use of isolation forests in 2008 and the present day. A Google Scholar search of “isolation forest” yields 3,470,000 results, suggesting that there are numerous applications, modifications, and extensions of this method that could be relevant for this study. I’m not suggesting that the authors should adopt a different, more recent version of IFs, but it could be helpful to a reader to provide some pointers for future developments (based on recent work). This is done to some extent at the end of the Discussion section, but there it reads a bit like an afterthought.
I also want to remark that the motivation for using the proposed signal analysis methods (last paragraph) is not a direct consequence of the problem statement(s) laid out in the introduction. The authors state that STA/LTA is not suitable for environmental seismology due to the difficulty of discrimination, which is not solved by IFs or DTW (at least not without an additional clustering/classification step, which could likewise be applied to STA/LTA). In lines 31-39, the authors present a more philosophical argument of what it means for something to be classified as “noise”, which is again something that IFs/DTW doesn’t solve. In my opinion, the main advantage of IFs is their favourable computational/memory complexity, their enormous flexibility to combine different data sources (multi-sensor detection) and representations (time-series, images, tabular data, …), and the limited number of free parameters that could affect their performance (for evaluation only “hmin” is a user parameter, which is not used/discussed in this study). For these reasons, IF is a “one-method-fits-all” technique that could become a default choice for seismological data exploration, which would improve consistency across studies and ease of interpretation of their results. Working backward from this notion, the authors could highlight some challenges with multi-sensor, multi-dimensional data analysis (seismometers, GNSS, radar, pluviometry, InSAR, DAS, …) and consistency/interpretability of unsupervised detection and classification methods (STA/LTA, k-NN, t-SNE, auto-encoders, …).
Methods:
Starting with Section 2.2: for someone who doesn’t work with tree-based algorithms on a daily basis, it is not at all obvious that anomalies exhibit short paths in a tree (on average). This is explained in the Liu papers, but I think it would help the reader appreciate this method if the authors dedicate a few lines clarifying some statistical properties of anomalies, and how those are leveraged by IFs.
Then in lines 70-74 the authors describe the procedures for training and evaluation lumped together, which could be confusing/misleading to a casual reader. I would recommend the authors to be more explicit and explain that during training individual iTrees are initialised on different subsets of the data set, after which they remain static. At the evaluation stage, a new sample traverses each (static/calibrated) iTree, which yields the number of steps between the root and terminal node (h(x)). The path length for each iTree is averaged to obtain E[h(x)], which is converted into an anomaly detection metric. It would also be helpful to explicitly state that x is an entire time window, and not just one recorded sample in that window; for a time window of size N, x is therefore a point in N-dimensional space. This has interesting implications for the application of IFs to time-series data, but that’s beyond the scope of this review...
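To make the training/evaluation distinction concrete, here is a minimal sketch (not taken from the manuscript) of scoring sliding windows with an off-the-shelf isolation forest. scikit-learn's `IsolationForest` is assumed as a stand-in for the authors' implementation, and the trace, window length, and stride are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
trace = rng.normal(size=5000)   # synthetic stand-in for a continuous seismic trace
trace[3000:3100] += 5.0         # injected anomalous burst

N, stride = 64, 8               # each sample x is an entire window, i.e. a point in R^N
windows = np.lib.stride_tricks.sliding_window_view(trace, N)[::stride]

# Training: each iTree is built on a random subsample, then remains static.
forest = IsolationForest(n_estimators=100, max_samples=256, random_state=0)
forest.fit(windows)

# Evaluation: each window traverses every (static) iTree; the path lengths
# are averaged to E[h(x)] and converted to a score. scikit-learn negates the
# score of Liu et al., so we flip the sign: higher = more anomalous.
scores = -forest.score_samples(windows)
print("most anomalous window starts near sample", int(np.argmax(scores)) * stride)
```

The point of the sketch is that `fit` is the only place where trees change; `score_samples` performs nothing but the root-to-leaf traversals.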
In Section 2.2.2, the authors define the use of “segment” as the collection of time windows for which the anomaly threshold exceeds the onset/offset conditions. This is where things might get a little confusing for someone who reads the paper with less attention; “segment” and “time window” are sufficiently generic terms that their specific meaning as used here might get lost as the reader progresses through the manuscript. For example, “Take all sliding windows over a segment” (line 125) is difficult to understand if the realisation has not yet fully set in that a “segment” has a specific meaning in this context. It might be helpful to the reader to systematically refer to “IF segment” in favour of just “segment”. This tiny addition may seem insignificant, but it serves as a reminder of what it is that we’re talking about, and to snap the brain out of the default mode of thinking about the meaning of “segment”.
Section 2.3 introduces DTW, which is a technique that is best explained visually. It would be most helpful if the authors could create a figure explaining the basic concepts of Section 2.2 (how a time window traverses a tree), Section 2.2.2 (what is a “segment” and “IF segment anomaly score”), and Section 2.3 (how does DTW work, how are template/segment DTW defined).
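In the same didactic spirit, the core of DTW fits in a few lines. This textbook dynamic-programming version (not the authors' implementation) shows why DTW reports a small dissimilarity for two signals that differ only by local time shifts:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW: D[i, j] is the minimal cumulative cost of aligning
    a[:i] with b[:j], allowing local stretching/compression of time."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])  # same shape, shifted in time
print(dtw_distance(a, b))       # 0.0: DTW absorbs the time shift
print(np.abs(a - b[:5]).sum())  # a sample-wise comparison penalizes it: 4.0
```

A figure conveying exactly this contrast (warped alignment path vs. sample-wise distance) would serve the manuscript well.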
Case studies:
My remarks about the use of “segment” also apply to the word “catalog” as used in Section 3.1. If one takes a pre-existing catalogue and creates a new catalogue with IF, then you end up with two catalogues. But, if I understood correctly, in this manuscript “catalog” exclusively refers to the pre-existing catalogue, so it would make sense to systematically refer to it as “existing catalog” or “WSL catalog”. This confusion is exacerbated by lines 196-197, which are written as if the detections are considered the ground truth, assigning a label to the existing catalogue entries. My suggestion for rewriting this sentence: “A detection is labeled a true positive (TP) if it overlaps with an entry in the existing catalog, otherwise it is labeled a false positive (FP). If no detection coincides with an existing catalog entry, it is labeled a false negative (FN)”.
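The labeling convention proposed in this suggestion can be pinned down in code; the interval representation and helper names below are hypothetical, not taken from the manuscript:

```python
def overlaps(d, c):
    """Two half-open intervals (start, end) overlap iff neither ends
    before the other begins."""
    return d[0] < c[1] and c[0] < d[1]

def score_detections(detections, catalog):
    """TP if a detection overlaps an existing catalog entry, FP otherwise;
    catalog entries matched by no detection count as FN."""
    tp = sum(any(overlaps(d, c) for c in catalog) for d in detections)
    fp = len(detections) - tp
    fn = sum(not any(overlaps(d, c) for d in detections) for c in catalog)
    return tp, fp, fn

catalog = [(10, 20), (50, 60)]     # existing (e.g. WSL) catalog entries
detections = [(12, 18), (30, 35)]  # IF trigger segments
print(score_detections(detections, catalog))  # (1, 1, 1)
```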
Specific smaller comments:
- Lines 41-42: the prominent placement of DTW seems to suggest that this is the cornerstone algorithm of this study, while it is my impression that IF does more of the heavy lifting (it is also the basis for DTW). Perhaps it makes more sense to not mention DTW here and introduce the acronym in line 50.
- Lines 46-47: the authors could mention here that IFs have an evaluation cost and memory footprint that scale as O(N). These are strong arguments in favour of IFs over pairwise-distance based algorithms that tend to scale as O(N^2).
- Line 79: it would help the reader to state here that the harmonic number is approximated as H(n) ≈ ln(n) + 0.577... for n >> 1, or show the Taylor expansion.
- Section 2.2: what about the “height limit” (“hlim” in Liu et al. 2018) that is part of the original algorithm proposed by Liu? Is this value set to infinity (= smallest granularity)? Did the authors consider this parameter at all? If all attributes of a data point x follow the same (normal) distribution, then this parameter should have no effect because only a single cluster exists that is centred on the origin.
- Figures A1,A2: my PDF reader was struggling a bit to render the large number of data points. Since we can’t see any details when 1000 symbols overlap, it would be more rendering-friendly to rasterise these figures, or at least the data points.
- Line 472, “non-linear wave-like pattern”: I suppose that the authors describe the shape of the data in analogy to an ocean wave breaking on the beach? Since this paper will be read by seismologists, “wave-like” will invoke a very different mental picture than the pattern of the data in these figures. I would use the term “hook-like”.
- Line 474, “they are fairly highly rank in the log standard deviation bands in which they appear”: what does this mean? That ILL{11,12,13,18} achieve higher IF scores than the other stations?
- Figures D1-4: in all cases, the IF score seems to increase a lot before the visible onset of the anomaly in the time series. Is this a result of acausal filtering/processing? If so, it would be good to mention that somewhere. If not, that would suggest IF is able to pick up an anomalous signal before it becomes visible in the time-series.
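On the harmonic-number point above: the suggested approximation is easy to verify numerically (γ ≈ 0.5772 is the Euler–Mascheroni constant), and quoting it would spare the reader a detour:

```python
import math

# H(n) = sum_{k=1}^{n} 1/k approaches ln(n) + gamma for large n;
# the error decays roughly as 1/(2n).
gamma = 0.5772156649
for n in (10, 100, 1000):
    H = sum(1.0 / k for k in range(1, n + 1))
    print(f"n={n}: H(n)={H:.6f}, ln(n)+gamma={math.log(n) + gamma:.6f}")
```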
Citation: https://doi.org/10.5194/egusphere-2025-3864-RC3
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 1,649 | 136 | 29 | 1,814 | 28 | 37 |
- RC1: 'Comment on egusphere-2025-3864', Anonymous Referee #1, 29 Sep 2025
The manuscript presents an unsupervised approach for detecting and cataloging seismic signals generated by mass movements, based on isolation forest (IF) anomaly detection combined with dynamic time warping (DTW) for the characterization of signal dissimilarities. The methodology is applied to two case studies: the refinement of an existing debris-flow catalog in Illgraben, Switzerland, and the generation of a new catalog from seismometer data in Greenland. The results are compared against the widely used STA-LTA triggering method, showing that the proposed IF-based approach often outperforms the baseline. The topic is timely and of high relevance to environmental seismology, where labeled data remain scarce and the detection of rare but hazardous events requires robust and automated strategies.
Nevertheless, there are several important weaknesses that must be addressed before the manuscript can be considered for publication. A first and major concern lies in the perplexing incompleteness of the bibliography. The reference list remains very narrow and omits several first-order contributions to both mass-wasting and debris-flow seismology, as well as recent methodological developments in anomaly detection and machine learning applied to seismology (for mass movements, but also for glaciers, volcanoes, earthquakes, etc.). Without these references, the manuscript, which is, first and foremost, a methodological paper, does not convincingly situate its contribution in the broader research landscape. A substantial expansion of the literature review is mandatory in order to properly contextualize the approach, demonstrate novelty, and ensure that the article reaches the visibility it deserves within the field.
In addition, the manuscript currently places an excessive amount of important content in the appendices, including crucial figures (e.g., D2 or D3, E1, E3) showing seismic signals and examples of detected events. These visual results are central to a seismological study and should appear in the main body of the paper. Beyond this, the exposition of the methodology is at times poorly balanced. Sections presenting detailed mathematical derivations of the methodology, such as the full formalism of the isolation forest, could be more appropriately relegated to appendices. In the main text, a more didactic explanation would be far more valuable. It would make the methodological contribution both clearer and more accessible to the readership. At present, the paper tends to emphasize equations at the expense of interpretability and understandability.
The evaluation of the results, although rigorous, would benefit from a clearer and more accessible presentation. Metrics such as precision, recall, and IoU are appropriate, but their distribution across dense tables makes comparisons difficult to follow. Averaged summary values or graphical representations would make the performance differences between methods easier to grasp. Similarly, the role of DTW, while potentially promising, is not convincingly established. At some stations DTW improves detection, but in other contexts its added value is marginal. A more explicit discussion of the specific conditions under which DTW enhances performance would strengthen the manuscript considerably.
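To keep the suggested metrics concrete: interval IoU, as presumably used for matching detections to catalog entries (the manuscript's exact definition may differ), is simply

```python
def interval_iou(a, b):
    """Intersection-over-union of two time intervals (start, end);
    returns 0 when they are disjoint."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

print(interval_iou((10, 20), (15, 25)))  # 5 / 15 ≈ 0.333
print(interval_iou((10, 20), (30, 40)))  # 0.0
```

Stating the definition this explicitly (in whatever form the authors actually use) would make the dense tables far easier to interpret.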
The generalization and scalability of the approach also deserve further elaboration. The manuscript focuses on two case studies, but it would be important to reflect on the applicability of the methodology to larger seismic networks, to other types of gravitational mass movements, and to real-time operational monitoring. A presentation and a discussion of all the hyper-parameters used and their values are mandatory.
Figures and visualization more broadly need to be improved. Beyond the introductory material, the reader is given few direct visualizations of the detection process or of anomaly scores. Examples of time series with IF anomaly scores, accompanied by a schematic representation of the full workflow, would make the study more intuitive and strengthen its appeal for a seismological audience. Related to this, the terminology is sometimes confusing. The distinction between “trigger segments,” “detections,” and “catalog entries” is central but not always presented with sufficient clarity. A clear diagram of the complete processing chain would help avoid such ambiguities.
I think the manuscript presents a promising and relevant study with strong potential impact in environmental seismology, but it requires major revisions in order to address its most significant shortcomings. The bibliography must be expanded to include key references in the field, the structure must be rebalanced to highlight results over technical appendices, the methodology should be streamlined, the role of DTW clarified, and the results presented in a more intuitive and visual way. Without these improvements, the contribution remains incomplete and risks being undervalued in the literature.
Minor comments:
- Larose et al. (2015) focuses exclusively on seismic noise monitoring. There are many other references that would more accurately illustrate the point being made here.
- Bahavar et al. (2019) and Collins et al. (2022) represent significant contributions, but they are not the only efforts (particularly regarding machine learning) that are directly relevant to the present study.
- L21–25: STA/LTA is a detector, not a discriminator. The current phrasing is misleading.
- L28: Replace “see for example” with “e.g.,” followed by citations. More exhaustive referencing is needed to 1) provide the correct context for the study and 2) guide readers to other relevant works.
- L34–36: The bibliography on background noise monitoring is more complete than that on machine learning in environmental seismology, even though the latter is central to this paper…
- L47: Clarify what “vanilla” refers to. In seismology or in machine learning? Many algorithms now exist that combine anomaly detection and classification (e.g., VAEs, contrastive learning).
- Sections 3.1.3/3.1.4: These are methodological and should not appear in the results section.
- The description of the datasets is insufficient (number of samples, class distributions, training/validation/test splits).
- L217: Clarify how the grid search is performed in an unsupervised context. Does this not undermine the intended advantage of IF as a parameter-free exploratory tool, and its use for true unsupervised exploration?
- Tables: Highlight best-performing results in bold to facilitate interpretation.
- L272: Provide justification for onset/offset thresholds; where do these “rule-of-thumb” values come from?
- L272–273: Why the “top 50” segments? What if more than 50 are of interest? This seems to be central to your approach and should be thoroughly discussed.
- L283–286: The explanation is unclear. A diagram of the complete processing chain would help. Also specify the inconsistency threshold used in agglomerative clustering.
- L289–290: Define the metric by which segments are “most anomalous.” Provide values. Clarify what is meant by “further emphasized by the agglomerative clustering.”
- Section 3.2.3: Replace dendrograms with examples of seismic signals in clusters (A, B, C and D). For a seismological audience, the waveforms themselves are far more informative.
- L295: The description of four clusters “in increasing order of diversity” reflects a subjective choice. Justify why the dendrogram splits were regrouped this way and acknowledge the subjectivity involved.
- Figure 4 is not useful for the discussion and could be removed or moved to the appendices.
- L346–355: This discussion belongs in the introduction, not in the conclusion.
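On the agglomerative-clustering comments above (L283–286, L289–290): a minimal SciPy sketch on an invented DTW-style dissimilarity matrix shows what the threshold choice controls. The manuscript reportedly uses an inconsistency criterion; a plain distance cut is shown here for determinism, and all matrix values and the threshold are hypothetical:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical pairwise DTW dissimilarities for 6 segments (invented values;
# in practice these would come from segment-to-segment DTW distances).
D = np.array([
    [0.00, 0.10, 0.20, 0.90, 1.00, 0.95],
    [0.10, 0.00, 0.15, 0.85, 0.90, 1.00],
    [0.20, 0.15, 0.00, 0.95, 1.00, 0.90],
    [0.90, 0.85, 0.95, 0.00, 0.10, 0.20],
    [1.00, 0.90, 1.00, 0.10, 0.00, 0.15],
    [0.95, 1.00, 0.90, 0.20, 0.15, 0.00],
])

# Average-linkage hierarchy on the condensed distance matrix, then a flat
# cut at an illustrative threshold: two well-separated clusters emerge.
Z = linkage(squareform(D), method="average")
labels = fcluster(Z, t=0.5, criterion="distance")
print(labels)
```

Whatever criterion and threshold the authors actually use, reporting them (and, ideally, showing example waveforms per resulting cluster rather than dendrograms alone) would resolve the ambiguities flagged above.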