the Creative Commons Attribution 4.0 License.
Feature Selection for Landslide Forecasting Models in Southern Andes
Abstract. Rainfall-induced landslide (RIL) forecasting is crucial for early warning systems developed to mitigate the devastating impacts of these events on human lives, infrastructure, and the environment. Dense instrumental networks for early warning currently require large datasets so that machine learning models can identify precursor patterns. Topographic, lithological, vegetation, soil moisture, and climatic characteristics are among the most commonly used variables for training these models. However, there are no universal designs, so it is necessary to adapt the requirements to each context and to the available variables that characterise it. To develop a RIL forecasting model for the Southern Andes, this study gathers data from various local soil and climate databases to identify the most relevant variables. Feature selection is crucial for improving the design of machine learning models, reducing the dimensionality of input data, enhancing computational efficiency, and preventing overfitting. We assessed the impact of various features, both individually and in combination, on the performance of predictive models. Classification and Regression Trees (CART) and Genetic Algorithms are employed to perform the feature selection. A national landslide database was enriched using techniques such as buffer control sampling, PU Bagging, and clustering methods to incorporate negative (non-landslide) examples. Various predictive models were tested. The results reveal some consistent variables as the most significant in forecasting landslides in four southern Chilean regions.
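As a rough illustration of the genetic-algorithm feature selection named in the abstract, the sketch below evolves boolean feature masks on synthetic data. It is not the authors' pipeline: the data generator, the nearest-centroid classifier used as the fitness function, the size penalty, and all function names and parameter values (`make_data`, `fitness`, `select_features`, population size, mutation rate) are assumptions chosen only to keep the example self-contained.

```python
import random


def make_data(n=200, n_informative=2, n_noise=6, seed=0):
    """Synthetic binary data: only the first n_informative columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        signal = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
        X.append(signal + noise)
        y.append(label)
    return X, y


def fitness(mask, X, y, penalty=0.01):
    """Hold-out accuracy of a nearest-centroid classifier on the kept
    features, minus a small penalty per feature (hypothetical choice)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    half = len(X) // 2
    train, test = range(half), range(half, len(X))
    centroids = {}
    for c in (0, 1):
        rows = [X[i] for i in train if y[i] == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for i in test:
        dist = {
            c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, centroids[c]))
            for c in (0, 1)
        }
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(test) - penalty * len(cols)


def select_features(X, y, pop_size=20, generations=30, mutation_rate=0.1, seed=1):
    """Evolve feature masks with truncation selection, one-point crossover,
    bit-flip mutation, and elitism."""
    rng = random.Random(seed)
    n = len(X[0])
    # Include the all-features mask so the result is never worse than using everything.
    population = [[True] * n] + [
        [rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, X, y), reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=lambda m: fitness(m, X, y))


if __name__ == "__main__":
    X, y = make_data()
    best = select_features(X, y)
    print("kept feature indices:", [j for j, keep in enumerate(best) if keep])
```

In a real study the fitness function would be replaced by cross-validated performance of the chosen classifier on the actual landslide inventory; the structure of the search stays the same.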
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2025-2764', Anonymous Referee #1, 31 Jul 2025
-
AC2: 'Reply on RC1', Ivo Fustos, 26 Nov 2025
We appreciate the reviewer's comments and agree with all of them. They have helped us improve the new version of the manuscript, which now includes additional sections and corrections that address the information gaps and the low-quality figures. We would appreciate it if the reviewer could review the attached document, which contains our answers to the comments.
-
RC2: 'Comment on egusphere-2025-2764', Anonymous Referee #2, 05 Sep 2025
This manuscript provides an interesting study on identifying the primary factors controlling rainfall-induced landslides in four Chilean regions. The framework and methodologies are sound. However, I regret to say that the core innovation of this study has not been sufficiently articulated. As the authors note (lines 86-87), the study area is unique and complex in its geological and climatological features. Nevertheless, the manuscript does not rigorously discuss which features emerge as the most representative variables for volcanic, sedimentary, and glacial terrains, nor how these features influence susceptibility mapping. Instead, the emphasis is placed on machine learning and feature selection techniques, which are widely used and cannot be highlighted as innovative in comparison with the unique geological setting of the study area. Moreover, the writing and structure of the manuscript are not well organised, which makes it difficult for the reader to follow the authors’ ideas. For these reasons, I do not consider the manuscript suitable for publication in its current form. Substantial revisions are required to improve its quality. Some detailed comments are as follows.
- L12-13: The abstract ends abruptly without presenting any concrete results. Please expand the abstract to include the key findings and avoid vague statements such as “various predictive models were tested.”
- L41-42: Seismic activity is not relevant to this study and should not be included in the introduction.
- L87-88: The diverse geological composition of the study area should be emphasized as one of the most important aspects. Please elaborate on how different soil and lithological types correspond to the selected controlling features.
- L119-121: The phrase “considerable attention” is unclear. Please specify the exact steps taken to ensure data quality.
- L127: The abbreviation “PP” is used without being defined beforehand. Please define it at first mention.
- Figures 7 and 8: These figures are not properly prepared. They contain non-English words, and their captions are incomplete. Please revise accordingly.
- Figure 9: Please add the coordinates to the map.
Citation: https://doi.org/10.5194/egusphere-2025-2764-RC2
AC1: 'Reply on RC2', Ivo Fustos, 26 Nov 2025
We are grateful to the reviewer for their careful and accurate assessment of our manuscript. We appreciate the positive recognition of the study's sound framework and methodologies and acknowledge the critical feedback regarding the insufficient articulation of the core innovation and the overall structure. The detailed comments have been invaluable in improving the quality and clarity of the revised submission. We agree with the reviewer's observation that the original manuscript did not sufficiently articulate the unique contribution in the context of the study area's complex geology, and this has been addressed in the new version. Moreover, the reviewer correctly identified that the emphasis on well-established machine learning and feature selection techniques (CART and GA) may have obscured the core novelty of our work. This has now been corrected and improved.
We wish to clarify that the primary objective of this study is not the construction of a novel landslide susceptibility map, but rather to systematically identify the most representative and influential variables that should be prioritised in monitoring networks and future, localised susceptibility models for rainfall-induced landslides in the Southern Andes. Our contribution is focused on filling a critical gap in South American landslide hazard assessment, where monitoring surveys often lack clear, evidence-based prioritisation of variables, especially across diverse, complex geological terrains (volcanic, sedimentary, glacial). In the revised manuscript, we have substantially re-focused the discussion to address the reviewer's point rigorously, detailing the physical significance of the selected features (e.g., the importance of soil hydraulic properties like bulk density and saturated water content), which reflects the influence of the region’s heterogeneous soil and shallow geology on landslide initiation. Moreover, we connected the results directly to practical recommendations for monitoring, thus reinforcing that the predictive power is a means to determine variable importance, not an end in itself for producing a static susceptibility map.
We sincerely apologise for the original writing and structure, which made the manuscript difficult to follow. We recognise that a lack of clear organisation can severely hinder the transmission of the study's ideas. We have performed a comprehensive revision of the entire manuscript’s structure and writing to improve coherence and readability. The Introduction has been revised to clearly state the gap (lack of variable prioritisation for monitoring) and the study's specific goal (feature identification for early warning systems). The Methodology section is now more logically organised. The Discussion has been restructured to present the feature selection results first, then provide an in-depth analysis of their physical meaning and implications for regional monitoring/early warning system design, before briefly discussing model performance. We trust that these revisions have significantly enhanced the clarity, quality, and focus of the manuscript, making the unique contribution easily identifiable. We would appreciate it if the reviewer could review the attached document, which contains our answers.
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 803 | 180 | 19 | 1,002 | 17 | 26 |
This study presents a machine learning-based approach for landslide forecasting in the Southern Andes, combining feature selection methods (CART and genetic algorithms) with multiple classifiers (SVM, RF, XGB). The research design is sound, the methodology is robust, and the results hold practical significance, particularly in the context of early warning systems for geological hazards. The paper is recommended for publication after addressing the following points.
Major comments:
Minor comments:
The first paragraph of the conclusion (lines 465–475) could be condensed to avoid redundancy with earlier sections.