Feature Selection for Landslide Forecasting Models in Southern Andes

Labbe, Manuel; Curilem, Millaray; Fustos-Toribio, Ivo; Pooley, Mario

doi:10.5194/egusphere-2025-2764

Preprints

https://doi.org/10.5194/egusphere-2025-2764

30 Jun 2025

Abstract. Rainfall-induced landslide (RIL) forecasting is crucial for early warning systems developed to mitigate the devastating impacts of these events on human lives, infrastructure, and the environment. Currently, early warning based on dense instrumental networks requires large datasets so that machine learning models can identify precursor patterns. Topographic, lithological, vegetation, soil moisture, and climatic characteristics are among the most commonly used variables for training these models. However, there is no universal design, so the requirements must be adapted to each context and to the variables available to characterise it. To develop a RIL forecasting model for the Southern Andes, this study gathers data from various local soil and climate databases to identify the most relevant variables. Feature selection is crucial for improving the design of machine learning models: it reduces the dimensionality of the input data, enhances computational efficiency, and helps prevent overfitting. We assessed the impact of various features, both individually and in combination, on the performance of predictive models. Classification and Regression Trees (CART) and Genetic Algorithms (GA) were used to perform the feature selection. A national landslide database was enriched with negative (non-landslide) examples using buffer control sampling, PU Bagging, and clustering methods. Various predictive models were tested. The results reveal a set of variables that are consistently the most significant for forecasting landslides in four southern Chilean regions.
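
The abstract does not describe how buffer control sampling was implemented. As a minimal sketch of the general idea, assuming that a random point counts as a non-landslide example only when it lies farther than a fixed buffer from every recorded landslide, the following Python draws such points by rejection sampling. The coordinates, bounding box, 1 km buffer, and function names are hypothetical and are not taken from the paper.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def sample_negatives(landslides, bbox, n, buffer_km, seed=0, max_tries=100_000):
    """Draw up to n random points inside bbox that lie farther than buffer_km
    from every recorded landslide (rejection sampling)."""
    rng = random.Random(seed)
    lat_min, lat_max, lon_min, lon_max = bbox
    negatives = []
    for _ in range(max_tries):
        if len(negatives) == n:
            break
        lat = rng.uniform(lat_min, lat_max)
        lon = rng.uniform(lon_min, lon_max)
        # Accept the candidate only if it is outside every landslide buffer.
        if all(haversine_km(lat, lon, la, lo) > buffer_km for la, lo in landslides):
            negatives.append((lat, lon))
    return negatives

# Hypothetical landslide locations (lat, lon), not taken from the paper.
landslides = [(-39.80, -72.90), (-40.50, -72.60), (-41.30, -72.40)]
bbox = (-42.0, -39.0, -73.5, -71.5)  # lat_min, lat_max, lon_min, lon_max
negatives = sample_negatives(landslides, bbox, n=20, buffer_km=1.0)
print(len(negatives), "negative examples drawn")
```

Sampling latitude and longitude uniformly is a simplification that slightly over-represents higher latitudes; an equal-area scheme would be a better fit for large study areas.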

Received: 11 Jun 2025 – Discussion started: 30 Jun 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

14 Jul 2026

Feature selection for landslide forecasting models in Southern Andes

Manuel Labbé, Millaray Curilem, Ivo Fustos-Toribio, and Mario Pooley

Nat. Hazards Earth Syst. Sci., 26, 3253–3272, https://doi.org/10.5194/nhess-26-3253-2026, 2026

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-2764', Anonymous Referee #1, 31 Jul 2025
This study presents a machine learning-based approach for landslide forecasting in the Southern Andes, combining feature selection methods (CART and genetic algorithms) with multiple classifiers (SVM, RF, XGB). The research design is sound, the methodology is robust, and the results hold practical significance, particularly in the context of early warning systems for geological hazards. The paper is recommended for publication after addressing the following points.
Major comments:
The introduction currently provides a broad overview of landslide forecasting but could better highlight the specific innovations of this study. For example: (1) The proposed solutions for data scarcity in the Southern Andes (e.g., PU Bagging and buffer control sampling); (2) The unique advantages of the hybrid feature selection approach (GA + CART) in landslide prediction.

The conclusion should more explicitly summarize the improvements this study offers to existing landslide early warning systems and its practical implications.

The abstract should be refined to convey more insightful information.

The paper mentions multiple databases (e.g., ERA5, CLSoilMaps) but lacks details on their temporal coverage, resolution consistency, and handling of missing data. These details should be added.

The methodology for generating "negative examples" (non-landslide data) via Buffer Control Sampling and PU Bagging requires further justification.

Tables 4 and 5 present performance metrics for different methods but lack statistical significance tests (e.g., p-values or confidence intervals). Such tests would strengthen the claim that GA-based optimization outperforms other methods.

The similar performance of GA XGB and GA RF (both with 10.95% error rates) warrants discussion on their trade-offs in real-world applications (e.g., computational efficiency, interpretability).

The study focuses on two regions in southern Chile (Los Lagos and Los Ríos). The conclusions should clarify whether they are applicable to areas with different geological or climatic conditions.

The discussion should more thoroughly address model limitations.

Minor comments:
Labels in Figures 3 and 4 (correlation matrices) are too small and should be enlarged or provided in higher resolution.

Abbreviated variable names in Tables 1 and 2 (e.g., "AvMoist") should be explained in the main text or footnotes (e.g., "Available Moisture").

Inconsistent formatting of terms (e.g., "GA XGB" vs. "GA_XGB") should be unified.

The first paragraph of the conclusion (lines 465–475) could be condensed to avoid redundancy with earlier sections.
Citation: https://doi.org/10.5194/egusphere-2025-2764-RC1
- AC2: 'Reply on RC1', Ivo Fustos, 26 Nov 2025
  
  We appreciate the reviewer's comments and agree with all of them. They have allowed us to improve the new version of the manuscript, which now includes additional sections and corrections addressing the information gaps and the low-quality figures. Our answers to the comments are in the attached document, which we would appreciate the reviewer examining.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2764-AC2
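
Referee #1's request for confidence intervals on the model comparison could be met in several ways; one common option is a paired percentile bootstrap over test cases. The sketch below is an assumption about how such a check might look, not the authors' procedure, and the labels and predictions are synthetic placeholders rather than results from Tables 4 and 5.

```python
import random

def paired_bootstrap_ci(y_true, pred_a, pred_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for error_rate(A) - error_rate(B), resampling
    test cases with replacement so both models share each resample."""
    rng = random.Random(seed)
    n = len(y_true)
    # Per-case loss difference: +1 if only A is wrong, -1 if only B is wrong.
    diff = [int(a != y) - int(b != y) for y, a, b in zip(y_true, pred_a, pred_b)]
    stats = []
    for _ in range(n_boot):
        sample = [diff[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(sample) / n)
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return sum(diff) / n, lo, hi

# Synthetic test set: 200 cases with alternating labels (placeholder data).
y_true = [i % 2 for i in range(200)]
wrong_a = set(range(0, 30))    # model A misclassifies 30 of 200 cases
wrong_b = set(range(10, 32))   # model B misclassifies 22 of 200 cases
pred_a = [1 - y if i in wrong_a else y for i, y in enumerate(y_true)]
pred_b = [1 - y if i in wrong_b else y for i, y in enumerate(y_true)]

observed, lo, hi = paired_bootstrap_ci(y_true, pred_a, pred_b)
print(f"error(A) - error(B) = {observed:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes zero suggests the difference in error rate is not a resampling artefact; an interval straddling zero would support treating the two models as equivalent on that test set.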
RC2:
'Comment on egusphere-2025-2764', Anonymous Referee #2, 05 Sep 2025
This manuscript provides an interesting study on identifying the primary factors controlling rainfall-induced landslides in four Chilean regions. The framework and methodologies are sound. However, I regret to say that the core innovation of this study has not been sufficiently articulated. As the authors note (lines 86–87), the study area is unique and complex in its geological and climatological features. Nevertheless, the manuscript does not rigorously discuss which features emerge as the most representative variables for volcanic, sedimentary, and glacial terrains, nor how these features influence susceptibility mapping. Instead, the emphasis is placed on machine learning and feature selection techniques, which are widely used and cannot be highlighted as innovative in comparison with the unique geological setting of the study area. Meanwhile, the writing and structure of the manuscript are not well organised, which makes it difficult for the reader to follow the authors’ ideas. For these reasons, I do not consider the manuscript suitable for publication in its current form. Substantial revisions are required to improve its quality. Some detailed comments are as follows.
L12-13: The abstract ends abruptly without presenting any concrete results. Please expand the abstract to include the key findings and avoid vague statements such as “various predictive models were tested.”

L41-42: Seismic activity is not relevant to this study and should not be included in the introduction.

L87-88: The diverse geological composition of the study area should be emphasized as one of the most important aspects. Please elaborate on how different soil and lithological types correspond to the selected controlling features.

L119-121: The phrase “considerable attention” is unclear. Please specify the exact steps taken to ensure data quality.

L127: The abbreviation “PP” is used without being defined beforehand. Please define it at first mention.

Figures 7 and 8: These figures are not properly prepared. They contain non-English words, and their captions are incomplete. Please revise accordingly.

Figure 9: Please add the coordinates to the map.
Citation: https://doi.org/10.5194/egusphere-2025-2764-RC2
- AC1: 'Reply on RC2', Ivo Fustos, 26 Nov 2025
  
  We are grateful to the reviewer for their careful and accurate assessment of our manuscript. We appreciate the positive recognition of the study's sound framework and methodologies and acknowledge the critical feedback regarding the insufficient articulation of the core innovation and the overall structure. The detailed comments have been invaluable in improving the quality and clarity of the revised submission. We agree with the reviewer's observation that the original manuscript did not sufficiently articulate the unique contribution in the context of the study area's complex geology, and this has been addressed in the new version. Moreover, the reviewer correctly identified that the emphasis on well-established machine learning and feature selection techniques (CART and GA) may have obscured the core novelty of our work. This has now been corrected.
  We wish to clarify that the primary objective of this study is not the construction of a novel landslide susceptibility map, but rather to systematically identify the most representative and influential variables that should be prioritised in monitoring networks and future, localised susceptibility models for rainfall-induced landslides in the Southern Andes. Our contribution is focused on filling a critical gap in South American landslide hazard assessment, where monitoring surveys often lack clear, evidence-based prioritisation of variables, especially across diverse, complex geological terrains (volcanic, sedimentary, glacial). In the revised manuscript, we have substantially re-focused the discussion to address the reviewer's point rigorously, detailing the physical significance of the selected features (e.g., the importance of soil hydraulic properties such as bulk density and saturated water content), which reflects the influence of the region’s heterogeneous soil and shallow geology on landslide initiation. Moreover, we have connected the results directly to practical recommendations for monitoring, thus reinforcing that the predictive power is a means to determine variable importance, not an end in itself for producing a static susceptibility map.
  We sincerely apologise for the original writing and structure, which made the manuscript difficult to follow. We recognise that a lack of clear organisation can severely hinder the transmission of the study's ideas. We have performed a comprehensive revision of the entire manuscript’s structure and writing to improve coherence and readability. The Introduction has been revised to clearly state the gap (lack of variable prioritisation for monitoring) and the study's specific goal (feature identification for early warning systems). The Methodology section is now more logically organised. The Discussion has been restructured to present the feature selection results first, then provide an in-depth analysis of their physical meaning and implications for regional monitoring/early warning system design, before briefly discussing model performance. We trust that these revisions have significantly enhanced the clarity, quality, and focus of the manuscript, making the unique contribution easily identifiable. We would appreciate it if the reviewer could review the attached document, which contains our answers to the comments.
  
  Citation: https://doi.org/10.5194/egusphere-2025-2764-AC1

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Reconsider after major revisions (further review by editor and referees) (11 Dec 2025) by Federica Fiorucci

AR by Ivo Fustos on behalf of the Authors (19 Jan 2026) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (13 Feb 2026) by Federica Fiorucci

RR by Anonymous Referee #1 (28 Feb 2026)

RR by Anonymous Referee #2 (06 Mar 2026)

ED: Publish subject to minor revisions (review by editor) (24 Mar 2026) by Federica Fiorucci

Dear Authors,

Thank you for submitting the revised version of your manuscript.

After evaluation of the revised paper and the reviewers’ comments, I am pleased to inform you that the manuscript is accepted for publication, subject to the correction of the remaining minor revisions requested by the reviewers.

Both reviewers acknowledged the substantial improvements made in the revised version.
Before the manuscript can proceed to the final stage, please carefully address the following remaining points:

Reviewer 1
The genetic algorithm highlights volumetric water content, bulk density, and parameters related to the soil water retention curve as influential features. In Section 6, please add a brief physical interpretation of how these variables relate to infiltration, near-surface saturation, and pore pressure response, and clarify their relevance to rainfall-triggered slope initiation.

The manuscript states that key genetic algorithm settings, such as the crossover and mutation probabilities, were chosen by trial and error. For reproducibility, please report the tested ranges or candidate sets and the selection criterion, and briefly note whether multiple independent runs or other measures were used to reduce the risk of premature convergence or local optima.

In Figure 8, the y axis feature labels are difficult to read at normal viewing size. Please increase the font size or adjust the layout to improve readability.

Reviewer 2
Here, I have listed a few remaining points that could be useful to further enhance the quality of the paper:
1. Eq. 1: Please check the writing of this equation. The summation operator appears to be rendered incorrectly as an "X" rather than Sigma. I suspect this may be a printing error.
2. Fig. 8: In the response file to the previous reviews, the authors suggested that Figure 8 would be eliminated. However, the figure remains in the revised manuscript. Why?
3. Section 3: In the sentences beginning with "We used an approach to establish the critical…" and ending with "…the full spectrum of lithological and climatic features within the study area," what exactly is the approach used for the preparation of the dataset? Are you referring to the series of operations detailed in the subsequent paragraph? I suggest explicitly introducing this approach at the beginning of the section, which may help readers to understand the specific steps taken to prepare the dataset.
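
Reviewer 1's reproducibility request above (report the candidate crossover and mutation probabilities, the selection criterion, and whether independent runs were used) can be illustrated with a small experiment. The sketch below is a toy under stated assumptions: the fitness function is a synthetic stand-in for cross-validated model skill, the candidate values are invented for illustration, and nothing here reproduces the authors' actual GA configuration.

```python
import itertools
import random

N_FEATURES = 12
INFORMATIVE = {0, 3, 7}  # hypothetical "truly useful" feature indices

def fitness(mask):
    """Toy stand-in for cross-validated skill: reward informative features and
    penalise every selected feature slightly to favour small subsets."""
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)

def run_ga(p_cross, p_mut, seed, pop_size=20, generations=30):
    """Binary-mask GA with elitism, tournament selection, one-point crossover
    and bit-flip mutation. Returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = [pop[0][:]]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            child = p1[:]
            if rng.random() < p_cross:  # one-point crossover
                cut = rng.randrange(1, N_FEATURES)
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)

# Candidate sets are illustrative, not the values tested by the authors.
grid = list(itertools.product([0.6, 0.8, 0.9], [0.01, 0.05, 0.1]))
seeds = range(5)  # independent runs per setting
results = {}
for p_cross, p_mut in grid:
    scores = [run_ga(p_cross, p_mut, s)[1] for s in seeds]
    results[(p_cross, p_mut)] = (sum(scores) / len(scores), min(scores), max(scores))

# Selection criterion in this sketch: highest mean fitness across seeds.
best_setting = max(results, key=lambda k: results[k][0])
print("best (p_cross, p_mut):", best_setting, "mean/min/max:", results[best_setting])
```

Reporting the minimum and maximum across seeds alongside the mean shows how sensitive each setting is to initialisation; a wide spread is one possible sign of premature convergence in some runs.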

AR by Ivo Fustos on behalf of the Authors (01 Apr 2026) Author's response Author's tracked changes Manuscript

ED: Publish subject to technical corrections (17 Apr 2026) by Federica Fiorucci

ED: Publish as is (25 Jun 2026) by Gregor C. Leckebusch (Executive editor)

AR by Ivo Fustos on behalf of the Authors (26 Jun 2026) Manuscript

Viewed

Total article views: 8,253 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
5,835	2,161	257	8,253	210	268

Views and downloads (calculated since 30 Jun 2025)

Month	HTML	PDF	XML	Total
Jun 2025	125	5	0	130
Jul 2025	410	163	40	613
Aug 2025	1,620	184	15	1,819
Sep 2025	2,657	115	22	2,794
Oct 2025	160	165	15	340
Nov 2025	195	373	15	583
Dec 2025	135	298	35	468
Jan 2026	140	332	85	557
Feb 2026	116	118	9	243
Mar 2026	185	241	14	440
Apr 2026	34	72	2	108
May 2026	34	76	3	113
Jun 2026	5	11	0	16
Jul 2026	19	8	2	29

Viewed (geographical distribution)

Total article views: 8,215 (including HTML, PDF, and XML), of which 8,215 have a defined geography and 0 are of unknown origin.

Latest update: 27 Jul 2026

Short summary

We investigated methods to improve the prediction of landslides triggered by heavy rainfall in southern Chile, utilising local soil and climate data. We tested different models and selected the most critical environmental factors. We improved the process for making forecasts in areas with limited monitoring. Our results help create faster and more reliable warnings and can guide safety planning in other mountain regions facing similar risks.
