https://doi.org/10.5194/egusphere-2023-1417
10 Aug 2023

Addressing Class Imbalance in Soil Movement Predictions

Praveen Kumar, Priyanka Priyanka, Kala Venkata Uday, and Varun Dutt

Abstract. Landslides threaten human life and infrastructure, causing fatalities and economic losses. Data from monitoring stations are valuable for predicting soil movement, which is crucial for mitigating this threat. However, accurately predicting soil movement from monitoring data is challenging because the data are complex and inherently class-imbalanced. This study develops machine learning (ML) models combined with oversampling techniques to address the class imbalance and build a robust soil movement prediction system. The dataset, comprising two years (2019–2021) of monitoring data from a landslide in Uttarakhand, was split 70:30 into training and testing sets. To tackle the class imbalance, several oversampling techniques were applied to the dataset: Synthetic Minority Oversampling Technique (SMOTE), K-Means SMOTE, Borderline SMOTE, Support Vector Machine SMOTE (SVM SMOTE), and Adaptive Synthetic Sampling (ADASYN). Several ML models, namely Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), Category Boosting (CatBoost), Long Short-Term Memory (LSTM), Multilayer Perceptron (MLP), and dynamic ensemble models, were trained and compared for soil movement prediction. Among these, the dynamic ensemble model with K-Means SMOTE performed best on the test set, with accuracy, precision, and recall of 99.68 % each and an F1-score of 0.9968. The RF model with K-Means SMOTE was the second-best performer, with accuracy, precision, and recall of 99.64 % each and an F1-score of 0.9964. These results show that ML models combined with class imbalance techniques have the potential to significantly improve soil movement predictions in landslide-prone areas.
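The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of plain SMOTE, written in pure Python under stated assumptions: binary labels, numeric feature vectors, and Euclidean distance. Each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. This is not the authors' code, and it is plain SMOTE rather than the K-Means SMOTE variant that performed best: K-Means SMOTE first clusters the data with k-means and oversamples only inside clusters rich in minority samples, while Borderline SMOTE, SVM SMOTE, and ADASYN differ mainly in which minority samples they choose to interpolate from. The helper `oversample_to_balance`, the toy data, and the choice of k = 5 are hypothetical. In practice oversampling is normally applied to the training portion only, so that the test set keeps its real class distribution; the abstract does not say at which stage it was applied.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Plain SMOTE sketch: create n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(base, m),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


def oversample_to_balance(X, y, minority_label=1, k=5, seed=0):
    """Hypothetical helper: append synthetic minority rows until both classes match."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    new_rows = smote(minority, max(0, n_majority - len(minority)), k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


# Made-up toy data: 20 majority-class rows and 3 minority-class rows.
X = [[float(i), float(i % 3)] for i in range(20)]
X += [[30.0, 1.0], [31.0, 2.0], [32.0, 1.5]]
y = [0] * 20 + [1] * 3

X_bal, y_bal = oversample_to_balance(X, y)
print(y_bal.count(0), y_bal.count(1))  # 20 20
```

Because every synthetic row is a convex combination of two real minority rows, the new samples stay inside the region already occupied by the minority class. This is what distinguishes SMOTE from simply duplicating minority rows.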

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2023-1417', Anonymous Referee #1, 19 Dec 2023
    • AC1: 'Reply on RC1', Praveen Kumar, 04 Jan 2024
      • RC2: 'Reply on AC1', Anonymous Referee #1, 22 Jan 2024
        • AC2: 'Reply on RC2', Praveen Kumar, 22 Jan 2024
  • RC3: 'Comment on egusphere-2023-1417', Anonymous Referee #2, 28 Feb 2024
    • AC3: 'Reply on RC3', Praveen Kumar, 18 Mar 2024

Viewed

Total article views: 396 (including HTML, PDF, and XML)
  • HTML: 252
  • PDF: 112
  • XML: 32
  • Total: 396
  • BibTeX: 20
  • EndNote: 17
Views and downloads (calculated since 10 Aug 2023)

Latest update: 23 May 2024
Short summary
Our study focuses on predicting soil movement to mitigate landslide risks. We develop machine learning models combined with oversampling techniques to address the class imbalance in landslide monitoring data. The dynamic ensemble model with K-Means SMOTE achieves the best results, with an accuracy of 99.68 % and correspondingly high precision, recall, and F1-score, followed by the Random Forest (RF) model with K-Means SMOTE. Our findings highlight the potential of these models to improve soil movement predictions in landslide-prone areas.
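The reported accuracy, precision, and recall are identical, and the F1-score equals them. The text does not state how per-class scores were averaged. Assuming support-weighted averaging (as with `average="weighted"` in scikit-learn's metric functions), two facts explain this pattern. First, weighted recall always equals accuracy, because the sum over classes of (n_c / n)(TP_c / n_c) is the total number of correct predictions divided by n. Second, F1 = 2PR / (P + R) reduces to P whenever P = R. The sketch below is hypothetical, uses made-up predictions, and is not the authors' evaluation code. It computes these scores in pure Python to show the effect.

```python
def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (any number of classes)."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted_c = sum(p == c for p in y_pred)
        actual_c = sum(t == c for t in y_true)
        p_c = tp / predicted_c if predicted_c else 0.0
        r_c = tp / actual_c if actual_c else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = actual_c / n  # class support as a fraction of all samples
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


# Made-up labels: 6 samples of class 0 and 4 of class 1, with 2 errors.
y_true = [0] * 6 + [1] * 4
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
scores = weighted_scores(y_true, y_pred)
print(" ".join(f"{s:.4f}" for s in scores))  # 0.8000 0.8000 0.8000 0.8000

# F1 is the harmonic mean of precision and recall, so P = R implies F1 = P.
p = r = 0.9968
print(round(2 * p * r / (p + r), 4))  # 0.9968
```

In this toy case each class happens to have equal precision and recall, so all four scores coincide. With other error patterns, weighted precision and F1 can differ from accuracy, while weighted recall still matches it.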