the Creative Commons Attribution 4.0 License.
Enhanced Landslide Detection from Remote Sensing Imagery Using an Attention-Optimized UNet-CBAM
Abstract. Landslide deformation monitoring is crucial for disaster prevention and for protecting infrastructure, ecosystems, and lives in vulnerable regions. Traditional methods, though useful, often lack the precision required for complex terrain, which limits their effectiveness in landslide-prone areas. This study presents the UNet-Convolutional Block Attention Module (CBAM) framework, which combines the UNet architecture with CBAM to enhance landslide detection and segmentation in remote sensing imagery. Integrating CBAM improves the model's ability to focus on spatially significant features, leading to more accurate and efficient extraction of landslide-related information. Experimental results demonstrate that UNet-CBAM outperforms the baseline UNet by 10 %, with a notable improvement in the Area Under the Curve (AUC) metric. The proposed model shows robustness in diverse and challenging landscapes, proving its effectiveness for landslide monitoring. This enhancement offers significant potential for improving early-warning systems, disaster preparedness, and risk management strategies in landslide-prone areas.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-2348', Anonymous Referee #1, 19 Aug 2025
The work in this paper is interesting but far from acceptable because of the following major scientific challenges:
1. Section 2.4 describes the derivation of indices such as NDVI, NDMI, slope, elevation, GNDVI, brightness, and BSI, but the manuscript never makes clear why those variables are needed for the model. An ablation study must be conducted to determine whether all of these variables are needed. In my opinion, the high-resolution RGB image alone is enough, but a scientific test must be conducted.
2. The model architecture in Figure 1 is confusing, to say the least: it gives no information on how the model is built or what the numbers on top of the conv blocks represent. Also, what is the modification or significant contribution of your work compared with existing UNet-CBAM architectures?
3. The comparison with other models is not sufficient; you must compare with state-of-the-art approaches and not 10-year-old methods such as UNet. Please look at the works of Bhuyan and others to make a solid comparison.
4. Looking at Figure 4, your model starts to overfit after epoch 45, and at epoch 45 the UNet and your approach have the same validation accuracy. Please explain how you prevented overfitting and show that the reported results are not an artifact of overfitting.
5. Based on the ROC curve, the improvement from your approach is clearly negligible (both 99 %), yet you claim a 10 % improvement. How so?
6. The study relies exclusively on Landslide4Sense-2022. While this is a curated and high-quality dataset, it is single-source and patch-based, which raises concerns about generalization. Without testing on an external dataset, or at least partitioning the training and validation by geographic region, it is not possible to demonstrate that the model generalizes beyond the specific data distribution provided by Landslide4Sense.
7. Overall figure quality is really poor. I do not think the manuscript itself needs loss curves; please move them to the supplement.
8. Given that your problem is image segmentation and the data are unbalanced (pixels without landslides outnumber pixels with landslides), IoU/F1 score is the most reliable metric. I see the best model has an IoU of 0.246; this is nowhere near sufficient. Also compare your IoU with that of others who have used the same dataset, and emphasize IoU rather than accuracy in your discussion.
9. The paper also lacks details on model complexity and runtime. It notes that computational demand may hinder deployment, but does not quantify model size, number of parameters, training time, inference speed, or hardware requirements. Such information is critical for assessing the operational feasibility of deploying UNet-CBAM in real-world monitoring systems. Without it, the reader cannot evaluate whether the proposed method is practical beyond academic experimentation.
10. The explainability of the attention mechanism is missing. The CBAM module is designed to guide the network’s focus toward important features, yet the paper does not present any visualizations of attention maps or saliency overlays.
11. Reproducibility is another concern. The authors note that code and data are not publicly available, only upon request. For a methods-oriented paper, it is strongly encouraged to make at least trained model weights, preprocessing scripts, or example inference pipelines publicly accessible. Without such resources, independent validation and application of the model will be limited. In general DL papers without publicly accessible code, data, and weights to test independently should not be sent out for review and can never be accepted mainly because of the reproducibility concerns.
12. In general, the figures are of low quality and need much better captions that explain the context of each figure.
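Point 8 above can be illustrated with a small sketch of why accuracy looks strong on imbalanced masks while IoU and F1 do not. The pixel counts are hypothetical and are not taken from the manuscript or from Landslide4Sense; the function name `segmentation_metrics` is likewise only illustrative.

```python
def segmentation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Pixel-level metrics for the positive (landslide) class from confusion counts."""
    total = tp + fp + fn + tn
    union = tp + fp + fn  # pixels marked landslide in the prediction or the reference
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "iou": tp / union if union else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if union else 0.0,
    }


# Hypothetical patch: 10,000 pixels, 200 of them landslide (2 % positive class).
# The model finds half of the landslide pixels and raises 200 false alarms.
metrics = segmentation_metrics(tp=100, fp=200, fn=100, tn=9600)
print(metrics)
```

With these assumed counts the accuracy is 0.97 while IoU is 0.25 and F1 is 0.40, because the 9,600 correctly classified background pixels dominate the accuracy. When both scores are computed from the same pooled confusion counts, they are linked by F1 = 2·IoU / (1 + IoU), so either one can be converted to the other when comparing against other studies.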
Citation: https://doi.org/10.5194/egusphere-2025-2348-RC1
RC2: 'Comment on egusphere-2025-2348', Anonymous Referee #2, 29 Aug 2025
Manuscript ID: egusphere-2025-2348
Title: Enhanced Landslide Detection from Remote Sensing Imagery Using an Attention-Optimized UNet-CBAM
Overall Evaluation: This manuscript explores the use of a UNet-CBAM architecture for landslide detection from remote sensing imagery, applied to the Landslide4Sense-2022 dataset. The research problem is relevant and the attempt to enhance UNet with attention is reasonable. However, the manuscript has significant weaknesses in novelty, scientific justification, experimental rigor, interpretation, and reproducibility. The current version does not meet the standards required for publication in a leading geoscience journal.
Strengths:
- Addresses an important geohazard monitoring challenge with potential societal applications.
- Provides detailed descriptions of the CBAM module and integrates it into a UNet framework.
- Demonstrates incremental improvements over baseline models such as UNet, SegNet, and DP-FCN.
Major Concerns:
1. Input Features (Sec. 2.4)
Auxiliary indices (NDVI, NDMI, slope, elevation, GNDVI, brightness, BSI) are included without clear scientific justification.
No ablation study is presented to verify their contribution compared to RGB-only input.
2. Architecture and Novelty (Fig. 1)
The architecture diagram is unclear, with unexplained numbers above convolution blocks.
The proposed contribution appears incremental, with limited distinction from existing UNet-CBAM approaches.
3. Comparisons with State-of-the-Art
Comparisons are limited to older models (UNet, SegNet, DP-FCN).
Recent transformer-based or advanced attention-based models are not considered, weakening the claim of advancement.
4. Overfitting and Training Behavior (Fig. 4)
Training/validation curves suggest potential overfitting for UNet after ~45 epochs.
Although the authors argue CBAM reduces overfitting, there is no deeper discussion or mitigation strategy (e.g., early stopping, augmentation, regularization).
5. Performance Claims
The abstract claims a “10% improvement,” yet both UNet and UNet-CBAM achieve AUC ≈ 0.99.
Reported improvements are marginal, and claims are overstated.
6. Dataset Limitation
Reliance solely on Landslide4Sense-2022, a patch-based dataset, restricts generalization.
No geographic hold-out or external validation is performed, undermining claims of robustness.
7. Evaluation Metrics
Reported IoU for UNet-CBAM is only 0.246, which is very low for a segmentation task.
Accuracy is emphasized despite class imbalance; IoU and F1 should be prioritized.
No benchmarking is provided against other studies using the same dataset.
8. Operational Feasibility
Although computational demands are noted, no details on model complexity, parameter count, runtime, or inference speed are given.
Without this, practical deployability cannot be assessed.
9. Explainability of Attention
The CBAM module is included but not validated with attention maps or saliency visualizations.
Its true contribution to feature learning is unclear.
10. Reproducibility
Code and trained weights are “available on request” only.
For deep learning studies, public release of code, weights, or inference pipelines is essential for reproducibility.
11. Figures and Captions
Figures are low-resolution and captions lack sufficient explanatory detail.
Visual comparisons (e.g., Fig. 6, Fig. 9) are anecdotal and not systematically analyzed.
12. Methodological Focus
Sections 2.1–2.3 are overly descriptive, re-explaining CNN and UNet basics rather than focusing on novel contributions.
13. Geological Context
Input features (NDVI, slope, BSI, etc.) are mathematically defined but not linked to the physical processes of landslides.
For a geoscience journal, stronger integration of geological reasoning is needed.
14. Error and Failure Analysis
No systematic analysis of failure cases is presented (e.g., misclassification by terrain type, vegetation cover, or landslide type).
This limits understanding of the model’s practical strengths and weaknesses.
15. Discussion Section
The discussion primarily restates results without deeper interpretation.
Reasons for low IoU or marginal CBAM improvements are not critically analyzed.
16. Robustness Testing
The manuscript repeatedly claims the model is “robust,” but robustness is never tested (e.g., noisy data, reduced training data, cross-regional generalization).
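The geographic hold-out requested under Dataset Limitation and Robustness Testing could be set up roughly as follows. This is a minimal sketch that assumes each patch carries a region label; the labels `"A"`, `"B"`, `"C"` and the function `geographic_holdout` are hypothetical, and whether such labels can be recovered for Landslide4Sense patches is not stated in the reviews.

```python
def geographic_holdout(patch_regions, holdout_regions):
    """Split patch indices so that held-out regions never appear in training.

    patch_regions: one region label per patch (hypothetical metadata).
    holdout_regions: labels reserved entirely for testing.
    """
    holdout = set(holdout_regions)
    train_idx, test_idx = [], []
    for idx, region in enumerate(patch_regions):
        # Route every patch of a held-out region to the test set.
        (test_idx if region in holdout else train_idx).append(idx)
    return train_idx, test_idx


# Six hypothetical patches drawn from three regions; region "C" is held out.
regions = ["A", "A", "B", "C", "B", "C"]
train_idx, test_idx = geographic_holdout(regions, holdout_regions=["C"])
print(train_idx, test_idx)
```

Unlike a random patch-level split, this keeps all patches from one area on the same side of the split, so a score on the test indices reflects transfer to unseen terrain rather than interpolation between neighbouring patches.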
Citation: https://doi.org/10.5194/egusphere-2025-2348-RC2
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
377 | 167 | 15 | 559 | 8 | 27 |