Enhanced Landslide Detection from Remote Sensing Imagery Using an Attention-Optimized UNet-CBAM
Abstract. Landslide deformation monitoring is crucial for disaster prevention and protecting infrastructure, ecosystems, and lives in vulnerable regions. Traditional methods, though useful, often lack the precision required for complex terrains, limiting their effectiveness in landslide-prone areas. This study presents the UNet-Convolutional Block Attention Module (CBAM) framework, which combines the UNet architecture with CBAM to enhance landslide detection and segmentation in remote sensing imagery. The integration of CBAM improves the model's ability to focus on spatially significant features, leading to more accurate and efficient extraction of landslide-related information. Experimental results demonstrate that the UNet-CBAM outperforms the baseline UNet by 10 %, with a notable improvement in the Area Under the Curve (AUC) metric. The proposed model shows robustness in diverse and challenging landscapes, proving its effectiveness for landslide monitoring. This enhancement offers significant potential for improving early-warning systems, disaster preparedness, and risk management strategies in landslide-prone areas.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-2348', Anonymous Referee #1, 19 Aug 2025
AC1: 'Reply on RC1', Jing Wang, 14 Oct 2025
The work in this paper is interesting but far from acceptable because of the following major scientific challenges:
1. Section 2.4 describes the derivation of indices such as NDVI, NDMI, slope, elevation, GNDVI, brightness, and BSI, but the manuscript never makes clear why those variables are needed for the model. An ablation study must be conducted to understand whether all those variables are needed. In my opinion just the high-resolution RGB image is enough, but scientific tests must be conducted.
Response: Thank you for your suggestion. We have added a comprehensive ablation study (Page 15, Figure 7) to evaluate the contribution of different input features to the proposed UNet-CBAM model. Ten experimental settings were tested, including individual modalities such as RGB, spectral, terrain, NDVI, slope, elevation, GNDVI, brightness, and BSI, as well as their combinations. The results clearly demonstrate that the full 11-channel input configuration achieves the best overall performance, with F1 and mIoU values 9.3% and 12.1% higher than the next best configuration, respectively. This confirms that integrating all selected features significantly enhances the model's segmentation accuracy and robustness compared with using only high-resolution RGB images.
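For illustration, a minimal sketch of this kind of feature-ablation loop is shown below; the helper names (`make_loaders`, `build_unet_cbam`, `train_and_eval`) and the channel groupings are hypothetical placeholders, not the released code.

```python
# Sketch of an input-feature ablation: train the same architecture on different
# channel subsets under an identical schedule and compare F1 / mIoU.
FEATURE_SETS = {
    "rgb_only":    ["R", "G", "B"],
    "rgb_terrain": ["R", "G", "B", "slope", "elevation"],
    "rgb_indices": ["R", "G", "B", "NDVI", "NDMI", "GNDVI", "brightness", "BSI"],
    "full":        ["R", "G", "B", "NDVI", "NDMI", "GNDVI", "brightness", "BSI",
                    "slope", "elevation"],
}

results = {}
for name, channels in FEATURE_SETS.items():
    train_loader, val_loader = make_loaders(channels)           # stack only these bands
    model = build_unet_cbam(in_channels=len(channels))          # input width follows the set
    results[name] = train_and_eval(model, train_loader, val_loader)  # returns (F1, mIoU)

for name, (f1, miou) in sorted(results.items(), key=lambda kv: -kv[1][1]):
    print(f"{name:12s}  F1={f1:.3f}  mIoU={miou:.3f}")
```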
2. The Figure 1 architecture of the model is confusing, to say the least; it does not provide any information on how the model is developed or what the numbers on top of the conv blocks represent. Also, what is the modification/significant contribution of your work compared to the existing UNet-CBAM architecture?
Response: Thank you for your suggestion. We have added a detailed description of Figure 1 (Page 3, Line 65-71), clarifying the architecture of the proposed UNet-CBAM model. The numbers on top of the convolutional blocks now indicate the number of output channels for each layer. We also explicitly highlight the modifications introduced in this work compared with the standard UNet-CBAM architecture, including the integration of multisource inputs (RGB, spectral, and terrain features), additional encoder and decoder layers, dual auxiliary outputs for intermediate supervision, and enhanced attention mechanisms to improve feature extraction and segmentation accuracy in complex terrains. These enhancements constitute the significant contributions of our model.
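For readers unfamiliar with CBAM, a minimal PyTorch sketch of a standard CBAM block (channel attention followed by spatial attention, after Woo et al., 2018) is given below; it illustrates the attention mechanism only and does not reproduce the exact layer widths or placement used in our network.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: squeeze spatial dims with avg + max pooling, share one MLP."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """Spatial attention: pool over channels, then a 7x7 conv produces a spatial mask."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Sequential channel-then-spatial feature refinement."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)     # reweight channels
        return x * self.sa(x)  # reweight spatial locations
```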
3. The comparison with other models is not sufficient; you must compare it with state-of-the-art approaches and not 10-year-old methods such as UNet. Please look at the works of Bhuyan and others to make a solid comparison.
Response: Thank you for your suggestion. We have updated the comparison on Page 16, Line 355-357, replacing the older baseline models with more recent state-of-the-art methods, including DP-FCN, BBN-UNet, ASK-UNet++, LandslideSegNet, LandslideNet, and SegFormer. This provides a more rigorous and up-to-date evaluation of the proposed UNet-CBAM model’s performance against current advanced approaches in landslide detection.
4. If I look at Figure 4, your model starts to overfit after epoch 45, and at epoch 45 the UNet and your approach have the same validation accuracy. Please justify how you prevented overfitting and whether what we see is a result of overfitting.
Response: Thank you for your suggestion. We have clarified on Page 11, Line 282-284 that to prevent overfitting, the training of the proposed UNet-CBAM model incorporates a learning rate scheduling strategy and an early stopping mechanism. The learning rate scheduler adjusts the learning rate dynamically to stabilize training, while early stopping terminates training when the validation performance ceases to improve, ensuring that the observed results are not due to overfitting.
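A minimal sketch of such a training loop is given below; the optimizer, patience values, monitored metric, and helper functions (`train_one_epoch`, `evaluate_miou`) are illustrative assumptions rather than the exact settings reported in the manuscript.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=5)   # reduce LR when val mIoU stalls

max_epochs = 100                                     # illustrative upper bound
best_miou, epochs_without_improvement, patience = 0.0, 0, 10

for epoch in range(max_epochs):
    train_one_epoch(model, train_loader, optimizer)  # hypothetical helpers
    val_miou = evaluate_miou(model, val_loader)
    scheduler.step(val_miou)

    if val_miou > best_miou:
        best_miou, epochs_without_improvement = val_miou, 0
        torch.save(model.state_dict(), "best_unet_cbam.pt")   # keep the best checkpoint
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:   # early stopping
            break
```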
5. Based on the ROC curve, it is clear that the improvement of your approach is negligible (both 99 %), but you claim a 10 % improvement. How so?
Response: Thank you for your suggestion. We would like to clarify that the previously stated ‘10% improvement’ was a typographical error. The actual improvements in the revised manuscript are reflected in the F1 score and mIoU metrics: the proposed UNet-CBAM achieves approximately a 3.0% increase in F1 score and nearly a 5.0% increase in mIoU compared to the next best model, while the ROC AUC remains consistently high at 0.99 for both models. This clarification has been updated in the current version of the manuscript.
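For clarity, the standard pixel-wise definitions behind these metrics, with TP, FP, and FN counted for the landslide class and C the number of classes, are:

```latex
\mathrm{IoU} = \frac{TP}{TP + FP + FN}, \qquad
F_1 = \frac{2\,TP}{2\,TP + FP + FN}, \qquad
\mathrm{mIoU} = \frac{1}{C}\sum_{c=1}^{C}\mathrm{IoU}_c
```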
6. The study relies exclusively on Landslide4Sense-2022. While this is a curated and high-quality dataset, it is single-source and patch-based, which raises concerns about generalization. Without testing on an external dataset, or at least partitioning the training and validation by geographic region, it is not possible to demonstrate that the model generalizes beyond the specific data distribution provided by Landslide4Sense.
Response: Thank you for your suggestion. We appreciate your comment. Currently, no other publicly available datasets contain all the input elements used by our proposed model. To address generalization concerns, in our experiments we partitioned the Landslide4Sense-2022 dataset by splitting 1500 images for training and using the remaining images for validation. This setup ensures that the model is evaluated on unseen data, providing a preliminary assessment of its generalization capability within the dataset.
7. Overall figure quality is really bad. I do not think the manuscript itself needs loss curves; please put them in the supplement.
Response: Thank you for your suggestion. We have carefully updated all figures throughout the manuscript to significantly enhance their resolution and overall visual quality, ensuring that details are clearly visible and easily interpretable. The loss curves have been relocated to the supplementary material to avoid cluttering the main text while still providing full transparency of the model training process.
8. Given that your problem is image segmentation and the data are unbalanced (pixels without landslides far outnumber pixels with landslides), IoU/F1 score is the most reliable metric. I see the best model has an IoU of 0.246; this is nowhere near sufficient. Also compare your IoU to others who have used the same dataset. Also, do not emphasize accuracy in your discussion but IoU.
Response: Thank you for your suggestion. We appreciate your observation. The previous IoU statistics were incorrectly calculated on a per-image basis rather than across the dataset. In the revised manuscript, this has been corrected, and the IoU values now reflect a proper dataset-level evaluation, as shown in Table 1 on Page 17. Accordingly, the discussion has been revised to emphasize IoU and F1 score as the primary performance metrics, rather than overall accuracy, providing a more reliable assessment of the model's segmentation performance and facilitating comparison with other studies using the same Landslide4Sense dataset.
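The difference between the two protocols can be summarized with a short sketch; the function below is illustrative, not the exact evaluation code.

```python
import numpy as np

def dataset_level_iou(preds, masks):
    """Accumulate TP/FP/FN over ALL images before dividing (dataset-level IoU),
    rather than averaging per-image IoU values, which can be dominated by
    patches containing few or no landslide pixels."""
    tp = fp = fn = 0
    for pred, mask in zip(preds, masks):      # binary arrays of identical shape
        pred, mask = pred.astype(bool), mask.astype(bool)
        tp += np.logical_and(pred, mask).sum()
        fp += np.logical_and(pred, ~mask).sum()
        fn += np.logical_and(~pred, mask).sum()
    return tp / (tp + fp + fn + 1e-9)
```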
9. The paper also lacks details on model complexity and runtime. It notes that computational demand may hinder deployment, but does not quantify model size, number of parameters, training time, inference speed, or hardware requirements. Such information is critical for assessing the operational feasibility of deploying UNet-CBAM in real-world monitoring systems. Without it, the reader cannot evaluate whether the proposed method is practical beyond academic experimentation.
Response: Thank you for your suggestion. We have addressed your concern by adding quantitative details on model complexity and runtime. Specifically, Table 1 on Page 17 now includes FLOPs and inference time metrics, providing a clear picture of computational demand. Additionally, we have added a description of the training strategy and the hardware used, including GPU specifications, to clarify the feasibility and practical deployment considerations of UNet-CBAM in real-world monitoring systems (Page 11, Line 282-284).
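A minimal sketch of how parameter count and inference time can be measured is shown below; the patch size, channel count, and number of timing runs are illustrative assumptions, not the exact benchmarking setup used for Table 1.

```python
import time
import torch

def complexity_report(model, input_shape=(1, 11, 128, 128), device="cuda", n_runs=50):
    """Report parameter count (in millions) and mean per-patch inference time (ms)."""
    model = model.to(device).eval()
    n_params = sum(p.numel() for p in model.parameters())

    x = torch.randn(*input_shape, device=device)
    with torch.no_grad():
        for _ in range(5):                    # warm-up iterations
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    ms = (time.perf_counter() - start) / n_runs * 1e3
    return {"params_M": n_params / 1e6, "inference_ms": ms}
```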
10. The explainability of the attention mechanism is missing. The CBAM module is designed to guide the network’s focus toward important features, yet the paper does not present any visualizations of attention maps or saliency overlays.
Response: Thank you for your suggestion. We have addressed your concern regarding the explainability of the attention mechanism by adding visualizations of the model's focus. Specifically, Figure 4 presents the Terrain Attention visualization, illustrating how the CBAM module emphasizes topographically relevant features, while Figure 5 shows the Grad-CAM visualization, highlighting the regions that most strongly influence the model's segmentation decisions. These additions provide an intuitive understanding of how UNet-CBAM leverages attention to enhance feature localization and prediction accuracy.
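To make the visualization procedure concrete, a minimal Grad-CAM sketch for a segmentation network is given below; the choice of target layer and the assumption that class index 1 is the landslide class are illustrative, not the exact implementation behind Figure 5.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer):
    """Minimal Grad-CAM sketch: gradients of the summed landslide-class logits
    with respect to a chosen feature map weight that map's channels."""
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))

    model.eval()
    logits = model(image.unsqueeze(0))             # (1, n_classes, H, W)
    score = logits[:, 1].sum()                     # class 1 assumed = landslide
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()

    fmap, grad = feats[0], grads[0]                # both (1, C, h, w)
    weights = grad.mean(dim=(2, 3), keepdim=True)  # global-average-pooled gradients
    cam = F.relu((weights * fmap).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-9)).squeeze().detach()
```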
11. Reproducibility is another concern. The authors note that code and data are not publicly available, only upon request. For a methods-oriented paper, it is strongly encouraged to make at least trained model weights, preprocessing scripts, or example inference pipelines publicly accessible. Without such resources, independent validation and application of the model will be limited. In general, DL papers without publicly accessible code, data, and weights to test independently should not be sent out for review and can never be accepted, mainly because of reproducibility concerns.
Response: Thank you for your suggestion. We have now ensured full reproducibility of our study by providing public access to both the dataset and the complete code. The Landslide4Sense-2022 dataset, which underpins all experiments in this work, can be downloaded from https://pan.baidu.com/s/1vUhSJGMvD0OsnAbUTx5TBg?pwd=xkqd. Additionally, the full codebase, including preprocessing scripts, model training routines, and example inference pipelines, is available at https://github.com/xoxofreja/NHESS-code. By providing these resources, we enable independent validation, reproduction of our results, and practical application of the proposed UNet-CBAM framework, addressing concerns regarding reproducibility and facilitating further research in landslide segmentation using multisource remote sensing imagery.
12. In general, figures are of low quality and need much better captions to explain the context of the figure itself.
Response: Thank you for your suggestion. We have updated all figures in the manuscript, enhancing their resolution and overall visual quality. In addition, figure captions have been revised to provide clearer and more informative descriptions, explicitly explaining the context, content, and significance of each figure to ensure they are fully interpretable and self-contained for the reader.
Citation: https://doi.org/10.5194/egusphere-2025-2348-AC1
RC2: 'Comment on egusphere-2025-2348', Anonymous Referee #2, 29 Aug 2025
Manuscript ID: egusphere-2025-2348
Title: Enhanced Landslide Detection from Remote Sensing Imagery Using an Attention-Optimized UNet-CBAM
Overall Evaluation: This manuscript explores the use of a UNet-CBAM architecture for landslide detection from remote sensing imagery, applied to the Landslide4Sense-2022 dataset. The research problem is relevant and the attempt to enhance UNet with attention is reasonable. However, the manuscript has significant weaknesses in novelty, scientific justification, experimental rigor, interpretation, and reproducibility. The current version does not meet the standards required for publication in a leading geoscience journal.
Strengths:
- Addresses an important geohazard monitoring challenge with potential societal applications.
- Provides detailed descriptions of the CBAM module and integrates it into a UNet framework.
- Demonstrates incremental improvements over baseline models such as UNet, SegNet, and DP-FCN.
Major Concerns:
1. Input Features (Sec. 2.4)
Auxiliary indices (NDVI, NDMI, slope, elevation, GNDVI, brightness, BSI) are included without clear scientific justification.
No ablation study is presented to verify their contribution compared to RGB-only input.
2. Architecture and Novelty (Fig. 1)
The architecture diagram is unclear, with unexplained numbers above convolution blocks.
The proposed contribution appears incremental, with limited distinction from existing UNet-CBAM approaches.
3. Comparisons with State-of-the-Art
Comparisons are limited to older models (UNet, SegNet, DP-FCN).
Recent transformer-based or advanced attention-based models are not considered, weakening the claim of advancement.
4. Overfitting and Training Behavior (Fig. 4)
Training/validation curves suggest potential overfitting for UNet after ~45 epochs.
Although the authors argue CBAM reduces overfitting, there is no deeper discussion or mitigation strategy (e.g., early stopping, augmentation, regularization).
5. Performance Claims
The abstract claims a “10% improvement,” yet both UNet and UNet-CBAM achieve AUC ≈ 0.99.
Reported improvements are marginal, and claims are overstated.
6. Dataset Limitation
Reliance solely on Landslide4Sense-2022, a patch-based dataset, restricts generalization.
No geographic hold-out or external validation is performed, undermining claims of robustness.
7. Evaluation Metrics
Reported IoU for UNet-CBAM is only 0.246, which is very low for a segmentation task.
Accuracy is emphasized despite class imbalance; IoU and F1 should be prioritized.
No benchmarking is provided against other studies using the same dataset.
8. Operational Feasibility
Although computational demands are noted, no details on model complexity, parameter count, runtime, or inference speed are given.
Without this, practical deployability cannot be assessed.
9. Explainability of Attention
The CBAM module is included but not validated with attention maps or saliency visualizations.
Its true contribution to feature learning is unclear.
10. Reproducibility
Code and trained weights are “available on request” only.
For deep learning studies, public release of code, weights, or inference pipelines is essential for reproducibility.
11. Figures and Captions
Figures are low-resolution and captions lack sufficient explanatory detail.
Visual comparisons (e.g., Fig. 6, Fig. 9) are anecdotal and not systematically analyzed.
12. Methodological Focus
Sections 2.1–2.3 are overly descriptive, re-explaining CNN and UNet basics rather than focusing on novel contributions.
13. Geological Context
Input features (NDVI, slope, BSI, etc.) are mathematically defined but not linked to the physical processes of landslides.
For a geoscience journal, stronger integration of geological reasoning is needed.
14. Error and Failure Analysis
No systematic analysis of failure cases is presented (e.g., misclassification by terrain type, vegetation cover, or landslide type).
This limits understanding of the model’s practical strengths and weaknesses.
15. Discussion Section
The discussion primarily restates results without deeper interpretation.
Reasons for low IoU or marginal CBAM improvements are not critically analyzed.
16. Robustness Testing
The manuscript repeatedly claims the model is “robust,” but robustness is never tested (e.g., noisy data, reduced training data, cross-regional generalization).
Citation: https://doi.org/10.5194/egusphere-2025-2348-RC2
AC2: 'Reply on RC2', Jing Wang, 14 Oct 2025
- Auxiliary indices (NDVI, NDMI, slope, elevation, GNDVI, brightness, BSI) are included without clear scientific justification. No ablation study is presented to verify their contribution compared to RGB-only input.
Response: Thank you for your suggestion. We have added a comprehensive ablation study (Page 15, Figure 7) to evaluate the contribution of different input features to the proposed UNet-CBAM model. Ten experimental settings were tested, including individual modalities such as RGB, spectral, terrain, NDVI, slope, elevation, GNDVI, brightness, and BSI, as well as their combinations. The results clearly demonstrate that the full 11-channel input configuration achieves the best overall performance, with F1 and mIoU values 9.3% and 12.1% higher than the next best configuration, respectively. This confirms that integrating all selected features significantly enhances the model’s segmentation accuracy and robustness compared with using only high-resolution RGB images.
- Architecture and Novelty (Fig. 1): The architecture diagram is unclear, with unexplained numbers above convolution blocks. The proposed contribution appears incremental, with limited distinction from existing UNet-CBAM approaches.
Response: Thank you for your suggestion. The proposed UNet-CBAM model introduces a novel integration of multi-scale spectral and terrain feature enhancement with both channel and spatial attention mechanisms, enabling highly precise and robust landslide segmentation in complex remote sensing imagery. We have added a detailed description of Figure 1 (Page 3, Line 65-71), clearly illustrating the architecture of the proposed model. The numbers above the convolutional blocks now indicate the number of output channels for each layer. We explicitly highlight the key innovations compared with the standard UNet-CBAM, including the fusion of multisource inputs (RGB, spectral, and terrain features), additional encoder and decoder layers to capture richer hierarchical features, dual auxiliary outputs for intermediate supervision, and enhanced attention mechanisms to guide feature extraction more effectively. These enhancements collectively constitute the major contributions of our work, significantly improving segmentation accuracy and robustness in challenging terrains.
- Comparisons with State-of-the-Art
Comparisons are limited to older models (UNet, SegNet, DP-FCN). Recent transformer-based or advanced attention-based models are not considered, weakening the claim of advancement.
Response: Thank you for your suggestion. We have updated the comparison on Page 15, Line 355-357, replacing the older baseline models with more recent state-of-the-art methods, including DP-FCN, BBN-UNet, ASK-UNet++, LandslideSegNet, LandslideNet, and SegFormer. This provides a more rigorous and up-to-date evaluation of the proposed UNet-CBAM performance against current advanced approaches in landslide detection.
- Overfitting and Training Behavior (Fig. 4)
Training/validation curves suggest potential overfitting for UNet after ~45 epochs.
Although the authors argue CBAM reduces overfitting, there is no deeper discussion or mitigation strategy (e.g., early stopping, augmentation, regularization).
Response: Thank you for your suggestion. We have clarified on Page 11, Line 282-284 that to prevent overfitting, the training of the proposed UNet-CBAM model incorporates a learning rate scheduling strategy and an early stopping mechanism. The learning rate scheduler adjusts the learning rate dynamically to stabilize training, while early stopping terminates training when the validation performance ceases to improve, ensuring that the observed results are not due to overfitting.
- Performance Claims
The abstract claims a “10% improvement,” yet both UNet and UNet-CBAM achieve AUC ≈ 0.99. Reported improvements are marginal, and claims are overstated.
Response: Thank you for your suggestion. We would like to clarify that the previously stated ‘10% improvement’ was a typographical error. The actual improvements in the revised manuscript are reflected in the F1 score and mIoU metrics: the proposed UNet-CBAM achieves approximately a 3.0% increase in F1 score and nearly a 5.0% increase in mIoU compared to the next best model, while the ROC AUC remains consistently high at 0.99 for both models. This clarification has been updated in the current version of the manuscript.
- Dataset Limitation
Reliance solely on Landslide4Sense-2022, a patch-based dataset, restricts generalization.
No geographic hold-out or external validation is performed, undermining claims of robustness.
Response: Thank you for your suggestion. We appreciate your comment. Currently, no other publicly available datasets contain all the input elements used by our proposed model. To address generalization concerns, in our experiments we partitioned the Landslide4Sense-2022 dataset by splitting 1500 images for training and using the remaining images for validation. This setup ensures that the model is evaluated on unseen data, providing a preliminary assessment of its generalization capability within the dataset.
- Evaluation Metrics
Reported IoU for UNet-CBAM is only 0.246, which is very low for a segmentation task.
Accuracy is emphasized despite class imbalance; IoU and F1 should be prioritized.
No benchmarking is provided against other studies using the same dataset.
Response: Thank you for your suggestion. We appreciate your observation. The previous IoU statistics were incorrectly calculated on a per-image basis rather than across the dataset. In the revised manuscript, this has been corrected, and the IoU values now reflect a proper dataset-level evaluation, as shown in Table 1 on Page 17. Accordingly, the discussion has been revised to emphasize IoU and F1 score as the primary performance metrics, rather than overall accuracy, providing a more reliable assessment of the model's segmentation performance and facilitating comparison with other studies using the same Landslide4Sense dataset.
- Operational Feasibility
Although computational demands are noted, no details on model complexity, parameter count, runtime, or inference speed are given.
Without this, practical deployability cannot be assessed.
Response: Thank you for your suggestion. We have addressed your concern by adding quantitative details on model complexity and runtime. Specifically, Table 1 on Page 17 now includes FLOPs and inference time metrics, providing a clear picture of computational demand. Additionally, on Page 11, Line 282-284, we have added a description of the training strategy and the hardware used, including GPU specifications, to clarify the feasibility and practical deployment considerations of UNet-CBAM in real-world monitoring systems.
- Explainability of Attention
The CBAM module is included but not validated with attention maps or saliency visualizations. Its true contribution to feature learning is unclear.
Response: Thank you for your suggestion. We have addressed your concern regarding the explainability of the attention mechanism by adding visualizations of the model’s focus. Specifically, Figure 4 (Page 13) presents the Terrain Attention visualization, illustrating how the CBAM module emphasizes topographically relevant features, while Figure 5 (Page 14) shows the Grad-CAM visualization, highlighting the regions that most strongly influence the model’s segmentation decisions. These additions provide an intuitive understanding of how UNet-CBAM leverages attention to enhance feature localization and prediction accuracy.
- Reproducibility
Code and trained weights are “available on request” only. For deep learning studies, public release of code, weights, or inference pipelines is essential for reproducibility.
Response: Thank you for your suggestion. We have now ensured full reproducibility of our study by providing public access to both the dataset and the complete code. The Landslide4Sense-2022 dataset, which underpins all experiments in this work, can be downloaded from https://pan.baidu.com/s/1vUhSJGMvD0OsnAbUTx5TBg?pwd=xkqd. Additionally, the full codebase, including preprocessing scripts, model training routines, and example inference pipelines, is available at https://github.com/xoxofreja/NHESS-code. By providing these resources, we enable independent validation, reproduction of our results, and practical application of the proposed UNet-CBAM framework, addressing concerns regarding reproducibility and facilitating further research in landslide segmentation using multisource remote sensing imagery.
- Figures and Captions
Figures are low-resolution and captions lack sufficient explanatory detail.
Visual comparisons (e.g., Fig. 6, Fig. 9) are anecdotal and not systematically analyzed.
Response: Thank you for your suggestion. We have updated all figures in the manuscript, enhancing their resolution and overall visual quality. In addition, figure captions have been revised to provide clearer and more informative descriptions, explicitly explaining the context, content, and significance of each figure to ensure they are fully interpretable and self-contained for the reader.
- Methodological Focus
Sections 2.1–2.3 are overly descriptive, re-explaining CNN and UNet basics rather than focusing on novel contributions.
Response: Thank you for your suggestion. We included the descriptions in Sections 2.1–2.3 to ensure that readers with a more general background can follow the methodology and implementation details. These sections provide the necessary foundational context, which supports the understanding of the novel contributions of our work, and are therefore retained for completeness and accessibility.
- Geological Context
Input features (NDVI, slope, BSI, etc.) are mathematically defined but not linked to the physical processes of landslides. For a geoscience journal, stronger integration of geological reasoning is needed.
Response: Thank you for your suggestion. We have now strengthened the geoscientific rationale for each input feature in the manuscript (Page 11, Line 287-295). Specifically, the RGB image provides a true-color representation, capturing visible land cover variations; NDVI highlights vegetation density and health, distinguishing dense green areas from barren zones; the Slope map reflects terrain gradients, identifying steep versus gentle slopes critical in landslide-prone regions; Elevation data reveals topographic relief affecting landslide dynamics; NDMI indicates vegetation and soil moisture levels; GNDVI refines analysis of vegetation chlorophyll content; Brightness Index differentiates light and dark surfaces; and BSI emphasizes bare soil and non-vegetated areas. Collectively, these multisource features capture terrain, vegetation, moisture, and surface characteristics, offering a physically informed basis for landslide detection and analysis.
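For reference, the standard normalized-difference forms of these spectral indices are given below, with ρ denoting surface reflectance in the named band; the brightness index follows the definition given in Sect. 2.4 and is therefore not repeated here.

```latex
\mathrm{NDVI}  = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Red}}}, \qquad
\mathrm{GNDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Green}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Green}}}, \qquad
\mathrm{NDMI}  = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{SWIR}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{SWIR}}}, \qquad
\mathrm{BSI}   = \frac{(\rho_{\mathrm{SWIR}} + \rho_{\mathrm{Red}}) - (\rho_{\mathrm{NIR}} + \rho_{\mathrm{Blue}})}{(\rho_{\mathrm{SWIR}} + \rho_{\mathrm{Red}}) + (\rho_{\mathrm{NIR}} + \rho_{\mathrm{Blue}})}
```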
- Error and Failure Analysis
No systematic analysis of failure cases is presented (e.g., misclassification by terrain type, vegetation cover, or landslide type). This limits understanding of the model’s practical strengths and weaknesses.
Response: Thank you for your suggestion. We have addressed this concern by adding a detailed analysis of failure cases in Figure 10 on Page 19, where a confusion matrix is presented to systematically compare misclassifications across different terrain types, vegetation covers, and landslide classes, thereby providing clearer insight into the model's practical strengths and limitations.
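A pixel-wise confusion matrix of this kind can be accumulated over the validation set with a short routine such as the illustrative sketch below (not the exact evaluation code).

```python
import numpy as np

def pixel_confusion_matrix(preds, masks, n_classes=2):
    """Accumulate a pixel-wise confusion matrix over the validation set.
    Rows = reference class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for pred, mask in zip(preds, masks):     # integer-label arrays of identical shape
        idx = mask.astype(np.int64) * n_classes + pred.astype(np.int64)
        cm += np.bincount(idx.ravel(), minlength=n_classes ** 2).reshape(n_classes, n_classes)
    return cm
```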
- Discussion Section
The discussion primarily restates results without deeper interpretation. Reasons for low IoU or marginal CBAM improvements are not critically analyzed.
Response: Thank you for your suggestion. The previously reported IoU values were calculated on individual sample images, which underestimated the overall segmentation performance. In the revised manuscript, we have corrected this by computing the mean IoU across the entire dataset, providing a more accurate and representative assessment of model performance.
- Robustness Testing
The manuscript repeatedly claims the model is “robust,” but robustness is never tested (e.g., noisy data, reduced training data, cross-regional generalization).
Response: Thank you for your suggestion. We have added a robustness evaluation experiment in Figure 6 (page 14) to address this issue. The revised manuscript now includes tests under multiple perturbation scenarios, such as adversarial attacks, Gaussian noise, salt-and-pepper noise, and variations in brightness and contrast, to assess the model’s stability under challenging input conditions. The results show that UNet-CBAM maintains consistently high accuracy (ranging from 0.9530 to 0.9720) across all test cases, demonstrating strong robustness and resilience against noise and illumination variations. This new experiment provides quantitative evidence supporting the robustness claims made in the paper.
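The perturbation types listed above can be illustrated with short routines such as the sketch below; the severity values are illustrative and assume images scaled to [0, 1], not the exact settings of the experiment.

```python
import numpy as np

def gaussian_noise(img, sigma=0.05):
    """Add zero-mean Gaussian noise and clip back to the valid range."""
    return np.clip(img + np.random.normal(0.0, sigma, img.shape), 0.0, 1.0)

def salt_and_pepper(img, amount=0.02):
    """Set a random fraction of pixels to the minimum (pepper) or maximum (salt) value."""
    out = img.copy()
    mask = np.random.rand(*img.shape[:2])
    out[mask < amount / 2] = 0.0          # pepper
    out[mask > 1 - amount / 2] = 1.0      # salt
    return out

def brightness_contrast(img, alpha=1.2, beta=0.1):
    """Linear intensity change: alpha scales contrast, beta shifts brightness."""
    return np.clip(alpha * img + beta, 0.0, 1.0)
```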
Citation: https://doi.org/10.5194/egusphere-2025-2348-AC2