This work is distributed under the Creative Commons Attribution 4.0 License.
Rapid Flood Mapping from Aerial Imagery Using Fine-Tuned SAM and ResNet-Backboned U-Net
Abstract. Flooding is a major natural hazard that requires a rapid response to minimize the loss of life and property and to facilitate damage assessment. Aerial imagery, especially images from unmanned aerial vehicles (UAVs) and helicopters, plays a crucial role in identifying areas affected by flooding. Therefore, developing an efficient model for rapid flood mapping is essential. In this study, we present two segmentation approaches for the mapping of flood-affected areas: (1) a fine-tuned Segment Anything Model (SAM), comparing the performance of point prompts versus bounding box (Bbox) prompts, and (2) a U-Net model with ResNet-50 and ResNet-101 as pre-trained backbones. Our results showed that the fine-tuned SAM performed best in segmenting floods with point prompts (Accuracy: 0.96, IoU: 0.90), while Bbox prompts led to a significant drop (Accuracy: 0.82, IoU: 0.67). This is because flooded areas often extend from edge to edge of the image, making Bbox prompts less effective at capturing boundary details. For the U-Net model, the ResNet-50 backbone yielded an accuracy of 0.87 and an IoU of 0.72. Performance improved slightly with the ResNet-101 backbone, achieving an accuracy of 0.88 and an IoU of 0.74. This improvement can be attributed to the deeper architecture of ResNet-101, which enables the extraction of more complex and detailed features and improves U-Net’s ability to segment flood-affected areas accurately. The results of this study will help emergency response teams identify flood-affected areas more quickly and accurately. In addition, these models could serve as valuable tools for insurance companies when assessing damage. Moreover, the segmented flood images generated by these models can serve as training data for other machine learning models, creating a pipeline for more advanced flood analysis and prediction systems.
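For readers who want a concrete starting point, the sketch below shows one way to instantiate U-Net variants with pre-trained ResNet-50 or ResNet-101 encoders, as described in the abstract. It uses the segmentation_models_pytorch library as an assumption and is not the authors' published implementation; the tile size and class count are placeholders.

```python
# Minimal sketch (assumed setup, not the authors' code): binary flood segmentation
# with a U-Net whose encoder is a pre-trained ResNet, via segmentation_models_pytorch.
import torch
import segmentation_models_pytorch as smp

def build_unet(encoder_name: str = "resnet50") -> torch.nn.Module:
    # encoder_name may be "resnet50" or "resnet101", mirroring the two backbones
    # compared in the paper; ImageNet weights initialize the encoder.
    return smp.Unet(
        encoder_name=encoder_name,
        encoder_weights="imagenet",
        in_channels=3,   # RGB aerial imagery
        classes=1,       # single "flood" class (binary mask)
    )

model = build_unet("resnet101")
logits = model(torch.randn(1, 3, 512, 512))   # placeholder tile size -> (1, 1, 512, 512)
flood_mask = torch.sigmoid(logits) > 0.5      # thresholded binary prediction
```

A per-pixel sigmoid followed by a threshold (0.5 here) turns the raw logits into the binary flood mask on which metrics such as Accuracy and IoU would be computed.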
Status: open (until 28 Oct 2025)
RC1: 'Comment on egusphere-2025-3146', Saham Mirzaei, 03 Sep 2025
AC1: 'Reply on RC1', Hadi Shokati, 09 Sep 2025
Dear Editor and Reviewer,
We would like to express our sincere gratitude for your time and thoughtful comments on our manuscript, "Rapid Flood Mapping from Aerial Imagery Using Fine-Tuned SAM and ResNet-Backboned U-Net." Your insightful feedback has been extremely valuable in helping us improve the clarity, strength, and overall quality of our work.
We have carefully considered all suggestions and addressed them point-by-point in the revised manuscript. For your reference, we have highlighted our responses to your comments in green. We believe these revisions have significantly strengthened the manuscript and we are confident that it is now ready for further consideration.
Thank you again for your valuable contribution to this process. We look forward to your feedback on the revised manuscript.
RC2: 'Comment on egusphere-2025-3146', Saham Mirzaei, 09 Sep 2025
Upon re-reading the manuscript, I noticed that in lines 200–203 you mention the use of various data augmentation techniques. Could you please clarify the probability settings assigned to each augmentation method?
In lines 201–203, it is not clear whether the augmentation was applied exclusively to the training dataset. Providing this clarification would enhance the transparency of the methodology.
Still in lines 201–203, it would be highly valuable to explicitly include details regarding the number of images before and after data augmentation, as well as their distribution across the training, validation, and test sets. Such information is critical to ensure reproducibility.
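Purely as an illustration of the kind of detail being requested here, the sketch below shows how per-transform probabilities are typically declared with the Albumentations library and how augmentation can be restricted to the training split; the transforms and probability values are placeholders, not the settings used in the manuscript.

```python
# Hypothetical augmentation setup (placeholder probabilities, not the paper's values):
# each transform carries its own probability p, and only the training split is augmented.
import albumentations as A

train_transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
])

# Validation and test data receive no stochastic augmentation, so reported metrics
# reflect unaltered imagery.
val_transform = A.Compose([])

# Calling the pipeline with both image and mask keeps the flood labels aligned
# with the spatially transformed image:
# augmented = train_transform(image=image, mask=mask)
```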
In lines 209–219, you mention the use of both Dice Loss and Cross-Entropy Loss. Could you please specify how these two loss functions were combined? For example, were they summed, averaged, or weighted differently?
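For reference, one common pattern is a weighted sum of the two terms; the plain-PyTorch sketch below (with hypothetical, equal weights) illustrates that option without implying it is the authors' choice.

```python
# Sketch of a combined Dice + binary cross-entropy loss (assumed weighting, not the paper's).
import torch
import torch.nn.functional as F

def dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Soft Dice on sigmoid probabilities; logits and target have shape (B, 1, H, W).
    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1.0 - ((2.0 * intersection + eps) / (union + eps)).mean()

def combined_loss(logits: torch.Tensor, target: torch.Tensor,
                  w_dice: float = 1.0, w_ce: float = 1.0) -> torch.Tensor:
    # Weighted sum; setting w_dice = w_ce = 0.5 would instead average the two terms.
    ce = F.binary_cross_entropy_with_logits(logits, target)
    return w_dice * dice_loss(logits, target) + w_ce * ce
```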
I appreciate that the code is publicly available on GitHub. However, I could not locate the corresponding datasets in the repository. Based on the README file, it seems that the authors expect users to obtain the data from an external source. While this is acceptable provided that the source remains reliably available, hosting a copy of the datasets within your GitHub repository would be preferable for long-term accessibility.
Citation: https://doi.org/10.5194/egusphere-2025-3146-RC2
AC2: 'Reply on RC2', Hadi Shokati, 11 Sep 2025
Dear Editor and Reviewer,
We would like to express our sincere gratitude again for your time and thoughtful comments on our manuscript, "Rapid Flood Mapping from Aerial Imagery Using Fine-Tuned SAM and ResNet-Backboned U-Net." Your insightful feedback has been extremely valuable in helping us improve the clarity, strength, and overall quality of our work.
We have carefully considered all suggestions and addressed them point-by-point in the revised manuscript. For your reference, we have highlighted our responses to your comments in green. We believe these revisions have significantly strengthened the manuscript and we are confident that it is now ready for further consideration.
Thank you again for your valuable contribution to this process. We look forward to your feedback on the revised manuscript.
CC1: 'Comment on egusphere-2025-3146', Armin Moghimi, 17 Sep 2025
Dear authors,
I read the paper, and it appears to me that the method used by the authors, as well as the code, is derived from the following paper and GitHub repository:
ArminMoghimi/Fine-tune-the-Segment-Anything-Model-SAM-: A. Moghimi, M. Welzel, T. Celik, and T. Schlurmann, "A Comparative Performance Analysis of Popular Deep Learning Models and Segment Anything Model (SAM) for River Water Segmentation in Close-Range Remote Sensing Imagery,"
https://github.com/ArminMoghimi/Fine-tune-the-Segment-Anything-Model-SAM-
https://doi.org/10.1109/ACCESS.2024.3385425
However, I could not find this reference in the reference list, where it should appear.
Citation: https://doi.org/10.5194/egusphere-2025-3146-CC1
AC3: 'Reply on CC1', Hadi Shokati, 19 Sep 2025
Dear Armin,
We appreciate your feedback. After a thorough review, we did not find any similarities between our code and the one you mentioned. Our implementation originates from our previous article published in CATENA, where we focused on segmenting erosion and deposition. That codebase is the result of years of experience and numerous meetings with domain experts. We therefore respectfully clarify that our implementation is fully independent and not derived from the repository you mentioned. We highly recommend reading the article linked below for a better understanding of the foundations of our approach.
We were not aware of your article and code before, but we appreciate your work; it is a valuable contribution to the field. If we had seen it earlier, we would have certainly benefited from it. In the academic community, the shared goal is to advance the state of the art, and we are no exception. To support this goal, we include a comparison with your results in the results section of our revised manuscript, which we are confident will further enhance the quality of the paper.
Thank you again for your valuable feedback.
AC4: 'Reply on AC3', Hadi Shokati, 19 Sep 2025
Please see the links below:
GitHub: https://github.com/hadi1994shokati/Flood-segmentation
Paper in CATENA: https://www.sciencedirect.com/science/article/pii/S0341816225002565
CC2: 'Reply on AC3', Armin Moghimi, 19 Sep 2025
Dear Hadi,
Thank you, and please don't get me wrong: you are part of our remote sensing community, and your work is good. I don't want to dwell on the similarity aspect, as we have already explored comparisons between SAM and U-Net (with a ResNet-50 backbone) in the context of water segmentation using close-range remote sensing images (from UAVs, smartphones, and handheld cameras, within a 1–300 meter range) (https://doi.org/10.1109/ACCESS.2024.3385425). However, I would like to suggest that you consider referencing our results, as they align with your findings and could strengthen your discussion section. For instance, we observed that SAM performs very well in general; however, when segmenting images from the same area, U-Net actually produced even better results. This nuance might enrich your discussion, especially when highlighting the practical performance differences between models.
Another point is related to the computational aspects: SAM typically operates on 1024×1024 patches, and when fine-tuning with a frozen ViT backbone, it still requires significant computational resources. The choice of ViT backbone also matters—ViT-H is quite heavy and not ideal for fine-tuning, whereas the smaller variants (like Tiny ViT and Medium ViT) tend to perform better with fewer resources.
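To make the computational point concrete, here is a rough sketch, based on the public segment-anything package, of freezing the ViT image encoder (and prompt encoder) and training only the lightweight mask decoder; the checkpoint path, variant, and optimizer settings are placeholders.

```python
# Illustrative fine-tuning setup (placeholder checkpoint and hyperparameters):
# the heavy ViT backbone stays frozen, and only the mask decoder receives gradients.
import torch
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

for p in sam.image_encoder.parameters():
    p.requires_grad = False   # frozen ViT image encoder
for p in sam.prompt_encoder.parameters():
    p.requires_grad = False   # prompt encoder also frozen

optimizer = torch.optim.Adam(sam.mask_decoder.parameters(), lr=1e-4)
```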
Lastly, regarding datasets: One could argue that with SAM, we might not need large annotated datasets anymore. While SAM reduces the need for manual annotation, I would still say that datasets are necessary. The real question is: how many do we actually need? To help address this, you might consider referencing this recent paper by Professor Anette Eltner from Dresden University: https://doi.org/10.1080/01431161.2025.2457131
It discusses the sensitivity of model performance to dataset size and could help you frame this as a potential advantage of SAM over UNet.
I’d love to hear your thoughts and see how you might incorporate some of these ideas into your discussion.
Warm regards,
Armin
Citation: https://doi.org/10.5194/egusphere-2025-3146-CC2
AC5: 'Reply on CC2', Hadi Shokati, 29 Sep 2025
Dear Armin,
Thank you once again for your valuable feedback. We truly appreciate the time and effort you have taken to engage with our work.
In our work, we not only compared U-Net and SAM, but also evaluated two types of input prompts for SAM (points and bounding boxes) and two types of backbones for U-Net (ResNet-50 and ResNet-101). We agree that including a direct comparison with your findings will make our paper more comprehensive, and we will add this comparison to the results section of our revised manuscript.
Regarding the ViT backbone, we also evaluated multiple variants and observed that their effectiveness for flood-affected area segmentation was comparable. To optimize computational resources, we selected ViT-Base, as it provides a favorable balance between accuracy and efficiency.
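For context, the sketch below shows how the ViT-Base variant is loaded with the public segment-anything API and how point versus bounding-box prompts are passed at inference time; the image, coordinates, and checkpoint path are placeholders, not data from the study.

```python
# Rough inference sketch with SAM ViT-Base (placeholder image, coordinates, checkpoint).
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.zeros((768, 1024, 3), dtype=np.uint8)  # stand-in for an RGB aerial frame
predictor.set_image(image)

# Point prompt: a single foreground click inside the flooded area (label 1 = foreground).
masks_pt, scores_pt, _ = predictor.predict(
    point_coords=np.array([[512, 384]]),
    point_labels=np.array([1]),
    multimask_output=False,
)

# Bounding-box prompt in (x_min, y_min, x_max, y_max) pixel coordinates.
masks_box, scores_box, _ = predictor.predict(
    box=np.array([50, 60, 900, 700]),
    multimask_output=False,
)
```

Because a flooded scene can span the whole frame, a box prompt may end up covering nearly the entire image, which is consistent with the weaker Bbox results reported in the abstract.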
We also appreciate your recommendation of Professor Anette Eltner’s recent article. It provides an excellent perspective on dataset size requirements. The number of labeled images needed depends on the complexity of the task. For instance, in our previous study on erosion and deposition segmentation (https://doi.org/10.1016/j.catena.2025.108954), we worked with about 400 labeled images and observed clear performance gains with increasing dataset size, an effect that was particularly relevant given the higher complexity of that task compared to flood mapping. This increased complexity arises because eroded and non-eroded soil often have very similar visual characteristics, whereas flooded areas are usually more distinct from their surroundings.
Thank you again for your thoughtful suggestions. They will certainly help us improve the clarity and impact of our paper.
CC3: 'Reply on AC5', Armin Moghimi, 29 Sep 2025
Perfect, Hadi.
All the best, and I hope to see your published paper soon.
Armin
Citation: https://doi.org/10.5194/egusphere-2025-3146-CC3
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
1,312 | 48 | 24 | 1,384 | 27 | 33
The research is well designed and written. It contributes to the development of a strong and user-friendly AI tool that can provide quick and effective support in flood-affected areas where urgent assistance is needed, without requiring harmonized or standardized procedures for image collection from different sources. As a limitation of the research, I believe it would be valuable to suggest including the geolocation of the final flood map to facilitate relief efforts. Furthermore, the reasons behind the superiority of SAM-Points should be discussed. Compared to other methods, this approach appears to be more effective in distinguishing bare soil from flooded areas.