This work is distributed under the Creative Commons Attribution 4.0 License.
Rapid Flood Mapping from Aerial Imagery Using Fine-Tuned SAM and ResNet-Backboned U-Net
Abstract. Flooding is a major natural hazard that requires a rapid response to minimize the loss of life and property and to facilitate damage assessment. Aerial imagery, especially images from unmanned aerial vehicles (UAVs) and helicopters, plays a crucial role in identifying areas affected by flooding. Therefore, developing an efficient model for rapid flood mapping is essential. In this study, we present two segmentation approaches for the mapping of flood-affected areas: (1) a fine-tuned Segment Anything Model (SAM), comparing the performance of point prompts versus bounding box (Bbox) prompts, and (2) a U-Net model with ResNet-50 and ResNet-101 as pre-trained backbones. Our results showed that the fine-tuned SAM performed best in segmenting floods with point prompts (Accuracy: 0.96, IoU: 0.90), while Bbox prompts led to a significant drop (Accuracy: 0.82, IoU: 0.67). This is because flooded areas often extend from edge to edge of the image, making Bbox prompts less effective at capturing boundary details. For the U-Net model, the ResNet-50 backbone yielded an accuracy of 0.87 and an IoU of 0.72. Performance improved slightly with the ResNet-101 backbone, achieving an accuracy of 0.88 and an IoU of 0.74. This improvement can be attributed to the deeper architecture of ResNet-101, which enables the extraction of more complex and detailed features and improves U-Net’s ability to segment flood-affected areas accurately. The results of this study will help emergency response teams identify flood-affected areas more quickly and accurately. In addition, these models could serve as valuable tools for insurance companies when assessing damage. Moreover, the segmented flood images generated by these models can serve as training data for other machine learning models, creating a pipeline for more advanced flood analysis and prediction systems.
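For readers who want a concrete starting point, the sketch below shows one way to instantiate U-Net variants with pre-trained ResNet-50 or ResNet-101 encoders, as described in the abstract. It uses the segmentation_models_pytorch library as an assumption and is not the authors' published implementation; the tile size and class count are placeholders.

```python
# Minimal sketch (assumed setup, not the authors' code): binary flood segmentation
# with a U-Net whose encoder is a pre-trained ResNet, via segmentation_models_pytorch.
import torch
import segmentation_models_pytorch as smp

def build_unet(encoder_name: str = "resnet50") -> torch.nn.Module:
    # encoder_name may be "resnet50" or "resnet101", mirroring the two backbones
    # compared in the paper; ImageNet weights initialize the encoder.
    return smp.Unet(
        encoder_name=encoder_name,
        encoder_weights="imagenet",
        in_channels=3,   # RGB aerial imagery
        classes=1,       # single "flood" class (binary mask)
    )

model = build_unet("resnet101")
logits = model(torch.randn(1, 3, 512, 512))   # placeholder tile size -> (1, 1, 512, 512)
flood_mask = torch.sigmoid(logits) > 0.5      # thresholded binary prediction
```

A per-pixel sigmoid followed by a threshold (0.5 here) turns the raw logits into the binary flood mask on which metrics such as Accuracy and IoU would be computed.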
Status: open (until 28 Oct 2025)
RC1: 'Comment on egusphere-2025-3146', Saham Mirzaei, 03 Sep 2025
AC1: 'Reply on RC1', Hadi Shokati, 09 Sep 2025
Dear Editor and Reviewer,
We would like to express our sincere gratitude for your time and thoughtful comments on our manuscript, "Rapid Flood Mapping from Aerial Imagery Using Fine-Tuned SAM and ResNet-Backboned U-Net." Your insightful feedback has been extremely valuable in helping us improve the clarity, strength, and overall quality of our work.
We have carefully considered all suggestions and addressed them point-by-point in the revised manuscript. For your reference, we have highlighted our responses to your comments in green. We believe these revisions have significantly strengthened the manuscript and we are confident that it is now ready for further consideration.
Thank you again for your valuable contribution to this process. We look forward to your feedback on the revised manuscript.
RC2: 'Comment on egusphere-2025-3146', Saham Mirzaei, 09 Sep 2025
Upon re-reading the manuscript, I noticed that in lines 200–203 you mention the use of various data augmentation techniques. Could you please clarify the probability settings assigned to each augmentation method?
In lines 201–203, it is not clear whether the augmentation was applied exclusively to the training dataset. Providing this clarification would enhance the transparency of the methodology.
Still in lines 201–203, it would be highly valuable to explicitly include details regarding the number of images before and after data augmentation, as well as their distribution across the training, validation, and test sets. Such information is critical to ensure reproducibility.
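Purely as an illustration of the kind of detail being requested here, the sketch below shows how per-transform probabilities are typically declared with the Albumentations library and how augmentation can be restricted to the training split; the transforms and probability values are placeholders, not the settings used in the manuscript.

```python
# Hypothetical augmentation setup (placeholder probabilities, not the paper's values):
# each transform carries its own probability p, and only the training split is augmented.
import albumentations as A

train_transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
])

# Validation and test data receive no stochastic augmentation, so reported metrics
# reflect unaltered imagery.
val_transform = A.Compose([])

# Calling the pipeline with both image and mask keeps the flood labels aligned
# with the spatially transformed image:
# augmented = train_transform(image=image, mask=mask)
```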
In lines 209–219, you mention the use of both Dice Loss and Cross-Entropy Loss. Could you please specify how these two loss functions were combined? For example, were they summed, averaged, or weighted differently?
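For reference, one common pattern is a weighted sum of the two terms; the plain-PyTorch sketch below (with hypothetical, equal weights) illustrates that option without implying it is the authors' choice.

```python
# Sketch of a combined Dice + binary cross-entropy loss (assumed weighting, not the paper's).
import torch
import torch.nn.functional as F

def dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Soft Dice on sigmoid probabilities; logits and target have shape (B, 1, H, W).
    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1.0 - ((2.0 * intersection + eps) / (union + eps)).mean()

def combined_loss(logits: torch.Tensor, target: torch.Tensor,
                  w_dice: float = 1.0, w_ce: float = 1.0) -> torch.Tensor:
    # Weighted sum; setting w_dice = w_ce = 0.5 would instead average the two terms.
    ce = F.binary_cross_entropy_with_logits(logits, target)
    return w_dice * dice_loss(logits, target) + w_ce * ce
```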
I appreciate that the code is publicly available on GitHub. However, I could not locate the corresponding datasets in the repository. Based on the README file, it seems that the authors expect users to obtain the data from an external source. While this is acceptable provided that the source remains reliably available, hosting a copy of the datasets within your GitHub repository would be preferable for long-term accessibility.
Citation: https://doi.org/10.5194/egusphere-2025-3146-RC2
AC2: 'Reply on RC2', Hadi Shokati, 11 Sep 2025
Dear Editor and Reviewer,
We would like to express our sincere gratitude again for your time and thoughtful comments on our manuscript, "Rapid Flood Mapping from Aerial Imagery Using Fine-Tuned SAM and ResNet-Backboned U-Net." Your insightful feedback has been extremely valuable in helping us improve the clarity, strength, and overall quality of our work.
We have carefully considered all suggestions and addressed them point-by-point in the revised manuscript. For your reference, we have highlighted our responses to your comments in green. We believe these revisions have significantly strengthened the manuscript and we are confident that it is now ready for further consideration.
Thank you again for your valuable contribution to this process. We look forward to your feedback on the revised manuscript.
CC1: 'Comment on egusphere-2025-3146', Armin Moghimi, 17 Sep 2025
Dear authors,
I read the paper, and it appears to me that the method used by the authors, as well as the code, is derived from the following paper and GitHub repository:
ArminMoghimi/Fine-tune-the-Segment-Anything-Model-SAM-: A. Moghimi, M. Welzel, T. Celik, and T. Schlurmann, "A Comparative Performance Analysis of Popular Deep Learning Models and Segment Anything Model (SAM) for River Water Segmentation in Close-Range Remote Sensing Imagery,"
https://github.com/ArminMoghimi/Fine-tune-the-Segment-Anything-Model-SAM-
https://doi.org/10.1109/ACCESS.2024.3385425
However, I could not find this reference in the reference list, where it should appear.
Citation: https://doi.org/10.5194/egusphere-2025-3146-CC1
AC3: 'Reply on CC1', Hadi Shokati, 19 Sep 2025
Dear Armin,
We appreciate your feedback. After a thorough review, we did not find any similarities between our code and the one you mentioned. Our implementation originates from our previous article published in CATENA, where we focused on segmenting erosion and deposition. That codebase is the result of years of experience and numerous meetings with domain experts. We therefore respectfully clarify that our implementation is fully independent and not derived from the repository you mentioned. We highly recommend reading the article linked below for a better understanding of the foundations of our approach.
We were not aware of your article and code before, but we appreciate your work; it is a valuable contribution to the field. If we had seen it earlier, we would have certainly benefited from it. In the academic community, the shared goal is to advance the state of the art, and we are no exception. To support this goal, we include a comparison with your results in the results section of our revised manuscript, which we are confident will further enhance the quality of the paper.
Thank you again for your valuable feedback.
AC4: 'Reply on AC3', Hadi Shokati, 19 Sep 2025
Please see the links below:
GitHub: https://github.com/hadi1994shokati/Flood-segmentation
Paper in CATENA: https://www.sciencedirect.com/science/article/pii/S0341816225002565
CC2: 'Reply on AC3', Armin Moghimi, 19 Sep 2025
Dear Hadi,
Thank you, and please don't get me wrong: you are part of our remote sensing community, and your work is good. I don't want to dwell on the similarity aspect, as we have already explored comparisons between SAM and U-Net (with a ResNet-50 backbone) in the context of water segmentation using close-range remote sensing images (from UAVs, smartphones, and handheld cameras, within a 1–300 meter range) (https://doi.org/10.1109/ACCESS.2024.3385425). However, I would like to suggest that you consider referencing our results, as they align with your findings and could strengthen your discussion section. For instance, we observed that SAM performs very well in general; however, when segmenting images from the same area, U-Net actually produced even better results. This nuance might enrich your discussion, especially when highlighting the practical performance differences between models.
Another point is related to the computational aspects: SAM typically operates on 1024×1024 patches, and when fine-tuning with a frozen ViT backbone, it still requires significant computational resources. The choice of ViT backbone also matters—ViT-H is quite heavy and not ideal for fine-tuning, whereas the smaller variants (like Tiny ViT and Medium ViT) tend to perform better with fewer resources.
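To make the computational point concrete, here is a rough sketch, based on the public segment-anything package, of freezing the ViT image encoder (and prompt encoder) and training only the lightweight mask decoder; the checkpoint path, variant, and optimizer settings are placeholders.

```python
# Illustrative fine-tuning setup (placeholder checkpoint and hyperparameters):
# the heavy ViT backbone stays frozen, and only the mask decoder receives gradients.
import torch
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

for p in sam.image_encoder.parameters():
    p.requires_grad = False   # frozen ViT image encoder
for p in sam.prompt_encoder.parameters():
    p.requires_grad = False   # prompt encoder also frozen

optimizer = torch.optim.Adam(sam.mask_decoder.parameters(), lr=1e-4)
```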
Lastly, regarding datasets: One could argue that with SAM, we might not need large annotated datasets anymore. While SAM reduces the need for manual annotation, I would still say that datasets are necessary. The real question is: how many do we actually need? To help address this, you might consider referencing this recent paper by Professor Anette Eltner from Dresden University: https://doi.org/10.1080/01431161.2025.2457131
It discusses the sensitivity of model performance to dataset size and could help you frame this as a potential advantage of SAM over UNet.
I’d love to hear your thoughts and see how you might incorporate some of these ideas into your discussion.
Warm regards,
Armin
Citation: https://doi.org/10.5194/egusphere-2025-3146-CC2
AC5: 'Reply on CC2', Hadi Shokati, 29 Sep 2025
Dear Armin,
Thank you once again for your valuable feedback. We truly appreciate the time and effort you have taken to engage with our work.
In our work, we not only compared U-Net and SAM, but also evaluated two types of input prompts for SAM (points and bounding boxes) and two types of backbones for U-Net (ResNet-50 and ResNet-101). We agree that including a direct comparison with your findings will make our paper more comprehensive, and we will add this comparison to the results section of our revised manuscript.
Regarding the ViT backbone, we also evaluated multiple variants and observed that their effectiveness for flood-affected area segmentation was comparable. To optimize computational resources, we selected ViT-Base, as it provides a favorable balance between accuracy and efficiency.
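For context, the sketch below shows how the ViT-Base variant is loaded with the public segment-anything API and how point versus bounding-box prompts are passed at inference time; the image, coordinates, and checkpoint path are placeholders, not data from the study.

```python
# Rough inference sketch with SAM ViT-Base (placeholder image, coordinates, checkpoint).
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.zeros((768, 1024, 3), dtype=np.uint8)  # stand-in for an RGB aerial frame
predictor.set_image(image)

# Point prompt: a single foreground click inside the flooded area (label 1 = foreground).
masks_pt, scores_pt, _ = predictor.predict(
    point_coords=np.array([[512, 384]]),
    point_labels=np.array([1]),
    multimask_output=False,
)

# Bounding-box prompt in (x_min, y_min, x_max, y_max) pixel coordinates.
masks_box, scores_box, _ = predictor.predict(
    box=np.array([50, 60, 900, 700]),
    multimask_output=False,
)
```

Because a flooded scene can span the whole frame, a box prompt may end up covering nearly the entire image, which is consistent with the weaker Bbox results reported in the abstract.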
We also appreciate your recommendation of Professor Anette Eltner’s recent article. It provides an excellent perspective on dataset size requirements. The number of labeled images needed depends on the complexity of the task. For instance, in our previous study on erosion and deposition segmentation (https://doi.org/10.1016/j.catena.2025.108954), we worked with about 400 labeled images and observed clear performance gains with increasing dataset size, an effect that was particularly relevant given the higher complexity of that task compared to flood mapping. This increased complexity arises because eroded and non-eroded soil often have very similar visual characteristics, whereas flooded areas are usually more distinct from their surroundings.
Thank you again for your thoughtful suggestions. They will certainly help us improve the clarity and impact of our paper.
CC3: 'Reply on AC5', Armin Moghimi, 29 Sep 2025
Perfect, Hadi.
All the best, and I hope to see your published paper soon.
Armin
Citation: https://doi.org/10.5194/egusphere-2025-3146-CC3
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
1,312 | 48 | 24 | 1,384 | 27 | 33
The research is well designed and written. It contributes to the development of a strong and user-friendly AI tool that can provide quick and effective support in flood-affected areas where urgent assistance is needed, without requiring harmonized or standardized procedures for image collection from different sources. As a limitation of the research, I believe it would be valuable to suggest including the geolocation of the final flood map to facilitate relief efforts. Furthermore, the reasons behind the superiority of SAM-Points should be discussed. Compared to other methods, this approach appears to be more effective in distinguishing bare soil from flooded areas.