OrthoSAM: Multi-Scale Extension of the Segment Anything Model for River Pebble Delineation from Large Orthophotos
Abstract. Sediment characteristics and grain-size distribution are crucial for understanding natural hazards, hydrologic conditions, and ecosystems. However, traditional methods for collecting this information are costly, labor-intensive, and time-consuming. To address this, we present OrthoSAM, a workflow leveraging the Segment Anything Model (SAM) for automated delineation of densely packed pebbles in high-resolution orthomosaics. Our framework consists of a tiling scheme, improved seed (input) point generation, and a multi-scale resampling scheme. Validation using synthetic images shows high precision close to 1, a recall above 0.9, with a mean IoU above 0.9. Using a large synthetic dataset, we show that the two-sample Kolmogorov-Smirnov test confirms the accuracy of the grain size distribution. We identified a size detection limit of 30 pixels; pebbles with a diameter below this limit are not reliably detected. Applying OrthoSAM to orthomosaics from the Ravi River in India, we delineated 6087 pebbles with high precision (0.93) and recall (0.94). The resulting grain statistics include area, axis lengths, perimeter, RGB statistics, and smoothness measurements, providing valuable insights for further analysis in geomorphology and ecosystem studies.
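For readers less familiar with the instance-level scores quoted in the abstract (precision, recall, mean IoU), the following is a minimal sketch of how such scores are commonly computed from predicted and reference masks. It is not the authors' evaluation code: the greedy one-to-one matching, the IoU threshold of 0.5, the convention of averaging IoU over matched pairs only, and the names `iou`, `score_instances`, and `square` are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # a mask is represented as a set of (row, col) pixels


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def score_instances(pred: List[Mask], truth: List[Mask],
                    iou_threshold: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, mean IoU of matched pairs).

    Assumption: predictions are matched to reference masks greedily,
    highest IoU first, one-to-one, and a match counts as a true
    positive only if its IoU reaches iou_threshold.
    """
    pairs = sorted(
        ((iou(p, t), i, j) for i, p in enumerate(pred) for j, t in enumerate(truth)),
        reverse=True,
    )
    used_pred, used_truth, matched = set(), set(), []
    for value, i, j in pairs:
        if value < iou_threshold:
            break  # remaining pairs overlap too little to count
        if i in used_pred or j in used_truth:
            continue  # each mask may be matched at most once
        used_pred.add(i)
        used_truth.add(j)
        matched.append(value)

    tp = len(matched)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    mean_iou = sum(matched) / tp if tp else 0.0
    return precision, recall, mean_iou


def square(r0: int, c0: int, size: int) -> Mask:
    """Hypothetical helper: a filled square mask used as a toy 'pebble'."""
    return {(r, c) for r in range(r0, r0 + size) for c in range(c0, c0 + size)}


# Toy example: one prediction overlaps a reference mask, one is spurious,
# and one reference mask is missed.
truth = [square(0, 0, 4), square(10, 10, 4)]
pred = [square(0, 1, 4), square(20, 20, 4)]
precision, recall, mean_iou = score_instances(pred, truth)
print(precision, recall, round(mean_iou, 3))
```

In this toy case one of two predictions is matched and one of two reference masks is found. Other matching rules or thresholds would give different numbers, which is why the chosen convention should be stated alongside the scores.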
General Comments:
The authors present a novel method and proof-of-concept for pebble segmentation in orthoimages by adapting the popular and widely used Segment Anything Model (SAM; Kirillov et al., 2023). They identify important, but often unaddressed, weaknesses of SAM, such as its reduced performance in dense segmentation tasks (where many instances of the same object class should be segmented) and its limited capability to segment objects of one class that vary strongly in size. To test their approach, the authors use 1) synthetic images with circles as a proxy for pebbles and 2) orthomosaics of real pebbles created with handheld cameras and photogrammetric processing. In experiment 1, they test the effect of a variety of image perturbations on segmentation quality and find that shadow effects in particular have a negative impact on SAM’s segmentation performance. In experiment 2, they apply their workflow to real-world images, showcasing the improvement achieved by their multi-scale segmentation with SAM. In this scenario, they categorically evaluate segmentation performance through manual counting due to the lack of ground truth masks. Both experiments show that their approach is up to the task and has the potential to mitigate some of the segmentation shortcomings of SAM for such applications.
I find the method well conceived and thought through, the data rigorously tested and clearly reported, and the manuscript well structured. In particular, I consider the balance between technical details in the main manuscript and the appendices well struck, which makes the manuscript very readable without omitting relevant information. The presented results generally support the findings and conclusions. I have only two suggestions: calculating additional scores and testing the approach on an additional image dataset (see specific comments below), which might allow for a better evaluation of some aspects of the segmentation performance of SAM/OrthoSAM. These are suggestions only, not concerns. The manuscript currently has many small figures; combining some of them into larger figures (e.g., Figures 10 and 11) might be helpful. Additionally, some minor/technical comments are included as in-line comments in the attached PDF.
In summary, I find the work of very high quality, with only a few minor points where the manuscript could be further improved. I suspect the authors will have no problems in addressing these points, and I look forward to seeing the manuscript published soon.
Kind regards,
David Mair (Uni Bern)
Specific comments:
References (including in-line comments):
Chen, Y., Bao, J., Chen, R., Li, B., Yang, Y., Renteria, L., Delgado, D., Forbes, B., Goldman, A. E., Simhan, M., Barnes, M. E., Laan, M., McKever, S., Hou, Z. J., Chen, X., Scheibe, T., & Stegen, J. (2024). Quantifying Streambed Grain Size, Uncertainty, and Hydrobiogeochemical Parameters Using Machine Learning Model YOLO. Water Resources Research, 60(11). https://doi.org/10.1029/2023WR036456
Huang, Y., Yang, X., Liu, L., Zhou, H., Chang, A., Zhou, X., Chen, R., Yu, J., Chen, J., Chen, C., Liu, S., Chi, H., Hu, X., Yue, K., Li, L., Grau, V., Fan, D. P., Dong, F., & Ni, D. (2024). Segment anything model for medical images? Medical Image Analysis, 92. https://doi.org/10.1016/j.media.2023.103061
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., Dollár, P., & Girshick, R. (2023). Segment Anything. http://arxiv.org/abs/2304.02643
Mair, D., Witz, G., Do Prado, A. H., Garefalakis, P., & Schlunegger, F. (2024). Automated detecting, segmenting and measuring of grains in images of fluvial sediments: The potential for large and precise data from specialist deep learning models and transfer learning. Earth Surface Processes and Landforms, 49(3), 1099–1116. https://doi.org/10.1002/esp.5755
Pachitariu, M., Rariden, M., & Stringer, C. (2025). Cellpose-SAM: superhuman generalization for cellular segmentation. https://doi.org/10.1101/2025.04.28.651001
Padilla, R., Netto, S. L., & da Silva, E. A. B. (2020). A Survey on Performance Metrics for Object-Detection Algorithms. 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), 237–242. https://doi.org/10.1109/IWSSIP48289.2020.9145130
Stringer, C., Wang, T., Michaelos, M., & Pachitariu, M. (2021). Cellpose: a generalist algorithm for cellular segmentation. Nature Methods, 18(1), 100–106. https://doi.org/10.1038/s41592-020-01018-x
Zegers, G., Hayashi, M., & Garcés, A. (2025). Distributed estimation of surface sediment size in paraglacial and periglacial environments using drone photogrammetry. Earth Surface Processes and Landforms, 50(7). https://doi.org/10.1002/esp.70093