the Creative Commons Attribution 4.0 License.
OrthoSAM: Multi-Scale Extension of the Segment Anything Model for River Pebble Delineation from Large Orthophotos
Abstract. Sediment characteristics and grain-size distribution are crucial for understanding natural hazards, hydrologic conditions, and ecosystems. However, traditional methods for collecting this information are costly, labor-intensive, and time-consuming. To address this, we present OrthoSAM, a workflow leveraging the Segment Anything Model (SAM) for automated delineation of densely packed pebbles in high-resolution orthomosaics. Our framework consists of a tiling scheme, improved seed (input) point generation, and a multi-scale resampling scheme. Validation using synthetic images shows a precision close to 1, a recall above 0.9, and a mean intersection over union (IoU) above 0.9. Using a large synthetic dataset, we show that the two-sample Kolmogorov-Smirnov test confirms the accuracy of the grain-size distribution. We identified a size detection limit of 30 pixels; pebbles with a diameter below this limit are not reliably detected. Applying OrthoSAM to orthomosaics from the Ravi River in India, we delineated 6087 pebbles with high precision (0.93) and recall (0.94). The resulting grain statistics include area, axis lengths, perimeter, RGB statistics, and smoothness measurements, providing valuable insights for further analysis in geomorphology and ecosystem studies.
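As a minimal illustration of the distribution check mentioned in the abstract, the two-sample Kolmogorov-Smirnov statistic is the largest vertical distance between two empirical cumulative distribution functions. The sketch below is not taken from OrthoSAM: the function name `ks_statistic` and the diameter values are hypothetical, and in practice a library routine such as `scipy.stats.ks_2samp` would normally be preferred because it also returns a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d_max = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the difference at every value present in either sample.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


# Hypothetical grain diameters in pixels (illustrative values only,
# not data from the manuscript).
reference = [32, 45, 51, 60, 78, 90, 120]
detected = [33, 44, 52, 61, 77, 95, 118]
print(ks_statistic(reference, detected))
```

A small statistic means the two grain-size distributions are close; whether the difference is significant depends on the sample sizes, which is what the accompanying p-value of the full test accounts for.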
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-4003', David Mair, 20 Sep 2025
- RC2: 'Comment on egusphere-2025-4003', Zoltan Sylvester, 07 Oct 2025
The manuscript by Chan et al. focuses on the description and validation of an open-source Python machine learning model called ‘OrthoSAM’, which relies on the Segment Anything Model (SAM) to generate instance segmentations of images of coarse-grained fluvial sediment. As someone who has also done some work on using SAM for grain segmentation, I think this is a promising approach, and having access to a variety of techniques and implementations at this stage is an advantage overall. The paper is well written and nicely illustrated, it includes a number of novel approaches that have not been implemented before, and the authors have clearly put a significant amount of thoughtful and careful work into the software and into validating the results with synthetic and field data. In addition, they have made the code open-source and available as a GitHub repository, which makes it a lot easier for these methods to be adopted and tested on other datasets.
I do have a number of comments that I think should be addressed by the authors before publication; these are as follows.
- The SAM-based approach and the tiling of large images are features of OrthoSAM that our Python module called ‘Segmenteverygrain’ also relies on. Although Segmenteverygrain is mentioned in the manuscript, I think there should be a somewhat more detailed discussion of the differences between the two techniques: not just the fact that OrthoSAM relies only on SAM, without the need for the U-Net pass, but also aspects such as how broadly the model is applicable, how the model outputs can be improved, and whether the model can be fine-tuned. I do think that there is room for a variety of approaches to taking advantage of SAM (and other similar models) in sedimentology and geomorphology, but it will be useful for the reader to get a brief overview of the differences between the existing tools.
- One of the novel aspects of the work presented by Chan et al. is the generation of synthetic data that is then used for validation. While I totally see the value of this in increasing the community’s confidence in the model, one of the important questions about ML models is their ability to generalize. Although SAM has been trained on a wide variety of images and is good at generalization, I think it is less clear how well OrthoSAM would perform on real images of coarse-grained sediment that are quite different from the examples used in the paper. Although the authors are right that “manual validation is inevitably prone to subjectivity and human error, leading to potential biases and inconsistencies”, I would argue that a carefully quality-controlled segmentation of real datasets is potentially more valuable for validating a machine learning model than a synthetic dataset that does not fully reproduce the complexity and variety of actual datasets. So I concur with the other reviewer that applying OrthoSAM to other datasets would be a valuable addition to the paper. It should not take too long to run it on some other publicly available datasets.
- I do not think this is a major issue, certainly not for this manuscript, but: I have tried to install OrthoSAM on my computer and to run one of the notebooks but I gave up without getting to a result because I got a number of errors early on. Making it easier for a broad range of users to install and run the code will ensure a broader adoption of OrthoSAM.
- The ‘hardware requirements’ section is quite useful, but it could be improved if typical compute times were added, e.g., how long does it take to create a segmentation result for an image with ~1000 grains? Is it possible/feasible at all to run the segmentation on a CPU?
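The tiling of large images that the first comment refers to can be sketched generically as follows. This is only an illustration under stated assumptions, not the scheme used by OrthoSAM or Segmenteverygrain: the function names `tile_starts` and `tile_boxes`, the tile size, and the overlap are all hypothetical.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length)."""
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    # Shift the last tile back so that it ends exactly at the image edge.
    starts.append(length - tile)
    return starts


def tile_boxes(height, width, tile, overlap):
    """Tile bounding boxes as (row_min, col_min, row_max, col_max)."""
    return [
        (r, c, min(r + tile, height), min(c + tile, width))
        for r in tile_starts(height, tile, overlap)
        for c in tile_starts(width, tile, overlap)
    ]


# Hypothetical orthomosaic of 1000 x 700 pixels, cut into 400-pixel
# tiles that share at least 100 pixels with their neighbours.
for box in tile_boxes(1000, 700, tile=400, overlap=100):
    print(box)
```

The overlap is there so that a grain cut by one tile border appears whole in a neighbouring tile. Timing a per-tile segmentation call over these boxes with `time.perf_counter` would be one simple way to obtain the per-image compute times asked about in the last comment.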
I hope the authors will find these comments / suggestions somewhat useful.
Sincerely,
Zoltan Sylvester
Citation: https://doi.org/10.5194/egusphere-2025-4003-RC2
General Comments:
The authors present a novel method and proof of concept for pebble segmentation in orthoimages by adapting the popular and widely used Segment Anything Model (SAM; Kirillov et al., 2023). They identify important, but often unaddressed, weaknesses of SAM, such as its reduced performance in dense segmentation tasks (where many instances of the same object class should be segmented) and its limited capability to segment objects of one class that vary significantly in size. To test their approach, the authors use 1) synthetic images with circles as a proxy for pebbles and 2) orthomosaics of real pebbles created with handheld cameras and photogrammetric processing. In experiment 1, they test the effect of a variety of image perturbations on segmentation quality. Here, they find that shadow effects in particular have some negative impact on SAM’s segmentation performance. In experiment 2, they apply their workflow to real-world images, showcasing the improvement of their multi-scale segmentation with SAM. In this scenario, they evaluate segmentation performance categorically through manual counting because ground truth masks are lacking. Both experiments show that their approach is up to the task and has the potential to mitigate some of the segmentation shortcomings of SAM for such applications.
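For readers less familiar with the scores involved, the two kinds of evaluation described above can be illustrated with a short sketch: a mask-based intersection over union (IoU) when ground truth masks exist, and precision and recall from manually counted detections when they do not. The function names, the toy masks, and the counts below are hypothetical and are not values or code from the manuscript.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    if not union:
        return 0.0
    return len(mask_a & mask_b) / len(union)


def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from manually counted detections."""
    detected = true_pos + false_pos  # everything the model delineated
    present = true_pos + false_neg   # everything actually in the image
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / present if present else 0.0
    return precision, recall


# Two hypothetical 4 x 4 pixel masks, offset by one column.
truth = {(r, c) for r in range(4) for c in range(4)}
predicted = {(r, c) for r in range(4) for c in range(1, 5)}
print(iou(truth, predicted))  # 12 shared pixels / 20 pixels in the union = 0.6

# Hypothetical counts, not values from the manuscript.
print(precision_recall(true_pos=90, false_pos=10, false_neg=5))
```

Precision penalizes spurious or over-segmented grains, while recall penalizes missed grains, so the two are usually reported together.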
I find the method well conceived and thought through, the data rigorously tested and clearly reported, and the manuscript well structured. In particular, I consider the balance between technical details in the main manuscript and the appendices well struck, which makes the manuscript very readable without omitting relevant information. The presented results generally support the findings and conclusions. Here, I have only two suggestions: calculating additional scores and using an additional image dataset to test the approach (see specific comments below), which might allow for a better evaluation of some aspects of the segmentation performance of SAM/OrthoSAM. However, these are just suggestions, not concerns. Currently, the manuscript has many small figures; combining some of them into larger figures (e.g., Figures 10 and 11) might be helpful. Additionally, some minor/technical comments are included as in-line comments in the attached PDF.
In summary, I find the work of very high quality, with only a few minor points where the manuscript could be further improved. I suspect the authors will have no problems in addressing these points, and I look forward to seeing the manuscript published soon.
Kind regards,
David Mair (Uni Bern)
Specific comments:
References (including in-line comments):
Chen, Y., Bao, J., Chen, R., Li, B., Yang, Y., Renteria, L., Delgado, D., Forbes, B., Goldman, A. E., Simhan, M., Barnes, M. E., Laan, M., McKever, S., Hou, Z. J., Chen, X., Scheibe, T., & Stegen, J. (2024). Quantifying Streambed Grain Size, Uncertainty, and Hydrobiogeochemical Parameters Using Machine Learning Model YOLO. Water Resources Research, 60(11). https://doi.org/10.1029/2023WR036456
Huang, Y., Yang, X., Liu, L., Zhou, H., Chang, A., Zhou, X., Chen, R., Yu, J., Chen, J., Chen, C., Liu, S., Chi, H., Hu, X., Yue, K., Li, L., Grau, V., Fan, D. P., Dong, F., & Ni, D. (2024). Segment anything model for medical images? Medical Image Analysis, 92. https://doi.org/10.1016/j.media.2023.103061
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., Dollár, P., & Girshick, R. (2023). Segment Anything. http://arxiv.org/abs/2304.02643
Mair, D., Witz, G., Do Prado, A. H., Garefalakis, P., & Schlunegger, F. (2024). Automated detecting, segmenting and measuring of grains in images of fluvial sediments: The potential for large and precise data from specialist deep learning models and transfer learning. Earth Surface Processes and Landforms, 49(3), 1099–1116. https://doi.org/10.1002/esp.5755
Pachitariu, M., Rariden, M., & Stringer, C. (2025). Cellpose-SAM: superhuman generalization for cellular segmentation. https://doi.org/10.1101/2025.04.28.651001
Padilla, R., Netto, S. L., & da Silva, E. A. B. (2020). A Survey on Performance Metrics for Object-Detection Algorithms. 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), 237–242. https://doi.org/10.1109/IWSSIP48289.2020.9145130
Stringer, C., Wang, T., Michaelos, M., & Pachitariu, M. (2021). Cellpose: a generalist algorithm for cellular segmentation. Nature Methods, 18(1), 100–106. https://doi.org/10.1038/s41592-020-01018-x
Zegers, G., Hayashi, M., & Garcés, A. (2025). Distributed estimation of surface sediment size in paraglacial and periglacial environments using drone photogrammetry. Earth Surface Processes and Landforms, 50(7). https://doi.org/10.1002/esp.70093