From Ground Photos to Aerial Insights: Automating Citizen Science Labeling for Tree Species Segmentation in UAV Images
Abstract. Spatially accurate information on plant species is essential for many biodiversity monitoring applications, such as vegetation monitoring. Unoccupied Aerial Vehicle (UAV)-based remote sensing combined with supervised Convolutional Neural Network (CNN)-based segmentation methods has enabled accurate segmentation of plant species. However, labeling training data for supervised CNN methods in vegetation monitoring is a resource-intensive task, particularly for large-scale remote sensing datasets. This study presents an automated workflow that integrates the Segment Anything Model (SAM) with Gradient-weighted Class Activation Mapping (Grad-CAM) to generate segmentation masks for citizen science plant photographs, reducing the effort required for manual annotation. We evaluated the workflow by using the generated masks to train CNN-based segmentation models to segment 10 broadleaf tree species in UAV images. The results demonstrate that segmentation models can be trained directly on citizen science plant photographs, with mask generation automated and no need for extensive manual labeling. Despite the inherent complexity of segmenting broadleaf tree species, the model achieved acceptable overall performance. Towards efficiently monitoring vegetation dynamics across space and time, this study highlights the potential of integrating foundation models with citizen science data and remote sensing into automated vegetation mapping workflows, providing a scalable and cost-effective solution for biodiversity monitoring.
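The abstract describes combining a Grad-CAM attribution map with SAM to turn species-labeled citizen science photographs into segmentation masks. The Python sketch below illustrates one plausible way to implement that idea; the classifier, the checkpoint path, the image file name, and the choice of the Grad-CAM peak as a single point prompt are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of the mask-generation idea described in the abstract:
# a Grad-CAM heatmap from a species classifier supplies a point prompt for SAM,
# which returns a segmentation mask for the plant in a citizen science photo.
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
from segment_anything import sam_model_registry, SamPredictor

# Species classifier (assumed: a ResNet fine-tuned on the target tree species).
classifier = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
classifier.eval()
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("citizen_photo.jpg").convert("RGB")  # placeholder path
rgb = np.array(image)
input_tensor = preprocess(image).unsqueeze(0)

# Grad-CAM: where does the classifier "look" for the photo's species label?
species_class_idx = 0  # placeholder: index of the labeled species
cam = GradCAM(model=classifier, target_layers=[classifier.layer4[-1]])
heatmap = cam(input_tensor=input_tensor,
              targets=[ClassifierOutputTarget(species_class_idx)])[0]  # HxW, values in [0, 1]

# Scale the CAM peak back to original-image coordinates and use it as a point prompt.
peak_y, peak_x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
point = np.array([[peak_x * rgb.shape[1] / heatmap.shape[1],
                   peak_y * rgb.shape[0] / heatmap.shape[0]]])

# SAM: turn the point prompt into a segmentation mask.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # placeholder checkpoint
predictor = SamPredictor(sam)
predictor.set_image(rgb)
masks, scores, _ = predictor.predict(point_coords=point,
                                     point_labels=np.array([1]),
                                     multimask_output=True)
best_mask = masks[np.argmax(scores)]  # binary mask usable as a training label
```

In this sketch the highest-scoring SAM mask is kept as the training label for the photographed species; other prompting strategies (multiple points, boxes, or thresholded CAM regions) would fit the same workflow.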
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-662', Anonymous Referee #1, 18 Mar 2025
AC1: 'Response to Reviewer 1 Comments', Salim Soltani, 09 May 2025
Dear Reviewer,
We would like to sincerely thank you for your constructive and thoughtful comments. We greatly appreciate the time and effort you invested in reviewing our manuscript. Your feedback has been very helpful in identifying areas for improvement.
We have carefully addressed all comments and will revise the manuscript accordingly. For better readability, we have compiled our detailed responses in the attached PDF, structured in a clear table format.
Thank you once again for your valuable input.
Sincerely,
Salim Soltani
(on behalf of the Co-authors, Lauren E. Gillespie, Moises Exposito-Alonso, Olga Ferlian, Nico Eisenhauer, Hannes Feilhauer, and Teja Kattenborn)
RC2: 'Comment on egusphere-2025-662', Anonymous Referee #2, 19 Apr 2025
I have reviewed the manuscript “From Ground Photos to Aerial Insights: Automating Citizen Science Labeling for Tree Species Segmentation in UAV Images”. The authors examined the use of citizen science plant photographs to generate the large training datasets needed for segmenting plant species from high-resolution UAV imagery. Specifically, the authors combined several AI/ML models to extract species training masks from the photographs. The research topic is very interesting and timely, and addresses a core need to advance the use of optical UAV imagery for larger-scale vegetation mapping. The manuscript is well structured and nicely discussed. My concerns are mainly with the Methods and Results.
I would recommend that the authors add a workflow chart to help readers understand the various methods and data used in the study. Several AI/ML models are employed for different data processing steps, involving both photographs and UAV imagery. I found it hard to connect the different processing steps and to see how the different data streams and AI/ML methods are used.
Second, not much information is presented in the Results, barely enough to understand the performance of the model. The authors did quite significant work on processing and segmenting the photographs from iNaturalist and Pl@ntNet. However, results of this processing and segmentation are completely missing from the Results. I am concerned that the presentation of the Results is disconnected from the Methods. I recommend that the authors carefully tie them together, in particular how the F1 score and confusion matrix were calculated. The authors mention that independent transect validation data were identified from UAV imagery, but do not explain where and how these were produced, or their distribution across species and space. I think it would also be useful to present the species maps across the experimental plots.
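For reference, the per-class F1 scores the reviewer asks about are conventionally derived from the confusion matrix as in the generic sketch below; this is the standard definition, not necessarily the study's actual evaluation code, and the function name is illustrative.

```python
# Standard per-class F1 from a confusion matrix
# (rows = reference labels, columns = predictions); not the authors' exact code.
import numpy as np

def per_class_f1(confusion: np.ndarray) -> np.ndarray:
    tp = np.diag(confusion).astype(float)
    fp = confusion.sum(axis=0) - tp   # predicted as class k but belonging elsewhere
    fn = confusion.sum(axis=1) - tp   # belonging to class k but predicted elsewhere
    precision = np.divide(tp, tp + fp, out=np.zeros_like(tp), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=np.zeros_like(tp), where=(tp + fn) > 0)
    denom = precision + recall
    return np.divide(2 * precision * recall, denom,
                     out=np.zeros_like(tp), where=denom > 0)
```

A macro-averaged F1 is then simply the mean of these per-class values.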
Lastly, an overall thought: a core advantage of UAV imagery is that it provides landscape-scale observations. The authors argue that ultra-high resolution (finer than 0.22 cm) might be necessary to better segment species from UAV imagery. This statement appears to be false, and it ignores the fact that canopy structure and form carry important information for species identification, which is not considered in this study. On the other hand, it is appealing to generate the initial masks for UAV species identification from photographs, but it might be more useful to iterate over the species segmentation at the UAV level, leveraging other information such as canopy form and structure to enlarge the training samples at the UAV level, instead of forcing the UAV data to the same resolution as the ground photographs.
Minor comments:
- I wonder what features the authors used for segmentation. It is clear that only RGB imagery was used, but are other indices or transformations incorporated into the SAM segmentation?
- The authors mention that photos/masks from citizen science were ‘zoomed out’ when used as training data for the UAV imagery. What is the resolution after that, and is it comparable to the UAV resolution?
Citation: https://doi.org/10.5194/egusphere-2025-662-RC2
AC2: 'Response to Reviewer 2 Comments', Salim Soltani, 09 May 2025
Dear Reviewer,
We would like to sincerely thank you for your constructive and thoughtful comments. We greatly appreciate the time and effort you invested in reviewing our manuscript. Your feedback has been very helpful in identifying areas for improvement.
We have carefully addressed all comments and will revise the manuscript accordingly. For better readability, we have compiled our detailed responses in the attached PDF, structured in a clear table format.
Thank you once again for your valuable input.
Sincerely,
Salim Soltani
(on behalf of the Co-authors, Lauren E. Gillespie, Moises Exposito-Alonso, Olga Ferlian, Nico Eisenhauer, Hannes Feilhauer, and Teja Kattenborn)
In this study, the authors develop an end-to-end workflow that transforms the simple labels of crowd-sourced plant photos from iNaturalist and Pl@ntNet into segmentation masks. This mask dataset serves as labeled data to train deep learning species classification models. The authors also successfully used the dataset to train a CNN model to classify UAV ortho-imagery and accurately segment plant species at large scale. By reducing the time and labor required for field surveys to collect reference data for remote sensing image classification, this labeled dataset may offer some practical benefits. Overall, the study demonstrates both intellectual merit and practical relevance. The manuscript is also well-structured and well-written. However, using these citizen science datasets as labeled data for segmenting UAV images yields low accuracy for several species, hindering practical applications of these datasets and the method. The performance of the UAV image segmentation model should be improved before further evaluation.
Other comments
Martins et al., 2020. Exploring multiscale object-based convolutional neural network (multi-OCNN) for remote sensing image classification at high spatial resolution. https://doi.org/10.1016/j.isprsjprs.2020.08.004