the Creative Commons Attribution 4.0 License.
From simple labels to semantic image segmentation: Leveraging citizen science plant photographs for tree species mapping in drone imagery
Abstract. Knowledge of plant species distributions is essential for various applications, such as nature conservation, agriculture, and forestry. Remote sensing data, especially high-resolution orthoimages from Unoccupied Aerial Vehicles (UAVs), have been demonstrated to be an effective data source for plant species mapping. In particular, in concert with novel pattern recognition methods, such as Convolutional Neural Networks (CNNs), plant species can be accurately segmented in such high-resolution UAV images. Training such pattern recognition models for species segmentation that are transferable across various landscapes and remote sensing data characteristics often requires large amounts of training data. Training data are usually derived in the form of segmentation masks from field surveys or visual interpretation of the target species in remote sensing images, but both methods are laborious and constrain the training of transferable pattern recognition models. Alternatively, pattern recognition models could be trained on the open knowledge of how plants look, as available from smartphone-based species identification apps, that is, millions of citizen science smartphone photographs and their corresponding species labels. However, these pairs of citizen science photographs and simple species labels (one label for the entire image) cannot be used directly to train state-of-the-art segmentation models used for UAV image analysis, which require per-pixel labels (also called masks) for training. Here, we overcome the limitation of the simple labels of citizen science plant observations with a two-step approach: In the first step, we train CNN-based image classification models using the simple labels and apply them in a moving-window approach over UAV orthoimagery to create segmentation masks. In the second step, these segmentation masks are used to train state-of-the-art CNN-based image segmentation models with an encoder-decoder structure.
We tested the approach on UAV orthoimages acquired in summer and autumn on a test site comprising ten temperate deciduous tree species in varying mixtures. Several tree species could be mapped with surprising accuracy (mean F1-score = 0.47). In homogeneous species assemblages, the accuracy increased considerably (mean F1-score = 0.55). The results indicate that many tree species can be mapped without generating training data and by integrating pre-existing knowledge from citizen science. Moreover, our analysis revealed that the variability of citizen science photographs in acquisition dates and contexts facilitates the generation of models that are transferable across the vegetation season. Thus, citizen science data may greatly advance our capacity to monitor hundreds of plant species, and thereby Earth's biodiversity, across space and time.
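The two-step pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `sliding_window_masks` and the toy intensity-based classifier are hypothetical stand-ins for the CNN image classifier trained on citizen science photographs (step 1); the resulting per-pixel mask would then serve as training labels for an encoder-decoder segmentation model (step 2), which is not shown here.

```python
import numpy as np

def sliding_window_masks(ortho, classify, win=32, stride=16, n_classes=3):
    """Step 1 sketch: slide a window over the orthoimage, let an
    image-level classifier assign one species label per window, and
    accumulate per-pixel class votes. The argmax over votes yields a
    coarse segmentation mask usable as training labels for step 2."""
    h, w = ortho.shape[:2]
    votes = np.zeros((h, w, n_classes), dtype=np.int32)
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            cls = classify(ortho[y:y + win, x:x + win])
            votes[y:y + win, x:x + win, cls] += 1
    return votes.argmax(axis=-1)

# Toy "classifier": labels a window by its mean intensity. A real
# pipeline would call a CNN (e.g. an EfficientNet-style backbone)
# trained on citizen science photographs instead.
def toy_classifier(window):
    m = window.mean()
    return 0 if m < 85 else (1 if m < 170 else 2)

# Synthetic "orthoimage": left half dark (class 0), right half bright (class 2).
ortho = np.zeros((128, 128), dtype=np.uint8)
ortho[:, 64:] = 255
mask = sliding_window_masks(ortho, toy_classifier)
```

Because windows overlap (stride < window size), each pixel receives several votes, which smooths out individual misclassifications; only near the class boundary do mixed windows produce intermediate labels.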
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-2576', Anonymous Referee #1, 12 Jan 2024
Thank you for the opportunity to review this manuscript. In this article the authors present an innovative method to incorporate citizen science photographs of trees to segment and classify ten deciduous tree species from aerial images, using a Convolutional Neural Network. The two-step approach of using simple labels of citizen science data to create masks for a segmentation model is innovative and highly relevant. I think that the paper fits well within the scope of this journal and presents an application of an interesting new approach to remote sensing.
The manuscript as a whole is very well structured. Only the first part of the abstract could be shortened significantly.
Comments
1) The first part of the abstract, which presents an overview of the problem, could be shortened to make it more concise (try to summarise each section of the manuscript in 1-3 sentences).
2) l. 250 why did you choose EfficientNetV2L over the other tested backbone architectures?
3) l. 261 how much % of the images were assigned NA? Did this influence the model training?
4) Could you explain the term “replacements” (e.g. l. 240)?
5) Do you think the amount of misclassified data could be a problem for the training of the segmentation model? (l. 297-298)
6) 0.22 cm already seems like very high resolution. Many remote sensing studies focus on making high-resolution reference data more usable over large areas (i.e. by adapting it to satellite data). You argue for the use of even finer-resolution data in the future. What research objectives could be studied using this very high resolution of UAV data? Is there a research gap for very high prediction accuracy over relatively small areas? Could multispectral/hyperspectral sensors be more useful than higher resolution?
Minor comments
l. 29 Please remove the “and” between “data” and “by”
l. 51 “unleash” might not be the right word; “harness” might be better suited
“provided” might be better instead of “given”
l. 56-60 This sentence is not completely clear to me. Maybe you can reformulate it to make it easier to read.
l. 63 Please remove “similar”, as it is unnecessary
l. 66 Consider combining sentence “[…] costly, as training data […]”
l. 81 Is the training data limited or just costly/time consuming to generate?
l. 89 “platforms”
l. 90/95 “mil” or “M”; please remove “of”
l. 97 Please remove “The” before “Pl@ntNet”
l. 109 “Ideally, for species mapping applications […]”
l. 115-120 This part might fit better in the Methods section
l. 198 Please remove “Accordingly”
l. 235 “were afterward rasterized”
l. 240-241 What does “sampled with replacement” mean?
l. 317 Please replace “while” with “although”, or similar
l. 337-341 This might fit better in the Discussion section
l. 367 “varying”
l. 373 “partially relatively inaccurate” → This is a little vague. Maybe expand upon it a little.
l. 387-389 Please remove one instance of “plots with more species (two or four)”
l. 393 “higher value” than what?
l. 442 Maybe you can find a better phrasing than “diversity of human behaviour”
l. 457 “often costly”
l. 484 “large” instead of “excessive” (which means unreasonably much)
l. 485 “good transferability”
Figure 2: The text font is very small. It would also be better if the labels match the ones used in the text: “OrthoJuly” and “OrthoSeptember” instead of “Ortho 1” and “Ortho 2”
Figure 4: The text font here is also very small.
Figure 6: The height of the transects seems to be different between plots (e.g. plot 29 and plot 33). If they are all the same (2 m), please show them with the same extents in the figure as well.
Citation: https://doi.org/10.5194/egusphere-2023-2576-RC1
AC1: 'Response to the first reviewer's comment', Salim Soltani, 28 Mar 2024
Salim Soltani, Remote Sensing Center for Earth System Research
University of Leipzig, salim.soltani@uni-leipzig.de
Ref. No.: egusphere-2023-2576, “From simple labels to semantic image segmentation: Leveraging citizen science plant photographs for tree species mapping in drone imagery”
Dear reviewer,
We would like to thank you for your constructive comments that allowed us to improve the quality of the manuscript and for the time that you spent commenting on the manuscript.
We have addressed the first reviewer's comments. We hope that the revised manuscript addresses all the shortcomings of the earlier version.
Kind regards,
Salim Soltani
(on behalf of the Co-authors, Olga Ferlian, Nico Eisenhauer, Hannes Feilhauer, Teja Kattenborn)
RC2: 'Comment on egusphere-2023-2576', Anonymous Referee #2, 04 Apr 2024
I enjoyed reading the manuscript and its rigorous approach to image segmentation, and I have no comments in addition to those of Reviewer #1.
Citation: https://doi.org/10.5194/egusphere-2023-2576-RC2
AC2: 'Response to the second reviewer's comment', Salim Soltani, 05 Apr 2024
Dear reviewer,
We would like to thank you for your positive evaluation of the manuscript.
We thoroughly addressed the constructive suggestions of reviewer 1.
Kind regards,
Salim Soltani
(on behalf of the Co-authors, Olga Ferlian, Nico Eisenhauer, Hannes Feilhauer, Teja Kattenborn)
Citation: https://doi.org/10.5194/egusphere-2023-2576-AC2