From simple labels to semantic image segmentation: Leveraging citizen science plant photographs for tree species mapping in drone imagery

Soltani, Salim; Ferlian, Olga; Eisenhauer, Nico; Feilhauer, Hannes; Kattenborn, Teja

DOI: https://doi.org/10.5194/egusphere-2023-2576


05 Dec 2023


Abstract. Knowledge of plant species distributions is essential for various applications, such as nature conservation, agriculture, and forestry. Remote sensing data, especially high-resolution orthoimages from Unoccupied Aerial Vehicles (UAVs), have been shown to be an effective data source for plant species mapping. In particular, in concert with novel pattern recognition methods such as Convolutional Neural Networks (CNNs), plant species can be accurately segmented in such high-resolution UAV images. Training pattern recognition models for species segmentation that are transferable across various landscapes and remote sensing data characteristics often requires large amounts of training data. Training data are usually derived in the form of segmentation masks from field surveys or visual interpretation of the target species in remote sensing images. Both methods, however, are laborious and constrain the training of transferable pattern recognition models. Alternatively, pattern recognition models could be trained on the open knowledge of how plants look, as available from smartphone-based species identification apps, that is, millions of citizen science-based smartphone photographs and their corresponding species labels. However, these pairs of citizen science-based photographs and simple species labels (one label for the entire image) cannot be used directly for training state-of-the-art segmentation models used for UAV image analysis, which require per-pixel labels (also called masks) for training. Here, we overcome the limitation of simple labels of citizen science plant observations with a two-step approach: in the first step, we train CNN-based image classification models using the simple labels and apply them in a moving-window approach over UAV orthoimagery to create segmentation masks. In the second step, these segmentation masks are used to train state-of-the-art CNN-based image segmentation models with an encoder-decoder structure.
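The first step (turning a classifier trained on whole-image labels into per-pixel masks by sliding it across the orthoimage) can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: `classify` stands in for a trained CNN, and the window size, stride, averaging of class probabilities over overlapping windows, and the `min_conf` threshold that leaves uncertain pixels unlabelled (`NA`) are all hypothetical choices.

```python
import numpy as np

NA = -1  # hypothetical "no label" value for uncertain or uncovered pixels


def moving_window_mask(image, classify, window=64, stride=32, min_conf=0.5):
    """Build a per-pixel label mask from a whole-image classifier (sketch).

    image:    (H, W, C) array, e.g. a tile of a UAV orthoimage.
    classify: callable mapping one (window, window, C) tile to a vector of
              class probabilities (stand-in for a trained CNN classifier).
    Each pixel gets the class with the highest probability averaged over all
    windows covering it; pixels whose averaged top probability is below
    min_conf, or that no window covers (image edges), are set to NA.
    """
    h, w = image.shape[:2]
    prob_sum = None
    counts = np.zeros((h, w), dtype=np.int64)
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            tile = image[top:top + window, left:left + window]
            probs = np.asarray(classify(tile), dtype=float)
            if prob_sum is None:
                prob_sum = np.zeros((h, w, probs.size))
            # Every pixel in the window receives the window-level prediction.
            prob_sum[top:top + window, left:left + window] += probs
            counts[top:top + window, left:left + window] += 1

    mask = np.full((h, w), NA, dtype=np.int64)
    if prob_sum is None:  # image smaller than one window
        return mask
    covered = counts > 0
    mean_probs = prob_sum[covered] / counts[covered][:, None]
    labels = mean_probs.argmax(axis=1)
    labels[mean_probs.max(axis=1) < min_conf] = NA
    mask[covered] = labels
    return mask
```

The resulting masks, paired with the orthoimage, would then serve as training targets for the encoder-decoder segmentation model in the second step. Averaging overlapping windows is one simple way to smooth the blocky output of a window classifier; a smaller stride gives finer masks at the cost of more classifier calls.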
We tested the approach on UAV orthoimages acquired in summer and autumn at a test site comprising ten temperate deciduous tree species in varying mixtures. Several tree species could be mapped with surprising accuracy (mean F1-score = 0.47). In homogeneous species assemblages, the accuracy increased considerably (mean F1-score = 0.55). The results indicate that many tree species can be mapped without generating training data by integrating pre-existing knowledge from citizen science. Moreover, our analysis revealed that the variability of citizen science photographs in acquisition date and context facilitates the generation of models that are transferable through the vegetation season. Thus, citizen science data may greatly advance our capacity to monitor hundreds of plant species and, by extension, Earth's biodiversity across space and time.
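The mean F1-scores above summarise per-species accuracy. A minimal sketch of how such a score can be computed from predicted and reference label maps follows, assuming (the abstract does not say) that F1 is evaluated per pixel and that the mean is an unweighted average over species; `per_class_f1` and its `ignore` value are hypothetical names.

```python
import numpy as np


def per_class_f1(y_true, y_pred, n_classes, ignore=-1):
    """Per-class F1 = 2TP / (2TP + FP + FN) from label arrays of any shape.

    Pixels whose reference label equals `ignore` are excluded. A class with
    no reference and no predicted pixels gets F1 = 0 here (a convention).
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    keep = y_true != ignore
    y_true, y_pred = y_true[keep], y_pred[keep]

    f1 = np.zeros(n_classes)
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom > 0 else 0.0
    return f1


# Toy example with three hypothetical species labels 0, 1, 2.
scores = per_class_f1([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
mean_f1 = scores.mean()  # unweighted (macro) mean over species
```

For this toy input the per-class scores are 0.5, 0.8 and 2/3. An unweighted mean gives rare and common species equal influence, which is why it can differ noticeably from overall pixel accuracy.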

Received: 02 Nov 2023 – Discussion started: 05 Dec 2023


Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

14 Jun 2024

From simple labels to semantic image segmentation: leveraging citizen science plant photographs for tree species mapping in drone imagery

Salim Soltani, Olga Ferlian, Nico Eisenhauer, Hannes Feilhauer, and Teja Kattenborn

Biogeosciences, 21, 2909–2935, https://doi.org/10.5194/bg-21-2909-2024, 2024


Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2023-2576', Anonymous Referee #1, 12 Jan 2024

Thank you for the opportunity to review this manuscript. In this article the authors present an innovative method to incorporate citizen science photographs of trees to segment and classify ten deciduous tree species from aerial images, using a Convolutional Neural Network. The two-step approach of using simple labels of citizen science data to create masks for a segmentation model is innovative and highly relevant. I think that the paper fits well within the scope of this journal and presents an application of an interesting new approach to remote sensing.
The manuscript as a whole is very well structured. Only the first part of the abstract could be shortened significantly.
Comments
1) The first part of the abstract, which presents an overview of the problem, could be shortened to make it more concise (try to summarise each section of the manuscript in 1-3 sentences).
2) l. 250 Why did you choose EfficientNetV2L over the other tested backbone architectures?
3) l. 261 What percentage of the images were assigned NA? Did this influence the model training?
4) Could you explain the term “replacements” (e.g. l. 240)?
5) Do you think the amount of misclassified data could be a problem for the training of the segmentation model? (l. 297-298)
6) 0.22 cm already seems like a very high resolution. Many remote sensing studies focus on making high-resolution reference data more usable over large areas (e.g. by adapting it to satellite data). You argue for the use of even finer resolution data in the future. What research objectives could be studied using UAV data of such very high resolution? Is there a research gap for very high prediction accuracy over relatively small areas? Could multispectral/hyperspectral sensors be more useful than higher resolution?
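Comment 4 asks what “sampled with replacement” means. In general terms, each draw is taken from the full pool, so the same photograph can be selected more than once. A minimal NumPy sketch of one common use, class-balanced resampling of a training set, is shown below; whether the manuscript applies it per species in exactly this way is an assumption, and `balanced_sample` and `n_per_class` are hypothetical names.

```python
import numpy as np


def balanced_sample(labels, n_per_class, seed=0):
    """Return training indices with n_per_class draws per class.

    Draws use replace=True, so a class with fewer than n_per_class images
    contributes repeated images, while a larger class is subsampled.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picks = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)  # indices of all images of class c
        picks.append(rng.choice(idx, size=n_per_class, replace=True))
    return np.concatenate(picks)


# Hypothetical imbalanced set: 3 photos of species "a", 50 of species "b".
labels = np.array(["a"] * 3 + ["b"] * 50)
idx = balanced_sample(labels, n_per_class=20)
```

Both species now contribute 20 samples each; the 3 photos of species "a" necessarily appear several times. Sampling without replacement would instead fail for species "a", since only 3 distinct photos exist.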

Minor comments
l. 29 Please remove the “and” between “data” and “by”
l. 51 “unleash” might not be the right word; “harness” might be better suited
“provided” might be better than “given”
l. 56-60 This sentence is not completely clear to me. Maybe you can reformulate it to make it easier to read.
l. 63 Please remove “similar”, as it is unnecessary
l. 66 Consider combining sentence “[…] costly, as training data […]”
l. 81 Is the training data limited or just costly/time consuming to generate?
l. 89 “platforms”
l. 90/95 “mil” or “M”; please remove “of”
l. 97 Please remove “The” before “Pl@ntNet”
l. 109 “Ideally, for species mapping applications […]”
l. 115-120 This part might fit better in the Methods section
l. 198 Please remove “Accordingly”
l. 235 “were afterward rasterized”
l. 240-241 What does “sampled with replacement” mean?
l. 317 Please replace “while” with “although”, or similar
l. 337-341 This might fit better in the Discussion section
l. 367 “varying”
l. 373 “partially relatively inaccurate” → This is a little vague. Maybe expand upon it a little.
l. 387-389 Please remove one instance of “plots with more species (two or four)”
l. 393 “higher value” than what?
l. 442 Maybe you can find a better phrasing than “diversity of human behaviour”
l. 457 “often costly”
l. 484 “large” instead of “excessive” (which means unreasonably much)
l. 485 “good transferability”
Figure 2: The text font is very small. It would also be better if the labels match the ones used in the text: “Ortho_July” and “Ortho_September” instead of “Ortho 1” and “Ortho 2”
Figure 4: The text font here is also very small.
Figure 6: The height of the transects seems to differ between plots (e.g. plot 29 and plot 33). If they are all the same (2 m), please show them with the same extents in the figure as well.

Citation: https://doi.org/10.5194/egusphere-2023-2576-RC1
- AC1: 'Response to the first reviewer's comment', Salim Soltani, 28 Mar 2024
  
  Salim Soltani, Remote Sensing Center for Earth System Research
  University of Leipzig, salim.soltani@uni-leipzig.de
  
  Ref. No.: egusphere-2023-2576, “From simple labels to semantic image segmentation: Leveraging citizen science plant photographs for tree species mapping in drone imagery”
  
  Dear reviewer,
  We would like to thank you for your constructive comments that allowed us to improve the quality of the manuscript and for the time that you spent commenting on the manuscript.
  We have addressed the first reviewer's comments. We hope that the revised manuscript addresses all the shortcomings of the earlier version.
  
  Kind regards,
  Salim Soltani
  (on behalf of the co-authors, Olga Ferlian, Nico Eisenhauer, Hannes Feilhauer, Teja Kattenborn)
  
  Citation: https://doi.org/10.5194/egusphere-2023-2576-AC1
RC2:
'Comment on egusphere-2023-2576', Anonymous Referee #2, 04 Apr 2024

I enjoyed reading the manuscript and its rigorous approach to image segmentation, and have no comments beyond those of Reviewer #1.

Citation: https://doi.org/10.5194/egusphere-2023-2576-RC2
- AC2: 'Response to the second reviewer's comment', Salim Soltani, 05 Apr 2024
  
  Dear reviewer,
  We would like to thank you for your positive evaluation of the manuscript.
  We have thoroughly addressed the constructive suggestions of Reviewer #1.
  
  Kind regards,
  Salim Soltani
  (on behalf of the co-authors, Olga Ferlian, Nico Eisenhauer, Hannes Feilhauer, Teja Kattenborn)
  
  Citation: https://doi.org/10.5194/egusphere-2023-2576-AC2


Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Publish subject to minor revisions (review by editor) (08 Apr 2024) by Paul Stoy

AR by Salim Soltani on behalf of the Authors (11 Apr 2024) Author's response Author's tracked changes Manuscript

ED: Publish as is (22 Apr 2024) by Paul Stoy

AR by Salim Soltani on behalf of the Authors (01 May 2024) Manuscript


Viewed

Total article views: 1,022 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
743	222	57	1,022	22	20


Views and downloads (calculated since 05 Dec 2023)

Month	HTML	PDF	XML	Total
Dec 2023	490	140	36	666
Jan 2024	69	14	5	88
Feb 2024	26	15	3	44
Mar 2024	45	15	3	63
Apr 2024	55	16	7	78
May 2024	38	16	3	57
Jun 2024	20	6	0	26

Cumulative views and downloads (calculated since 05 Dec 2023)

Month	HTML	PDF	XML	Total
Dec 2023	490	140	36	666
Jan 2024	559	154	41	754
Feb 2024	585	169	44	798
Mar 2024	630	184	47	861
Apr 2024	685	200	54	939
May 2024	723	216	57	996
Jun 2024	743	222	57	1,022

Viewed (geographical distribution)

Total article views: 992 (including HTML, PDF, and XML). Of these, 992 have a defined geographic origin and 0 are of unknown origin.


Latest update: 14 Jun 2024


Short summary

In this research, we developed a novel method that uses citizen science data as alternative training data for computer vision models to map plant species in images from Unoccupied Aerial Vehicles (UAVs). We used citizen science plant photographs to train models and applied them to UAV images. We tested our approach on UAV images of a test site with ten different tree species, obtaining accurate results for several species. This research shows the potential of citizen science data to advance our ability to monitor plant species.

