the Creative Commons Attribution 4.0 License.
Automatic detection of instream large wood in videos using deep learning
Abstract. Instream large wood (i.e., downed trees, branches and roots larger than 1 m in length and 10 cm in diameter) has essential geomorphological and ecological functions supporting the health of river ecosystems. Still, even though its transport during floods may pose a risk, it is rarely observed and, therefore, poorly understood. This paper presents a novel approach to detect pieces of instream wood from video. The approach uses a Convolutional Neural Network to detect wood automatically. We sampled data to represent different wood transport conditions, combining 20 datasets to yield thousands of instream wood images. We designed multiple scenarios using different data subsets with and without data augmentation and analyzed the contribution of each one to the effectiveness of the model using k-fold cross-validation. The mean average precision of the model varies between 35 and 93 percent and is highly influenced by the quality of the data on which it is applied. When the image resolution is low, the labeled pieces appear more akin to amorphous masses or 'blobs' than to objects with distinct characteristics such as bark or branches. We found that the model detects wood with a mean average precision of 67 percent when using a 418-pixel input image resolution. Improvements of up to 23 percent could be achieved in some instances, and increasing the input resolution raised the weighted mean average precision to 74 percent. We show that the detection performance on a specific dataset is not solely determined by the complexity of the network or the training data. Therefore, the findings of this paper can be used when designing a custom wood detection network. With the growing availability of flood-related videos featuring wood uploaded to the internet, this methodology facilitates the quantification of wood transport across a wide variety of data sources.
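The scenario analysis described in the abstract relies on k-fold cross-validation over the assembled datasets. As a minimal sketch of how such splits can be generated (function and variable names are illustrative and not taken from the authors' codebase):

```python
import random

def k_fold_splits(items, k=5, seed=0):
    """Shuffle indices once, then yield (train, validation) index lists
    for each of the k folds. Every item appears in exactly one
    validation fold across the k iterations."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)  # fixed seed for reproducibility
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val
```

In practice, each fold's validation split would be scored with mean average precision and the k scores averaged per scenario.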
Status: final response (author comments only)
CC1: 'Comment on egusphere-2024-792', Andrés Iroumé, 14 May 2024
This is a very well written and interesting manuscript.
I have a few suggestions intended to complete/improve some aspects.
They are:
Introduction:
- Page 1, L19-20. Natural mortality, wind, snow loads, wildfires and beaver activity can also be recruitment sources.
- Page 1, L20. “Wood plays a crucial role by trapping sediment, creating pools, and generating spatially varying flow patterns” not only as it distributes along the riverbanks, but also when stored within the active or bankfull channel.
- Page 2, L34. The number of observations of instream wood is scarce? I do not fully agree. Perhaps the amount of observations of instream wood dynamics is scarce, so please clarify.
- Page 2, L43, about the best methods to quantify wood transport. Not only video-based methods: installing a GPS tracker in each wood piece is also a very good method, but extremely expensive.
Methods:
- Page 3, L86. Figure 1 does not give an overview of the data collection and processing. It gives an overview of the process to follow to collect and process data. Please also correct the title of Fig. 1 below the figure.
- Page 4, L107 and 115. Figure or figure? Please decide.
Discussion and conclusion:
- I do not find comments related to the limitations of the use of low-cost cameras, and how to avoid these limitations, perhaps by using high-resolution cameras, different installations, or other means. Please discuss and conclude.
Citation: https://doi.org/10.5194/egusphere-2024-792-CC1
AC2: 'Reply on CC1', Janbert Aarnink, 04 Jul 2024
Thank you for your comments and your help in increasing the quality of this manuscript.
The suggestions are well appreciated. We might indeed need to be clearer on wood observations versus wood dynamics observations. We have also had more comments about the figure and will try to make it clearer. Furthermore, we will add a section where we discuss the limitations of low-cost cameras and further elaborate.
Citation: https://doi.org/10.5194/egusphere-2024-792-AC2
RC1: 'Comment on egusphere-2024-792', Diego Panici, 14 Jun 2024
The manuscript is about the automatic detection of instream large wood in video recording using deep learning tools. The results are really intriguing, but I believe that a substantial revision will be needed before considering this paper for publication. Here are some major comments:
First, there is limited to no comparison with other existing models. CNNs are widely used for image recognition (and, indeed, the authors acknowledged YOLO being the most widespread algorithm), yet there is no comparative analysis with other studies or algorithms.
Second, the overall aim and output of this manuscript is really unclear. It is necessary to make this more explicit and emphasise what the study has revealed and what increase in scientific knowledge it has brought. As things stand, it is hard to discern what is the new scientific knowledge that this paper has produced.
Third, the paper structure needs substantial changes. The results and discussion sections, merged together, make it difficult to discern between the actual observations and the authors' analysis. It is essential that the two sections are kept separate. The language used is also not appropriate for a scientific paper: this was mostly informal and colloquial and needs thorough revision.
Fourth, the method was unclear and lacked explanation (at times it was not even easy to understand what cameras have been used, where and how, whilst a schematic would have helped). Overall, this limits the generalisation of the method proposed.
An annotated version is also provided with in-line comments.
-
AC1: 'Reply on RC1', Janbert Aarnink, 04 Jul 2024
Thank you very much for the comments; we appreciate the time taken to review our manuscript and the suggestions that will contribute to a significant increase in the quality of the paper. In the following, we reply to each of your comments. In the coming weeks, we will make all changes accordingly in our revised version.
‘
First, there is limited to no comparison with other existing models. CNNs are widely used for image recognition (and, indeed, the authors acknowledged YOLO being the most widespread algorithm), yet there is no comparative analysis with other studies or algorithms.
‘
Thanks for the comment; we understand the concern and agree that we can elaborate more on the alternative methods from different fields and create a more detailed comparison. The current state of the art is only discussed briefly, and the paper will benefit from explaining more clearly what gaps the proposed method fills. Also, different techniques from other fields (for instance, lidar technology) can be added to the comparison. We will address this in the introduction and in the discussion of the revised manuscript. We will introduce the current state of the art in the field of wood detection, and expand on the use of CNNs in other similar fields, such as fish passage or plastic transport, for example by adding more past and recent works, like:
https://www.sciencedirect.com/science/article/pii/S0169555X24001351
https://www.jads.nl/case/river-plastic-monitoring/
https://www.sciencedirect.com/science/article/pii/S0303243422000083
https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2019EA000960
https://www.sciencedirect.com/science/article/pii/S146419090185005X
https://academic.oup.com/icesjms/article/80/7/1911/7240285
In addition to the aforementioned models, we will add machine learning baselines to justify the use of the YOLO architecture compared to simpler computer vision algorithms:
- A standard convolutional neural network (also informally referred to as a “vanilla CNN”)
- Region-based convolutional neural networks
- Non-ML computer vision baselines, using, e.g., thresholding and edge detection
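To illustrate the last baseline category, a non-ML candidate detector can be sketched by combining intensity thresholding with Sobel edge detection. This is a minimal pure-NumPy sketch; the function names and threshold values are illustrative assumptions, not part of the manuscript or its codebase:

```python
import numpy as np

def sobel_edges(gray):
    """Approximate gradient magnitude with 3x3 Sobel kernels (pure NumPy)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    # Pad by one pixel so the output keeps the input shape.
    p = np.pad(gray.astype(float), 1, mode="edge")
    gx = np.zeros(gray.shape, dtype=float)
    gy = np.zeros(gray.shape, dtype=float)
    for i in range(3):
        for j in range(3):
            patch = p[i:i + gray.shape[0], j:j + gray.shape[1]]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)

def wood_candidate_mask(gray, intensity_thresh=120, edge_thresh=100):
    """Flag pixels that are both bright (floating wood is often lighter
    than the surrounding water) and near a strong edge.
    Thresholds are illustrative, not calibrated."""
    edges = sobel_edges(gray)
    return (gray > intensity_thresh) & (edges > edge_thresh)
```

Such a baseline requires per-site tuning of the thresholds, which is precisely the site-specific calibration a learned detector aims to avoid.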
‘
Second, the overall aim and output of this manuscript is really unclear. It is necessary to make this more explicit and emphasise what the study has revealed and what increase in scientific knowledge it has brought. As things stand, it is hard to discern what is the new scientific knowledge that this paper has produced.
‘
Thanks for raising this issue. We understand that the goal of our study was not clear enough, and we will edit the text to fix this. The goal of our work was to develop an algorithm that is able to detect and track floating wood pieces in any river and under various conditions. Such an algorithm does not currently exist, as the available tools are site-specific and require site-specific calibration. The future applications of our algorithm will also be further explained. The immediate use of our CNN is the computation of wood fluxes to better understand wood dynamics in rivers, but there are also many practical applications, such as warning systems or flood risk estimation. We will expand on this in the introduction and the discussion of the revised manuscript.
‘
Third, the paper structure needs substantial changes. The results and discussion sections, merged together, make it difficult to discern between the actual observations and the authors' analysis. It is essential that the two sections are kept separate. The language used is also not appropriate for a scientific paper: this was mostly informal and colloquial and needs thorough revision.
‘
Thank you for pointing out that this was not clear. We will separate the results and discussion sections and work on the structure for a clearer storyline. We will also go through the paper and revise the language carefully to make it less informal.
‘
Fourth, the method was unclear and lacked explanation (at times it was not even easy to understand what cameras have been used, where and how, whilst a schematic would have helped). Overall, this limits the generalisation of the method proposed.
‘
We understand this concern and see that there were some confusing parts in the methods section. We added the source references for the CNN methods, but we will expand this to provide additional details about the collection of data (e.g., sites, devices), the machine learning methods, the labeling and the processing. The table of the datasets will be updated and elaborated, and we will add a clearer figure to support the section.
‘
Fifth: An annotated version is also provided with in-line comments.
‘
Thanks for the annotations; we will revise the text following each suggestion and will show our changes tracked in the revised manuscript.
Citation: https://doi.org/10.5194/egusphere-2024-792-AC1
RC2: 'Comment on egusphere-2024-792', Chris Tomsett, 19 Jul 2024
This paper investigates a novel concept for monitoring wood in rivers, developing on existing algorithm development of CNNs for image recognition. There are numerous applications and possible impacts for this work, both for research and monitoring. The authors make a good case for the necessity of the research and offer a good grounding in some of the key concepts for a reader who is new to machine learning methods, as well as those more familiar with them.
The writing needs to be improved throughout, as there are numerous occasions of informal writing which feel out of place, such as ‘made a recent come-back’. Likewise, there are spelling mistakes and inconsistencies between American and British English, which I am aware can be a challenge when writing in a non-native language. This can be helped by making sure both the spell-check and dictionary of all text in the document are set to one or the other. Moreover, some of the text struggles to convey the complexity of the methods in places, with repetition followed by missing detail.
The scenario design is clear, however how these scenarios fit in with some of the other analyses being undertaken is less apparent. There seems to be several sections which are additional scenarios/tests throughout the paper which are not clearly explained in the methods. Furthermore, there are numerous scenarios which are outlined in the methods which are not commented on in the results or discussion. These should either be discussed, or possibly removed (placed in supplementary) for the revision. Some of the additional analyses could then be incorporated as scenarios to make it easier to understand for the reader. Moreover, as the methods are quite complex, an improved schematic overview of the workflow would benefit the manuscript.
The results and discussion are currently presented as one. It would be best to separate them in this instance, with a discussion focussing on the reasons why some scenarios performed better, the limitations of the design, and the impact this may have for wood monitoring. This seems to be the largest element missing from this paper. Overall, it has great potential for helping to improve wood monitoring in rivers, but this is only briefly covered in the discussion, despite the overwhelming literature relating to the importance of wood in rivers, the hazards they present, and the different methods currently being used to monitor them. The results themselves are also not covered in full, which is surprising, as the order of magnitude of the results is similar to that of those results that are covered.
The figures and tables throughout are of good standard, and with some adjustment would be suitable for publication. The only major differences would be the inclusion/adjustment of the methodology schematic as the visual elements would help the reader alongside the text, as well as the inclusion of a location map for the dataset origins in Figure 2.
Overall, the paper shows good promise and with some adjustments to the content, bolstering some of the justification, improving the writing standard, and focusing on the relevance of the work, would provide a useful addition to the field.
Further detailed comments provided in the attached.
Model code and software
Codebase for "Automatic Detection of Instream Large Wood in Videos Using Deep Learning" J. Aarnink and T. Beucler https://github.com/janbertoo/Instream_Wood_Detection
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 300 | 103 | 36 | 439 | 15 | 17 |