Technical Note: Monitoring discharge of mountain streams by retrieving image features with deep learning
Abstract. Traditional discharge monitoring usually relies on measuring flow velocity and cross-section area with various velocimeters or remote-sensing approaches. However, the topography of mountain streams in remote sites largely hinders the applicability of velocity-area methods. We here present a method to continuously monitor mountain stream discharge using a low-cost commercial camera and a deep learning algorithm. A procedure of automated image categorization and discharge classification was developed to extract information on flow patterns and volumes from high-frequency red–green–blue (RGB) images with deep convolutional neural networks (CNNs). The method was tested at a small, steep, natural stream reach in southern China. Reference discharge data were acquired from a V-shaped weir and ultrasonic flowmeter installed a few meters downstream of the camera system. Results show that the discharge-relevant stream features implicitly embedded in RGB information can be effectively recognized and retrieved by a CNN to achieve satisfactory accuracy in discharge measurement. Coupling CNN and traditional machine learning models (e.g., support vector machine and random forest) can potentially synthesize individual models’ diverse merits and improve generalization performance. In addition, proper image pre-processing and categorization are critical for enhancing the robustness and applicability of the method under environmental disturbances (e.g., weather and vegetation on river banks). Our study highlights the usefulness of deep learning in analyzing complex flow images and tracking flow changes over time, which provides a reliable and flexible alternative apparatus for continuous discharge monitoring of rocky mountain streams.
Chenqi Fang et al.
Status: open (until 08 Jul 2023)
- RC1: 'Comment on egusphere-2023-659', Anonymous Referee #1, 26 May 2023
To address the difficulty of continuously monitoring discharge in mountain streams with traditional methods, the authors propose a low-cost discharge measurement method for mountain streams based on video images and deep learning algorithms. The study still has several limitations, and a major revision is needed.
（1）Please check the copyright of the image in Figure 2. It seems that you directly used an image of LeNet as part of your neural network.
（2）Please explain the practicality of your method. You chose to focus on monitoring discharge in the range of 0.014 to 0.050 m³/s. For discharge monitoring at a larger scale, can your sample size meet practical needs? The discharge obtained from the empirical formula serves as the ground truth, but it may differ from the actual discharge and may follow a different distribution; how do you evaluate the impact of this difference on the stability of your algorithm? How do you assess whether a model trained on this sample size generalizes sufficiently and avoids overfitting to the data you collected?
（3）As you have employed an end-to-end approach, could you explain the reasons behind your choice of pre-processing? How does the inclusion or exclusion of these measures impact your results? It is worth considering that the chosen pre-processing methods might inadvertently remove essential features. In comparison to standard processing operations, what advantages does your approach offer?
（4）How is your loss function defined? Is it only the negative log-likelihood loss (NLLLoss)? Please provide details of your training algorithm. Why did you choose AlexNet? How does AlexNet differ from neural networks proposed in recent years?
（5）The authors used CNN+SVM. Why did you design such a combination? Specifically, when choosing the output of a specific CNN layer as the input to the SVM, why did you select that particular layer? If the intention was to leverage high-level features extracted by the CNN, why not directly use structures such as autoencoders to compress the input into low-dimensional latent variables and then apply the SVM to these latent variables? Are you assuming that the SVM can learn a better mapping than the neural network? How does the limitation on the SVM's input dimensionality affect performance? SVMs are sensitive to hyperparameters. How did you select the SVM hyperparameters, and how did you optimize them?
（6）All abbreviations should be defined when they first appear in the article; please correct this throughout the manuscript.
（7）The authors mention in lines 124-125 that "real-time discharge is calculated at a time step of two minutes", but it is unclear how these values are matched with video images that have a temporal resolution of 5 minutes.
（8）The authors designed an automated categorization procedure to screen the raw images and exclude the "Raindrops" and "Dark" samples from model training. Lines 151-152 define "Good quality" as image samples without obvious noise or shadow, but how does the algorithm make this determination?
（9）In Figure 2, the image representing the "Dark" category still seems to have relatively clear visibility. Can the authors' model effectively cover the 7:00-19:00 interval if such images are also directly removed? After removing the "Raindrops" and "Dark" samples, how many valid images remain?
（11）The author built the model based on the idea of image classification, whereas discharge is a continuous variable more suitable for regression problems in deep learning. Therefore, the author's choice of the classification model needs to be justified.
（12）During the process of constructing the model, the author needs to explain how the loss function was set and how the loss changed during training on both the training and validation sets.
（13）In lines 135-136, the authors mention that "7,757 image samples labeled with 37 discharge values between 0.014 and 0.050 m³/s were collected for model testing". However, line 311 states that "100 stream images corresponding to each discharge volume for model training and validation" are used. Why were only 100 stream images used for model training and validation?
（14）The content of Section 3.1 in the results was not mentioned beforehand; it should be reflected in the methods section.
（15）Coordinate labels in Figure 4 (b-2) are partly obscured.
（16）Figure 5 is more methodologically oriented, and it should be included in the methodology section rather than the results section.
（17）What is the significance of the author's additional classification of low shadow images, medium shadow images, and water reflection images? The author only mentioned the distribution of images in a ratio of 7:1:1:1 in lines 312-314.
（18）The authors have classified the video images; in the results section, the discharge recognition performance for the "Good quality", "Below shadow", "Middle shadow", and "Water reflection" categories should be discussed individually.
（20）Although the evaluation metrics used in the study are widely applicable, I suggest that the authors provide a brief introduction to the Spearman correlation coefficient, accuracy, RMSE and other metrics mentioned in the methodology section.
（21）When the data samples are imbalanced, model performance cannot be comprehensively evaluated by the accuracy metric alone. It is recommended that the authors supplement the results section with comprehensive metrics such as F1 scores, which will help to improve the reliability of the research conclusions.
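The concern about accuracy under class imbalance can be illustrated with a minimal Python sketch. The two discharge classes, the sample counts, and the helper names `accuracy` and `macro_f1` are hypothetical choices for illustration and are not taken from the manuscript. The sketch compares plain accuracy with the macro-averaged F1 score for a classifier that always predicts the majority class.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores over all classes seen."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); guard against division by zero
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels (discharge classes in m^3/s), NOT data from the manuscript:
# 90 samples of one class, 10 of another, and a classifier that always
# predicts the majority class.
y_true = [0.020] * 90 + [0.050] * 10
y_pred = [0.020] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this constructed case the accuracy is 0.90 although the minority class is never recognized, while the macro F1 is about 0.47, which is why a class-averaged metric gives a more complete picture when the number of images per discharge class differs.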