Technical Note: Can Visual Gauges Trained on Biased Contact-based Gauge Data Accurately Estimate River Stage?
Abstract. Water stage variations significantly influence biochemical and hydrological processes within river networks. River cameras, with their ease of deployment and low cost, have emerged as a promising tool for water stage estimation, enabling efficient water stage interpretation from images via deep learning (DL). However, a critical challenge is the requirement for accurate water stage data for DL training; such data often carry biases caused by sedimentation, floating debris, or water flow impacts associated with contact-based gauge observations. Previous studies have overlooked the influence of gauge data errors in real-world applications. This study introduces an imaging-based water stage estimation framework that addresses hidden errors in the gauge station measurements used for training DL models. The framework adopts a multi-task learning paradigm, using erroneous gauge stage data as labels and incorporating water pixel ratios automatically extracted from images to constrain the ranking of model estimates. A thresholding method based on training loss then filters error-free data to retrain an unbiased model. The framework is tested on images and bubble-gauge stage data from the Minturn River, Greenland, spanning 2019 to 2021. The results show that the framework successfully identified a gauge offset event on July 29, 2021, and mitigated an average water stage observation error of approximately 0.6 m thereafter. Moreover, the trained DL model revealed water stage fluctuations under low-flow conditions that the gauge observations could not capture. This study implies that integrating contact and non-contact observations is a robust approach to river stage measurement.
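To make the abstract's training objective concrete for the discussion below, the combined loss could be sketched roughly as follows: a regression term against the (possibly biased) gauge labels, plus a pairwise ranking penalty that requires predicted stages to be ordered consistently with the image-derived water pixel ratios. This is a minimal illustration under my reading of the abstract, not the authors' implementation; all names and the exact form of the ranking term are hypothetical.

```python
import numpy as np

def multitask_loss(pred_stage, gauge_stage, water_pixel_ratio, lam=0.5, margin=0.0):
    """Illustrative combined loss (not the authors' code): MSE against gauge
    labels plus a hinge-style pairwise ranking penalty driven by the water
    pixel ratio extracted from each image."""
    pred = np.asarray(pred_stage, dtype=float)
    gauge = np.asarray(gauge_stage, dtype=float)
    ratio = np.asarray(water_pixel_ratio, dtype=float)

    # Regression term against the (possibly biased) gauge observations.
    mse = np.mean((pred - gauge) ** 2)

    # Ranking term: if image i shows a larger water pixel ratio than image j,
    # the predicted stage for i should not fall below that for j.
    rank_pen, n_pairs = 0.0, 0
    for i in range(len(pred)):
        for j in range(len(pred)):
            if ratio[i] > ratio[j]:
                rank_pen += max(0.0, margin - (pred[i] - pred[j]))
                n_pairs += 1
    if n_pairs:
        rank_pen /= n_pairs

    # lam is the weighting hyperparameter discussed later in this review.
    return mse + lam * rank_pen
```

Note that `lam` is exactly the lambda weighting I question below: its value trades the two objectives off against each other and must be tuned per site.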
The authors present a technical note introducing an AI-based approach for river stage measurement using camera imagery, leveraging a multi-task learning framework. The core idea is to learn stage information directly from images while incorporating relative stage information from an image segmentation task as an auxiliary loss. The multi-task learning approach is interesting, as it offers a potentially more robust way to train the neural network, and the effort to automatically filter noisy stage reference data is noted. However, while the study explores a novel application of multi-task learning, I think several fundamental and methodological issues render the manuscript unsuitable for publication in HESS in its current form.
The most critical concern lies in the overall suitability and motivation of an approach that learns the absolute stage directly from images. The reliance on on-site gauge data for training at every new location significantly limits its utility, particularly for ungauged catchments, which are the primary target for innovative remote sensing techniques. As gauged catchments already possess well-established, high-accuracy stage measurement methods, the practical added value of this camera-based approach for such sites is questionable. Moreover, there are already studies discussing the potential and limits of directly learning the stage from images, which are not mentioned in this study (e.g., Vanden Boomen et al., 2021). Furthermore, there is a high risk that the approach is sensitive to any movement (of the internal or external geometry) of the camera setup. Such movements would likely necessitate a complete re-training of the model, which is a significant practical limitation and is not adequately addressed in the current work. Finally, the authors' premise that obtaining accurate stage data is a critical challenge for all DL-based camera gauges is debatable. For approaches relying on photogrammetry, the stage data serve only as a reference, not as the primary input to the AI model, thereby mitigating this "critical challenge." A stronger, more refined motivation for this specific DL-only approach is needed.
The paper utilizes pixel information from segmented images to provide relative stage information but lacks sufficient discussion of the segmentation process itself. This is a significant omission, especially since several established studies (e.g., Eltner et al., 2021; Zamboni et al., 2025; Moghimi et al., 2024) already perform this kind of water segmentation for stage measurement, and the potential for segmentation errors and their influence on the multi-task learning is not discussed at all. Furthermore, the literature review omits relevant, state-of-the-art photogrammetric approaches that use water segmentation (e.g., Blanch et al., 2025). Given that the study site appears highly suitable for these methods, a direct comparison and a justification for choosing the DL-only approach are necessary. In addition, the achieved accuracy, which appears to be in the decimeter (dm) range, is not competitive with the centimeter (cm) accuracy demonstrated by other camera gauge studies, particularly those using robust photogrammetric methods (e.g., Eltner et al., 2021; Erfani et al., 2023; Blanch et al., 2025). The title of the manuscript is therefore also misleading, as accuracies in this range cannot, in my view, be described as accurate. Finally, the approach combines two loss functions, which necessitates fine-tuning of the lambda weighting value. This introduces a hyperparameter that must be tuned manually, complicating the model's reliability and generality.
The suggested automatic detection of gauge errors appears effective only for very strong and obvious errors, and it is unclear why an established statistical approach would not be equally or more effective for this task. The authors apply an automatic post-processing/filtering step to refine the training data, assuming that the error resides in the stage data and not in the camera imagery; this assumption needs stronger justification. The lack of provided code is a serious concern, particularly for a technical note, as it does not comply with the FAIR principles, which are essential for research reproducibility.
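As an illustration of the kind of established statistical approach referred to above, a robust modified z-score filter (Iglewicz and Hoaglin) on the residual between the gauge stage and an image-derived stage proxy would already flag a step offset of the magnitude reported by the authors. This is a minimal sketch under the assumption that such a proxy (e.g., a water pixel ratio rescaled to stage units) is available; all names are hypothetical.

```python
import numpy as np

def flag_gauge_errors(gauge_stage, image_proxy, z_thresh=3.5):
    """Flag suspect gauge readings with a robust modified z-score on the
    residual between the gauge stage and an image-derived stage proxy.
    Returns a boolean mask (True = suspect reading)."""
    resid = np.asarray(gauge_stage, dtype=float) - np.asarray(image_proxy, dtype=float)

    # Median and median absolute deviation (MAD) are robust to the very
    # outliers we are trying to detect, unlike mean and standard deviation.
    med = np.median(resid)
    mad = np.median(np.abs(resid - med))
    if mad == 0:
        return np.zeros(len(resid), dtype=bool)

    # 0.6745 scales the MAD to be consistent with the standard deviation
    # under a normal distribution; 3.5 is the conventional cutoff.
    mod_z = 0.6745 * (resid - med) / mad
    return np.abs(mod_z) > z_thresh
```

For a series where one reading carries a 0.6 m offset, such a filter flags only that reading, without any neural network or loss-based thresholding.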
While the multi-task learning idea is technically interesting, the manuscript does not provide a compelling scientific or practical justification for an approach that learns stage directly from images, given the high site-specificity, the sensitivity to camera movement, and the lower accuracy compared to established methods. The fundamental questions regarding transferability and the need for new methods at gauged catchments remain unanswered. Furthermore, significant methodological detail is missing, and the paper does not adhere to open science principles.
References:
Blanch, X., Grundmann, J., Hedel, R., & Eltner, A. (2025). AI Image-based method for a robust automatic real-time water level monitoring: A long-term application case. https://doi.org/10.5194/egusphere-2025-724
Eltner, A., Bressan, P. O., Akiyama, T., Gonçalves, W. N., & Marcato Junior, J. (2021). Using Deep Learning for Automatic Water Stage Measurements. Water Resources Research, 57(3). https://doi.org/10.1029/2020WR027608
Erfani, S. M. H., Smith, C., Wu, Z., Shamsabadi, E. A., Khatami, F., Downey, A. R. J., Imran, J., & Goharian, E. (2023). Eye of Horus: A vision-based framework for real-time water level measurement. Hydrology and Earth System Sciences, 27(22), 4135–4149. https://doi.org/10.5194/hess-27-4135-2023
Moghimi, A., Welzel, M., Celik, T., & Schlurmann, T. (2024). A comparative performance analysis of popular deep learning models and Segment Anything Model (SAM) for river water segmentation in close-range remote sensing imagery. IEEE Access, 12, 52067–52085. https://doi.org/10.1109/ACCESS.2024.3385425
Vanden Boomen, R. L., Yu, Z., & Liao, Q. (2021). Application of Deep Learning for Imaging-Based Stream Gaging. Water Resources Research, 57(11). https://doi.org/10.1029/2021WR029980
Zamboni, P. A. P., Blanch, X., Marcato Junior, J., Gonçalves, W. N., & Eltner, A. (2025). Do we need to label large datasets for river water segmentation? Benchmark and stage estimation with minimum to non-labeled image time series. International Journal of Remote Sensing, 46(7), 2719–2747. https://doi.org/10.1080/01431161.2025.2457131