An ADCP-Based Data-Driven Framework for Proxy Sediment Transport Monitoring: From Controlled Flumes to Natural Rivers

Tuhin, Mohammd Tanvir Haque; Hinkelmann, Reinhard; Mudersbach, Christoph

doi:10.5194/egusphere-2025-6190

Preprints

https://doi.org/10.5194/egusphere-2025-6190

05 Mar 2026


Abstract. Acoustic Doppler Current Profilers (ADCPs) provide a rich yet underutilized source for monitoring hydrodynamics and sediment transport. Accurate prediction of sediment‐related variables is critical for river engineering, morphological studies, and environmental management. Among these, Bottom-Track Velocity (BT_Vel) serves as a robust proxy for near-bed sediment dynamics and bedload activity. This study develops a machine learning (ML) and deep learning (DL) framework to predict BT_Vel from ADCP-derived hydrodynamic and acoustic features, enabling proxy estimation of sediment transport processes in both controlled flume and natural riverine environments. Two datasets were analyzed: (i) a laboratory dataset of 22,650 ensemble samples obtained under controlled flow regimes, and (ii) a field dataset of 5,900 ensemble samples collected across seven campaigns at a fixed river cross-section. A consistent benchmarking strategy was applied across Random Forest, Gradient Boosting, LightGBM, CatBoost, XGBoost, LSTM, GRU, CNN, RNN, ANN, and a hybrid LSTM+CNN, with evaluation based on both an 80/20 split and a stratified 5-fold cross-validation (CV). SHAP analysis was conducted for model interpretability. In the laboratory, Random Forest (R² = 0.804 split / 0.783 CV) and Gradient Boosting (0.787 / 0.757) achieved the best generalization, while LSTM+CNN (0.770 / 0.730) and LSTM (0.775 / 0.718) remained competitive. In the field, Random Forest again delivered the strongest results (0.573 / 0.603), followed closely by CatBoost, LightGBM, and XGBoost. Notably, LSTM improved under cross-validation (0.468 → 0.529), suggesting fold-wise diversity stabilized training under noisy, heterogeneous river data. By contrast, the Stacking Regressor consistently showed the weakest generalization across both environments. 
SHAP revealed a shift in feature relevance: in the laboratory, mean water velocity (Mean_Speed) dominated predictions, while in the field, Depth and signal-to-noise ratio (SNR) emerged as stronger drivers, reflecting the influence of stage variability and acoustic quality. Overall, the study demonstrates that ADCP-derived features, coupled with explainable ML/DL models, provide robust potential for proxy sediment transport modeling. Conversion to absolute transport rates requires paired sediment measurements, while future work should expand field campaigns and explore hybrid physics–data frameworks toward operational forecasting.

Received: 11 Dec 2025 – Discussion started: 05 Mar 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1: 'Comment on egusphere-2025-6190', Slaven Conevski, 04 May 2026
General Comments:

The paper presents a comprehensive attempt to utilize Machine Learning (ML) to estimate bedload variables using ADCP data. The core premise—predicting bedload solely from ADCP-derived inputs—is practically relevant. However, the current workflow and conceptual focus require significant refinement. While the literature review is strong, the "black box" nature of the current approach and the omission of established data-cleaning protocols limit the study's validity. I recommend a Major Revision (or Resubmit) based on the following concerns:
Major Critical Points:
Circular Logic & Model Validity: The authors use apparent bedload velocity (va) as the target variable while drawing the input variables from the same instrument. The manuscript claims these models will perform equally well if the target variables change, but provides no comparative evidence. Without a physical baseline or cross-instrument validation, the model risks being a "black box" with limited transferability to other target sets. Because the va targets and the input feature set come from the same instrument, the major question is: why would someone need these models when the variable is already available?

Missing Quality Control: The filtering protocols established in Conevski et al. (2019, 2020 - JHR) for va are notably absent. Given that ADCP data is prone to noise, bypassing these standard filtering steps undermines the reliability of the training data.

Missing Backscatter (BS): The study does not account for riverbed backscatter strength (BS). As BS is a primary indicator of sediment concentration and bed properties, its exclusion from a bedload estimation model is a significant gap.

Missing Interpretability Analysis: A more detailed interpretability analysis is needed.

Suggestions for Improvement:
Refocus the Scope/Novelty: The study would be more impactful if it shifted focus to the relationship between Bottom Track (BT) variables (va, BS) and water profiling data (the rest of the input features). This would allow for a detailed analysis of how local flow hydraulics relate to bedload transport, a topic that is currently under-researched.

Sensitivity to Filtering: Use va at different stages of processing as target variables (e.g., va1 with only direction filtering vs. va2 with the full 4-step protocol), and run either all the ML models or only the best ones. This would demonstrate how ML models handle raw vs. refined acoustic data.

Baseline Comparison: Include simple linear regression models as a baseline. ML complexity must be justified by showing significant performance gains over traditional statistical methods.

Enhanced Interpretability: Expand the ML analysis beyond global SHAP values. The authors should include Accumulated Local Effect (ALE) plots, Permutation Importance, and local SHAP explanations. Additionally, incorporating uncertainty analysis (e.g., prediction intervals) is essential for any model intended for field application.

Feature-Set Tests: Test different feature sets, comparing features that have physical meaning against features related to ADCP settings (bin size, blank distance, error velocity, etc.).
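As an illustration of the permutation importance mentioned in the suggestions above, the following is a minimal sketch, not the reviewer's or the authors' implementation. It shuffles one feature column at a time and records the mean increase in mean squared error. The names `permutation_importance`, `toy_predict`, and the synthetic data are hypothetical and chosen only for this example.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - base)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    """Hypothetical stand-in model: depends on feature 0 only."""
    return 2.0 * row[0]


# Synthetic data (not from the study): two features, target driven by the first.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(50)]
y = [toy_predict(row) for row in X]

imp = permutation_importance(toy_predict, X, y)
print(imp)  # feature 0 has a positive importance, feature 1 has none
```

A feature the model ignores gets zero importance because shuffling it leaves every prediction unchanged, which is the behaviour one would look for when comparing physically meaningful features against instrument-setting features.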
Citation: https://doi.org/10.5194/egusphere-2025-6190-RC1
- AC1: 'Reply on RC1', Mohammd Tanvir Haque Tuhin, 31 May 2026
  
  We thank Dr. Conevski for the detailed and constructive review. His comments engaged closely with both the conceptual framing and the technical workflow of our study, and they helped us identify several important areas for improvement.
  In response, we restructured the benchmarking framework, revised the analysis scripts, introduced a staged quality-control protocol, reprocessed both datasets, and expanded the interpretability and uncertainty analyses. Our point-by-point response to the major critical points and suggestions is provided in the attached document.
  On behalf of all co-authors,
  
  Mohammd Tanvir Haque Tuhin
  
  Citation: https://doi.org/10.5194/egusphere-2025-6190-AC1
RC2: 'Comment on egusphere-2025-6190', Anonymous Referee #2, 24 May 2026

Comments - Questions (to be clarified in the text):
1. Abstract: The acronyms LSTM, GRU, CNN, RNN, ANN, SHAP should be explained (in parentheses).
2. Figures 1, 2, 3, 4: They should be cited in the text.
3. Line 179: What is SNR? It should be explained where it first appears.
4. Lines 187-188: What is the difference between "mean water velocity" and "depth-averaged velocity"?
5. Line 227: What exactly is the "5-fold stratified cross-validation"?
6. Line 232: (a) R² is the determination coefficient (please, write it in parentheses!). (b) What does MSE mean? Mean Squared Error?
7. Line 300: Linear regression between which variables? Between "Bottom Track velocity" and which predictors?
8. Lines 310-311: I suppose that r is the correlation coefficient.
9. Line 287: What exactly is SHAP? More detail is needed.
10. General remark: It should be explicitly reported which variables are the dependent variables in the ML methods (e.g., Bottom-Track velocity, mean velocity, depth). Similarly, it should be explicitly reported which variables are the independent variables (predictors), e.g., bin distance, correlation, SNR.
11. Future research: Future research could include the quantification of sediment transport as a function of variables such as bottom track velocity, mean velocity, and depth, as well as measurements of sediment transport.
12. See annotated manuscript for "editorial" errors!

Citation: https://doi.org/10.5194/egusphere-2025-6190-RC2
- AC2: 'Reply on RC2', Mohammd Tanvir Haque Tuhin, 01 Jun 2026
  We sincerely thank the reviewer for the detailed and constructive review. We have carefully considered all comments, and the revisions have helped improve the clarity, precision, and overall quality of the manuscript.
  On behalf of all co-authors,
  
  Mohammd Tanvir Haque Tuhin
  
  Comments - Questions:
  Abstract: The acronyms LSTM, GRU, CNN, RNN, ANN, SHAP should be explained (in parentheses).
  
  We agree. All acronyms have now been expanded.
  Revised Text (Abstract, lines 19–20):
  ……Random Forest, Gradient Boosting, LightGBM, CatBoost, XGBoost, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Artificial Neural Network (ANN), and a hybrid LSTM+CNN, with evaluation based on both an 80/20 split and a stratified 5-fold cross-validation (CV). SHapley Additive exPlanations (SHAP) analysis was conducted for model interpretability…
  Figures 1, 2, 3, 4: They should be cited in the text.
  
  All four figures are now explicitly cited at the appropriate places.
  Revised Text:
  Section 2.1.1
  …a 4 cm high block was installed at the downstream end of the sand bed (Fig. 1).
  Section 2.1.1
  …At 8 m downstream from the flume inlet, a SonTek RS5 ADCP (3 MHz broadband pulse-coherent) was installed in a stationary position, near the midpoint of the flume, to provide continuous flow profiling (Fig. 2a). The RS5 ADCP employs a four-beam Janus configuration (25° beam tilt, 3° beam width), supplemented by a vertical depth beam (Fig. 2b)…
  Section 2.1.2
  …A remote-controlled rQPOD platform equipped with a SonTek RS5 ADCP was used to acquire the data (Fig. 3)…
  Section 2.3
  The overall workflow is summarized in Fig. 4, covering the main steps from laboratory and field data collection to preprocessing, feature selection, ML/DL modelling, validation, and feature-importance analysis.
  Line 179: What is SNR? It should be explained where it first appears.
  
  Corrected as suggested.
  Revised Text:
  ...matrix-type features (such as velocity standard deviation and Signal-to-Noise Ratio (SNR)) were averaged across bins to produce one value per ensemble...
  Lines 187-188: What is the difference between "mean water velocity" and "depth-averaged velocity"?
  
  Thank you for pointing out this ambiguity. The two terms were not intended to represent separate variables.
  Revised Text:
  Mean water velocity (Mean_Speed), defined as the velocity magnitude averaged over the valid ADCP water-velocity bins for each ensemble, was used as a bulk-flow indicator related to potential sediment transport capacity.
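The definition above can be sketched for a single ensemble as follows. This is a minimal illustration, assuming that each bin provides east and north velocity components and that invalid bins are flagged as `None` or NaN; the function name `mean_speed` and the component layout are assumptions and are not taken from the manuscript.

```python
import math


def mean_speed(east, north):
    """Average the horizontal speed over the valid bins of one ensemble.

    Returns None when the ensemble has no valid bin.
    """
    speeds = [
        math.hypot(e, n)
        for e, n in zip(east, north)
        if e is not None and n is not None
        and not (math.isnan(e) or math.isnan(n))
    ]
    return sum(speeds) / len(speeds) if speeds else None


# Hypothetical ensemble with three bins; the third bin is invalid.
east = [0.3, 0.6, float("nan")]
north = [0.4, 0.8, 0.1]
print(mean_speed(east, north))
```

Note that this averages the per-bin magnitudes, as the revised text describes, rather than taking the magnitude of the bin-averaged velocity vector; the two differ when flow direction varies with depth.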
  Line 227: What exactly is the "5-fold stratified cross-validation"?
  
  We have added an explicit definition.
  Revised Text:
  All models were trained and evaluated using both a traditional train–test split (80/20) and 5-fold stratified cross-validation to assess generalization. For regression stratification, the continuous target variable (BT_Vel) was first discretized into 10 quantile-based bins, and the folds were then generated to retain a similar distribution of low, medium, and high BT_Vel values in each fold.
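The stratification described above can be sketched in plain Python. This is one possible reading of the procedure, not the authors' script: the target is ranked into 10 equal-frequency bins, and the members of each bin are dealt round-robin into 5 folds. The helper names and the synthetic `bt_vel` values are hypothetical.

```python
import random


def quantile_bin_labels(y, n_bins=10):
    """Assign each target value to one of n_bins equal-frequency bins by rank."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [0] * len(y)
    for rank, idx in enumerate(order):
        labels[idx] = min(rank * n_bins // len(y), n_bins - 1)
    return labels


def stratified_folds(y, n_splits=5, n_bins=10, seed=0):
    """Return n_splits lists of test indices, stratified on the binned target."""
    labels = quantile_bin_labels(y, n_bins)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for b in range(n_bins):
        members = [i for i, lab in enumerate(labels) if lab == b]
        rng.shuffle(members)
        # Deal the members of this bin round-robin across the folds.
        for k, idx in enumerate(members):
            folds[k % n_splits].append(idx)
    return folds


# Synthetic, skewed stand-in for BT_Vel (not data from the study).
data_rng = random.Random(42)
bt_vel = [data_rng.lognormvariate(-3.0, 0.8) for _ in range(1000)]

labels = quantile_bin_labels(bt_vel)
folds = stratified_folds(bt_vel)
for k, test_idx in enumerate(folds):
    counts = [sum(1 for i in test_idx if labels[i] == b) for b in range(10)]
    print(f"fold {k}: n={len(test_idx)}, per-bin counts={counts}")
```

Each fold serves once as the test set while the remaining four form the training set. Because every fold receives the same share of each quantile bin, low, medium, and high target values are represented similarly in all folds.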
  Line 232: (a) R² is the determination coefficient (please, write it in parentheses!). (b) What does MSE mean? Mean Squared Error?
  
  Corrected as suggested.
  Revised Text:
  Performance was summarized using the coefficient of determination (R²), Mean Squared Error (MSE, in (m s⁻¹)²), and Mean Absolute Error (MAE, in m s⁻¹).
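For reference, the three metrics named above can be computed as in the sketch below. It uses the standard definitions (R² as one minus the ratio of residual to total sum of squares); the function name `regression_metrics` and the example values are hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Return R², MSE, and MAE for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,  # dimensionless
        "mse": ss_res / n,            # squared target units, (m/s)^2 here
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,  # m/s
    }


# Hypothetical observed and predicted velocities in m/s.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

R² can become negative when a model predicts worse than the mean of the observations, which is worth keeping in mind when reading weak field-data scores.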
  Line 300: Linear regression between which variables? Between "Bottom Track velocity" and which predictors?
  
  Thank you for pointing this out. We have revised this section and clarified the variables used in the linear baseline analysis. In the revised manuscript, three linear regression models were added to the results, and we now explicitly state that BT_Vel was the dependent variable, while the ADCP-derived hydraulic and acoustic variables were used as predictors.
  Revised Text:
  To provide transparent statistical baselines using the available variables, we tested three linear regression approaches: Ordinary Least Squares (OLS), Ridge regression, and Lasso regression. In all cases, BT_Vel was the dependent variable, while the ADCP-derived hydraulic and acoustic variables were used as predictors, including Mean_Speed, Depth, SNR, Correlation, Bin_Distance, Vel_StdDev, and Vel_Expected_StdDev. The poor performance of the linear baselines showed that BT_Vel could not be adequately represented by linear combinations of these predictors alone.
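A linear baseline of this kind can be sketched with the normal equations. The example below covers OLS and Ridge only; Lasso has no closed-form solution and would need an iterative solver, so it is left out. The code is an illustration under stated assumptions, not the authors' implementation: `solve`, `fit_linear`, and the two-predictor toy data are hypothetical, and the intercept is left unpenalized.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(X, y, alpha=0.0):
    """Least squares with an optional L2 penalty (alpha=0 gives OLS).

    Returns [intercept, coef_1, ..., coef_p].
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    for i in range(1, p):  # skip index 0 so the intercept is not penalized
        xtx[i][i] += alpha
    return solve(xtx, xty)


# Toy data generated from y = 1 + 2*x1 - 3*x2 (not data from the study).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]

ols = fit_linear(X, y)
ridge = fit_linear(X, y, alpha=10.0)
print("OLS:  ", ols)
print("Ridge:", ridge)
```

On exactly linear data OLS recovers the generating coefficients, while Ridge shrinks them toward zero. On real ADCP data the predictors would normally be standardized first so that the penalty treats them comparably.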
  Lines 310-311: I suppose that r is the correlation coefficient.
  
  Yes. We have clarified this.
  Revised text:
  ...The target variable, Bottom-Track Velocity (BT_Vel), displays a moderate association with Mean_Speed (Pearson correlation coefficient r ≈ 0.43) and Bin Distance (r ≈ 0.26)...
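The Pearson correlation coefficient quoted above follows the standard definition: the covariance of the two variables divided by the product of their standard deviations. A minimal sketch, with a hypothetical function name and made-up values:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up paired values for illustration only.
r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])
print(round(r, 3))
```

Since r measures only linear association, a moderate value such as r ≈ 0.43 does not rule out a stronger nonlinear dependence, which is consistent with tree-based models outperforming the linear baselines.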
  Line 287: What exactly is SHAP? More detail is needed.
  
  Thank you for this clarification request. We have expanded the explanation of SHAP in the revised manuscript.
  Revised text:
  To enhance interpretability, feature contributions were analyzed using SHAP (SHapley Additive exPlanations), an explainable-machine-learning method based on Shapley values. SHAP assigns each predictor a contribution value, indicating how much that predictor increases or decreases an individual model prediction relative to the model baseline. In this study, mean absolute SHAP values were used to assess the global importance of the ADCP-derived predictors for BT_Vel prediction.
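To make the idea concrete, the sketch below computes exact Shapley values by enumerating all feature coalitions for a small toy model, then averages their absolute values to obtain a global importance. It is a simplified illustration and not the SHAP library's algorithm: absent features are replaced by a single baseline (the column means), whereas SHAP averages over background data and uses faster, model-specific estimators. The toy model, its coefficients, and the sample values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of one prediction relative to a baseline input."""
    n = len(x)

    def value(coalition):
        # Features outside the coalition are set to their baseline value.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical model with one interaction term (not the study's model)."""
    mean_speed, depth, snr = z
    return 0.5 * mean_speed + 0.1 * depth + 0.02 * mean_speed * snr


names = ["Mean_Speed", "Depth", "SNR"]
samples = [[0.4, 1.2, 30.0], [0.8, 1.0, 25.0], [0.6, 1.5, 35.0], [1.0, 0.9, 20.0]]
baseline = [sum(col) / len(col) for col in zip(*samples)]

all_phi = [shapley_values(toy_model, x, baseline) for x in samples]
global_importance = {
    name: sum(abs(phi[j]) for phi in all_phi) / len(all_phi)
    for j, name in enumerate(names)
}
print(global_importance)
```

For every sample the contributions sum to the difference between the prediction and the baseline prediction, which is the additivity property that gives SHAP its name. Exhaustive enumeration scales exponentially with the number of features, so it is practical only for small feature sets such as the seven predictors used here.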
  General remark: It should be explicitly reported which variables are the dependent variables in the ML methods (e.g., Bottom-Track velocity, mean velocity, depth). Similarly, it should be explicitly reported which variables are the independent variables (predictors), e.g., bin distance, correlation, SNR.
  
  Thank you for this helpful suggestion. We have revised Section 2.2.1 to state explicitly that BT_Vel was used as the dependent variable/target, while the ADCP-derived hydraulic and acoustic variables were used as independent predictors.
  Revised text:
  In both datasets, Bottom-Track Velocity (BT_Vel) was used as the dependent variable/target, while the ADCP-derived hydraulic and acoustic variables were used as independent predictors. The analysis focused on seven predictors that characterize the flow’s intensity, variability, and acoustic signal properties: Mean_Speed, Depth, Vel_StdDev, SNR, Bin_Distance, Vel_Expected_StdDev, and Correlation.
  Future research: Future research could include the quantification of sediment transport as a function of variables such as bottom track velocity, mean velocity, and depth, as well as measurements of sediment transport.
  
  Thank you for this helpful suggestion. We have revised the Future Outlook section.
  Revised text:
  Direct measurements of SSC and bedload transport would enhance model calibration and validation. Future work should use such measurements to quantify sediment transport rates as functions of ADCP-derived hydraulic and acoustic predictors, including BT_Vel, Mean_Speed, Depth, SNR, and related variables.
  See annotated manuscript for "editorial" errors!
  
  Thanks for these editorial corrections. We have corrected the indicated editorial issues.
  
  Citation: https://doi.org/10.5194/egusphere-2025-6190-AC2

Supplement

https://doi.org/10.5194/egusphere-2025-6190-supplement

Short summary

This study tests how Acoustic Doppler Current Profiler (ADCP) data can support proxy sediment-transport monitoring without labour-intensive sediment sampling. Using data from a flume and a natural river, we train and compare several machine-learning models to predict a near-bed velocity signal. The results show which ADCP features and model types work best as practical indicators of bed activity.