the Creative Commons Attribution 4.0 License.
An ADCP-Based Data-Driven Framework for Proxy Sediment Transport Monitoring: From Controlled Flumes to Natural Rivers
Abstract. Acoustic Doppler Current Profilers (ADCPs) provide a rich yet underutilized source for monitoring hydrodynamics and sediment transport. Accurate prediction of sediment‐related variables is critical for river engineering, morphological studies, and environmental management. Among these, Bottom-Track Velocity (BT_Vel) serves as a robust proxy for near-bed sediment dynamics and bedload activity. This study develops a machine learning (ML) and deep learning (DL) framework to predict BT_Vel from ADCP-derived hydrodynamic and acoustic features, enabling proxy estimation of sediment transport processes in both controlled flume and natural riverine environments. Two datasets were analyzed: (i) a laboratory dataset of 22,650 ensemble samples obtained under controlled flow regimes, and (ii) a field dataset of 5,900 ensemble samples collected across seven campaigns at a fixed river cross-section. A consistent benchmarking strategy was applied across Random Forest, Gradient Boosting, LightGBM, CatBoost, XGBoost, LSTM, GRU, CNN, RNN, ANN, and a hybrid LSTM+CNN, with evaluation based on both an 80/20 split and a stratified 5-fold cross-validation (CV). SHAP analysis was conducted for model interpretability. In the laboratory, Random Forest (R² = 0.804 split / 0.783 CV) and Gradient Boosting (0.787 / 0.757) achieved the best generalization, while LSTM+CNN (0.770 / 0.730) and LSTM (0.775 / 0.718) remained competitive. In the field, Random Forest again delivered the strongest results (0.573 / 0.603), followed closely by CatBoost, LightGBM, and XGBoost. Notably, LSTM improved under cross-validation (0.468 → 0.529), suggesting fold-wise diversity stabilized training under noisy, heterogeneous river data. By contrast, the Stacking Regressor consistently showed the weakest generalization across both environments. 
SHAP revealed a shift in feature relevance: in the laboratory, mean water velocity (Mean_Speed) dominated predictions, while in the field, Depth and signal-to-noise ratio (SNR) emerged as stronger drivers, reflecting the influence of stage variability and acoustic quality. Overall, the study demonstrates that ADCP-derived features, coupled with explainable ML/DL models, show strong potential for proxy sediment transport modeling. Converting these proxies to absolute transport rates requires paired sediment measurements; future work should expand field campaigns and explore hybrid physics–data frameworks toward operational forecasting.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-6190', Slaven Conevski, 04 May 2026
AC1: 'Reply on RC1', Mohammd Tanvir Haque Tuhin, 31 May 2026
We thank Dr. Conevski for the detailed and constructive review. His comments engaged closely with both the conceptual framing and the technical workflow of our study, and they helped us identify several important areas for improvement.
In response, we restructured the benchmarking framework, revised the analysis scripts, introduced a staged quality-control protocol, reprocessed both datasets, and expanded the interpretability and uncertainty analyses. Our point-by-point response to the major critical points and suggestions is provided in the attached document.
On behalf of all co-authors,
Mohammd Tanvir Haque Tuhin
RC2: 'Comment on egusphere-2025-6190', Anonymous Referee #2, 24 May 2026
Comments - Questions (to be clarified in the text):
1. Abstract: The acronyms LSTM, GRU, CNN, RNN, ANN, SHAP should be explained (in parentheses).
2. Figures 1, 2, 3, 4: They should be cited in the text.
3. Line 179: What is SNR? It should be explained where it first appears.
4. Lines 187-188: What is the difference between "mean water velocity" and "depth-averaged velocity"?
5. Line 227: What exactly is the "5-fold stratified cross-validation"?
6. Line 232: (a) R² is the determination coefficient (please write it in parentheses!). (b) What does MSE mean? Mean Squared Error?
7. Line 300: Linear regression between which variables? Between "Bottom Track velocity" and which predictors?
8. Lines 310-311: I suppose that r is the correlation coefficient.
9. Line 287: What exactly is SHAP? More detail is needed.
10. General remark: The dependent variables of the ML methods (e.g., Bottom-Track velocity, mean velocity, depth) should be reported explicitly. Similarly, the independent variables (predictors) should be reported explicitly (e.g., bin distance, correlation, SNR).
11. Future research: Future research could include quantifying sediment transport as a function of variables such as bottom-track velocity, mean velocity, and depth, together with measurements of sediment transport.
12. See annotated manuscript for "editorial" errors!
AC2: 'Reply on RC2', Mohammd Tanvir Haque Tuhin, 01 Jun 2026
We sincerely thank the reviewer for the detailed and constructive review. We have carefully considered all comments, and the revisions have helped improve the clarity, precision, and overall quality of the manuscript.
On behalf of all co-authors,
Mohammd Tanvir Haque Tuhin
Comments - Questions:
- Abstract: The acronyms LSTM, GRU, CNN, RNN, ANN, SHAP should be explained (in parentheses).
We agree. All acronyms have now been expanded.
Revised text (Abstract, lines 19–20):
……Random Forest, Gradient Boosting, LightGBM, CatBoost, XGBoost, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Artificial Neural Network (ANN), and a hybrid LSTM+CNN, with evaluation based on both an 80/20 split and a stratified 5-fold cross-validation (CV). SHapley Additive exPlanations (SHAP) analysis was conducted for model interpretability…
- Figures 1, 2, 3, 4: They should be cited in the text.
All four figures are now explicitly cited at the appropriate places.
Revised text:
Section 2.1.1
...a 4 cm high block was installed at the downstream end of the sand bed (Fig. 1).
Section 2.1.1
…. At 8 m downstream from the flume inlet, a SonTek RS5 ADCP (3 MHz broadband pulse-coherent) was installed in a stationary position, near the midpoint of the flume, to provide continuous flow profiling (Fig. 2a). The RS5 ADCP employs a four-beam Janus configuration (25° beam tilt, 3° beam width), supplemented by a vertical depth beam (Fig. 2b)…
Section 2.1.2
…A remote-controlled rQPOD platform equipped with a SonTek RS5 ADCP was used to acquire the data (Fig. 3)….
Section 2.3
The overall workflow is summarized in Fig. 4, covering the main steps from laboratory and field data collection to preprocessing, feature selection, ML/DL modelling, validation, and feature-importance analysis.
- Line 179: What is SNR? It should be explained where it first appears.
Corrected as suggested.
Revised text:
...matrix-type features (such as velocity standard deviation and Signal-to-Noise Ratio (SNR)) were averaged across bins to produce one value per ensemble...
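A minimal sketch of this per-ensemble averaging, assuming each matrix-type feature is held as a NumPy array of shape (ensembles × bins) with rejected bins already set to NaN. The array layout, the NaN convention, and the function name `ensemble_mean` are illustrative assumptions, not details of the authors' scripts:

```python
import numpy as np

def ensemble_mean(matrix: np.ndarray) -> np.ndarray:
    """Collapse an (n_ensembles, n_bins) feature matrix to one value per ensemble.

    Invalid bins are assumed to be encoded as NaN and are skipped by nanmean.
    An ensemble with no valid bins would return NaN (with a RuntimeWarning).
    """
    return np.nanmean(matrix, axis=1)

# Hypothetical SNR values (dB) for 3 ensembles x 4 bins; NaN marks a rejected bin.
snr = np.array([
    [30.0, 28.0, 26.0, np.nan],
    [25.0, 25.0, 25.0, 25.0],
    [np.nan, 20.0, 22.0, 24.0],
])
print(ensemble_mean(snr))  # -> [28. 25. 22.]
```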
- Lines 187-188: What is the difference between "mean water velocity" and "depth-averaged velocity"?
Thank you for pointing out this ambiguity. The two terms were not intended to represent separate variables.
Revised text:
Mean water velocity (Mean_Speed), defined as the velocity magnitude averaged over the valid ADCP water-velocity bins for each ensemble, was used as a bulk-flow indicator related to potential sediment transport capacity.
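The definition above can be sketched as follows, assuming (hypothetically) that each ensemble provides horizontal velocity components `u` and `v` per bin in m s⁻¹, with invalid bins set to NaN. The magnitude is taken bin by bin and then averaged over the valid bins; this is not the same as the magnitude of the bin-averaged components when flow direction varies over depth:

```python
import numpy as np

def mean_speed(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Per-ensemble Mean_Speed from (n_ensembles, n_bins) velocity components.

    Assumes u and v are horizontal components in m/s with invalid bins as NaN.
    """
    speed = np.hypot(u, v)            # bin-wise magnitude; NaN propagates
    return np.nanmean(speed, axis=1)  # average over valid bins only

# Two hypothetical ensembles, three bins each (last bin of ensemble 0 invalid).
u = np.array([[0.3, 0.6, np.nan], [0.0, 0.0, 0.0]])
v = np.array([[0.4, 0.8, np.nan], [0.5, 0.5, 0.5]])
print(mean_speed(u, v))  # approximately [0.75, 0.5]
```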
- Line 227: What exactly is the "5-fold stratified cross-validation"?
We have added an explicit definition.
Revised text:
All models were trained and evaluated using both a traditional train–test split (80/20) and 5-fold stratified cross-validation to assess generalization. For regression stratification, the continuous target variable (BT_Vel) was first discretized into 10 quantile-based bins, and the folds were then generated to retain a similar distribution of low, medium, and high BT_Vel values in each fold.
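The fold construction described above can be sketched in NumPy. This is an illustrative reimplementation, not the authors' code: the function names, the round-robin assignment within each quantile bin, and the random seed are assumptions:

```python
import numpy as np

def stratified_regression_folds(y, n_splits=5, n_bins=10, seed=0):
    """Assign each sample to one of n_splits folds, stratified on quantile bins of y.

    y is discretized into n_bins quantile-based bins. Within each bin the samples
    are shuffled and dealt round-robin to the folds, so each fold receives a
    similar share of low, medium and high target values.
    """
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # The n_bins - 1 interior quantiles give bin labels 0 .. n_bins - 1.
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bins = np.digitize(y, edges)
    fold = np.empty(len(y), dtype=int)
    offset = 0
    for b in np.unique(bins):
        idx = rng.permutation(np.flatnonzero(bins == b))
        fold[idx] = (np.arange(len(idx)) + offset) % n_splits
        offset += len(idx)  # carry the round-robin across bins to balance fold sizes
    return fold

def iter_splits(fold, n_splits=5):
    """Yield (train_idx, test_idx) index arrays, one pair per fold."""
    for k in range(n_splits):
        yield np.flatnonzero(fold != k), np.flatnonzero(fold == k)

# Synthetic, right-skewed stand-in for BT_Vel (m/s); not the study's data.
y = np.random.default_rng(1).gamma(2.0, 0.1, size=1000)
fold = stratified_regression_folds(y)
print(np.bincount(fold))  # -> [200 200 200 200 200]
```

With scikit-learn, a comparable split could be obtained by passing the quantile-bin labels (the `bins` array above) as class labels to `StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, bins)`; the manual version simply makes the binning step explicit.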
- Line 232: (a) R² is the determination coefficient (please write it in parentheses!). (b) What does MSE mean? Mean Squared Error?
Corrected as suggested.
Revised text:
Performance was summarized using the coefficient of determination (R²), Mean Squared Error (MSE, in (m s⁻¹)²), and Mean Absolute Error (MAE, in m s⁻¹).
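For reference, the three scores can be computed as below. This NumPy sketch uses the standard definitions (R² = 1 − SS_res/SS_tot, MSE as the mean squared residual, MAE as the mean absolute residual); the example velocities are made up for illustration:

```python
import numpy as np

def regression_scores(y_true, y_pred):
    """Return R², MSE ((m/s)²) and MAE (m/s) for velocity predictions in m/s."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    mse = np.mean(resid ** 2)
    mae = np.mean(np.abs(resid))
    ss_res = np.sum(resid ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {"R2": 1.0 - ss_res / ss_tot, "MSE": mse, "MAE": mae}

scores = regression_scores([0.10, 0.20, 0.30, 0.40], [0.12, 0.18, 0.33, 0.37])
print(scores)  # R2 ≈ 0.948, MSE ≈ 0.00065, MAE ≈ 0.025
```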
- Line 300: Linear regression between which variables? Between "Bottom Track velocity" and which predictors?
Thank you for pointing this out. We have revised this section and clarified the variables used in the linear baseline analysis. In the revised manuscript, three linear regression models were added to the results, and we now explicitly state that BT_Vel was the dependent variable, while the ADCP-derived hydraulic and acoustic variables were used as predictors.
Revised text:
To provide transparent statistical baselines using the available variables, we tested three linear regression approaches: Ordinary Least Squares (OLS), Ridge regression, and Lasso regression. In all cases, BT_Vel was the dependent variable, while the ADCP-derived hydraulic and acoustic variables were used as predictors, including Mean_Speed, Depth, SNR, Correlation, Bin_Distance, Vel_StdDev, and Vel_Expected_StdDev. The poor performance of the linear baselines showed that BT_Vel could not be adequately represented by linear combinations of these predictors alone.
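A minimal NumPy sketch of the OLS and Ridge baselines, using closed-form solutions on standardized predictors. Standardizing before fitting and leaving the intercept unpenalized are assumptions about the setup, and the data are synthetic stand-ins for the seven ADCP predictors. Lasso has no closed form and is usually fitted iteratively (for example with scikit-learn's `Lasso`, which uses coordinate descent), so it is omitted here:

```python
import numpy as np

def fit_linear(X, y, alpha=0.0):
    """Fit y ~ b0 + Z @ w, where Z is X standardized column-wise.

    alpha = 0 gives ordinary least squares; alpha > 0 gives ridge regression.
    The intercept is not penalized. Returns a predict(X_new) function.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0                    # guard against constant columns
    Z = (X - mu) / sd
    b0 = y.mean()                        # exact intercept because Z is centered
    w = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ (y - b0))

    def predict(X_new):
        return b0 + ((np.asarray(X_new, dtype=float) - mu) / sd) @ w
    return predict

# Synthetic, noise-free linear data standing in for the seven predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))
y = 0.3 + X @ np.array([0.5, -0.2, 0.1, 0.0, 0.05, 0.0, 0.02])

ols = fit_linear(X, y)
ridge = fit_linear(X, y, alpha=10.0)
print(np.max(np.abs(ols(X) - y)))    # near zero: OLS recovers an exactly linear target
print(np.max(np.abs(ridge(X) - y)))  # larger: the ridge penalty shrinks the coefficients
```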
- Lines 310-311: I suppose that r is the correlation coefficient.
Yes. We have clarified this.
Revised text:
...The target variable, Bottom-Track Velocity (BT_Vel), displays a moderate association with Mean_Speed (Pearson correlation coefficient r ≈ 0.43) and Bin_Distance (r ≈ 0.26)...
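The Pearson coefficient is the covariance of two variables divided by the product of their standard deviations. A short NumPy version is shown below (`np.corrcoef` gives the same value). As a point of interpretation, r ≈ 0.43 corresponds to r² ≈ 0.18, so a single-predictor linear fit on Mean_Speed would explain only about 18% of the BT_Vel variance:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient r between two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()  # center both samples
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # -> 1.0
print(pearson_r([1, 2, 3], [3, 2, 1]))        # -> -1.0
```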
- Line 287: What exactly is SHAP? More detail is needed.
Thank you for this clarification request. We have expanded the explanation of SHAP in the revised manuscript.
Revised text:
To enhance interpretability, feature contributions were analyzed using SHAP (SHapley Additive exPlanations), an explainable-machine-learning method based on Shapley values. SHAP assigns each predictor a contribution value, indicating how much that predictor increases or decreases an individual model prediction relative to the model baseline. In this study, mean absolute SHAP values were used to assess the global importance of the ADCP-derived predictors for BT_Vel prediction.
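To make the Shapley-value idea concrete, the sketch below computes exact SHAP values by brute-force enumeration of feature coalitions, using an interventional value function (features outside a coalition are filled in from a background sample). It is an illustration only: the function names are hypothetical, and practical tools such as the `shap` package rely on much faster algorithms (e.g. TreeSHAP for tree ensembles) rather than enumeration. The global importance used in the text corresponds to `mean_abs_shap`:

```python
import itertools
import math
import numpy as np

def exact_shap(f, x, background):
    """Exact Shapley values of prediction f(x), one per feature.

    f maps an (n_rows, n_features) array to n_rows predictions. The value of a
    coalition S is the mean prediction over the background rows after fixing
    the features in S to their values in x. Cost grows as 2**n_features.
    """
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    n = x.size

    def value(S):
        data = background.copy()
        if S:
            data[:, list(S)] = x[list(S)]
        return float(np.mean(f(data)))

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for S in itertools.combinations(others, k):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

def mean_abs_shap(f, X, background):
    """Global importance: mean |SHAP value| of each feature over the rows of X."""
    return np.mean([np.abs(exact_shap(f, x, background)) for x in X], axis=0)

# Toy linear "model" with three features. For a linear model the exact
# interventional SHAP value is w_i * (x_i - background mean of feature i).
w = np.array([2.0, -1.0, 0.5])

def model(data):
    return data @ w + 0.1

rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))
x = np.array([1.0, 2.0, -1.0])
print(exact_shap(model, x, background))
print(w * (x - background.mean(axis=0)))  # same values
```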
- General remark: The dependent variables of the ML methods (e.g., Bottom-Track velocity, mean velocity, depth) should be reported explicitly. Similarly, the independent variables (predictors) should be reported explicitly (e.g., bin distance, correlation, SNR).
Thank you for this helpful suggestion. We have revised Section 2.2.1 to state explicitly that BT_Vel was used as the dependent variable/target, while the ADCP-derived hydraulic and acoustic variables were used as independent predictors.
Revised text:
In both datasets, Bottom-Track Velocity (BT_Vel) was used as the dependent variable/target, while the ADCP-derived hydraulic and acoustic variables were used as independent predictors. The analysis focused on seven predictors that characterize the flow’s intensity, variability, and acoustic signal properties: Mean_Speed, Depth, Vel_StdDev, SNR, Bin_Distance, Vel_Expected_StdDev, and Correlation.
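The variable roles stated above can be written down explicitly, for example as follows. The column names are those used in the text; `to_xy` and the assumption that the data arrive as a column-name-to-array mapping (a pandas DataFrame indexed by column name would also work) are illustrative:

```python
import numpy as np

TARGET = "BT_Vel"
PREDICTORS = [
    "Mean_Speed", "Depth", "Vel_StdDev", "SNR",
    "Bin_Distance", "Vel_Expected_StdDev", "Correlation",
]

def to_xy(table):
    """Return X (n_ensembles x 7 predictors) and y (BT_Vel) from a column mapping."""
    X = np.column_stack([np.asarray(table[name], dtype=float) for name in PREDICTORS])
    y = np.asarray(table[TARGET], dtype=float)
    return X, y

# Hypothetical three-ensemble table: each predictor column holds 0, 1, 2.
table = {name: [0.0, 1.0, 2.0] for name in PREDICTORS}
table[TARGET] = [0.01, 0.02, 0.03]
X, y = to_xy(table)
print(X.shape, y.shape)  # -> (3, 7) (3,)
```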
- Future research: Future research could include quantifying sediment transport as a function of variables such as bottom-track velocity, mean velocity, and depth, together with measurements of sediment transport.
Thank you for this helpful suggestion. We have revised the Future Outlook section.
Revised text:
Direct measurements of suspended sediment concentration (SSC) and bedload transport would enhance model calibration and validation. Future work should use such measurements to quantify sediment transport rates as functions of ADCP-derived hydraulic and acoustic predictors, including BT_Vel, Mean_Speed, Depth, SNR, and related variables.
- See annotated manuscript for "editorial" errors!
We thank the reviewer for pointing out these editorial issues; all of them have been corrected.
Citation: https://doi.org/10.5194/egusphere-2025-6190-AC2
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 932 | 473 | 92 | 1,497 | 164 | 71 | 104 |
General Comments:
The paper presents a comprehensive attempt to utilize Machine Learning (ML) to estimate bedload variables using ADCP data. The core premise—predicting bedload solely from ADCP-derived inputs—is practically relevant. However, the current workflow and conceptual focus require significant refinement. While the literature review is strong, the "black box" nature of the current approach and the omission of established data-cleaning protocols limit the study's validity. I recommend a Major Revision (or Resubmit) based on the following concerns:
Major Critical Points:
Suggestions for Improvement: