the Creative Commons Attribution 4.0 License.
Sensitivity of hydrological machine learning prediction accuracy to information quantity and quality
Abstract. Machine learning (ML) is now commonly employed as a tool for hydrological prediction due to recent advances in computing resources and increases in data volume. The prediction accuracy of ML (or data-driven) modeling is known to be improved through training with additional data; however, the improvement mechanism needs to be better understood and documented. This study explores the connection between the amount of information contained in the data used to train an ML model and the model’s prediction accuracy. The amount of information was quantified using Shannon’s information theory, including marginal and transfer entropy. Three ML models were trained to predict the flow discharge, sediment, total nitrogen, and total phosphorus loads of four watersheds. The amount of information contained in the training data was increased by sequentially adding weather data and the simulation outputs of uncalibrated and/or calibrated mechanistic (or theory-driven) models. The reliability of training data was considered a surrogate of information quality, and accuracy statistics were used to measure the quality (or reliability) of the uncalibrated and calibrated theory-driven modeling outputs to be provided as training data for ML modeling. The results demonstrated that the prediction accuracy of hydrological ML modeling depends on the quality and quantity of information contained in the training data. The use of all types of training data provided the best hydrological ML prediction accuracy. ML models trained only with weather data and calibrated theory-driven modeling outputs could most efficiently improve accuracy in terms of information use. This study thus illustrates how a theory-driven approach can help improve the accuracy of data-driven modeling by providing quality information about a system of interest.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2025-2036', Anonymous Referee #1, 30 Jun 2025
AC1: 'Reply on RC1', Minhyuk Jeung, 04 Dec 2025
RC1.1: The manuscript investigates how the input information quantity and quality, quantified by marginal and transfer entropy, influence the machine-learning-based (ML) hydrological prediction performance. The results demonstrate that increased information quantity does not necessarily enhance model performance whereas improved information quality can more efficiently boost predictive accuracy. However, some points might need to be improved or clarified before publication.
Response to RC1.1: We thank the reviewer for recognizing this study’s central contributions and for highlighting areas needing clarification. In response to the comments regarding structure, terminology, and interpretation in the discussion, we have made several key revisions to improve clarity and emphasize the study’s contributions.
First, as suggested, we split the original Discussion section into three clearer parts: (1) how information quantity affects ML performance, (2) how information quality affects performance, and (3) implications for combining ML with process-based models. We also moved detailed model results (e.g., RMSE and NSE) to the Results section to clearly separate evidence from interpretation.
Second, we replaced the word “resilience” with “robustness” when describing the ANN model’s ability to maintain high performance even when low-quality inputs (i.e., WD+UC) were utilized.
Third, we clarified our use of the term “efficiency” to avoid confusion. In this study, “information-use efficiency” refers to a model’s ability to achieve equal or better predictive performance using fewer, but more informative, input variables, not to reductions in training time or computational cost. We revised the text to reflect this meaning consistently.
Together, these revisions directly address the reviewer’s concerns and improve the manuscript’s clarity, structure, and contribution to the literature on entropy-informed ML model development in hydrology.
RC1.2: In the Discussion, the authors combine the discussion of ML modeling accuracy and the influence of data quantity and quality into one long section. It might be better to move the results on ML performance into the Results part and divide the remaining discussion into several subsections, e.g., the influence of data quantity on ML performance, the influence of data quality on ML performance, and implications for future integration of ML with mechanistic models. This breakdown might better clarify the contribution of this work.
Response to RC1.2: Thank you for this constructive suggestion. We agree that the original Discussion section combined several key themes, including ML model performance, the effects of data quantity and quality, and broader implications, into a single continuous narrative, which may have reduced clarity and obscured the study's main contributions. In response, we have made the following revisions:
- Relocated quantitative results (e.g., RMSE and NSE comparisons across data scenarios) from the Discussion to the Results section to more clearly separate evidence from interpretation.
- Reorganized the Discussion section into three focused sub-sections:
- 1. Influence of information quantity on ML performance
- 2. Influence of information quality on ML performance
- 3. Implications for integrating ML with mechanistic models
These changes better align our discussion structure with the study's main objectives and improve the clarity of our contributions. Specifically:
Section 4.1 examines how increases in input quantity influenced model performance, including observations of diminishing returns, and connects these patterns to previous literature.
Section 4.2 focuses on the impact of information quality, as quantified by transfer entropy, and highlights cases where fewer but higher-quality inputs led to improved performance, particularly in the ANN model.
Section 4.3 explores the broader implications of our findings, including how entropy metrics can guide input selection and data assimilation in mechanistic models, inform efficient training strategies, and support the development of hybrid modeling approaches.
RC1.3: Line 526. The authors mention that the ANN model exhibits its resilience to more efficiently utilize quality information. What does the term “resilience” mean here?
Response to RC1.3: We appreciate the reviewer’s request for clarification. In the original sentence, we wrote: “The ANN model exhibits its resilience to more efficiently utilize quality information …” Our intention was to convey that the ANN model retained a high level of predictive skill even after low-entropy (low-quality) inputs were added, whereas the other models showed a sharper decline in performance. In other words, the ANN model was robust to reductions in the quantity of information as long as the remaining inputs were of high informational quality. To avoid ambiguity, we have replaced the word “resilience” with “robustness,” a term more commonly used in the modeling literature to describe stability of performance under adverse or reduced-information conditions. In addition, we have added a definition in the text. The revised sentence now reads (Section 4.2, the first paragraph): “The ANN demonstrated robustness by effectively exploiting additional information, whereas the RF and SVM models exhibited performance deterioration.” Furthermore, we have provided a short explanatory clause that explicitly links robustness to the model’s ability to exploit high-entropy (high-quality) inputs: “…indicating that the ANN can exploit the remaining high-entropy variables more effectively than the other algorithms.”
RC1.4: Lines 555-557. The authors mention that high-quality training data can improve the efficiency of ML models. The term “efficiency” might be ambiguous here since it can refer either to information-use efficiency or to reduced training/computation time of ML models, especially given later comments on potential advantages of streamlined model training. Similar unclarified terms appear in other parts, e.g., Lines 539-540, and might cause readers to misunderstand. Please check the related terms and keep them consistent throughout the discussion.
Response to RC1.4: Thank you for your comment regarding the ambiguous use of the term “efficiency.” We agree that in the original text, the use of “efficiency” could be misinterpreted as either referring to information-use efficiency or to computational/training efficiency. Our intention was to emphasize that high-quality training data enhanced the information-use efficiency of machine learning models, meaning the models achieved equal or better predictive accuracy using fewer, yet more informative, input variables. To clarify this, we revised the sentence in Lines 555–557 to state: “These results suggest that higher-quality training data improved the information-use efficiency of ML models, enabling them to maintain or improve prediction accuracy while using a reduced number of inputs.” We also reviewed and revised other instances throughout the manuscript, such as on Lines 539–540, to ensure that the term “efficiency” consistently refers to information-use efficiency unless otherwise specified.
Citation: https://doi.org/10.5194/egusphere-2025-2036-AC1
RC2: 'Comment on egusphere-2025-2036', Anonymous Referee #2, 05 Nov 2025
This is a useful, well-motivated study on how information quantity (marginal entropy) and quality (transfer entropy) affect hydrological ML performance. The core message—that more data does not guarantee better predictions while higher-quality information is more impactful—is clear and relevant. I recommend moderate revision focused on clarity of structure, terminology, and methods transparency.
Detailed comments are listed as follows.
1. Restructure: move quantitative ML results to Results; keep Discussion for interpretation, organized into quantity effects, quality effects, and implications for ML–process model integration.
2. Clarify terms: replace/define “resilience” precisely; reserve “efficiency” for information-use efficiency (IUE) and use “computational efficiency” for runtime/training remarks.
3. Methods transparency: briefly specify how marginal/transfer entropy are estimated (estimator, lags/embedding/discretization) and note comparability of “bits” across variables.
4. Data splits & leakage: clearly diagram time windows (SWAT calibration vs. ML train/test) and state how leakage is avoided.
5. Uncertainty & presentation: add compact uncertainty cues (e.g., CIs/whiskers or paired tests) to key figures; simplify dense plots and fix minor typos/formatting.
Citation: https://doi.org/10.5194/egusphere-2025-2036-RC2
AC2: 'Reply on RC2', Minhyuk Jeung, 04 Dec 2025
RC2.1: This is a useful, well-motivated study on how information quantity (marginal entropy) and quality (transfer entropy) affect hydrological ML performance. The core message—that more data does not guarantee better predictions while higher-quality information is more impactful—is clear and relevant. I recommend moderate revision focused on clarity of structure, terminology, and methods transparency. Detailed comments are listed as follows.
Response to RC2.1: We appreciate the reviewer’s detailed and constructive suggestions for clarifying the manuscript structure and improving methodological transparency. In response to them, we have made several key revisions. First, as suggested, we split the original Discussion section into three parts: (1) how information quantity affects ML performance, (2) how information quality affects ML performance, and (3) implications for combining ML with process-based models. We also moved detailed model accuracy statistics (e.g., RMSE and NSE) included in the Discussion section to the Results section to better separate evidence from interpretation. Second, we replaced the word “resilience” with “robustness” when describing the ANN model’s ability. Third, we clarified our use of the term “efficiency” to avoid confusion. In this study, “information-use efficiency” refers to a model’s ability to achieve equal or better predictive performance using fewer, but more informative, input variables, not to reductions in training time or computational cost. Fourth, we added whisker-box plots to present the inter-dataset, inter-model, and inter-watershed variability of the key metrics (KGE and IUE). These plots allow us to examine how performance varies across different ML models and data combinations and to discuss which configurations yield more consistent results.
RC2.2: Restructure: move quantitative ML results to Results; keep Discussion for interpretation, organized into quantity effects, quality effects, and implications for ML–process model integration.
Response to RC2.2: Thank you for this constructive suggestion. We agree that the original Discussion section combined several key themes, including ML model performance, the effects of data quantity and quality, and broader implications, into a single continuous narrative, which may have reduced clarity and obscured the study's main contributions. In response, we have made the following revisions:
- Relocated quantitative results (e.g., RMSE and NSE comparisons across data scenarios) from the Discussion to the Results section to more clearly separate evidence from interpretation.
- Reorganized the Discussion section into three focused sub-sections:
- 1. Influence of information quantity on ML performance
- 2. Influence of information quality on ML performance
- 3. Implications for integrating ML with mechanistic models
These changes better align our discussion structure with the study's main objectives and improve the clarity of our contributions. Specifically:
Section 4.1 examines how increases in input quantity influenced model performance, including observations of diminishing returns, and connects these patterns to previous literature.
Section 4.2 focuses on the impact of information quality, as quantified by transfer entropy, and highlights cases where fewer but higher-quality inputs led to improved performance, particularly in the ANN model.
Section 4.3 explores the broader implications of our findings, including how entropy metrics can guide input selection and data assimilation in mechanistic models, inform efficient training strategies, and support the development of hybrid modeling approaches.
RC2.3: Clarify terms: replace/define “resilience” precisely; reserve “efficiency” for information-use efficiency (IUE) and use “computational efficiency” for runtime/training remarks.
Response to RC2.3: We appreciate the reviewer’s request for clarification. In the original sentence, we wrote: “The ANN model exhibits its resilience to more efficiently utilize quality information …” Our intention was to convey that the ANN model retained a high level of predictive skill even after low-entropy (low-quality) inputs were added, whereas the other models showed a sharper decline in performance. In other words, the ANN model was robust to reductions in the quantity of information as long as the remaining inputs were of high informational quality. To avoid ambiguity, we have replaced the word “resilience” with “robustness,” a term more commonly used in the modeling literature to describe stability of performance under adverse or reduced-information conditions. In addition, we have added a quantitative definition in the text. The revised sentence now reads (Section 4.2 Influence of information quality on ML performance, the first paragraph): “The ANN demonstrated robustness by effectively exploiting additional information, whereas the RF and SVM models exhibited performance deterioration.” Furthermore, we have provided a short explanatory clause that explicitly links robustness to the model’s ability to exploit high-entropy (high-quality) inputs: “…indicating that the ANN can exploit the remaining high-entropy variables more effectively than the other algorithms.”
In addition, the use of “efficiency” could be misinterpreted as either referring to information-use efficiency or to computational/training efficiency. Our intention was to emphasize that high-quality training data enhanced the information-use efficiency of machine learning models, meaning the models achieved equal or better predictive accuracy using fewer, yet more informative, input variables. To clarify this, we revised the corresponding sentence to state: “These results suggest that higher-quality training data improved the information-use efficiency of ML models, enabling them to maintain or improve prediction accuracy while using a reduced number of inputs.” We also reviewed and revised other instances throughout the manuscript to ensure that the term “efficiency” consistently refers to information-use efficiency unless otherwise specified.
RC2.4: Methods transparency: briefly specify how marginal/transfer entropy are estimated (estimator, lags/embedding/discretization) and note comparability of “bits” across variables.
Response to RC2.4: We agree that a more detailed description of how marginal and transfer entropy were computed would improve reader understanding. Transfer entropy (TE) was calculated using the method implemented in the RTransferEntropy package (Behrendt et al., 2019), applying Shannon TE with a quantile-based discretization scheme. This choice enhances robustness to outliers and better captures information transfer associated with relatively high and low values (Nie, 2021; Zhang and Zhao, 2022). The lag parameter was set to zero because the ML models used in this study are standard regression models without explicit temporal memory (e.g., no LSTM); accordingly, we quantified synchronous information transfer between inputs and outputs (same-time relationships; lag = 0), which aligns with our primary objective. Marginal entropy was computed with log base 2, and all marginal and transfer entropy magnitudes are reported in bits; where units were missing, they have been added to figures and tables.
References:
Behrendt, S., Dimpfl, T., Peter, F.J., Zimmermann, D.J., 2019. RTransferEntropy — Quantifying information flow between different time series using effective transfer entropy. SoftwareX 10, 100265.
Nie, C.-X., 2021. Dynamics of the price–volume information flow based on surrogate time series. Chaos: An Interdisciplinary Journal of Nonlinear Science 31.
Zhang, N., Zhao, X., 2022. Quantile transfer entropy: Measuring the heterogeneous information transfer of nonlinear time series. Communications in Nonlinear Science and Numerical Simulation 111, 106505.
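To make the estimation scheme discussed in this exchange concrete, the following Python sketch computes quantile-binned Shannon marginal entropy and a plug-in Shannon transfer-entropy estimate, both in bits. It is an illustrative analogue, not the authors' actual RTransferEntropy workflow: it uses three quantile bins and one-step histories, whereas the reply describes a lag-0 (synchronous) configuration, and the bin count here is an assumption.

```python
import numpy as np
from collections import Counter

def quantile_bins(x, n_bins=3):
    """Discretize a series into quantile-based bins (here terciles by default)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(x, edges)

def marginal_entropy(x, n_bins=3):
    """Shannon marginal entropy, in bits, of the quantile-binned series."""
    _, counts = np.unique(quantile_bins(x, n_bins), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def transfer_entropy(x, y, n_bins=3):
    """Plug-in Shannon transfer entropy TE(X -> Y), in bits, with one-step
    histories: TE = sum over states of p(y+, y, x) * log2[p(y+|y,x) / p(y+|y)]."""
    xs, ys = quantile_bins(x, n_bins), quantile_bins(y, n_bins)
    yf, yp, xp = ys[1:], ys[:-1], xs[:-1]   # future y, past y, past x
    n = len(yf)
    c_fpx = Counter(zip(yf, yp, xp))        # joint counts of (y+, y, x)
    c_px = Counter(zip(yp, xp))
    c_fp = Counter(zip(yf, yp))
    c_p = Counter(yp)
    te = 0.0
    for (f, p, xv), cnt in c_fpx.items():
        # p(y+|y,x) / p(y+|y) written with raw counts (sample sizes cancel)
        te += (cnt / n) * np.log2(cnt * c_p[p] / (c_px[(p, xv)] * c_fp[(f, p)]))
    return float(te)
```

With three equiprobable bins, the marginal entropy of any long continuous series approaches log2(3) ≈ 1.58 bits, which illustrates why bit values are comparable across variables only when the same discretization is applied to each.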
RC2.5: Data splits & leakage: clearly diagram time windows (SWAT calibration vs. ML train/test) and state how leakage is avoided.
Response to RC2.5: Thank you for the constructive suggestion. The primary objective of this study is to evaluate how the quantity and quality of input data influence the predictive accuracy of ML models by using both intentionally uncalibrated and calibrated mechanistic modeling (SWAT) outputs as inputs.
To further ensure that data leakage was avoided, we carefully aligned the training and testing periods of the ML models with the calibration and validation periods of the SWAT model, respectively. For example, the ML models were evaluated over the same period (January 1, 2016 to December 31, 2017) used for SWAT model validation. In addition, the ML models were trained exclusively on SWAT-simulated nutrient loads from the calibration period, while observed discharge and concentration data were used only for SWAT calibration and validation and never as ML inputs. These measures ensured that no observed data used for SWAT calibration were involved in ML model training or testing, thereby maintaining strict independence between the datasets.
To enhance clarity, these methodological safeguards and the rationale behind our data-separation strategy have been explicitly described in the revised manuscript (Section 2.6 – Study Watersheds and Training Data Acquisition and Section 4 – Discussions). In addition, we revised the diagram (Figure 1 in supplementary file) to clearly illustrate the data-splitting scheme and the training workflow.
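The window alignment described in this reply can be sketched as a simple date-based split. Only the test window (January 1, 2016 to December 31, 2017) comes from the reply; the training-window start date below is a hypothetical placeholder, since the calibration span is not stated here.

```python
from datetime import date

# Hypothetical calibration span used as the ML training window (start date
# assumed for illustration); the test window matches the SWAT validation
# period stated in the reply.
TRAIN_WINDOW = (date(2012, 1, 1), date(2015, 12, 31))  # assumed
TEST_WINDOW = (date(2016, 1, 1), date(2017, 12, 31))   # from the reply

def split_by_window(records):
    """Route (date, value) records into disjoint train/test sets by date
    window; records outside both windows are dropped, so no sample can
    appear in both sets."""
    train, test = [], []
    for d, value in records:
        if TRAIN_WINDOW[0] <= d <= TRAIN_WINDOW[1]:
            train.append((d, value))
        elif TEST_WINDOW[0] <= d <= TEST_WINDOW[1]:
            test.append((d, value))
    return train, test
```

Because membership is decided purely by non-overlapping date windows, the split is leak-free by construction, mirroring the safeguard the authors describe.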
RC2.6: Uncertainty & presentation: add compact uncertainty cues (e.g., CIs/whiskers or paired tests) to key figures; simplify dense plots and fix minor typos/formatting.
Response to RC2.6: We appreciate the reviewer’s suggestion to clarify the presentation of model performance uncertainty. In response to it, we have added new figures (Figure S5 in supplementary file) and revised the key figure (Figure 8 in supplementary file) to include compact uncertainty indicators as suggested. Specifically, we added box-whisker plots to present the inter-dataset, inter-model, and inter-watershed variability of the key metrics (IUE-ME and IUE-TE). In addition, we have carefully revised typographical errors and formatting throughout the manuscript.