https://doi.org/10.5194/egusphere-2025-2036
02 Jun 2025

Sensitivity of hydrological machine learning prediction accuracy to information quantity and quality

Minhyuk Jeung, Younggu Her, Sang-Soo Baek, and Kwangsik Yoon

Abstract. Machine learning (ML) is now commonly employed as a tool for hydrological prediction due to recent advances in computing resources and increases in data volume. The prediction accuracy of ML (or data-driven) modeling is known to be improved through training with additional data; however, the improvement mechanism needs to be better understood and documented. This study explores the connection between the amount of information contained in the data used to train an ML model and the model’s prediction accuracy. The amount of information was quantified using Shannon’s information theory, including marginal and transfer entropy. Three ML models were trained to predict the flow discharge, sediment, total nitrogen, and total phosphorus loads of four watersheds. The amount of information contained in the training data was increased by sequentially adding weather data and the simulation outputs of uncalibrated and/or calibrated mechanistic (or theory-driven) models. The reliability of training data was considered a surrogate of information quality, and accuracy statistics were used to measure the quality (or reliability) of the uncalibrated and calibrated theory-driven modeling outputs to be provided as training data for ML modeling. The results demonstrated that the prediction accuracy of hydrological ML modeling depends on the quality and quantity of information contained in the training data. The use of all types of training data provided the best hydrological ML prediction accuracy. ML models trained only with weather data and calibrated theory-driven modeling outputs could most efficiently improve accuracy in terms of information use. This study thus illustrates how a theory-driven approach can help improve the accuracy of data-driven modeling by providing quality information about a system of interest.
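
The two information measures named in the abstract can be illustrated with a minimal sketch. The abstract does not state which estimator, bin count, or time lag the authors used, so everything below is an assumption: equal-width binning, a lag of one time step, and the helper names `discretize`, `entropy`, and `transfer_entropy` are hypothetical, and the rainfall and flow series are synthetic rather than data from the study. Marginal entropy is H(X) = -sum p(x) log2 p(x), and the lag-1 transfer entropy from X to Y is the drop in uncertainty about the next value of Y once the current value of X is known in addition to the current value of Y.

```python
import numpy as np


def discretize(x, bins=8):
    """Map a continuous series to integer labels using equal-width bins (assumed scheme)."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), bins + 1)
    # Use interior edges only, so labels run from 0 to bins - 1.
    return np.digitize(x, edges[1:-1])


def entropy(*labels):
    """Joint Shannon entropy in bits of one or more discrete label series."""
    stacked = np.column_stack(labels)
    _, counts = np.unique(stacked, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def transfer_entropy(source, target, bins=8):
    """Lag-1 transfer entropy in bits from source to target.

    TE = H(Y_next | Y_now) - H(Y_next | Y_now, X_now)
       = H(Y_next, Y_now) - H(Y_now) - H(Y_next, Y_now, X_now) + H(Y_now, X_now)
    """
    x = discretize(source, bins)
    y = discretize(target, bins)
    y_next, y_now, x_now = y[1:], y[:-1], x[:-1]
    return (entropy(y_next, y_now) - entropy(y_now)
            - entropy(y_next, y_now, x_now) + entropy(y_now, x_now))


# Synthetic illustration only: flow responds to the previous step's rainfall.
rng = np.random.default_rng(0)
rain = rng.gamma(shape=2.0, scale=1.0, size=5000)
noise = 0.1 * rng.standard_normal(rain.size)
flow = np.concatenate(([0.0], rain[:-1])) + noise

h_rain = entropy(discretize(rain))               # marginal entropy of rainfall
te_rain_to_flow = transfer_entropy(rain, flow)   # information rain carries about future flow
te_flow_to_rain = transfer_entropy(flow, rain)   # should be close to zero apart from estimator bias

print(f"H(rain) = {h_rain:.3f} bits")
print(f"TE(rain -> flow) = {te_rain_to_flow:.3f} bits")
print(f"TE(flow -> rain) = {te_flow_to_rain:.3f} bits")
```

In this toy setup the transfer entropy from rainfall to flow should be clearly larger than in the reverse direction, which is the kind of directional information content the abstract refers to. Histogram estimators of this kind are biased for short records or many bins, so the bin count here is a tunable assumption rather than a recommendation.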

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Journal article(s) based on this preprint

24 Feb 2026
Sensitivity of hydrological machine learning prediction accuracy to information quantity and quality
Minhyuk Jeung, Younggu Her, Sang-Soo Baek, and Kwangsik Yoon
Hydrol. Earth Syst. Sci., 30, 1077–1095, https://doi.org/10.5194/hess-30-1077-2026, 2026

Interactive discussion

Status: closed

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on egusphere-2025-2036', Anonymous Referee #1, 30 Jun 2025
    • AC1: 'Reply on RC1', Minhyuk Jeung, 04 Dec 2025
  • RC2: 'Comment on egusphere-2025-2036', Anonymous Referee #2, 05 Nov 2025
    • AC2: 'Reply on RC2', Minhyuk Jeung, 04 Dec 2025

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
ED: Reconsider after major revisions (further review by editor and referees) (21 Dec 2025) by Fuqiang Tian
AR by Minhyuk Jeung on behalf of the Authors (09 Jan 2026): author's response, author's tracked changes, manuscript
ED: Referee Nomination & Report Request started (10 Jan 2026) by Fuqiang Tian
RR by Anonymous Referee #1 (07 Feb 2026)
RR by Anonymous Referee #2 (13 Feb 2026)
ED: Publish as is (13 Feb 2026) by Fuqiang Tian
AR by Minhyuk Jeung on behalf of the Authors (16 Feb 2026): manuscript

Viewed

Total article views: 1,003 (including HTML, PDF, and XML)
  • HTML: 785
  • PDF: 185
  • XML: 33
  • Total: 1,003
  • Supplement: 81
  • BibTeX: 22
  • EndNote: 39
Views and downloads (calculated since 02 Jun 2025)

Viewed (geographical distribution)

Total article views: 1,006 (including HTML, PDF, and XML) Thereof 1,006 with geography defined and 0 with unknown origin.
Latest update: 24 Feb 2026

The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Short summary
Machine learning (ML) techniques have become widely used due to the availability of large data repositories and advancements in computing resources and methods. Our study explored the connection between a model’s accuracy and the information content of input data. Results showed that the accuracy of three ML models significantly improved when high-quality input data were included. These findings highlight the importance of data quality in ML model training.