This work is distributed under the Creative Commons Attribution 4.0 License.
HESS Opinions: Applied hydrologic models in the era of machine learning – retain, revamp, reconcile, or replace?
Abstract. Despite advancements in the performance of machine learning (ML)-based hydrologic models, some institutions are hesitant to pursue ML as a replacement for existing conceptual or process-based hydrologic models in many applications. In several of these circumstances, traditional hydrologic models continue to be favored due to their familiarity, reliability, interpretability, established performance benchmarks under varied settings, availability of detailed training modules and a trained workforce, as well as close integration with data, processing, and decision-making pipelines. Recognizing these advantages, this perspective argues for two pragmatic and institutionally compatible paths forward for integrating ML within applied models: (1) reconciling ML as a complementary option in applied hydrologic modeling workflows; and (2) revamping or upskilling hydrologic modeling workflows using ML. To support this perspective, we highlight key opportunities where ML can be used as a tool to enhance results across various stages of the model implementation and operational workflow, including data pre-processing, parameter calibration, parameter transferability, data assimilation, solver enhancement, accelerated scenario simulations, and post-processing. Both integration strategies can be implemented within current applied model frameworks, thereby combining the strengths of physical modeling and ML. These strategies can help overcome current bottlenecks and address institutional needs for continuity and compatibility, while also offering the potential to improve model performance with ML.
Status: final response (author comments only)
- CC1: 'Comment on egusphere-2026-583', Nima Zafarmomen, 25 Feb 2026
- RC1: 'Comment on egusphere-2026-583', Anonymous Referee #1, 16 May 2026
This manuscript presents a perspective on integrating machine learning (ML) into applied hydrologic modeling workflows, arguing for “reconciliation” or “revamping” rather than replacement of legacy models. The topic is timely and practically relevant, as many operational agencies are indeed grappling with how to incorporate ML advances. However, I have several concerns regarding the depth of the contribution, the clarity of the intended audience, and whether the proposed roadmap is actionable enough to move the field forward. My overall impression is that the manuscript largely synthesizes well-known challenges and opportunities without putting forward substantially new ideas or demonstrating that the proposed workflow would be effective in practice. The arguments would benefit from more specificity, more critical assessment of limitations, and clearer articulation of what this perspective adds beyond what the community already understands.
Major comments
- Table 1: The purpose of Table 1 is unclear. Many widely used models such as VIC, SUMMA, and WRF-Hydro, which are mentioned in the text, are absent. In its current form, the table reads as an ad hoc selection of models without a clear rationale for inclusion or exclusion. I would suggest organizing models by category (e.g., lumped rainfall-runoff models, distributed watershed models, hydrodynamic models, integrated surface-subsurface models) and clearly stating the selection criteria. If the intent is simply to provide illustrative examples, the authors should state this explicitly and trim the table accordingly.
- Section 4.3 Step 3. Select an Appropriate ML Technique: This is proposed as a step in the road map. In practice, this is challenging and often cannot be resolved a priori. The best-performing ML approach varies depending on the use case, data availability, catchment characteristics, and the nature of the modeling deficiency being addressed. There is rarely a single "correct" technique, and the selection process typically requires iterative experimentation and benchmarking. As written, Step 3 risks giving practitioners an unrealistically linear impression of a complex and iterative process.
- Section 4: It is unclear who the intended audience of the proposed road map is. The framework is presented at a high level that may be too general for any specific audience (academia, agencies, or industry) to find directly actionable. Each of these communities faces different constraints. For example, agencies may lack ML expertise and computational infrastructure, while academics may lack access to operational systems and institutional buy-in. The manuscript would be strengthened by explicitly addressing the skills, resources, and institutional conditions required to implement the proposed workflow. I would encourage the authors to ground the framework with more concrete guidance, perhaps including rough resource estimates, example timelines, or case studies where similar integration has been attempted.
- As an Opinions paper, this manuscript is expected to offer a clear, forward-looking perspective that advances the community's thinking. While the framing around “retain, revamp, reconcile, or replace” is effective, the individual ideas presented (using ML for data gap-filling, calibration, surrogate modeling, etc.) are already well-established in the literature. I would encourage the authors to more clearly articulate what is new in their perspective.
Minor comments
- Table 1 lists “select municipal water treatment plants” as the typical users of MIKE SHE, which could be misleading. MIKE SHE is a fully integrated, physically-based hydrologic model whose user base is considerably broader than municipal water treatment.
- Lines 171–178 (Section 3.2, solver enhancement): The body of work on using ML to accelerate ODE/PDE solutions is substantively different from the data-driven hydrologic modeling discussed elsewhere in the paper and falls more naturally under the umbrella of physics-informed methods; the authors should use this term explicitly. This would help readers connect this discussion to a rich and rapidly growing literature (e.g., physics-informed neural networks (PINNs) and neural operators).
- General proofreading: The manuscript would benefit from careful proofreading. For example, a period is missing after “outputs” in line 141, and the sentence starting in line 140 largely repeats the point already made two lines above.
Citation: https://doi.org/10.5194/egusphere-2026-583-RC1
- RC2: 'Comment on egusphere-2026-583', Anonymous Referee #2, 19 May 2026
General comments
The use of AI/ML in hydrology has grown substantially over the past decade in academic contexts, as reflected in numerous publications, but its adoption for operational purposes remains relatively slow. This paper raises the important question of why there is hesitation in adopting AI in applied models, with a catchy title that poses interesting questions: to retain, revamp, reconcile, or replace? The authors offer good insights into why legacy models and workflows persist despite the superior performance demonstrated by recent ML models.
However, the discussion does not substantively address all of the proposed options (retain, revamp, reconcile, or replace). The workflow presented (identify limitations, apply an approach, test, and gradually incorporate) is simplistic. It appears incremental and essentially retains and revamps long-established hydrologic model improvement workflows, with ML integration or architecture choice substituting for traditional improvements to model structure or parameterization. Overall, the paper presents a narrow view of incorporating ML into existing hydrologic modeling workflows. In weather forecasting, for example, ML methods are now rapidly being adopted as the primary forecasters, and process-model outputs are being used as training data. The same shift could soon occur in hydrology, potentially rendering current models and workflows less useful for large-scale, high-resolution, ML-based operational predictions, except for generating synthetic training data. This and other “replace” and “reconcile” possibilities are not discussed.
Hence, the current discussion seems superficial and reads more like a workshop summary. The authors also overlook recent advances such as agentic AI, coding assistants, generative AI, and foundation models. Large language models (LLMs) are mentioned only in passing, mostly from the perspective of improving model outputs. The paper also neglects significant progress in hybrid process-ML architectures, such as differentiable modeling that can improve the National Water Model (https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2024WR038928) and USGS efforts to implement process-guided deep learning in their national models. The conclusions advocate for the “retain” workflows without making a strong case for why that is better than some of the more advanced ML methods that have shown demonstrable improvements in prediction accuracy, resolution, and speed over traditional methods.
This paper would be more interesting if it presented a more in-depth analysis of the barriers faced by hydrological modelers in integrating AI. While workflows and upskilling/training are mentioned, the discussion does not touch on other issues such as lack of data, limited GPU compute resources, and the difficulty of keeping up with rapidly evolving AI developments and tools. In particular, the authors do not discuss data-related challenges, including the lack of benchmark datasets and time-consuming tasks such as data wrangling, quality control, and movement of large data to compute nodes. Most ML demonstrations in hydrology rely on a small set of benchmarks (e.g., CAMELS and its derivatives), curated monitoring network data (e.g., from USGS), or remote sensing products. For operational models, the authors could expand on their brief mentions of interpretability and trustworthiness. Specifically, it would be valuable to discuss validation methods and the accuracy thresholds (or thresholds on other relevant metrics) required to decide which modeling pathway is best, and how to gain the trust of operational practitioners. In general, hydrology lacks sufficient benchmarks for model intercomparisons, so it is currently challenging to determine the failure modes of ML (or process) models. It is also unclear what timelines are being suggested for slow ML adoption (months, years?), and the suggestion of slow adoption fails to recognize how quickly AI methods are improving and being adopted for many other purposes.
Overall, this paper adds very little in terms of new suggestions for moving the field forward beyond what has been covered previously. There are already numerous opinions, reviews, and reports on the application of ML in hydrologic science and modeling, and on paths forward for integrating process-based and ML models. Below is a list of examples of such papers going back several years. None of these (other than Nearing et al., 2020) are cited in this manuscript.
Painter and Destouni (2026): https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2026WR043509
Yan et al. (2026): https://www.sciencedirect.com/science/article/pii/S1674237026000025
Zhi et al. (2024): https://www.nature.com/articles/s44221-024-00202-z
Varadharajan et al. (2022): https://onlinelibrary.wiley.com/doi/abs/10.1002/hyp.14565
Xu and Liang (2021): https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wat2.1533
Nearing et al. (2020): https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2020WR028091
Fleming and Gupta (2020): https://physicstoday.aip.org/features/the-physics-of-river-prediction
Sun and Scanlon (2019): https://iopscience.iop.org/article/10.1088/1748-9326/ab1b7d
Specific comments
Table 1 is not useful or comprehensive as a list of process models. There is a huge number of process models used in hydrology (see the extensive list of hydrologic models maintained by the CSDMS https://csdms.colorado.edu/wiki/Hydrological_Models). Alternatively, Figure 1 in Yan et al. (2026) provides a useful overview of the trajectory of hydrologic model development. The main point to make is that there is a huge diversity of process models at varying levels of complexity, and the opportunities for integration with AI can vary a lot depending on the model type. This paper could explore that concept further.
The workflow sections, which constitute the primary contribution of the paper, are very short and lack depth; they should be expanded substantially. The integration objectives and the criteria for choosing an ML approach are big topics that can be elaborated on further.
The paper lacks sufficient referencing for an opinion piece. For example, it includes only three references to single-variable machine learning (ML) models, despite the existence of hundreds of publications demonstrating ML model skill across a diverse range of variables. As previously mentioned, numerous review papers that cite hundreds of these studies are available and should be included.
Citation: https://doi.org/10.5194/egusphere-2026-583-RC2
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 916 | 810 | 71 | 1,797 | 89 | 144 |
This manuscript presents a timely and well-articulated opinion on the evolving role of applied hydrologic models in the era of ML. The paper is well structured, written in a clear and accessible style, and supported by illustrative examples spanning forecasting, planning, and decision-support contexts. The proposed roadmap for ML integration is particularly valuable, as it frames adoption not only as a technical evolution but also as an institutional and cultural transition. Overall, the manuscript makes a meaningful contribution to an important and ongoing discussion. However, several issues should be addressed to further strengthen the clarity, rigor, and practical impact of the paper.
Comments
1- The manuscript would benefit from a more explicit decision framework clarifying when users should retain, revamp, reconcile, or replace existing modeling approaches. While the conceptual distinctions are helpful, readers will seek clearer decision criteria or guiding principles. Incorporating a structured comparison (e.g., a decision matrix based on data availability, interpretability needs, regulatory constraints, computational cost, and operational risk) would substantially improve the manuscript’s applicability.
2- The paper surveys a wide range of models, tasks, and ML integration opportunities. Although informative, the breadth risks diluting the central message. The manuscript would be strengthened by prioritizing or highlighting the most impactful and realistic integration pathways (e.g., calibration acceleration, surrogate modeling, bias correction, forcing-data improvement). This would enhance focus and provide clearer guidance for applied users.
3- Sections discussing LLMs introduce interesting perspectives but would benefit from clearer boundaries between current capabilities and future potential. Framing LLM-related discussions explicitly as emerging prospects would improve precision and avoid overgeneralization. I recommend considering papers such as “Can large language models effectively reason about adverse weather conditions?”, which reflects an active and relevant research frontier. Additionally, other emerging computational paradigms could be briefly acknowledged to broaden the forward-looking perspective. For example, quantum computing is increasingly discussed in environmental modeling contexts. The authors may consider citing recent developments such as “HydroQuantum: A new quantum-driven Python package for hydrological simulation” as an example of exploratory directions that, while still nascent, may influence future modeling workflows.