the Creative Commons Attribution 4.0 License.
A Distributed Hybrid Physics-AI Framework for Learning Corrections of Internal Hydrological Fluxes and Enhancing High-Resolution Regionalized Flood Modeling
Abstract. To advance the discovery of scale-relevant hydrological laws while better exploiting massive multi-source data, merging artificial intelligence with process-based modeling has emerged as a compelling approach, as demonstrated in recent lumped hydrological modeling studies. This research proposes a general spatially distributed hybrid modeling framework that seamlessly combines differentiable process-based modeling with neural networks. We focus on hybridizing a differentiable hydrological model with neural networks, leveraging the temporal memory effect of the original model, on top of a differentiable kinematic wave routing over a flow direction grid. We evaluate flood modeling performance and analyze the interpretability of learned conceptual parameters and corrections of internal fluxes using two high-resolution datasets (dx = 1 km, dt = 1 h). The first dataset involves 235 catchments in France, used for local calibration-validation and model structure comparisons between the classical GR-like model and the hybrid approach. The second dataset presents a challenging multi-catchment modeling setup in flash flood-prone areas to demonstrate the framework's regionalization learning capabilities. The results show that the hybrid models achieve superior accuracy and robustness compared to classical approaches in both spatial and temporal validation. Analysis of the spatially distributed parameters and internal fluxes reveals the hybrid models' nuanced behavior, their adaptability to diverse hydrological responses, and their potential for uncovering physical processes.
Status: open (until 05 Apr 2025)
RC1: 'Comment on egusphere-2024-3665', Anonymous Referee #1, 24 Feb 2025
reply
Review of HESS Manuscript
“A Distributed Hybrid Physics-AI Framework for Learning Corrections of Internal Hydrological Fluxes and Enhancing High-Resolution Regionalized Flood Modeling”
Dear Editor, I have attached my review of the manuscript.
1. Scope
The scope of the paper is well suited for HESS.
2. Summary
The authors introduce a distributed hybrid hydrological model. The model is based on the GRU process-based model architecture but includes embedding neural networks that are used to parameterize the process-based model. They test their model on 256 catchments located in France, divided into two datasets (235 and 21 catchments for the first and second datasets respectively). They conclude that the hybrid approaches perform better than the stand-alone process-based models.
Overall, the manuscript has the potential to be a good contribution, however, there are certain aspects mentioned in the comments below that should be taken into account before the manuscript is accepted.
3. Evaluation
Major comments:
Model comparison: The authors compare a stand-alone GRU model and their hybrid model approaches. I think the comparison between these two models is necessary, valuable, and the results are presented clearly. However, the fact that the hybrid performs better than the stand-alone process-based model is expected. With the hybrid approach, you have a model with more degrees of freedom, and the embedded NN can compensate for structural deficiencies in the process-based part, which will increase performance.
What I think is missing, to have a better idea of where the hybrid stands, is a comparison with a purely data-driven approach. For example, a stand-alone LSTM, trained regionally with lumped meteorological inputs (e.g., catchment-average values), would be a good benchmark. Or use as inputs not only the basin-averaged values of precipitation, temperature, etc., but also other basin-wide statistics (mean, std, max, and min) that you can compute from the gridded products. This way we can see how the hybrid approach performs against purely data-driven methods, and whether the extra effort of going distributed is worth it.
Section 3: Here you present two datasets, with which you run two sets of experiments. The first dataset includes 235 non-nested catchments in France with 13 years of data. On this dataset, you test the effect of having a NN for process parameterization. In the second dataset, you have 21 catchments in the Mediterranean region, both nested and independent, with 7 years of data. You use this one to test model regionalization. Is there a reason why this last test cannot be made on the first dataset? One can evaluate regionalization from catchment to catchment, and not only inside the same catchment. Moreover, having results for 235 catchments in the second experiment would give a more robust test. Also, you could mix everything into a single dataset with 256 catchments. I was just wondering why you made this division.
Line 282-294: The differences between the models are quite small. For example, the difference shown in Figure 4 between the median NSE for the GR.U and the GRNN.U is 0.008, and between the GRD and the GRNN.D it is 0.014. Are the differences between the reported distributions statistically significant? I think this point should be further discussed, because the hybrid approach has higher flexibility than the process-based model. The embedded neural networks produce flux-correction parameters for each pixel and timestep, so if the differences between the hybrid and the stand-alone process-based model are small, it would be interesting to find out why. Maybe the physical dissipation of the basins makes so much detail unnecessary if one is just interested in the simulated discharge at a specific point. Or maybe the meteorological data is restricting further increases in quality.
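For illustration, one way to assess the significance of paired per-catchment NSE differences is a sign-flip permutation test (scipy.stats.wilcoxon would be a standard parametric-free alternative). The score arrays below are synthetic placeholders, not the paper's results:

```python
import numpy as np

# Hypothetical sketch: paired permutation test on per-catchment NSE scores.
# The two score arrays are illustrative placeholders, not the paper's results.
rng = np.random.default_rng(0)
nse_classic = rng.uniform(0.5, 0.9, size=235)                # stand-alone model
nse_hybrid = nse_classic + rng.normal(0.02, 0.05, size=235)  # hybrid, same catchments

diff = nse_hybrid - nse_classic
observed = diff.mean()

# Sign-flip permutation test: under H0 the pairing is exchangeable,
# so randomly flipping the sign of each paired difference is equally likely.
n_perm = 10_000
signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
perm_means = (signs * diff).mean(axis=1)
p_value = np.mean(np.abs(perm_means) >= abs(observed))
print(f"mean NSE difference = {observed:.4f}, permutation p = {p_value:.4f}")
```

A small p-value would indicate that the paired improvement, however modest in magnitude, is unlikely to arise by chance across the 235 catchments.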
As an additional question, do the flux correction parameters allow the model to artificially increase/decrease the amount of water (violate the mass-conservation principle) in the control volume?
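For context on what such a check could look like: a simple way to test mass conservation is to verify that storage changes close the water balance at each step. The sketch below uses invented variable names and synthetic, conservative-by-construction data; a model whose flux corrections create or destroy water would show a nonzero residual:

```python
import numpy as np

# Hypothetical water-balance closure check (variable names are invented).
rng = np.random.default_rng(1)
P = rng.uniform(0, 5, 100)   # precipitation per timestep
E = rng.uniform(0, 1, 100)   # evapotranspiration per timestep
Q = rng.uniform(0, 2, 100)   # runoff per timestep
# Storage built so that dS = P - E - Q exactly (a mass-conserving model).
S = np.concatenate([[10.0], 10.0 + np.cumsum(P - E - Q)])

# Residual of dS/dt = P - E - Q; nonzero means water is created or destroyed.
residual = np.diff(S) - (P - E - Q)
print(np.max(np.abs(residual)))  # ~0 (floating-point rounding only)
```

Applying such a closure diagnostic per pixel would directly answer whether the learned flux corrections violate conservation.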
Line 295-300: You indicate that you are evaluating the performance of the model on flash floods, and then you evaluate it on 2700 events during the 6-year validation period. Are these 2700 events flash floods or just regular floods? How did you classify them?
Line 327-333: In these lines (and Figure 7) you compare the NSE for 143 flood events, indicating that the hybrid models perform better. Even if this is true, all the models performed quite badly. For the GR.U and GRNN.U the median NSEs are -0.48 and 0.09, which is a clear indication that the models do not work at all; just taking the mean of the observed data would yield an NSE of 0. For the other two models, the NSE did improve, but was still quite low (0.19 and 0.37). You should expand the discussion here and try to understand why all the models are performing so badly.
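To make the NSE = 0 baseline concrete: the metric compares a simulation's squared errors against those of the mean-of-observations predictor, so a constant simulation equal to the observed mean scores exactly zero. A minimal sketch with made-up data (the function name and values are illustrative only):

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE(sim) / SSE(mean-of-obs baseline)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

obs = np.array([1.0, 3.0, 2.0, 5.0, 4.0])  # made-up discharge series
print(nse(np.full_like(obs, obs.mean()), obs))  # mean predictor -> 0.0
print(nse(obs, obs))                            # perfect simulation -> 1.0
```

Any negative NSE therefore means the model is worse than simply predicting the climatological mean of the observed series.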
Minor comments:
Line 61: Clarify “This study”.
Line 72: Replace “have to be advanced” with “should advance”.
Line 74: What do you mean by “earth critical zone”?
Line 136: The purple color of the parameters is almost red. I would suggest choosing another color scheme, more colorblind-friendly.
Line 183: What do you mean by neutralized atmospheric inputs?
Line 278-279: Regarding Figure 3 you indicate, “The results demonstrate the superior accuracy of hybrid methods compared to the classic models…”, but this is not clear from the figure, because one cannot see any details. There are certain peaks at which the hybrid is better, but you have 6 subplots, each with 5 years of hourly discharges, so one cannot really appreciate much. I imagine that if one looks at specific events, sometimes the hybrid is better, sometimes both are similar, and sometimes the process-based model is better. Maybe plot only a subset of the testing period, or specific events where the differences are significant. Then, with general metrics, you can make the point about which model tends to perform better.
Line 285-290: I would separate more clearly (in different paragraphs) the results reported in calibration and in validation. It is not usual practice to compare models using results from the calibration period, as any meaningful comparison should be made in validation. If you want to report the calibration results that is perfectly fine, but a clearer distinction should be made.
Line 290: The RMSE for GRNN.U, according to Figure 4, is 1.38, not 1.30. You should correct this in the text.
Figure 5. Is the Ebf metric (baseflow) a good/necessary indicator for performance during flood events?
Line 323-326: It is not clear what you want to say.
Line 352: You indicate that “Some spatial patterns in these corrections seem to emerge across France, and although analyzing trends in corrections as a function of physical explanatory factors may yield insights, it is beyond the scope of this study focusing on detailed quantitative analysis of those spatio-temporal corrections”. What are the spatial patterns shown in Figure 8? To me they are not so clear. Also, why is analysing the correction factors as a function of physical characteristics out of scope? I think this is one of the most interesting parts, and you should focus on it. If one of the advantages of hybrid models is that they provide physical interpretability, then one should interpret what the models are doing.
Line 361: You indicate that “a majority of exchange flux corrections fq,4 that share the same sign as fq,1.” Can you quantify this with a metric? Because it is not obvious from the figure. fq4 shows more red at the bottom, but I am not sure the majority of cases are in accordance.
Line 265: You indicate that “periodic behaviors are observed over time in all four heatmaps”. For fq4 I cannot distinguish clear periodic behaviors. It could be useful to plot the time series of some basins, perhaps in the appendix.
Line 385 and Figure 10b: You indicate that “Interestingly, these maps also reveal spatial variability in internal flux corrections.” It would be interesting to analyse why these patterns emerge.
Line 424: Rephrase “Also, one could also...”
Citation: https://doi.org/10.5194/egusphere-2024-3665-RC1
Viewed
- HTML: 319
- PDF: 48
- XML: 5
- Total: 372
- BibTeX: 3
- EndNote: 4