the Creative Commons Attribution 4.0 License.
Autocalibration of a physically-based hydrological model: does it produce physically realistic parameters?
Abstract. Hydrological models are essential tools when predicting water availability, floods, and droughts. Physically-based models are capable of representing sophisticated degrees of realism compared to conceptual or data-driven models as they explicitly solve equations based on well-established physical laws that are directly related to catchment processes. However, they can require extensive calibration, which can be computationally demanding. This study develops and applies an autocalibration method for SHETRAN, a physically-based model, to improve its performance across 698 catchments in the UK. This paper discusses the process of model calibration, the benefits and caveats of the approach, and the extent to which the physical realism of the parameters is preserved through the autocalibration.
Results show that the autocalibration process significantly improves SHETRAN’s performance, raising the median NSE value for the 698 catchments from 0.69 to 0.82. After calibration, 85 % of catchments achieve NSE values of ≥0.7, demonstrating a substantial enhancement in accuracy of simulations across a range of catchments with different climatic, hydrological, topographical, and geological characteristics. The greatest improvements were observed in groundwater-dominated catchments, where uncalibrated simulations struggled. Additionally, simulated transmissivity values align well with measured data, providing confidence in the model’s ability to produce parameters that mirror physical realism.
This study highlights the feasibility of applying physically-based models at a national scale when combined with effective autocalibration techniques. Autocalibrated-SHETRAN-UK performs comparably to conceptual and data-driven models, whilst offering improved transparency of hydrological processes. Future work will focus on integrating groundwater levels into the calibration process of SHETRAN and refining the model by introducing more spatial complexity in soil and aquifer representation within the model to better reflect real-world variability. These advancements will further enhance our capability to simulate hydrological responses under changing climatic and land-use conditions using SHETRAN.
Status: open (extended)
RC1: 'Comment on egusphere-2025-1824', Anonymous Referee #1, 15 Jul 2025
Title: Autocalibration of a physically-based hydrological model: does it produce physically realistic parameters?
The paper presents the results of a calibrated process-based model and compares them with other models. Moreover, the paper describes qualitatively the similarity of the hydrogeological parameters versus other sources of information. The improvement is substantial compared with the uncalibrated model, which sets the calibrated model at the level of conceptual and machine learning models, with the addition of having the interpretability of its parameters.
Two main concerns emerge from the paper. First, it is well known in the hydrology community that uncalibrated models can improve substantially once their parameters are calibrated. Therefore, the authors should put more effort into highlighting the difficulty of calibrating such models. In this context, a better description of the calibration process would benefit the broader community that needs to calibrate these models. However, when the methodology finally reaches the autocalibration, the authors relegate the details to the appendix, which undermines the importance of presenting a methodology for calibrating such models. From my point of view, this is a key point that is not presented adequately.
The second main concern is about the “realism” of the parameters. The authors spent many sections of the paper trying to prove that, but they did so qualitatively. Proving this strong argument needs more than comparing maps. The authors can analyze the results with scatter plots, correlations, the percentage of catchments with consistent parameters, parameter maps developed with kriging (or cokriging), etc., just to mention more robust analyses. Without that analysis, it is impossible to answer the title question. Moreover, the authors use the word “real” many times to refer to the comparison with other maps. However, they fail to recognize that such maps are themselves the result of some model. Therefore, they cannot be considered “real” data. If the authors want to compare with real data, they should compare with the parameters extracted from wells, that is, point observations. Any other spatial distribution of such parameters is just a model (synthetic data). Another issue that the authors did not mention is that the groundwater parameters are probably the parameters with the most uncertainty. The authors should incorporate an analysis of other parameters that are probably more easily constrained by observations or by remote sensing products.
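As a minimal sketch of the kind of quantitative check meant here (it is not the authors' method, and nothing in it comes from the paper), the Python snippet below computes a Spearman rank correlation between calibrated and point-observed transmissivity, and the fraction of catchments whose calibrated value lies within a chosen factor of the observation. The function names, the factor of 10, and the five example values are all hypothetical placeholders.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def fraction_within_factor(calibrated, observed, factor=10.0):
    """Share of pairs whose ratio lies within [1/factor, factor]."""
    limit = math.log10(factor)
    hits = sum(
        1 for c, o in zip(calibrated, observed)
        if abs(math.log10(c / o)) <= limit
    )
    return hits / len(calibrated)


# Hypothetical transmissivity values (m2/day), one pair per catchment.
calibrated = [12.0, 150.0, 900.0, 40.0, 2500.0]
observed = [20.0, 100.0, 1100.0, 600.0, 2000.0]

rho = spearman(calibrated, observed)
share = fraction_within_factor(calibrated, observed, factor=10.0)
print(f"Spearman rho = {rho:.2f}, within one order of magnitude = {share:.0%}")
```

A rank correlation is used in this sketch because transmissivity typically spans several orders of magnitude; a scatter plot on log axes of the same pairs would show the same information visually.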
In summary, I consider that the paper has results that are valuable for the hydrology community, but major changes must be made to highlight the points that are important for such a community, moving away from just presenting the result of a model.
Minor comments:
Line 11-12. That is a very strong argument that will create a lot of controversy. I like process-based models because they can represent a very complicated world with their simplified equations. However, they do not represent the truth because the complexity of a catchment is many orders of magnitude more than the representation of a process-based model. Moreover, in general, given that machine learning models can generate better results than process-based models under the same data, it shows us that there is more to improve in such models. Therefore, I don’t think the authors need to enter into this controversy.
Line 27-29. I do not think talking about future work in the abstract is a good idea because authors should summarize their findings, not what they did not do.
Line 50-52. This statement could just as easily be made about a process-based model. Generating the equations used in the model required a lot of data and years of experiments; therefore, saying that data-based models require more data is not fair.
Line 56. “well-established physical laws”. I am pretty sure that the only physical laws implemented in the model are mass and energy conservation. Moreover, these physical laws are probably not satisfied at the resolution of the model. Any other equations used by the model are just simplifications of the truth.
Line 59. The degree of uncertainty of such parameters is huge when they are used in catchment-scale models (or at 1km2 resolution). This statement does not have support.
Line 125. The idea of “realism” is oversold. It is well known that parameters do not necessarily represent something real in the world, especially if we work at 1km2 resolution.
Line 143. It would be beneficial if more details about the variability were added. The variability in the CAMELS dataset of GB, US, and CL is very different.
Line 181. Does it mean you treated the catchment as lumped? Is the meteorological forcing lumped too?
Line 205. The first word in the title is about the autocalibration, so the reader will think that something novel is presented about that in the paper. However, the authors did not talk about that and sent it to the appendix. This method should be highlighted more.
Line 281. The paper must be general enough for a worldwide audience; therefore, referencing places must be avoided.
Line 282-286. A section cannot be referenced before it is introduced in the text.
Line 295. Where are the PBIAS results presented in the paper?
Line 297. Add reference to the BGS hydrogeological aquifer map.
Figure 4. This information is better represented by a scatter plot.
Line 319-322. This statement is not supported by the results, or at least by the presented figures. Add some reference, or you must clarify that this is just a hypothesis.
Figure 5. The same color must be used for positive (reds) and negative (blues) changes. Blue cannot be used for positive changes.
Line 335. Why are the CAMELS attributes not used?
Line 344. There are a few catchments with NSE lower than 0.6; therefore, there is not enough information to support this statement.
Line 358. This is good, but a figure is not needed to show that no relationship was found. Send the figure to the appendix.
Line 360. I think this analysis would be more relevant if the best ML and lumped model were included.
Line 378-380. Is this statement checked by changes in the model (sensitivity analysis), or is it just a statement assumed, given the simplifications of the calibration?
Line 317. Why simpler? The calibration method used is equally simple to many of the models presented.
Figure 8. This figure does not have a fair comparison between models because each study has different catchments (probably different training periods). However, only one autocalibrated result is presented. The result of the autocalibrated model for each study must be added.
Line 442. Why were these parameters selected? They are probably the most uncertain of all of them.
Line 443. The main agreement is in the southern areas. The rest of the catchments are not necessarily in agreement. Try to quantify the number of catchments in agreement or disagreement.
Figure 10. Color over color is not the best way to present the information. Try to incorporate a hatch for the catchments.
Figure 12. A comparison only with maps is not enough to show the similarities between the parameters and the external source. A scatter plot would be a better option. Try to incorporate the NSE as color to have another dimension for the differences.
Line 529. The term autocalibrated was used without describing it. What makes the model autocalibrated or just calibrated?
Line 536-537. I disagree. One analysis of performance and characteristics was presented, which was not conclusive. Later, only groundwater parameter analysis is presented without applying a direct comparison with performance.
Line 544-544. Where did you present these results?
Line 546-548. The authors are leaking information about some results they did not present in the paper. It is nice that more results would be available, so the authors know the reason, but the discussion and conclusion must be associated only with the results presented in the paper.
Line 566. “measured data”. That is not true. The authors compared with external sources that used observations to create a spatial distribution of the parameters. However, this is far from observed data. The authors can calculate the density of observations used in such maps, and it will probably be less than one per catchment. Therefore, the map is just an interpolation with high uncertainty that cannot be considered “measured data”.
Line 569. Several remote sensing products could be used to analyze Ae/Pe in the model, but they were not used in the analysis.
Line 578-579. The statement about the sub-daily scale is true; however, GR4J and LSTM models were able to predict better than SHETRAN using lumped data. Therefore, the sub-daily scale will not solve the problems in the architecture that SHETRAN could have.
Line 586. “approximate measured values”, “observed data”. This is not true. See comment in line 566.
Line 587-588. The authors did not test for extended future scenarios or climate change; therefore, they cannot be confident about that.
Line 590-593. This is information that is not relevant to this paper. The authors should focus only on the results presented.
Line 605. “real world”. The authors are overselling their result. They just compared the result visually with maps generated by other sources. This is not enough to prove consistency. For example, they did not check for spatial consistency between catchments. How can the “real world” be mentioned if the parameters change drastically between adjacent catchments?
Citation: https://doi.org/10.5194/egusphere-2025-1824-RC1