the Creative Commons Attribution 4.0 License.
Calibrating a large-domain land/hydrology process model in the age of AI: the SUMMA CAMELS experiments
Abstract. Process-based hydrological modeling is a long-standing strategy for simulating and predicting complex water processes over large, hydro-climatically diverse domains, yet model parameter estimation (calibration) remains a persistent challenge for large-scale applications. New techniques and concepts arising in the artificial intelligence (AI) context for hydrology point to new opportunities to tackle this problem in process-based models. This study presents a machine learning (ML) based calibration strategy for large-domain modeling, implemented using the Structure for Unifying Multiple Modeling Alternatives (SUMMA) land/hydrology model coupled with the mizuRoute channel routing model. We explore various ML methods to develop and evaluate a model emulation and parameter estimation scheme, applied here to optimizing SUMMA parameters for streamflow simulation. Leveraging a large-sample catchment dataset, the large-sample emulator (LSE) approach integrates static catchment attributes, model parameters, and performance metrics, providing a basis for large-domain regionalization to unseen watersheds. The LSE approach is compared with a single-site emulator (SSE), demonstrating improved calibration outcomes across temporal and spatial cross-validation experiments. The joint training of the LSE framework yields comparable performance to traditional individual basin calibration while enabling potential for parameter regionalization to out-of-sample, unseen catchments. Motivated by the need to optimize complex hydrology models over continental-scale domains to support national water security applications, this work introduces a scalable strategy for the calibration of large-domain process-based hydrological models.
Status: open (until 21 Apr 2025)
RC1: 'Comment on egusphere-2025-38', Anonymous Referee #1, 18 Mar 2025
Dear Editor,
In the manuscript 'Calibrating a large-domain land/hydrology process model in the age of AI: the SUMMA CAMELS experiments', the authors present a novel hydrological model calibration method. Using a machine learning model, the authors map directly from the calibration parameters (and, for the generalized calibration experiment, catchment attributes) to model performance. Subsequently, increasingly better calibration parameters are selected iteratively by using a genetic algorithm in tandem with the machine learning model, updating the machine learning model as new results come in. The manuscript is well written, thorough, and relevant, although some parts of the manuscript remain vague and could be improved. Therefore, I would recommend minor revisions for this manuscript. Below is a more expansive description of my main arguments, as well as a list of line-by-line comments.
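The emulator-plus-genetic-algorithm loop summarized above can be sketched in a toy form. This is a minimal illustration only, not the authors' implementation: `run_model` stands in for an expensive SUMMA/mizuRoute run, `emulate` is a deliberately crude nearest-neighbor surrogate standing in for the paper's ML emulator, and the single scalar parameter, population sizes, and mutation scale are all arbitrary assumptions chosen for brevity.

```python
import random

random.seed(0)

# Toy "process model": an expensive black box mapping a parameter in
# [0, 1] to a performance score (higher is better); the true optimum
# sits at 0.3. Stands in for a full hydrological model run.
def run_model(param):
    return -(param - 0.3) ** 2

# Crude surrogate: predict performance from the nearest previously
# evaluated sample (a stand-in for the study's ML emulator).
def emulate(param, archive):
    nearest = min(archive, key=lambda sample: abs(sample[0] - param))
    return nearest[1]

# Iteration 0: evaluate an initial random sample with the real model.
archive = [(p, run_model(p)) for p in (random.random() for _ in range(20))]

for iteration in range(5):
    # Genetic-algorithm step: mutate the current best candidates and
    # rank the offspring by *emulated* (cheap) performance.
    parents = sorted(archive, key=lambda s: s[1], reverse=True)[:5]
    offspring = [min(max(p + random.gauss(0.0, 0.1), 0.0), 1.0)
                 for p, _ in parents for _ in range(10)]
    offspring.sort(key=lambda c: emulate(c, archive), reverse=True)

    # Run the real model only on the most promising candidates, then
    # fold the new results back into the emulator's training archive.
    archive += [(c, run_model(c)) for c in offspring[:5]]

best_param, best_score = max(archive, key=lambda s: s[1])
```

The key trade-off the sketch makes visible is that the emulator replaces most candidate evaluations, but each iteration still requires a batch of real model runs to keep the surrogate honest.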
Vague manuscript sections
Although the manuscript is well written, the novel calibration approach introduced in the manuscript remains unclear and in the background throughout the manuscript (except for the methods). The title, abstract and introduction could be improved by clearly stating what this study has done, instead of which model or which dataset was used. The same holds true for the discussion and conclusions, where more focus should be on the specific contribution of this study’s calibration approach, instead of generalization statements already made by various other studies. See the below line-by-line comments for more details.
Specific comments
Title: The title is not very descriptive and does not capture the study well. There are many different studies that calibrate large-domain hydrological models using AI. I would suggest revising the title.
Line 12: “a machine learning (ML) based calibration strategy”: What are the novel aspects of this strategy? This provides little information about the study.
Lines 15-18: “the large-sample emulator (LSE) approach” / “a single-site emulator (SSE)”: these terms are very unclear, as they have not been properly introduced.
Line 64: “physics-based PB”: double
Line 71: “model emulation”: This study does not actually emulate the model, but the model performance. This distinct difference should be made clearer, especially as this is contrary to most of the other studies discussed in the introduction.
Lines 98-103: But how is the model actually configured? Are these gridded simulations (which seems to be the case based on lines 112-120)?
Lines 122-123: “expert judgment and review of model parameterizations (i.e. process algorithms)”. This sentence is unclear. What does expert judgement and review entail? In addition, are the model parameters the input parameters to the model or the processes included in the model? If the latter, maybe it is better to find a different term than “parameters”, maybe “configuration”?
Section 2.3: This section could benefit from some restructuring (see comments below).
Lines 148-154: Up until this paragraph, the study’s subbasin calibration approach (i.e. each subbasin is seen as a single calibration element; not, for example, each grid cell) was unclear to me. This approach could be better introduced in the introduction.
Lines 164-169: Up until this paragraph, the study’s iterative calibration approach was unclear to me. This approach could be better introduced in the introduction. In addition, this paragraph is better explained in section 2.3.2 (lines 189-199). Perhaps these sections could be restructured as there is a large overlap between the SSE and LSE experiments?
Lines 164-175: This iterative approach is very similar to traditional calibration approaches except for the speedup offered by the model performance emulator. Moreover, significant numbers of process-based model simulations are still needed, even when considering the generalization opportunities. This trade-off could be better discussed in the discussion.
Line 199: This could be a new section, which allows for more detailed description of hyperparameters and cross-validation.
Lines 420-455: These paragraphs do not discuss the novel aspects and strengths of this study. They do not have to, but they take up a relatively large portion of the discussion.
Line 423: “differentiablee”: differentiable
Discussion section: Personally, I would love to see more discussion on the trade-offs between different ML/DL based calibration approaches, and the place of this study’s calibration approach among them. In addition, I would like to know if and how this study’s calibration approach could be used for other (gridded) hydrological models, and to what extent fewer model simulations (e.g. only iteration 0) could be used to generate the same results.
Code and data availability: The code and associated datasets should be made public and cited.
Citation: https://doi.org/10.5194/egusphere-2025-38-RC1