the Creative Commons Attribution 4.0 License.
Comparing multi-model mosaic and multi-model combination methods to simulate streamflow across the contiguous USA
Abstract. The ability to accurately predict streamflow underpins decisions in water management, flood prevention, and sectoral planning. Traditional approaches for streamflow prediction often rely on a single model, thereby overlooking potential benefits from using multiple models. To address this limitation, this study explores alternative methods that select and combine multiple models to enhance streamflow simulations. Specifically, we assess the performance of multi-model mosaic methods that assign a single model to each catchment, and multi-model combination methods that merge multiple models using static or dynamic weighting schemes. The Framework for Understanding Structural Errors (FUSE) is used to create an ensemble of 78 hydrological models, which are applied to 559 catchments from the CAMELS dataset across the contiguous United States. Each of the 78 models is calibrated using a composite objective function, calculated as the average of a high-flow and a low-flow performance metric, to cover a wide range of streamflow conditions. The results show that a carefully chosen single model from a larger ensemble can closely approach the performance of more complex multi-model strategies. Among the multi-model approaches, the combination and mosaic methods show broadly similar overall skill, although the combination approaches deliver slightly higher performance and lower sampling uncertainty. However, per-catchment differences persist, indicating that no single multi-model strategy dominates everywhere. This heterogeneity in performance makes it difficult to determine a priori which multi-model method will best represent streamflow in a given catchment.
Status: open (until 11 Mar 2026)
- RC1: 'Comment on egusphere-2025-6083', Anonymous Referee #1, 20 Feb 2026
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 223 | 124 | 12 | 359 | 11 | 13 |
General comments
This paper demonstrates and compares approaches to combining multiple hydrological models (i.e. multi-model mosaics vs multi-model combinations), answering the question of which multi-model approach performs best over a large sample of catchments in the US. First, I'd like to say that I really enjoyed reading this paper. It covers an important topic, how to improve streamflow simulations through multi-model approaches, in a novel way. I also appreciated the discussion of sampling uncertainty, which is often overlooked in modelling studies. The figures were excellent, well presented and very clear, and the paper was well written. I recommend this manuscript for publication after minor corrections and clarification of the methods. Further comments and suggestions are outlined below.
Specific comments
(1) Section 2.5.2.2 left me with several questions: what are the benefits of minimising the number of models, how exactly does the method reduce the number of models required, and how many model structures remained? On further reading I found that more details are given in Appendix A; it would be helpful to refer to this in the main text.
(2) Section 2.5.3.1: Could you clarify how the models were combined? For "using a simple average of up to four models", did you take an equally weighted mean of the discharge values from all four models at each timestep?
(3) Section 2.5.3.2: The method selects "the combination of up to three models that yields the highest KGEcomp scores over the calibration period". I'd be curious to know if there are any cases where a single model is better than any combination of two or three models. In that case, would you use the single model as 'the best combination', or does this method require a minimum of two models? Again, this section could refer to Appendix A.
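
The questions in comments (2) and (3) can be made concrete with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the "simple average" is taken to be an equal-weighted mean of simulated discharge at each timestep, the standard Kling-Gupta efficiency stands in for the manuscript's KGEcomp, and the function names (`kge`, `simple_average`, `best_combination`), the `min_size` parameter and the discharge values are all hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def kge(sim, obs):
    """Standard Kling-Gupta efficiency; a stand-in for KGEcomp (assumption)."""
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    cov = mean((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs))
    r = cov / (sd_s * sd_o)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1.0 - sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def simple_average(series):
    """Equal-weighted mean of several simulated series at each timestep."""
    return [mean(values) for values in zip(*series)]


def best_combination(sims, obs, max_size=3, min_size=1):
    """Score every subset of min_size..max_size models and keep the best one."""
    best_names, best_score = None, float("-inf")
    for k in range(min_size, max_size + 1):
        for names in combinations(sorted(sims), k):
            score = kge(simple_average([sims[n] for n in names]), obs)
            if score > best_score:
                best_names, best_score = names, score
    return best_names, best_score


# Made-up discharge values for illustration only (not from the manuscript).
obs = [1.0, 3.0, 2.0, 5.0, 4.0]
sims = {
    "model_a": [1.0, 3.0, 2.0, 5.0, 4.0],  # happens to match obs exactly
    "model_b": [2.0, 2.0, 2.0, 6.0, 3.0],
    "model_c": [0.5, 4.0, 1.0, 4.0, 6.0],
}

# Allowing subsets of size one lets a single model win outright.
names, score = best_combination(sims, obs)

# Forcing at least two models can only match or lower the best score.
names_min2, score_min2 = best_combination(sims, obs, min_size=2)
```

In this toy case the search with `min_size=1` returns the single model, while `min_size=2` is forced to accept a lower-scoring pair or triple. This is the distinction comment (3) asks the authors to state explicitly.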