the Creative Commons Attribution 4.0 License.
Comparative Evaluation of ERA5-Land and ISIMIP3 Runoff Forcing for Global River Streamflow Simulation
Abstract. Flooding is among the most widespread natural hazards worldwide, yet many high-risk regions lack the observational data needed for effective flood planning. In these data-sparse regions, global flood models remain essential tools for estimating flood hazard, although their performance is strongly influenced by the choice of runoff forcing data. Two widely used global runoff products are the reanalysis-based ERA5-Land dataset and the ISIMIP3a multi-model hydrological ensemble. Their selection involves an inherent trade-off between high-resolution reanalysis runoff and runoff simulated by hydrological models driven by bias-corrected meteorological inputs, the latter also providing an explicit representation of uncertainty through ensemble spread. This study presents a comparative evaluation of these two products by routing both through a consistent global hydrodynamic framework (CaMa-Flood). Model performance was assessed across IPCC SREX regions against observations from 5,071 gauging stations using the Kling-Gupta Efficiency and its components, while long-term trends in low, mean, and high streamflow were evaluated from a subset of 3,135 stations with sufficient temporal coverage. Simulations forced by ERA5-Land show superior skill in reproducing observed daily streamflow, with consistently higher correlation and stronger agreement in the spatial pattern of regional streamflow trends. However, systematic biases in streamflow magnitude and a tendency to exaggerate drying trends, particularly for low streamflow, are also evident. In contrast, the ISIMIP3a ensemble shows lower skill in reproducing observed daily streamflow metrics but provides more conservative and observation-consistent estimates of long-term trends. Ensemble averaging further improves robustness, with simulated trend ranges more frequently overlapping observational uncertainty bounds, albeit at the expense of dampened variability and extremes. 
Differences between native and spatially aggregated ERA5-Land runoff were negligible within the present modelling framework. Overall, the results demonstrate that no single runoff product is universally optimum: ERA5-Land is well suited for reproducing historical streamflow dynamics, whereas ISIMIP3a is particularly valuable for robust assessments of long-term hydrological change and uncertainty.
Status: open (until 29 Jul 2026)
- RC1: 'Comment on egusphere-2026-2739', Anonymous Referee #1, 10 Jun 2026
- RC2: 'Comment on egusphere-2026-2739', Anonymous Referee #2, 15 Jun 2026
The study simulates global river streamflow using the ERA5-Land runoff dataset and the ISIMIP3a hydrological runoff ensemble within a consistent CaMa-Flood modelling framework, and evaluates the resulting simulations against GRDC observational data. The topic is timely and potentially valuable, given the widespread use of both runoff products in global hydrological and flood-risk applications. However, the manuscript does not clearly establish the rationale for comparing ERA5-Land and ISIMIP3a, nor does it sufficiently explain the mechanisms behind their contrasting performance. The analysis mainly reports that ERA5-Land performs better for daily streamflow, while ISIMIP3a gives more conservative trend estimates, but the roles of forcing construction, bias correction, model structure, ensemble design, routing effects, and spatial resolution are not adequately disentangled. Therefore, I consider the manuscript unsuitable for publication in its current form and recommend rejection. My comments are as follows:
- The scientific question is not sufficiently developed. The manuscript shows that ERA5-Land performs better for daily streamflow KGE, while ISIMIP3a gives more conservative long-term streamflow trends within the CaMa-Flood framework. However, the paper mainly reports these contrasts rather than explaining why they occur. More diagnostic analysis is needed to identify whether the differences arise from forcing construction, precipitation bias correction, model structure, ensemble averaging, human impacts, calibration, or routing interactions. Additional experiments or targeted diagnostic analyses should be conducted to explain the mechanisms underlying the differences between the two runoff products in streamflow simulations, thereby strengthening the scientific significance of the study.
- The rationale for comparing ERA5-Land and ISIMIP3a is not clearly established. Lines 65–75 first discuss streamflow-corrected runoff products and ISIMIP-type runoff generated from bias-corrected climate inputs, but ERA5-Land is then introduced rather abruptly as a reanalysis-based dataset, mainly with emphasis on its high resolution and near-real-time availability. In essence, both ERA5-Land and ISIMIP3a generate runoff from meteorological forcing through land or hydrological models, but their forcing construction and modelling chains differ substantially. ERA5-Land runoff is produced by a land-surface model driven by ERA5 atmospheric reanalysis forcing, whereas ISIMIP3a runoff is produced by multiple hydrological models driven by bias-adjusted meteorological forcing. The authors should therefore more clearly justify why these two products are selected and explain their fundamental differences in terms of data assimilation versus bias correction, land-surface versus hydrological model structures, single-product versus ensemble design, and their expected implications for streamflow simulation. Without this clarification, the study reads more like a comparison of two available datasets than a well-justified scientific evaluation.
- The conclusions may be too dependent on a single routing model. All main results are derived from CaMa-Flood simulations, so it is difficult to separate runoff-forcing effects from routing-model effects. Streamflow timing, variability, high-flow behaviour, and trend propagation may depend on the specific routing scheme, river network, dam treatment, floodplain storage, and parameter settings. The authors should either test whether the conclusions are robust using other routing models or clearly limit their claims to the CaMa-Flood framework.
- The conclusion that spatial resolution has limited influence is too broad. The manuscript only compares the native ERA5-Land runoff resolution with one aggregated resolution, and this single sensitivity test is insufficient to support a general statement that resolution has little impact on streamflow simulations. The result should be interpreted more cautiously as showing limited sensitivity within the present CaMa-Flood configuration and selected metrics. A more robust assessment would require additional resolution levels and, ideally, tests across different hydrological settings or basin sizes.
- The Introduction devotes considerable space to the general impacts of flood hazards, but gives relatively limited attention to the uncertainty of runoff forcing products, which is the central issue of this study. The authors should shorten the broad background on flood disasters and refocus the Introduction on how different runoff datasets affect global streamflow and flood simulations. A more comprehensive review of previous studies on forcing-data uncertainty, runoff-product differences, and their propagation into flood modelling would help better motivate the comparison between ERA5-Land and ISIMIP3a.
- Sections 2.1.2 (“Global flood simulation”) and 2.2 (“Evaluation of simulations”) contain duplicated descriptions of the CaMa-Flood model. The repeated text should be removed, and the Methods section should be reorganized to avoid redundancy and improve readability.
- In Section 3.1.1, the use of regional median KGE alone may not adequately represent model performance within each SREX region. Although the median is a useful robust summary statistic, it can mask substantial station-level variability, especially in large and hydroclimatically heterogeneous regions. The authors should provide additional information, such as interquartile ranges, boxplots, station-level distributions, or the fraction of stations with positive KGE values, to better support the regional performance assessment.
- In Section 3.2.1, the statement that the leave-one-out sensitivity tests indicate “moderate dependence on individual regions with pronounced hydroclimatic signals” is not clearly supported by Table S2. The table only reports the range of leave-one-out cross-regional correlation coefficients, but does not identify which excluded regions drive the minimum or maximum correlations, nor does it show that these regions correspond to pronounced hydroclimatic signals. The authors should either provide the region-specific leave-one-out results and explain which regions control the sensitivity, or soften this interpretation.
- Figure 4 presents trend comparisons using regional averages, but this aggregation may mask important station-level variability. The authors should consider adding a station-level comparison using all available gauging stations, for example as a supplementary scatter or density plot. This would allow readers to better assess whether the reported trend relationships are consistently supported across individual stations rather than mainly reflecting regional aggregated values.
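The regional summaries requested above (spread and fraction of stations with positive KGE, alongside the median) could be computed roughly as follows. This is a minimal sketch, not the manuscript's method: it assumes the Gupta et al. (2009) form of KGE, which may differ from the variant the authors used, and the function names `kge_components` and `summarize_region` are hypothetical.

```python
import numpy as np


def kge_components(sim, obs):
    """Return (kge, r, alpha, beta) for one station.

    Assumes the Gupta et al. (2009) formulation:
    r = linear correlation, alpha = std(sim)/std(obs), beta = mean(sim)/mean(obs).
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    kge = 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
    return kge, r, alpha, beta


def summarize_region(kge_values):
    """Summarize station-level KGE values within one region."""
    k = np.asarray(kge_values, dtype=float)
    q25, q50, q75 = np.percentile(k, [25, 50, 75])
    return {
        "median": q50,
        "iqr": q75 - q25,                       # spread hidden by the median alone
        "frac_positive": float(np.mean(k > 0)),  # share of stations with KGE > 0
    }
```

Two regions with the same median KGE can return very different `iqr` and `frac_positive` values, which is the station-level variability that a median-only summary would mask.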
Citation: https://doi.org/10.5194/egusphere-2026-2739-RC2
This manuscript compares streamflow simulations forced by ERA5-Land runoff and ISIMIP3a runoff products within a consistent CaMa-Flood routing framework, using GRDC observations for global validation. The topic is timely and potentially important. However, the manuscript in its current form requires very substantial revision. The study design does not cleanly isolate the causes of the differences between ERA5-Land and ISIMIP3a, the statistical treatment of trends and uncertainty is underdeveloped, several conclusions are overstated, and key methodological details are missing. There are also serious presentation problems, including duplicated text in the Methods section. I would not recommend publication in the present form.
1. The manuscript presents a comparison between ERA5-Land and ISIMIP3a runoff to address an important research gap, but the novelty is not yet convincingly established. Many previous studies have already evaluated global runoff products, streamflow simulations, hydrological model ensembles, reanalysis products, and routing uncertainty. The authors need to explain more explicitly what is new in this study and how it advances beyond existing benchmark comparisons.
2. Currently, the paper mostly comes across as a benchmark comparison. It would be helpful to clearly explain the main goal of the regional performance assessment. While the authors share regional KGE maps, they do not specify the scientific question this analysis aims to address. The authors should clarify whether this part is intended to evaluate the spatial consistency of ERA5-Land's benefits, to identify regions where both runoff products may not perform well, or to explore regional relationships among the ISIMIP3a models.
3. The experiment does not isolate the effect of forcing dataset, hydrological model structure, bias correction, calibration, human impacts, or spatial resolution. Yet the Discussion, especially Section 4.2, frequently suggests causal explanations. The authors should avoid causal overinterpretation unless additional experiments are added. At minimum, they should clearly state that the comparison is between two runoff-product configurations rather than a controlled attribution of individual uncertainty sources.
4. The authors should explain why the SREX regional scale is appropriate for evaluating model performance. SREX regions are broad climate-impact regions rather than hydrological units, and aggregation at this scale may mask substantial basin-level variability. The authors should justify this choice and discuss whether using major river basins, hydroclimatic zones, or aridity-based regions would affect the conclusions.
5. ERA5-Land runoff does not include the same human impact representation as ISIMIP3a. Therefore, it is not clear whether the two products are fully comparable in regions strongly affected by reservoirs, water abstraction, irrigation, or other human interventions. The authors should explain how human impacts are represented in each product and whether this difference may bias the comparison.
6. Related to the previous point, the manuscript states that the CaMa-Flood dam module is enabled. However, ISIMIP3a simulations may already include some human influences depending on the hydrological model and experimental setup. The authors should clarify whether applying the CaMa-Flood dam module to ISIMIP3a runoff could create inconsistencies or double-count some forms of regulation. A sensitivity experiment without the dam module would strengthen the study.
7. The Discussion needs significant rewriting. At present it repeats the Results and includes some speculative interpretations. It would be helpful for the authors to concentrate the Discussion on the main scientific question: why does ERA5-Land seem to do better for daily streamflow changes, while ISIMIP3a appears to be more conservative or more consistent when looking at long-term trends?
8. The CaMa-Flood description is duplicated. The text describing CaMa-Flood appears in Section 2.1.2 and then again at the start of Section 2.2.
9. The Introduction spends too much space on global flood exposure, climate pledges, and future flood risk. This material should be shortened and the Introduction refocused on runoff forcing uncertainty, streamflow simulation, and trend evaluation.
10. The authors have done an interesting job estimating Theil-Sen (TS) slopes for annual Q10, mean flow, and Q90, and then comparing regional medians. To make the trend analysis more robust, they might consider incorporating Mann-Kendall (MK) tests at the station level. Discussing field significance or multiple testing, and checking whether the modeled and observed regional trends differ significantly, could also add valuable insights. The suggestion that ISIMIP3a offers more observation-aligned trend estimates is promising, and with more statistical backing it would be more convincing.
11. KGE is useful, but KGE alone cannot fully characterize hydrological performance. Possible additions include NSE, logNSE, flow duration curve bias, or others. This is very important because the manuscript is motivated by flood modeling, but the current evaluation does not directly assess flood peaks or floodplain inundation.
12. The manuscript presents trends as percent changes per decade relative to the long-term mean for each flow metric. This can produce very large or unstable values when mean low flow is small, especially in arid and semi-arid regions.
13. The conclusion that spatial resolution has limited influence is too broad. The experiment only shows that, within the present CaMa-Flood configuration and for the selected streamflow metrics, upscaling ERA5-Land runoff from 0.1° to 0.5° has little effect.
14. The regional median KGE values don’t fully capture how well the model performs in each region. Keep in mind that within each SREX region, the performance at individual stations can differ quite a bit, highlighting the importance of looking at more detailed data.
15. The interpretation of the uncertainty overlap in Table 1 might benefit from a little bit of clarification. Keep in mind that a model with broader uncertainty intervals tends to overlap with observations more frequently, but this doesn’t automatically mean it performs better.
16. The station filtering and allocation procedure needs more detail.
17. The manuscript should better connect the daily streamflow performance analysis with the long-term trend analysis. For example, do regions with high KGE also show better trend agreement? Do regions with low KGE show larger trend errors? The current manuscript treats these two analyses mostly separately, but their relationship is central to the paper’s interpretation.
18. The manuscript should be more cautious when stating that ERA5-Land is well suited for reproducing historical streamflow dynamics and that ISIMIP3a is especially valuable for long-term trend assessment. These are reasonable hypotheses, but the current evidence is not yet strong enough to endorse such broad recommendations.
19. The authors should also address the non-independence between ERA5-Land and ISIMIP3a. The W5E5 forcing used for ISIMIP3a partly uses ERA5 data and applies bias adjustments based on observational products, and ERA5-Land is likewise driven by ERA5 atmospheric forcing. This connection needs to be acknowledged.
20. Additionally, the paper should avoid suggesting that the results directly apply to flood hazard modeling unless flood-specific metrics are incorporated. Currently, the analysis focuses only on streamflow skill and trends.
21. Do the CaMa-Flood simulations use the same river width and depth for ERA5-Land and ISIMIP3a, or are these calculated separately for each forcing?
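To make comments 10 and 12 concrete, the station-level trend statistics could be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the Mann-Kendall test uses the normal approximation without tie or autocorrelation correction, and the function names `theil_sen_slope`, `mann_kendall_p`, and `percent_per_decade` are hypothetical.

```python
import math

import numpy as np


def theil_sen_slope(years, values):
    """Median of all pairwise slopes (Theil-Sen estimator), in units per year."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(years), k=1)  # all pairs with i < j
    return float(np.median((values[j] - values[i]) / (years[j] - years[i])))


def mann_kendall_p(values):
    """Two-sided Mann-Kendall p-value (normal approximation, no tie correction)."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    i, j = np.triu_indices(n, k=1)
    s = np.sign(x[j] - x[i]).sum()
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)   # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return math.erfc(abs(z) / math.sqrt(2.0))


def percent_per_decade(slope_per_year, values):
    """Express a per-year slope as percent change per decade relative to the mean."""
    return 100.0 * 10.0 * slope_per_year / float(np.mean(values))
```

Because `percent_per_decade` divides by the long-term mean, the same absolute slope gives a far larger percentage at a station whose mean low flow is close to zero, which is the instability raised in comment 12. Reporting absolute slopes alongside the percentages, or masking stations below a minimum mean flow, would be one way to check its influence.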