Using satellite observations to validate and improve reservoir storage simulations in global hydrological models
Abstract. Global hydrological models (GHMs) increasingly incorporate generic reservoir operation schemes (GROS) to simulate the regulation of rivers by dams. However, the reliability of GROS remains largely unvalidated on a global scale due to the historical scarcity of open in situ data. Here, we leverage the Global Reservoir Storage (GRS) satellite dataset to conduct the first comprehensive quantitative evaluation of global reservoir storage simulations from five GHMs: H08, WaterGAP2-2e (WGP), MIROC-INTEG-LAND (MIL), CWatM (CWT) and LPJmL5-7-10-fire (LPJ). H08, WGP, MIL and LPJ adopted the process-based Hanasaki et al. (2006) reservoir operation scheme (H06), while CWT adopted the piecewise-function rule curve approach of Burek et al. (2013, 2020) (LIS). We address two primary questions: (1) how accurately do state-of-the-art GHMs reproduce global reservoir storage dynamics? and (2) are model deficiencies attributable to parametric rigidity (i.e., the adoption of globally uniform parameters) in GROS? We evaluated monthly reservoir storage series at 424 major dams (capacity ≥ 0.5 km³) over the historical period 1999–2018. Performance was quantified using the Kling-Gupta Efficiency (KGE). Two post-hoc bias correction methods, linear scaling and variance matching, were applied to the raw monthly storage simulations to evaluate whether simple, targeted statistical transformations could recover model skill. To address parametric rigidity, we conducted sensitivity analyses using the H06 scheme in H08, varying two parameters, the target storage level (TSL) and the degree of regulation threshold (DORT), and using LIS, varying the normal storage limit (LN). Our evaluation reveals that current GROS yield generally unsatisfactory performance, characterised by two distinct features. The first concerns seasonal amplitude in storage. MIL initially achieves the highest skill, with 52.36 % of dams having a KGE > -0.41.
However, KGE decomposition reveals that this skill stems largely from dampened intra-annual variability rather than from high correlation or low bias. In contrast, the other GHMs often exhibit excessive seasonal drawdown, systematically overestimating storage amplitude. The second feature pertains to temporal dynamics in storage: within the group exhibiting exaggerated seasonal drawdown, the H06-based models (H08, WGP and LPJ) significantly outperform the LIS-based CWT in temporal correlation. We demonstrate that applying variance-matching bias correction to all GHMs has two effects: first, the performance of all GHMs becomes generally satisfactory (median KGE > -0.41); second, the GHMs with exaggerated seasonal drawdown outperform MIL in terms of KGE, owing to their superior temporal correlation (H06-based GHMs) and lower mean bias (except H08). By contrast, linear scaling yields only marginal improvements, indicating that correcting variability errors is substantially more effective than adjusting mean bias alone. Furthermore, sensitivity analyses confirm that exaggerated seasonal drawdown is primarily a result of parameter choices rather than inherent flaws in GROS. These findings highlight two critical insights: (1) one-size-fits-all parameters are a primary limitation in global reservoir modelling; and (2) satellite observations are a viable data source for calibrating reservoir operation schemes in GHMs.
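
The KGE decomposition and the two bias corrections described above can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: the KGE is taken in its common three-component form (correlation r, variability ratio α as a ratio of standard deviations, bias ratio β as a ratio of means); linear scaling is taken as an additive shift of the mean; variance matching is taken as rescaling anomalies to the observed standard deviation after matching the mean. The function names and the synthetic 240-month series are hypothetical and do not come from the GRS dataset or any GHM. Under these assumptions, the threshold of -0.41 is approximately 1 - √2, the KGE of a constant prediction equal to the observed mean.

```python
import numpy as np


def kge_components(sim, obs):
    """Return (r, alpha, beta): correlation, std ratio and mean ratio."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]   # temporal correlation
    alpha = sim.std() / obs.std()     # variability ratio (assumed form)
    beta = sim.mean() / obs.mean()    # bias ratio
    return r, alpha, beta


def kge(sim, obs):
    """Kling-Gupta Efficiency from its three components."""
    r, alpha, beta = kge_components(sim, obs)
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def linear_scaling(sim, obs):
    """Assumed additive form: shift the simulation to the observed mean."""
    sim = np.asarray(sim, dtype=float)
    return sim - sim.mean() + np.mean(obs)


def variance_matching(sim, obs):
    """Assumed form: match both the observed mean and standard deviation."""
    sim = np.asarray(sim, dtype=float)
    scale = np.std(obs) / sim.std()
    return np.mean(obs) + (sim - sim.mean()) * scale


# Hypothetical monthly storage series (km^3) spanning 20 years.
months = np.arange(240)
obs = 10.0 + 2.0 * np.sin(2 * np.pi * months / 12)
# Simulation with a high mean, exaggerated amplitude and a one-month lag.
sim = 14.0 + 5.0 * np.sin(2 * np.pi * (months - 1) / 12)

sim_ls = linear_scaling(sim, obs)
sim_vm = variance_matching(sim, obs)

for label, series in [("raw", sim), ("linear scaling", sim_ls),
                      ("variance matching", sim_vm)]:
    r, alpha, beta = kge_components(series, obs)
    print(f"{label}: KGE={kge(series, obs):.3f} "
          f"r={r:.3f} alpha={alpha:.3f} beta={beta:.3f}")
```

In this synthetic case the additive shift corrects β but leaves α untouched, so the KGE changes little, whereas variance matching sets both α and β to one and leaves the KGE determined by r alone. This mirrors, in a toy setting only, why correcting variability errors can recover more skill than adjusting the mean bias.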