A data derived workflow for reservoir operations for simulating reservoir operations in a global hydrologic model

Steyaert, Jennie C.; Sutanudjaja, Edwin; Bierkens, Marc; Wanders, Niko

doi:10.5194/egusphere-2024-3658

Preprints

https://doi.org/10.5194/egusphere-2024-3658


30 Jan 2025


Abstract. Globally, there are over 24,000 storage structures (e.g. dams and reservoirs) that together provide over 7,000 km³ of storage. Until recently, most of the data on these reservoirs were not openly accessible. As a result, many studies rely on generic operating rules built on broad assumptions about reservoir storage dynamics and management. With the creation of global datasets such as the Global Reservoir and Dam database (GRanD), RealSat, GloLakes, and the International Commission on Large Dams database (ICOLD), as well as regional datasets such as ResOpsUS for the contiguous United States and the Mekong Data Monitor for the Mekong River basin, inferring reservoir operations with data-derived techniques has become much more common regionally. Yet, to our knowledge, there has been no global application of data-derived methods, owing to model complexity and data limitations. Our analysis aims to fill this gap by providing a workflow for implementing data-derived reservoir operations in large-scale hydrologic models, with an application in the PCR-GLOBWB 2 global hydrologic model. The methodology uses global satellite altimetry data from GloLakes, a parameterization methodology developed by Turner et al. (2021), and a random forest model. We also test the sensitivity of our reservoir scheme to downstream demand by selecting three different downstream areas presumed to be served by reservoir releases (command areas): 250 km, 650 km, and 1100 km. Our results demonstrate that the random forest algorithm is able to capture the storage dynamics and that the errors stem mainly from errors in the remotely sensed storage data. Additionally, we observe in many cases that deriving operational bounds from historical reservoir time series has minimal impact on streamflow at the basin outlets, and that the scheme is not sensitive to the downstream command areas. We do observe that streamflow is affected directly downstream of the reservoirs and that the data-derived methodology increases the accuracy of simulated global reservoir storage when compared to observations. In fact, the derived operations produce much lower storage values that align better with both direct and remotely sensed reservoir storage observations. This demonstrates that generic operations overestimate the total amount of water stored in reservoirs and, as a result, potentially overestimate water availability. Ultimately, our workflow allows global hydrologic models to capitalize on recent data acquisition by remote sensing to provide more accurate estimates of reservoir storage and global water security.

Received: 22 Nov 2024 – Discussion started: 30 Jan 2025

Competing interests: At least one of the (co-)authors is a member of the editorial board of Hydrology and Earth System Sciences. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 31939 KB)

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

19 Nov 2025

Data derived reservoir operations simulated in a global hydrologic model

Jennie C. Steyaert, Edwin H. Sutanudjaja, Marc Bierkens, and Niko Wanders

Hydrol. Earth Syst. Sci., 29, 6499–6527, https://doi.org/10.5194/hess-29-6499-2025, 2025


Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-3658', Anonymous Referee #1, 14 Mar 2025

Steyaert et al. present an approach to deriving reservoir operations for global scale hydrologic models. The study is of value to the water resources modeling community, but requires major revisions before being accepted for publication in HESS. Some elements of the method are not well justified, including the categorization of dams into “irrigation-like” and “hydro-like”, as well as use of a release decision approach based on downstream demand aggregated across an arbitrary command area. Use of a random forest model to extrapolate curves is a nice idea but is not evaluated fully (i.e., using a cross-validation scheme) and appears quite ineffective based on the results shown. Although the level and depth of analysis conducted is impressive, the quality of results/figures is quite poor, and often confusing. The study can be simplified and reworked to deliver more clear and compelling results (with more impactful figures) on improvements offered by a data-derived storage scheme. The paper would also benefit from a significant reduction in number of words. The introduction is 13 paragraphs long and contains a lot of general detail. I encourage the authors to rewrite the introduction in a way that brings immediate focus to the problem area, most recent literature addressing that problem, and aims of the study. Three or four paragraphs will suffice. The abstract, currently almost 400 words, can be halved without loss of essential information.
Title: Awkward repetition of "reservoir operations". Did you mean "A data derived workflow for simulating reservoir operations in a global hydrologic model"? Also, this wording suggests that it is the *workflow* that is data derived, rather than the reservoir operation. So, did you actually mean something like "Data derived reservoir operations in a global hydrologic model"?
Abstract L2. "most of the data was not openly accessible". I would suggest that this remains true. Specify the type of data.
L27. Water supply reservoirs, flood control reservoirs, and hydropower dams are found in all climate types.
L187. Do you mean: “…to determine reservoir rule curves that specify seasonal flood and conservation pools…” ?
L205. Not clear what is meant by “yearly maps of static reservoir characteristics”. Also, since L180 I have been reading and wondering the motivation and reasoning behind these two categories (“hydropower-like” and “irrigation-like”). Please try to clarify the role of this categorization early in the study.
L250. Please add further detail here on whether any efforts were made to ensure reservoirs were placed on correct streams. From what I read, it seems the lat/lon of the reservoirs are snapped to the PCR-GLOBWB grid then assigned that grid cell.
L270. OK, here I am now realizing that irrigation-like and hydropower-like categories are used to inform releases, with the starfit approach solely defining storage curves. Doesn’t this mean the operations are not fully data-driven but rather half data-driven (storage curves) and half “generic” (release policy based on command area demand and reservoir purpose)?
L313 – missing reference to equation 5.
L325-330. I would be very unsure about labels of water supply / irrigation vs hydro etc. within GRanD leading to a neat splitting of dams respectively operated for downstream demand versus maintaining high storage levels. Apart from the issue of inaccurate reservoir purposes in the available global datasets, one rarely finds such simple distinctions in reality. Are you able to show that two categories of operations actually exist, e.g., by comparing the starfit curves for irrigation-like versus hydropower-like dams in the set of 1752 observed dams? I would be surprised if you find a clear distinction. If this is the case, I don’t see strong justification for the splitting, which in a way complicates the study.
L335. It’s unclear to me what the command area offers. The storage curves can guide the release without a downstream demand. Were any tests performed to evaluate whether this downstream demand actually improves on accuracy?
L342. How are surface water abstractions considered? Is this based on demand within the same grid cell as the reservoir?
Equation 6. Maybe I missed this, but how is Env defined? Also, how is the flood release defined? Is this just spill required to draw the reservoir back to the active zone?
L350. Unclear what is being done here. Are you creating an active zone per dam type and country? Why? I thought the random forest provides full parameterization for each dam.
L381. After validating the model and demonstrating effectiveness with the 25% out validation, why not re-train with all 1,752 structures before extrapolating? Also, given the importance of the random forest to the overall framework, I strongly suggest the authors pursue a k-fold cross validation scheme rather than single training and test samples.
L385. How many reservoirs end up being constrained to these bounds? Also, it’s not clear what is meant by flood peak here. Do you mean upper bound of active storage?
Table 2. Here would be very interesting to see a version that drops the command area and demand parameters (as well as hydro/irrigation split) entirely. I can’t see a strong justification for the demand-based release or the command area (or the hydro / irrigation split for that matter). A simple way to test this would be to take the mid-point of the active zone (i.e. assume just one curve to target) and operate toward that at all times (giving you a very simple release function).
L503. Above you state that Clinton dam has a hydropower main purpose.
Figure 4. Is this average monthly discharge over a number of years, or are you showing a single year’s output?
L588 – this is an inadequate way to evaluate storage dynamics improvement. You have observation and results. Compute NSE / RMSE / KGE or similar for each dam (sim vs obs) and show the difference across a distribution (perhaps splitting by continent or large basin).
Figure 7. It’s not clear why the data-derived storage curves result in a different seasonal storage pattern than GloLakes for North America. Aren’t the curves based on GloLakes data?
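
Illustration of the per-dam evaluation requested in the L588 comment: the sketch below computes RMSE, Nash-Sutcliffe efficiency (NSE), and Kling-Gupta efficiency (KGE, in its 2009 form) for simulated versus observed storage. It is a minimal example and is not taken from the referee report or the manuscript; the dam names and storage values are made up, and the function names are assumptions chosen for readability.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(sim, obs):
    """Root mean square error between simulated and observed series."""
    return math.sqrt(_mean([(s - o) ** 2 for s, o in zip(sim, obs)]))


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = _mean(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def kge(sim, obs):
    """Kling-Gupta efficiency from correlation, variability and bias ratios."""
    mean_s, mean_o = _mean(sim), _mean(obs)
    std_s = math.sqrt(_mean([(s - mean_s) ** 2 for s in sim]))
    std_o = math.sqrt(_mean([(o - mean_o) ** 2 for o in obs]))
    cov = _mean([(s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)])
    r = cov / (std_s * std_o)   # linear correlation
    alpha = std_s / std_o       # variability ratio
    beta = mean_s / mean_o      # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical monthly storage fractions for two made-up dams.
storage = {
    "dam_A": {"obs": [0.40, 0.55, 0.70, 0.60], "sim": [0.45, 0.50, 0.72, 0.58]},
    "dam_B": {"obs": [0.20, 0.35, 0.30, 0.25], "sim": [0.50, 0.60, 0.58, 0.55]},
}

# One score set per dam; the resulting distributions can then be compared
# between operating schemes, for example grouped by continent or basin.
scores = {
    name: {
        "rmse": rmse(d["sim"], d["obs"]),
        "nse": nse(d["sim"], d["obs"]),
        "kge": kge(d["sim"], d["obs"]),
    }
    for name, d in storage.items()
}

for name, s in scores.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```

Running the same scoring once for the generic scheme and once for the data-derived scheme would give two distributions per metric, which is the comparison the comment asks for. NSE and KGE are undefined when the observed series is constant, so such dams would need to be screened out first.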

Citation: https://doi.org/10.5194/egusphere-2024-3658-RC1
- AC1: 'Reply on RC1', Jennie C. Steyaert, 24 Apr 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3658/egusphere-2024-3658-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-3658-AC1
RC2:
'Comment on egusphere-2024-3658', Anonymous Referee #2, 17 Mar 2025

I have attached a file with my comments.

Citation: https://doi.org/10.5194/egusphere-2024-3658-RC2
- AC2: 'Reply on RC2', Jennie C. Steyaert, 24 Apr 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2024-3658/egusphere-2024-3658-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-3658-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (24 Apr 2025) by Yi He

AR by Jennie C. Steyaert on behalf of the Authors (14 May 2025): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (30 May 2025) by Yi He

RR by Anonymous Referee #1 (03 Jul 2025)

ED: Publish subject to revisions (further review by editor and referees) (14 Aug 2025) by Yi He

AR by Jennie C. Steyaert on behalf of the Authors (03 Oct 2025): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (14 Oct 2025) by Yi He

AR by Jennie C. Steyaert on behalf of the Authors (16 Oct 2025)

Post-review adjustments

AA – Author's adjustment | EA – Editor approval

AA by Jennie C. Steyaert on behalf of the Authors (17 Nov 2025): Author's adjustment, Manuscript

EA: Adjustments approved (17 Nov 2025) by Yi He

Data sets

Modelled outputs for "A data derived workflow for reservoir operations (in global hydrologic models)" (Steyaert et al., 2025) Jennie C. Steyaert and Niko Wanders https://public.yoda.uu.nl/geo/UU01/F2UO5H.html


Viewed

Total article views: 3,472 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
2,722	652	98	3,472	111	164


Views and downloads (calculated since 30 Jan 2025)

Month	HTML	PDF	XML	Total
Jan 2025	152	16	8	176
Feb 2025	208	40	2	250
Mar 2025	136	32	6	174
Apr 2025	122	22	8	152
May 2025	52	38	4	94
Jun 2025	60	20	6	86
Jul 2025	66	44	2	112
Aug 2025	292	40	8	340
Sep 2025	1,098	48	2	1,148
Oct 2025	116	14	8	138
Nov 2025	90	74	4	168
Dec 2025	60	100	10	170
Jan 2026	70	32	8	110
Feb 2026	64	24	0	88
Mar 2026	60	20	12	92
Apr 2026	35	39	3	77
May 2026	22	26	2	50
Jun 2026	6	6	1	13
Jul 2026	13	17	4	34


Viewed (geographical distribution)

Total article views: 3,464 (including HTML, PDF, and XML). Thereof 3,464 with geography defined and 0 with unknown origin.


Latest update: 26 Jul 2026


Short summary

Using machine learning techniques and remotely sensed reservoir data, we develop a workflow to derive reservoir storage bounds. We implement these bounds in a global hydrologic model, PCR-GLOBWB 2, and evaluate the difference between generalized operations (the schemes typically used in global models) and this data-derived method. We find that modelled storage is more accurate with the data-derived operations. We also find that generalized operations overestimate storage and can underestimate water gaps.

