the Creative Commons Attribution 4.0 License.
Global hyper-resolution modeling of historical and future groundwater dynamics
Abstract. The sustainable management of global groundwater resources is a key societal challenge and is central to the Sustainable Development Goals. The localized dynamics of groundwater abstraction, topography, and surface-water interactions, as well as the sensitivity of groundwater-dependent ecosystems, call for high-resolution information to support effective groundwater management. At the same time, groundwater observations are very limited and concentrated in a few regions, rendering large parts of groundwater resources ungauged. To address limited observations and coarse global models, we applied the global groundwater model GLOBGM (v1.1) to simulate past and future groundwater heads and water table depth at 30 arc-seconds (~1 km) on a monthly time step. Model calibration improved mean bias in water table depth predictions from -4.8 m to 3.6 m compared to GLOBGM v1.0, with depth-weighted bias reduced from 34.2 m to 32.5 m across 34 800 observation wells. Groundwater dynamics are simulated for a historical reference period (1960–2019) to support model evaluation and attribution of observed impacts to climate variability and change. Baselines (1960–2014) and three combined socioeconomic-climate scenarios (2015–2100; SSP1-RCP2.6, SSP3-RCP7.0, SSP5-RCP8.5) are simulated with five GCMs, supporting detection and impact assessment of future change. Validation against monthly observations yielded skillful predictions (KGE-NP > -0.41) in approximately 75 % of deep wells (>60 m) and 90 % of shallow to intermediate wells (0–20 m). When validated against annual observations, approximately 80 % of sites showed skillful predictions regardless of depth. Historical trend analysis (1960–2019) accurately reproduced known groundwater depletion regions such as the U.S. High Plains, Arabian Peninsula, and Indo-Gangetic Plain, while also identifying rising water table depths in northern latitudes and Arctic regions potentially linked to climate-driven recharge changes. 
Future scenario-based simulations suggest rising water table depths for most continents in the next century, with Europe being a notable exception. However, known regions of groundwater depletion are expected to persist. Regions of reduced reliability are mapped, and quality assurance flags are provided to guide the appropriate use and interpretation of the results. The resulting data set offers high-resolution information to assess groundwater dynamics for the past and future, supporting improved global water resource management and climate impact assessments.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2026-804', Anonymous Referee #1, 07 Apr 2026
AC1: 'Reply on RC1', Barry van Jaarsveld, 25 Apr 2026
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2026/egusphere-2026-804/egusphere-2026-804-AC1-supplement.pdf
RC2: 'Comment on egusphere-2026-804', Anonymous Referee #2, 17 May 2026
The manuscript describes further development of a global groundwater model (GLOBGM v1.1). This updated model incorporates key advances over its predecessor (GLOBGM v1.0, Verkaik et al., 2024), including model calibration and downscaling of groundwater recharge. The manuscript is well structured, written and presented.
Major comments:
As outlined by the authors, the updated model marks an advance over its predecessor. However, the more pertinent and central question, especially given the authors’ arguments in the Introduction, Discussion and Conclusions, is whether this advance is such that the updated, calibrated model provides meaningful simulations of observed changes in groundwater storage over the historical period (1960–2014), and thus of projected changes in groundwater storage under socio-economic scenarios (i.e. 2015–2100: SSP1-RCP2.6, SSP3-RCP7.0, SSP5-RCP8.5). What is meant by “meaningful” may well depend on one’s perspective, but it would broadly be expected to demonstrate skill in representing historical observations. For an objective assessment, the authors rely on the non-parametric form of the Kling-Gupta Efficiency (KGE-NP). Beyond citing the threshold value of -0.41 from Knoben et al. (2019), the authors need to make a stronger case for why this threshold is appropriate and to evaluate the implications of the uncertainty in defining it. Indeed, Section 4 needs a more explicit, quantitative discussion of the consequences of the uncertainty in model simulations indicated by the CDFs and by the (limited) spatial distribution of correlations in Figure 5.
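One way to make the influence of the threshold explicit is to report the fraction of wells counted as skillful over a range of KGE-NP thresholds, rather than at -0.41 alone. The sketch below assumes per-well KGE-NP scores are already available as an array. The function name `skill_fraction` and the five scores are illustrative placeholders, not part of GLOBGM or its evaluation data.

```python
import numpy as np

def skill_fraction(scores, thresholds):
    """Fraction of wells whose score exceeds each candidate threshold.

    scores: per-well KGE-NP values (NaNs, e.g. from short records, are dropped).
    thresholds: iterable of candidate skill thresholds.
    """
    s = np.asarray(scores, dtype=float)
    s = s[np.isfinite(s)]
    return {float(t): float(np.mean(s > t)) for t in thresholds}

# Hypothetical scores for five wells, evaluated at the -0.41 benchmark and at 0
print(skill_fraction([-1.0, -0.5, -0.2, 0.1, 0.5], [-0.41, 0.0]))
```

Plotting this fraction against the threshold is equivalent to reading the complementary CDF of the scores, so it links directly to the CDFs mentioned above.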
Given the hyper-resolution (~1 km) scale of the model, I can understand the authors’ necessary focus on reconciling GLOBGM v1.1 with measurements of groundwater head from piezometric observations. I am surprised, however, not to see an effort to reconcile the latest model with evidence of changes in terrestrial water storage from satellite gravimetry (GRACE/GRACE-FO), albeit at lower resolution (i.e. aggregating outcomes to larger scales).
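A first-order comparison with GRACE/GRACE-FO could convert simulated head anomalies into groundwater storage anomalies and then average them, area-weighted, over GRACE-scale cells or basins. The sketch below assumes an unconfined response (storage change = specific yield × head change) on a regular latitude-longitude grid. The function name is hypothetical. Confined layers would need storativity instead, and because GRACE resolves total water storage, the other storage components would still have to be removed or modelled before any comparison.

```python
import numpy as np

def groundwater_storage_anomaly(head, sy, lat):
    """Area-weighted mean groundwater storage anomaly (m of water) per time step.

    head: array (time, ny, nx) of simulated heads in m.
    sy:   array (ny, nx) of specific yield; NaN outside the model domain.
    lat:  array (ny,) of cell-centre latitudes in degrees.
    """
    anom = head - np.nanmean(head, axis=0)       # head anomaly relative to the period mean
    ds = sy[None, :, :] * anom                    # unconfined storage change = Sy * dh
    # cos(latitude) approximates relative cell area on a regular grid; zero weight off-domain
    w = np.cos(np.deg2rad(lat))[:, None] * np.isfinite(sy)
    ds = np.where(np.isfinite(ds), ds, 0.0)
    return (ds * w).sum(axis=(1, 2)) / w.sum()

# Toy example: a uniform 2 m head rise with Sy = 0.1
head = np.zeros((2, 3, 4))
head[1] = 2.0
sy = np.full((3, 4), 0.1)
lat = np.array([0.0, 10.0, 20.0])
print(groundwater_storage_anomaly(head, sy, lat))   # approx. -0.1 and +0.1 m around the mean
```

Multiplying the result by 100 gives cm of equivalent water height, the unit in which GRACE mascon products are commonly distributed.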
It is well accepted that what the authors seek to achieve (i.e. a global hyper-resolution groundwater model) is hugely challenging and of very considerable importance. It is also heartening to see groundwater specialists leading this effort. The question is whether GLOBGM v1.1 marks an important technical advance on previous global groundwater models, or whether the model has now reached a stage where it can be applied to simulate meaningfully the impact of human interventions and climate change. The current text asserts the latter, but the current outcomes suggest the former.
Minor comments:
- There appears to be imprecision in mapped features in Figures 1 and 4 that presumably derives from the databases used to generate them. In Figure 1, for example, aquifers throughout Bangladesh are depicted as confined whereas, in reality, confining conditions are restricted to areas overlain by Pleistocene terrace deposits (Shamsudduha et al., 2022). In the same figure, confined aquifers in Canada (overlain by thick glacial tills, especially in the Prairies) are not mapped. In Figure 4, permafrost regions, certainly in Canada, are of much greater extent than depicted.
- To better understand challenges to model performance, might the authors present (e.g. in the Supplementary Material) the spatio-temporal distribution of the groundwater-head observations used from 1960 to 2019?
- In Figure 2b, it is curious to see the exclusion of the United Kingdom from the analysis of water table depth bias. As these data are publicly available, do they not feature in the dataset of 34 800 observation wells from IGRAC?
References
Knoben, W. J. M., Freer, J. E., and Woods, R. A.: Technical Note: Inherent Benchmark or Not? Comparing Nash–Sutcliffe and Kling–Gupta Efficiency Scores, Hydrology and Earth System Sciences, 23, 4323–4331, https://doi.org/10.5194/hess-23-4323-2019, 2019.
Shamsudduha, M., Taylor, R. G., Haq, M. I., Nowreen, S., Zahid, A., and Ahmed, K. M.: The Bengal Water Machine: quantification of freshwater capture in Bangladesh, Science, 377(6612), 1315–1319, 2022.
Verkaik, J., Sutanudjaja, E. H., Oude Essink, G. H. P., Lin, H. X., and Bierkens, M. F. P.: GLOBGM v1.0: A Parallel Implementation of a 30 Arcsec PCR-GLOBWB-MODFLOW Global-Scale Groundwater Model, Geoscientific Model Development, 17, 275–300, https://doi.org/10.5194/gmd-17-275-2024, 2024.
Citation: https://doi.org/10.5194/egusphere-2026-804-RC2
Data sets
globgm-cmip6-monthly Barry van Jaarsveld, Niko Wanders, Nicole Gyakowah Otoo, Edwin H. Sutanudjaja, Jarno Verkaik, Daniel Zamrsky, and Marc F. P. Bierkens https://doi.org/10.24416/UU01-1BXLPD
historical-reference-gswp3-w5e5 Barry van Jaarsveld, Niko Wanders, Nicole Gyakowah Otoo, Edwin H. Sutanudjaja, Jarno Verkaik, Daniel Zamrsky, and Marc F. P. Bierkens https://doi.org/10.24416/UU01-AKSHOX
globgm-cmip6-annual Barry van Jaarsveld, Niko Wanders, Nicole Gyakowah Otoo, Edwin H. Sutanudjaja, Jarno Verkaik, Daniel Zamrsky, and Marc F. P. Bierkens https://doi.org/10.24416/UU01-V6B9YS
globgm-cmip6-average Barry van Jaarsveld, Niko Wanders, Nicole Gyakowah Otoo, Edwin H. Sutanudjaja, Jarno Verkaik, Daniel Zamrsky, and Marc F. P. Bierkens https://doi.org/10.24416/UU01-SLRFI7
globgm-cmip6-quality Barry van Jaarsveld, Niko Wanders, Nicole Gyakowah Otoo, Edwin H. Sutanudjaja, Jarno Verkaik, Daniel Zamrsky, and Marc F. P. Bierkens https://doi.org/10.24416/UU01-16EJ3Y
This study reports the results of a global groundwater model (GLOBGM v1.1) run at 1 km resolution under 3 CMIP6 scenarios and 5 climate models. Van Jaarsveld et al. predict that groundwater levels will rise, on average, on most continents but that regions of groundwater depletion will persist. The study is ambitious in scope, and I applaud the authors for attempting to make climate change predictions for an important component of the Earth system. Unfortunately, the study suffers from several issues in methodology, validation, and interpretation, and because of this I believe the results are not reliable. Given the potential for the results to inform groundwater management policy, I think it would be inappropriate and potentially harmful to publish the manuscript. I unfortunately must recommend rejection.
My primary concern is the accuracy of the model and how it is evaluated. Overall, at approximately one third of wells the correlation with observations is negative: the model predicts the wrong direction of change. In my opinion, this is not good enough to make predictions about future trends. Based on Figure 5, panel a, the regions that show poor performance do not line up particularly well with the regions in Figure 4 where "GLOBGM has been shown to provide less accurate results". It seems the poor performance is primarily in places with high GW abstraction. Also, at 10–20 % of the wells the bias ratio is less than 0.1; does this mean the simulated water table depth is less than a tenth of the observed depth? Again, this seems quite poor. Given that the model struggles to predict groundwater levels in regions where groundwater data exist and the subsurface is relatively well understood, I am skeptical of the predictions for the rest of the world.
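The two statistics quoted here can be computed per well as follows. The sketch assumes `obs` and `sim` are aligned arrays of water table depth at each well, and that the "bias ratio" is mean(simulated) / mean(observed), as in the KGE beta term. Whether the manuscript defines it exactly this way is an assumption. The three-well data are made up.

```python
import numpy as np

def well_skill_summary(obs, sim):
    """Share of wells with negative correlation and with bias ratio below 0.1.

    obs, sim: arrays (n_wells, n_times) of observed and simulated water table depth (m).
    The bias ratio is taken here as mean(sim) / mean(obs) per well (an assumption).
    """
    r = np.array([np.corrcoef(o, s)[0, 1] for o, s in zip(obs, sim)])
    beta = sim.mean(axis=1) / obs.mean(axis=1)
    return {
        "frac_negative_r": float(np.mean(r < 0)),
        "frac_beta_below_0.1": float(np.mean(beta < 0.1)),
    }

obs = np.array([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [5.0, 6.0, 7.0, 8.0]])
sim = np.array([[4.0, 3.0, 2.0, 1.0], [0.1, 0.2, 0.3, 0.4], [5.0, 6.0, 7.0, 8.0]])
print(well_skill_summary(obs, sim))   # each fraction is 1/3 for this toy set
```

The first toy well has the right mean but the opposite direction of change. The second has the right direction but a mean depth a hundred times too small.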
I looked at some previous publications related to this model, and similar concerns regarding accuracy have been raised (https://doi.org/10.5194/gmd-2022-226-RC1, https://doi.org/10.5194/egusphere-2024-1025-RC3). In those cases the authors argued that the performance cannot be expected to be high, given that the model was not calibrated. They argued that the model is still valuable because "The philosophy of these models is that they try to capture the right processes and do not rely on calibration to correct for errors in process representation, parameterization and/or meteorological forcing data." Now the authors present a model that is calibrated, and in which at least one of the inputs (groundwater recharge) is statistically downscaled to match observations. In addition, the authors use a machine learning model to adjust the predicted groundwater levels to match observations. So now the model does not conserve mass, and nevertheless its performance remains unsatisfactory. I think it is time to either a) overhaul this model to try to achieve better global performance, b) focus only on regions where the model matches historical observations, or c) return to a coarse-resolution model that might simulate regional dynamics without claiming 1 km resolution.
The threshold of -0.41 was proposed by Knoben et al. (2019) for the KGE, but it is not valid for the KGE-NP. The threshold should be 0. Suppose the GW head varies around 50 m with a standard deviation of 5 m. Using the mean of the observations as the simulation, I get a KGE-NP of -0.0008. This occurs because the alpha component approaches 1 when the distribution is close to symmetrical and its values are not close to 0. I encourage the authors to verify this with their observational data.
The performance of the groundwater recharge downscaling (Section 2.1.2) is not reported. Was any cross-validation attempted? The data of Moeck et al. (2020) are not uniformly distributed globally. If you remove the Australian data from the training data, for example, can you predict those recharge rates with the regression model? In addition, limiting GWR_corrected to less than or equal to precipitation is not well justified. In most cases this is probably a fairly liberal constraint, but in agricultural areas return flows can lead to recharge that exceeds precipitation. The choice of a multiplicative correction factor is also not justified.
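Assume the correction takes the form GWR_corrected = min(c · GWR, P), with c a multiplicative factor. The effect of the cap can then be exposed by flagging the cells where it binds. The function name, the factor and the three-cell example are hypothetical. The point is only that any cell where return flow legitimately pushes recharge above precipitation is clipped without notice.

```python
import numpy as np

def correct_recharge(gwr, precip, factor):
    """Apply a multiplicative correction and cap recharge at precipitation.

    gwr, precip: modelled recharge and precipitation (same units, e.g. mm/yr).
    factor: scalar or per-cell multiplicative correction (hypothetical here).
    Returns the corrected recharge and a mask of cells where the cap was active.
    """
    scaled = factor * np.asarray(gwr, dtype=float)
    precip = np.asarray(precip, dtype=float)
    was_capped = scaled > precip
    return np.minimum(scaled, precip), was_capped

# The third cell mimics an irrigated cell where return flow already lifts recharge above P
gwr = np.array([50.0, 250.0, 400.0])
precip = np.array([600.0, 300.0, 300.0])
corrected, was_capped = correct_recharge(gwr, precip, factor=1.5)
print(corrected, was_capped)   # 75, 300, 300; cap active in the second and third cells
```

Reporting the share of cells, or of irrigated area, where the cap is active would show how often the constraint actually matters.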
The Machine Learning Bias Correction (Section 2.4.3) should also be cross-validated regionally. The appendix indicates that the R-squared value is 0.6, which is already not very convincing as far as machine learning algorithms go. Can the model predict the bias for regions not included in the training data? This is what would be required to apply this bias correction globally. Also, the labels in Figure A1 (a) are not defined.
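Regional cross-validation can be set up as leave-one-region-out: train on all regions but one, score on the held-out region, and repeat. To stay dependency-free, the sketch swaps in ordinary least squares for XGBoost. The region labels and synthetic data are placeholders. In practice the regions might be continents or aquifer systems, and the predictors would be those used in the manuscript.

```python
import numpy as np

def leave_one_region_out_r2(X, y, regions):
    """R^2 on each held-out region for an ordinary least-squares model.

    X: (n, p) predictors, y: (n,) target (e.g. WTD bias), regions: (n,) region labels.
    A linear model stands in for the manuscript's XGBoost model.
    """
    scores = {}
    for region in np.unique(regions):
        test = regions == region
        train = ~test
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        pred = np.column_stack([np.ones(test.sum()), X[test]]) @ coef
        ss_res = np.sum((y[test] - pred) ** 2)
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        scores[str(region)] = 1.0 - ss_res / ss_tot
    return scores

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(300, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0.0, 0.1, size=300)
regions = np.repeat(np.array(["A", "B", "C"]), 100)
print(leave_one_region_out_r2(X, y, regions))   # each close to 1 for this easy case
```

A large gap between these held-out scores and the in-sample or randomly cross-validated R² would indicate that the bias correction does not transfer to unobserved regions.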
My second major concern is the interpretation and discussion of results.
I would have expected to see more specific discussion of the future trends in different regions. What will happen to the major agricultural regions? Grouping by continents is not very informative. The fact that groundwater will rise in the Andes does not help northern Colombia and Central America, where groundwater levels could fall by ~50 m by the end of the century! The same applies to rising groundwater in Tibet and falling levels in the Ganges-Brahmaputra. Also note that the majority of the rise seems to be happening in the mountain regions where the authors say the model is less reliable. And what is causing the trend reversal in northwest North America?
A more robust discussion of uncertainty is needed, and the results should be compared with the literature. Given (a) the performance of the model in data-rich regions, (b) the uncertain parameterization of the model in data-scarce regions, and (c) the uncertainty of the GCMs (you have included only five, and only one variant from each), how confident can we be in the projected trends? How do they compare with previous studies? For example, Condon et al. (2020) predict deepening water tables across the contiguous United States under warming. Meixner et al. (2016) predicted decreased recharge over the southwestern US and little change over the northwest, and also that mountain recharge would decrease. Other regional assessments exist and should be compared as well.
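One simple way to express how far the five GCMs agree is to fit a trend per GCM for a region and report the fraction of GCMs whose trend has the same sign as the ensemble mean. The sketch uses ordinary linear trends via `numpy.polyfit` and hypothetical per-GCM trends. It covers only forcing spread. Parameter and structural uncertainty in GLOBGM itself would need separate treatment.

```python
import numpy as np

def ensemble_trend_agreement(series, years):
    """Per-GCM linear trends, their mean, and the share of GCMs agreeing in sign.

    series: (n_gcm, n_years) regional annual-mean WTD for one scenario.
    years:  (n_years,) calendar years.
    """
    slopes = np.array([np.polyfit(years, s, 1)[0] for s in series])
    mean_slope = slopes.mean()
    agreement = float(np.mean(np.sign(slopes) == np.sign(mean_slope)))
    return slopes, mean_slope, agreement

years = np.arange(2015, 2101)
true_slopes = np.array([0.10, 0.20, 0.05, -0.05, 0.15])   # hypothetical trends, m/yr
series = true_slopes[:, None] * (years - 2015)
slopes, mean_slope, agreement = ensemble_trend_agreement(series, years)
print(mean_slope, agreement)   # about 0.09 m/yr; 4 of 5 GCMs agree in sign (0.8)
```

Mapping this agreement fraction next to the ensemble-mean trend would make it clear where the projected direction of change is robust across GCMs.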
I have some further minor comments and suggestions for the authors should they consider revising and resubmitting, in no particular order:
The authors state that the forcings for the model are abstraction, recharge, and discharge (L227). Is this correct? If so, what is the model doing other than accounting for inputs and outputs? Why does the dynamic drainage elevation (L104) matter if groundwater discharge is already prescribed?
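The question hinges on whether discharge is prescribed or computed. In MODFLOW's drain (DRN) formulation, outflow is head-dependent: Q = C (h - d) when the head h exceeds the drain elevation d, and zero otherwise. Under that formulation the drainage elevation directly controls how much water leaves each cell, and discharge becomes an outcome of the simulation rather than a forcing. Whether GLOBGM v1.1 treats discharge this way is an assumption that the authors would need to confirm. The sketch only illustrates the boundary formulation.

```python
import numpy as np

def drain_discharge(head, drain_elev, conductance):
    """Head-dependent drain outflow: Q = C * (h - d) where h > d, else 0."""
    head = np.asarray(head, dtype=float)
    drain_elev = np.asarray(drain_elev, dtype=float)
    return np.where(head > drain_elev, conductance * (head - drain_elev), 0.0)

head = np.array([10.0, 5.0, 8.0])   # m
# Same heads, two sets of drain elevations: outflow changes although heads do not
q_a = drain_discharge(head, drain_elev=np.array([9.0, 6.0, 8.0]), conductance=100.0)
q_b = drain_discharge(head, drain_elev=np.array([9.5, 6.0, 7.0]), conductance=100.0)
print(q_a, q_b)   # 100, 0, 0 and 50, 0, 100 (m3/d for C in m2/d)
```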
Equations 12-15: variables are not consistently labelled. What is alpha in equation 13? Is this different from the alpha in eq 10? Is WTDsim different from Wsim?
There is no confining layer over Canada at all? Surely this could be improved.
Two of the GCMs (IPSL-CM6A-LR and UKESM1-0-LL) in your ensemble are 'hot': their Equilibrium Climate Sensitivity (ECS) and Transient Climate Response (TCR) lie above the 'very likely' ranges assessed by the IPCC. This means that their projected warming is probably too pessimistic for a given scenario. If the warming trajectory is important (which I think it is here), the recommendation is to use only models that lie within the likely range (Hausfather et al., 2022). Consider reporting the ensemble mean for just the three models that do lie within the 'likely' range, and including the others in an appendix.
The model struggles to simulate well-based GW levels and trends, particularly in places where GW use is high. Maybe that is not surprising, given that your water use data (Lange et al., 2021) are originally at 0.5 degree resolution, and the recharge is also based on downscaling coarser-resolution data. Perhaps simulating well-based trends is too difficult a task at the global scale. Does the model accurately simulate regional trends? I suggest the authors perform a simple experiment: aggregate the well-based data at increasingly coarse resolutions (say, 1 km, 2 km, 10 km, 50 km, and 100 km), do the same for the model data, and then calculate your performance metrics at each resolution. I expect that performance will improve at coarser resolutions. Then focus on reporting results at the finest resolution that provides an adequate match to observations.
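The proposed experiment could be sketched as follows: bin wells into grid cells of increasing size, average the observed and simulated anomalies within each cell, and recompute a skill metric at each resolution. For simplicity, the sketch assumes Pearson correlation on anomalies and a regular latitude-longitude grid. The synthetic data (regional signals shared by model and wells, plus independent local noise) only illustrate the expected behaviour and are not based on the manuscript's wells.

```python
import numpy as np

def aggregated_median_correlation(lat, lon, obs, sim, res_deg):
    """Median Pearson correlation of cell-averaged anomalies at one grid resolution.

    lat, lon: (n_wells,) coordinates in degrees; obs, sim: (n_wells, n_times) heads in m.
    Each well's own mean is removed first so that wells with different datums can be averaged.
    """
    obs_anom = obs - obs.mean(axis=1, keepdims=True)
    sim_anom = sim - sim.mean(axis=1, keepdims=True)
    iy = np.floor(lat / res_deg).astype(int)
    ix = np.floor(lon / res_deg).astype(int)
    cells = {}
    for k, key in enumerate(zip(iy, ix)):
        cells.setdefault(key, []).append(k)
    scores = [
        np.corrcoef(obs_anom[idx].mean(axis=0), sim_anom[idx].mean(axis=0))[0, 1]
        for idx in cells.values()
    ]
    return float(np.median(scores))

# Synthetic data: four regional signals plus independent local noise in wells and model
rng = np.random.default_rng(1)
n_wells, n_t = 400, 120
lat = rng.uniform(0.0, 10.0, n_wells)
lon = rng.uniform(0.0, 10.0, n_wells)
signal = np.cumsum(rng.normal(size=(4, n_t)), axis=1)
block = 2 * (lat >= 5.0).astype(int) + (lon >= 5.0).astype(int)
obs = signal[block] + rng.normal(0.0, 5.0, (n_wells, n_t))
sim = signal[block] + rng.normal(0.0, 5.0, (n_wells, n_t))
for res in (0.01, 0.1, 1.0, 5.0):   # roughly 1, 10, 100 and 500 km near the equator
    print(res, round(aggregated_median_correlation(lat, lon, obs, sim, res), 2))
```

In this toy setting, correlation rises with cell size because local noise averages out while the shared regional signal remains. That is the pattern the suggested experiment would test for in the real data.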
Eq. 17: Is this missing a month index?
Figure 6a: The color scale for this figure should be the same as for the uncorrected WTD bias (Figure 2).
Figure 7: It's unclear why these regions were chosen for insets. In any case they provide only about 2X magnification. I suggest removing the insets.
Discussion: The discussion of modelling choices, calibration, and advances is somewhat incongruous with ESD. I would expect this in a journal like Geoscientific Model Development but I think for ESD the discussion should be more general, and focus more on the implications.
L424 - The beginning of this paragraph seems to be missing. "[Major rivers], such as..."?
L435: Are these rising water tables in the north robust? Do they match observations? The authors state that 'This could conceivably be due to climate change enhancing precipitation and groundwater recharge dynamics'. It seems to me there is no need to speculate here - are those two processes actually occurring in the model?
Line 245: You could report the number of CPU-hours also. In addition, it would be responsible to report the CO2 footprint of these computations. This can be estimated as:
(run time in hours) * (12 nodes / number of nodes at Snellius) * (power usage of Snellius) * (carbon intensity of the Dutch grid)
Based on a quick search I get:
551 h * (12/1557) * (1200 kW) * 235 gCO2e/kWh = 1200 kg CO2e, about equal to a round-trip flight ticket from Amsterdam to Beijing.
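The estimate can be reproduced with a few lines. The inputs are the figures quoted above (551 h on 12 of 1557 Snellius nodes, 1200 kW, 235 gCO2e/kWh). The method assumes that the machine's power draw is shared evenly across nodes and adds no overheads beyond those already included in the 1200 kW figure. The function name is illustrative.

```python
def co2e_kg(run_hours, nodes_used, nodes_total, machine_power_kw, grid_gco2e_per_kwh):
    """Charge the job its node share of the whole machine's power draw, then convert to CO2e."""
    energy_kwh = run_hours * (nodes_used / nodes_total) * machine_power_kw
    return energy_kwh * grid_gco2e_per_kwh / 1000.0   # g -> kg

print(round(co2e_kg(551, 12, 1557, 1200, 235)))   # 1198, i.e. about 1200 kg CO2e
```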
3.2.2 - Are these values for the ML-corrected data or the raw model output?
If the model purports to include anthropogenic influences on the water table, why are regions with anthropogenic influence excluded from calibration?
L451: I would not call this 'disagreement' - rather, divergence in scenarios, or you could say the direction of change is scenario-dependent.
L288 'The performance of the model' - which model? GLOBGM or the XGBoost model?
Table 4: it would be more useful to report the absolute bias, so negative and positive biases do not cancel each other out. Did the calibration reduce the mean absolute bias?
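A toy example illustrates the point: signed errors cancel in the mean bias but not in the mean absolute bias. The values are made up, with water table depth taken as positive downward.

```python
import numpy as np

def bias_summary(sim, obs):
    """Mean bias (signed) and mean absolute bias of simulated minus observed WTD."""
    d = np.asarray(sim, dtype=float) - np.asarray(obs, dtype=float)
    return float(d.mean()), float(np.abs(d).mean())

# Two wells simulated too deep, two too shallow: the signed mean hides the error
print(bias_summary([12.0, 8.0, 15.0, 5.0], [10.0, 10.0, 10.0, 10.0]))   # (0.0, 3.5)
```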
Section 3.1.1. This section seems more like methods than results.
L390 and L473: The shallower depths show better correlations for monthly data but not for yearly data. That suggests that at shallow depths you can better capture water table variation related to precipitation seasonality, but not long-term variation and change. Seasonal variation is probably only strong for shallow wells, so it makes sense to me that the difference in performance disappears in the yearly time series. I therefore disagree that "seasonal dynamics are more challenging to capture than inter-annual trends".
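A toy calculation supports this reading. If simulation and observations share a seasonal cycle but have opposite long-term trends, the monthly correlation is dominated by seasonality, while the annual correlation exposes the trend mismatch. The series are synthetic and chosen only to isolate the two effects.

```python
import numpy as np

months = np.arange(120)                          # ten years of monthly values
season = 2.0 * np.sin(2 * np.pi * months / 12)   # shared seasonal cycle (m)
obs = season + 0.02 * months                     # observed: seasonal cycle + rising trend
sim = season - 0.02 * months                     # simulated: same cycle, opposite trend

r_monthly = np.corrcoef(obs, sim)[0, 1]
# Annual means remove the seasonal cycle and leave only the trends
r_annual = np.corrcoef(obs.reshape(-1, 12).mean(axis=1),
                       sim.reshape(-1, 12).mean(axis=1))[0, 1]
print(round(r_monthly, 2), round(r_annual, 2))   # roughly 0.61 and -1.0
```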
Figure 10: is the y-axis in m? What does the uncertainty represent, the standard deviation of the five GCMs? The GSWP3-W5E5 simulation should also be plotted on these graphs for comparison.
References
Condon, L. E., Atchley, A. L., and Maxwell, R. M.: Evapotranspiration depletes groundwater under warming over the contiguous United States, Nat Commun, 11, 873, https://doi.org/10.1038/s41467-020-14688-0, 2020.
Hausfather, Z., Marvel, K., Schmidt, G. A., Nielsen-Gammon, J. W., and Zelinka, M.: Climate simulations: recognize the ‘hot model’ problem, Nature, 605, 26–29, https://doi.org/10.1038/d41586-022-01192-2, 2022.
Knoben, W. J. M., Freer, J. E., and Woods, R. A.: Technical note: Inherent benchmark or not? Comparing Nash-Sutcliffe and Kling-Gupta efficiency scores, Catchment hydrology/Modelling approaches, https://doi.org/10.5194/hess-2019-327, 2019.
Meixner, T., Manning, A. H., Stonestrom, D. A., Allen, D. M., Ajami, H., Blasch, K. W., Brookfield, A. E., Castro, C. L., Clark, J. F., Gochis, D. J., Flint, A. L., Neff, K. L., Niraula, R., Rodell, M., Scanlon, B. R., Singha, K., and Walvoord, M. A.: Implications of projected climate change for groundwater recharge in the western United States, Journal of Hydrology, 534, 124–138, https://doi.org/10.1016/j.jhydrol.2015.12.027, 2016.