Global hyper-resolution modeling of historical and future groundwater dynamics
Abstract. The sustainable management of global groundwater resources is a key societal challenge and is central to the Sustainable Development Goals. The localized dynamics of groundwater abstraction, topography, and surface-water interactions, as well as the sensitivity of groundwater-dependent ecosystems, call for high-resolution information to support effective groundwater management. At the same time, groundwater observations are very limited and concentrated in a few regions, rendering large parts of groundwater resources ungauged. To address limited observations and coarse global models, we applied the global groundwater model GLOBGM (v1.1) to simulate past and future groundwater heads and water table depth at 30 arc-seconds (~1 km) on a monthly time step. Model calibration improved mean bias in water table depth predictions from -4.8 m to 3.6 m compared to GLOBGM v1.0, with depth-weighted bias reduced from 34.2 m to 32.5 m across 34 800 observation wells. Groundwater dynamics are simulated for a historical reference period (1960–2019) to support model evaluation and attribution of observed impacts to climate variability and change. Baselines (1960–2014) and three combined socioeconomic-climate scenarios (2015–2100; SSP1-RCP2.6, SSP3-RCP7.0, SSP5-RCP8.5) are simulated with five GCMs, supporting detection and impact assessment of future change. Validation against monthly observations yielded skillful predictions (KGE-NP > -0.41) in approximately 75 % of deep wells (>60 m) and 90 % of shallow to intermediate wells (0–20 m). When validated against annual observations, approximately 80 % of sites showed skillful predictions regardless of depth. Historical trend analysis (1960–2019) accurately reproduced known groundwater depletion regions such as the U.S. High Plains, Arabian Peninsula, and Indo-Gangetic Plain, while also identifying rising water table depths in northern latitudes and Arctic regions potentially linked to climate-driven recharge changes. 
Future scenario-based simulations suggest rising water table depths for most continents in the next century, with Europe being a notable exception. However, known regions of groundwater depletion are expected to persist. Regions of reduced reliability are mapped, and quality assurance flags are provided to guide the appropriate use and interpretation of the results. The resulting data set offers high-resolution information to assess groundwater dynamics for the past and future, supporting improved global water resource management and climate impact assessments.
This study reports the results of a global groundwater model (GLOBGM v1.1) run at 1 km resolution for 3 CMIP6 scenarios, each forced by 5 climate models. Van Jaarsveld et al. predict that groundwater levels will, on average, rise on most continents, but that regions of groundwater depletion will persist. The study is ambitious in scope, and I applaud the authors for attempting to make climate change predictions for an important component of the Earth system. Unfortunately, the study suffers from several issues in methodology, validation, and interpretation, and because of these I believe the results are not reliable. Given the potential for the results to inform groundwater management policy, I think it would be inappropriate and potentially harmful to publish the manuscript. I unfortunately must recommend rejection.
My primary concern is the accuracy of the model and how it is evaluated. Overall, at approximately one third of the wells the correlation with observations is negative, i.e., the model predicts the wrong direction of change. In my opinion, this is not good enough to make predictions about future trends. Based on Figure 5a, the regions that show poor performance do not line up particularly well with the regions in Figure 4 where "GLOBGM has been shown to provide less accurate results". The poor performance seems to occur primarily in places with high GW abstraction. Also, at 10–20% of the wells the bias ratio is less than 0.1; does this mean the simulated water table depth is more than 10 times too shallow? Again, this seems quite poor. Given that the model struggles to predict groundwater levels in regions where groundwater data exist and the subsurface is relatively well understood, I am skeptical of the predictions for the rest of the world.
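The two per-well summaries discussed above can be computed in a few lines. This is a minimal sketch, assuming each well has paired simulated and observed WTD series and that the bias ratio is mean(simulated) / mean(observed); the helper names and the three synthetic wells are hypothetical.

```python
import numpy as np

def well_summary(sim, obs):
    """Return (Pearson r, bias ratio) for one well's paired WTD series."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    bias_ratio = sim.mean() / obs.mean()  # assumed definition: mean(sim) / mean(obs)
    return r, bias_ratio

def network_summary(wells):
    """Fraction of wells with negative r, and fraction with bias ratio < 0.1."""
    stats = np.array([well_summary(s, o) for s, o in wells])
    frac_negative_r = float(np.mean(stats[:, 0] < 0))
    frac_ratio_below_0_1 = float(np.mean(stats[:, 1] < 0.1))
    return frac_negative_r, frac_ratio_below_0_1

t = np.arange(120)  # 10 years of monthly values
cycle = np.sin(2 * np.pi * t / 12)
obs_a, sim_a = 10 + 0.02 * t, 9 + 0.02 * t     # correct trend, small offset
obs_b, sim_b = 20 - 0.01 * t, 18 + 0.01 * t    # wrong direction of change
obs_c, sim_c = 40 + cycle, 3 + cycle           # right timing, far too shallow
wells = [(sim_a, obs_a), (sim_b, obs_b), (sim_c, obs_c)]
print(network_summary(wells))
```

Here one well in three fails each criterion, so both fractions are one third.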
I looked at some previous publications related to this model, and similar concerns regarding accuracy have been raised (https://doi.org/10.5194/gmd-2022-226-RC1, https://doi.org/10.5194/egusphere-2024-1025-RC3). In those cases the authors argued that high performance cannot be expected, given that the model was not calibrated. They argued that the model is still valuable because "The philosophy of these models is that they try to capture the right processes and do not rely on calibration to correct for errors in process representation, parameterization and/or meteorological forcing data." Now the authors present a model that is calibrated, and in which at least one of the inputs (groundwater recharge) is statistically downscaled to match observations. In addition, the authors use a machine learning model to adjust the predicted groundwater levels to match observations. The corrected output therefore no longer conserves mass, and the performance nevertheless remains unsatisfying. I think it is time to either a) overhaul this model to try to achieve better global performance, b) focus only on regions where the model matches historical observations, or c) return to a coarse-resolution model that might simulate regional dynamics without claiming 1 km resolution.
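The threshold of -0.41 was proposed by Knoben et al. (2019) for the KGE, but it is not valid for the KGE-NP; for the KGE-NP the threshold should be (approximately) 0. In the standard KGE, using the mean of the observations as the prediction gives alpha = 0 (no variability) and beta = 1, and, taking r = 0 because the correlation with a constant series is undefined, KGE = 1 - sqrt(2) ≈ -0.41. In the KGE-NP, however, alpha compares the normalized, sorted simulated and observed series; for a constant prediction it equals 1 minus half the mean absolute deviation of the observations divided by their mean. Groundwater heads are typically far from 0 relative to their variability, so this alpha stays close to 1 instead of going to 0. Suppose the GW head varies around 50 m with a standard deviation of 5 m: for the mean of observations I get a KGE-NP of -0.0008. I encourage the authors to verify this with their observational data.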
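The arithmetic can be checked numerically. This sketch assumes the usual non-parametric KGE formulation (Spearman rank correlation, an alpha term comparing the normalized sorted series, and beta as the ratio of means) and, as in the calculation above, treats the correlation of a constant benchmark as 0; the function names are illustrative, not from the manuscript.

```python
import numpy as np

def corr_or_zero(a, b):
    """Pearson r; undefined for a constant series, taken as 0 here (assumption)."""
    if np.ptp(a) == 0 or np.ptp(b) == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

def ranks(x):
    """Ordinal ranks (no tie handling; adequate for continuous data)."""
    return np.argsort(np.argsort(x)).astype(float)

def spearman_or_zero(a, b):
    """Spearman r; checks for constant input before ranking."""
    if np.ptp(a) == 0 or np.ptp(b) == 0:
        return 0.0
    return corr_or_zero(ranks(a), ranks(b))

def kge(sim, obs):
    """Standard KGE: Pearson r, alpha = ratio of std devs, beta = ratio of means."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = corr_or_zero(sim, obs)
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def kge_np(sim, obs):
    """Non-parametric KGE: Spearman r, alpha from normalized sorted series."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    n = obs.size
    r = spearman_or_zero(sim, obs)
    fdc_sim = np.sort(sim) / (n * sim.mean())
    fdc_obs = np.sort(obs) / (n * obs.mean())
    alpha = 1 - 0.5 * np.sum(np.abs(fdc_sim - fdc_obs))
    beta = sim.mean() / obs.mean()
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

rng = np.random.default_rng(1)
obs = rng.normal(50.0, 5.0, size=10_000)      # heads around 50 m, sd 5 m
benchmark = np.full_like(obs, obs.mean())     # "mean of observations" predictor
print(f"KGE    of mean benchmark: {kge(benchmark, obs):.4f}")     # 1 - sqrt(2), about -0.41
print(f"KGE-NP of mean benchmark: {kge_np(benchmark, obs):.4f}")  # close to 0
```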
The performance of the groundwater recharge downscaling (Section 2.1.2) is not reported. Was any cross-validation attempted? The data of Moeck et al. (2020) are not uniformly distributed across the globe. If you remove the Australian data from the training set, for example, can the regression model predict those recharge rates? In addition, limiting GWR_corrected to values less than or equal to precipitation is not well justified. In most cases this is probably a quite liberal constraint, but in agricultural areas irrigation return flows can lead to recharge that exceeds precipitation. The choice of a multiplicative correction factor is also not justified.
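One way to run the suggested check is leave-one-region-out cross-validation: fit the recharge regression without one region (e.g., Australia) and score it on that region only. This is a minimal sketch, assuming an ordinary least-squares regression on two hypothetical standardized predictors and synthetic data; the actual regression and predictors in Section 2.1.2 may differ.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def leave_one_region_out(X, y, regions):
    """Fit on all regions but one and report R^2 on the held-out region."""
    scores = {}
    for reg in np.unique(regions):
        held_out = regions == reg
        coef = fit_ols(X[~held_out], y[~held_out])
        pred = predict_ols(coef, X[held_out])
        y_test = y[held_out]
        ss_res = np.sum((y_test - pred) ** 2)
        ss_tot = np.sum((y_test - y_test.mean()) ** 2)
        scores[str(reg)] = float(1 - ss_res / ss_tot)
    return scores

# Synthetic example: two hypothetical predictors and a (log-)recharge target
rng = np.random.default_rng(0)
regions = np.repeat(np.array(["Australia", "Europe", "North America"]), 100)
X = rng.normal(size=(regions.size, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.1, regions.size)
print(leave_one_region_out(X, y, regions))
```

Because the synthetic relationship is identical everywhere, every held-out region scores well here; a large drop for a real held-out region would indicate that the regression does not transfer to data-scarce areas.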
The Machine Learning Bias Correction (Section 2.4.3) should also be cross-validated regionally. The appendix indicates an R-squared value of 0.6, which is not very convincing for a machine learning model. Can the model predict the bias for regions not included in the training data? This is what would be required to apply the bias correction globally. Also, the labels in Figure A1a are not defined.
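The concern is that a random train/test split can look good simply because nearby wells share a regional bias. A minimal sketch of this effect, using a k-nearest-neighbour regressor on synthetic well coordinates as a deliberately simple stand-in for the XGBoost model; all clusters, offsets, and names are hypothetical.

```python
import numpy as np

def knn_predict(x_train, y_train, x_test, k=5):
    """Mean of the k nearest training targets (Euclidean distance)."""
    d = np.linalg.norm(x_test[:, None, :] - x_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def cv_r2(x, y, folds, k=5):
    """Out-of-fold R^2 for a given assignment of samples to folds."""
    y_hat = np.empty_like(y)
    for f in np.unique(folds):
        test = folds == f
        y_hat[test] = knn_predict(x[~test], y[~test], x[test], k)
    return r2(y, y_hat)

rng = np.random.default_rng(0)
centers = np.array([[-50, -50], [0, -50], [50, -50],
                    [-50, 50], [0, 50], [50, 50]], dtype=float)
offsets = np.array([-15.0, 12.0, -9.0, 10.0, -12.0, 15.0])  # synthetic regional bias (m)
region = np.repeat(np.arange(6), 50)
coords = centers[region] + rng.normal(0.0, 2.0, size=(region.size, 2))
bias = offsets[region] + rng.normal(0.0, 1.0, size=region.size)

random_folds = rng.integers(0, 5, size=region.size)
print(f"random 5-fold R2:        {cv_r2(coords, bias, random_folds):.2f}")
print(f"leave-one-region-out R2: {cv_r2(coords, bias, region):.2f}")
```

The random split scores near 1 because every test well has same-region neighbours in training; holding out whole regions exposes that the model cannot predict an unseen regional bias.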
My second major concern is the interpretation and discussion of results.
I would have expected a more specific discussion of future trends in different regions. What will happen in the major agricultural regions? Grouping by continent is not very informative. The fact that groundwater will rise in the Andes does not help northern Colombia and Central America, where groundwater levels could fall by ~50 m by the end of the century! The same applies to rising groundwater in Tibet and falling levels in the Ganges-Brahmaputra basin. Also note that most of the rise seems to occur in mountain regions, where the authors say the model is less reliable. And what is causing the trend reversal in northwestern North America?
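A minimal sketch of how a continental mean can hide opposite regional signals, and of reporting a regional mean instead as a cos(latitude)-weighted average over a mask. The grid, extent, and change values are hypothetical.

```python
import numpy as np

def area_weighted_mean(field, lat, mask):
    """Mean of a lat-lon field over mask, weighting cells by cos(latitude)."""
    weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(field)
    weights = np.where(mask, weights, 0.0)
    return float(np.sum(weights * field) / np.sum(weights))

# Hypothetical 0.5-degree grid covering a continent-sized box (cell centres)
lat = np.arange(-9.75, 15.0, 0.5)
lon = np.arange(-89.75, -60.0, 0.5)
lat2d = np.repeat(lat[:, None], lon.size, axis=1)

# Synthetic projected head change (m): rise in the south, fall in the north
change = np.where(lat2d < 3.0, 10.0, -10.0)

whole_box = np.ones_like(change, dtype=bool)
northern_part = lat2d >= 3.0
print(area_weighted_mean(change, lat, whole_box))      # small: opposite signals cancel
print(area_weighted_mean(change, lat, northern_part))  # the -10 m regional signal
```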
A more robust discussion of uncertainty is needed, and the results should be compared with the literature. Given (a) the performance of the model in data-rich regions, (b) the uncertain parameterization of the model in data-scarce regions, and (c) the uncertainty of the GCMs (you have included only five, and only one ensemble member from each), how confident can we be in the projected trends? How do they compare with previous studies? For example, Condon et al. (2020) predict deepening water tables across the contiguous US under warming. Meixner et al. (2016) predicted decreased recharge over the southwestern US, little change in the northwest, and decreased mountain recharge. Other regional assessments exist and should be compared as well.
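A simple first step toward quantifying GCM uncertainty is to report, per cell, how many models agree on the sign of the projected change. A minimal sketch with hypothetical numbers; the 80 % agreement threshold is an illustrative choice, not one taken from the manuscript.

```python
import numpy as np

def sign_agreement(changes):
    """Fraction of models sharing the sign of the ensemble mean, per cell.

    changes: array of shape (n_models, n_cells) with projected change per model.
    """
    changes = np.asarray(changes, dtype=float)
    mean_sign = np.sign(changes.mean(axis=0))
    return np.mean(np.sign(changes) == mean_sign, axis=0)

# Hypothetical projected head changes (m) for three cells from five GCMs
changes = np.array([
    [-2.0,  1.0,  0.5],
    [-1.5, -0.5,  0.7],
    [-3.0,  0.8, -0.2],
    [-2.2, -1.2,  0.4],
    [-1.0,  0.9,  0.6],
])
agreement = sign_agreement(changes)
robust = agreement >= 0.8  # e.g., at least 4 of 5 models agree (illustrative threshold)
print(agreement, robust)
```

The middle cell has a positive ensemble mean, yet only three of five models agree on its sign.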
I have some further minor comments and suggestions for the authors should they consider revising and resubmitting, in no particular order:
The authors state that the model forcings are abstraction, recharge, and discharge (L227). Is this correct? If so, what is the model doing other than accounting for inputs and outputs? Why does the dynamic drainage elevation (L104) matter if groundwater discharge is already prescribed?
Equations 12-15: variables are not consistently labelled. What is alpha in equation 13? Is this different from the alpha in eq 10? Is WTDsim different from Wsim?
There is no confining layer over Canada at all? Surely this could be improved.
Two of the GCMs in your ensemble (IPSL-CM6A-LR and UKESM1-0-LL) are 'hot': their Equilibrium Climate Sensitivity (ECS) and Transient Climate Response (TCR) lie above the 'very likely' ranges assessed by the IPCC. This means that their projected warming is probably too strong for a given scenario. If the warming trajectory is important (which I think it is here), the recommendation is to use only models that lie within the likely range (Hausfather et al., 2022). Consider reporting the ensemble mean for only the three models that lie within the 'likely' range, and including the others in an appendix.
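Reporting the screened ensemble alongside the full one is straightforward. In this sketch, the two models flagged above keep their names, the three remaining GCMs get placeholder names (GCM_A to GCM_C), and all projected values are invented for illustration.

```python
# Synthetic end-of-century head changes (m) for one region; values are placeholders
projections = {
    "GCM_A": -1.2,
    "GCM_B": -0.8,
    "GCM_C": -1.0,
    "IPSL-CM6A-LR": -3.5,
    "UKESM1-0-LL": -4.1,
}
HOT_MODELS = {"IPSL-CM6A-LR", "UKESM1-0-LL"}  # flagged as 'hot' in this review

def ensemble_mean(proj, exclude=frozenset()):
    """Unweighted mean over all models not listed in exclude."""
    values = [v for name, v in proj.items() if name not in exclude]
    return sum(values) / len(values)

full = ensemble_mean(projections)
screened = ensemble_mean(projections, exclude=HOT_MODELS)
print(f"all five GCMs: {full:.2f} m, screened (three GCMs): {screened:.2f} m")
```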
The model struggles to simulate well-based GW levels and trends, particularly in places where GW use is high. Perhaps that is not surprising, given that your water use data (Lange et al., 2021) are originally at 0.5° resolution, and that the recharge is also based on downscaling coarser-resolution data. Simulating well-based trends may simply be too difficult a task at the global scale. Does the model accurately simulate regional trends? I suggest the authors perform a simple experiment: aggregate the well-based data at increasingly coarse resolutions (say, 1 km, 2 km, 10 km, 50 km, and 100 km), do the same for the model data, and then calculate your performance metrics at each resolution. I would expect performance to improve at coarser resolutions. Then focus on reporting results at the finest resolution that provides an adequate match to observations.
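The proposed experiment is easy to script. This sketch bins wells and the co-located model values into square cells of increasing size and recomputes a simple skill score at each size (Pearson r here, as a stand-in for the manuscript's metrics). It assumes coordinates on a projected km grid, and the data are synthetic: observations and model share only a smooth regional signal, plus independent well-scale noise.

```python
import numpy as np

def aggregate(x_km, y_km, values, cell_km):
    """Average point values within square cells of size cell_km."""
    ix = np.floor(x_km / cell_km).astype(np.int64)
    iy = np.floor(y_km / cell_km).astype(np.int64)
    cells, inverse = np.unique(np.stack([ix, iy], axis=1), axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against differing inverse shapes across NumPy versions
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return cells, sums / counts

def score_by_resolution(x_km, y_km, obs, sim, resolutions_km):
    """Pearson r between aggregated observed and simulated values per cell size."""
    scores = {}
    for res in resolutions_km:
        _, obs_agg = aggregate(x_km, y_km, obs, res)
        _, sim_agg = aggregate(x_km, y_km, sim, res)
        scores[res] = float(np.corrcoef(obs_agg, sim_agg)[0, 1])
    return scores

rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0, 500, n)
y = rng.uniform(0, 500, n)
regional = 0.5 * np.sin(x / 80.0) + 0.5 * np.cos(y / 60.0)  # smooth shared signal
obs = regional + rng.normal(0, 1.0, n)   # well-scale variability the model cannot resolve
sim = regional + rng.normal(0, 1.0, n)
scores = score_by_resolution(x, y, obs, sim, [1, 2, 10, 50, 100])
print(scores)
```

Both aggregations use identical coordinates, so the cell order from `np.unique` matches and the aggregated arrays are aligned.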
Eq. 17: Is this missing a month index?
Figure 6a: The color scale for this figure should be the same as for the uncorrected WTD bias (Figure 2).
Figure 7: It is unclear why these regions were chosen for insets. In any case, they provide only about 2× magnification. I suggest removing the insets.
Discussion: The discussion of modelling choices, calibration, and advances is somewhat incongruous with ESD. I would expect this in a journal like Geoscientific Model Development but I think for ESD the discussion should be more general, and focus more on the implications.
L424 - The beginning of this paragraph seems to be missing. "[Major rivers], such as..."?
L435: Are these rising water tables in the north robust? Do they match observations? The authors state that 'This could conceivably be due to climate change enhancing precipitation and groundwater recharge dynamics'. It seems to me there is no need to speculate here - are those two processes actually occurring in the model?
Line 245: You could also report the number of CPU-hours. In addition, it would be responsible to report the CO2 footprint of these computations. This can be estimated as:
(wall-clock hours) * (12 nodes / total # nodes at Snellius) * (power usage of Snellius) * (carbon intensity of the Dutch grid)
Based on a quick search I get:
551 h * (12/1557) * (1200 kW) * 235 gCO2e/kWh ≈ 1200 kg CO2e, roughly the per-passenger emissions of a round-trip flight from Amsterdam to Beijing.
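The estimate is simple enough to keep in a script. The inputs below are the same rough values quoted in this comment and should be replaced with the actual job and facility figures; the sketch assumes the system's power draw is shared evenly across its nodes, and the function name is illustrative.

```python
def co2_footprint_kg(wall_hours, nodes_used, nodes_total, system_power_kw, grid_g_per_kwh):
    """Job's share of system energy times grid carbon intensity, in kg CO2e."""
    energy_kwh = wall_hours * (nodes_used / nodes_total) * system_power_kw
    return energy_kwh * grid_g_per_kwh / 1000.0  # g -> kg

footprint = co2_footprint_kg(551, 12, 1557, 1200, 235)
print(f"{footprint:.0f} kg CO2e")  # about 1200 kg
```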
3.2.2 - Are these values for the ML-corrected data or the raw model output?
If the model purports to include anthropogenic influences on the water table, why are regions with anthropogenic influence excluded from calibration?
L451: I would not call this 'disagreement' - rather, divergence in scenarios, or you could say the direction of change is scenario-dependent.
L288 'The performance of the model' - which model? GLOBGM or the XGBoost model?
Table 4: it would be more useful to report the absolute bias, so negative and positive biases do not cancel each other out. Did the calibration reduce the mean absolute bias?
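A toy example of why the signed mean can mislead: four wells with 20 m errors of opposite sign give a mean bias of zero but a mean absolute bias of 20 m. The well values are hypothetical.

```python
import numpy as np

def bias_summary(sim, obs):
    """Signed mean bias and mean absolute bias of simulated minus observed values."""
    err = np.asarray(sim, float) - np.asarray(obs, float)
    return {"mean_bias": float(err.mean()), "mean_abs_bias": float(np.abs(err).mean())}

# Hypothetical observed and simulated WTD (m) at four wells
obs = np.array([5.0, 10.0, 40.0, 80.0])
sim = np.array([25.0, 30.0, 20.0, 60.0])
print(bias_summary(sim, obs))
```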
Section 3.1.1. This section seems more like methods than results.
L390 and L473: The shallower wells show better correlations for monthly data but not for annual data. This suggests that at shallow depths the model better captures water table variation related to precipitation seasonality, but not long-term variation and change. Seasonal variation is probably strong only in shallow wells, so it makes sense to me that the difference in performance disappears in the annual time series. I therefore disagree that "seasonal dynamics are more challenging to capture than inter-annual trends".
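A synthetic illustration of this point: a model that reproduces the seasonal cycle of a shallow well but gets the long-term trend wrong scores well on monthly data and poorly on annual means. All values are hypothetical.

```python
import numpy as np

months = np.arange(240)                          # 20 years of monthly data
seasonal = 3.0 * np.sin(2 * np.pi * months / 12)
obs = seasonal + 0.01 * months                   # seasonal cycle plus a slow trend
sim = seasonal - 0.01 * months                   # same seasonal cycle, opposite trend

def annual_means(x):
    """Average consecutive 12-month blocks."""
    return x.reshape(-1, 12).mean(axis=1)

r_monthly = np.corrcoef(sim, obs)[0, 1]
r_annual = np.corrcoef(annual_means(sim), annual_means(obs))[0, 1]
print(f"monthly r = {r_monthly:.2f}, annual r = {r_annual:.2f}")
```

Annual averaging removes the shared seasonal cycle, leaving only the mismatched trends.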
Figure 10: Is the y-axis in m? What does the uncertainty represent: the standard deviation of the five GCMs? The GSWP3-W5E5 simulation should also be plotted on these graphs for comparison.
References
Condon, L. E., Atchley, A. L., and Maxwell, R. M.: Evapotranspiration depletes groundwater under warming over the contiguous United States, Nat. Commun., 11, 873, https://doi.org/10.1038/s41467-020-14688-0, 2020.
Hausfather, Z., Marvel, K., Schmidt, G. A., Nielsen-Gammon, J. W., and Zelinka, M.: Climate simulations: recognize the ‘hot model’ problem, Nature, 605, 26–29, https://doi.org/10.1038/d41586-022-01192-2, 2022.
Knoben, W. J. M., Freer, J. E., and Woods, R. A.: Technical note: Inherent benchmark or not? Comparing Nash-Sutcliffe and Kling-Gupta efficiency scores, Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2019-327, 2019.
Meixner, T., Manning, A. H., Stonestrom, D. A., Allen, D. M., Ajami, H., Blasch, K. W., Brookfield, A. E., Castro, C. L., Clark, J. F., Gochis, D. J., Flint, A. L., Neff, K. L., Niraula, R., Rodell, M., Scanlon, B. R., Singha, K., and Walvoord, M. A.: Implications of projected climate change for groundwater recharge in the western United States, Journal of Hydrology, 534, 124–138, https://doi.org/10.1016/j.jhydrol.2015.12.027, 2016.