the Creative Commons Attribution 4.0 License.
An ensemble groundwater prediction (EGP) system to forecast groundwater levels in alluvial aquifers in Switzerland
Abstract. Groundwater is a key source of freshwater for drinking water supply and agricultural irrigation on a global scale. Groundwater in Switzerland (and beyond) is traditionally regarded as a reliable source of freshwater. Recent extreme drought events (i.e., in 2018, 2020, and 2022) have shown, however, that groundwater does respond to these events and can cause problems in water supply and groundwater availability. With hydrological extremes becoming more frequent, there is a growing need for early warning systems and improved forecasting. This study develops and tests a scalable ensemble groundwater prediction (EGP) system with a 32-day lead time. The system combines extended-range precipitation and temperature forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) with the lumped-parameter groundwater model Pastas. Forecasts were evaluated at six monitoring wells across Switzerland, representing diverse hydrogeological settings, and compared against naive persistence and climatology benchmarks. Results indicate that the EGP system produces skillful forecasts up to one month ahead, with Spearman correlations exceeding 0.77 for most wells. However, the required model–data complexity varies: in long-memory aquifers, forecasts driven by recent meteorology and climatology are sufficient, while in short-memory systems, meteorological forecast data adds clear value. Forecast skill in mountainous regions (e.g., Davos) remains limited due to difficulties in predicting local meteorology. These findings highlight both the potential and the limitations of short-term groundwater forecasting. Future work should explore larger lead times, particularly in slow-responding aquifers, and investigate methods to improve forecasts in alpine environments.
Status: open (until 29 Dec 2025)
- RC1: 'Comment on egusphere-2025-4653', Anonymous Referee #1, 01 Dec 2025
- RC2: 'Comment on egusphere-2025-4653', Anonymous Referee #2, 15 Dec 2025
This manuscript describes an ensemble groundwater forecasting method that is applied to produce daily groundwater level forecasts to lead times of 32 days. The method uses a conceptual model to simulate groundwater levels in response to climate forcing and generates groundwater level forecasts using downscaled, but not bias-corrected, extended-range forecast forcing from ECMWF. A simple auto-regressive error correction model is applied to update forecasts using recent groundwater level observations. Forecast uncertainties are derived from two sources: (i) ensemble forecast forcing and (ii) an ensemble of parameters for the conceptual model. The method is applied to six groundwater wells with a range of characteristics that allow the exploration of sources of forecast skill, and the performance of forecasts is assessed using several standard forecast verification metrics. This is a nice study, and one of only a handful of investigations into forecasting groundwater. I have a couple of major concerns and some minor suggestions.
Section 2.3 introduces a residual error model (equation 1) that has two terms, an autoregressive term and a white noise term, and the authors argue that this introduces only one additional parameter. There appears to be a parameter that has not been considered here, i.e. the variance of the white noise term, which is unlikely to be equal to 1 given the range of predictions. This white noise term appears to be subsequently neglected, but it is likely to form an important source of predictive uncertainty of the combined groundwater level and error correction models.
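To make the point about the white noise variance concrete, the following is a minimal worked derivation for a generic AR(1) residual with a constant time step. The symbols \phi, \sigma_\varepsilon and h are assumptions introduced here for illustration and are not necessarily the notation of equation 1 in the manuscript.

```latex
r_t = \phi\, r_{t-1} + \varepsilon_t, \qquad
\varepsilon_t \sim \mathcal{N}\!\left(0, \sigma_\varepsilon^2\right), \qquad |\phi| < 1

\operatorname{Var}\!\left(r_{t+h} \mid r_t\right)
  = \sigma_\varepsilon^2 \sum_{k=0}^{h-1} \phi^{2k}
  = \sigma_\varepsilon^2 \, \frac{1 - \phi^{2h}}{1 - \phi^{2}}
```

Under these assumptions the h-step-ahead residual variance scales linearly with \sigma_\varepsilon^2, so the autoregressive parameter alone does not determine the predictive spread contributed by the error model.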
Section 2.5 then introduces a 3-step process for ensemble groundwater prediction. The method's third step (line 148) appears to be generating a parametric estimate of the forecast error from 5100 individual ensemble members, using the autoregressive parameter from the noise model, and using the parametric estimate of forecast error to obtain prediction intervals. I think there are two improvements that could be made to this third step that may improve the performance of the verification metrics. Firstly, the prediction intervals could be computed directly from the ensemble members rather than from a parametric estimate of the forecast error. This would circumvent the need to assume the ensemble follows a normal distribution and therefore potentially address the limitation identified in the discussion related to the heteroscedasticity of forecast errors. Secondly, the estimation of the forecast error uses the autoregressive parameter derived from the residual error model introduced in Section 2.3. The forecast errors are unlikely to have the same autocorrelation structure as the simulation residuals of the conceptual groundwater model because the characteristics of the rainfall forcing, observed rainfall in the case of the residual error model and forecast rainfall in the case of the forecast errors, will be different. Rather than seeking to describe the forecast errors using a parametric distribution, this third step could simply add noise reflecting the hydrological model simulation errors following the autoregressive updating, which is likely to increase the spread of the forecast ensemble and therefore address the issue of underconfident forecasts that appear to exist for many wells (Figure 7).
Practical significance of the work... I would like to see a bit more context on groundwater dynamics and management in Switzerland added that would highlight the significance of the work for practical applications.
Personally, I think of groundwater levels as a variable that changes relatively slowly and therefore a forecast with 30 days lead time is unlikely to have practical benefit for groundwater management decision-making. However, it is plausible that some groundwater systems may respond over 30-day time horizons and therefore the forecasts for these periods could potentially be practically useful.
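Returning to the first suggestion on the third step of Section 2.5, the difference between an interval read directly from the ensemble members and one derived from a normal assumption can be sketched as follows. This is an illustrative sketch only: the function names, the 95% level, and the synthetic log-normal ensemble are assumptions made here and are not taken from the manuscript or from Pastas.

```python
import random
import statistics


def empirical_interval(members, level=0.95):
    """Prediction interval read directly from the sorted ensemble members."""
    lo_p = (1.0 - level) / 2.0
    hi_p = 1.0 - lo_p
    ordered = sorted(members)
    n = len(ordered)

    def quantile(p):
        # Linear interpolation between neighbouring order statistics.
        pos = p * (n - 1)
        i = int(pos)
        if i + 1 >= n:
            return ordered[-1]
        frac = pos - i
        return ordered[i] + frac * (ordered[i + 1] - ordered[i])

    return quantile(lo_p), quantile(hi_p)


def gaussian_interval(members, level=0.95):
    """Prediction interval assuming the ensemble is normally distributed."""
    mean = statistics.fmean(members)
    sd = statistics.stdev(members)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * sd, mean + z * sd


# Hypothetical, deliberately skewed ensemble (strictly positive values),
# sized like the 5100 members mentioned in the comment above.
rng = random.Random(42)
ensemble = [rng.lognormvariate(0.0, 0.6) for _ in range(5100)]

emp_lo, emp_hi = empirical_interval(ensemble)
gau_lo, gau_hi = gaussian_interval(ensemble)
print(f"empirical: [{emp_lo:.3f}, {emp_hi:.3f}]")
print(f"gaussian:  [{gau_lo:.3f}, {gau_hi:.3f}]")
```

For a skewed ensemble such as this one, the normal-based lower bound falls below the range the members can actually take, whereas the empirical interval stays within the ensemble and follows its asymmetry.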
Minor points
Line 25: sentence beginning at end of line - reference style should be Author (Date)
line 63: The groundwater level prediction model is referred to as a 'lumped-parameter' model, which seems an odd characterisation. I wonder whether it could be better described as a conceptual model for groundwater level prediction, as the idea of 'lumped-parameters' suggest to me that a common set of model parameters is used for many locations.
Figure 6: I wonder if there could be a better color scale that would highlight subtleties in forecast skill patterns.
Citation: https://doi.org/10.5194/egusphere-2025-4653-RC2
- RC3: 'Comment on egusphere-2025-4653', Anonymous Referee #3, 18 Dec 2025
The authors present a new approach to forecast groundwater levels in observation wells, based on measurements in the observation wells prior to the forecast and on forecasts of meteorological data. They test their approach for up to 32 days into the future for 6 wells in Switzerland with different characteristics. They do a thorough job in assessing the skill of their forecast using historical data and clearly demonstrate that their forecast approach outperforms naive forecasting approaches. The authors also estimate and assess the uncertainty of their forecast. The approach works well for most of the wells considered. I really like the paper and am not aware of a more thorough study on forecasting groundwater levels. But I think the paper can be improved in some places by better explaining the chosen approach. Some of the language and the conclusions can probably be sharpened a bit. I have quite a few small suggestions, but none of them are really major.
Minor comments
1. Is there any reason why the lead time is chosen to be 32 days? It seems a bit of an odd choice (but maybe common in the forecasting community). Why not 28 days (4 weeks) or 30 days (for those of us liking the decimal system)? But 32? It is 4 weeks and 4 days. And it is a power of 2. But other than that?
2. The first verification of the forecast is the Spearman coefficient. Why Spearman and not Pearson, which I would find a more logical and convincing choice? On line 204 it is not indicated what is a good number. On line 10 in the abstract the authors report 0.77, so apparently they think that is a good number. Please explain (maybe with a few references).
3. Lines 11-12. Make explicit in the Abstract what the difference is between “driven by recent meteorology and climatology” and “meteorological forecast data”, as this is a main conclusion, but not understandable from these sentences (it is understandable when reading the entire paper).
4. Fig. 1. I find Fig. 1 somewhat confusing. Some ideas: Put calibration first and forecasting below that. Add the word “calibration” above the first box and the word “forecasting” to the second box. The word “orange” in the caption should probably be “green”
5. Line 112-113. “In forecasting mode … in a post-processing step”. Please explain what is done.
6. An AR1 noise model is applied in an attempt to transform the residuals to uncorrelated noise. I assume it reduced the autocorrelation, but probably didn’t eliminate it (especially since daily data was used). Please explain that either the autocorrelation was removed or that the autocorrelation was just reduced but it was still assumed the alpha parameter could be used in forecasts.
7. Line 136. Why 51? Seems an odd number as well. I guess it is 3 times 17, but other than that? Why not just 50?
8. Line 150. Isn’t the “forecast error” the “forecast variance”? Furthermore, on this line “h” is the time step number, while in Eq. 2 “h” is the lead time. Also, Eq. 3 is only for constant time steps while Eq. 2 is for variable time step. Please fix.
9. Line 214 and further. Please explain what the “spread-to-error” ratio is, as it is not commonly used in groundwater hydrology. Do you compute sigma for each ensemble and then take the mean? Or something else? Furthermore, shouldn’t the standard deviation of the forecast be compared to the standard deviation of the data? That gives an idea how much the method has added.
10. Lines 289-290. As the majority of the prediction uncertainty comes from the meteorological forecast rather than the parameter uncertainty, is your forecast less skillful when not using the uncertainty in the parameters at all?
11. Figure 3. There are no lines (or measurement points) outside the 95% ensemble forecast. Does that make sense? Or is this not the 95% interval? Please clarify.
12. Line 315. “As errors in the meteo input accumulate” and again the word “error” on line 318 and line 325. Why are there errors in the meteo data? You mean variations?
13. Lines 394-400. I find this less clear than the conclusion in the previous paragraph. The first sentence is based on what? Figure 6? Really only Lamone, Gossau and Trub in Fig. 5. And then in the last sentence “in winter” so now suddenly back to winter? Please clarify this discussion.
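Regarding minor comment 9, one common way to compute a spread-to-error ratio is sketched below: the ensemble variance is averaged over all forecasts and its square root is divided by the root-mean-square error of the ensemble mean. This is an assumed definition for illustration, without finite-ensemble-size corrections; the manuscript may define the ratio differently, and the function name and toy numbers are hypothetical.

```python
import math


def spread_error_ratio(ensembles, observations):
    """Ratio of mean ensemble spread to RMSE of the ensemble mean.

    ensembles:    list of lists, one list of member values per forecast
    observations: list of observed values, one per forecast
    """
    n = len(observations)
    mean_variance = 0.0
    mean_sq_error = 0.0
    for members, obs in zip(ensembles, observations):
        m = len(members)
        ens_mean = sum(members) / m
        # Unbiased sample variance of this forecast's ensemble.
        variance = sum((x - ens_mean) ** 2 for x in members) / (m - 1)
        mean_variance += variance / n
        mean_sq_error += (ens_mean - obs) ** 2 / n
    # Average the variances first, then take the square root.
    return math.sqrt(mean_variance) / math.sqrt(mean_sq_error)


# Toy example with two forecasts of three members each (hypothetical values).
toy_ensembles = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
toy_observations = [3.0, 2.0]
ratio = spread_error_ratio(toy_ensembles, toy_observations)
print(ratio)
```

With this definition a ratio near 1 indicates that the ensemble spread matches the typical error of the ensemble mean, values below 1 indicate an underdispersive (overconfident) ensemble, and values above 1 an overdispersive (underconfident) one.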
Editorial comments
1. On line 6 the authors talk about 32-day lead time, and on line 10 about one-month ahead. Stick with the 32 days as that is investigated.
2. Remove the last sentence of the Abstract. Doesn’t add anything.
3. Line 25. “Sherrer et al.” should not be inside the parentheses.
4. Line 29. “discharge” -> “recharge”?
5. Line 70. Remove. Doesn’t add anything.
6. Line 92. “model” -> “Pastas model”. Line 94 “applied model” -> “Pastas model”. Line 97. “model” -> “Pastas model”
7. Line 99-100. “when subtracting the measurements from the simulation the residuals are obtained”. Isn’t that the other way around?
8. Line 125. “These standard errors”. I assume the entire covariance matrix was used?
9. Line 158. Please give reference for “law of total variance”.
10. Line 163-168. “The performance … approaches”. This whole discussion should be moved to the “Discussion” section.
11. Line 229. Space missing at end of sentence.
12. Line 236. Table 1 is not in the appendix, but on the next page. Again on line 253.
13. Table 1. DTW, isn’t that variable? Is this the mean? Also, please provide units for all quantities in the caption. And finally, what is a snow day? A day that snow falls or a day that there is snow on the ground (which may be more relevant for groundwater)?
14. Line 254. The “yearly” precipitation. (please add “yearly”). Then on line 255 the word “between” appears twice.
15. Table 2. Confusing that halfway there is a shift from MAE to NSE. Can the table be improved?
16. Line 307. “perfect git” -> “perfect forecast”.
17. Fig 4. Fix overlapping numbers on horizontal axis. In caption: The orange line “indicated” -> “indicates”.
18. I think something went wrong with this figure as the label on the vertical axis appears in strange places. Also in caption “increasing” -> “larger” and “declining forecast quality” -> “worse forecast”.
19. Line 351-352. “combination … forecasted” -> “EGP”.
20. Fig. 6. It is unclear what is compared to what in which column, and I think the middle column is never even mentioned.
21. Line 385-386. “for small lead times” but then at end of sentence “for lead times up to 32 days”. Which one is it? Also 2 lines down “did perform” -> “performed”
22. Line 416. “likely explains” -> “likely contributes to”?
23. Line 449. “forecasts with high quality”. That is pretty vague and probably too much. How about “skillful forecasts”, which is more precise and also used in the Abstract.
Citation: https://doi.org/10.5194/egusphere-2025-4653-RC3
The authors combine the PIRFICT approach to transfer-function noise modelling with ECMWF ensemble forecasts to predict groundwater levels about a month ahead with daily time steps. They compare the results with more naïve prediction methods and show the added value of Ensemble Groundwater Prediction (EGP) over, e.g., using climatology for fast-reacting systems without snow dynamics. Their results are not that surprising, as similar conclusions have already been drawn for streamflow. Nevertheless, the paper provides one of the first attempts at setting up an EGP system and is worth publishing after revisions.
Two more major issues.
Minor issues