Performance assessment of geospatial and time series features on groundwater level forecasting with deep learning
Abstract. Groundwater level (GWL) forecasting with machine learning has been widely studied because it generally yields accurate results with low input data requirements. Furthermore, machine learning models for this purpose can be set up and trained quickly compared with the effort required for process-based numerical models. Despite the high performance obtained at specific locations, applying the same model architecture to multiple sites across a region can lead to contrasting accuracies. The likely causes of this discrepancy in model performance have barely been examined in previous studies. Here, we investigate the link between model performance and geospatial site and time series features. Using precipitation (P) and temperature (T) as predictors, we model groundwater levels at approximately 500 observation wells in Lower Saxony, Germany, applying a 1-D convolutional neural network (CNN) with a fixed architecture and hyperparameters tuned for each time series individually. The GWL records span 21 to 71 years, so the time ranges of the training and test datasets vary between wells. Model performance is evaluated against relevant geospatial characteristics (e.g. land cover, distance to waterworks, and leaf area index) and time series features (e.g. autocorrelation, flat spots, and number of peaks) using Pearson correlation coefficients. We found that model performance is negatively affected at sites near waterworks and in densely vegetated areas. Longer subsequences of GWL measurements above or below the mean negatively impact the metrics and might be associated with anthropogenic influence or with wetter and drier periods. In addition, complex GWL time series exhibit better metrics, possibly due to a closer link with precipitation dynamics.
As deep learning models are black-box models that lack an understanding of physical processes, our work provides new insights into the degree to which external physical factors affect the input-output relation of a GWL forecasting model.
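To make the evaluation step concrete, the following is a minimal sketch of how time series features could be correlated with per-well model performance using Pearson correlation coefficients. It is not the implementation used in this study: the function names, the choice of two example features (longest run above the mean and lag-1 autocorrelation), and the dictionary layout are all assumptions made for illustration, and the performance score can be whichever metric is used to rate the forecasts.

```python
from math import sqrt
from statistics import mean


def longest_run_above_mean(series):
    """Length of the longest consecutive run of values above the series mean."""
    mu = mean(series)
    longest = current = 0
    for value in series:
        current = current + 1 if value > mu else 0
        longest = max(longest, current)
    return longest


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of the series (biased estimator)."""
    mu = mean(series)
    dev = [v - mu for v in series]
    return sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlate_features_with_scores(series_by_well, score_by_well):
    """Correlate each time series feature with a per-well performance score.

    Both arguments are hypothetical dictionaries keyed by well identifier:
    series_by_well maps to a list of GWL values, score_by_well to one score.
    """
    wells = sorted(series_by_well)
    scores = [score_by_well[w] for w in wells]
    features = {
        "longest_run_above_mean": longest_run_above_mean,
        "lag1_autocorrelation": lag1_autocorrelation,
    }
    return {
        name: pearson_r([func(series_by_well[w]) for w in wells], scores)
        for name, func in features.items()
    }
```

A negative coefficient for `longest_run_above_mean` would correspond to the pattern described above, where longer subsequences above the mean coincide with poorer metrics. Note that the sketch assumes each feature and the score vary across wells; a constant input would make the correlation undefined.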