This work is distributed under the Creative Commons Attribution 4.0 License.
Strategies for Incorporating Static Features into Global Deep Learning Models
Abstract. Global deep learning (DL) models are increasingly used in hydrology and hydrogeology to model time series data across multiple sites simultaneously. To account for site-specific behavior, static input features are commonly included in these models. Although the method of integration of static features into model architectures can influence performance, this aspect is seldom systematically evaluated. In this study, we systematically compare four strategies for incorporating static features into a global DL model for groundwater level prediction, including approaches commonly used in water science (repetition, concatenation) and two adopted from related disciplines (attention, conditional initialization). The models are evaluated using a large-scale groundwater dataset from Germany, tested under both in-sample (temporal generalization) and out-of-sample (spatiotemporal generalization) settings, and with both environmental and time-series-derived static features.
Our results show that all integration methods perform rather similarly in terms of average metrics, though their performance varies across wells and settings. The repetition approach achieves slightly better overall performance but is computationally inefficient due to the redundant replication of static features. It may therefore be worthwhile to explore alternative integration strategies that offer comparable results at lower computational cost. Importantly, the choice of integration method is less critical than the quality of the static features themselves. These findings underscore the importance of careful feature selection and provide practical guidance for the design of global deep learning models in hydrologic applications.
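The repetition and concatenation strategies named in the abstract differ mainly in where the static vector enters the network. A minimal NumPy sketch with hypothetical dimensions (shapes only; the actual models use LSTMs on real forcing data, and these variable names are illustrative, not from the paper):

```python
import numpy as np

T, D_dyn, D_stat = 52, 5, 12  # hypothetical: time steps, dynamic and static feature counts

dynamic = np.random.rand(T, D_dyn)  # stand-in for meteorological forcings
static = np.random.rand(D_stat)     # stand-in for environmental site descriptors

# Repetition: replicate the static vector at every time step and
# concatenate it with the dynamic inputs before the recurrent layer.
repeated = np.concatenate([dynamic, np.tile(static, (T, 1))], axis=1)
assert repeated.shape == (T, D_dyn + D_stat)

# Concatenation: run the recurrent layer on dynamic inputs only, then
# append the static vector once, to the final hidden state.
hidden = np.random.rand(64)  # stand-in for the last LSTM hidden state
combined = np.concatenate([hidden, static])
assert combined.shape == (64 + D_stat,)
```

The redundancy the abstract points to is visible in the shapes: repetition stores the same `D_stat` values at every one of the `T` time steps, while concatenation adds them only once.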
Status: final response (author comments only)
- CC1: 'Comment on egusphere-2025-4048', Willem Zaadnoordijk, 04 Dec 2025
RC1: 'Comment on egusphere-2025-4048', Anonymous Referee #1, 04 Jan 2026
The authors compare four different strategies for incorporating static features into global deep learning models. They show that the repetition method generally achieves the best performance but is less computationally efficient. They also state that the selection of static features is more important than the choice of integration strategy. The manuscript is generally well presented. However, I still have the following concerns.
Table 1 of Heudorfer et al. (2023) shows that the time series features were derived from past groundwater level time series up to 2011. The training, validation, and test periods for this study are 1991-2007, 2008-2012, and 2013-2022, respectively. If the authors directly adopt the time series features from Heudorfer et al. (2023), there might be a data leakage issue during validation. Also, while the authors have provided references for the time series features, it would be helpful to list them in the main text or an appendix for readability.
For the conditional model, it is unclear to me how the output of this layer is split and used to initialize the hidden and cell states of the first LSTM layer in the dynamic branch (Lines 167-168). Does this mean that the output is directly used as the initial condition of the hidden and cell states? Please clarify.
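One common reading of such a conditional-initialization design (an assumption here, not necessarily the authors' exact implementation) is that a dense layer maps the static vector to a vector of length `2 * hidden_size`, which is then split in half into the initial hidden state `h0` and cell state `c0`:

```python
import numpy as np

hidden_size, d_static = 128, 12       # hypothetical dimensions
static = np.random.rand(d_static)     # static feature vector for one well

# Dense layer on the static branch; tanh keeps the output in the
# range an LSTM state would normally occupy.
W = np.random.rand(2 * hidden_size, d_static)
b = np.zeros(2 * hidden_size)
out = np.tanh(W @ static + b)

# Split the output in half: first half initializes h0, second half c0.
h0, c0 = out[:hidden_size], out[hidden_size:]
assert h0.shape == (hidden_size,) and c0.shape == (hidden_size,)
```

Under this reading, the answer to the reviewer's question would be yes: the split halves are passed directly as the initial states of the first LSTM layer.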
In Line 176, the authors mention that there are 10 model initializations. Does this mean that the authors train 10 global models with different initial conditions?
For the results, the authors state that the repetition model performs the best, but they show results for the 256-neuron model for repetition and the 128-neuron model for the other strategies. It is hard to identify whether the better performance is due to the integration strategy or to the larger number of hidden neurons.
Specific comments:
Explain labels/legends in Figures 2 and 3.
Citation: https://doi.org/10.5194/egusphere-2025-4048-RC1
RC2: 'Comment on egusphere-2025-4048', Anonymous Referee #2, 06 Jan 2026
This paper explores different strategies for global deep learning models that account for basin and hydrogeological "static" properties and characteristics. These strategies aim to enhance the models' generalization capabilities and overall performance. The authors tested several approaches that differ in how static properties are incorporated into the model and conducted these tests using two types of modeling methods ("in-sample"/training on all available wells, and "out-of-sample"/training on 90% of the wells and testing on the remaining 10%, with the same test period in both approaches). From the deep learning point of view, their study addresses a technical issue that has also emerged in other scientific fields. Four integration strategies were tested for each modeling approach (in-sample and out-of-sample). The simplest integration strategy (repetition) appeared to perform the best, at the cost of much lower computational efficiency. However, the authors conclude that other strategies, particularly concatenation and conditional initialization of LSTM weights, deserve thorough consideration as they offer a good balance of performance and computational efficiency. The paper addresses an important issue; it summarizes the usefulness and relevance of existing strategies for incorporating static features into LSTMs in the specific case of hydrogeology. It is a very nice study, clearly written and organized. There are a few points that might be addressed or highlighted in the paper before publication, in my opinion.
The paper should provide examples of time series (e.g. in the form of a panel with 4 or 5 of them) without requiring the reader to download them. I think it is important to have a straightforward understanding of the context of hydrological modeling by knowing what ground-truth data looks like. Therefore, it is crucial that a few examples be presented directly in the text. For instance, if most time series consist of almost pure periodic annual variations with constant amplitude through time, expectations regarding the model's performance would not necessarily be the same as for more complex variability. After downloading the time series and briefly examining them, significant differences in statistical properties can be observed (more or less weak trends, very short-term variations, strong amplitude of the water year cyclicity...). Do the authors know if, and to what extent, such differences may play any role in the models' performance: are there some behaviors for which the models systematically perform poorly, or very well?
Although it is beyond the scope of the paper, it seems that a lot of meteorological inputs were used. To what extent are they all "meaningful" for the application? Have previous studies that used this database conducted SHAP analyses or similar methods to determine which features such models learn from most effectively?
I think one or two lines on the concept of "meaningful" static features as it is used here would be needed. Here, "meaningful" stands for "informative" if I am not mistaken; maybe this term would be more appropriate.
It should be stated at the beginning, and briefly justified, why no hyperparameter optimization was conducted: this is a technical choice that should be mentioned and explained, especially for researchers who intend to use a similar approach.
Would one-hot encoding be very different from the repetition approach? This is the simplest way to attach an identifier to the wells, so within the framework of this study it would be interesting to recall this.
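The one-hot identifier the reviewer mentions would simply be an `n_wells`-long static vector per well; under the repetition strategy it would be tiled over time exactly like any other static feature. A sketch under that assumption (dimensions hypothetical except for the 667 wells in the dataset):

```python
import numpy as np

n_wells, T = 667, 52   # 667 wells as in the dataset; hypothetical window length
well_index = 42        # arbitrary well

# One-hot "identifier" static feature vector for this well.
one_hot = np.eye(n_wells)[well_index]
assert one_hot.sum() == 1.0 and one_hot[well_index] == 1.0

# Under the repetition strategy, the identifier is tiled across time steps,
# just like the environmental or time-series-derived static features.
tiled = np.tile(one_hot, (T, 1))
assert tiled.shape == (T, n_wells)
```

Unlike environmental static features, such an identifier carries no information about unseen wells, so it could not support the out-of-sample (spatiotemporal) setting.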
Line 149 (LSTM layer size): Why is it 128 when no hyperparameter optimization has been performed? Given that the results of the repetition model were presented for an LSTM layer of size 256, wouldn't it be preferable to present all results with an LSTM size of 256? I am not questioning the relevance of the results presented here, but it is important in my opinion that the presentation of the methodology does not raise any unnecessary questions for researchers interested in developing a similar approach.
Line 106: replace "real" with "actual".
Line 177: Section 3.3.2 and 3.3.3 should be merged in a single "In-sample and Out-of-sample" 3.3.2 section.
About the quality of environmental static features:
Line 220: I am not sure I understand this point well, which also seems to be about the heterogeneity of physical characteristics, as discussed in the previous point. Regarding the representativeness of the database for sampling general hydrogeological characteristics properly, I believe this can be addressed with general hydrogeological knowledge. Additionally, the spatial coverage and number of wells appear sufficient for consistent sampling of hydrogeological properties.
Overall, I don't quite understand how one could conclude that poor-quality static features have a greater impact on model performance than the integration strategy. Does this mean that all the static features used were of poor quality? Or are these features considered poor quality compared to static features derived from time series, which are all meaningful? It will always be very difficult (not to say almost impossible at times) to have, all at once, high-quality, large-scale, high-resolution hydrogeological characteristics data that precisely account for spatial heterogeneity. Are we then reaching a major limitation on further improving the generalization capabilities of deep learning models? In that case, would there be some particularly crucial environmental static features to focus on?
Citation: https://doi.org/10.5194/egusphere-2025-4048-RC2
Data sets
Groundwater level time series, meteorological forcings and static feature dataset for 667 wells in Germany Tanja Liesch, Marc Ohmer https://zenodo.org/records/16601180
Model code and software
GitHub Repository for "Strategies for Incorporating Static Features into Global Deep Learning Models" Tanja Liesch https://github.com/KITHydrogeology/dynamic_static
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 178 | 81 | 20 | 279 | 46 | 44 |
CC1 comment (Willem Zaadnoordijk, 04 Dec 2025):
I would suggest including the purpose of the models in the title, e.g. by adding "for groundwater levels in Germany".
Best wishes, Willem Zaadnoordijk