This work is distributed under the Creative Commons Attribution 4.0 License.
Strategies for Incorporating Static Features into Global Deep Learning Models
Abstract. Global deep learning (DL) models are increasingly used in hydrology and hydrogeology to model time series data across multiple sites simultaneously. To account for site-specific behavior, static input features are commonly included in these models. Although the method of integration of static features into model architectures can influence performance, this aspect is seldom systematically evaluated. In this study, we systematically compare four strategies for incorporating static features into a global DL model for groundwater level prediction, including approaches commonly used in water science (repetition, concatenation) and two adopted from related disciplines (attention, conditional initialization). The models are evaluated using a large-scale groundwater dataset from Germany, tested under both in-sample (temporal generalization) and out-of-sample (spatiotemporal generalization) settings, and with both environmental and time-series-derived static features.
Our results show that all integration methods perform rather similarly in terms of average metrics, though their performance varies across wells and settings. The repetition approach achieves slightly better overall performance but is computationally inefficient due to the redundant replication of static features. It may therefore be worthwhile to explore alternative integration strategies that offer comparable results at lower computational cost. Importantly, the choice of integration method is less critical than the quality of the static features themselves. These findings underscore the importance of careful feature selection and provide practical guidance for the design of global deep learning models in hydrologic applications.
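The repetition and concatenation strategies named in the abstract differ mainly in where the static vector enters the network. A minimal NumPy sketch with hypothetical dimensions (shapes only; the actual models use LSTMs on real forcing data, and these variable names are illustrative, not from the paper):

```python
import numpy as np

T, D_dyn, D_stat = 52, 5, 12  # hypothetical: time steps, dynamic and static feature counts

dynamic = np.random.rand(T, D_dyn)  # stand-in for meteorological forcings
static = np.random.rand(D_stat)     # stand-in for environmental site descriptors

# Repetition: replicate the static vector at every time step and
# concatenate it with the dynamic inputs before the recurrent layer.
repeated = np.concatenate([dynamic, np.tile(static, (T, 1))], axis=1)
assert repeated.shape == (T, D_dyn + D_stat)

# Concatenation: run the recurrent layer on dynamic inputs only, then
# append the static vector once, to the final hidden state.
hidden = np.random.rand(64)  # stand-in for the last LSTM hidden state
combined = np.concatenate([hidden, static])
assert combined.shape == (64 + D_stat,)
```

The redundancy the abstract points to is visible in the shapes: repetition stores the same `D_stat` values at every one of the `T` time steps, while concatenation adds them only once.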
Status: final response (author comments only)
- CC1: 'Comment on egusphere-2025-4048', Willem Zaadnoordijk, 04 Dec 2025
RC1: 'Comment on egusphere-2025-4048', Anonymous Referee #1, 04 Jan 2026
The authors compare four different strategies for incorporating static features into global deep learning models. They show that the repetition method generally achieves the best performance but is less computationally efficient. They also state that the selection of static features is more important than the choice of integration strategy. The manuscript is generally well presented. However, I still have the following concerns.
Table 1 of Heudorfer et al. (2023) shows that the time series features were derived from past groundwater level time series up to 2011. The training, validation, and test periods for this study are 1991-2007, 2008-2012, and 2013-2022, respectively. If the authors directly adopt the time series features from Heudorfer et al. (2023), there might be a data leakage issue during validation. Also, while the authors have provided references for the time series features, it would be helpful to list them in the main text or an appendix for readability.
For the conditional model, it is unclear to me how the output of this layer is split and used to initialize the hidden and cell states of the first LSTM layer in the dynamic branch (Lines 167-168). Does this mean that the output is directly used as the initial condition of the hidden and cell states? Please clarify.
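One common reading of such a conditional-initialization design (an assumption here, not necessarily the authors' exact implementation) is that a dense layer maps the static vector to a vector of length `2 * hidden_size`, which is then split in half into the initial hidden state `h0` and cell state `c0`:

```python
import numpy as np

hidden_size, d_static = 128, 12       # hypothetical dimensions
static = np.random.rand(d_static)     # static feature vector for one well

# Dense layer on the static branch; tanh keeps the output in the
# range an LSTM state would normally occupy.
W = np.random.rand(2 * hidden_size, d_static)
b = np.zeros(2 * hidden_size)
out = np.tanh(W @ static + b)

# Split the output in half: first half initializes h0, second half c0.
h0, c0 = out[:hidden_size], out[hidden_size:]
assert h0.shape == (hidden_size,) and c0.shape == (hidden_size,)
```

Under this reading, the answer to the reviewer's question would be yes: the split halves are passed directly as the initial states of the first LSTM layer.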
In Line 176, the authors mention that there are 10 model initializations. Does this mean that the authors train 10 global models with different initial conditions?
For the results, the authors state that the repetition model performs the best, but they show results for the 256-neuron model for repetition and the 128-neuron model for the other strategies. It is hard to identify whether the better performance is due to the integration strategy or to the larger number of hidden neurons.
Specific comments:
Explain labels/legends in Figures 2 and 3.
Citation: https://doi.org/10.5194/egusphere-2025-4048-RC1
RC2: 'Comment on egusphere-2025-4048', Anonymous Referee #2, 06 Jan 2026
This paper explores different strategies for global deep learning models that account for basin and hydrogeological "static" properties and characteristics. These strategies aim to enhance the models' generalization capabilities and overall performance. The authors tested several approaches that differ in how static properties are incorporated into the model and conducted these tests using two types of modeling methods ("in-sample"/training on all available wells, and "out-of-sample"/training on 90% of the wells and testing on the remaining 10%, with the same test period in both approaches). From the deep learning point of view, their study addresses a technical issue that has also emerged in other scientific fields. Four integration strategies were tested for each modeling approach (in-sample and out-of-sample). The simplest integration strategy (repetition) appeared to perform the best, at the cost of much lower computational efficiency. However, the authors conclude that other strategies, particularly concatenation and conditional initialization of LSTM weights, deserve thorough consideration as they offer a good balance of performance and computational efficiency. The paper addresses an important issue; it summarizes the usefulness and relevance of existing strategies for incorporating static features into LSTMs in the specific case of hydrogeology. It is a very nice study, clearly written and organized. There are a few points that might be addressed or highlighted in the paper before publication, in my opinion.
The paper should provide examples of time series (e.g. in the form of a panel with 4 or 5 of them) without requiring the reader to download them. I think it is important to have a straightforward understanding of the context of hydrological modeling by knowing what ground-truth data looks like. Therefore, it is crucial that a few examples be presented directly in the text. For instance, if most time series consist of almost pure periodic annual variations with constant amplitude through time, expectations regarding the model's performance would not necessarily be the same as for more complex variability. After downloading the time series and briefly examining them, significant differences in statistical properties can be observed (more or less weak trends, very short-term variations, strong amplitude of the water year cyclicity...). Do the authors know if, and to what extent, such differences may play any role in the models' performance: are there some behaviors for which the models systematically perform poorly, or very well?
Although it is beyond the scope of the paper, it seems that a lot of meteorological inputs were used. To what extent are they all "meaningful" for the application? Have previous studies that used this database conducted SHAP analyses or similar methods to determine which features such models learn from most effectively?
I think one or two lines on the concept of "meaningful" static features as it is used here would be needed. Here, "meaningful" stands for "informative" if I am not mistaken; maybe this term would be more appropriate.
It should be stated at the beginning, and briefly justified, why no hyperparameter optimization was conducted: this is a technical choice that should be mentioned and explained, especially for researchers who intend to use a similar approach.
Would one-hot encoding be very different from the repetition approach? This is the simplest way to attach an identifier to the wells, so within the framework of this study it would be interesting to recall this.
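The one-hot identifier the reviewer mentions would simply be an `n_wells`-long static vector per well; under the repetition strategy it would be tiled over time exactly like any other static feature. A sketch under that assumption (dimensions hypothetical except for the 667 wells in the dataset):

```python
import numpy as np

n_wells, T = 667, 52   # 667 wells as in the dataset; hypothetical window length
well_index = 42        # arbitrary well

# One-hot "identifier" static feature vector for this well.
one_hot = np.eye(n_wells)[well_index]
assert one_hot.sum() == 1.0 and one_hot[well_index] == 1.0

# Under the repetition strategy, the identifier is tiled across time steps,
# just like the environmental or time-series-derived static features.
tiled = np.tile(one_hot, (T, 1))
assert tiled.shape == (T, n_wells)
```

Unlike environmental static features, such an identifier carries no information about unseen wells, so it could not support the out-of-sample (spatiotemporal) setting.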
Line 149 (LSTM layer size): Why is it 128 when no hyperparameter optimization has been performed? Given that the results of the repetition model were presented for an LSTM layer of size 256, wouldn't it be preferable to present all results with an LSTM size of 256? I am not questioning the relevance of the results presented here, but it is important in my opinion that the presentation of the methodology does not raise any unnecessary questions for researchers interested in developing a similar approach.
Line 106: replace "real" with "actual".
Line 177: Section 3.3.2 and 3.3.3 should be merged in a single "In-sample and Out-of-sample" 3.3.2 section.
About the quality of environmental static features:
Line 220: I am not sure I understand this point well, which also seems to be about the heterogeneity of physical characteristics, as discussed in the previous point. Regarding the representativeness of the database for sampling general hydrogeological characteristics properly, I believe this can be addressed with general hydrogeological knowledge. Additionally, the spatial coverage and number of wells appear sufficient for consistent sampling of hydrogeological properties.
Overall, I don't quite understand how one could conclude that poor-quality static features have a greater impact on model performance than the integration strategy. Does this mean that all the static features used were of poor quality? Or are these features considered poor quality compared to static features derived from time series, which are all meaningful? It will always be very difficult (not to say almost impossible at times) to have, all at once, high-quality, large-scale, high-resolution hydrogeological characteristics data that precisely account for spatial heterogeneity. Are we then reaching a major limitation on further improving the generalization capabilities of deep learning models? In that case, would there be some particularly crucial environmental static features to focus on?
Citation: https://doi.org/10.5194/egusphere-2025-4048-RC2
Data sets
Groundwater level time series, meteorological forcings and static feature dataset for 667 wells in Germany Tanja Liesch, Marc Ohmer https://zenodo.org/records/16601180
Model code and software
GitHub Repository for "Strategies for Incorporating Static Features into Global Deep Learning Models" Tanja Liesch https://github.com/KITHydrogeology/dynamic_static
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 178 | 81 | 20 | 279 | 46 | 44 |
CC1 comment (Willem Zaadnoordijk, 04 Dec 2025):
I would suggest including the purpose of the models in the title, e.g. by adding "for groundwater levels in Germany".
Best wishes, Willem Zaadnoordijk