the Creative Commons Attribution 4.0 License.
Training deep learning models with a multi-station approach and static aquifer attributes for groundwater level simulation: what’s the best way to leverage regionalised information?
Abstract. In this study, we used deep learning models based on recurrent neural network architectures to simulate large-scale groundwater level (GWL) fluctuations in northern France. We developed a multi-station collective training approach for GWL simulations, using both “dynamic” variables (i.e. climatic variables) and static aquifer characteristics. This large-scale approach makes it possible to incorporate dynamic and static features so as to cover more of the reservoir heterogeneities in the study area. Further, we investigated the performance of relevant feature extraction techniques, such as clustering and wavelet transform decomposition, with the aim of simplifying network learning using regionalised information. Several modelling performance tests were conducted. Models specifically trained on different types of GWL, clustered based on the spectral properties of the data, performed significantly better than models trained on the whole dataset. Clustering-based modelling reduces complexity in the training data and targets relevant information more efficiently. Applying multi-station models without prior clustering can lead the models to preferentially learn the dominant station behaviour, ignoring unique local variations. In this respect, wavelet pre-processing was found to partially compensate for the absence of clustering, bringing out common temporal and spectral characteristics shared by all available time series, even when these characteristics are “hidden” because their amplitude is too small. When employed along with prior clustering, wavelet decomposition used as a pre-processing technique provided a significant improvement in model performance, particularly for GWLs dominated by low-frequency variations, thanks to its capability of capturing essential features across all time scales (high and low).
This study advances our understanding of GWL simulation using deep learning, highlighting the importance of different model training approaches, the potential of wavelet preprocessing, and the value of incorporating static attributes.
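As a rough illustration of the multi-station idea described in the abstract, the sketch below shows one common way to combine dynamic inputs with per-station static attributes, or with a one-hot station encoding, into a single input array for a recurrent model. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_multistation_inputs`, the array shapes, and the random placeholder data are all hypothetical.

```python
import numpy as np

def build_multistation_inputs(dynamic, static=None, one_hot=False):
    """Stack per-station inputs for collective training (hypothetical helper).

    dynamic: array of shape (n_stations, n_steps, n_dynamic), e.g. climate variables
    static:  array of shape (n_stations, n_static) or None, e.g. aquifer attributes
    one_hot: if True, append a one-hot station identifier instead of / in addition
             to static attributes
    """
    n_stations, n_steps, _ = dynamic.shape
    parts = [dynamic]
    if static is not None:
        # Repeat each station's static attributes at every time step.
        parts.append(np.repeat(static[:, None, :], n_steps, axis=1))
    if one_hot:
        # One-hot vector identifying the station, repeated along time.
        station_ids = np.eye(n_stations)
        parts.append(np.repeat(station_ids[:, None, :], n_steps, axis=1))
    # Concatenate along the feature axis.
    return np.concatenate(parts, axis=-1)

# Placeholder data (assumed shapes): 3 stations, 10 time steps, 2 dynamic variables.
rng = np.random.default_rng(0)
dynamic = rng.normal(size=(3, 10, 2))
static = rng.normal(size=(3, 4))  # 4 hypothetical static attributes per station

x_static = build_multistation_inputs(dynamic, static=static)
x_ohe = build_multistation_inputs(dynamic, one_hot=True)
print(x_static.shape, x_ohe.shape)
```

In this layout every station contributes sequences with the same feature width, so a single network can be trained on all stations at once; the static columns (or the one-hot columns) are the only inputs that let the model tell stations apart.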
Status: closed
RC1: 'Comment on egusphere-2024-794', Anonymous Referee #1, 11 Jun 2024
Chidepudi et al. used deep learning approaches to simulate and predict groundwater level dynamics. The authors compared and discussed the performance of different combinations of approaches, such as different DL models, different inputs (i.e., dynamic factors and static factors), wavelet decomposition of precipitation, one-hot encoding, etc. Using a deep learning approach to simulate and predict dynamic groundwater levels is challenging. This work is important and could be a good reference for the community. The paper is generally well organized, but many details remain unclear. Major revision is needed before further review.
- The model structures are not clearly introduced.
- I didn’t find details of the model input or the structure of the input data. I especially wanted to know this for the multi-station approach.
- How did you choose the training and test sets?
- I could not find how large your research area is (only a figure is given). The resolution of ERA5 is low, and the true variations of these hydrometeorological variables may not be accurately represented by the products.
- What do you think about the uncertainties of the data products from ERA5?
- Did you conduct the wavelet decomposition only on precipitation, or on other variables as well?
- What is the resolution of the data products of static attributes?
- What do you think are the effects of static attributes such as hydraulic conductivity, elevation and slope?
- The location of the well, i.e. in a confined or unconfined aquifer, may also be important.
Citation: https://doi.org/10.5194/egusphere-2024-794-RC1
AC1: 'Reply on RC1', Sivarama Krishna Reddy Chidepudi, 20 Jun 2024
RC2: 'Comment on egusphere-2024-794', Anonymous Referee #2, 09 Jul 2024
Review comments on the manuscript: Training deep learning models with a multi-station approach and static aquifer attributes for groundwater level simulation: what’s the best way to leverage regionalised information? by Chidepudi et al.
The manuscript presents several different deep learning approaches to simulate groundwater levels. Dynamic as well as static variables are used to train deep learning models to represent fluctuations at a high temporal resolution (daily data) in northern France. These different deep learning models were combined with different sets of input data (including preprocessing) and training strategies. Overall, the work is timely and covers the important topic of data-driven approaches to simulate dynamic groundwater levels. However, the manuscript has several shortcomings, which are listed below. Major revision is needed.
Main Comments
What is the best way to leverage regionalised information? - The authors raise this question in the manuscript title but, in my opinion, they do not answer it sufficiently. There are two main reasons for this:
- The manuscript seems to be a combination of a technical note and a case study, with the result that a lot of essential information is missing. Reviewer 1 already pointed out several of the technical issues. In addition, a description of the data set is entirely missing. The only information available to the reader is the rough distance between the observation wells and their density in the region. Important information needed to understand the results, and therefore the feasibility of the applied methodology, is not supplied by the authors. For example: What is the distribution of static attributes in the different cluster groups? Looking at the attributes presented in Table 1, large differences between lithologies are to be expected (e.g. karst vs. clay). Could it be that the annual group consists mainly of observation wells located in karstic/fractured areas, and what would this tell us about the outcome of the study? Are these static attributes even presented or discussed in Chapter 4? (I assume that they can be seen in Fig. 9, but they are not named anywhere.)
- The presentation and discussion of the results lacks the already mentioned discussion of the regional context, but also a discussion of the results in a broader context. For example, the authors write in L398: “However, wavelet pre-processing shifts the importance towards dynamic components, reducing the contributions of static features or OHE. When clustering is combined with wavelet preprocessing, low-frequency precipitation components emerge as key contributors, improving model performance.”
Does this mean that the importance of all dynamic components is higher by default, and we do not need to consider geological/hydrodynamic/topographic features? Does this apply to all kinds of unconfined aquifer systems (shallow, deep, karstic…)? Here it would be interesting to compare your results with other available publications considering static attributes on a regional scale (e.g. Heudorfer et al., 2024 or Haaf et al., 2023).
The quality of the writing (language, clarity etc.) varies a lot throughout the manuscript. This makes it difficult to follow the central theme and therefore requires revision. Sometimes sentences recur, e.g. L73: DL models have proved effective on a local scale, and are also on a larger scale by collectively training a significant number of piezometers (Chidepudi et al., 2023b; Heudorfer et al., 2024) vs. L80: The DL models have proved effective at local scale and are also proving more effective on a larger scale. At the same time, the introduction of terms and abbreviations is inconsistent. Some examples: GWLs is first introduced in the Introduction and then again in lines 185, 308, 378 and 436; SHAP is first introduced in line 231 and then again in line 461; an introduction of AI/DL/KGE and NSE (even though they are quite common) is entirely missing. Altogether it feels like sections/paragraphs of different origin were put together.
Secondary Comments
L85: sensitivity to human activities - I do not really understand why this is an additional challenge compared to runoff data. Does it mean runoff data are not sensitive to human activities (e.g. river straightening, dam construction etc.)?
L121: their application to GWL simulation is still questionable. – Do you really mean questionable?
L141: We refer to (Beven and Young, 2013), for differences in the use of the terms simulation and forecasting. - I do not see the connection between the sentence and the rest of the paragraph. Maybe a few more words are needed?
L164: Although they seem somehow redundant, they are expected to provide complimentary information about the hydrogeological nature of the hydrosystems – This could and should be tested at one point (which does not mean that you have to add it here).
L167/ L173/180/323: Baulon et al., 2022a/b?
L187: Bidirectional LSTM - It would be good to provide a reference, especially since you write in L192: BiLSTM […] are particularly good at identifying various patterns in data sequences, making them ideal for simulating GWLs, that change over time. Or is this already a result of your study?
L304: Further explanation needed. The figure does not provide any details, especially no comparison, as written by the authors.
L355: This is information one would expect earlier in the manuscript.
L372: Why do you formulate “new research questions” here, is this necessary?
L425: No_ohe_no_stat approach?
References: Nourani, V., Alami, M. T., & Vousoughi, F. D. (2015). - I do not find a citation of this paper.
References:
- Heudorfer, B., Liesch, T., & Broda, S. (2024). On the challenges of global entity-aware deep learning models for groundwater level prediction. Hydrol. Earth Syst. Sci., 28, 525–543. https://doi.org/10.5194/hess-28-525-2024
- Haaf, E., Giese, M., Reimann, T., & Barthel, R. (2023). Data-driven estimation of groundwater level time-series at unmonitored sites using comparative regional analysis. Water Resources Research, 59, e2022WR033470. https://doi.org/10.1029/2022WR033470
Citation: https://doi.org/10.5194/egusphere-2024-794-RC2
AC2: 'Reply on RC2', Sivarama Krishna Reddy Chidepudi, 25 Jul 2024
Viewed
- HTML: 510
- PDF: 209
- XML: 31
- Total: 750
- BibTeX: 27
- EndNote: 28