the Creative Commons Attribution 4.0 License.
Integrating Ground Penetrating Radar and machine learning for assessment of lake bed permeability and potential vertical-water-loss zones in shallow lake under climatic stress
Abstract. Climate change and increasing anthropogenic pressures have intensified the vulnerability of inland water bodies, altering their hydrological balances, reducing their water levels, and degrading their water quality. One critical issue in this context is the limited understanding of lake bed hydrogeology, particularly the extent to which sediments hinder (as aquitards) or permit subsurface leakage. Although sediment sampling provides valuable point-based information, its spatial coverage is limited, emphasizing the need for high-resolution, lake-wide geophysical methods. This study determined whether the bed of Lake Vadkerti, a shallow lake experiencing persistent water level decline, facilitates vertical water loss. An integrated method combining ground-penetrating radar (GPR) and sediment sampling was used to evaluate subsurface sediment structures. A dense grid of GPR profiles was collected, enabling 2D profile interpretation and 3D time-slice visualization. Amplitude polarity, reflector geometry, and attenuation modeling were applied to identify stratified sedimentary layers. The resulting aquitard zoning map revealed heterogeneous lake bed conditions: low-permeability aquitards dominate the central and southern areas, whereas higher-permeability non-aquitards appear along the northeastern and central-western margins, indicating potential zones of groundwater interaction. The performance of four machine learning models—K-nearest neighbors, random forest, extra trees, and gradient boosting—in classifying aquitard zones based on GPR amplitude features was evaluated. The extra trees model demonstrated the most balanced performance across all classes and stronger generalization, with 97 % accuracy and high recall across all classes (aquitard: 100 %, leaky aquitard: 86 %, non-aquitard: 79 %). Moreover, its spatial predictions were consistent with observed hydrostratigraphic patterns. 
This approach provides a comprehensive framework for understanding the hydrological functioning of lake beds and informing sustainable water management in climatically sensitive freshwater systems.
Status: open (until 24 Dec 2025)
- CC1: 'Comment on egusphere-2025-2904', Giacomo Medici, 07 Aug 2025
- RC1: 'Comment on egusphere-2025-2904', Anonymous Referee #1, 12 Nov 2025
1 General comments
The authors present a case study from Lake Vadkerti (Hungary) where they applied offshore GPR measurements to assess the hydraulic permeability of the lake bottom. The goal behind this is to gain a better understanding of the water level variation and the role of geological conditions. The authors couple this investigation with a second, comparative study in which they apply four different machine learning algorithms to the data of the first study to find alternative hydraulic classifications and, at the same time, assess the applicability of ML in this context.
In the following I refer to the “classical” case study part of the article as “study 1” (including sections 3, 4.1, 4.2, 5.1 to 5.4) and to the machine learning part as “study 2” (consisting mainly of sections 4.3 and 5.5).
1.1 General comments on study 1
The approach of using the amplitudes of waves reflected from the seafloor and from shallow layer interfaces underneath to classify the sea or lake floor is not particularly new. However, it has been applied mostly in seismic studies so far, whereas lake studies using GPR measurements are still rare. Therefore, I consider the theme of the submitted Lake Vadkerti study to be interesting for many scientists working in this field.
The result of the efforts is a lake bottom classification map (Figure 7), which has been calibrated at some locations through soil samples taken from the lake bottom. Despite this calibration, I have doubts that the map in the presented form is reliable, because the study shows some fundamental deficits in data processing and analysis, which I outline in the following, but also because how exactly the calibration was performed is neither explained nor illustrated:
- Antenna noise: The authors show only 4 examples of GPR profiles (Figure 5), all of which still contain a high level of antenna noise interfering with the reflection amplitudes to be analyzed. This noise is visible in the form of horizontal parallel stripes (“ringing”). In Figure 5 it is erroneously classified as “reflections from the lake water”. Before picking and interpreting the reflection amplitudes, this noise must either be removed (e.g. through spatial filtering) or its influence on the picked amplitudes must be estimated in the form of error bars or the like.
- Missing spreading correction: The picked GPR reflection amplitude must be corrected for the amplitude decay due to geometrical spreading and absorption. In their processing description the authors mention an absorption correction, but I have not found any mention of a geometrical spreading correction. It may be that they applied it but simply forgot to list it. However, if indeed no spreading correction was applied, then the results of all follow-up steps are wrong.
- Absorption coefficients: Determining absorption coefficients of geological layers from radargrams measured in zero-offset configuration (such as in the present case) is not easy, usually requiring numerical modelling. Only the absorption coefficient of the water column as the uppermost layer can be determined directly. The authors do not outline how they determined this critical parameter and how accurate the results are. Indeed, the absorption coefficients they found or estimated for sand and clay (Figure 6) show unexpected values (absorption in sand higher than in clay). As this is contrary to the findings of most other authors (see compilations of Annan, or Schön, for example), it again raises the question of reliability.
- Polarization: The second type of observable (besides the amplitude strength) used in the classification is the polarization of the reflected signal. It is not clear how the authors determined it. In Figure 5 three examples (a1, a2, a3) are shown, but for only two of them (a1 and a2) can the polarity be identified by inspection without doubt; the third one is less clear because of interference with noise (cf. point 1) and follow-up reflections. A closer look at the GPR sections shows that many more radargram traces are affected in the same way. There are several algorithms which can be applied to determine polarization even in the presence of coherent noise (such as given in the present case). I am wondering if one of them was applied.
- Soil classification: The authors show only the final result of their analysis, which is the classification map (Figure 7); there is no assessment of the reliability. There is no presentation of the underlying (picked) amplitude and polarization data, no figures showing how they cluster and how the respective subsets relate to the soil samples, and so on. This information is of course crucial for the reliability of the results.
My summary on study 1 is that the presented material and explanations are too sparse for a presentation in an international journal. As I mentioned earlier: I find the basic idea of the study attractive, but a fundamental reworking of this part and (possibly) correction of the analysis flow is necessary from my point of view, where special emphasis has to be put on a reliability assessment (actually it is quite easy to present data cross plots, scattering diagrams, histograms and so on).
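As a rough illustration of the spreading and absorption corrections raised in the list above, the following sketch assumes a single homogeneous layer (for instance the water column), spherical divergence proportional to 1/r, and exponential absorption. The function name, the default velocity of 0.033 m/ns, and the default absorption coefficient are illustrative assumptions, not values or code taken from the manuscript.

```python
import numpy as np


def correct_amplitude(amplitude, twt_ns, velocity_m_per_ns=0.033, alpha_per_m=0.1):
    """Undo spherical spreading and absorption for one homogeneous layer.

    Assumed decay model: A(r) = A0 * exp(-alpha * r) / r, where r is the
    two-way path length r = v * t (t = two-way travel time).
    """
    amplitude = np.asarray(amplitude, dtype=float)
    path_m = velocity_m_per_ns * np.asarray(twt_ns, dtype=float)
    # Multiply by r (spreading) and exp(+alpha * r) (absorption) to recover A0.
    return amplitude * path_m * np.exp(alpha_per_m * path_m)


# Hypothetical picked amplitudes at three two-way travel times (ns).
picked = np.array([0.80, 0.35, 0.10])
twt = np.array([40.0, 80.0, 160.0])
print(correct_amplitude(picked, twt))
```

In a layered setting the spreading term depends on the velocity structure along the ray path, so this single-layer form is only a first approximation. Omitting the multiplication by the path length would leave a depth-dependent bias in every picked amplitude.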
1.2 General comments on study 2
The questions of how machine learning can be applied to a classification task such as the one given here, which ML algorithms should be taken into consideration, how these would have to be configured in order to get reliable results, and which of the preselected ML algorithms is finally best suited are very interesting for many geoscientists and hydrologists. So, I am very much in favor of a study going in the direction the authors present in their manuscript. However, as in the case of study 1, there is too much information missing in the presentation of the applied approaches, in the reasoning on how and why they were configured in the way it was done, what the influence of the parameter selection might have been, and so on. Since this information is missing, a reader cannot draw any useful conclusion from the result.
Again the reader is confronted with final results only (Figures 9 and 10), which have the appearance of the outcome of a black-box application. The classification maps from the four ML tools are much more similar to each other than to the original classification from study 1. Looking at the final outcomes alone offers no way of finding out what the truth is. Again, as in study 1, a reliability assessment is needed. This can be done in different ways, for example using subsets of the data, analyzing the clustering of the input data and other statistical aspects.
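One possible form of the subset-based reliability assessment mentioned above is stratified k-fold cross-validation, sketched here with scikit-learn. The feature matrix is synthetic (two made-up features standing in for amplitude and polarity, three classes), and the model settings are arbitrary assumptions; none of it reproduces the authors' data or configuration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(0)

# Synthetic stand-in for picked GPR features (NOT the authors' data):
# three classes (0 = aquitard, 1 = leaky aquitard, 2 = non-aquitard).
centers = np.array([[0.2, -1.0], [0.5, 0.0], [0.8, 1.0]])
labels = np.repeat(np.arange(3), 60)
features = centers[labels] + rng.normal(scale=0.15, size=(180, 2))

model = ExtraTreesClassifier(n_estimators=200, random_state=0)
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Every sample is predicted by a model that never saw it during training.
predicted = cross_val_predict(model, features, labels, cv=folds)

per_class_recall = recall_score(labels, predicted, average=None)
matrix = confusion_matrix(labels, predicted)
print("recall per class:", per_class_recall)
print(matrix)
```

Because neighboring GPR traces are strongly correlated, randomly shuffled folds can give optimistic scores; grouping the folds by profile or by spatial block would probably be a more honest test for data of this kind.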
1.3 Comments on Discussion and conclusions
I haven’t reviewed the discussion and conclusion sections because of the many open questions regarding the data processing and analysis and the missing reliability assessment.
1.4 General comments on the structure of the article and summarizing recommendation
From my perspective the paper suffers, on the one hand, from crucial deficits in the analysis and presentation as listed above, and it is, on the other hand, thematically overloaded by presenting one case study and one methodical study in a single paper. Both studies require a fundamental reworking and also extension of the presentation of materials, analyses and reasonings for arriving at a publishable level. I am convinced that the results would be worth these efforts, but I find the necessary changes beyond a “major revision”.
Therefore, my recommendation to the authors is to withdraw the article, split it into two articles (classical case study (study 1) and ML methodical study (study 2)) and rework both carefully and resubmit them separately.
2 Specific comments
- Introduction: generally informative, but somewhat lengthy. Should be shortened (which would occur automatically if the article were split into two).
- Study area: In the description of the study area the geological framework is missing. What formations are underlying the near-surface soils? Is it karst?
- Section 2, again: provide numerical values of the pH, salinity and electrical conductivity of the lake water.
- Lines 191-192: Sediment samples: provide grain size distribution
- Line 195: provide table with chosen processing parameters and example figures of how the filtering steps affected the data in the appendix. Spreading correction is not mentioned. Was it forgotten? This would lead to completely wrong results.
- Line 195 cont.: Absorption correction also not mentioned in the context of processing, but apparently considered later in the paper.
- Line 208: What is an “Ez tracker”?
- Line 211: “…reflection coefficient estimation…” How exactly was this performed? Explain and illustrate.
- Figure 4: apparently many steps of filtering were applied prior to amplitude modelling. Show example figures (perhaps in the appendix). How did this filtering affect the amplitudes? Again: spreading correction is missing.
- Illustrate and describe the process of “amplitude modelling” and its results. It should be waveform modelling.
- Equation 1: variable “R” is not explained.
- Equation 2 and equation 1 are inconsistent in their notation. How do they relate to each other?
- Note that equation 2 does not include geometrical spreading. It applies only to absorption.
- Lines 245-250: The determination of the attenuation parameters is crucial for the article. It requires an extensive explanation including data examples and result statistics.
- Line 257-258: “…A positive–negative–positive (P–N–P) polarity sequence was indicative of sand, whereas a negative–positive–negative (N–P–N) sequence was characteristic of clay …” This statement is wrong because reflection coefficients depend on the contrast of properties of the material found on both sides of an interface.
- Lines 271-275: This is a vague and unclear description of how the classification was performed. The classification is crucial for the article and must be clearly described and illustrated. Histograms and crossplots may be helpful. Visualize parameter clustering and how it relates to the soil sampling.
- Lines 306-307: The statement that sand has a higher porosity than clay is wrong. The reverse is true. The statement applies to hydraulic permeability instead.
- Figure 5: most signals are overmodulated so the signal shapes cannot be recognized. Heavy antenna noise, needs to be reduced with spatial filtering.
- Figure 5c: not clearly recognizable what and where “PNP” is. Visible is only reverberation “white-black-white-black-white-black - …” Trace a3: neg-pos-neg sequence not convincingly recognizable because of noise and other reflection interference. Identify polarity through waveform modelling or correlation methods.
- Figure 5 again: “Reflections from lake water” are most likely antenna reverberations (artefacts). Dataset incompletely migrated (still diffractions visible).
- Figure 5, again: Explain: How was the “fine-grained layer” verified? Through drilling?
- Figure 5, again: Lettering of axis labels is too small, not readable.
- Figure 6: theoretical attenuation curves for sand and clay: How were the average attenuation coefficients determined? Explain and show diagrams with data, regression curves etc.
- Line 351: The statement that the attenuation in sand is higher than in clay is surprising, given that mostly the reverse has been observed (see the petrophysical literature). The result of the case study may be correct, but it needs to be discussed and supported by showing the data and analyses on which it is based. The uncertainty (error bars) needs to be shown and discussed.
- Figure 7: How exactly was this classification map derived? What does the underlying database look like? Show histograms, scatter cross-plots of the involved parameters or whatever to illustrate the underlying parameter clusters and how they correlate with the different soil types.
- Figure 8a, b: The 3D view should be replaced or at least complemented by regular maps showing the depth of top and thickness of the sand layer.
- Figure 8c: The amplitude maps are questionable, possibly wrong because of missing spreading correction.
- Sections 4.3 and 5.5 (ML models): It is not clear what parameters exactly were the input and what was the output of the ML tools. The text says “GPR amplitude features” … but what is meant? Amplitude and polarity only? Other questions: How were the training data sets selected? Which classifications were chosen? How were the results validated? Show figures illustrating these steps.
- Lines 440 to 520: Discussion and conclusion sections: I haven’t reviewed the discussion and conclusion sections because of the many open questions regarding the data processing and analysis and the missing reliability assessment.
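The remark on lines 257-258 above, that reflection polarity depends on the contrast across an interface rather than on one material alone, can be illustrated with the standard normal-incidence reflection coefficient for low-loss, non-magnetic media, R = (√ε₁ − √ε₂) / (√ε₁ + √ε₂). The relative permittivities below are round illustrative assumptions, not values measured at Lake Vadkerti, and the function name is hypothetical.

```python
import math


def reflection_coefficient(eps_upper, eps_lower):
    """Normal-incidence amplitude reflection coefficient.

    Valid for low-loss, non-magnetic media; eps_* are relative permittivities
    of the layer above and below the interface.
    """
    n_upper, n_lower = math.sqrt(eps_upper), math.sqrt(eps_lower)
    return (n_upper - n_lower) / (n_upper + n_lower)


# Illustrative relative permittivities (assumed values only).
EPS = {"water": 80.0, "saturated_sand": 25.0, "saturated_clay": 35.0}

for upper, lower in [
    ("water", "saturated_sand"),
    ("saturated_clay", "saturated_sand"),
    ("saturated_sand", "saturated_clay"),
]:
    r = reflection_coefficient(EPS[upper], EPS[lower])
    print(f"{upper} over {lower}: R = {r:+.3f}")
```

Under these assumed values the sign of R for a sand boundary flips depending on whether the sand lies below or above the clay, which is why a fixed polarity sequence cannot by itself identify a lithology.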
Citation: https://doi.org/10.5194/egusphere-2025-2904-RC1
General comments
Very good geophysical research. Please see my comments to fix the minor issues.
Specific comments
Line 45. “of the lakes” must be deleted, there is a repetition.
Lines 55-56. “the properties of aquitards beneath lakebeds, particularly the distribution of low-permeability materials such as clay, play a crucial role in regulating vertical exchanges between groundwater and surface water”. Statement not backed up by references; insert specific references on the role of aquitards in areas characterized by rivers and lakes:
- Medici, G., Munn, J. D., Parker, B.L. 2024. Delineating aquitard characteristics within a Silurian dolostone aquifer using high-density hydraulic head and fracture datasets. Hydrogeology Journal 32(6), 1663-1691.
- Taviani, S., Henriksen, H.J. 2015. The application of a groundwater/surface-water model to test the vulnerability of Bracciano Lake (near Rome, Italy) to climatic and water-use stresses. Hydrogeology Journal 23(7), 1481-1498.
Line 63. “2D profiling” of? Please, be more specific.
Lines 136-175. Describe the local stratigraphy for the sediments.
Line 361. Can you estimate the approximate thickness of the aquitard units? Is it available from other information?
Line 445. Specify water seepage.
Line 445. Seepage in the unsaturated zone of the aquifer or not? Please specify this point.
Figures and tables
Fig. 1. You need to insert a much larger map with the country/state visible.
Fig. 5. Increase graphic resolution for the traces.
Fig. 8c. Contouring method for the time slices? Please, provide methodological details.
Fig. 10. Coordinates too small.
Fig. 10. Legend too small and difficult to read.