the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Enhanced Markov-Type Categorical Prediction with Geophysical Soft Constraints for Hydrostratigraphic Modeling
Abstract. Accurately characterizing hydrostratigraphic structures is essential for reliable groundwater flow and transport modeling. Due to limited borehole coverage and geological complexity, uncertainty analysis plays a vital role in supporting robust hydrogeological modeling. Traditional geostatistical approaches such as Multiple-Point Statistics (MPS), offer flexibility in reproducing complex geological patterns and uncertainties, but they are computationally demanding, may struggle to maintain stratigraphic consistency, and can be difficult to apply in practice. Alternatively, the Markov-type Categorical Prediction (MCP) framework has a better computational efficiency and enforces stratigraphic ordering. However, its effectiveness is challenged in areas with sparse borehole data. To address this limitation, this study presents an enhanced MCP approach that incorporates airborne electromagnetic (AEM) geophysical data as soft probabilistic constraints on lithology occurrence. A tunable parameter controls the relative contribution of geophysical and geological information, allowing for flexible data integration within the simulation process. The approach is tested on both synthetic and real-world cases. Synthetic experiments of different scenarios demonstrate that incorporating geophysical constraints improves lithological prediction accuracy, particularly when combined with borehole data. In the field application from Egebjerg, Denmark, we demonstrate how a statistical relationship between lithology and resistivity can be derived by integrating SkyTEM data with borehole lithological logs and depth information. That relation is then combined with conditional probabilities from training images extracted from a 3D interpreted model, using the MCP framework. The results show that the integrated approach enhances the generations of complex geological features, such as buried valleys, especially in areas with limited direct observations. By embedding geophysical information into the MCP framework, the method combines the spatial consistency and stratigraphic ordering of MCP with the extensive coverage and subsurface sensitivity of geophysical data. This integration overcomes a key limitation of MCP and enables more reliable simulations in regions where direct subsurface observations are limited, providing a practical and adaptable tool for improving geological modeling in groundwater studies.
- Preprint
(15191 KB) - Metadata XML
- BibTeX
- EndNote
Status: final response (author comments only)
-
RC1: 'Comment on egusphere-2025-3160', Anonymous Referee #1, 20 Oct 2025
-
AC1: 'Reply on RC1', Liming Guo, 05 Nov 2025
The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-3160/egusphere-2025-3160-AC1-supplement.pdf
-
AC1: 'Reply on RC1', Liming Guo, 05 Nov 2025
-
RC2: 'Comment on egusphere-2025-3160', Anonymous Referee #2, 17 Nov 2025
This manuscript introduces an enhanced version of the Markov-type Categorical Prediction (MCP) framework for hydrostratigraphic modeling, which integrates airborne electromagnetic (AEM) geophysical data as soft probabilistic constraints to improve lithological predictions, particularly in areas with sparse borehole data. The approach uses bivariate transition probabilities from training images and combines them with depth-dependent statistical relationships between resistivity and lithology. It is tested on synthetic scenarios (simulating AEM data and inversion) and a real-world case in Egebjerg, Denmark, using SkyTEM data and borehole logs. Results demonstrate improved accuracy in reproducing complex features, inclduing, better uncertainty quantification via entropy maps, and enhanced geological realism through post-processing corrections. The method addresses limitations of traditional MPS by being computationally efficient and enforcing stratigraphic ordering while leveraging geophysical coverage. Overall, the manuscript is well-written, with clear methodology, logical structure, and relevant applications to groundwater studies. However, I recommend major revisions to address methodological clarifications, validation, and potential extensions.
General Comments
- The use of airborne electromagnetic (AEM) data is promising, as it can provide extensive subsurface coverage. However, could the authors discuss whether commonly used drone-based electromagnetic systems (which are more accessible and cost-effective in some settings) could provide suitable information for this approach, or are there limitations in resolution or depth penetration that make them incompatible?
- The authors developed a depth-dependent conditional probability method. As far as I know, depth is location-dependent. Is it possible to use other more easily measured terms, such as bulk density, instead of depth to develop the dependent conditional probability method?
- The MCP method is a form of multiple-point statistics based on bivariate transitions. Based on a review of its advantages and disadvantages (e.g., advantages: computationally efficient, enforces stratigraphic ordering via zero-forcing, less reliant on highly repetitive training images; disadvantages: may produce unrealistic outputs in sparse data areas due to fallback on marginal probabilities, assumes conditional independence which can oversimplify complex patterns, and struggles with non-stationarity without extensions), could the authors address how their enhanced approach mitigates general limitations of standard MCP, such as handling non-stationarity or reducing artifacts in data-sparse regions?
- Has the resistivity-lithology relationship mentioned by the authors been validated through independent datasets or field experiments beyond the statistical linking in this study?
- Regarding the statement “More advanced inversion techniques (Deleersnyder et al., 2023) could potentially improve conditional probability from geophysical data as suggested in Hermans and Irving (2017)”: Can the authors elaborate on what specific advanced inversion techniques they have in mind (e.g., full-waveform or joint inversions)? Why is the current inversion technique (1D deterministic) not sufficient, and how might these alternatives quantitatively improve the conditional probabilities or reduce smoothing effects?
- This method seems to integrate soft and hard datasets effectively. However, as physics-informed neural networks (PINN) are becoming popular and the authors mention inversion and other methods, is it possible to integrate the proposed approach into a PINN framework by incorporating both soft (geophysical) and hard (borehole) datasets for potentially more robust predictions?
- The soft constraints show limited effectiveness in complex structures. As seen in the lower part of Figure 3b, the improvements from soft constraints are limited at stratigraphic interfaces or areas with strong resistivity contrasts, where the interface positions are still primarily controlled by the inversion model. This suggests that the method performs well under conditions where the resistivity-lithology relationship is relatively simple and interfaces are gentle. Does this indicate that the constraining effect may significantly weaken in environments with complex resistivity distributions or overlapping lithologies?
- Kernel density estimation (KDE) is a core step in establishing the lithology-resistivity relationship (Section 3.2), but the manuscript does not specify the method for determining the bandwidth. It is recommended to clearly state the basis for selecting the bandwidth and discuss its sensitivity to ensure the reproducibility of the results.
- The regularization parameters for the 1D TEM inversion are missing. The inversion uses GNCG + Tikhonov regularization (Section 3.1), but key parameters are not provided, including: the value of the regularization coefficient λ and its determination method; and the strategy for balancing data fit and model smoothness. This information is crucial for assessing the reliability of the inversion and reproducing the method.
- Figure 5 shows that τ ≈ 3 is optimal in the synthetic case, but the final τ value used in the Egebjerg field case is not specified. It is suggested to clearly state the value of τ at the beginning of Section 3.2 and explain the basis for its selection (e.g., sensitivity analysis, data quality, or borehole density).
- In the Code Availability section, the authors state that the simulation codes are proprietary and available upon reasonable request. To improve reproducibility, transparency, and accessibility for the scientific community, it is recommended that the codes be made publicly available, for example, on a platform like GitHub or Zenodo, if possible.
Specific Comments
- Line 48: The dot before "(He et al., 2014)" should be removed.
- Line 231: Please remove the extra "a" in the sentence "the a".
- Line 263: There is a repetition of "Lithology = 12"; please correct this.
- Line 352: "We" in "We mimic" should be lowercase (change to "we" for consistency if it's not starting a sentence, or check context).
- Line 402: For "a few deep boreholes," the authors should specify the depth range that qualifies as "deep" boreholes, how many boreholes fall into this category versus shallow ones, and the threshold used to distinguish them.
- Line 422-425: I suggest adding explicit references to Figure 12a and 12b to more clearly explain the related sub-figures.
- Line 435: It should also clearly mention which subfigure of Figure 7 is being referred to.
- Line 458: The entropy maps are not clear. Please provide more depth regarding why entropy is used (e.g., as a measure of predictive uncertainty) and a more detailed interpretation of this figure, including how variations in entropy relate to data constraints or geological features.
- The paper extracts 88 2D transects from a 3D model to compute transition probabilities (around Line 353), but does not justify whether the sample is sufficient to characterize the spatial variability of the study area. It is recommended to add a brief statistical description of the transect orientations, spacings, and coverage uniformity to strengthen the argument for representativeness.
- Figure 2: Inconsistent coordinate ranges and units. The horizontal coordinate ranges differ between the left and right subfigures, and the colorbar units are missing (presumed to be S/m). It is suggested to unify the coordinate ranges and label the units.
- Figure 8: Inconsistent horizontal coordinate ranges. The left figure spans -89 to 87, while the right spans -89 to 89. It is suggested to unify the coordinate ranges for easier comparison.
- Figure 9: Inconsistent scaling. The right figure includes a "×105" label, while the left does not. It is suggested to unify the scaling.
- Mixed use of conductivity/resistivity units. Both S/m and Ω·m appear in the text and figures. It is suggested to standardize to Ω·m and explain the conversion relationship upon first appearance.
- Figure 10: Abnormal format in the Layer 14 subfigure. The subplot for Layer 14 has inconsistent sizing and blank areas. It is suggested to revise it.
Citation: https://doi.org/10.5194/egusphere-2025-3160-RC2
Viewed
| HTML | XML | Total | BibTeX | EndNote | |
|---|---|---|---|---|---|
| 1,258 | 66 | 21 | 1,345 | 38 | 30 |
- HTML: 1,258
- PDF: 66
- XML: 21
- Total: 1,345
- BibTeX: 38
- EndNote: 30
Viewed (geographical distribution)
| Country | # | Views | % |
|---|
| Total: | 0 |
| HTML: | 0 |
| PDF: | 0 |
| XML: | 0 |
- 1
In this manuscript rather pleasant to read, the authors adopt the MCP method to integrate geophysical constraints for hydro-stratigraphic modelling. It is valuable for the EGUsphere community. However, the authors shall clarify some important aspects such as the degree of novelty of this work, some parts of the methodology. I am concerned by one of the indicator chosen to analyse the results; averaging lithological categorical values can be very misleading.
Below are detailed comments
Introduction
Line 59, the reference to Mariethoz et al. 2010 (Direct Samling specific algorithm, as mentioned line 551) might not be the most appropriate to support your statement here. Guardiano and Srivastava 1993 instead?
Line 69: what do you mean by that? How does the reference to Meerschman et al. 2013 support this statement? This reference would better describe the ease of use or wide use of MPS techniques such as the Direct Sampling.
Many statements refer to 4 different citations. Maybe keep the two most relevants and cite as (e.g. …).
Relevant work that could should probably be included in the literature review:
Lochbühler, T., Pirot, G., Straubhaar, J., & Linde, N. (2014). Conditioning of multiple-point statistics facies simulations to tomographic images. Mathematical Geosciences, 46(5), 625-645.
Pirot, G., Linde, N., Mariethoz, G., & Bradford, J. H. (2017). Probabilistic inversion with graph cuts: Application to the Boise Hydrogeophysical Research Site. Water Resources Research, 53(2), 1231-1250.
You have to clarify in the introduction that you are building up on previous work (Isunza Manrique et al., 2023) or ideas presented at a conference (Guo et al. 2024) and explain what is new here.
2.1 MCP
Is the considered lag h omnidirectional or directional?
2.2 Integration of geophysical data
Line 167: ‘this’ is ambiguous. Do you refer to Guo et al. 2024 or the work presented here?
It is not clear how P(A|C) is estimated nor how P(A|B,C) is integrated in the MCP framework (equation 1).
3.1 Synthetic case
Line 231: what are the different variables composing the training image? (maybe insert a step between 4. And 5.)
Figure 1 is confusing; there are two steps 6, crossing arrows, please reorganise it to make it clear or remove if the text description above is clear enough. Then later comes Figure 11, that looks totally different. It would be better to have a single workflow figure in section 2, and then give the specific of how the TI and conditional probabilities are estimated for the synthetic case and the real-case study.
3.2 Real-case study
Figure 9a: use the same colormap as in Figure 7.
Figure 9b: use a perceptually uniform colormap (e.g. https://doi.org/10.1038/s41467-020-19160-7 , https://www.fabiocrameri.ch/colourmaps/ , https://colorcet.com/ )
Line 440-441, may be add a reference to support the use of Shannon’s entropy, e.g. one of the followings
Lindsay, M. D., Aillères, L., Jessell, M. W., de Kemp, E. A., & Betts, P. G. (2012). Locating and quantifying geological uncertainty in three-dimensional models: Analysis of the Gippsland Basin, southeastern Australia. Tectonophysics, 546, 10-27.
Pirot, G., Joshi, R., Giraud, J., Lindsay, M. D., & Jessell, M. W. (2022). loopUI-0.1: indicators to support needs and practices in 3D geological modelling uncertainty quantification. Geoscientific Model Development, 15(12), 4689-4708.
Line 445: averaging lithological categorical values seems dangerous. It may convey false information. E.g. if lithologies 10 (aquifer) and 12 (aquifer) average to 11 (aquitard), that would not make sense. It would make more sense to have an aquitard probability volume and an aquifer probability volume.
Line 504, is the interpreted geological model (Figure 7) used a s reference in the sensitivity analysis? Please clarify.