the Creative Commons Attribution 4.0 License.
Recovery of stratigraphic data with associated uncertainties from drillhole databases using litho2strat 1.0
Abstract. Australian commonwealth, state and territory geological surveys possess information on over 3 million drillhole logs. There are many more wells drilled in the search for oil and shallower holes related to hydrogeology. Other countries no doubt have similar data holdings. Together these legacy drillhole datasets have the potential to significantly improve our subsurface data coverage but have limited use as constraints on regional 3D geological models as many if not most drill logs lack stratigraphic information.
This study develops open-source codes and methodologies for stratigraphy recovery from drillhole databases by introducing a correlation algorithm that integrates data from multiple drillholes. The algorithms combine constraints from lithological descriptions with stratigraphic relationships automatically derived from regional maps. In addition, by integrating uncertainty quantification and presenting multiple geological hypotheses, the resulting stratigraphic descriptions provide critical insights for resource estimation, scenario analysis, and data acquisition strategies.
The application of our method to a dataset of 52 drillholes from South Australia demonstrated its ability to make useful predictions of stratigraphic solutions and to quantify the associated uncertainties. These results not only validate our approach but also highlight opportunities to refine current stratigraphic descriptions and provide a valuable new source for regional 3D geological modelling.
Status: open (extended)
RC1: 'Comment on egusphere-2025-1294', Guillaume Caumon, 19 Aug 2025
This paper presents a new method to determine the succession of geological formations along many drillholes from lithofacies observations. In my view, the most interesting aspects concern the use of adjacency relationships derived from geological maps to constrain the classification, and the use of the branch and bound algorithm. I also very much like the uncertainty assessment aspect. However, I struggled to understand the paper, so I recommend major revisions to improve clarity and discussions as suggested below. Overall, I like the approach, but the writing has room for improvement. I hope the comments below will be helpful.
Main remarks:
- In terms of form, the paper would benefit from a better problem statement and more precise wording. I was a bit in the fog when I started reading because I was not very clear about the input, output and overall objective of the method. Things became clearer when moving forward, as details and examples were given. A reason for this is terminology, which is sometimes very general (please see some suggestions on this in the detailed remarks below). The flowchart of Fig. 1 is useful, and could be moved to the introduction. Adding some visual examples in the flowchart would help, and explaining what is done in geological terms would be needed at this stage (more than how it is done --“optimization solver”-- or using a different type of frame). Adding some visuals could also support definitions of terms in the introduction. Would it be correct to state that the proposed method does a clustering of lithofacies data with topological (adjacency) constraints?
- Also on the form, the paper could describe some of the methods used more completely. This includes the overall algorithm and the type of distance (Line 96: is this a map distance or a 3D distance?), as well as the basic algorithms and input of the litho2strat code (Line 118), in particular the branch and bound algorithm. A supporting figure and example would help. I appreciate that the code is provided, but a higher level description in the paper with a few more equations would certainly help. A more formal description of the uncertainty quantification would also help (please see detailed remarks).
- Research gaps: the motivation for the method with regard to previous literature should be made more explicit after line 60: what are the problems with the (many) well data classification approaches that motivate this contribution?
- Representativity of map contacts: as maps provide a section view of the 3D medium, there is no guarantee that geological maps provide an exhaustive set of possible contacts between stratigraphic units. While I appreciate the interest of constraining the solution space only to the observed contacts, I fear that this could also lead to under-estimating the actual uncertainties in many cases (and I guess this can be easily addressed). Please discuss.
- Along these lines, does the code make a distinction between “normal” stratigraphic contacts and stratigraphic contacts due to fault juxtaposition?
- It seems that the algorithm used is greedy, so that it does not depend on the well traversal order (top to bottom or conversely); is this correct?
- Does the proposed method use polarity? There is a mention of relative stratigraphic ages and directed graphs in the code design section, but polarity is not discussed before. In the end, I am not sure if the method handles only ordered sequences (like dynamic time warping) or not. The aggregation of results to define the stratigraphic succession seems to preclude reverse series due to reverse faults or folds. Please clarify and discuss.
- Could you provide a few elements about performance and scalability of the code?
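To make the questions about adjacency constraints and the representativity of map contacts concrete, here is a minimal sketch of the kind of constraint I have in mind. Everything in it is an assumption made for illustration: the lithology-to-unit table, the unit names, the contact list and the function `feasible_sequences` are hypothetical and are not taken from the paper or from the litho2strat code. It also uses exhaustive enumeration rather than branch and bound, and undirected contacts (no polarity).

```python
from itertools import product

# Hypothetical inputs, for illustration only (not from the paper or litho2strat).
# Candidate stratigraphic units for each lithology term.
CANDIDATES = {
    "sandstone": ["A", "C"],
    "shale": ["B"],
    "limestone": ["C", "D"],
}

# Undirected unit contacts observed on a (hypothetical) geological map.
MAP_CONTACTS = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("C", "D")]}


def feasible_sequences(lithologies, candidates, contacts):
    """Enumerate unit sequences in which consecutive intervals carry either
    the same unit or two units that are in contact on the map."""
    options = [candidates[lith] for lith in lithologies]
    result = []
    for seq in product(*options):
        if all(a == b or frozenset((a, b)) in contacts
               for a, b in zip(seq, seq[1:])):
            result.append(seq)
    return result


# One drillhole log, top to bottom.
log = ["sandstone", "shale", "limestone"]

# Solutions allowed by the contacts seen on the map.
solutions = feasible_sequences(log, CANDIDATES, MAP_CONTACTS)

# Same log, assuming a B-D contact exists at depth but is not exposed on the map.
extended = feasible_sequences(
    log, CANDIDATES, MAP_CONTACTS | {frozenset(("B", "D"))}
)
```

In this toy case the map contacts leave two feasible sequences, while allowing the single unmapped B-D contact leaves four. This is the sense in which restricting the solution space to observed contacts may under-estimate uncertainty.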
Detailed remarks:
- 14: “There are many more wells drilled in the search for oil and shallower holes related to hydrogeology”: sentence seems odd. Please rephrase.
- Line 16 of the abstract also mentions drillhole data, but the algorithm takes lithofacies as input, and does not currently allow for logging, geochemical or assay data, so being more specific would help.
- Line 20 mentions “a correlation algorithm”, but this appears to be only a step of the overall workflow, and not doing exactly the same thing as stratigraphic correlation as for instance in Waterman and Raymond (1987). Summarizing the other steps and explaining what is correlated with what would help to disambiguate the term.
- 22: “integrating uncertainty quantification and presenting multiple geological hypotheses”: this is written as two distinct ideas, but I think the scenarios are the way to perform the uncertainty quantification here. Also, a more specific term about the nature of the hypotheses could be relevant.
- 38: “complexly coded lithological information but limited stratigraphic data”: Please disambiguate. What does “complexly coded” mean: lithologies coded as integers or something else? We understand later in the paper that textual descriptions are used. This would be worth explaining from the outset. Also, I’d argue that lithological information is part of lithostratigraphic data, so please replace “stratigraphic data” by “stratigraphic formations / units” (or a more relevant term).
- 42: check the reference database (M Jessell et al. should read Jessell et al.).
- 44 “From these” : among these?
- 50-52: The last sentence is correct, but it diverts the reader from the focus of the paper. Could be removed or moved to the discussion or conclusion.
- 55: “stratigraphy”: I suggest replacing it with “stratigraphic units” here and in other places of the manuscript.
- 61 “stratigraphy recovery”: please define / explain. “drillhole databases”; “data from multiple drillholes”: please be more specific. “we enhanced the robustness and reliability of stratigraphic interpretations”: enhanced with regard to what? Please check the syntax of the whole sentence. The present tense is probably better than the preterite in this paragraph.
- 66-70: reads more like an abstract or a conclusion than an introduction.
- 1: what do “ASAD” and “geology complexity” mean?
- 110: “our”: 1st person is not useful here.
- 120 “Combinatorial optimisation solver”: for what problem exactly?
- 151-154: Is the top unit constraint a specific case of the global unit connectivity?
- 168: “relationships”: what type of relationships?
- 173: tectonic features + stratigraphic gaps may also lead to misalignments.
- 174: what do the nodes / edges of the connectivity graph represent? Is the graph oriented or not?
- 181: Give the equation for the solution score.
- 185: what is an “external drillhole”?
- 206 “geological distance”: please define.
- 229: “ensuring that modifications can be verified without introducing errors.”: please rephrase.
- Fig. 3: please add coordinates. Why is there a score on the drillhole?
- 249: “The figure below shows”: please use figure number
- 249: distance between drillhole and polygon: is this the minimum, maximum or average distance?
- Check caption of Fig. 4
- 5: Would it make sense to replace topological by adjacency? Is this graph created automatically?
- 267: is the probability marginal or conditional to the adjacent units? Some equations would help.
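To clarify the distinction raised in the remark on line 267, here is a minimal sketch of marginal versus conditional unit probabilities estimated from an ensemble of solutions. The ensemble, its equal weighting and the functions `marginal` and `conditional` are hypothetical; the paper may well weight solutions by score or define the probability differently, which is exactly what an equation would settle.

```python
from collections import Counter

# Hypothetical ensemble of equally weighted solutions for one drillhole;
# each tuple lists the unit assigned to intervals 0, 1, 2 (top to bottom).
ensemble = [
    ("A", "B", "C"),
    ("A", "B", "D"),
    ("C", "B", "C"),
    ("A", "A", "C"),
]


def marginal(solutions, interval):
    """P(unit at `interval`), estimated as a frequency over the solutions."""
    counts = Counter(sol[interval] for sol in solutions)
    n = len(solutions)
    return {unit: count / n for unit, count in counts.items()}


def conditional(solutions, interval, given_interval, given_unit):
    """P(unit at `interval` | unit at `given_interval` is `given_unit`)."""
    subset = [sol for sol in solutions if sol[given_interval] == given_unit]
    if not subset:
        return {}  # the conditioning event never occurs in the ensemble
    return marginal(subset, interval)


# Marginal probability of each unit in the deepest interval.
p_marginal = marginal(ensemble, 2)

# Same interval, conditional on the interval above being unit "B".
p_given_b = conditional(ensemble, 2, 1, "B")
```

With this toy ensemble the marginal probability of unit C in the deepest interval is 0.75, whereas conditional on the overlying interval being B it drops to 2/3. The two quantities differ, so the manuscript should state which one is reported.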
References
Waterman, M.S., Raymond, R., 1987. The match game: new stratigraphic correlation algorithms. Math. Geol. 19, 109–127.
Citation: https://doi.org/10.5194/egusphere-2025-1294-RC1