the Creative Commons Attribution 4.0 License.
Broken Terrains v. 1.0: A supervised detection of fault-related lineaments on geological terrains
Abstract. The study presents a novel approach for fault detection on geological terrains using a supervised learning algorithm and careful variable selection. Synthetic faulted terrains are generated using Delaunay triangulation via the Computational Geometry Algorithms Library (CGAL), allowing for adjustment of parameters. We introduce 24 variables, including local geometric features and neighborhood analysis, for classification. A Support Vector Machine (SVM) is employed as the classification algorithm, achieving high precision and recall rates for fault-related observations. Application to real borehole data demonstrates the effectiveness of the method in detecting fault orientations, although challenges remain in distinguishing faults with opposite dip directions. The study highlights the need to address 3D fault zone complexities and their identification. Despite limitations, the proposed supervised approach offers a significant advancement over clustering-based methods, showing promise in detecting faults of various orientations. Future research directions include exploring more complex geological scenarios and refining fault detection methodologies.
Status: final response (author comments only)
RC1: 'Comment on egusphere-2024-2004', Anonymous Referee #1, 08 Aug 2024
General comments
The manuscript titled “Broken Terrains v. 1.0: A supervised detection of fault-related lineaments on geological terrains” describes an approach to predict fault 3D geometry from changes in Triangular Irregular Networks (TINs). While this manuscript is within the scope of GMD and presents a novel approach to the classification of faults, I feel that there are substantial changes required to clarify assumptions, methods and results. For example, the key assumption that faults at the surface of the Earth are reflected by particular landforms (e.g. scarps or breaks in slope) and their geometry is not entirely valid nor is it clearly stated. The methods section should be structured to reflect the steps summarised in Fig 2 (which should be moved from the introduction to the methods section). The results are ambiguous, given that there is confusion around what is being compared in the test data and how it is being compared. Finally, the manuscript is hard to follow in many sections (see specific comments below) and the figures need a substantial amount of editing to clearly communicate to the reader what they should glean from them (e.g. symbology, colours, quality). Unfortunately, for these reasons I am recommending that this manuscript be rejected.
Specific comments
1 Introduction
Why is there a section called "Short summary"? This information should be removed, as everything is covered in the abstract.
The use of the term geological terrains is a little misleading. The model uses changes in slope in triangular irregular network (TIN) models, presumably of the Earth’s surface (this is not clear), as the basis for detecting faults.
What if for instance, the user has a digital elevation model? Would they still be able to use this approach given that the surface model is not irregular?
One key assumption that I don’t think has been stated clearly enough is that this approach assumes that all scarps, and/or breaks in slope are caused by faulting, i.e. they are fault-controlled landforms. This is probably not the case as some landforms will be controlled by other geological processes and features such as erosion and variability in rock type resistance to weathering. Furthermore, not all faults will be reflected as changes in the landscape.
2 State of the art
Supervised machine learning has been applied to a multitude of applications other than lithology classification, and I am not sure how relevant these applications are to your example, which is specifically the linking of TIN segments based on their location and normal and dip vectors.
3 Methodology
The training data are synthetically generated based on user inputs (there is a list of these at lines 121-124). Please mention that the training data are synthetic in line 120 and consider providing the user-defined variables as a table, for example, with an indication of the parameter name (as in the application), the range of possible values (surely there will be some parameters with restrictions on numeric values, e.g. non-negative), and the function of the parameter. This table may be useful to document other input parameters for the application if there are any.
The structure of the paragraphs for generating training data is hard to follow. It reads better with numbered dot points for each of the steps. For example,
The faulted triangulated terrains are created in the following sequence (summarized in Fig. 5).
- a container with 2D points is generated within a square of a given size.
- a new container of 3D points is created with the Z coordinate corresponding to the random value of dip and dip direction.
- noise is introduced to the surface defined as a random fraction of the elevation difference within the terrain.
- …
You must make sure that the table with user defined parameters has the same names as the parameters included in these steps to avoid ambiguity.
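As a minimal sketch of how the first three steps listed above could be written down unambiguously, the following Python fragment may help. All parameter names (`n_points`, `size`, `dip_deg`, `dip_dir_deg`, `noise_fraction`) are hypothetical and are not taken from the application, the noise is assumed to be uniform within plus or minus `noise_fraction` times the elevation range, and the steps elided in the list (including the faulting itself) are deliberately not reproduced.

```python
import math
import random


def generate_terrain(n_points, size, dip_deg, dip_dir_deg, noise_fraction, seed=0):
    """Return a list of (x, y, z) points on a noisy dipping plane.

    Assumed convention: x = east, y = north, z = up,
    dip direction measured clockwise from north.
    """
    rng = random.Random(seed)

    # Step 1: container of 2D points inside a square of the given size.
    xy = [(rng.uniform(0.0, size), rng.uniform(0.0, size)) for _ in range(n_points)]

    # Step 2: Z coordinate from dip and dip direction
    # (elevation decreases in the dip direction).
    slope = math.tan(math.radians(dip_deg))
    az = math.radians(dip_dir_deg)
    points = [(x, y, -slope * (x * math.sin(az) + y * math.cos(az))) for x, y in xy]

    # Step 3: noise as a random fraction of the elevation difference in the terrain.
    zs = [p[2] for p in points]
    relief = max(zs) - min(zs)
    return [
        (x, y, z + rng.uniform(-1.0, 1.0) * noise_fraction * relief)
        for x, y, z in points
    ]


terrain = generate_terrain(n_points=100, size=1000.0, dip_deg=10.0,
                           dip_dir_deg=45.0, noise_fraction=0.02)
```

Using the same names in the parameter table and in the step list would remove the ambiguity mentioned above.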
Lines 150-153: it is unclear if dip direction is included in the final set of variables for supervised learning. The statement “northern directions indicate great numerical difference (e.g. 358-2=356) but very small geometric difference (4 degrees).” needs to be clarified, as I suspect that you are trying to explain that the orientation difference between 358 deg and 002 deg (as measured relative to magnetic/grid north) is 4 degrees but numerically it is 356.
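The wrap-around issue can be illustrated with a short Python sketch. The function names are hypothetical, and the sine/cosine encoding is shown only as one common way of handling circular variables, not as the authors' choice.

```python
import math


def azimuth_difference(a_deg, b_deg):
    """Smallest angle in degrees between two azimuths on a 0-360 circle."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def encode_azimuth(a_deg):
    """Represent an azimuth as (sin, cos) so that 358 and 2 end up close together."""
    a = math.radians(a_deg)
    return (math.sin(a), math.cos(a))


print(358 - 2)                     # numerical difference: 356
print(azimuth_difference(358, 2))  # geometric difference: 4.0
```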
Line 169: The authors state that “sort the distances to neighbouring triangles in decreasing order.” to avoid randomness issues. Which distance or distances are used to sort the neighbouring triangles? Is this the Euclidean distance or cosine distance of the normal and dip vectors or is this something else? Please clarify.
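To make the question concrete, the two candidate measures could look as follows in Python. This is a sketch under assumptions: triangles are given as three (x, y, z) vertices, the Euclidean variant is taken between triangle centroids, and the function names are hypothetical. Which of these (if either) the manuscript uses is exactly what needs to be stated.

```python
import math


def centroid(triangle):
    """Centroid of a triangle given as three (x, y, z) vertices."""
    return tuple(sum(v[i] for v in triangle) / 3.0 for i in range(3))


def centroid_distance(tri_a, tri_b):
    """Candidate 1: Euclidean distance between triangle centroids."""
    return math.dist(centroid(tri_a), centroid(tri_b))


def cosine_distance(u, v):
    """Candidate 2: cosine distance between two (normal or dip) vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def sorted_neighbour_distances(face, neighbours):
    """Distances to the neighbours in decreasing order, so that the resulting
    feature vector does not depend on the order in which neighbours are stored."""
    return sorted((centroid_distance(face, n) for n in neighbours), reverse=True)
```

Sorting makes the features invariant to neighbour ordering for either measure, so the choice of measure should be stated explicitly.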
Line 176: The authors state that visualisation uses spatial clustering. I am not sure what you mean by spatial clustering, as I cannot see any indication of a specific spatial clustering approach. It appears to be more like the spatial distribution of classes (fault or not fault) as plotted on a map.
Lines 210-215: please format the equations for precision and recall and F1 such that they are on separate lines from the text.
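For reference, the standard definitions set as display equations would read as follows, assuming the manuscript uses the usual counts of true positives (TP), false positives (FP) and false negatives (FN) for the fault class:

```latex
\[
\mathrm{Precision} = \frac{TP}{TP + FP}
\]
\[
\mathrm{Recall} = \frac{TP}{TP + FN}
\]
\[
F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]
```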
4 Results
How many samples are in the synthetic training data and how many samples are in the synthetic test data? What were the parameters used to generate the synthetic data for this experiment?
It would be great to present the evaluation and validation data as a confusion matrix. Precision and recall can be appended to these tables.
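A minimal Python sketch of such a table for the binary case (fault = 1, not fault = 0) is given below. The labels are made up purely for illustration and are not results from the manuscript, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fault, 0 = not fault."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def format_confusion_table(tp, fp, fn, tn):
    """Plain-text confusion matrix with precision and recall appended."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    rows = [
        "                  predicted fault  predicted not fault",
        f"actual fault      {tp:>15}  {fn:>19}",
        f"actual not fault  {fp:>15}  {tn:>19}",
        f"precision = {precision:.2f}, recall = {recall:.2f}",
    ]
    return "\n".join(rows)


# Illustrative labels only (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(format_confusion_table(*confusion_counts(y_true, y_pred)))
```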
Please change Tab to Table where it occurs.
I am a little confused about how borehole data and the fault models based on the analysis of topographic features can be compared. Please explain this more clearly. You also need to include at least a confusion matrix of the comparison. If it is not a quantitative comparison, then I suggest that you exclude it.
It seems that the only measure of success is fault or not fault, and there is no measure of the successful classification of the orientation of the faults. Have you considered this as a measure of fit for your classification model?
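One possible orientation measure, sketched here in Python, is the angle between the unit normals of the predicted and the reference fault plane. The convention (x = east, y = north, z = up, dip direction clockwise from north) and the function names are assumptions made for illustration, not the manuscript's definitions.

```python
import math


def unit_normal(dip_deg, dip_dir_deg):
    """Upward unit normal of a plane given its dip and dip direction in degrees."""
    dip = math.radians(dip_deg)
    az = math.radians(dip_dir_deg)
    return (
        math.sin(dip) * math.sin(az),
        math.sin(dip) * math.cos(az),
        math.cos(dip),
    )


def angular_misfit(predicted, reference):
    """Angle in degrees between two planes given as (dip, dip direction) pairs."""
    n1 = unit_normal(*predicted)
    n2 = unit_normal(*reference)
    dot = sum(a * b for a, b in zip(n1, n2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


# Two planes with the same dip (30 degrees) but opposite dip directions.
misfit = angular_misfit((30.0, 90.0), (30.0, 270.0))
```

A measure of this kind would also quantify the case of faults with opposite dip directions that the abstract identifies as difficult.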
5 Discussion
I have read this section several times and I am still a little confused. I suspect that there are several aspects that you are trying to discuss:
- The use of TINs means that there is only a limited number (3) of neighbours to every face and that this simplifies the modelling
- The assumptions when generating synthetic training data, e.g. planes representing faults
- Issues with multiple faults being predicted from a single synthetic fault training example that have different dips (although I am not entirely sure if I understand this correctly)
I suggest a careful review of the discussion with the view to clearly distinguish the main points (as sub sections of the discussion with headings) and clarify to the reader the key message for each of the points.
6 Conclusions
The conclusion should clearly state what you did (developed a supervised fault classifier), how it is novel and different from other approaches (generating synthetic terrain data representing faults that control landscape geometry, using TINs to simplify modelling), and the key assumptions of the method. You should also communicate the impact of your work (who should be using your fault classifier and why).
Figures
Figure 1 - What do the colours in A represent? The legend states scalars but it is not clear what the scalars are, I suspect it is distance above some reference?
Figure 3 - This figure needs to be moved to the Methods section where the steps are summarised in detail. Likely introduced in an initial paragraph before section 3.1. At the moment this workflow lacks this clarifying information.
Figure 4 - It is unclear what is being presented here. Is this a 2D view of an underground mine or mining region? What data are used to generate the points in 4B and 4C? What clustering algorithm was used and what are the variables used in clustering? It appears that this information is provided in Michalak et al. (2022). I realise that a certain level of knowledge is assumed, but readers who have not read previous iterations of this research need to be provided with more background knowledge. It is probably worth indicating that this figure is modified from
Figure 6 - I am having trouble seeing the differences between all of the plots in Fig 6. Is there some way that you can change the shape or the colour of the points in each of the models and indicate how these points compare with the borehole data or the unsupervised model, whichever is being compared in this figure?
Figures 7 and 8 - The symbols in these figures are hard to see as they are very small. Also these figures can probably be combined into a single figure as A and B.
Citation: https://doi.org/10.5194/egusphere-2024-2004-RC1 -
AC1: 'Reply on RC1', Michal Michalak, 09 Aug 2024
We thank the Referee for submitting their review in due time. As of now, we would only like to point out that some of the main objections likely result from a misunderstanding that we use Earth's surface data. However, we don't use this type of data. We used borehole data (sect. 4.2) for illustrating the aim of detecting faults for subsurface and limited data with preferred orientation of strata. We will work on the review and the manuscript to be more clear about the aim and to avoid future misunderstandings. We will submit our full response when the second review is available.
Citation: https://doi.org/10.5194/egusphere-2024-2004-AC1 -
AC3: 'Reply on RC1', Michal Michalak, 27 Aug 2024
The PDF file containing the response to Reviewer #1's comments is located beneath the comments from Reviewer #2.
Citation: https://doi.org/10.5194/egusphere-2024-2004-AC3
RC2: 'Comment on egusphere-2024-2004', Anonymous Referee #2, 14 Aug 2024
General comments
The manuscript "Broken Terrains v. 1.0: A Supervised Detection of Fault-Related Lineaments on Geological Terrains" introduces a novel machine-learning approach, but its readability is poor, making it difficult to follow the progression of ideas. For example, the section on geological settings is well-contained within a single paragraph. Still, it is unclear whether the subsequent text belongs to this section or would be more appropriately placed in the Results or Discussion sections. The manuscript would benefit significantly from a comprehensive restructuring to enhance coherence and flow. Additionally, the figures require careful editing to improve their visual impact; for instance, Figure 5 uses blue points on a deep grey background, a combination that lacks sufficient contrast and hinders clarity. Finally, some technicalities need to be clarified, mainly how a detection model designed with synthetic 3D faults could be applied to borehole data.
Specific comments
0.- Short Summary
This section does not look necessary to get the correct general idea of the manuscript, just like the words "to classify terrain shape or nearby features" when the main goal is fault detection.
1.- Introduction
Line 38, "lineament/fault": using "/" is not recommended in formal manuscripts. The training sets in Figure 1 look very similar: they have short faults but are rotated into different 3D positions. It looks like quite a simple idealized model. Could you add more complexity, such as fault displacement variation? In Figure 3 and in general, it is better to be specific with quantities instead of using the term "many."
2.- State of the art
I prefer to call this section "Background" instead of "State of the Art."
2.2 Geological Setting
As I mentioned earlier, this section needs to be completed, and 105 paragraphs sound like a discussion instead of describing a geological terrain or setting.
3.- Methodology
3.1.- Generating terrains
It needs to be rewritten for clarity.
3.2 Selecting meaningful and consistent variables
I think that some terminology upgrades can be made here, like "variable features" or "feature" instead of just "variables," which could be general and prone to confusing terms.
3.3 Visualization
This section is too short; eliminate or combine it with other sections.
4 Results.
In line 217, the parentheses are incorrectly placed. Additionally, essential details typically included in a machine learning approach are missing, such as the number of samples used for training and testing and the number of correctly classified samples. The sections overall appear too brief; therefore, it would be beneficial to provide a more thorough description of the experiment to ensure that others can fully understand the methodology and results.
5.- Discussion
This section is challenging to follow; I suggest it be rewritten for clarity.
6.- Conclusions.
This section reads more like a discussion than a conclusion. Only lines 294 and 295 align with the intent of a conclusion. While the section is well-explained, I recommend relocating it to the discussion section.
Citation: https://doi.org/10.5194/egusphere-2024-2004-RC2 -
AC2: 'Reply on RC2', Michal Michalak, 27 Aug 2024
Model code and software
BrokenTerrains Michał Michalak https://github.com/michalmichalak997/BrokenTerrains/blob/main/README.md
Interactive computing environment
BrokenTerrains Michał Michalak https://github.com/michalmichalak997/BrokenTerrains/blob/main/Broken_terrains_training_testing_evaluating.ipynb