the Creative Commons Attribution 4.0 License.
Unbiased statistical length analysis of linear features: Adapting survival analysis to geological applications
Abstract. A proper quantitative statistical characterization of fracture length (or height) is of paramount importance when analysing outcrops of fractured rocks. Past literature suggested adopting a non-parametric approach, using circular scanlines, for the unbiased estimation of the fracture length mean value. However, necessities shifted and now there is an increasing demand for parametric solutions to correctly estimate and compare all the parameters (e.g. mean AND standard deviation) of several types of distributions. These changing requirements highlighted the absence in geological literature of properly structured theoretical works on this topic and in particular on different biases that affect this estimate. Here we propose to tackle the right censoring bias, caused by limited size of outcrops with respect to fracture length, by applying survival analysis techniques: a branch of statistics focused on modelling time to event data and correctly estimating model parameters with data affected by censoring. After discussing both theoretical and practical aspects of survival analysis applied to geological datasets, we propose a novel approach for selecting the most representative parametric model (i.e. statistical distribution), combining a direct visual approach and distance statistics modified to accommodate for censored data. The proposed approach has been applied to real outcrop data, correctly estimating censored length distributions. We also show the effects of censoring percentage on crude parametrical estimation that do not use this paradigm. The theory and techniques discussed here are wrapped in an easily installable open-source Python package called FracAbility (https://github.com/gecos-lab/FracAbility).
Status: final response (author comments only)
-
CC1: 'Comment on egusphere-2024-2818', Stephen Laubach, 05 Nov 2024
This manuscript is an important contribution to a topic of considerable interest within geoscience. Fracture length distributions are important for rock strength and permeability, and thus are of great practical interest. Although length distributions from field studies are widely used inputs for modeling, there has long been uncertainty about how best to measure and analyze the outcrop observations.
There are in my opinion a couple of places in the MS where some clarifications will help improve the impact of the contribution. One of these areas is in contextualizing the work in the Introduction (see comments for lines 81, 106). The other is at the beginning and in the transitions between the explanation of the survival analysis (see line 129 comment). There are also a couple of minor usage issues; I’ve highlighted some. The comments are keyed to lines in the text.
81 I suggest making this comment more nuanced and adding a reference: ‘joints when empty or veins when filled (refs), although many fractures have hybrid fill attributes: they may be partly filled with inconspicuous mineral deposits that resemble joints, or the degree of fill may depend on fracture width, so that small fractures resemble veins (e.g. Laubach et al., 2019).’ After all, many fractures of interest to subsurface applications are strictly speaking neither ‘joints’ nor ‘veins’. In some populations small fractures are filled but wide fractures are open with a thin mineral lining. The old joints versus veins terminology is not helpful, and this is particularly germane for the discussion of length since ‘open’ fracture length may depend on these width-dependent mineral infills. It’s better to call them ‘opening-mode fractures or faults’ and separately specify the fill state. Laubach, S.E., Lander, R.H., Criscenti, L.J., et al., 2019. The role of chemistry in fracture pattern development and opportunities to advance interpretations of geological materials. Reviews of Geophysics, 57 (3), 1065-1111. doi:10.1029/2019RG000671.
90 The ambiguity of lengths where, as is the common case, fractures are segmented and en echelon, ought to be mentioned. This is a big source of uncertainty in measured lengths (and heights) and there are now ways to deal with this rationally with other node types. See the Forstner paper.
106 Here the potential for flow in the fracture network is assumed to be a function of connectivity, but in the preceding list of fracture types many of the elements may not be conducive to fluid flow, for example some faults with gouge and opening-mode fractures that are sealed. Likewise, if you have a situation where sets are of different ages, early sets may be sealed (or partly sealed) and later ones more open. An example is an outcrop of veins abutted or crossed by later joints. These abutting and crossing relations may impart high connectivity but will have a different impact on flow than a bunch of intersecting open joints. Maybe in 106 say: “If all the fractures are open, a network with prevalence of I nodes…” This may not be central to the point that you are making in this paper, but it’s such a common and misleading logical jump in fracture network studies (and with respect to length) that the clarification is useful. See the discussion in Forstner and Laubach, 2022, J. Struct. Geol.
Also, if the rock itself is porous, even a network that has only I nodes can markedly augment fluid flow because of flow between fractures through the host rock (Philip et al., 2005, SPE Res. Eval. Eng.); here length distribution is the key parameter (not connectivity) as Philip et al. show, which just makes your focus on length even more important.
122 It seems like these values might also be meaningless for ‘stochastic modeling’? Do you clarify this in the Discussion?
129-132 On first read, I found the transitions here confusing. For clarity I think you ought to warn the reader here that you are going to demonstrate the time-length dimensional shift in 3.3. Something like ‘Survival analysis is usually used in the time domain. In section 3.3. we show how a time-length dimensional shift is valid. Here we briefly introduce the terms as they are used in the time domain.’ These are key lines defining terms. I think they could use some clarification. What do you mean by ‘the event of interest is commonly defined as death’? Is a clarifying word missing? The ‘event of “x” is’? Or do you need some more information at the start of the paragraph: “Survival analysis is used to analyze data in which the time until the event is of interest (for example, the time until death in some medical or biological contexts).” This would perhaps be a good point to introduce the idea that you are substituting distance for time?
133 Which ‘length’ do you mean here?
173 1D or 2D? How does this conversion work?
190 ‘simplest’
204 ‘it has its limitations’
223 ‘…that can enable the researcher to obtain an informed…’
235 ‘both figures’
In the example case studies, with such big clear outcrops, can you analyze a small area within the larger area and verify that you are accounting for the censoring correctly?
Recent reference of possible interest: Forstner, S.R., Corrêa, R., Wang, Q., Laubach, S.E., 2024. Fracture length data for geothermal applications. In Gill, C.E., Goffey, G., Underhill, J.R., eds., Powering the Energy Transition through Subsurface Collaboration, Geological Society of London, Energy Geoscience Conference Series, v. 1, https://doi.org/10.1144/egc1-2024-17
350 Given the limitations of any spacing statistic, I think it would be worthwhile mentioning here that good field practice with scanlines should be to keep track of the sequence of fracture occurrences, in other words, the spatial arrangement, as you’ve pointed out in other work (and also Marrett et al. 2018, J Struct Geol). Your analysis here seems like it would be equally apt for spatial arrangement data collection and analysis.
376 I agree with this way of proceeding re: defining length. Does your method work as well with lengths defined via branches; is there a reason to choose one or the other? Maybe this gets out of scope, but the way you mention it here might make a reader wonder.
388 This is a big claim that length is always underestimated. What if you have a process that produces only short fractures (or even fractures that are shorter than your outcrop size). Hooker et al. 2013, J. Struct. Geol. describes one set (of several) that only contains very short lengths. Maybe some caveats are in order here.
395 ‘it’? Maybe ‘they are’?
405 Although testing this hypothesis is something that people studying fracture lengths in the context of geomorphology ought to consider. Particularly large or open fractures can affect the size, shape, and occurrence of outcrop. See Eppes et al. 2024, Earth Surface Dynamics, doi.org/10.5194/esurf-12-35-2024.
409 ‘in key of time’ is an odd phrase. Check.
411 ‘useless’ seems harsh. I’m not convinced this extra remark is needed. Anyway, there may be other parameters (like segmentation) that have similar effects to outcrop size that would benefit from the approach you propose, even if outcrops were arbitrarily large.
445 This assumes that measurements are only carried out at one scale of resolution. But this need not be the case. See Ortega et al. 2006, AAPG Bulletin (for aperture sizes) and Forstner et al. for lengths.
451 And for some fracture systems, the smaller fractures are more prone to be mineral filled and potentially less obvious features on images. This size/visibility effect can also manifest in the picking of long fractures if the long traces are segmented.
-
AC1: 'Reply on CC1', Gabriele Benedetti, 08 Nov 2024
We thank the reviewer for taking the time to partake in the discussion and post this insightful community comment. We will now address each point hoping to clarify some points and modify the final draft following the suggested comments. We also provide a pdf supplement that is a formatted copy of this answer. In the pdf, blue text are the comments of the reviewer, in black our answers and in red the additions/modifications that we propose.
81 I suggest making this comment more nuanced and adding a reference […]
Thank you, we changed the line as suggested.
106 Here the flow in the fracture network is assumed to be a function of connectivity, but in the preceding list of fracture types many of the elements may not be conducive to fluid flow, for example some faults and opening-mode fractures that are sealed. Likewise, if you have a situation where sets are of different ages, early sets may be sealed (or partly sealed) and later ones more open. An example is an outcrop of veins abutted or crossed by later joints. These abutting and crossing relations will have a different impact on flow than a bunch of intersecting joints. Maybe in 106 say: “If all the fractures are open, a network with prevalence of I nodes…” This may not be central to the point that you are making in this paper, but it’s such a common and misleading logical jump in fracture network studies that the clarification is useful. See the discussion in Forstner and Laubach, 2022, J. Struct. Geol. Also, if the rock itself is porous, even a network that has only I nodes can markedly augment fluid flow (Philip et al., 2005, SPE Res. Eval. Eng.); here length distribution is the key parameter (not connectivity), which just makes your focus on length even more important.
We agree that this clarification is important and have expanded the line as follows:
106 In a non-porous rock with all open fractures, a network with a prevalence of I nodes will be less connected, and fluid flow will be more restrained. However, for many of the fracture types that were previously discussed, this is indeed not the case (e.g. sealed faults and opening-mode fractures) (Forstner and Laubach, 2022). Furthermore, if the rock is porous then length distribution becomes the key parameter for controlling fluid flow (Philip et al., 2005).
122 It seems like these values might also be meaningless for ‘stochastic modeling’? Do you clarify this in the Discussion?
The discussed values indeed are useful for stochastic modeling both from a statistical and numerical point of view (i.e. DFNs). In the text this was not explained clearly. We provide the following edit to clarify.
119 Circular scanline methods, on the other hand, do offer an unbiased estimate of the mean length; however, being non-parametric, they yield neither the distribution type (e.g. normal, exponential, etc.) nor distribution shape parameters (e.g. standard deviation, etc.). This, in turn, makes the estimate completely useless to quantitatively compare different results and to carry out any downstream statistical and/or numerical modelling, such as DFN stochastic fracture modelling.
129-132 On first read, I found the transitions here confusing. For clarity I think you ought to warn the reader here that you are going to demonstrate the time-length dimensional shift in 3.3. Something like ‘Survival analysis is usually used in the time domain. In section 3.3. we show how a time-length dimensional shift is valid. Here we briefly introduce the terms as they are used in the time domain.’ These are key lines defining terms. I think they could use some clarification. What do you mean by ‘the event of interest is commonly defined as death’? Is a clarifying word missing? The ‘event of “x” is’? Or do you need some more information at the start of the paragraph: “Survival analysis is used to analyze data in which the time until the event is of interest (for example, the time until death in some medical or biological contexts).” This would perhaps be a good point to introduce the idea that you are substituting distance for time?
Thank you for pointing out that you found the transition confusing. We changed the text as follows hoping to make it clearer.
122 To solve these problems, we propose to use survival analysis, a specialized field of statistics, specifically developed to deal with censored data. Survival analysis focuses on the analysis of the time until an event of interest occurs (Kalbfleisch and Prentice, 2002). The advantage of survival analysis over the methods discussed above is that it considers censored data as the carrier of the crucial information that the event did not occur up to the censoring time, thus allowing for an unbiased estimation of all statistical parameters and models. However, although in literature the terms survival times, time-to-event, or more generally lifetimes (Lawless, 2003) seem to imply that time is the only valid variable, any non-negative continuous variable, such as length, is valid (Kalbfleisch and Prentice, 2002; Lawless, 2003). In the following sections of this chapter, we will start by describing the canonical theory behind survival analysis as a function of time, and then we will show how the same theory can be applied in space, to sets of length or distance measurements.
129 Since this technique is rooted in medical and biological applications, the nomenclature from this type of literature is carried along. The event of interest (for which we measure the time-to-event) is often defined as death, while a loss indicates that the observation has been lost because it was hindered by a secondary event, called a censoring event. Censoring can be …
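To make the role of censored observations concrete, the following minimal sketch fits a length model by maximum likelihood with right-censored data. It is not taken from the manuscript or from FracAbility. It assumes an exponential length distribution purely because the censored estimate then has a closed form, and the five lengths and their censoring flags are made-up values. Complete observations contribute the density to the likelihood, while censored ones contribute only the survival probability, i.e. the information that the fracture is at least as long as measured.

```python
import math

# Hypothetical trace lengths (m); True marks a trace cut by the outcrop boundary.
lengths = [2.0, 3.5, 1.2, 6.0, 4.3]
censored = [False, True, False, True, False]


def log_likelihood(rate, lengths, censored):
    """Exponential log-likelihood with right-censored observations."""
    total = 0.0
    for x, is_censored in zip(lengths, censored):
        if is_censored:
            total += -rate * x                   # log S(x): length is at least x
        else:
            total += math.log(rate) - rate * x   # log f(x): length is exactly x
    return total


# Closed-form maximiser for the exponential case:
# number of complete observations divided by the sum of ALL observed lengths.
n_complete = censored.count(False)
rate_hat = n_complete / sum(lengths)
mean_hat = 1.0 / rate_hat

# Naive estimate that treats censored lengths as if they were complete.
naive_mean = sum(lengths) / len(lengths)

print(f"naive mean: {naive_mean:.2f}, censoring-aware mean: {mean_hat:.2f}")
```

With these illustrative numbers the naive mean is 3.40 while the censoring-aware mean is about 5.67, because the two censored traces are known only to be longer than their measured values. Other distributions (e.g. lognormal) generally need a numerical optimiser in place of the closed form.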
133 Which ‘length’ do you mean here?
Changed to:
133 the event happens after the end of the study period and thus we observe the partial lifetime of the event.
173 1D or 2D? How does this conversion work?
We are not sure whether the comment refers to the type of intersection described in line 173 (“4. the censored event as the intersection between the fracture trace and the boundary (marked by a B node)”), or to the figure below, indicating that it is not clear what the figure entails. In the former case, it is a 2D intersection. In the latter case, we can expand the figure caption and text as follows:
Figure 6. Censoring effect on an example of a simple fracture network and corresponding survival diagram. The survival diagram is a 1D representation of the fracture length. On the Y axis the fracture number is indicated and on the X axis the length is measured. Solid lines indicate the actual measured length while dashed lines indicate the possible continuation of the fracture. Yellow pentagons represent the censoring of the boundary.
174 Figure 6 represents an abstraction of the fracture network by just representing fractures by their length. Each fracture in the network is numbered (Y axis) and the corresponding fracture length is represented by a bar. Bars with a yellow pentagon indicate that the fracture n is censored and thus the measured length is shorter than the true length. By applying …
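The survival diagram described above reduces each fracture to a length and a censoring flag. As a hedged illustration of how such a table can be turned into an empirical survival curve, the sketch below implements the Kaplan-Meier product-limit estimator, the standard non-parametric estimator in survival analysis. Whether the manuscript uses exactly this estimator at this point is an assumption, and the six lengths and flags are invented for the example.

```python
def kaplan_meier(lengths, censored):
    """Return [(length, survival)] at each distinct uncensored length."""
    # Sorting the (length, flag) pairs puts complete observations before
    # censored ones at tied lengths, which is the usual convention.
    data = sorted(zip(lengths, censored))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        x = data[i][0]
        events = 0    # complete (uncensored) traces ending at length x
        removed = 0   # all traces leaving the risk set at length x
        while i < len(data) and data[i][0] == x:
            if not data[i][1]:
                events += 1
            removed += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((x, survival))
        at_risk -= removed
    return curve


# Hypothetical fracture table: measured length and censoring flag (True = B node).
lengths = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0]
censored = [False, False, True, False, True, False]

curve = kaplan_meier(lengths, censored)
for length, s in curve:
    print(f"P(L > {length}) = {s:.3f}")
```

Censored traces never produce a step in the curve; they only shrink the number of fractures still "at risk" at greater lengths, which is how their partial information enters the estimate.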
190 ‘simplest’
204 ‘it has its limitations’
223 ‘…that can enable the researcher to obtain an informed…’
235 ‘both figures’
395 ‘it’? Maybe ‘they are’?
409 ‘in key of time’ is an odd phrase. Check.
411 ‘useless’ seems harsh. I’m not convinced this extra remark is needed. Anyway, there may be other parameters (like segmentation) that have similar effects to outcrop size that would benefit from the approach you propose, even if outcrops were arbitrarily large.
Changed in the main text. Thank you for the corrections.
350 Given the limitations of any spacing statistic, I think it would be worthwhile mentioning here that good field practice with scanlines should be to keep track of the sequence of fracture occurrences, in other words, the spatial arrangement, as you’ve pointed out in other work (and also Marrett et al. 2018, J Struct Geol). Your analysis here seems like it would be equally apt for spatial arrangement data collection and analysis.
We added a brief mention of this in the text, as follows:
371 Finally we would like to point out that the censoring analysis is a secondary part in the analysis for spacing. It is worth noting that analysing the spatial arrangement of the fractures in the network (such as Marrett et al. 2018 and Bistacchi et al. 2020) is of fundamental importance. The presented datasets are equally apt to this type of analysis; however, we decided not to include this analysis and focus mainly on censoring to avoid increasing the length of an already dense text.
376 I agree with this way of proceeding re: defining length. Does your method work as well with lengths defined via branches; is there a reason to choose one or the other? Maybe this gets out of scope, but the way you mention it here might make a reader wonder.
Yes, we chose to measure the lengths of the entire segments instead of branches because they entail two different things. Branches offer a useful topological abstraction of the network (making it possible to classify node intersections), but they do not have a real geological or physical meaning. As we defined in section 2, 2D fracture traces are the intersection of discontinuity surfaces with a secondary surface. Branches on the other hand are defined as a segment of a fracture trace between any two nodes (either I-I, I-Y, etc.). Considering the geological origin of a trace, by using branches we would be essentially segmenting fracture planes in smaller sub-planes. This, however, is only an artifact given by the topological definition of a branch and thus the obtained branch length distribution does not carry any real physical meaning.
This discussion, as interesting as it is, may be a bit out of scope and we tried our best to summarize it in the discussion as follows:
376 Branches offer a useful topological abstraction of the network (making it possible to classify node intersections), but they do not carry a real geological or physical meaning and as such a distribution obtained by fitting branch-length will have a different meaning compared to a length distribution.
388 This is a big claim that length is always underestimated. What if you have a process that produces only short fractures (or even fractures that are shorter than your outcrop size). Hooker et al. 2013, J. Struct. Geol. describes one set (of several) that only contains very short lengths. Maybe some caveats are in order here.
Yes, however we firmly support it and we expanded the discussion to motivate it further:
423 Measured lengths of censored fractures will always be shorter than their true lengths and, by using the first simple approach, the dataset is essentially “polluted” by shorter fractures thus always decreasing the measured mean. The second simple method will also lead to an underestimation of the mean because of the size bias. However, this second method can be less impacted by censoring. For example, if a fracture population has a very small standard deviation (i.e. almost all fractures have the same length) and/or fractures are occurring in an outcrop that is much bigger than the characteristic fracture length, then removing censored values would not have a great impact on the estimation. But, even if small, the underestimation will always be present. Overestimation of the mean length would be possible in these scenarios when we do not consider censoring as independent from the length distribution (for example if only fractures shorter than a certain value are censored). However, this would violate both the core underlying hypothesis of random censoring, and standard geological experience, and thus we do not deem it possible under these imposed limits.
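The underestimation argument above can be checked on synthetic data. The sketch below is only an illustration under strong assumptions that are not from the manuscript: true lengths are exponential with mean 10, each fracture is cut off at an independent exponential censoring distance with mean 15 (a stand-in for random censoring by an outcrop boundary), and the censoring-aware estimate uses the closed-form exponential maximum likelihood estimator. All parameter values and function names are hypothetical.

```python
import random


def simulate(n=20000, true_mean=10.0, censor_mean=15.0, seed=0):
    """Compare two naive mean estimates with a censoring-aware one."""
    rng = random.Random(seed)
    observed, is_censored = [], []
    for _ in range(n):
        true_len = rng.expovariate(1.0 / true_mean)
        limit = rng.expovariate(1.0 / censor_mean)  # independent censoring distance
        observed.append(min(true_len, limit))
        is_censored.append(true_len > limit)

    complete = [x for x, c in zip(observed, is_censored) if not c]

    keep_all = sum(observed) / n                   # approach 1: censored treated as complete
    drop_censored = sum(complete) / len(complete)  # approach 2: censored values removed
    survival = sum(observed) / len(complete)       # censored exponential MLE of the mean
    return keep_all, drop_censored, survival


keep_all, drop_censored, survival = simulate()
print(f"keep all: {keep_all:.2f}  drop censored: {drop_censored:.2f}  "
      f"censoring-aware: {survival:.2f}  (true mean: 10)")
```

Under these assumptions both simple approaches have a theoretical expectation of 6 (the minimum of the two independent exponentials has rate 1/10 + 1/15 = 1/6), so both underestimate the true mean of 10, while the censoring-aware estimate should land close to 10. Changing `censor_mean` varies the censoring percentage and therefore the size of the bias.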
405 Although testing this hypothesis is something that people studying fracture lengths in the context of geomorphology ought to consider. Particularly large or open fractures can affect the size, shape, and occurrence of outcrop. See Eppes et al. 2024, Earth Surface Dynamics, doi.org/10.5194/esurf-12-35-2024.
We added this remark at:
406 … independent processes. Nonetheless in some applications (Eppes et al. 2024) this assumption may not hold, and a more in-depth study may be required to prove the independence hypothesis before proceeding.
445 This assumes that measurements are only carried out at one scale of resolution. But this need not be the case. See Ortega et al. 2006, AAPG Bulletin (for aperture sizes) and Forstner et al. for lengths.
Changed to
445 Because of this limitation, for a constant resolution scale, the modelled length distribution
451 And for some fracture systems, the smaller fractures are more prone to be mineral filled and potentially less obvious features on images. This size/visibility effect can also manifest in the picking of long fractures if the long traces are segmented.
Added as another factor contributing to censoring. Thank you for the suggestion.
In the example case studies, with such big clear outcrops, can you analyze a small area within the larger area and verify that you are accounting for the censoring correctly?
This is a tricky question that we thought about while writing the paper. It is not easy to see if censoring is correctly accounted for by just subsampling the outcrop (as large as it is). The problem is that essentially, we do not have a controlled environment. First, we are estimating only a limited suite of statistical models and thus we cannot say if the best estimated model is the true underlying model (and in fact we will never be able to tell). So even if for the first case study the lognormal may seem perfectly fitting, we cannot be certain that it is the true underlying statistical model. Moreover, the spatial distribution of the fractures in the outcrop space is also not uniform, thus with the same sub-area dimension you will have different model estimations depending on the position of the sub-area. Thus, we found it difficult to obtain a satisfactory estimate of how well censoring is accounted for and how well survival analysis works depending on the censoring percentage. We are relying on the fact that survival analysis has been used and is still being used in countless applications and show that it is working also for lengths. However, we believe that synthetic experiments can and should be carried out to explore further the effects of censoring, violations of the underlying hypothesis on the final estimation and the overall precision and reliability of survival analysis (we talk about this in 407-435). We decided to not include or explore synthetic results because it would have drastically increased the length of the MS and blurred its focus.
-
RC1: 'Comment on egusphere-2024-2818', Sarah Weihmann, 22 Nov 2024
The manuscript tackles statistical length analysis of linear features by adapting survival analysis to geological outcrops. A novel approach is investigated based on three case studies with the aim of reducing impact of the censoring bias.
The reviewer is familiar with the authors’ work from their poster presentation at EGU24 and they rightfully received an OSPP award for their amazing work and presentation. However, the reviewer finds that parts of the manuscript at this point do not reflect this standard and would benefit from improved academic writing style and precision. The reviewer thoroughly recommends getting academic writing support for parts of the manuscript as the impact of the manuscript could be much bigger than it is now.
Specific comments/ technical corrections: See list below with line references.
TITLE
- Great title!
ABSTRACT
- Line 3: Too vague. What and whose necessities shifted and why now? Who demands parametric solutions?
- Line 4: Too vague. Please rephrase “parametric solutions to compare […] all the parameters”. Avoid doubling word stems (parameter) and avoid being too general (“all the”, “several types”).
- Line 5: In line 3 present tense was used, now past tense. I recommend present tense here. Further: “These changing requirements” is fine to say once they are specified, else the link is lost (see comment line 4)
- Line 5: Word order swap: Move “in geological literature” backwards. Work with the “absence of [something]” rather than interrupting this segment.
- Line 6: When using “in particular” I recommend using “in general” beforehand to tie the parts together. Again, missing links in storyline.
- Line 7: At this point the reader does not necessarily know about right and left censoring, as only later explained in line 133. Either explain here or mention that it is a specific type of censoring that will be explained later in detail. Further: Mark word in italics?
- Line 11: I don’t understand “modified” in this context or location. Further, please avoid long sentences; split this sentence into two sentences?
- Line 12: How often or on how much data? Maybe place the term “correctly” more prominently; it gets a little lost while this is the main selling point!
- Abstract generally: I feel a lot of this amazing research is not represented well enough and gets lost in a rather unstructured paragraph. Please re-read academic writing guides or seek advice from an experienced colleague (for example the one who wrote the discussion and conclusions!). The impact of the abstract can be much bigger than it is now.
INTRODUCTION
- Line 16: composed of
- Line 19: Replace “Nowadays the increase in” with “Amplified”? Wording sounds dated. Further: Does non-Italian research in DOMs exist? It feels biased. Try being more diverse in reference selection.
- Line 21: Grammar: Plural. Content: DOMS allow the extraction of datasets? Try being more exact in wording.
- Line 27: Why an example? If so, why not advertise the example more prominently as most common implementation but in brackets? Why does this example have a reference when it is stated as a general fact?
- Line 32: “only indirect geophysical methods may provide truly 3D datasets”. Truly as adverb to provide? Or true as adjective to 3D datasets? Change word order or grammar. Further, is there a better way of saying it?
- Line 33: Absence of contrast is not always the case
- Line 34: “a rich literature” sounds odd. Try “vast research conducted” or else.
- Line 34: Please don’t start sentences with “Because of”. Try “Due to” or “Given” or else. Please follow academic writing guides.
- Line 35: “the 2D lines of intersection of 3D […] surfaces with the outcrop surface, or with topography” is too complicated. Please make sure to keep sentences short and clear.
- Line 41: “the Digital Outcrop approach”. Is this a standardised method? It has not been mentioned in the text before. It is also not explained in more detail in the following sentence as the reader might expect. Please link the sentences more carefully and guide the reader better. Please consult academic writing guides.
- Line 43: don’t capitalize “authors”
- Line 44: “consists of”
- Line 47: Why reference here and not at the end of the sentence?
- Line 50: Here: DOM approach. Whereas in Line 41: Digital Outcrop approach. Please avoid using multiple versions of one term.
- Line 53: Please avoid disclosing information in brackets. Instead, convert them to separate sentences.
- Lines 50-52: Please don’t put words in bold. They have no meaning individually here and the reader is able to read text just so. They stand out too much in light of the text here.
- Line 56: I like this argument, very convincing!
- Line 58: This is the main selling point and well phrased. It needs to go in the abstract, too.
- Line 61: Earlier (line 32) it is argued that it is due to the 2D/3D issue – here it is presented differently and separated from the other part. Please revisit.
- Line 66: time spans. Also, please do not put in bold. I find the bracket suitable here, as it gives a list of examples.
- Line 67: length measurements
- Line 69 “that is the main topic of this contribution” is a separate, full sentence. Do not add this to the previous sentence. Please check academic writing guides.
- Line 69: What is physical measuring? Please clarify.
- Line 70: How are these censored trace length datasets created?
- Line 72: Avoid “second objective” if there has been no “first objective”. Please re-read academic writing guides. Always stay consistent.
- Line 73: These hypotheses come as a surprise, as they have neither been introduced nor are they explained here. Please adjust.
- Line 73: Please stay in the same tense. Please stay consistent. Say for example “[…] is available in […] Python […]”.
- Line 74/75: I am not sure this topic really belongs in the introduction
FRACTURE SURVEYS AND TERMINOLOGY
- Line 79: Language is imprecise. Joints are never “empty”. There will always be fluids/gases if not solids. Better: lacking mineral filling, or else
- Line 80: Make two sentences or separate by semicolon. With “and” it appears to be a list.
- Line 81: its = the. Avoid referring to something that hasn’t been mentioned.
- Line 81-84: I suggest splitting this into two sentences instead of one to a) simplify the structure and b) allow for one reference per point/sentence.
- Line 85-86: Avoid making multiple points in one sentence. It exhausts the reader too quickly.
- Line 88-90: Split in two sentences please.
- Line 91: Does the sampling area reach from a thin section to a satellite image? Language is not precise enough. Please improve.
- Line 92: Please do not put new points in brackets. Either make them a new sentence, if important, or leave them completely out.
- Figure 1: Maybe mark the “hole” with red hatching? This makes it easier to find the “hole” quickly and follow the explanation.
- Line 93: Why “boundary nodes” in bold if not “nodes and branches” (line 98) in bold as well? Please stay consistent. Generally, avoid bold style.
- Line 94: Please rearrange the sentence order. Think about what point is trying to be made and move it either to the front or end of the sentence. Please check academic writing guides for this.
- Line 94: “impossible” is too emotional. Use “not feasible” or else.
- Line 95: Never use “…” in an academic manuscript.
- Line 100: If it is not a direct quote there is no need for a page reference
- Line 101 and 102: to = by
- Line 106: Delete comma
- Line 106/107: Why future tense and not present tense? In line 110 present tense is used. Please stay consistent.
- Figure 2: “Topology” seems to be spelled with an odd character. Number 8 in the right-hand picture seems dislocated.
- Overall: I disagree with the writing style. It reads almost like spoken language. Often too many thoughts are crammed into one sentence, sometimes not clearly separated. Using parentheses to introduce even more points must be avoided. Important points need to be moved to the beginning or end of the sentence or paragraph. Please make sure that academic writing advice is followed thoroughly.
STATISTICAL MODELLING OF CENSORED LENGTH DATA
- Line 114/115: The start of the sentence and paragraph is poor. Try “There is increased necessity for estimating parameters of statistical distribution in length datasets”. Before “however” there should be a full stop. First person plural should be avoided.
- Avoid putting all but one word of a sentence in bold. It looks accidental. Maybe introduce the question with a colon or just don’t make it bold.
- Line 119: “lines” = line
- Line 120: Do not write “on the other hand” if there is no “on the one hand”.
- Line 122: “almost completely meaningless and useless” is too emotional. Use “impractical” or similar.
- Line 124-126: This sentence makes no sense to me. Please rewrite.
- Figure 3 and line 136: “observations intervals” = “observation intervals”. A figure should always speak for itself. This figure does not make enough sense by itself and needs simplification. Reduce the number of colours, avoid unnecessary and unlisted abbreviations (e.g. start and end), make sure colours can be distinguished (e.g. black vs. grey; tightly-dashed vs. line), explain question marks, match thickness of lines in image and legend, standardise spacing behind “Complete” and “Interval”, standardise font size for y-axis title, y-axis is not an axis, etc.
- Line 133: Why first-person plural here? Please avoid. Use passive voice. Maybe use: “the event happens after the end of the study period and thus the length of the event is partially observed”.
- Lines 133-140: Always start with a capitalized letter after a colon.
- Lines 135-140: Follow advice from line 133.
- Lines 142-145: Avoid writing in bold.
- Line 145: time to failure or time-to-failure? Keep spelling constant. Also, this term was only mentioned in an example and inside brackets – explain this more in detail instead.
- Figure 4: standardise axis titles font sizes, match image and legend line thickness, explain “C” in figure, clarify definition for partial length, use “s” and “r” in the image if explained within the legend, A and B should be on the top left side of each image, full-stop missing in figure description.
- Line 150: I understand “complement” as a supplement or accessory. Is “inverse” meant?
- Figure 5: Standardise font size for A and B, make box clear unless figure B is red, increase all axis title and value font sizes, remove title for figure or increase font size, define axis title x.
- Line 166: “non-negative continuous variable” is not defined and is not the opposite to “valid variable” yet seems to be the “central point of this work”. The term needs more introduction if it holds such importance.
- Line 170: Why highlight a verb if otherwise only nouns are put in bold? Stay consistent by changing wording or marking.
- Figure 6: Match axis title font sizes to legend font size, Left: Match legend to image. Right: Match line thicknesses, clarify “n.” on y-axis: title number would be abbreviated “no.”, why clarify unit of length but not item (fracture length)?
- Line 174: What are “the definitions of the different types of censoring”? They are not mentioned. The bullet point list mentions considerations. Why are definitions referred to that are not mentioned in this subchapter?
- Line 176: Why are sources of fracture genesis mentioned here in this subchapter? Is this not a topic for the introduction?
- General: The manuscript needs a lot more structure at this point to avoid the impression of a random collection of thoughts.
- Line 189: If a colon is used to present the main objective, avoid listing side points as bullet points. The message gets lost, and the reader is confused.
- Line 190: simples to simple
- Line 196: Avoid putting words in bold
- Line 197-210: I skipped these lines in the review process
- Line 209: uncertainty to uncertainties
- Line 212: What are natural questions? Please clarify.
- Line 213: Changing several short simple questions to a long complicated question is not “reducing”. Starting a question with a side sentence is not recommended. Structure needs to be clearer.
- Line 216 ff: Do not use first person plural. Instead: “These types of tests…”
- Line 224: “Sensible” sounds highly subjective if not explained how this is defined.
- Line 225: Font size in 225 and 241 seem different.
- Line 226: Do not start a sentence with rather meaningless introductory words. The reader’s interest is immediately lost. Instead start with the subject “Probability Integral Transform is […]”
- Lines 228-231: Do not name the conditions (list) if the transformation statement is promised after the colon. Instead write line 231 first after the colon. The list gives too much importance to the conditions and limits the focus on the actual definition.
- Line 233: Split sentences after “(Fig. 7A)”. One thought per sentence only.
- Line 240: This sentence is in the right place of the paragraph, has profound impact and is worded very clearly.
- General: Introductory sentences of paragraphs need to start with catchy topics. Final sentences need to sum up the information or conclusion. One thought per sentence only. Keep sentences short. Start and end of paragraph need to communicate with one another.
- Line 245: Singular phenomenon (if needed, e.g. population)
- Line 248: likelihood of…
- Figure 7: The numbering (A, B, C) would normally occur on the left-hand side of the sub-figures. The text appears rather small and might be increased – however the message is clear.
- Line 256: “models deemed reasonable by the researcher” again is very subjective. Can this be made more objective?
- Line 257-290: The more technical the content, the better the manuscript reads, it appears!
- Line 295: Again, important messages need to be at the start or end of the paragraph. This is not the case here. Make sure “sensible guided choice” is placed at the end.
CASE STUDIES
- Line 297: “all the discussed theory” to “the discussed theory”
- Line 298: In the sentence before it is “case studies” so one needs to introduce singular first: change “one” to “case study”
- Figure 8: Text is very small. “Yellow pentagons” can’t be identified at this scale. Second scale in left figure should be inside the zoom window.
- Line 307: Try to mirror sentences that belong together. If starting with “Pictured on the left…” continue similarly to guide the reader. Don’t say “On the right it is represented” but try “Pictured on the right” or similar.
- Line 327: “Weibull seem” to “Weibull model seems”; “as” to “than”
- Line 328: Again, the last sentence has little value in the paragraph (“occupy the last two positions”). Maybe better: “rank lowest in comparison of the distances/models” or “are least representative”.
- Figure 9: Remove title or make bigger, reference length double units name standardized/ true.
- Figure 10: Title is too large, table is too small. Numbers and axis titles are small and hard to read. I suggest putting numbering of subplots (A-D) on the left-hand side of each image. I would get rid of all articles to make the figure description consistent. Overall title: “plots”?
- Figure 11: Numbers and words need larger font. Insert scale of subplot left inside subplot box.
- Line 330: “outcrop”
- Line 336, 337: Why past tense here when nowhere else?
- Line 341: Do not start a paragraph and/or sentence with the least important information (“In Fig. 12”).
- Line 341/342: Can this sentence be smoother?
- Line 346: Replace “afterwards” as it is too figurative. Try “at greater lengths” or similar.
- Table 2: “indicating a worst fit in respect of the exponential and Weibull distributions positioned in second and third place respectively” reads very clunkily. “a worst fit” does not exist and “places” are only handed out at races. Try and rephrase.
- Line 349: “For the other models, looking at the mean rank value helps in understanding the final ranking showing that the gamma distribution is ranked lower than the exponential and the Weibull (at the second and third place respectively)” reads very clunkily. Try and rephrase.
- Figure 12: Increase font size of title and axis title. A and B should be on left-hand side of figure and potentially smaller (compare with other figures). The text could read better. Try “PIT visualization for the proposed length models is shown for Set 1 (A) and Set 2 (B) of the Colle Salza dataset. The red line represents the reference U(0,1); the closer a model's line is to this reference, the more representative the model. Among the models, the lognormal distribution demonstrates the closest fit to the reference line in both sets, although its fit is inferior to that observed in the first case study. Across both sets, all estimated models exhibit less linearity, with notable underestimation between 0.34 m and 1.5 m in Set 1 (A) and between 0.44 m and 2.57 m in Set 2 (B).” or similar.
- Figure 13: Reduce title font, increase table font, change numbering A-D to left-side please. Increase axis description. Here, all articles are kept consistent in the description. This is better than in Fig. 10.
- Figure 14: Reduce A-C numbering (compare with other figures). Increase legend. Explain all lines in legend. Purple colour hard to see. Stay consistent in your phrasing: Delete “Shows”.
- Line 350: Case study 1 and 2 are locations whereas case study 3 is a topic. Maybe make that clearer by expanding the title, e.g. “Spacing analysis”
- Line 351: Do not start a paragraph or sentence with minor information like here. Start with “Survival analysis can be used…”
- Line 366: Start new sentence with/at “Thus”
- Line 367: Add “model” to end of first sentence. “bad” to “poor”.
- Line 368: Don’t use “place”. Try “rank” or else.
- Line 370: Do not use “on the other hand” if you haven’t used “on the one hand”. Mind structure and consistency please.
- Line 370/371: Repeated use of word “quite”
- Line 371: What does “converging all at the first place” mean?
- Figure 15: Reduce numbering letter A and B and title, compare to other figures. Increase y-axis and clarify. Please correct figure description by the advice given above.
- Figure 16: Please adjust sizes referring to similar figures and comments above.
DISCUSSION
- General: This chapter reads very differently from the rest. The style is very pleasant for the reader. Maybe the same author can help with the other parts?
- Line 374: Please don’t put information in brackets
- Line 384/385: Repeated use of word “useful”
- Line 407: Please only mention a “second point” if a “first point” was explicitly named (line 378 names a “crucial” point; maybe write “The first and crucial point”?)
- Figure 17: Increase/ add axis titles and number. Increase titles (compared to numbering (A-D)). Increase legend size.
- Figure/table: Rather unusual to have figures and tables in the discussion. Maybe reconsider?
CONCLUSIONS
- All great!
Citation: https://doi.org/10.5194/egusphere-2024-2818-RC1
RC2: 'Comment on egusphere-2024-2818', David Healy, 05 Dec 2024
General comments
Overall, the paper is well written (but see below for important discrepancies that need to be addressed), well-structured and well-illustrated. I have no major problems with the analysis. I listened to an explanation of this work by the authors at EGU in Vienna in April, and believe it is a solid piece of work, which will be useful for the community.
However, there is a repeated tendency to exaggerate, overstate or mislead throughout the text. None of these are needed – the analysis is good and stands by itself. I recommend moderate revision to remove these statements (listed below) and add the necessary clarifications regarding other approaches.
Specific comments
Line 21: ‘allows the extraction of large datasets and facilitates the measurement of properties’ – these are still just samples from the population though; they are not the ‘right’ answer.
Line 50: note that FracPaQ does not in fact use the mean length statistic from these measures, just Intensity and Density; length statistics are calculated directly from the sample lengths. In addition, FracPaQ employs MLE methods to estimate optimum length distributions, with a Goodness of Fit approach. The work of Rizzo et al., (Rizzo, R.E., Healy, D. and De Siena, L., 2017. Benefits of maximum likelihood estimators for fracture attribute analysis: Implications for permeability and up-scaling. Journal of Structural Geology, 95, pp.17-31.) is not cited here, and it needs to be. As the current text gives an incorrect impression of FracPaQ functionality, I respectfully ask for clarification on these points.
Line 118: ‘avoided at all costs’ is a bit too dramatic; delete.
Line 123: ‘completely meaningless’ – again, too strong; with no other alternative, it can be a useful estimate, albeit limited.
Line 383: again, as above length stats estimation in FracPaQ does not use circular scanlines; we use the mean and standard deviation of the sample data and, optionally, MLE. Please correct this misleading statement.
Line 405: not sure this statement is true. Many outcrops are bounded by fractures; thus, the modern day process that has defined the boundary HAS been influenced by the geological structure and fabric of the rock mass.
Line 470 – ‘proper’ – replace with ‘better’.
Line 485 – regarding DFNs (and elsewhere in the ms); there are other approaches to modelling fractured rock volumes, for example effective methods and tensorial approximations. It would be better to mention and acknowledge these alternatives. DFNs are just one approach, among many.
Figures
Figs. 5, 9, 12, 15, 17: make the axis labels (numbers and text) bigger relative to the figure; they are hard to read.
Citation: https://doi.org/10.5194/egusphere-2024-2818-RC2
Data sets
Input shapefiles Stefano Casiraghi https://github.com/gecos-lab/FracAbility/tree/main/paper_materials
Model code and software
FracAbility source-code Gabriele Benedetti https://github.com/gecos-lab/FracAbility
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
247 | 107 | 92 | 446 | 2 | 4 |