the Creative Commons Attribution 4.0 License.
Fatbox: the fault analysis toolbox
Abstract. Understanding complex fault networks is essential for reconstructing their geological history, quantifying deformation in tectonically active regions, and assessing geohazards and resource potentials. Structure and evolution of fault networks are investigated using a range of methods, including numerical and analogue modelling, as well as the analysis of topographic data derived from satellite imagery. However, due to the high density and complexity of fault systems in many study areas or models, automated analysis remains a significant challenge, and fault interpretation is often performed manually. To address this limitation, we present Fatbox, the fault analysis toolbox, an open-source Python library that integrates semi-automated fault extraction with automated geometric and kinematic analysis of fault networks. The toolbox capabilities are demonstrated through three case studies on normal fault systems: (1) fault extraction and geometric characterization from GLO-30 topographic data in the Magadi-Natron Basin; (2) spatio-temporal tracking of fault development in vertical cross-sections of a forward numerical rift model; and (3) surface fault mapping and geometric evolution of an analogue rift model. By representing fault networks as graphs, Fatbox captures the complexity and variability inherent to fault systems. In time-dependent models, the toolbox enables temporal tracking of faults, providing detailed insights into their geometric evolution and facilitating high-resolution measurements of fault kinematics. Fatbox offers a versatile and scalable framework that enhances the efficiency, reproducibility, and precision of fault system analysis – opening new avenues for tectonic research.
Status: open (until 10 Oct 2025)
RC1: 'Comment on egusphere-2025-3989', Anthony Jourdon, 01 Sep 2025
I reviewed the paper entitled “Fatbox: the fault analysis toolbox” by Gayrin et al. The paper presents Python software that uses image analysis methods to automatically extract faults from DEMs, numerical models, and analogue models in two dimensions. In addition to reconstructing a fault network, the software can perform a time evolution analysis to characterize fault evolution in time and space. Although limited to 2D applications, as is well discussed by the authors in the limitations section, the method looks sound and the presented applications demonstrate that it is promising.
Here are some comments and remarks:
The paper presents the results of applying a sequence of methods and functions to process the data and obtain a fault network. Although the results look good, I think it would be helpful to provide more details about these functionalities and to illustrate the effect of each function applied at each step, from the raw data to the final fault network. While some steps are shown for the DEM fault extraction, we do not see the equivalent steps for the numerical model, where a fault network is reconstructed from the raw plastic strain output. The Jupyter notebooks illustrate this more fully, but I think it would be good to have these steps in the paper.
You made the choice to structure the paper by application, which leads to a lot of redundancy, as some methods are shared across the applications: e.g., the DEM and analogue model share the same automatic topographic analysis, while the numerical and analogue models share the same approach of using strain to characterize fault evolution over time and space.
I suggest that instead of structuring section 3 by application, you could structure it by approach or functionality and present how each functionality is used in each application.
At the beginning of section 3 you could present the applications, and then describe your functionalities, how they work, what they do, when to use them, etc., with some nice illustrations.

There is a problem with the references across the paper. I don’t know how this occurred, but there are a lot of citations in the text that are not in the reference list. You should double-check this before re-submitting the article.
Finally, because the paper is strongly oriented around the Python package, which is referred to several times and whose content is described, I had a closer look at it. I have some suggestions that I think could benefit its spread and use, but I do not think the authors need to address them for the paper to be published.
Line by line comments:
L107: “Each component of the network consists of nodes (points defined by location x- and y-coordinates) and edges (connections between nodes)”
This sentence seems an appropriate place to introduce that this structure (nodes + edges) is called a graph, particularly because it is the title of the subsection and because you use the term graph at line 112.

L137: I am sorry, but I do not understand what “edit the network” means in this context.
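To make the suggestion concrete, here is a minimal sketch of a fault trace represented as a graph of nodes and edges. This is plain Python for illustration only; the names and structure are mine, not Fatbox's actual API:

```python
# Illustrative sketch (not Fatbox's API): one fault component as a graph.
# Nodes are (x, y) points along the fault trace; edges connect neighbours.
nodes = {0: (0.0, 0.0), 1: (1.0, 0.5), 2: (2.0, 1.2), 3: (2.5, 3.0)}
edges = [(0, 1), (1, 2), (2, 3)]  # a polyline of one fault

def fault_length(nodes, edges):
    """Sum of Euclidean edge lengths along the fault trace."""
    total = 0.0
    for a, b in edges:
        (xa, ya), (xb, yb) = nodes[a], nodes[b]
        total += ((xb - xa) ** 2 + (yb - ya) ** 2) ** 0.5
    return total

print(round(fault_length(nodes, edges), 3))  # → 4.207
```

Geometric measurements such as fault length or orientation then become simple traversals of the graph, which is presumably why the representation was chosen.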
L185-187: What is actually detected by the Canny algorithm? In the Jupyter notebook we can see that the method is conveniently implemented in the scikit library, but the actual quantity and approach used to detect “edges”, and what an edge means in the physical world, are not clear to me. It is a very important step of the method presented here, and even if it is not implemented by the authors, it would be nice to have an overview of the approach.
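For context, Canny (1986) operates on the image intensity gradient: the image is smoothed with a Gaussian filter, gradient magnitude and orientation are computed, ridges are thinned by non-maximum suppression, and connected strong edges are kept by hysteresis thresholding. Applied to a DEM, the "intensity" is elevation, so detected edges are loci of steep, laterally continuous slope, such as fault scarps. A simplified sketch of the gradient criterion only (not the full algorithm, and not the scikit-image implementation):

```python
# Simplified illustration of Canny's core criterion: strong intensity
# (here: elevation) gradients. The full algorithm adds Gaussian smoothing,
# non-maximum suppression, and hysteresis thresholding.
def gradient_magnitude(z):
    """Central-difference gradient magnitude on a 2D grid (list of lists)."""
    ny, nx = len(z), len(z[0])
    g = [[0.0] * nx for _ in range(ny)]
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            gx = (z[i][j + 1] - z[i][j - 1]) / 2.0
            gy = (z[i + 1][j] - z[i - 1][j]) / 2.0
            g[i][j] = (gx * gx + gy * gy) ** 0.5
    return g

# A synthetic scarp: elevation jumps from 0 to 10 between columns 2 and 3.
dem = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
g = gradient_magnitude(dem)
print(g[2])  # gradient peaks at the scarp (columns 2 and 3)
```

An overview along these lines in the manuscript would clarify what the detected "edges" mean physically.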
L188-189: “non-fault-related components exhibit distinct geometric signatures; they are subsequently filtered out in using dedicated functions based on curvature and length”
Interesting! However, there are no real details in the manuscript on how this important step is taken. Looking into the Jupyter notebook tutorial we can see that the scikit method called “remove_small_objects” is used, with a parameter “min_size=30” pixels. This value sounds arbitrary. Is there a generic way to choose it? If the surface of the covered ground or the resolution of the DEM was different, would that value change?

L215: “a virtual cross section is drawn perpendicular to the fault axis until reaching a predefined maximum distance”
How is that predefined maximum distance chosen? In the notebook it is set to d=12 (no units are indicated, so I assume it is also in pixels?); the same question as in the previous comment applies: will that value change if the surface and/or resolution of the DEM changes? How can you be confident that using a constant value for all faults will not lead to crossing another fault, depending on where you are performing the analysis?

L271: “In the following, we use a relative threshold optimized to highlight fault structures.”
Could you please provide a bit more detail on how the optimized relative threshold is chosen?

L277: “In some cases, interpolation to a regular grid is necessary, particularly when the model dataset has variable resolution”
Why is that step necessary? If the object used in the first place is an image, the pixel grid is already regular even if the model’s mesh is not, isn’t it? Could you please provide an example for which that step is required and illustrate why?

L280: “Our library provides functions to address these issues”
This is interesting, but unfortunately we miss details here. What are these functions? Could you be more specific about which function addresses which issue, and how? It would help readers to know which functions to use depending on their case.

L282-284: “To optimize the network, a filtering function removes selected internal nodes (based on user-defined parameters), reducing density while preserving realistic geometric orientations.”
Could you please elaborate on that function? What does it actually do?

L290-291: “the program compares each fault at time 1 with each fault at time 2 and vice versa”
If I understand correctly, you mean that you compare two consecutive timesteps, right? If this is the case, I suggest using “time n” and “time n+1” instead of “1” and “2” to better emphasize that the comparison is between two consecutive steps, whatever their numbering.

L292-293: “we calculate the minimum distance between each node of a fault and each node of the neighbours and then average these distances”
I do not understand what “each node of the neighbours” refers to. “Neighbour” is not defined and is thus ambiguous. Could you please be more precise about what distance is computed, and between which elements?

L299-300: “The article (Neuharth et al., 2022) provide a detailed study of fault system evolution in the 2D numerical models discussed earlier”
I think it is a bit unfortunate that the illustration of the method presented in this paper is actually not in this paper but in another. You could perhaps showcase some of the possibilities of the method in a figure, to provide readers with examples of what they could expect from your software, especially what you describe in lines 302-303.

L339-350: Section 3.3.2 roughly contains the same information as section 3.1.2. I suggest removing this section.

L355: “strain threshold to distinguish active faults”
Is strain correctly employed here? It sounds strange to use strain to identify active faults. Strain rate would be a better quantity, because strain also records inactive faults.

L357: “data are binarized”
I suggest introducing the term binarized in section 3.2.2, where the procedure is first described.

L356-360: This paragraph is also redundant with a previous section. I suggest removing it.
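When the term is introduced, a short definition would suffice: binarization here simply means thresholding the field into fault/non-fault pixels. A sketch with an illustrative relative threshold (the value is mine, not Fatbox's):

```python
# Sketch of binarizing a strain field: values at or above a relative
# threshold become 1 (fault candidate), the rest 0. The threshold value
# is illustrative only, not the one used by Fatbox.
def binarize(field, rel_threshold=0.5):
    """Threshold relative to the field's maximum value."""
    vmax = max(max(row) for row in field)
    cut = rel_threshold * vmax
    return [[1 if v >= cut else 0 for v in row] for v in field] if False else \
           [[1 if v >= cut else 0 for v in row] for row in field]

strain = [[0.0, 0.1, 0.8],
          [0.1, 0.9, 0.2],
          [0.7, 0.1, 0.0]]
print(binarize(strain))  # → [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```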
L362-365: Is this procedure different from the one described at lines 195-197?
If not, I suggest grouping them; if yes, I suggest providing more details at lines 195-197 and then explaining here why the procedure is different.

L387-388: “The Canny edge detection is based on this topography gradient criterion.”
While the fact that the Canny edge detection uses the topography gradient can be recalled in the discussion, I think it would be better to introduce much earlier, at lines 185-187, what the Canny algorithm actually does and what type of quantity it uses. For example, the use of the topography gradient criterion is not mentioned before the discussion.

Section 4.1 Fatbox options for defining a fault:
I feel there could be more information about how to choose the parameters for each identification criterion. While the DEM analysis is slightly more detailed, the strain or strain-rate approach is summarized in a single sentence. The last paragraph is good: it states what types of choices can be made. Aren’t there more points like this that you could elaborate upon, to provide more insight into your toolbox and what can be done with it?

L420: “most steps can be parallelized”
What do you mean by “parallelized” in this context? Does it mean that your toolbox should run in parallel, i.e., with multiple MPI ranks/threads, or that the steps of the procedure are independent of each other and thus can each be performed independently?
I ask because, as I understand it, there are steps or procedures that do not seem independent, such as the time evolution and the correlation between time steps, since each time step needs to be treated sequentially.
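To illustrate the distinction I am drawing, here is a sketch of the assumed dependency structure (function names are placeholders, not Fatbox's API): per-timestep fault extraction is embarrassingly parallel, while correlation between consecutive timesteps is inherently sequential:

```python
# Sketch of the assumed dependency structure (placeholder functions,
# not Fatbox's API): extraction per timestep is independent and can be
# parallelized; correlating consecutive timesteps must follow in order.
from concurrent.futures import ThreadPoolExecutor

def extract_network(step):
    """Placeholder: extract a fault network for one timestep."""
    return {"step": step, "faults": [f"fault_{step}"]}

def correlate(net_a, net_b):
    """Placeholder: match faults between two consecutive timesteps."""
    return (net_a["step"], net_b["step"])

steps = [0, 1, 2, 3]
with ThreadPoolExecutor() as pool:                  # independent -> parallel
    networks = list(pool.map(extract_network, steps))
pairs = [correlate(networks[n], networks[n + 1])    # dependent -> sequential
         for n in range(len(networks) - 1)]
print(pairs)  # → [(0, 1), (1, 2), (2, 3)]
```

Clarifying in the text which steps fall into which category would answer this question directly.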
Section 4.3 limitations:
What about the limitations concerning the time correlation used to identify faults over time: does it always succeed, or are there cases for which it fails?

Although the three proposed applications demonstrate several contexts in which the toolbox can be applied, they are all about rifts and extensional systems. What about convergent and strike-slip systems? Do they represent limitations? If yes, which ones?
Minor comments:
L82: Mattéo et al. (2021). The parenthesis should be in front of the M as the citation is not embedded in the sentence => (Mattéo et al. 2021).
L96: parenthesis missing before the 2)
L249: there is a parenthesis that should be removed after the “e.g.”
References:
L85: T et al., 2025. It seems that there is a problem with this citation both in text and in the reference section (L615).
L260: References are required for the wet quartzite, wet anorthite and dry olivine flow laws (not Neuharth et al., 2022; the actual papers that published the parameters used in the model).

L261: “Beneath the lithosphere lies a weak asthenospheric layer composed of wet olivine (Neuharth et al., 2022)”
Neuharth et al., 2022 is not the publication related to the wet olivine flow law. Here, Hirth & Kohlstedt, 2003 should be cited, as this is the paper describing the wet olivine flow law used and cited in Neuharth et al., 2022.

Below are some references missing from the reference list (the ones I noted; this may not be exhaustive):
Panza et al., 2024
Purinton and Bookhagen, 2021
Baker and Wohlenberg, 1971
Canny, 1986
Guo and Hall, 1992
Shmela et al., 2021
Gassmöller et al., 2018
Glerum et al., 2018
Braun and Willett, 2013
Yuan et al., 2019
Saha et al., 2016
Strak and Schellart, 2016
Strak et al., 2011
Willingshofer and Sokoutis, 2009
Philippon et al., 2015
Schlagenhauf et al., 2008
Lathrop et al., 2022
Henza et al., 2010
Jourdon et al., 2025

Software/code-related remarks:
The following comments do not require particular attention from the authors for the article to be published, but they could be taken into account in future development to enhance code visibility, availability, reusability, and collaboration.

While cloning the repository, I noticed that it is 174 MB large, which I agree is not huge, but the code consists of only 6 Python files and a few Jupyter notebooks. This size likely comes from the data files used in the notebooks as tutorials/demonstrations. In general, storing large data files in a code repository is not recommended, as it affects users whenever they download, install, or update the code. A better approach would be to store those data in a dedicated long-term storage service and link to it from the repository.
In the README it is mentioned that installation should be done using conda. I understand that this offers a simple way to install your package, but it forces users to use conda. Fortunately, your package can actually be installed without conda, and except for earthpy (which can also be installed without conda), there are not many benefits to strictly restricting the installation to conda. I am not saying you should get rid of it; rather, I suggest providing more options to users. You could consider adding a setup.py to enable installation with pip, for instance.

Citation: https://doi.org/10.5194/egusphere-2025-3989-RC1
Model code and software
Fatbox, the fault analysis toolbox Pauline Gayrin et al. https://doi.org/10.5281/zenodo.15716079
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 85 | 5 | 2 | 92 | 10 | 14 |