the Creative Commons Attribution 4.0 License.
Interpreting the cause of bound earthquakes at underground injection experiments
Abstract. Constraining the maximum possible magnitude (MMAX) of an induced earthquake sequence is a challenging process with important implications for managing risks. CAP-tests are a suite of statistical tests that can infer, quantify, and select best-fitting MMAX models via an earthquake catalogue’s magnitudes. We use CAP-tests to discern between bound/unbound earthquake sequences at underground laboratories, where high-resolution and near-field geophysical observations are abundant. There, we find clear evidence for bound sequences, where magnitude growth was restricted during stimulation. Furthermore, bound sequences tend to be associated with stimulations that occurred within intact rock. On the other hand, unbound sequences tended to be associated with stimulations where hydraulic fractures interacted with relatively large pre-existing faults/fractures. We further examine bound sequences by fitting magnitude growth to a generalized family of MMAX functions. This process appears to be able to aggregate bound sequences into categories consistent with theoretical considerations (e.g., tectonic, tensile-crack, or shear-crack). These results provide a basis for validating and interpreting bound sequences in controlled experiments, which is important for extrapolating to larger-scale observations. Overall, CAP-tests appear to be a promising avenue for constraining MMAX from earthquake catalogue data.
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Interactive discussion
Status: closed
- RC1: 'Comment on egusphere-2025-5806', Anonymous Referee #1, 12 Mar 2026
  - AC1: 'Reply on RC1', Ryan Schultz, 24 Mar 2026
- RC2: 'Comment on egusphere-2025-5806', Peter Niemz, 19 Mar 2026
In this manuscript, the authors extend their study on the statistical characterization of induced seismic sequences with respect to the expected maximum magnitude from field-scale observations (Schultz et al., 2025) to underground laboratory experiments. A reliable estimation of the maximum expected magnitude during EGS and other subsurface operations, which goes beyond a static traffic light system, is paramount for operations to be conducted safely.
Mine-scale experiments have the advantage that their environment is much more controlled and findings from field-scale (Schultz et al, 2025) can be tested with higher confidence due to improved monitoring setups and other comprehensive datasets. That said, the extension of their research goes beyond the mere application of the existing method to new test cases.
The authors concisely describe their statistical tests and the workflow, allowing it to be applied to other sequences by other researchers. I have to assume that the GitHub link with data and code will be available upon publication of the article, as was the case for the previous paper introducing the CAP test.
The work is clearly structured from simple (HF-dominated) to complex (mixed-mode) cases.
While the authors clearly draw the difference between bound and unbound in regard to limited fracture length and the reactivation of preexisting faults, the authors also discuss when this simple binary reasoning might be wrong. Their generalization of the V^n model and the Vn-EW test provides a novel approach to the study of underlying processes in seismic sequences beyond the analysis of induced cases alone.
General minor remarks/questions/suggestions
- I appreciate that the authors state that they are careful in estimating Mc and set it in a rather conservative way, even when it reduces the number of events to the point of hindering the unambiguous classification of bound sequences. That said, I suggest adding Mc in Table 1-3 for completeness and for the reproducibility of the subset of events used in each analysis.
- 246f: “Testing on both synthetic and real datasets suggests that the MLE-test is sensitive to quantifying MMAX within a hundredth of a magnitude unit when MLRG-MMAX discrepancies are -0.5 M or better.” Are magnitude uncertainties considered in these tests? For synthetic datasets, the precision might be within a hundredth of a magnitude, but for real cases, inherent magnitude uncertainties will not allow for such precision.
- Despite being discussed in the methods section, no sequence is assigned to the ‘unresolved’ class in the tables. Did all sequences pass the resolution checks?
- The generalization of Vn models and the integration into the EW analysis is a very interesting approach to further characterize bound sequences. However, I see some discrepancies between the Mmax model choice in the result tables and the Vn-EW analysis that might be worth discussing.
- Aspo HF3 and SURF N164 fall into the unknown (X-like) category in the Vn analysis. Obviously, these two cases were in a different model class before, since X-like is not part of the models in the standard EW test. While Aspo HF3 ended up in the next closest category, SURF N164 was found to be in between McGarr and Tectonic (the models indicated in the result table). Both make perfect sense.
- However, PNRC 2w (Schultz et al 2025) is assigned to McGarr or Galis in the result table and should therefore fall into one of those categories in the Vn-EW analysis, yet it peaked in tectonic. Is this a different subset of data? How do the authors explain the difference?
I am looking forward to seeing Vn-EW tests for other seismic sequences in future research, as these may shed light onto the processes driving them.
Some remarks on HF6 of the Aspo dataset:
The damping coefficient used in the magnitude calculation was estimated as a uniform value across the entire rock volume. As correctly stated by the authors, most events were in fact induced during stages HF1 and HF2. The larger number of events in this domain could have biased the uniform estimation of the damping coefficient. As a consequence, the deviating rock type around HF6 might not be well represented by the uniform coefficient, which may lead to biased magnitudes that influence the statistical analysis.
Unlike all other stages in which the volumes of substages were ramped up over time or kept stable, the first substage of injection in HF6 was the substage with the largest injected volume. The volume was reduced afterwards, resulting in an early occurrence of Mmax even during the injection. For all other stages the largest magnitude was induced after the shut-in.
Additionally, the last three substages of HF6 were pumped on another day. This might have allowed some relaxation and flow back over hours after the largest substage, HF6-1, altering the response of the reservoir. Unfortunately, the flowback for HF4-HF6 was not rigorously monitored.
Additional minor points:
84f: The abbreviations of the statistical tests should be introduced, here, when first mentioned.
107: “This is significant”
330: “catalogues between 10^2-10^3 events (above Mc)”
374: “HF1, HF2, HF4 & HF6”
727f: Repeating that the “analogous stages” were not aiming for new fracture creation would help to emphasize the difference between the experiments.
798: What are sd and k in the time-varying fracture radii equations for shear and tensile cracks?
822: “stages 1-3”
824: “for cluster 1 (stages 1-2) and cluster 2 (stages 3–6) as defined in Schultz et al, 2025”
880: “or , the extent of asperities”
866ff: I suggest adding the categories in front of the equations for clarity
V^n: …
Tensile: …
Shear: …
868ff: c, sd and k not formally introduced, see also line 798
881: “between seismic asperities”
Figure 1: I suggest adding the resulting classification at the end of the arrows (bound, unbound, not resolved).
Figure 2: Increasing the size of the unbound/bound labels on top of the figure would help to quickly spot which is which.
Figure 4:
- The labels got mixed up. HF3 instead of HF1 for the green cluster; Last label in Fig. 4a should be HF6. There was no microseismic activity during HF5.
- The injection rate is mostly hidden behind the seismic events, see also Fig. 7 and 10. Since the stages are analyzed individually in this study, the time period between the stages is not important, so each stage could get its own subplot to improve the visibility.
- The map is distorted. Unequal scale of easting and northing (possibly a layout choice?), see Fig. 7
Figure 7: The discussion regarding GTS HS4 and GTS HF2 is based on geometric considerations of the microseismicity. I suggest including subfigures that zoom into these two stages due to their complexity.
Figure 10: Flow rate axis label missing
Figure 11: Labels a and b should be added in the caption
Citation: https://doi.org/10.5194/egusphere-2025-5806-RC2
- AC2: 'Reply on RC2', Ryan Schultz, 24 Mar 2026
RC1: 'Comment on egusphere-2025-5806', Anonymous Referee #1, 12 Mar 2026
Dear Authors,
The manuscript entitled “Interpreting the cause of bound earthquakes at underground injection experiments” by Ryan Schultz, Linus Villiger, Valentin Gischig, and Stefan Wiemer is well designed, with clearly presented scope and reasoning. It deals with the important topic of determining the maximum magnitude for injection experiments, which offers important insight into the physics of earthquakes induced by fluid injection and, further, any other seismicity related to fluid-rock interactions. The methods and the data used for the estimations are clearly described. The reasoning is well documented with the earlier works of various authors. I have one major critical comment related to the methodology and some minor comments related to the literature review and technical points.
The major issue, which may need some explanation, is the sensitivity of the CAP test to magnitude range. There is a span of 2-3 magnitude units between the smallest and the largest events, and even less once completeness is considered. I would like to see a discussion of how the magnitude range affects the efficiency of the CAP tests in the cases used here. The authors only discuss the role of the number of events in datasets suitable for the tests.
In the Introduction, the authors refer to different maximum magnitude estimation methods, but do not mention any Bayesian methods (Kijko, 2025) or methods dealing with small or incomplete catalogues (e.g., Kijko et al., 2021; Vermeulen and Kijko, 2017). I think that taking these works into account may be informative for readers interested in seismic catalogues with a narrow magnitude range and/or a small number of events.
Minor technical remarks:
Line 78 and below: Acronyms such as CAP, KS, MLE and EW should be explained as they are introduced.
Line 868: All the symbols in the equation should be explained here again. Some (but not all) are introduced earlier, and the equation may be hard for the reader to follow.
References:
Kijko, A., Vermeulen, P. J., and Smit, A. (2021) Estimation Techniques for Seismic Recurrence Parameters for Incomplete Catalogues. Surveys in Geophysics, 43(2), pp. 597-617. DOI: 10.1007/s10712-021-09672-2
Kijko, A. (2025) Bayesian Assessment of the Maximum Possible Earthquake Magnitude mmax. Journal of the Geological Society of India, 101(6), pp. 764-769. DOI: 10.17491/jgsi/2025/174157
Vermeulen, P., and Kijko, A. (2017) More statistical tools for maximum possible earthquake magnitude estimation. Acta Geophysica, 65(4), pp. 579-587. DOI: 10.1007/s11600-017-0048-3
Citation: https://doi.org/10.5194/egusphere-2025-5806-RC1
- AC1: 'Reply on RC1', Ryan Schultz, 24 Mar 2026
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 303 | 257 | 25 | 585 | 48 | 18 | 16 |
Linus Villiger
Valentin Gischig
Stefan Wiemer