Brief communication: Threshold not probability. The conceptual difference between ID thresholds for landslide initiation and IDF curves

Marra, Francesco; Dallan, Eleonora; Borga, Marco; Greco, Roberto; Bogaard, Thom

doi:10.5194/egusphere-2025-3378

Preprints

https://doi.org/10.5194/egusphere-2025-3378

17 Jul 2025

Abstract. Intensity-duration (ID) thresholds are used to identify rainfall conditions likely to initiate landslides. They consider the average rain intensity observed over the entire length (called duration) of user-defined events. Intensity-duration-frequency (IDF) curves assign a probability to the intensity of precipitation observed over fixed-length temporal windows (also called durations). As the term duration refers to different concepts, ID thresholds and IDF curves cannot be compared directly, and should better not be plotted in one figure, and IDF curves should not be used to quantify the exceedance probability of ID thresholds.

Received: 14 Jul 2025 – Discussion started: 17 Jul 2025

Competing interests: At least one of the (co-)authors is a member of the editorial board of Natural Hazards and Earth System Sciences.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Journal article(s) based on this preprint

22 Dec 2025

Brief communication: Threshold and probability. The conceptual difference between ID thresholds for landslide initiation and IDF curves

Francesco Marra, Eleonora Dallan, Marco Borga, Roberto Greco, and Thom Bogaard

Nat. Hazards Earth Syst. Sci., 25, 5055–5061, https://doi.org/10.5194/nhess-25-5055-2025, 2025

Interactive discussion

Status: closed

RC1:
'Reviewer comment on egusphere-2025-3378', Anonymous Referee #1, 23 Jul 2025

This brief communication is very… brief! I mean, in the positive sense of the word. Indeed, it is a clear, concise manuscript that is perfectly written in fluent English - something very rare for a reviewer to find. I thank the authors for that! The paper gets straight to the point: landslide-triggering intensity-duration thresholds and precipitation intensity-duration-frequency curves cannot be confounded, compared, or plotted together. Neither one can be used to quantify the return time of the other.
Frankly, having worked on rainfall analysis and landslide prediction for years, the idea of mixing/comparing ID thresholds and IDF curves is something that never came to my mind. In the few cases I have seen in the extensive literature on these topics, it has always seemed very strange, not to say a downright methodological error. So, I can say that I certainly agree with the authors of this paper, although I do not think the article addresses a relevant scientific and/or technical question. I simply think that mixing ID thresholds and IDF curves is a misconception that does not even require discussion.
The authors list the differences between ID thresholds and IDF curves, focusing on the different durations (D and W) considered by the two tools, and then analysing the differences in terms of return time referring to these durations. In my opinion, they forgot the main and most important difference. That is: since their definition from pioneering works (Nel Caine and also previous pioneers), ID thresholds have been defined considering ID pairs that are somehow - arbitrarily or not, subjectively or not - linked to the initiation or re-activation of one or more landslides. On the other hand, IDF curves are defined considering IW (using the same terminology as the authors) pairs that are not linked to landslide/debris flow occurrence, referring only to rainfall itself. Indeed, the authors write “IDF are obtained by collecting the highest rainfall intensities observed any year over the time windows of interest” (lines 45-46). Therefore, the two tools summarise or describe different variables (the ID pairs by which the thresholds are defined are different by definition from the IW pairs with pre-fixed durations of the IDF curves, and consequently have different characteristics) and different processes (landslide or debris flow initiation and rainfall severity). This is, in my opinion, the main reason why the two tools must not be compared or mixed. I wouldn't have added anything else to this discussion.
However, the authors added more to the discussion, deserving attention. I list below some other comments on this paper.
First, I don’t understand the first part of the title “Threshold not probability”. Actually, thresholds can be probabilistic. As a matter of fact, the Bayesian thresholds mentioned by the authors are probabilistic. Moreover, the frequentist thresholds also mentioned by the authors allow defining probabilistic diagrams to be used for early warning purposes. Therefore, I would remove this part of the title, which works only for deterministic, binary thresholds.
In several parts of the text, the authors write that quantifying the return period of a given intensity used to define ID thresholds using probabilities estimated from the IW space is erroneous and causes an underestimation of the severity of the triggering rainfall. I agree with the authors, totally. However, I’d suggest mentioning some works in which this erroneous approach was adopted, also because these are referred to again in the last sentence of the paper (“Some results in the literature may thus be quantitatively inexact”). Moreover, I would add that the return period of a given ID threshold should not be calculated at all. Indeed, rather than adopting dichotomous approaches (above/below threshold), using statistical and probabilistic approaches, such as the two mentioned above, allows the probabilistic characterisation of the thresholds without introducing (erroneously) the concept of return time, which is also highly questionable for a variable as difficult to measure as landslide or debris flow occurrence/triggering. In addition, as the authors certainly know, the concept of return time and how it changes in relation to non-stationarity is a topic of discussion in the scientific community.
Moving to sections 2 and 3, the differences between ID thresholds and IDF curves are listed, focusing in particular on the different ways to define the duration of the ID/IW pairs.
According to the authors’ view, the durations D are user- (or arbitrary-) defined while the durations W are not. But, actually, W are also user- (or arbitrary-) defined using running windows of x minutes or hours: 5, 10, … 45 minutes or 1, 2, … 48 hours were also defined by a user. Moreover, the authors didn’t mention that IDF curves can be defined using the partial duration series approach as well, so introducing another point of discussion.
In section 2 (lines 29-32) the authors write “rainfall records are often not available at hourly resolutions nor in close range of the landslide (Marra et al., 2016; Marra, 2019), which makes the events separation dependent also on these aspects.”. Actually, this issue affects the definitions of W too. Indeed, if only daily measurements are available in a given area, sub-daily values of W (e.g. the classical 1, 3, 6, 12, 24 hours) can’t be defined, and the IDF curves cannot be drawn for sub-daily durations.
In section 4 (lines 62-63), the authors write “In a univariate framework, the return period T* of a rainfall event can reasonably be defined as the maximum among the return periods T_W associated with all possible temporal scales”. I think that some examples should be provided to support this statement.
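For readers less familiar with the notation, the quoted definition can be written out as below. This is only a sketch of one common formalisation and not necessarily the authors' own: the symbols F_W (the cumulative distribution of annual maximum mean intensity over windows of length W) and I_W (the largest mean intensity observed over any window of length W within the event) are assumptions introduced here for illustration.

```latex
% Return period of intensity i at temporal scale W (annual-maxima convention)
T_W(i) = \frac{1}{1 - F_W(i)}

% Event return period as the maximum over the temporal scales considered
T^{*} = \max_{W} \, T_W\!\left(I_W\right)
```

Under this reading, if the event duration D is itself among the temporal scales considered, the return period assigned with W = D cannot exceed T*, simply because T* is a maximum over a set that contains it.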
Moving to section 5, I have some comments regarding the dataset used. First, it should be noted (and somewhere acknowledged by the authors) that the dataset is quite dated, having been collected over ten years ago. Second, spatial and temporal information of the debris flow records is missing. In particular, authors should specify whether the time of occurrence is known for the debris flows included in the dataset used. This is extremely important information for a dataset to be used for the definition of rainfall thresholds. Moreover, it is relevant for another issue that I write further on in my comment. Third, it is not described how the triggering precipitation events used to draw the thresholds were defined. This is also very relevant, given the comparison with IDF curves done in the paper.
Further on in section 5, the authors describe the procedure used to calculate W* (lines 88-92). It should be acknowledged that the outcomes of this procedure are not related to debris flow triggering. Indeed, the fact that they have the highest return time among all IW pairs does not mean that they triggered debris flow. It would be useful to know when these IW* pairs occurred within the whole event duration, in order to establish whether they are relevant to the triggering of debris flows or not. If the IW* pairs occurred many hours (or days) before the occurrence of the debris flows, it cannot be said that they were certainly relevant to the initiation; at least, not more important than the entire event. This is the reason why knowing the exact time of occurrence of the debris flows is essential to prove that “what is really important for triggering are the rain intensities over time scales that can be much shorter than the total length of the identified rainfall event in combination with the hydrological antecedent conditions”. In my opinion, selecting IW* pairs using the maximum return time as the only constraint is not sufficient to prove this hypothesis, and adds subjectivity in the process.
Then (lines 98-99), the authors write that “IW* pairs are associated with temporal scales W* that are always smaller than the duration D of ID pairs. In addition, by design, the corresponding intensities are systematically higher”. This is tautological and led to what is written in lines 109-111 (i.e., the underestimation of the return times of the whole events compared to the IW* pairs). Again, having a lower return time does not imply that an ID pair is less severe in terms of landslide/debris flow triggering. This is another point to be added in the conceptual difference between ID thresholds and IDF curves.
Moreover, the authors assumed that ID thresholds are always defined considering D as the whole duration of the rainfall events. This is not always true. There are several examples in the literature in which sub-events are distinguished (automatically or not) within the entire rainfall events and used to define rainfall thresholds. This can be considered a solution to the issues about durations being too long. I’d suggest mentioning it in the discussion.
Before moving to the conclusions, two comments on Figs. 2 and 3. In Fig. 2, the (a) and (b) labels are missing. Fig. 3 and its description are not very clear; a better description and more discussion are needed.
Going to the conclusions of the work, I totally agree that the calculation of return times of triggering conditions should be avoided, for several reasons including the ones described by the authors. However, the main motivation should be that it’s better to use statistical/probabilistic approaches to define rainfall thresholds rather than calculating return times of the triggering conditions. Moreover, the underestimation of the return periods should be better evaluated considering the time of occurrence of the IW pairs and landslides/debris flows.
Overall, I think that the main message of the work is clear and shareable. However, I believe that the conclusions would need results based on an accurate dataset and improved methodology. In my opinion, more temporal details on the dataset are needed, in order to allow the most important methodological improvement needed in the work: that is, find the time of occurrence of the IW* pairs and their temporal distance from the debris flow occurrences. Only in this way will the conclusions be adequately justified by the data and results.
So, my suggestion is that the work needs major revisions before being reconsidered for publication. The revised version of the paper should include an analysis of the temporal instants of the IW* pairs, so as to say with certainty that they can be considered the cause for debris-flow-triggering. This may be done using information from the proposed dataset (if any) or using other datasets. Moreover, I’d kindly suggest taking into consideration all my comments regarding theoretical and methodological aspects of the work.

Citation: https://doi.org/10.5194/egusphere-2025-3378-RC1
- AC1: 'Reply on RC1', Francesco Marra, 22 Sep 2025
  
  Please see the attached pdf.
  
  Citation: https://doi.org/10.5194/egusphere-2025-3378-AC1
RC2:
'Comment on egusphere-2025-3378', Anonymous Referee #2, 13 Aug 2025
Thank you for the opportunity to review Marra et al., “Brief communication: Threshold not probability. The conceptual difference between ID thresholds for landslide initiation and IDF curves.” (egusphere-2025-3378). In this contribution, the authors explore the conceptual difference between rainfall intensity-duration (ID) thresholds for landslide initiation, which are conventionally fit to ID pairs that consider average intensity over entire landslide-triggering rainfall events with the intention of identifying conditions under which landslides are more likely, and intensity-duration-frequency (IDF) curves, which are fit to ID pairs that consider annual maximum average intensities for windows of defined duration with the intention of estimating annual exceedance probabilities. The authors argue that, because the definition of duration is different, these two curves are not comparable and IDF curves should not be used to estimate the exceedance probability of landslide-triggering rainfall. They use an example dataset of debris flows from the eastern Italian Alps to compare the implications of using the conventional approach based on the entire event duration to define I-D thresholds and an alternative approach that selects the duration with the maximum return period during an event. They show that return periods are much higher for the window with the maximum return period during the event than for the whole event. They also show that the slope of an ID threshold that uses the alternative approach better matches the slope of the regional IDF curves.
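The two sampling approaches summarised above can be illustrated with a small numerical sketch. Everything in it is hypothetical and is not taken from the manuscript: the hourly rainfall values, the function names, and in particular the toy IDF model i(W, T) = a · T^k · W^(-n) with its parameters a, n and k, which is chosen only because it is easy to invert for T.

```python
import numpy as np

def event_id_pair(rain_mm, dt_h=1.0):
    """ID pair: mean intensity (mm/h) over the whole event duration D (h)."""
    rain = np.asarray(rain_mm, dtype=float)
    duration_h = rain.size * dt_h
    return rain.sum() / duration_h, duration_h

def max_window_intensity(rain_mm, window_h, dt_h=1.0):
    """IW pair: highest mean intensity (mm/h) over any running window of length W."""
    rain = np.asarray(rain_mm, dtype=float)
    n_steps = int(round(window_h / dt_h))
    totals = np.convolve(rain, np.ones(n_steps), mode="valid")  # running sums
    return totals.max() / window_h

def return_period(intensity, window_h, a=20.0, n=0.6, k=0.25):
    """Invert the hypothetical IDF model i(W, T) = a * T**k * W**(-n) for T."""
    return (intensity * window_h**n / a) ** (1.0 / k)

# Hypothetical 12 h event, hourly rainfall totals in mm (illustrative values only)
rain = [1, 2, 1, 0, 3, 12, 25, 8, 2, 1, 0, 1]

i_d, d = event_id_pair(rain)                      # conventional ID pair (W = D)
windows = [1, 2, 3, 6, 12]                        # temporal scales W in hours
pairs = {w: max_window_intensity(rain, w) for w in windows}
periods = {w: return_period(i, w) for w, i in pairs.items()}
w_star = max(periods, key=periods.get)            # scale with the largest return period

print(f"ID pair: I = {i_d:.2f} mm/h over D = {d:.0f} h, T = {periods[12]:.2f} yr")
print(f"IW* pair: I = {pairs[w_star]:.2f} mm/h over W* = {w_star} h, "
      f"T* = {periods[w_star]:.2f} yr")
```

With these made-up numbers the whole-event average is associated with a much shorter return period than the most severe window inside the event, which is the qualitative behaviour described for the case study. The sketch says nothing about whether W* is the better predictor of triggering.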
Overall, this brief communication is well-written, thought-provoking, and has caused me to reconsider some results of my own research. It points out some important issues with ID thresholds that will be instructive for landslide researchers. In my view, the key contributions are (1) the clear explanation of how ID thresholds and IDF curves differ conceptually, (2) the insight that the return period of the average intensity over an entire event is the lowest possible return period and that much higher return periods may exist for shorter periods within an event, and (3) the recognition that if landslide triggering rainfall events are sampled with duration windows akin to the blocks used to determine IDF curves, the slope of the ID threshold matches the regional IDF curves, at least for the case study presented. I believe that points (2) and (3) could be further emphasized in the text and should be included in the abstract.
Because this piece is likely to serve as a primer on this topic for future researchers, there are some arguments that need a more nuanced explanation and it must be made clear which points are the authors’ opinions and which are supported by the evidence presented. Additional references are needed throughout. In particular:
The authors make the arguments that “IDF curves should not be used to quantify the exceedance probability of ID thresholds” (Lines 5 – 6) and “it is therefore erroneous to quantify the return period Td of a given intensity I in the ID space using probabilities estimated from the IW space of IDF curves.” (Lines 70 – 71) From my perspective, it is not wrong, if one has made the conventional choice that W=D, to look up what the exceedance probability of that ID pair is. The key issue lies in making the choice that W=D in the first place, as this choice obscures shorter periods of high intensities that may have much lower exceedance probabilities, as shown in the case study. This distinction needs to be made very clearly. The “should” in the first statement and the “erroneous” in the second are based on the opinion that it would be better to use W* to define the exceedance probability of the event than choosing W=D. While I tend to agree, this short paper does not present evidence that W* is a better predictor of triggering rainfall than W=D, so it needs to be clear that this is the authors’ opinion.

The analysis shows that an ID threshold fit to IW* pairs better matches the slope of the regional IDF curves than a conventional threshold and the authors argue that this solves “the apparent difference in the power-law scaling of ID thresholds and IDF curves discussed by Bogaard and Greco (2018).” This is an interesting result, my interpretation of which is that when time series of debris flow triggering rain are sampled with W*, the method is similar enough to using block maxima that the distribution of extreme rainfall is similarly represented. That would suggest that the difference between ID and IDF slopes can be attributed to methodological differences in how rainfall time series are sampled rather than any physical processes. If the authors agree that this is the case, I recommend making this point explicitly to avoid any further confusion. But then I have to wonder – what about filling-storing-draining?

The authors note that corresponding intensities for W* are systematically higher than W=D, which they argue “Implies that what is really important for triggering are rain intensities over time scales that can be much shorter than the total length of the identified rainfall event in combination with the hydrological antecedent conditions” (Lines 99 – 102). I do not understand how the first point implies the second. There is a logical gap here that needs to be addressed.

I have some additional suggestions that I believe could make the manuscript more instructive, particularly for readers who are less familiar with ID, IDF, or both:
In Figure 1, I suggest labeling the IDF scaling lines with return periods to make it more clear what these refer to. I also suggest adding a panel to this figure that shows a time series of a debris flow triggering event with windows that show W* and W=D and the average intensities and their return periods over each of these windows. This will help readers to better understand the difference between the ID pairs and IW* pairs.

As an outlook in the conclusions, the authors may want to consider mentioning the variety of alternative approaches to determining thresholds or estimating continuous probabilities that are better able to capture intense periods in landslide triggering time series than averaging over the entire event. For example, both Staley et al. (2017) and Patton et al. (2023) compared models trained with accumulations over different windows to select a model that best separated triggering from non-triggering events for post-fire debris flows in the western United States and shallow landslides in Alaska. The Moreno et al. (2025) study that was already cited is a nice example of how we can move away from the need to bin time series entirely.

Line by line comments:
Line 9 – suggest citing (Guzzetti et al., 2020)
Line 58 – this statement needs a reference
Line 63 – this statement needs a reference and possibly more context. Is this choice a convention in meteorology or is this an argument that the authors are making here?
Line 72 – Please add ~one sentence clarifying how this leads to false alarms.
Line 74 – this statement needs a reference
Line 81 – How did you select these 12 storms as opposed to considering all debris flow triggering storms? Please clarify.
Line 87 – Please add one sentence detailing how you defined the events (e.g. length of dry period). As you noted earlier, the derived ID pairs are sensitive to these choices.
Line 94, Figure 1 – Please add an estimate of statistical uncertainty to these thresholds.
Line 127 – surely, Moreno et al., 2025 aren’t the first to point this out. Earlier reference?
Line 129 - (Iida, 1999) also noted this
References
Guzzetti, F., Gariano, S. L., Peruccacci, S., Brunetti, M. T., Marchesini, I., Rossi, M., and Melillo, M.: Geographical landslide early warning systems, Earth-Science Reviews, 200, 102973, https://doi.org/10.1016/j.earscirev.2019.102973, 2020.
Iida, T.: A stochastic hydro-geomorphological model for shallow landsliding due to rainstorm, CATENA, 34, 293–313, https://doi.org/10.1016/S0341-8162(98)00093-9, 1999.
Moreno, M., Lombardo, L., Steger, S., De Vugt, L., Zieher, T., Crespi, A., Marra, F., Van Westen, C., and Opitz, T.: Functional Regression for Space‐Time Prediction of Precipitation‐Induced Shallow Landslides in South Tyrol, Italy, JGR Earth Surface, 130, e2024JF008219, https://doi.org/10.1029/2024JF008219, 2025.
Patton, A. I., Luna, L. V., Roering, J. J., Jacobs, A., Korup, O., and Mirus, B. B.: Landslide initiation thresholds in data-sparse regions: application to landslide early warning criteria in Sitka, Alaska, USA, Natural Hazards and Earth System Sciences, 23, 3261–3284, https://doi.org/10.5194/nhess-23-3261-2023, 2023.
Staley, D. M., Negri, J. A., Kean, J. W., Laber, J. L., Tillery, A. C., and Youberg, A. M.: Prediction of spatially explicit rainfall intensity–duration thresholds for post-fire debris-flow generation in the western United States, Geomorphology, 278, 149–162, https://doi.org/10.1016/j.geomorph.2016.10.019, 2017.
Citation: https://doi.org/10.5194/egusphere-2025-3378-RC2
- AC2: 'Reply on RC2', Francesco Marra, 22 Sep 2025
  
  Please see the attached pdf.
  
  Citation: https://doi.org/10.5194/egusphere-2025-3378-AC2

Interactive discussion

Status: closed

RC1:
'Reviewer comment on egusphere-2025-3378', Anonymous Referee #1, 23 Jul 2025

This brief communication is very… brief! I mean, in the positive sense of the word. Indeed, it is a clear, concise manuscript that is perfectly written in fluent English - something very rare for a reviewer to find. I thank the authors for that! The paper gets straight to the point: landslide-triggering intensity-duration thresholds and precipitation intensity-duration-frequency curves cannot be confounded, compared, or plotted together. Neither one can be used to quantify the return time of the other.
Frankly, having worked on rainfall analysis and landslide prediction for years, the idea of mixing/comparing ID thresholds and IDF curves is something that never came to my mind. In the few cases I have seen in the extensive literature on these topics, it has always seemed very strange, not to say a downright methodological error. So, I can say that I certainly agree with the authors of this paper, although I do not think the article addresses a relevant scientific and/or technical question. I simply think that mixing ID thresholds and IDF curves is a misconception that does not even require discussion.
The authors list the differences between ID thresholds and IDF curves, focusing on the different durations (D and W) considered by the two tools, and then analysing the differences in terms of return time referring to these durations. In my opinion, they forgot the main and most important difference. That is: since their definition from pioneering works (Nel Caine and also previous pioneers), ID thresholds have been defined considering ID pairs that are somehow - arbitrarily or not, subjectively or not - linked to the initiation or re-activation of one or more landslides. On the other hand, IDF curves are defined considering IW (using the same terminology as the authors) pairs that are not linked to landslide/debris flow occurrence, referring only to rainfall itself. Indeed, the authors write “IDF are obtained by collecting the highest rainfall intensities observed any year over the time windows of interest” (lines 45-46). Therefore, the two tools summarise or describe different variables (the ID pairs by which the thresholds are defined are different by definition from the IW pairs with pre-fixed durations of the IDF curves, having different characteristics consequently) and different processes (landslide or debris flow initiation and rainfall severity). This is, in my opinion, the main reason why the two tools must not be compared or mixed. I wouldn't have added anything else to this discussion
However, the authors added more to the discussion, deserving attention. I list below some other comments on this paper.
First, I don’t understand the first part of the title “Threshold not probability”. Actually, thresholds can be probabilistic. As a matter of fact, the Bayesian thresholds mentioned by the authors are probabilistic. Moreover, the frequentist thresholds also mentioned by the authors allow defining probabilistic diagrams to be used for early warning purposes. Therefore, I would remove this part of the title, which works only for deterministic, binary thresholds.
In several parts of the text, the authors write that quantifying the return period of a given intensity used to define ID thresholds using probabilities estimated from the IW space is erroneous and causes an underestimation of the severity of the triggering rainfall. I agree with the authors, totally. However, I’d suggest mentioning some works in which this erroneous approach was adopted, also because these are cited again in the last sentence of the paper (“Some results in the literature may thus be quantitatively inexact”). Moreover, I would add that the return period of a given ID thresholds should not be calculated at all. Indeed, rather than adopting dichotomous approaches (above/below threshold), using statistical and probabilistic approaches, as the two mentioned above, allows the probabilistic characterisation of the thresholds without introducing (erroneously) the concept of return time, which is also highly questionable for a variable not easily measurable as landslide or debris flow occurrence/triggering. In addition, as the authors certainly know, the concept of return time and how it changes in relation to non-stationarity is a topic of discussion in the scientific community.
Moving to sections 2 and 3, the differences between ID thresholds and IDF curves are listed, focusing in particular on the different ways to define the duration of the ID/IW pairs.
According to the authors’ view, the durations D are user- (or arbitrary-) defined while the durations W are not. But, actually, W are also user- (or arbitrary-) defined using running windows of x minutes or hours: 5, 10, … 45 minutes or 1, 2, … 48 hours were also defined by a user. Moreover, the authors didn’t mention that IDF curves can be defined using the partial duration series approach as well, so introducing another point of discussion.
In section 2 (lines 29-32) the authors write “rainfall records are often not available at hourly resolutions nor in close range of the landslide (Marra et al., 2016; Marra, 2019), which makes the events separation dependent also on these aspects.”. Actually, this issue affects the definitions of W too. Indeed, if only daily measurements are available in a given area, sub-daily values of W (e.g. the classical 1, 3, 6, 12, 24 hours) can’t be defined, and the IDF curves cannot be drawn for sub-daily durations.
In section 4 (lines 62-63), the authors write “In a univariate framework, the return period T* of a rainfall event can reasonably be defined as the maximum among the return periods T_W associated with all possible temporal scales”. I think that some examples should be provided to support this statement.
Moving to section 5, I have some comments regarding the dataset used. First, it should be noted (and somewhere acknowledged by the authors) that the dataset is quite dated, having been collected over ten years ago. Second, spatial and temporal information of the debris flow records is missing. In particular, authors should specify whether the time of occurrence is known for the debris flows included in the dataset used. This is extremely important information for a dataset to be used for the definition of rainfall thresholds. Moreover, it is relevant for another issue that I write further on in my comment. Third, it is not described how the triggering precipitation events used to draw the thresholds were defined. This is also very relevant, given the comparison with IDF curves done in the paper.
Further on in section 5, the authors describe the procedure used to calculate W* (lines 88-92). It should be acknowledged that the outcomes of this procedure are not related to debris flow triggering. Indeed, the fact that they have the highest return time among all IW pairs does not mean that they triggered debris flow. It would be useful to know when these IW* pairs occurred within the whole event duration, in order to establish whether they are relevant to the triggering of debris flows or not. If the IW* pairs occurred many hours (or days) before the occurrence of the debris flows, it cannot be said that they were certainly relevant to the initiation; at least, not more important than the entire event. This is the reason why knowing the exact time of occurrence of the debris flows is essential to prove that “what is really important for triggering are the rain intensities over time scales that can be much shorter than the total length of the identified rainfall event in combination with the hydrological antecedent conditions”. In my opinion, selecting IW* pairs using the maximum return time as the only constraint is not sufficient to prove this hypothesis, and adds subjectivity in the process.
Then (lines 98-99), the authors write that “IW* pairs are associated with temporal scales W* that are always smaller than the duration D of ID pairs. In addition, by design, the corresponding intensities are systematically higher”. This is tautological and led to what is written in lines 109-111 (i.e., the underestimation of the return times of the whole events compared to the IW* pairs). Again, having a lower return time does not imply that an ID pair is less severe in terms of landslide/debris flow triggering. This is another point to be added in the conceptual difference between ID thresholds and IDF curves.
Moreover, the authors assumed that ID thresholds are always defined considering D as the whole duration of the rainfall events. This is not always true. There are several examples in the literature in which sub-events are distinguished (automatically or not) within the entire rainfall events and used to define rainfall thresholds. This can be considered a solution to the issues about durations being too long. I’d suggest mentioning it in the discussion.
Before moving to the conclusions, two comments on Figs. 2 and 3. In Fig. 2, the (a) and (b) labels are missing. Fig. 3 and its description are not very clear; a better description and more discussion are needed.
Going to the conclusions of the work, I totally agree that the calculation of return times of triggering conditions should be avoided, for several reasons including the ones described by the authors. However, the main motivation should be that it’s better to use statistical/probabilistic approaches to define rainfall thresholds rather than calculating return times of the triggering conditions. Moreover, the underestimation of the return periods should be better evaluated considering the time of occurrence of the IW pairs and landslides/debris flows.
Overall, I think that the main message of the work is clear and shareable. However, I believe that the conclusions would need results based on an accurate dataset and improved methodology. In my opinion, more temporal details on the dataset are needed, in order to allow the most important methodological improvement needed in the work: that is, find the time of occurrence of the IW* pairs and their temporal distance from the debris flow occurrences. Only in this way will the conclusions be adequately justified by the data and results.
So, my suggestion is that the work needs major revisions before being reconsidered for publication. The revised version of the paper should include an analysis of the temporal instants of the IW* pairs, so as to say with certainty that they can be considered the cause for debris-flow-triggering. This may be done using information from the proposed dataset (if any) or using other datasets. Moreover, I’d kindly suggest taking into consideration all my comments regarding theoretical and methodological aspects of the work.
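The timing check requested above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it scans all window lengths within one event, ranks the windows by a severity proxy, and reports when the most severe window ended relative to a reported triggering time. The proxy `intensity * window_h ** scaling_exponent` assumes simple-scaling IDF curves and stands in for return periods read from regional IDF curves; the rainfall series, the exponent 0.6, and `trigger_step` are hypothetical values chosen only for the example.

```python
import numpy as np

def max_window_intensity(rain_mm, step_h, n_steps):
    """Largest mean intensity (mm/h) over all windows of n_steps,
    together with the index of that window's last time step."""
    sums = np.convolve(np.asarray(rain_mm, dtype=float), np.ones(n_steps), mode="valid")
    first = int(np.argmax(sums))
    return sums[first] / (n_steps * step_h), first + n_steps - 1

def most_severe_window(rain_mm, step_h, scaling_exponent):
    """Scan every window length up to the event duration and keep the most severe one.
    Assumption: under simple scaling, intensity * W**n increases with return period."""
    best = None
    for n_steps in range(1, len(rain_mm) + 1):
        intensity, last_step = max_window_intensity(rain_mm, step_h, n_steps)
        window_h = n_steps * step_h
        severity = intensity * window_h ** scaling_exponent
        if best is None or severity > best["severity"]:
            best = {"window_h": window_h, "intensity": intensity,
                    "last_step": last_step, "severity": severity}
    return best

# Hypothetical hourly rainfall depths (mm) for one event
rain_mm = [1, 1, 20, 15, 1, 1, 0, 1]
step_h = 1.0
best = most_severe_window(rain_mm, step_h, scaling_exponent=0.6)

# Hypothetical index of the time step in which the debris flow was reported
trigger_step = 3
lag_h = (trigger_step - best["last_step"]) * step_h
print(best["window_h"], best["intensity"], lag_h)
```

A large positive `lag_h` would indicate that the most severe window ended well before the reported triggering, which is the situation in which its relevance to initiation could be questioned.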

Citation: https://doi.org/10.5194/egusphere-2025-3378-RC1
- AC1: 'Reply on RC1', Francesco Marra, 22 Sep 2025
  
  Please see the attached pdf.
  
  Citation: https://doi.org/10.5194/egusphere-2025-3378-AC1
RC2:
'Comment on egusphere-2025-3378', Anonymous Referee #2, 13 Aug 2025
Thank you for the opportunity to review Marra et al., “Brief communication: Threshold not probability. The conceptual difference between ID thresholds for landslide initiation and IDF curves.” (egusphere-2025-3378). In this contribution, the authors explore the conceptual difference between rainfall intensity-duration (ID) thresholds for landslide initiation, which are conventionally fit to ID pairs that consider average intensity over entire landslide-triggering rainfall events with the intention of identifying conditions under which landslides are more likely, and intensity-duration-frequency (IDF) curves, which are fit to ID pairs that consider annual maximum average intensities for windows of defined duration with the intention of estimating annual exceedance probabilities. The authors argue that, because the definition of duration is different, these two curves are not comparable and IDF curves should not be used to estimate the exceedance probability of landslide-triggering rainfall. They use an example dataset of debris flows from the eastern Italian Alps to compare the implications of using the conventional approach based on the entire event duration to define I-D thresholds and an alternative approach that selects the duration with the maximum return period during an event. They show that return periods are much higher for the window with the maximum return period during the event than for the whole event. They also show that the slope of an ID threshold that uses the alternative approach better matches the slope of the regional IDF curves.
Overall, this brief communication is well-written, thought-provoking, and has caused me to reconsider some results of my own research. It points out some important issues with ID thresholds that will be instructive for landslide researchers. In my view, the key contributions are (1) the clear explanation of how ID thresholds and IDF curves differ conceptually, (2) the insight that the return period of the average intensity over an entire event is the lowest possible return period and that much higher return periods may exist for shorter periods within an event, and (3) the recognition that if landslide triggering rainfall events are sampled with duration windows akin to the blocks used to determine IDF curves, the slope of the ID threshold matches the regional IDF curves, at least for the case study presented. I believe that points (2) and (3) could be further emphasized in the text and should be included in the abstract.
Because this piece is likely to serve as a primer on this topic for future researchers, there are some arguments that need a more nuanced explanation and it must be made clear which points are the authors’ opinions and which are supported by the evidence presented. Additional references are needed throughout. In particular:
The authors make the arguments that “IDF curves should not be used to quantify the exceedance probability of ID thresholds” (Lines 5 – 6) and “it is therefore erroneous to quantify the return period Td of a given intensity I in the ID space using probabilities estimated from the IW space of IDF curves.” (Lines 70 – 71) From my perspective, it is not wrong, if one has made the conventional choice that W=D, to look up what the exceedance probability of that ID pair is. The key issue lies in making the choice that W=D in the first place, as this choice obscures shorter periods of high intensities that may have much lower exceedance probabilities, as shown in the case study. This distinction needs to be made very clearly. The “should” in the first statement and the “erroneous” in the second are based on the opinion that it would be better to use W* to define the exceedance probability of the event than choosing W=D. While I tend to agree, this short paper does not present evidence that W* is a better predictor of triggering rainfall than W=D, so it needs to be clear that this is the authors’ opinion.

The analysis shows that an ID threshold fit to IW* pairs better matches the slope of the regional IDF curves than a conventional threshold and the authors argue that this solves “the apparent difference in the power-law scaling of ID thresholds and IDF curves discussed by Bogaard and Greco (2018).” This is an interesting result, my interpretation of which is that when time series of debris flow triggering rain are sampled with W*, the method is similar enough to using block maxima that the distribution of extreme rainfall is similarly represented. That would suggest that the difference between ID and IDF slopes can be attributed to methodological differences in how rainfall time series are sampled rather than any physical processes. If the authors agree that this is the case, I recommend making this point explicitly to avoid any further confusion. But then I have to wonder – what about filling-storing-draining?

The authors note that corresponding intensities for W* are systematically higher than W=D, which they argue “Implies that what is really important for triggering are rain intensities over time scales that can be much shorter than the total length of the identified rainfall event in combination with the hydrological antecedent conditions” (Lines 99 – 102). I do not understand how the first point implies the second. There is a logical gap here that needs to be addressed.

I have some additional suggestions that I believe could make the manuscript more instructive, particularly for readers who are less familiar with ID, IDF, or both:
In Figure 1, I suggest labeling the IDF scaling lines with return periods to make it more clear what these refer to. I also suggest adding a panel to this figure that shows a time series of a debris flow triggering event with windows that show W* and W=D and the average intensities and their return periods over each of these windows. This will help readers to better understand the difference between the ID pairs and IW* pairs.

As an outlook in the conclusions, the authors may want to consider mentioning the variety of alternative approaches to determining thresholds or estimating continuous probabilities that are better able to capture intense periods in landslide triggering time series than averaging over the entire event. For example, both (Staley et al., 2017) and (Patton et al., 2023) compared models trained with accumulations over different windows to select a model that best separated triggering from non-triggering events for post-fire debris flows in the western United States and shallow landslides in Alaska. The (Moreno et al., 2025) study that was already cited is a nice example of how we can move away from the need to bin time series entirely.

Line by line comments:
Line 9 – suggest citing (Guzzetti et al., 2020)
Line 58 – this statement needs a reference
Line 63 – this statement needs a reference and possibly more context. Is this choice a convention in meteorology or is this an argument that the authors are making here?
Line 72 – Please add ~one sentence clarifying how this leads to false alarms.
Line 74 – this statement needs a reference
Line 81 – How did you select these 12 storms as opposed to considering all debris flow triggering storms? Please clarify.
Line 87 – Please add one sentence detailing how you defined the events (e.g. length of dry period). As you noted earlier, the derived ID pairs are sensitive to these choices.
Line 94, Figure 1 – Please add an estimate of statistical uncertainty to these thresholds.
Line 127 – surely, Moreno et al., 2025 aren’t the first to point this out. Earlier reference?
Line 129 - (Iida, 1999) also noted this
References
Guzzetti, F., Gariano, S. L., Peruccacci, S., Brunetti, M. T., Marchesini, I., Rossi, M., and Melillo, M.: Geographical landslide early warning systems, Earth-Science Reviews, 200, 102973, https://doi.org/10.1016/j.earscirev.2019.102973, 2020.
Iida, T.: A stochastic hydro-geomorphological model for shallow landsliding due to rainstorm, CATENA, 34, 293–313, https://doi.org/10.1016/S0341-8162(98)00093-9, 1999.
Moreno, M., Lombardo, L., Steger, S., De Vugt, L., Zieher, T., Crespi, A., Marra, F., Van Westen, C., and Opitz, T.: Functional Regression for Space‐Time Prediction of Precipitation‐Induced Shallow Landslides in South Tyrol, Italy, JGR Earth Surface, 130, e2024JF008219, https://doi.org/10.1029/2024JF008219, 2025.
Patton, A. I., Luna, L. V., Roering, J. J., Jacobs, A., Korup, O., and Mirus, B. B.: Landslide initiation thresholds in data-sparse regions: application to landslide early warning criteria in Sitka, Alaska, USA, Natural Hazards and Earth System Sciences, 23, 3261–3284, https://doi.org/10.5194/nhess-23-3261-2023, 2023.
Staley, D. M., Negri, J. A., Kean, J. W., Laber, J. L., Tillery, A. C., and Youberg, A. M.: Prediction of spatially explicit rainfall intensity–duration thresholds for post-fire debris-flow generation in the western United States, Geomorphology, 278, 149–162, https://doi.org/10.1016/j.geomorph.2016.10.019, 2017.
Citation: https://doi.org/10.5194/egusphere-2025-3378-RC2
- AC2: 'Reply on RC2', Francesco Marra, 22 Sep 2025
  
  Please see the attached pdf.
  
  Citation: https://doi.org/10.5194/egusphere-2025-3378-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

ED: Reconsider after major revisions (further review by editor and referees) (23 Sep 2025) by Ugur Öztürk

AR by Francesco Marra on behalf of the Authors (23 Sep 2025): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (24 Sep 2025) by Ugur Öztürk

RR by Anonymous Referee #1 (06 Oct 2025)

Suggestions for revision or reasons for rejection

I would like to thank the Authors for responding to my comments and clarifying most of my doubts. The manuscript certainly benefited from the review phase; some issues are now better clarified, and the main motivation of the work is more understandable.
I still have only one concern, regarding the way the Authors identified and compared the durations D and W. The authors state that “IW* pairs are associated with temporal scales W* that are always smaller than the duration D of ID pairs”. Now we know that the triggering instants of the debris flows in the used dataset are not known. Thus, it is clear that the ID pairs, as calculated by the authors, could have included hours that were not related to the triggering: if a debris flow occurred in the middle of one day, the ID pair was defined including around 12 hours – and maybe a few mm of rainfall – which should have been discarded. As a consequence, the calculated ID pairs can be longer (and the intensity lower) than they should be. Therefore, the main reason for the fact that IW* were shorter than ID may lie in the dataset used (with only daily temporal information) and in the method used to calculate D. This also has implications for the discussion of the underestimation of the triggering precipitation (figure 2 and related text).
This issue should be better acknowledged by the Authors (e.g., in lines 130-135), and – on the other hand – should suggest that the results of the real-world example cannot be easily generalized.

My question is: is the underestimation of the triggering precipitation due to a real difference between IW* and ID pairs, or is it due to the coarse temporal resolution of the dataset (which affects only the calculation of ID)? Perhaps the Authors could try to quantify the influence of the uncertainty in the debris flows triggering instants on the calculation of the ID pairs, before comparing them with the IW* pairs. A way to address this issue can be found in some published works. A few years ago, Peres et al. (2018) [https://doi.org/10.5194/nhess-18-633-2018] proposed a quantitative analysis of the impacts of uncertain knowledge of landslide initiation instants on the assessment of rainfall thresholds, using a synthetic dataset. The authors found that uncertainties in the landslide triggering instants may lead to underestimation of the thresholds and consequently more false positives. More recently, Mondini et al. (2025) [https://doi.org/10.1016/j.scitotenv.2025.179453] characterized the temporal uncertainty of some records included in a landslide catalog before using them to build a prediction tool for rainfall-induced landslides.

I acknowledge that this would need some additional work (perhaps too much for a brief communication?), but this issue came out only after the first round of review, during which it was clear that the triggering instants of the debris flows were not known. The best option would be to carry out these analyses to remove any doubt. If the Editor considers this to be too much work for a brief communication, I believe it is at least necessary to discuss the uncertainties related to the data in greater detail.
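The effect described above can be illustrated with a small sketch, assuming a hypothetical hourly rainfall record for one day and a hypothetical triggering hour. It compares the ID pair obtained when only the day of occurrence is known (the event is taken to run to the last wet hour) with the ID pair obtained when the event is truncated at the triggering time. The function `id_pair` and all numbers are illustrative assumptions, not the authors' event definition or data.

```python
import numpy as np

def id_pair(rain_mm, step_h, last_step):
    """Duration (h) and mean intensity (mm/h) from the first wet step
    up to and including last_step."""
    rain = np.asarray(rain_mm, dtype=float)
    first_step = int(np.flatnonzero(rain > 0)[0])
    segment = rain[first_step:last_step + 1]
    duration_h = segment.size * step_h
    return duration_h, segment.sum() / duration_h

# Hypothetical hourly rainfall depths (mm) over one day
hourly_mm = [0, 0, 0, 0, 0, 0, 2, 5, 12, 18, 9, 4,
             1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
step_h = 1.0

# Only the day is known: the event runs to the last wet hour of the day
last_wet_step = int(np.flatnonzero(np.asarray(hourly_mm) > 0)[-1])
d_daily, i_daily = id_pair(hourly_mm, step_h, last_wet_step)

# Triggering hour known (hypothetical: the hour ending at noon)
trigger_step = 11
d_trig, i_trig = id_pair(hourly_mm, step_h, trigger_step)

print(f"day only:     D = {d_daily:.0f} h, I = {i_daily:.2f} mm/h")
print(f"trigger time: D = {d_trig:.0f} h, I = {i_trig:.2f} mm/h")
```

In this synthetic case the day-only ID pair is twice as long and has a lower mean intensity than the pair truncated at the triggering hour, which is the direction of bias raised in the comment. Repeating the comparison over a range of plausible triggering hours would give a rough measure of its size.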

RR by Haruka Tsunetaka (05 Nov 2025)

Suggestions for revision or reasons for rejection

I have reviewed the manuscript entitled “Brief communication: Threshold and probability. The conceptual difference between ID thresholds for landslide initiation and IDF curves” as an additional reviewer for this round. This paper examines the conceptual differences between the ID thresholds and IDF curves in relation to debris-flow and landslide triggering. In Sections 1–4, the authors clearly and concisely describe the essential issues regarding rainfall data processing for ID thresholds and IDF curves. In Section 5, based on a well-constrained debris-flow dataset, the authors convincingly demonstrate how these processing differences can affect our understanding of rainfall conditions that trigger debris flows. The results are robust and highly meaningful, and will likely stimulate further analyses of rainfall-induced landslide and debris-flow initiation processes. Although the presented framework requires independent IDF curves, it is widely applicable across diverse environmental settings and effectively removes the dependence on local conditions and subjective interpretation. Overall, this paper represents a necessary and timely contribution to our community and could serve as a benchmark study. It is a promising and well-written manuscript that I believe will reach a wide audience. I have a few comments and suggestions that may help improve the clarity and overall flow of the paper. I hope my feedback will assist the authors in further refining this excellent work.
Sincerely,
Haruka Tsunetaka

1. Differences between debris flows and landslides
My first concern relates to the differences between debris flows and landslides in the triggering mechanisms implicitly evaluated by ID thresholds. As described in L23-24, ID thresholds for landslide triggering commonly evaluate whether a slope is activated through the “trigger” and “cause” prepared by rainfall input (Bogaard and Greco, 2018). However, some researchers argue that ID thresholds for debris-flow triggering reflect various processes, such as changes in sediment availability (e.g., Pastorello et al., 2018; Tsunetaka et al., 2021a), whether debris flows reach the monitoring station (e.g., Bel et al., 2017), and changes in sediment composition (e.g., Guo et al., 2016; Tsunetaka et al., 2021b). These differences, namely that ID pairs may reflect different initiation mechanisms for landslides and debris flows, should be considered throughout the manuscript.
In my view, these differences relate only to the interpretation of the real-world example (i.e., Section 5). Thus, my recommendation is either to delete the related sentence (L23-24) and paragraph (L78-87) or move them to Section 5, or to provide a more precise explanation of the above differences. By doing so, the explanations provided in Sections 1–4 would more clearly highlight that they describe generalized issues concerning the ID–IDF relationship, which apply universally, regardless of whether the triggering process concerns debris flows or landslides.

2. Difference in the definition of W* between Sections 4 and 5
In the previous review round, both reviewers raised several concerns regarding the analysis presented in Section 5. In my view, these concerns mainly stemmed from the difference in the definition of W* between Sections 4 (L72: true, unknown triggering interval) and 5 (L105: time interval during which the most severe intensity was observed). In the revised manuscript, the authors have addressed this issue by clearly describing how W* is defined. However, I am still concerned that this difference may cause readers to misunderstand how the authors distinguish between theoretical facts and their interpretations throughout the paper. Indeed, it appears that the authors themselves may still be somewhat uncertain about this distinction (L139–140). It might be clearer to readers if, in Section 5, all results were consistently described in terms of W corresponding to max(Tw), with the subsequent discussion and interpretation developed under the assumption that such W is approximately equivalent to W*.
Some readers may also wonder why the authors did not apply the same framework to a broader dataset that includes landslides or other regions. However, I recognize that the authors have used a well-constrained, high-quality debris-flow dataset, and that extending the same analysis to other phenomena or regions would be extremely challenging due to the inherent difficulties in identifying a sufficient number of ID pairs and determining reliable W* values. Therefore, I consider this dataset and the associated analysis to be particularly valuable. That said, these practical and methodological challenges may not be readily apparent to some readers, so providing a brief clarification in the text could further help convey the value and uniqueness of this dataset.
For landslides, it is generally impossible to predict where they will occur in advance or to identify the exact time of initiation. The recurrence interval of landslides in a given region typically spans several decades to centuries, making it difficult to obtain a sufficient number of ID pairs at the regional scale. Consequently, most ID thresholds for landslides have been derived at the national scale. Preparing a landslide dataset suitable for an analysis such as that presented in Section 5 is therefore extremely difficult.
For debris flows, regional ID thresholds are often derived from ID pairs based on observed occurrences within the same or nearby catchments. However, in many cases, “occurrence” refers to the arrival of debris flow at the observation point rather than the initiation of motion. As mentioned in Comment 1 and the references therein, this implies that the threshold inherently includes the processes of debris-flow development and runout, which depend not only on rainfall but also on sediment availability, distribution, and composition. Hence, the strict identification of W* is practically difficult.
The paragraph in Section 4 (L78-87) already mentions, at least in part, that W* is practically indeterminable. As mentioned earlier, I suggest moving that paragraph to Section 5 and expanding on it there. My recommendation is to strengthen the explanation of the practical difficulties in determining W* and in obtaining numerous ID pairs from real-world data. The authors could clarify that the analysis in Section 5 essentially deals with another metric (such as W corresponding to max(Tw)) but that, in this study area and dataset, assuming W = W* is reasonably valid, citing adequate references to support this rationale. This approach would clearly separate Sections 1–4 as describing theoretical principles and Section 5 as demonstrating an empirical application and interpretation based on real data.
I also agree with the discussion in L123-134. However, as noted in Comment 1, it is important to emphasize that the key finding, that W* ranges from 30 minutes to 6 hours, was derived specifically for debris flows. Whether landslides exhibit a similar pattern remains unknown. If the debris flows analyzed here are sourced primarily from channel-bed sediment, this relatively broad time interval may reflect temporal variations in sediment availability within the catchment (e.g., Tsunetaka et al., 2021a).

3. Scaling limitation of IDF curves
The authors, for convenience, have estimated the return periods of very short-duration rainfall (less than 15 minutes) based on existing IDF curves. I am concerned that the validated lower limit of the existing IDF scaling (Borga et al., 2005) may be around 15-minute rainfall durations. The estimation of return periods for such short-duration rainfall involves high uncertainty. In fact, in Figure 1b, there appear to be at least two unrealistic data points plotted at less than 1 hour on the x-axis and around 200 mm h⁻¹ on the y-axis. Although I understand that there is currently no practical alternative approach, a brief mention of this limitation in the main text would further improve the clarity of the manuscript.
I also believe that this concern is independently addressed by the results presented in Figure 3, which show the decorrelation time of rainfall. I was quite impressed by how closely this figure conceptually aligns with Figure 2. If space permits, adding a more detailed explanation and discussion of Figure 3 would make the manuscript even more refined and insightful.

4. Comparison of slopes regarding ID, IW*, and IDF
The discussion comparing the slopes of ID, IW*, and IDF relationships may need to be moderated, as the current analysis does not provide sufficient evidence to draw a definitive conclusion. Because the D values of the ID pairs are relatively large, the data points in this dataset appear only within the range of approximately 1 to 48 hours on the x-axis in Figure 1b. The scaling for durations shorter than 1 hour is essentially extrapolated.
Considering this, when focusing on the range between 1 and 48 hours in Figure 1b, the slopes of the ID threshold, IW* threshold, and IDF scaling appear to be nearly equivalent. Therefore, the overall difference in slope might simply reflect the data limitation that there are few ID pairs with small D values. For landslides, triggering rainfall events with D < 1 hour are extremely rare. However, for debris flows, such short-duration triggering events have been reported in various catchments (e.g., Abancó et al., 2016; Bel et al., 2017; Tsunetaka et al., 2021a).
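One way to make this comparison concrete is to fit the power-law exponent over the full duration range and again over the range actually covered by the data, as in the sketch below. It uses an ordinary least-squares fit in log-log space on synthetic points that follow an exact power law. The function `fit_power_law`, the coefficient 20, and the exponent 0.6 are assumptions for illustration only; this is not the threshold-fitting method used in the manuscript.

```python
import numpy as np

def fit_power_law(duration_h, intensity_mm_h, d_min, d_max):
    """Least-squares fit of I = a * D**(-b) in log-log space,
    using only points with d_min <= D <= d_max. Returns (a, b)."""
    d = np.asarray(duration_h, dtype=float)
    i = np.asarray(intensity_mm_h, dtype=float)
    keep = (d >= d_min) & (d <= d_max)
    slope, intercept = np.polyfit(np.log10(d[keep]), np.log10(i[keep]), 1)
    return 10.0 ** intercept, -slope

# Synthetic points on an exact power law (hypothetical a = 20, b = 0.6)
duration_h = np.array([0.25, 0.5, 1.0, 2.0, 6.0, 12.0, 24.0, 48.0])
intensity_mm_h = 20.0 * duration_h ** -0.6

a_all, b_all = fit_power_law(duration_h, intensity_mm_h, 0.25, 48.0)
a_sub, b_sub = fit_power_law(duration_h, intensity_mm_h, 1.0, 48.0)
print(f"all durations: a = {a_all:.2f}, b = {b_all:.3f}")
print(f"1 to 48 h:     a = {a_sub:.2f}, b = {b_sub:.3f}")
```

For an exact power law the two fits coincide. With real ID and IW* pairs, a clear change in the fitted exponent between the two ranges would suggest that the reported slope difference depends on the extrapolated short-duration part of the plot.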

Line by line comments
Title: Since the case study focuses specifically on debris flows, it might be helpful to include the term “debris flow” in the title to clearly indicate the study target.
L27-29: Readers who are less familiar with rainfall thresholds may wonder why the parameter E is sometimes used. A brief explanation of its meaning and rationale, supported by an appropriate reference, would help clarify this point.
L50: IDF are -> IDF curves are
L61: an user defined -> a user defined
L78-79: Please consider softening the tone of the explanation slightly to make it more balanced and accessible to a broader readership.
L89-90: It would be helpful to clarify whether these debris flows were initiated by landslides or if they mainly resulted from bulking and entrainment of unconsolidated channel-bed material. A brief explanation would improve the reader’s understanding.
L98: It is not entirely clear whether these represent the triggering locations or the observation locations. Please clarify how the triggering location was defined in this dataset.
Figure 1: To further aid interpretation, you might consider adding summary scatter plots for all 133 events alongside Figure 1a: specifically, W∗ vs. D and I∗ vs. I. Including such plots could make the relationships easier to grasp, particularly for readers who are less familiar with rainfall threshold analyses.

References
Abancó, C., Hürlimann, M., Moya, J., & Berenguer, M. (2016). Critical rainfall conditions for the initiation of torrential flows. Results from the Rebaixader catchment (Central Pyrenees). Journal of hydrology, 541, 218-229.
Bel, C., Liébault, F., Navratil, O., Eckert, N., Bellot, H., Fontaine, F., & Laigle, D. (2017). Rainfall control of debris-flow triggering in the Réal Torrent, Southern French Prealps. Geomorphology, 291, 17-32.
Guo, X., Cui, P., Li, Y., Fan, J., Yan, Y., & Ge, Y. (2016). Temporal differentiation of rainfall thresholds for debris flows in Wenchuan earthquake-affected areas. Environmental Earth Sciences, 75(2), 109.
Pastorello, R., Hürlimann, M., & D’Agostino, V. (2018). Correlation between the rainfall, sediment recharge, and triggering of torrential flows in the Rebaixader catchment (Pyrenees, Spain). Landslides, 15(10), 1921-1934.
Tsunetaka, H., Hotta, N., Imaizumi, F., Hayakawa, Y. S., & Masui, T. (2021a). Variation in rainfall patterns triggering debris flow in the initiation zone of the Ichino-sawa torrent, Ohya landslide, Japan. Geomorphology, 375, 107529.
Tsunetaka, H., Shinohara, Y., Hotta, N., Gomez, C., & Sakai, Y. (2021b). Multi‐decadal changes in the relationships between rainfall characteristics and debris‐flow occurrences in response to gully evolution after the 1990–1995 Mount Unzen eruptions. Earth Surface Processes and Landforms, 46(11), 2141-2162.

ED: Reconsider after major revisions (further review by editor and referees) (05 Nov 2025) by Ugur Öztürk

AR by Francesco Marra on behalf of the Authors (15 Dec 2025): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (17 Dec 2025) by Ugur Öztürk

AR by Francesco Marra on behalf of the Authors (17 Dec 2025)

Journal article(s) based on this preprint

22 Dec 2025

Brief communication: Threshold and probability. The conceptual difference between ID thresholds for landslide initiation and IDF curves

Francesco Marra, Eleonora Dallan, Marco Borga, Roberto Greco, and Thom Bogaard

Nat. Hazards Earth Syst. Sci., 25, 5055–5061, https://doi.org/10.5194/nhess-25-5055-2025, 2025

Data sets

Thresholds, not probability Francesco Marra https://doi.org/10.5281/zenodo.15845770

Viewed

Total article views: 2,232 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
2,084	107	41	2,232	43	54

Views and downloads (calculated since 17 Jul 2025)

Month	HTML	PDF	XML	Total
Jul 2025	95	32	12	139
Aug 2025	293	17	5	315
Sep 2025	1,537	12	10	1,559
Oct 2025	67	7	3	77
Nov 2025	46	9	4	59
Dec 2025	46	30	7	83

Viewed (geographical distribution)

Total article views: 2,191 (including HTML, PDF, and XML) Thereof 2,191 with geography defined and 0 with unknown origin.

Latest update: 22 Dec 2025
