the Creative Commons Attribution 4.0 License.
Brief communication: Threshold not probability. The conceptual difference between ID thresholds for landslide initiation and IDF curves
Abstract. Intensity-duration (ID) thresholds are used to identify rainfall conditions likely to initiate landslides. They consider the average rain intensity observed over the entire length (called duration) of user-defined events. Intensity-duration-frequency (IDF) curves assign a probability to the intensity of precipitation observed over fixed-length temporal windows (also called durations). As the term duration refers to different concepts, ID thresholds and IDF curves cannot be compared directly, and should better not be plotted in one figure, and IDF curves should not be used to quantify the exceedance probability of ID thresholds.
Competing interests: At least one of the (co-)authors is a member of the editorial board of Natural Hazards and Earth System Sciences.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Status: final response (author comments only)
- RC1: 'Reviewer comment on egusphere-2025-3378', Anonymous Referee #1, 23 Jul 2025
  - AC1: 'Reply on RC1', Francesco Marra, 22 Sep 2025
- RC2: 'Comment on egusphere-2025-3378', Anonymous Referee #2, 13 Aug 2025
Thank you for the opportunity to review Marra et al., “Brief communication: Threshold not probability. The conceptual difference between ID thresholds for landslide initiation and IDF curves” (egusphere-2025-3378). In this contribution, the authors explore the conceptual difference between rainfall intensity-duration (ID) thresholds for landslide initiation, which are conventionally fit to ID pairs that consider average intensity over entire landslide-triggering rainfall events with the intention of identifying conditions under which landslides are more likely, and intensity-duration-frequency (IDF) curves, which are fit to ID pairs that consider annual maximum average intensities for windows of defined duration with the intention of estimating annual exceedance probabilities. The authors argue that, because the definition of duration is different, these two curves are not comparable and IDF curves should not be used to estimate the exceedance probability of landslide-triggering rainfall.
They use an example dataset of debris flows from the eastern Italian Alps to compare the implications of using the conventional approach based on the entire event duration to define ID thresholds and an alternative approach that selects the duration with the maximum return period during an event. They show that return periods are much higher for the window with the maximum return period during the event than for the whole event. They also show that the slope of an ID threshold that uses the alternative approach better matches the slope of the regional IDF curves. Overall, this brief communication is well-written, thought-provoking, and has caused me to reconsider some results of my own research. It points out some important issues with ID thresholds that will be instructive for landslide researchers.
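To make the two sampling choices concrete, here is a minimal sketch (not the authors' code). It assumes a hypothetical hourly rainfall series for a single event and compares the event-average intensity (the conventional ID pair, W = D) with the maximum average intensity found in shorter moving windows. The rainfall values, the function names, and the chosen window lengths are all illustrative assumptions.

```python
import numpy as np

def event_id_pair(rain_mm, dt_h=1.0):
    """Return (D, I): event duration in hours and average intensity in mm/h."""
    rain = np.asarray(rain_mm, dtype=float)
    duration_h = rain.size * dt_h
    return duration_h, rain.sum() / duration_h

def max_window_intensity(rain_mm, window_steps, dt_h=1.0):
    """Maximum average intensity (mm/h) over any moving window of the given length."""
    rain = np.asarray(rain_mm, dtype=float)
    # Rainfall depth accumulated in every run of `window_steps` consecutive steps.
    sums = np.convolve(rain, np.ones(window_steps), mode="valid")
    return sums.max() / (window_steps * dt_h)

# Hypothetical 12-hour event (mm per hour), invented for illustration only.
event = [0.5, 1.0, 2.0, 14.0, 22.0, 6.0, 1.5, 0.5, 0.0, 1.0, 0.5, 1.0]

D, I = event_id_pair(event)
window_intensities = {w: max_window_intensity(event, w) for w in (1, 3, 6, 12)}
```

In this invented event the 12-hour window reproduces the event-average intensity, while the shorter windows isolate the intense burst in the middle of the event and return higher intensities.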
In my view, the key contributions are (1) the clear explanation of how ID thresholds and IDF curves differ conceptually, (2) the insight that the return period of the average intensity over an entire event is the lowest possible return period and that much higher return periods may exist for shorter periods within an event, and (3) the recognition that if landslide-triggering rainfall events are sampled with duration windows akin to the blocks used to determine IDF curves, the slope of the ID threshold matches the regional IDF curves, at least for the case study presented. I believe that points (2) and (3) could be further emphasized in the text and should be included in the abstract. Because this piece is likely to serve as a primer on this topic for future researchers, there are some arguments that need a more nuanced explanation, and it must be made clear which points are the authors’ opinions and which are supported by the evidence presented. Additional references are needed throughout. In particular:
- The authors make the arguments that “IDF curves should not be used to quantify the exceedance probability of ID thresholds” (Lines 5–6) and “it is therefore erroneous to quantify the return period Td of a given intensity I in the ID space using probabilities estimated from the IW space of IDF curves” (Lines 70–71). From my perspective, it is not wrong, if one has made the conventional choice that W=D, to look up what the exceedance probability of that ID pair is. The key issue lies in making the choice that W=D in the first place, as this choice obscures shorter periods of high intensities that may have much lower exceedance probabilities, as shown in the case study. This distinction needs to be made very clearly. The “should” in the first statement and the “erroneous” in the second are based on the opinion that it would be better to use W* to define the exceedance probability of the event than choosing W=D. While I tend to agree, this short paper does not present evidence that W* is a better predictor of triggering rainfall than W=D, so it needs to be clear that this is the authors’ opinion.
- The analysis shows that an ID threshold fit to IW* pairs better matches the slope of the regional IDF curves than a conventional threshold and the authors argue that this solves “the apparent difference in the power-law scaling of ID thresholds and IDF curves discussed by Bogaard and Greco (2018).” This is an interesting result, my interpretation of which is that when time series of debris flow triggering rain are sampled with W*, the method is similar enough to using block maxima that the distribution of extreme rainfall is similarly represented. That would suggest that the difference between ID and IDF slopes can be attributed to methodological differences in how rainfall time series are sampled rather than any physical processes. If the authors agree that this is the case, I recommend making this point explicitly to avoid any further confusion. But then I have to wonder – what about filling-storing-draining?
- The authors note that corresponding intensities for W* are systematically higher than W=D, which they argue “Implies that what is really important for triggering are rain intensities over time scales that can be much shorter than the total length of the identified rainfall event in combination with the hydrological antecedent conditions” (Lines 99 – 102). I do not understand how the first point implies the second. There is a logical gap here that needs to be addressed.
I have some additional suggestions that I believe could make the manuscript more instructive, particularly for readers who are less familiar with ID, IDF, or both:
- In Figure 1, I suggest labeling the IDF scaling lines with return periods to make it clearer what these refer to. I also suggest adding a panel to this figure that shows a time series of a debris-flow-triggering event with windows that show W* and W=D and the average intensities and their return periods over each of these windows. This will help readers to better understand the difference between the ID pairs and IW* pairs.
- As an outlook in the conclusions, the authors may want to consider mentioning the variety of alternative approaches to determining thresholds or estimating continuous probabilities that are better able to capture intense periods in landslide-triggering time series than averaging over the entire event. For example, both Staley et al. (2017) and Patton et al. (2023) compared models trained with accumulations over different windows to select a model that best separated triggering from non-triggering events, for post-fire debris flows in the western United States and shallow landslides in Alaska, respectively. The Moreno et al. (2025) study that was already cited is a nice example of how we can move away from the need to bin time series entirely.
Line by line comments:
- Line 9 – suggest citing (Guzzetti et al., 2020)
- Line 58 – this statement needs a reference
- Line 63 – this statement needs a reference and possibly more context. Is this choice a convention in meteorology or is this an argument that the authors are making here?
- Line 72 – Please add ~one sentence clarifying how this leads to false alarms.
- Line 74 – this statement needs a reference
- Line 81 – How did you select these 12 storms as opposed to considering all debris flow triggering storms? Please clarify.
- Line 87 – Please add one sentence detailing how you defined the events (e.g. length of dry period). As you noted earlier, the derived ID pairs are sensitive to these choices.
- Line 94, Figure 1 – Please add an estimate of statistical uncertainty to these thresholds.
- Line 127 – surely, Moreno et al., 2025 aren’t the first to point this out. Earlier reference?
- Line 129 – (Iida, 1999) also noted this

References
- Guzzetti, F., Gariano, S. L., Peruccacci, S., Brunetti, M. T., Marchesini, I., Rossi, M., and Melillo, M.: Geographical landslide early warning systems, Earth-Science Reviews, 200, 102973, https://doi.org/10.1016/j.earscirev.2019.102973, 2020.
- Iida, T.: A stochastic hydro-geomorphological model for shallow landsliding due to rainstorm, CATENA, 34, 293–313, https://doi.org/10.1016/S0341-8162(98)00093-9, 1999.
- Moreno, M., Lombardo, L., Steger, S., De Vugt, L., Zieher, T., Crespi, A., Marra, F., Van Westen, C., and Opitz, T.: Functional Regression for Space‐Time Prediction of Precipitation‐Induced Shallow Landslides in South Tyrol, Italy, JGR Earth Surface, 130, e2024JF008219, https://doi.org/10.1029/2024JF008219, 2025.
- Patton, A. I., Luna, L. V., Roering, J. J., Jacobs, A., Korup, O., and Mirus, B. B.: Landslide initiation thresholds in data-sparse regions: application to landslide early warning criteria in Sitka, Alaska, USA, Natural Hazards and Earth System Sciences, 23, 3261–3284, https://doi.org/10.5194/nhess-23-3261-2023, 2023.
- Staley, D. M., Negri, J. A., Kean, J. W., Laber, J. L., Tillery, A. C., and Youberg, A. M.: Prediction of spatially explicit rainfall intensity–duration thresholds for post-fire debris-flow generation in the western United States, Geomorphology, 278, 149–162, https://doi.org/10.1016/j.geomorph.2016.10.019, 2017.

Citation: https://doi.org/10.5194/egusphere-2025-3378-RC2
  - AC2: 'Reply on RC2', Francesco Marra, 22 Sep 2025
 
Data sets
Thresholds, not probability Francesco Marra https://doi.org/10.5281/zenodo.15845770
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 1,985 | 66 | 29 | 2,080 | 34 | 43 |
This brief communication is very… brief! I mean, in the positive sense of the word. Indeed, it is a clear, concise manuscript that is perfectly written in fluent English - something very rare for a reviewer to find. I thank the authors for that! The paper gets straight to the point: landslide-triggering intensity-duration thresholds and precipitation intensity-duration-frequency curves cannot be confounded, compared, or plotted together. Neither one can be used to quantify the return time of the other.
Frankly, having worked on rainfall analysis and landslide prediction for years, the idea of mixing/comparing ID thresholds and IDF curves is something that never came to my mind. In the few cases I have seen in the extensive literature on these topics, it has always seemed very strange, not to say a downright methodological error. So, I can say that I certainly agree with the authors of this paper, although I do not think the article addresses a relevant scientific and/or technical question. I simply think that mixing ID thresholds and IDF curves is a misconception that does not even require discussion.
The authors list the differences between ID thresholds and IDF curves, focusing on the different durations (D and W) considered by the two tools, and then analysing the differences in terms of return time referring to these durations. In my opinion, they forgot the main and most important difference. That is: since their definition in pioneering works (Nel Caine and also previous pioneers), ID thresholds have been defined considering ID pairs that are somehow - arbitrarily or not, subjectively or not - linked to the initiation or re-activation of one or more landslides. On the other hand, IDF curves are defined considering IW (using the same terminology as the authors) pairs that are not linked to landslide/debris flow occurrence, referring only to rainfall itself. Indeed, the authors write “IDF are obtained by collecting the highest rainfall intensities observed any year over the time windows of interest” (lines 45-46). Therefore, the two tools summarise or describe different variables (the ID pairs by which the thresholds are defined are different by definition from the IW pairs with pre-fixed durations of the IDF curves, and consequently have different characteristics) and different processes (landslide or debris flow initiation and rainfall severity). This is, in my opinion, the main reason why the two tools must not be compared or mixed. I wouldn't have added anything else to this discussion.
However, the authors added more to the discussion, and these additions deserve attention. I list below some other comments on this paper.
First, I don’t understand the first part of the title “Threshold not probability”. Actually, thresholds can be probabilistic. As a matter of fact, the Bayesian thresholds mentioned by the authors are probabilistic. Moreover, the frequentist thresholds also mentioned by the authors allow defining probabilistic diagrams to be used for early warning purposes. Therefore, I would remove this part of the title, which works only for deterministic, binary thresholds.
In several parts of the text, the authors write that quantifying the return period of a given intensity used to define ID thresholds using probabilities estimated from the IW space is erroneous and causes an underestimation of the severity of the triggering rainfall. I agree with the authors, totally. However, I’d suggest mentioning some works in which this erroneous approach was adopted, also because these are referred to again in the last sentence of the paper (“Some results in the literature may thus be quantitatively inexact”). Moreover, I would add that the return period of a given ID threshold should not be calculated at all. Indeed, rather than adopting dichotomous approaches (above/below threshold), using statistical and probabilistic approaches, such as the two mentioned above, allows the probabilistic characterisation of the thresholds without introducing (erroneously) the concept of return time, which is also highly questionable for a variable as difficult to measure as landslide or debris flow occurrence/triggering. In addition, as the authors certainly know, the concept of return time and how it changes in relation to non-stationarity is a topic of discussion in the scientific community.
Moving to sections 2 and 3, the differences between ID thresholds and IDF curves are listed, focusing in particular on the different ways to define the duration of the ID/IW pairs.
According to the authors’ view, the durations D are user- (or arbitrary-) defined while the durations W are not. But, actually, W are also user- (or arbitrary-) defined using running windows of x minutes or hours: 5, 10, … 45 minutes or 1, 2, … 48 hours were also defined by a user. Moreover, the authors didn’t mention that IDF curves can be defined using the partial duration series approach as well, which introduces another point of discussion.
In section 2 (lines 29-32) the authors write “rainfall records are often not available at hourly resolutions nor in close range of the landslide (Marra et al., 2016; Marra, 2019), which makes the events separation dependent also on these aspects”. Actually, this issue affects the definitions of W too. Indeed, if only daily measurements are available in a given area, sub-daily values of W (e.g. the classical 1, 3, 6, 12, 24 hours) can’t be defined, and the IDF curves cannot be drawn for sub-daily durations.
In section 4 (lines 62-63), the authors write “In a univariate framework, the return period T* of a rainfall event can reasonably be defined as the maximum among the return periods TW associated with all possible temporal scales”. I think that some examples should be provided to support this statement.
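One way to illustrate the quoted definition is the following minimal sketch. It is not the authors' method: it assumes a hypothetical simple-scaling Gumbel model for the IDF curves, with invented parameters (`MU_1H`, `SIGMA_1H`, `ETA`), and invented maximum window intensities for one event. It computes the return period T_W for each window W and takes T* as the largest of them.

```python
import math

# Hypothetical simple-scaling Gumbel IDF model (all parameters invented).
MU_1H = 12.0     # Gumbel location of 1-hour annual maximum intensity, mm/h
SIGMA_1H = 4.0   # Gumbel scale of 1-hour annual maximum intensity, mm/h
ETA = 0.6        # scaling exponent linking window length to intensity

def return_period(intensity, window_h):
    """Return period (years) of an average intensity over a fixed window W."""
    scale = window_h ** (-ETA)
    mu, sigma = MU_1H * scale, SIGMA_1H * scale
    non_exceedance = math.exp(-math.exp(-(intensity - mu) / sigma))
    return 1.0 / (1.0 - non_exceedance)

def event_return_period(window_max_intensity):
    """Return (W*, T*), where T* is the maximum of T_W over all windows W."""
    periods = {w: return_period(i, w) for w, i in window_max_intensity.items()}
    w_star = max(periods, key=periods.get)
    return w_star, periods[w_star]

# Hypothetical maximum average intensities (mm/h) per window length (h) for
# one 12-hour event; the 12-hour entry is the event average (W = D).
window_max = {1: 22.0, 3: 14.0, 6: 7.75, 12: 50.0 / 12.0}

w_star, t_star = event_return_period(window_max)
t_whole_event = return_period(window_max[12], 12)
```

Because the set of windows includes W = D, T* can never be smaller than the return period of the whole-event intensity. With these invented numbers the maximum falls at a window shorter than the event.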
Moving to section 5, I have some comments regarding the dataset used. First, it should be noted (and somewhere acknowledged by the authors) that the dataset is quite dated, having been collected over ten years ago. Second, spatial and temporal information on the debris flow records is missing. In particular, the authors should specify whether the time of occurrence is known for the debris flows included in the dataset used. This is extremely important information for a dataset to be used for the definition of rainfall thresholds. Moreover, it is relevant for another issue that I raise further on in my comment. Third, it is not described how the triggering precipitation events used to draw the thresholds were defined. This is also very relevant, given the comparison with IDF curves done in the paper.
Further on in section 5, the authors describe the procedure used to calculate W* (lines 88-92). It should be acknowledged that the outcomes of this procedure are not related to debris flow triggering. Indeed, the fact that they have the highest return time among all IW pairs does not mean that they triggered debris flows. It would be useful to know when these IW* pairs occurred within the whole event duration, in order to establish whether they are relevant to the triggering of debris flows or not. If the IW* pairs occurred many hours (or days) before the occurrence of the debris flows, it cannot be said that they were certainly relevant to the initiation; at least, not more important than the entire event. This is the reason why knowing the exact time of occurrence of the debris flows is essential to prove that “what is really important for triggering are the rain intensities over time scales that can be much shorter than the total length of the identified rainfall event in combination with the hydrological antecedent conditions”. In my opinion, selecting IW* pairs using the maximum return time as the only constraint is not sufficient to prove this hypothesis, and adds subjectivity to the process.
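The timing check requested here could be sketched as follows. This is a minimal illustration under stated assumptions, not an analysis of the authors' dataset: the hourly series, the window length, and the reported occurrence step (`trigger_step`) are hypothetical, and the helper names are invented.

```python
import numpy as np

def max_window_start(rain_mm, window_steps):
    """Index of the first time step of the wettest window of the given length."""
    rain = np.asarray(rain_mm, dtype=float)
    sums = np.convolve(rain, np.ones(window_steps), mode="valid")
    return int(np.argmax(sums))  # first occurrence if several windows tie

def lag_to_trigger(rain_mm, window_steps, trigger_step, dt_h=1.0):
    """Hours between the end of the wettest window and the debris-flow time.

    `trigger_step` is the index of the time step at whose start the debris
    flow was reported. A positive lag means the window ended before the
    debris flow; a negative lag means it ended after the reported time.
    """
    start = max_window_start(rain_mm, window_steps)
    window_end = start + window_steps  # index of the first step after the window
    return (trigger_step - window_end) * dt_h

# Hypothetical 12-hour event (mm per hour) and hypothetical occurrence time.
event = [0.5, 1.0, 2.0, 14.0, 22.0, 6.0, 1.5, 0.5, 0.0, 1.0, 0.5, 1.0]
lag_h = lag_to_trigger(event, window_steps=3, trigger_step=7)
```

A small positive lag would be consistent with the window being relevant to initiation, whereas a lag of many hours or a negative lag would call that relevance into question. Such a check is only as reliable as the recorded occurrence times.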
Then (lines 98-99), the authors write that “IW* pairs are associated with temporal scales W* that are always smaller than the duration D of ID pairs. In addition, by design, the corresponding intensities are systematically higher”. This is tautological and leads to what is written in lines 109-111 (i.e., the underestimation of the return times of the whole events compared to the IW* pairs). Again, having a lower return time does not imply that an ID pair is less severe in terms of landslide/debris flow triggering. This is another point to be added to the conceptual difference between ID thresholds and IDF curves.
Moreover, the authors assumed that ID thresholds are always defined considering D as the whole duration of the rainfall events. This is not always true. There are several examples in the literature in which sub-events are distinguished (automatically or not) within the entire rainfall events and used to define rainfall thresholds. This can be considered a solution to the issues about durations being too long. I’d suggest mentioning it in the discussion.
Before moving to the conclusions, two comments on Figs. 2 and 3. In Fig. 2, the (a) and (b) labels are missing. Fig. 3 and its description are not very clear; a better description and more discussion are needed.
Going to the conclusions of the work, I totally agree that the calculation of return times of triggering conditions should be avoided, for several reasons including the ones described by the authors. However, the main motivation should be that it’s better to use statistical/probabilistic approaches to define rainfall thresholds rather than calculating return times of the triggering conditions. Moreover, the underestimation of the return periods should be better evaluated considering the time of occurrence of the IW pairs and landslides/debris flows.
Overall, I think that the main message of the work is clear and shareable. However, I believe that the conclusions would need results based on an accurate dataset and improved methodology. In my opinion, more temporal details on the dataset are needed, in order to allow the most important methodological improvement needed in the work: that is, find the time of occurrence of the IW* pairs and their temporal distance from the debris flow occurrences. Only in this way will the conclusions be adequately justified by the data and results.
So, my suggestion is that the work needs major revisions before being reconsidered for publication. The revised version of the paper should include an analysis of when the IW* pairs occurred within each event, so that it can be established whether they can be considered the cause of debris-flow triggering. This may be done using information from the proposed dataset (if any) or using other datasets. Moreover, I’d kindly suggest taking into consideration all my comments regarding theoretical and methodological aspects of the work.