Using structured expert judgment to Estimate extreme river discharges: a case study of the Meuse River

Rongen, Guus; Morales-Nápoles, Oswaldo; Kok, Matthijs

doi:https://doi.org/10.5194/egusphere-2023-39

07 Mar 2023

Abstract. Accurate estimation of extreme discharges in rivers, such as the Meuse, is crucial for effective flood risk assessment. However, existing statistical and hydrological models that estimate these discharges often lack transparency regarding the uncertainty of their predictions, as evidenced by the devastating flood event that occurred in July 2021, which was not captured by the existing model for estimating design discharges. This article proposes an alternative approach with a central role for expert judgment, using Cooke’s method. A simple statistical model was developed for the river basin, consisting of correlated GEV-distributions for discharges in upstream sub-catchments. The model was fitted to expert judgments, measurements, and the combination of both, using Markov chain Monte Carlo. Results from the model fitted only to measurements were accurate for more frequent events, but less certain for extreme events. Using expert judgment reduced uncertainty for these extremes but was less accurate for more frequent events. The combined approach provided the most plausible results, with Cooke's method reducing the uncertainty by appointing most weight to two of the seven experts. The study demonstrates that utilizing hydrological experts in this manner can provide plausible results with a relatively limited effort, even in situations where measurements are scarce or unavailable.
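The abstract attributes the reduced uncertainty to Cooke's method giving most weight to two of the seven experts. In the classical model as commonly formulated, those weights are not chosen by hand. Each expert receives a calibration score, a chi-square p-value that compares how often the realizations of calibration questions fall in the expert's 5-50-95 % interquantile bins with the expected 5/45/45/5 %, and an information score. The unnormalized weight is the product of the two, set to zero below a calibration cutoff alpha. The sketch below illustrates this under those assumptions: information scores are taken as given inputs, and the function names and all numbers in the example are hypothetical, not values from the study.

```python
import math

# Interquantile-bin probabilities implied by eliciting 5 %, 50 % and 95 % quantiles
BIN_PROBS = (0.05, 0.45, 0.45, 0.05)


def chi2_sf_df3(x):
    """Survival function of a chi-square variable with 3 degrees of freedom (closed form)."""
    if x <= 0.0:
        return 1.0
    cdf = math.erf(math.sqrt(x / 2.0)) - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    return 1.0 - cdf


def calibration_score(assessments, realizations):
    """Statistical accuracy of one expert on the calibration questions.

    assessments: list of (q05, q50, q95) tuples, one per question.
    realizations: the observed value for each question.
    """
    counts = [0, 0, 0, 0]
    for (q05, q50, q95), value in zip(assessments, realizations):
        if value <= q05:
            counts[0] += 1
        elif value <= q50:
            counts[1] += 1
        elif value <= q95:
            counts[2] += 1
        else:
            counts[3] += 1
    n = len(realizations)
    # Relative information of the empirical bin frequencies with respect to BIN_PROBS
    rel_info = sum(
        (c / n) * math.log((c / n) / p) for c, p in zip(counts, BIN_PROBS) if c > 0
    )
    # 2 n I(s, p) is asymptotically chi-square with (number of bins - 1) = 3 degrees of freedom
    return chi2_sf_df3(2.0 * n * rel_info)


def performance_weights(calibration, information, alpha=0.0):
    """Normalized weights proportional to calibration x information, zero below cutoff alpha."""
    raw = [c * i if c >= alpha else 0.0 for c, i in zip(calibration, information)]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no expert passes the calibration cutoff")
    return [r / total for r in raw]


if __name__ == "__main__":
    # Invented scores for seven experts, for illustration only
    cal = [0.6, 0.3, 0.01, 0.005, 0.002, 0.001, 0.0005]
    info = [1.2, 1.5, 2.0, 1.8, 2.5, 1.0, 3.0]
    print([round(w, 3) for w in performance_weights(cal, info)])
```

With these invented scores the first two experts receive about 97% of the total weight, which shows how a few well-calibrated experts can dominate the combination even when others are more informative.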

Received: 12 Jan 2023 – Discussion started: 07 Mar 2023

Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.

Journal article(s) based on this preprint

03 Jul 2024

Using the classical model for structured expert judgment to estimate extremes: a case study of discharges in the Meuse River

Guus Rongen, Oswaldo Morales-Nápoles, and Matthijs Kok

Hydrol. Earth Syst. Sci., 28, 2831–2848, https://doi.org/10.5194/hess-28-2831-2024, 2024

Interactive discussion

Status: closed

RC1:
'Referee Comment on egusphere-2023-39', Anonymous Referee #1, 29 Mar 2023

The paper provides the result of an interesting experiment in which flood experts are asked to guess the flood frequency curves for several sites in a region without access to discharge data, and the information is then used together with observed maximum annual flood peaks to improve the "credibility" of the estimated tails of the distributions. A procedure is developed to use expert opinions on several tributaries and transform them into an estimate for a downstream gauge.
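The procedure summarized above, turning judgments on several tributaries into an estimate for a downstream gauge, can be illustrated with a small Monte Carlo sketch of the kind of model the abstract describes: correlated annual maxima for the tributaries with GEV marginals, summed and multiplied by a log-normal factor (the comment on line 206 below mentions such a factor). The Gaussian copula used to impose the correlation, the function names, the sign convention of the shape parameter, and every numerical value are assumptions for illustration; this discussion does not describe the authors' correlation model in enough detail to reproduce it.

```python
import math
import random
from statistics import NormalDist


def gev_ppf(p, loc, scale, shape):
    """GEV quantile for non-exceedance probability p.

    Assumed sign convention: shape > 0 gives a heavy upper tail; shape == 0 is Gumbel.
    """
    y = -math.log(p)
    if abs(shape) < 1e-12:
        return loc - scale * math.log(y)
    return loc + scale * (y ** (-shape) - 1.0) / shape


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_downstream(marginals, corr, n, factor_mu=0.0, factor_sigma=0.0, seed=0):
    """Monte Carlo sample of a downstream annual maximum (hypothetical model).

    marginals: (loc, scale, shape) per tributary; corr: correlation matrix of the
    Gaussian copula; the tributary sum is multiplied by a log-normal factor.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    lower = cholesky(corr)
    eps = 1e-12  # keeps probabilities strictly inside (0, 1)
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in marginals]
        # Correlated standard normals x = L z
        x = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(z))]
        total = 0.0
        for xi, (loc, scale, shape) in zip(x, marginals):
            u = min(max(std_normal.cdf(xi), eps), 1.0 - eps)
            total += gev_ppf(u, loc, scale, shape)
        samples.append(rng.lognormvariate(factor_mu, factor_sigma) * total)
    return samples


if __name__ == "__main__":
    # Hypothetical tributary parameters (m3/s) and correlations; not from the study
    tributaries = [(600.0, 200.0, 0.1), (250.0, 90.0, 0.05), (150.0, 60.0, 0.0)]
    corr = [[1.0, 0.7, 0.5], [0.7, 1.0, 0.6], [0.5, 0.6, 1.0]]
    q = sorted(sample_downstream(tributaries, corr, 20000, factor_sigma=0.05))
    print("approx. 100-year downstream peak:", round(q[int(0.99 * len(q))]))
```

With factor_mu = 0 and factor_sigma = 0 the factor equals one, leaving the plain sum of the correlated tributary peaks.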
The paper is original, as far as I can tell, and deals with an important issue in flood risk assessment, i.e., the formal use of expert opinion in flood frequency analysis. Even though I liked reading the paper, there are some parts that, in my opinion, need to be improved, clarified, better explained or discussed before publication. My main concerns are:
1) I am not sure that the ability of the expert in providing a judgement on the flood frequency curve can be measured by her/his ability in guessing the 10-yr flood in absolute terms. If one wants the expert to help reduce uncertainty in the tails of the distribution, she/he should inform us on how large floods compare to small floods, by reasoning on the driving processes. In the end, it is the shape of the flood frequency distribution that is hard to get from local data, not the location. The proposed method seems tailored to getting the order of magnitude right, i.e., the flood magnitude in m^3/s, but not to how surprising large extreme events can be compared to the more frequent ones.
2) The expert information is accounted for as data (part of the likelihood) using an ad hoc procedure, which seems to me inconsistent with the Bayesian approach. Why not account for expert judgement as prior information? That would be the natural Bayesian way to do it: since the experts give their estimates without using discharge data, these can be considered prior information.
3) Given the proposed procedure, the tail of the distribution is controlled by expert judgement with a strength that depends on the subjective choice of the weight given to the expert "data" relative to the observed data. The result of the procedure is then assessed as credible/reasonable, but how could it not be? From what I've understood, the procedure allows the shape of the flood frequency distribution to be tweaked subjectively.
4) The results are not assessed against a benchmark. Why not use regional flood frequency analysis as a benchmark?
5) Some of the methodological steps are unclear and should be explained properly (see the detailed comments below).
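The first concern, and the suggestion under Section 3.2 to elicit ratios such as Q100/Q10 instead of absolute discharges, can be illustrated with a small sketch: two GEV curves tuned to share the same 10-year discharge can still imply quite different 100-year discharges, because the tail is governed by the shape parameter. All parameter values are hypothetical, and the sign convention (shape > 0 for a heavy upper tail) is an assumption of this sketch.

```python
import math


def gev_return_level(T, loc, scale, shape):
    """T-year return level of a GEV for annual maxima.

    Assumed sign convention: shape > 0 gives a heavy upper tail; shape == 0 is Gumbel.
    """
    y = -math.log(1.0 - 1.0 / T)
    if abs(shape) < 1e-12:
        return loc - scale * math.log(y)
    return loc + scale * (y ** (-shape) - 1.0) / shape


def loc_matching(q_target, T, scale, shape):
    """Location that makes the T-year return level equal q_target.

    Return levels are linear in the location, so this is a simple shift.
    """
    return q_target - gev_return_level(T, 0.0, scale, shape)


if __name__ == "__main__":
    # Two hypothetical curves with the same 10-year flood but different shape
    q10 = 1675.0
    for shape in (0.0, 0.2):
        loc = loc_matching(q10, 10, 300.0, shape)
        q100 = gev_return_level(100, loc, 300.0, shape)
        print(f"shape={shape}: Q10={gev_return_level(10, loc, 300.0, shape):.0f}, "
              f"Q100={q100:.0f}, Q100/Q10={q100 / q10:.2f}")
```

Both hypothetical curves share Q10 = 1675 m3/s, yet the heavier-tailed one gives a Q100/Q10 ratio of roughly 1.84 against roughly 1.42 for the Gumbel curve, which is the kind of difference a ratio-based elicitation would target.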

Detailed comments:
line 8: MCMC is just a tool. I would say here that you use Bayesian inference.
line 17: the 2021 peak at Borgharen is the highest but does not seem surprisingly high, looking at Figure 5. I think the same event has been much more surprising in other, smaller catchments. Even though it is surprising for the summer season, as I understand, your analysis later is not done accounting for seasonality. I would even expect that, if asked for the summer flood frequency curve, the experts would underestimate the probability of such an event.
line 30: here the text suggests that hydrological model simulations outperform statistical methods in flood frequency analysis. Has this been demonstrated in the literature? As far as I know, statistical models tailored for flood frequency analysis are more accurate than other methods both in gauged and ungauged basins (see Bloeschl et al., 2013, ISBN:9781107028180). Besides, despite some advantages, you clearly show limitations for the hydrological modelling approach in the discussion until line 45. Since the accurate estimate of the distribution tails is of interest, why don't you mention regional flood frequency analysis and inclusion of historical events as ways of increasing the robustness (and reducing the uncertainty) of the estimates? Besides, aren't design flows available from a regional frequency analysis in the area, e.g. to be used as a benchmark?
line 43: I don't get the factor 3 vs. 1.4 sentence. What is the "outcome"?
lines 65-68: spoiler alert! I would move this sentence after the results section.
line 79: I don't get the meaning of the sentence "The discharge estimates for this catchment are therefore only used for expert calibration, as the flow is part of the French Meuse flow".
line 85: I would add a table here in the main text summarizing the data provided to the experts.
line 107: not having some more details on the construction of the correlation matrices is a pity. It would have been wise to publish that paper first.
line 109: Each variable is modelled by a marginal distribution; it is not itself a distribution.
Section 3.2: I am not sure that the ability of the expert in providing a judgement on the flood frequency curve can be measured by her/his ability in guessing the 10-yr flood in absolute terms. If you want the expert to help reduce uncertainty in the tails of the distribution, she/he should inform you on how large floods compare to small floods, by reasoning on the driving processes. In the end, it is the shape of the flood frequency distribution that is hard to get from local data, not the location. Your ranking seems to me tailored to getting the order of magnitude right, but not to how surprising large extreme events can be compared to the more frequent ones. I know this cannot be done now, but I would have asked the experts to guess the ratios between the 10-yr event and the mean event, between the 100-yr event and the 10-yr event, and so on, in order to get their perception of the shape of the distribution. Maybe you could discuss the idea in the discussion section, if you see fit.
line 151: "a training exercise"
line 154: are the 26 questions made available somewhere?
line 173: the weakly informed prior in Appendix A seems very peculiar to me. I imagine that very strange parameter combinations, very far from what could be expected for floods, are given the same weight as more reasonable ones, and that some reasonable ones are excluded because of the bound at 10000. Why not use the usual priors for the GEV distribution when dealing with floods, i.e., an unbounded uniform prior for the location and for the log of the scale, and the Martins and Stedinger (2000, doi:10.1029/1999WR900330) geophysical prior, or a similar one, for the shape parameter?
lines 185-195: here the expert information is accounted for as data (part of the likelihood). Why not account for it as prior information? That would be the natural way to do it: since the experts give their estimates without using discharge data, these can be considered prior information. To obtain the prior distribution of the parameters from the prior assessment of the quantiles, one could use, for example, the procedure described in Renard et al. (2006, doi:10.1007/s00477-006-0047-4). This would avoid the subjective choice of weights presented in lines 196-205, which in effect controls the fit of the tail of the flood frequency curves. It would also provide a more defensible prior than the one discussed in Appendix A.
line 196: log-likelihoods are summed
line 206: please indicate in which equation (and with what symbol) the "factor between the tributaries’ sum and the downstream discharge" has been introduced. Is it the one in Eq. (1)? And what are the observations to which a log-normal distribution is fitted? I am confused here.
Section 3.4: I am sorry, but I don't understand the procedure at all. I wish I could suggest how to improve points 1 and 2, but I can't figure out what they mean.
Lines 269-278: here it seems evident to me that the objective assigned to the experts is to guess a reasonable mean annual peak discharge, in m^3/s, rather than the shape of the growth curve. Cooke's method then rates the experts on how well they get the order of magnitude of flood discharges right, more than on the shape of the distribution. Is this what we need to inform our analysis about how extreme large floods can be?
Line 290: "not too steep"
Figure 5: if I have understood well, the points in the third column should all be grey because discharges at Borgharen are not used in the fit. Am I right?
Line 308: I don't get what the following sentence means: "Sampling from these wide uncertainty bounds will therefore (too) often result in a high discharge event".
Figure 6bc: it seems peculiar that combining the pieces of information that individually result in the blue and yellow distributions leads to the red one (e.g., the red mode is lower than the blue and yellow ones). Can you comment on that?
line 323: why are the median values considered best estimates?
line 330: I don't understand the sentence.
line 340: but the experts knew about the 2021 event when doing the exercise, and I guess this has biased their estimates. How would their estimates have differed before 2021? That's hard to tell.
line 350: the following sentence doesn't mean anything to me: "were combined ... in ranges that are commonly 'in sample'".
line 360: since the tails of the distributions are controlled by the expert opinions, it seems to me obvious that they "seem credible". Couldn't they be compared to the outcomes of a more classical regional flood frequency analysis?
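The comments on the prior (line 173) and on treating expert judgments as weighted likelihood terms (lines 185-205) can be made concrete with a sketch of an unnormalized log-posterior that an MCMC sampler would evaluate. It combines the Martins and Stedinger (2000) geophysical prior on the shape (a Beta(6, 9) density on kappa + 0.5), flat priors on the location and log-scale, the GEV log-likelihood of annual maxima, and a weighted expert term. The log-normal summary of each expert judgment, the weights w_obs and w_exp, and all function names are assumptions for illustration, not the authors' formulation. Setting w_exp = 0 gives a measurement-only fit, and raising w_exp lets the expert term dominate the tail, which is the sensitivity raised in concern 3.

```python
import math

# Beta(p, q) parameters of the geophysical shape prior (Martins and Stedinger, 2000)
PRIOR_P, PRIOR_Q = 6.0, 9.0


def log_geophysical_prior(kappa):
    """Log-density of the GEV shape kappa on (-0.5, 0.5), hydrological sign convention
    (kappa < 0 means a heavy upper tail)."""
    if not -0.5 < kappa < 0.5:
        return float("-inf")
    log_b = math.lgamma(PRIOR_P) + math.lgamma(PRIOR_Q) - math.lgamma(PRIOR_P + PRIOR_Q)
    return ((PRIOR_P - 1.0) * math.log(0.5 + kappa)
            + (PRIOR_Q - 1.0) * math.log(0.5 - kappa) - log_b)


def gev_logpdf(x, loc, scale, kappa):
    """GEV log-density with F(x) = exp(-(1 - kappa * (x - loc) / scale) ** (1 / kappa))."""
    z = (x - loc) / scale
    if abs(kappa) < 1e-12:  # Gumbel limit
        if -z > 700.0:
            return float("-inf")  # density underflows to zero
        return -math.log(scale) - z - math.exp(-z)
    t = 1.0 - kappa * z
    if t <= 0.0:
        return float("-inf")  # x outside the support
    log_t = math.log(t)
    if log_t / kappa > 700.0:
        return float("-inf")  # density underflows to zero
    return -math.log(scale) + (1.0 / kappa - 1.0) * log_t - math.exp(log_t / kappa)


def return_level(T, loc, scale, kappa):
    """T-year return level (non-exceedance probability 1 - 1/T), same sign convention."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(kappa) < 1e-12:
        return loc - scale * math.log(y)
    return loc + scale * (1.0 - y ** kappa) / kappa


def expert_loglik(loc, scale, kappa, judgments):
    """Score model return levels against expert judgments.

    judgments: list of (T, median, sigma_log); each is a hypothetical log-normal
    summary of one expert's uncertainty about the T-year discharge.
    """
    total = 0.0
    for T, median, sigma_log in judgments:
        q = return_level(T, loc, scale, kappa)
        if q <= 0.0:
            return float("-inf")
        z = (math.log(q) - math.log(median)) / sigma_log
        total += -0.5 * z * z - math.log(sigma_log * math.sqrt(2.0 * math.pi))
    return total


def log_posterior(params, annual_maxima, judgments, w_obs=1.0, w_exp=1.0):
    """Unnormalized log-posterior for params = (loc, log_scale, kappa).

    Flat priors on loc and log_scale add only a constant; the weights scale the
    measurement and expert log-likelihood terms before they are summed.
    """
    loc, log_scale, kappa = params
    lp = log_geophysical_prior(kappa)
    if lp == float("-inf"):
        return lp
    scale = math.exp(log_scale)
    if w_obs:
        lp += w_obs * sum(gev_logpdf(x, loc, scale, kappa) for x in annual_maxima)
    if w_exp:
        lp += w_exp * expert_loglik(loc, scale, kappa, judgments)
    return lp
```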

Citation: https://doi.org/10.5194/egusphere-2023-39-RC1
- AC1: 'Reply on RC1', Guus Rongen, 22 Apr 2023
  
  The authors would like to thank the reviewer for the detailed and constructive review. The reviewer mentions valid points. We (the authors) have added a response to each of them. Please find the detailed response in the attached file.
  
  Citation: https://doi.org/10.5194/egusphere-2023-39-AC1
RC2:
'Comment on egusphere-2023-39', Anonymous Referee #2, 15 Apr 2023

This study investigates, through Cooke's method, how scientific judgments by experts can assist flood-risk managers. In my opinion, there are several issues that need to be addressed so that the applied methods, justification, and results can be clearer and of practical use to other case studies. Please see several such comments and suggestions below:
1) The main point raised by the authors for using the suggested method is that "...existing statistical and hydrological models that estimate these discharges often lack transparency regarding the uncertainty of their predictions..."; however, please note that this is exactly the purpose of probabilistic analysis (i.e., to estimate and take into account the uncertainty and variability of the predictions of the input and output parameters of a flood model; see, for example, the review, applications, and discussion of the uncertainty of flood parameters through benchmark examples in Dimitriadis et al., 2016). I would suggest not comparing with such methods (of which there are plenty in the literature), but focusing on the advantages and limitations of the proposed method.
Dimitriadis, P., A. Tegos, A. Oikonomou, V. Pagana, A. Koukouvinos, N. Mamassis, D. Koutsoyiannis, and A. Efstratiadis, Comparative evaluation of 1D and quasi-2D hydraulic models based on benchmark and real-world applications for uncertainty assessment in flood mapping, Journal of Hydrology, 534, 478–492, doi:10.1016/j.jhydrol.2016.01.020, 2016.
2) The fact that "...the devastating flood event that occurred in July 2021... was not captured by the existing model for estimating design discharges." is not a reason to blame (or replace) the statistical methods; rather, a more appropriate analysis by experts should have been performed. For example, Figure 10 in Dimitriadis et al. (2016) shows an application in which a certain flooded area could not be captured by a 1D model (because a 1D model cannot account for a 180-degree turn of the water, since only one direction is possible within a cross-section), whereas this area can be captured if a 2D (or quasi-2D) model is applied. However, only an expert in flood modeling could identify this (cf. the authors' statement that "The study demonstrates that utilizing hydrological experts in this manner can provide plausible results with a relatively limited effort, even in situations where measurements are scarce or unavailable."). If this is what the authors are trying to highlight in this work (i.e., that flood models should not be blindly applied by non-experts), then this is a strong and important statement, which however needs to be discussed further.
3) Please consider rephrasing the sentence "Quantifying events that are more extreme than ever measured (i.e., with return levels that are longer than the time period of representative measurements), requires extrapolating from available data or knowledge.", since it is not exactly true. The return period T corresponds to a probability of occurrence (i.e., on average, a storm event is expected to occur once every T years) and not to a deterministic occurrence that involves any kind of extrapolation or specific (e.g., 5th, 95th) quantiles (please see the mathematical definitions and methods for extreme value analysis and probability fitting in a recent work by Koutsoyiannis, 2022).
Koutsoyiannis, D., Replacing histogram with smooth empirical probability density function estimated by K-moments, Sci, 4 (4), 50, doi:10.3390/sci4040050, 2022.
4) The application of Cooke's method to this specific study is not very clear to me. For example, the authors state that "A simple statistical model was developed for the river basin, consisting of correlated GEV-distributions for discharges in upstream sub-catchments. The model was fitted to expert judgments, measurements, and the combination of both, using Markov chain Monte Carlo. Results from the model fitted only to measurements were accurate for more frequent events, but less certain for extreme events." Since they were all experts and applied the same model, how come they came up with different results? Did they use different methods, and if so, which ones? On what did the experts base their replies: did they also perform simulations, or just probabilistic fitting?
5) In my opinion, it is not very appropriate to apply a Monte Carlo method with so few samples; please consider including more samples. Also, regarding "The combined approach provided the most plausible results, with Cooke’s method reducing the uncertainty by appointing most weight to two of the seven experts.": why did the authors select these two scientists? Were they more expert than the other scientists?
6) More details are required to back up the statement "The discharge at the Dutch border exceeded the flood events of 1926, 1993, and 1995. Contrary to those events, this flood occurred during summer, a season that is (or was) often considered irrelevant for extreme discharges on the Meuse."; please perform a proper statistical analysis and identify for each season the appropriate probability distribution to show at what discharge the probability of occurrence in the summer season exceeds the selected return period.
7) Regarding the comments "The event was thus surprising in multiple ways. This might happen when we experience a new extreme, but given that Dutch flood risk has safety standards up to once per 100,000 years (Ministry of Infrastructure and Environment, 2016) one would have hoped this to be less of a surprise." and "While most studies aimed at obtaining better estimates of discharge extremes use hydrological or statistical modeling, some follow the approach of using expert judgment (EJ).", please note that this is a must in every scientific application, since when non-experts apply methods they do not understand, it could lead to failure regardless of the magnitude of the selected return period.
8) It is mentioned that "For the Dutch rivers Meuse and Rhine, the GRADE instrument is used for this. It generates 50,000 years of rainfall and discharges."; please give more details on this model and how it generates such long rainfall and discharge time series (does it use a stochastic simulation approach for the rainfall annual extremes and input these into a hydraulic model to produce the discharge at a specific location in the area of interest?).
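Points 3 and 8 concern what a return period means and how return levels are read off a long simulated record such as GRADE's 50,000 years. A minimal sketch, assuming independent years and using a synthetic Gumbel(0, 1) series as a stand-in for simulated annual maxima (it is not GRADE output, and the function names are hypothetical): the T-year level is the discharge with annual exceedance probability 1/T, it can be estimated from a long series with plotting positions, and the chance of exceeding it at least once in n years is 1 - (1 - 1/T)^n.

```python
import math
import random


def prob_at_least_one(T, n_years):
    """Chance of at least one exceedance of the T-year level in n_years (independent years)."""
    return 1.0 - (1.0 - 1.0 / T) ** n_years


def empirical_return_level(annual_maxima, T):
    """T-year level read from a long series with Weibull plotting positions i / (n + 1)."""
    xs = sorted(annual_maxima)
    n = len(xs)
    rank = round((1.0 - 1.0 / T) * (n + 1))
    rank = min(max(rank, 1), n)  # clamp to the available ranks
    return xs[rank - 1]


if __name__ == "__main__":
    # Stand-in for a long simulated record: 50,000 Gumbel(0, 1) annual maxima.
    # If E ~ Exp(1), then -ln(E) is Gumbel(0, 1) distributed.
    rng = random.Random(42)
    series = [-math.log(rng.expovariate(1.0)) for _ in range(50_000)]
    print("empirical 100-year level:", round(empirical_return_level(series, 100), 2))
    print("exact Gumbel 100-year level:", round(-math.log(-math.log(0.99)), 2))
    print("P(at least one 100-year event in 100 years):", round(prob_at_least_one(100, 100), 3))
```

For T = 100 and n = 100 the probability is about 0.63, so a 100-year event is far from certain to occur within a century; for T = 100,000 and n = 100 it is about 0.001.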

Citation: https://doi.org/10.5194/egusphere-2023-39-RC2
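Point 6 of this comment, and the first referee's remark on line 17, concern the seasonality of the July 2021 event. One standard way to relate seasonal and annual frequency curves, assuming winter and summer maxima are independent, is F_annual(x) = F_winter(x) F_summer(x). The sketch below applies this with Gumbel seasonal distributions; the parameters are invented and not fitted to Meuse data, and the helper names are hypothetical.

```python
import math


def gumbel_cdf(x, loc, scale):
    return math.exp(-math.exp(-(x - loc) / scale))


def annual_cdf(x, seasons):
    """Annual-maximum CDF as the product of seasonal-maximum CDFs (seasons assumed independent)."""
    prob = 1.0
    for loc, scale in seasons:
        prob *= gumbel_cdf(x, loc, scale)
    return prob


def level_for_return_period(T, cdf, lo, hi, tol=1e-6):
    """Bisection for the discharge x with cdf(x) = 1 - 1/T; requires cdf(lo) < 1 - 1/T < cdf(hi)."""
    target = 1.0 - 1.0 / T
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical seasonal Gumbel parameters (m3/s); not fitted to Meuse data
    winter, summer = (1300.0, 450.0), (500.0, 300.0)
    q100 = level_for_return_period(100, lambda x: annual_cdf(x, [winter, summer]), 0.0, 20000.0)
    summer_T = 1.0 / (1.0 - gumbel_cdf(q100, *summer))
    print(f"annual 100-year discharge: {q100:.0f} m3/s")
    print(f"summer-only return period of that discharge: {summer_T:.0f} years")
```

The summer-only return period of the annual 100-year discharge shows how rare that discharge is when only summer floods are considered, which is the quantity the comment asks to be reported per season.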
- AC2: 'Reply on RC2', Guus Rongen, 22 Apr 2023
  
  The authors would like to thank the reviewer for reviewing the article. We have written a response to the reviewer’s comments below. Please find it in the document attached.
  
  Citation: https://doi.org/10.5194/egusphere-2023-39-AC2

Interactive discussion

Status: closed

RC1:
'Referee Comment on egusphere-2023-39', Anonymous Referee #1, 29 Mar 2023

The paper provides the result of an interesting experiment in which flood experts are asked to guess the flood frequency curves for several sites in a region without access to discharge data, and the information is then used together with observed maximum annual flood peaks to improve the "credibility" of the estimated tails of the distributions. A procedure is developed to use expert opinions on several tributaries and transform them into an estimate for a downstream gauge.
The paper is original, as far as I can tell, and deals with an important issue in flood risk assessment, i.e., the formal use of expert opinion in flood frequency analysis. Even though I liked reading the paper, there are some parts that, in my opinion, need to be improved, clarified, better explained or discussed before publication. My main concerns are:
1) I am not sure that the ability of the expert in providing a judgement on the flood frequency curve can be measured by her/his ability in guessing the 10-yr flood in absolute terms. If one wants the expert to help in reducing uncertainty in the tails of the distribution, she/he should inform us on how large floods may compare to small floods, by reasoning on the driving processes. In the end, it is the shape of the flood frequency distribution that's hard to get with local data, not the location. The proposed method seems to be tailored for getting the order of magnitude right, i.e., the flood magnitude in m^3/s, but not how surprising can large extreme events be compared to the more frequent ones.
2) the expert information is accounted for as data (part of the likelihood) using an ad-hoc procedure, which seems to me inconsistent with the Bayesian way. Why not accounting for expert judgement as prior information? That would be the natural Bayesian way to do it: since the experts give their estimates without using discharge data, this can be considered as prior information.
3) given the procedure proposed, the tail of the distribution is controlled by expert judgement with a strength that is related to the subjective choice of the weight given to the expert "data" compared to the observed data. The result of the procedure is then assessed as credible/reasonable, but how could it not be so? From what I've understood, the procedure seems to allow a way to tweak subjectively the shape of the flood frequency distribution.
4) the results are not assessed against a benchmark. Why not using regional flood frequency analysis as a benchmark?
5) some of the methodological steps are unclear, sometimes, and should be properly explained (see the detailed comments below).

Detailed comments:
line 8: MCMC is just a tool. I would say here that you use Bayesian inference.
line 17: the 2021 peak at Borgharen is the highest but does not seem surprisingly high, looking at Figure 5. I think the same event has been much more surprising in other, smaller catchments. Even though it is surprising for the summer season, as I understand, your analysis later is not done accounting for seasonality. I would even expect that, if asked for the summer flood frequency curve, the experts would underestimate the probability of such an event.
line 30: here the text suggests that hydrological model simulations outperform statistical methods in flood frequency analysis. Has this been demonstrated in the literature? As far as I know, statistical models tailored for flood frequency analysis are more accurate than other methods both in gauged and ungauged basins (see Bloeschl et al., 2013, ISBN:9781107028180). Besides, despite some advantages, you clearly show limitations for the hydrological modelling approach in the discussion until line 45. Since the accurate estimate of the distribution tails is of interest, why don't you mention regional flood frequency analysis and inclusion of historical events as ways of increasing the robustness (and reducing the uncertainty) of the estimates? Besides, aren't design flows available from a regional frequency analysis in the area, e.g. to be used as a benchmark?
line 43: I don't get the factor 3 vs. 1.4 sentence. What is the "outcome"?
lines 65-68: spoiler alert! I would move this sentence after the results section.
line 79: I don't get the meaning of the sentence "The discharge estimates for this catchment are therefore only used for expert calibration, as the flow is part of the French Meuse flow".
line 85: I would add a table here in the main text summarizing the data provided to the experts.
line 107: not having some more details on the construction of the correlation matrices is a pity. It would have been wise to publish that paper first.
line 109: Each variable is modelled by a marginal distribution, it is not a distribution.
Section 3.2: I am not sure that the ability of the expert in providing a judgement on the flood frequency curve can be measured by her/his ability in guessing the 10-yr flood in absolute terms. If you want the expert to help in reducing uncertainty in the tails of the distribution, she/he should inform you on how large floods may compare to small floods, by reasoning on the driving processes. In the end, it is the shape of the flood frequency distribution that's hard to get with local data, not the location. Your ranking seems to me tailored for getting the order of magnitude right, but not how surprising can large extreme events be compared to the more frequent ones. I know this cannot be done now but I would have asked the experts to guess the ratios between the 10-yr event and the mean event, and between the 100-yr event and 10-yr event, and so on, in order to get their perception on the shape of the distribution. Maybe you could discuss the idea in the discussion section, if you see that fit.
line 151: "a training exercise"
line 154: are the 26 questions made available somewhere?
line 173: the weakly informed prior in Appendix A is very peculiar to me. I imagine very strange parameter combinations, very far from what could be expected for floods, are given the same weight than more reasonable ones, and some reasonable ones are excluded because of the bound at 10000. Why not the usual priors for the GEV distribution when dealing with floods, i.e., unbounded uniform for location and for the log of the scale and the Martins and Stedinger (2000, doi:10.1029/1999WR900330) geophysical prior, or similar ones, for the shape parameter?
lines 185-195: here the expert information is accounted for as data (part of the likelihood). Why not accounting for it as prior information? That would be the natural way to do it: since the experts give their estimates without using discharge data, this can be considered prior information. For getting the prior distribution of the parameters from the prior assessment of the quantiles, one could use the procedure described in Renard et al. (2006, doi:10.1007/s00477-006-0047-4), for example. This would avoid the subjective choice of weights presented in lines 196-205, which actually control the fit of the tail of the flood frequency curves. Also, this would provide a more defendable prior than the one discussed in Appendix A.
line 196: log-likelihoods are summed
line 206: please indicate in which equation (and with what symbol) the "factor between the tributaries’ sum and the downstream discharge" has been introduced. Is it the one in Eq. (1)? And what are the observations to which a log-normal distribution is fitted? I am confused here.
Section 3.4: I am sorry but I don't understand the procedure at all. I wish I could suggest how to improve points 1 and 2, but I can't figure out what they do mean.
Lines 269-278: here it seems evident to me that the objective assigned to the expert is to guess a reasonable mean annual peak discharge, in m^3/s, but not so much the shape of the growth curve. Afterwards, the Cook's method values the experts in how well they get the order of magnitude of flood discharges right, more than the shape of the distribution. Is this what we need to inform our analysis about how extreme can large floods be?
Line 290: "not too steep"
Figure 5: if I have understood well, the points in the third column should all be grey because discharges at Borgharen are not used in the fit. Am I right?
Line 308: I don't get what the following sentence means: "Sampling from these wide uncertainty bounds will therefore (too) often result in a high discharge event".
Figure 6bc: it seems peculiar that combining the pieces of information that individually result in the blue and yellow distributions leads to the red one (e.g., the red mode is lower than the blue and yellow ones). Can you comment on that?
line 323: why are the median values considered best estimates?
line 330: I don't understand the sentence.
line 340: but the experts knew about the 2021 event when doing the exercise and this has biased their estimates, I guess. How would have their estimates been different before 2021? That's hard to tell.
line 350: the following sentence doesn't mean anything to me: "were combined ... in ranges that are commonly 'in sample'".
line 360: since the tails of the distributions are controlled by the expert opinions, it seems to me obvious that they "seem credible". Couldn't they be compared to the outcomes of a more classical regional flood frequency analysis?

Citation: https://doi.org/10.5194/egusphere-2023-39-RC1
- AC1: 'Reply on RC1', Guus Rongen, 22 Apr 2023
  
  The authors would like to thank the reviewer for the detailed and constructive review. The reviewer mentions valid points. We (the authors) have added a response to each of them. Please find the detailed response in the attached file.
  
  Citation: https://doi.org/10.5194/egusphere-2023-39-AC1
RC2:
'Comment on egusphere-2023-39', Anonymous Referee #2, 15 Apr 2023

This study investigates through the Cooke's method how scientific judgments by experts can assist flood-risk managers. In my opinion, there are several issues that need to be addressed so that the applied methods, justification, and results, can be clearer and of practical use to other case studies. Please see several such comments and suggestions below:
1) The main point raised by the authors for someone to use the suggested method is that "...existing statistical and hydrological models that estimate these discharges often lack transparency regarding the uncertainty of their predictions..."; however, please note that the purpose of the probabilistic analysis is exactly this one (i.e., to estimate and take into consideration the uncertainty and variability of predictions of the input and output parameters of a flood model; see for example, a review, applications, and discussion on the uncertainty of flood parameters through benchmark examples in Dimitriadis et al., 2016). I would suggest not comparing with such methods (which are plenty in the literature), but focusing on the advantages and limitations of the proposed method.
Dimitriadis, P., A. Tegos, A. Oikonomou, V. Pagana, A. Koukouvinos, N. Mamassis, D. Koutsoyiannis, and A. Efstratiadis, Comparative evaluation of 1D and quasi-2D hydraulic models based on benchmark and real-world applications for uncertainty assessment in flood mapping, Journal of Hydrology, 534, 478–492, doi:10.1016/j.jhydrol.2016.01.020, 2016.
2) The fact that "...the devastating flood event that occurred in July 2021... was not captured by the existing model for estimating design discharges.", is not for the statistical methods to blame (or replace), but a more appropriate analysis by experts should have been performed. For example, there is an application shown in Figure 10 (in Dimitriadis et al., 2016), where there was a certain flooded area that could not be captured by a 1D model (due to the 1D nature of the model that cannot account for a 180 degrees turn of the water, since only 1 direction is possible within a cross-section), whereas this area can be captured if a 2D (or quasi-2D) model is applied. However, only an expert in flood modeling could identify this (e.g., the authors state that "The study demonstrates that utilizing hydrological experts in this manner can provide plausible results with a relatively limited effort, even in situations where measurements are scarce or unavailable."). If this is what the authors are trying to highlight in this work (i.e., that the flood models should not be blindly applied by non-experts), then this is a strong and important statement, which however needs to be further discussed.
3) Please consider rephrasing the sentence "Quantifying events that are more extreme than ever measured (i.e., with return levels that are longer than the time period of representative measurements), requires extrapolating from available data or knowledge.", since it is not exactly true. The return period T corresponds to a probability of occurrence (i.e., an event that is expected to occur, on average, once every T years) and not to a deterministic occurrence that involves any kind of extrapolation or specific (i.e., 5th, 95th, etc.) quantiles (please see the mathematical definitions and methods for extreme analysis and probability fitting in a recent work by Koutsoyiannis, 2022).
Koutsoyiannis, D., Replacing histogram with smooth empirical probability density function estimated by K-moments, Sci, 4 (4), 50, doi:10.3390/sci4040050, 2022.
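The reviewer's distinction can be made concrete with a small sketch. Under the usual, here assumed, conditions of independent and stationary annual maxima, a T-year event has an annual exceedance probability of 1/T, so it may occur several times, or not at all, within any window of T years. The function names below are illustrative only.

```python
def annual_exceedance_probability(return_period: float) -> float:
    """Annual probability that the T-year level is equalled or exceeded."""
    return 1.0 / return_period


def prob_at_least_one_exceedance(return_period: float, years: int) -> float:
    """Probability of at least one exceedance of the T-year level within `years` years.

    Assumes annual maxima are independent and identically distributed (stationary).
    """
    p = annual_exceedance_probability(return_period)
    return 1.0 - (1.0 - p) ** years


# A 100-year flood has roughly a 63% chance of occurring at least once in 100 years,
# and a 1,000-year flood has about a 5% chance of occurring within 50 years.
print(round(prob_at_least_one_exceedance(100, 100), 3))
print(round(prob_at_least_one_exceedance(1000, 50), 3))
```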
4) The application of Cooke's method in this study is not very clear to me. For example, the authors state that "A simple statistical model was developed for the river basin, consisting of correlated GEV-distributions for discharges in upstream sub-catchments. The model was fitted to expert judgments, measurements, and the combination of both, using Markov chain Monte Carlo. Results from the model fitted only to measurements were accurate for more frequent events, but less certain for extreme events."; since they were all experts and applied the same model, why did they come up with different results? Did they use different methods, and if so, what are these methods? On what did the experts base their replies: did they also perform simulations, or only probabilistic fitting?
5) In my opinion, it is not very appropriate to apply a Monte Carlo method with so few samples; please consider including more samples. Also, regarding the statement "The combined approach provided the most plausible results, with Cooke’s method reducing the uncertainty by appointing most weight to two of the seven experts.": why did the authors select these two scientists? Were they more expert than the other scientists?
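Cooke's classical model, referred to in this comment, weights experts by their performance on seed questions. A minimal sketch of its calibration component follows, assuming 5th/50th/95th percentile elicitation as described later in this discussion; the full classical model also multiplies this by an information score and may apply a significance cutoff, both omitted here, and all function names are hypothetical. An expert whose realizations fall into the four inter-quantile bins in roughly the expected proportions (5%, 45%, 45%, 5%) scores close to 1, while an overconfident expert scores close to 0, which is how most of the weight can end up with a few experts.

```python
import math

# Theoretical probability mass of the four inter-quantile bins defined by the
# 5th, 50th and 95th percentiles: below 5%, 5-50%, 50-95%, above 95%.
BIN_PROBS = (0.05, 0.45, 0.45, 0.05)


def bin_index(realization: float, q05: float, q50: float, q95: float) -> int:
    """Return the inter-quantile bin (0-3) into which a realization falls."""
    if realization < q05:
        return 0
    if realization < q50:
        return 1
    if realization < q95:
        return 2
    return 3


def chi2_sf_3dof(x: float) -> float:
    """Survival function of a chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def calibration_score(estimates: list[tuple[float, float, float]], realizations: list[float]) -> float:
    """Calibration score of one expert (classical model, calibration part only).

    estimates: one (q05, q50, q95) tuple per seed question.
    realizations: observed values of the seed variables.
    """
    n = len(realizations)
    counts = [0, 0, 0, 0]
    for (q05, q50, q95), r in zip(estimates, realizations):
        counts[bin_index(r, q05, q50, q95)] += 1
    # Relative information of the empirical bin frequencies with respect to
    # BIN_PROBS; empty bins contribute nothing (0 * log 0 = 0).
    info = sum((c / n) * math.log((c / n) / p) for c, p in zip(counts, BIN_PROBS) if c > 0)
    # 2 * n * info is asymptotically chi-square with 3 degrees of freedom.
    return chi2_sf_3dof(2.0 * n * info)
```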
6) More details are required to back up the statement "The discharge at the Dutch border exceeded the flood events of 1926, 1993, and 1995. Contrary to those events, this flood occurred during summer, a season that is (or was) often considered irrelevant for extreme discharges on the Meuse."; please perform a proper statistical analysis and identify for each season the appropriate probability distribution to show at what discharge the probability of occurrence in the summer season exceeds the selected return period.
7) Regarding the comments "The event was thus surprising in multiple ways. This might happen when we experience a new extreme, but given that Dutch flood risk has safety standards up to once per 100,000 years (Ministry of Infrastructure and Environment, 2016) one would have hoped this to be less of a surprise." and "While most studies aimed at obtaining better estimates of discharge extremes use hydrological or statistical modeling, some follow the approach of using expert judgment (EJ).", please note that this is an essential point in every scientific application, since when non-experts apply methods they do not understand, it can lead to failure regardless of the magnitude of the selected return period.
8) It is mentioned that "For the Dutch rivers Meuse and Rhine, the GRADE instrument is used for this. It generates 50,000 years of rainfall and discharges."; please give more details on this model and how it generates such long rainfall and discharge time series (does it use a stochastic simulation approach for the annual rainfall extremes and feed these into a hydraulic model to produce the discharge at a specific location in the area of interest?).
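The comment asks how a 50,000-year record is generated and used. GRADE itself is not reproduced here. As a loosely analogous, hypothetical sketch, the code below draws 50,000 synthetic annual maxima from an assumed Gumbel distribution (the parameters are invented for illustration, not Meuse values) and reads return levels directly off the ranked series, which is the step a long simulated record makes possible without extrapolating a fitted distribution beyond the record length.

```python
import math
import random


def sample_gumbel(rng: random.Random, mu: float, sigma: float) -> float:
    """Draw one Gumbel-distributed annual maximum by inverse-transform sampling."""
    u = rng.random()  # uniform on [0, 1)
    while u == 0.0:  # exclude 0 so that log(u) is defined
        u = rng.random()
    return mu - sigma * math.log(-math.log(u))


def gumbel_return_level(mu: float, sigma: float, return_period: float) -> float:
    """Analytical T-year level of the Gumbel distribution, for comparison."""
    return mu - sigma * math.log(-math.log(1.0 - 1.0 / return_period))


def empirical_return_level(annual_maxima: list[float], return_period: float) -> float:
    """T-year level read from the ranked series with the Weibull plotting position i/(n+1)."""
    ranked = sorted(annual_maxima, reverse=True)
    rank = round((len(ranked) + 1) / return_period)
    if rank < 1:
        raise ValueError("return period exceeds what the record length can resolve")
    return ranked[rank - 1]


# Hypothetical stand-in for a long simulated record (not GRADE output).
rng = random.Random(42)
maxima = [sample_gumbel(rng, 1000.0, 300.0) for _ in range(50_000)]
for T in (10, 100, 1000):
    print(T, round(empirical_return_level(maxima, T)), round(gumbel_return_level(1000.0, 300.0, T)))
```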

Citation: https://doi.org/10.5194/egusphere-2023-39-RC2
- AC2: 'Reply on RC2', Guus Rongen, 22 Apr 2023
  
  The authors would like to thank the reviewer for reviewing the article. We have written a response to the reviewer’s comments below. Please find it in the document attached.
  
  Citation: https://doi.org/10.5194/egusphere-2023-39-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Reconsider after major revisions (further review by editor and referees) (25 May 2023) by Daniel Viviroli

AR by Guus Rongen on behalf of the Authors (06 Jul 2023) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (03 Aug 2023) by Daniel Viviroli

RR by Anonymous Referee #2 (03 Sep 2023)

Suggestions for revision or reasons for rejection

The Authors have addressed all comments of the previous review; please see my replies to initiate a scientific discussion with the aim of improving the manuscript:

1) I understand the Authors' reply, but I still cannot comprehend what exactly is defined as "an expert's judgment". The traditional approach of an expert's judgment can be based only on models/methods. For example, the Authors mention that "Expert judgment (EJ), in terms of making estimates or verifying observations based on prior knowledge, is often unknowingly applied in everyday practice by researchers and practitioners"; however, in order to make an estimate one requires a model/method and historical observations for fitting/calibration, verification, and validation. Also, what do the Authors mean by "unknowingly"? An expert should know exactly what (s)he is doing and what the impacts of the applied assumptions are. Moreover, I do not entirely agree with this statement and the Authors' approach for a different reason: even if the same expert applied different (equally justified) models to the same area (i.e., same case study, initial/boundary conditions, same input, etc.), it is certainly expected that the output would be different but not wrong (this is illustrated in the Dimitriadis et al., 2016 study). I think that this procedure is equivalent to one where multiple (equally qualified) experts used different models/methods in the case study (which, as I understand it, is what the Authors illustrate in their study). However, the fact that a model/expert is closer to the observations does not necessarily mean that this model/expert is better and should be assigned a larger weight coefficient; rather (assuming that all models/experts are equally justified/qualified), it means that there is an intrinsic uncertainty enclosed in different models/experts, which we should take into account in our flood risk management strategy rather than trying to narrow it down.
The danger here is that in a future event, and since all models/experts are equally justified/qualified, the model/expert that performed worst in the previous case study could now be closer to the true observation, and it would therefore have been wrong to assign it a smaller weight coefficient. This would happen because, beyond a certain limit, the uncertainty is intrinsic and can no longer be removed/narrowed but only quantified, modelled, and considered in the management strategy. Please note that this is different from applying a wrong model/assumption in a case study (as explained in my example in the previous review). If the Authors are certain that all scientists are experts, then I would recommend just quantifying the variability of their judgment/output (i.e., treating them as different 'models', equally correct and justified, as performed in the study by Dimitriadis et al., 2016), and assigning an equal weight coefficient.

2) Regarding the reply to the 2nd comment, please present clearly which method/model/observations etc. each expert used to derive his/her results, since "the ability to use one’s experience to verify observations." is not a clear definition of an "expert judgment"; for example, what do you mean by "ability"? The only reason I can think of for one expert arriving at a different output is that (s)he used different inputs, initial/boundary conditions, and/or methods in their reasoning (as explained in the previous comment). Specifically for the extreme analysis, if, by applying a method/model, the results consistently deviate from observations, then this would mean that the method/model is wrong, should be re-examined, and should not be taken into account in the risk management assessment through Cooke's method (the recent book by Koutsoyiannis, 2022, contains plenty of examples of how one can severely underestimate the extremes if the assumptions are not correct, such as ignoring dependence, using less robust or even invalid statistical estimators, or applying less accurate statistical distributions).

3) I agree with the Authors' reply.

4) But what are these components the Authors refer to in their reply and in the manuscript? This is important so that the Readers are able to criticize the experts' methods/models.

5) But what if more experts join this project? More importantly, what if an expert's good judgment (i.e., closer to the true observation) was achieved by accident, and his/her assumptions no longer work for a future event where the conditions have changed?

6) No there is no need, I trust the Authors' judgment.

7) Agreed.

8) I understand the Authors' reply and I am aware of the GRADE model. However, please understand that it is difficult to trust unpublished material, especially when hundreds of scientists struggle to find better and more accurate mathematical models to generate long-range rainfall and discharge time series. Also, it is clear from the results that GRADE performed as well as (if not better than) the experts' methods/models, and therefore, experts should base their judgment on this model to improve their own judgment.


RR by Anonymous Referee #1 (13 Oct 2023)

Suggestions for revision or reasons for rejection

I must say the Authors have considered most of my comments (and the other Reviewer's too) and have improved the manuscript.

However, I am still skeptical about the fact that the ability to guess the magnitude of small floods implies the ability to guess the magnitude of large ones. I know that the experiment cannot be changed now but I am not satisfied by the discussion of the alternative. The Authors motivate their choice in Section 5.1 by saying that the 10-yr flood is a better target than model parameters because it is "observed". I would say that, as a quantile of the model/distribution, it is not observed, it is a model characteristic such as the moments or parameters. I find the "defense" of the choice in Section 5.1 rather weak. I would suggest indicating that an alternative choice could have been taken and, for example, could be tested in future work. The alternative choice (e.g., Renard et al., 2006, doi:10.1007/s00477-006-0047-4) is possible and, I would say, preferable. I strongly suggest that the Authors read Renard et al. (2006), as suggested in my first review, and discuss that alternative method in this paper.

Regarding the Bayesian method, the Authors have made two major changes, i.e., using a reasonable prior for the GEV shape parameter, and removing the ad hoc "weighting" procedure used in the first version of the paper. This is good. However, the language used should be corrected. I've never heard of "prior likelihood" or "posterior likelihood" in Bayesian statistics. I don't think the wording exists; please use proper wording (see e.g., https://en.wikipedia.org/wiki/Bayesian_inference#Formal_description_of_Bayesian_inference or any other reference on Bayesian basics). Besides, since the equation on page 19 of the track-change document (which has no equation or line numbers) differs from Eq. (6) in the original paper (the 10/N_i term is no longer there), how is it that the results do not vary significantly? I would have liked an explanation in the reply to the reviewers (not in the new manuscript, of course). Also, since the expert judgment is now treated as a prior, as the Authors claim, the equation on page 19 of the track-change document should express it in terms of the model parameters, and therefore I would have expected a Jacobian in front of g(F^-1(1-p|theta)) (see e.g. http://mystatisticsblog.blogspot.com/2018/04/jacobian-transformation-and-uniform.html).
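The Jacobian point can be checked numerically. A minimal sketch, assuming a Gumbel model with a fixed location mu and a single free scale parameter sigma (all numbers hypothetical): if an expert density g is placed on the 10-year quantile x_10(sigma) = mu + sigma * y_10, with y_10 = -ln(-ln 0.9), then the implied prior on sigma is g(x_10(sigma)) * |dx_10/dsigma| = g(x_10(sigma)) * y_10. Without that factor the "prior" no longer integrates to one (here it integrates to about 1/y_10). In this one-parameter Gumbel case the Jacobian is constant and would cancel on normalization; for the GEV shape parameter, or a joint transformation of several parameters, it generally depends on the parameters and changes the shape of the prior.

```python
import math

# Gumbel reduced variate for the 10-year level: x_10 = mu + sigma * Y10.
Y10 = -math.log(-math.log(1.0 - 1.0 / 10.0))


def lognormal_pdf(x: float, median: float, s: float) -> float:
    """Hypothetical expert density g for the 10-year discharge (log-normal)."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - math.log(median)) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))


def prior_on_sigma(sigma: float, mu: float, median: float, s: float, with_jacobian: bool) -> float:
    """Prior density on sigma implied by a density on x_10 = mu + sigma * Y10."""
    x10 = mu + sigma * Y10
    jacobian = Y10 if with_jacobian else 1.0  # |d x_10 / d sigma| = Y10
    return lognormal_pdf(x10, median, s) * jacobian


def trapezoid(f, a: float, b: float, n: int = 20_000) -> float:
    """Composite trapezoidal rule on [a, b] with n intervals."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


MU, MEDIAN, S = 500.0, 1500.0, 0.2  # invented values, m3/s
mass_with = trapezoid(lambda sg: prior_on_sigma(sg, MU, MEDIAN, S, True), 0.0, 2000.0)
mass_without = trapezoid(lambda sg: prior_on_sigma(sg, MU, MEDIAN, S, False), 0.0, 2000.0)
print(round(mass_with, 4), round(mass_without, 4), round(1.0 / Y10, 4))
```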


ED: Reconsider after major revisions (further review by editor and referees) (03 Nov 2023) by Daniel Viviroli

AR by Guus Rongen on behalf of the Authors (14 Dec 2023) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (14 Dec 2023) by Daniel Viviroli

RR by Anonymous Referee #2 (08 Feb 2024)

Suggestions for revision or reasons for rejection

1) Regarding the authors' reply to the 1st comment of the previous review: if the experts perfectly estimate one of the three (i.e., 5%, 50%, 95%) quantiles of the 10-year discharge for each tributary, then, in my opinion, the actual values of the percentiles should be shown and compared, rather than what is shown in Fig. 4, which is confusing. First of all, for which percentile (5%, 50%, or 95%) are the estimates shown in Fig. 4, entitled "Seed question realizations compared to each expert’s estimates"? Does this figure show the uncertainty/distribution (of the 5%, 50%, or 95% percentile) as constructed from 70 values (7 experts times 10 estimates per tributary)? If so, is it correct to construct the distribution (of the 5%, 50%, or 95% percentile) from all the estimates when some seem to be completely off (i.e., with the exception of expert D and maybe E, the remaining experts seem to have estimates with a low probability of occurrence under the constructed distribution)?

2) Regarding "Because the goal is to elicit uncertainty, experts estimate percentiles rather than a single value. Typically, these are the 5th, 50th, and 95th percentile.", why not have asked them to also estimate the mean and variance, which are very useful (for example, why calculate the 10-year estimate from these 3 quantiles and through a distribution fitting rather than ask the experts to give at least the mean of their estimates)?

3) Regarding the use of the Metalog distribution, the authors state that "This distribution is capable of exactly fitting any three percentile estimate.", but many flexible 3-parameter distributions can be fitted to 3 estimates. Similarly for the ratio, to which the log-normal distribution is fitted, I think that these distributions should be used with caution (for example in expressions like "as it is unlikely that the 1,000-year discharge is lower than the highest on record"), since they may lead readers to think that these are the actual distributions estimated in this study for the percentiles and discharge ratios, whereas only a few data are used for the fitting and thus they do not capture other attributes of the distributions (e.g., their tails; for example, it has been shown that streamflows follow a heavy-tailed distribution, and thus the discharge ratio should have a similar tail, definitely heavier than the log-normal one).
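To illustrate comments 2) and 3), here is a minimal sketch of the three-term metalog quantile function, Q(y) = a1 + a2*logit(y) + a3*(y - 0.5)*logit(y), fitted exactly through a 5th/50th/95th percentile triplet (the numbers are hypothetical, and this is not necessarily the implementation used in the paper). Once the distribution is fixed, the mean follows from the same three numbers (for this form, a1 + a3/2), but so does the whole tail, which is determined by the functional form rather than by the three elicited values, as the reviewer cautions. The feasibility check that the fitted quantile function is increasing is omitted.

```python
import math

LOGIT_95 = math.log(0.95 / 0.05)  # logit(0.95) = ln 19; logit(0.05) = -ln 19


def fit_metalog3(q05: float, q50: float, q95: float) -> tuple[float, float, float]:
    """Coefficients of the 3-term metalog passing exactly through the three percentiles."""
    a1 = q50                                         # logit(0.5) = 0, so Q(0.5) = a1
    a2 = (q95 - q05) / (2.0 * LOGIT_95)              # from Q(0.95) - Q(0.05)
    a3 = (q95 + q05 - 2.0 * q50) / (0.9 * LOGIT_95)  # from Q(0.95) + Q(0.05)
    return a1, a2, a3


def metalog3_quantile(y: float, a: tuple[float, float, float]) -> float:
    """Q(y) = a1 + a2*logit(y) + a3*(y - 0.5)*logit(y), for 0 < y < 1."""
    a1, a2, a3 = a
    logit = math.log(y / (1.0 - y))
    return a1 + a2 * logit + a3 * (y - 0.5) * logit


def metalog3_mean(a: tuple[float, float, float]) -> float:
    """Mean of the 3-term metalog, a1 + a3/2 (the logit term has zero mean)."""
    return a[0] + a[2] / 2.0


# Hypothetical right-skewed estimate of a 10-year discharge (m3/s).
a = fit_metalog3(800.0, 1200.0, 2000.0)
print([round(metalog3_quantile(y, a)) for y in (0.05, 0.5, 0.95)], round(metalog3_mean(a)))
# Tail quantiles are set by the functional form, not by the elicited values.
print(round(metalog3_quantile(0.999, a)))
```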

4) I am also concerned about the assumption “An implicit assumption is that the experts’ ability to estimate the seed variables (a 10-year discharge) reflects their ability to estimate the target variables (a 1000-year discharge).". The 10-year discharge is a not-so-extreme value, while the 1000-year discharge is considered extreme. It has been shown that streamflows follow a heavy-tail distribution (see, if found useful, the largest performed global analysis in Fig. 11 of https://www.mdpi.com/2306-5338/8/2/59, where streamflow is shown to be almost as heavy-tailed as precipitation, which is known to follow Pareto-tail as indicated and extensively discussed in https://www.itia.ntua.gr/en/docinfo/2000/), and so, an expert may have a rainfall-runoff model that is good only in estimating regular discharges rather than extreme ones (or the other way around) that require a separate rainfall-extreme analysis (since the 1000-year rainfall cannot be easily estimated from the observations). I would recommend reflecting on this issue in the Abstract, Conclusions, and maybe even the Title.
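The concern in comment 4) can be illustrated with the GEV return level formula, x_T = mu + sigma/xi * [(-ln(1 - 1/T))^(-xi) - 1], with the Gumbel form as the limit xi -> 0. In the hypothetical sketch below (mu = 1000 and sigma = 300 m3/s are invented values), the ratio between the 1,000-year and 10-year levels grows with the shape parameter, so skill at the 10-year level does not by itself constrain the 1,000-year level. The sign convention here has xi > 0 for heavy (Frechet-type) tails; note that scipy.stats.genextreme uses the opposite sign for its shape parameter c.

```python
import math


def gev_return_level(mu: float, sigma: float, xi: float, return_period: float) -> float:
    """T-year level of the GEV; xi > 0 gives a heavy (Frechet-type) upper tail."""
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)  # Gumbel limit as xi -> 0
    return mu + sigma / xi * (y ** (-xi) - 1.0)


MU, SIGMA = 1000.0, 300.0  # invented location and scale, m3/s
for xi in (0.0, 0.1, 0.15, 0.2):
    q10 = gev_return_level(MU, SIGMA, xi, 10)
    q1000 = gev_return_level(MU, SIGMA, xi, 1000)
    print(f"xi={xi:.2f}  Q10={q10:6.0f}  Q1000={q1000:6.0f}  Q1000/Q10={q1000 / q10:.2f}")
```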

5) Regarding the "However, an informative prior was added to the shape parameter because, with only expert estimates and no data, two discharge estimates are not sufficient for fitting the three parameters of the GEV-distribution. Additionally, the variance in the shape-parameter decreases with increasing number of years (or other block maxima) in a time series. The 30 to 70 annual maxima per tributary in this study are not sufficient to reach convergence.". These are all discussed and analyzed in Koutsoyiannis 2004 (a,b), where it is suggested (Fig. 5-6 in 2004a and Fig. 10-11 in 2004b) that small sizes of records, e.g. 20–50 years hide the distribution's EV2 shape parameter around 0.15 + 0.05 (e.g., in Fig. 13 of 2004b, as estimated from only the largest-lengthed precipitation records above 100 years).

D. Koutsoyiannis, Statistics of extremes and estimation of extreme rainfall, 1, Theoretical investigation, Hydrological Sciences Journal, 49 (4), 575–590, doi:10.1623/hysj.49.4.575.54430, 2004a.
D. Koutsoyiannis, Statistics of extremes and estimation of extreme rainfall, 2, Empirical investigation of long rainfall records, Hydrological Sciences Journal, 49 (4), 591–610, doi:10.1623/hysj.49.4.591.54424, 2004b.

6) It is mentioned that "When estimates on uncertain extremes is needed, which cannot satisfactorily be derived (exclusively) from a (limited) data-record, the presented approach provides a means of supplementing this information. Structured expert judgment provides an approach of deriving defensible priors, while the Bayesian framework offers flexibility for incorporating these into probabilistic results by adjusting the likelihood of input or output parameters.". However, when estimates of extremes are needed, one requires the best statistical approaches in the literature (if direct streamflow records are available) or equivalently robust rainfall-runoff models (if only rainfall records are available) that can capture several hydrodynamic aspects of the selected area (as explained in my previous reviews). From either approach, one can then estimate the uncertainty of the results. This is not equivalent to (and should not be confused with) some experts using (different or even the same) statistical approaches or models in a robust (or maybe incorrect) manner.
Additionally, I would follow a more traditional approach and see which of the experts seem to achieve (in general or for each tributary) better performance in their predictions (which would mean that they have a better understanding of the area and of their applied models/methods), and I would follow their suggestions and not those of the experts who did not perform well.
I respect the authors' work and I would appreciate their reply to this, which is at the core of their paper.

7) In Figure 5, please indicate the observed/fitted 50th percentile of the 10-year and the 1000-year (through fitting model) discharges to compare with the experts' estimates.


ED: Reconsider after major revisions (further review by editor and referees) (14 Feb 2024) by Daniel Viviroli

AR by Guus Rongen on behalf of the Authors (22 Mar 2024) Author's response Manuscript

EF by Sarah Buchmann (26 Mar 2024) Author's tracked changes Supplement

ED: Referee Nomination & Report Request started (27 Mar 2024) by Daniel Viviroli

RR by Anonymous Referee #2 (24 Apr 2024)

ED: Publish as is (24 Apr 2024) by Daniel Viviroli

AR by Guus Rongen on behalf of the Authors (03 May 2024) Manuscript

Journal article(s) based on this preprint

03 Jul 2024

Using the classical model for structured expert judgment to estimate extremes: a case study of discharges in the Meuse River

Guus Rongen, Oswaldo Morales-Nápoles, and Matthijs Kok

Hydrol. Earth Syst. Sci., 28, 2831–2848, https://doi.org/10.5194/hess-28-2831-2024, 2024


Supplement

https://doi.org/10.5194/egusphere-2023-39-supplement


Viewed

Total article views: 778 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
470	267	41	778	76	30	29


Views and downloads (calculated since 07 Mar 2023)

Month	HTML	PDF	XML	Total
Mar 2023	186	58	9	253
Apr 2023	80	21	9	110
May 2023	22	5	0	27
Jun 2023	11	3	2	16
Jul 2023	17	14	2	33
Aug 2023	17	5	0	22
Sep 2023	18	7	3	28
Oct 2023	13	8	3	24
Nov 2023	2	1	3
Dec 2023	6	10	1	17
Jan 2024	15	15	1	31
Feb 2024	15	67	1	83
Mar 2024	15	24	1	40
Apr 2024	12	6	2	20
May 2024	9	7	2	18
Jun 2024	31	17	4	52
Jul 2024	1	0	1


Viewed (geographical distribution)

Total article views: 768 (including HTML, PDF, and XML) Thereof 768 with geography defined and 0 with unknown origin.


Latest update: 03 Jul 2024


The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.


Short summary

This study proposes a new method for predicting extreme flood levels in rivers like the Meuse. The current method has been shown to be unreliable, as it did not predict a recent flood. We have developed a model that includes information from experts and combines this with measurements. We found that this approach gives more accurate predictions, particularly for extreme events. The research is important for predictions of extreme flood levels that are necessary for protecting communities against floods.

