the Creative Commons Attribution 4.0 License.
Using structured expert judgment to estimate extreme river discharges: a case study of the Meuse River
Abstract. Accurate estimation of extreme discharges in rivers, such as the Meuse, is crucial for effective flood risk assessment. However, existing statistical and hydrological models that estimate these discharges often lack transparency regarding the uncertainty of their predictions, as evidenced by the devastating flood event that occurred in July 2021 which was not captured by the existing model for estimating design discharges. This article proposes an alternative approach with a central role for expert judgment, using Cooke’s method. A simple statistical model was developed for the river basin, consisting of correlated GEV distributions for discharges in upstream subcatchments. The model was fitted to expert judgments, measurements, and the combination of both, using Markov chain Monte Carlo. Results from the model fitted only to measurements were accurate for more frequent events, but less certain for extreme events. Using expert judgment reduced uncertainty for these extremes but was less accurate for more frequent events. The combined approach provided the most plausible results, with Cooke's method reducing the uncertainty by appointing most weight to two of the seven experts. The study demonstrates that utilizing hydrological experts in this manner can provide plausible results with a relatively limited effort, even in situations where measurements are scarce or unavailable.
Status: final response (author comments only)

RC1: 'Referee Comment on egusphere-2023-39', Anonymous Referee #1, 29 Mar 2023
The paper provides the result of an interesting experiment in which flood experts are asked to guess the flood frequency curves for several sites in a region without access to discharge data, and the information is then used together with observed maximum annual flood peaks to improve the "credibility" of the estimated tails of the distributions. A procedure is developed to use expert opinions on several tributaries and transform them into an estimate for a downstream gauge.
The paper is original, as far as I can tell, and deals with an important issue in flood risk assessment, i.e., the formal use of expert opinion in flood frequency analysis. Even though I liked reading the paper, there are some parts that, in my opinion, need to be improved, clarified, better explained or discussed before publication. My main concerns are:
1) I am not sure that the ability of the expert in providing a judgement on the flood frequency curve can be measured by her/his ability in guessing the 10-yr flood in absolute terms. If one wants the expert to help in reducing uncertainty in the tails of the distribution, she/he should inform us on how large floods may compare to small floods, by reasoning on the driving processes. In the end, it is the shape of the flood frequency distribution that's hard to get with local data, not the location. The proposed method seems to be tailored for getting the order of magnitude right, i.e., the flood magnitude in m^3/s, but not how surprising large extreme events can be compared to the more frequent ones.
2) the expert information is accounted for as data (part of the likelihood) using an ad hoc procedure, which seems to me inconsistent with the Bayesian way. Why not account for expert judgement as prior information? That would be the natural Bayesian way to do it: since the experts give their estimates without using discharge data, this can be considered as prior information.
3) given the procedure proposed, the tail of the distribution is controlled by expert judgement with a strength that is related to the subjective choice of the weight given to the expert "data" compared to the observed data. The result of the procedure is then assessed as credible/reasonable, but how could it not be so? From what I've understood, the procedure seems to allow a way to tweak subjectively the shape of the flood frequency distribution.
4) the results are not assessed against a benchmark. Why not use regional flood frequency analysis as a benchmark?
5) some of the methodological steps are unclear and should be properly explained (see the detailed comments below).
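As an illustration of concern 1, the sketch below evaluates GEV quantiles and shows that the ratio of quantile differences (Q100 - Q10)/(Q10 - Q2) depends only on the shape parameter, whereas an absolute quantile such as the 10-yr flood moves with location and scale. The parameter values, the function names, and the sign convention (positive shape means a heavy upper tail) are assumptions chosen for illustration; none of them is taken from the manuscript.

```python
import math

def gev_quantile(T, loc, scale, shape):
    """Discharge with return period T years under a GEV for annual maxima.

    Assumed convention: shape > 0 gives a heavy upper tail; shape == 0 is Gumbel.
    """
    y = -math.log(1.0 - 1.0 / T)  # reduced variate for non-exceedance prob 1 - 1/T
    if abs(shape) < 1e-12:
        return loc - scale * math.log(y)
    return loc + scale / shape * (y ** (-shape) - 1.0)

def tail_ratio(loc, scale, shape):
    """(Q100 - Q10) / (Q10 - Q2): a measure of tail steepness set by the shape alone."""
    q2, q10, q100 = (gev_quantile(T, loc, scale, shape) for T in (2, 10, 100))
    return (q100 - q10) / (q10 - q2)

# Hypothetical parameter sets (m^3/s): same shape, very different location and scale.
site_a = dict(loc=1500.0, scale=400.0, shape=0.1)
site_b = dict(loc=300.0, scale=80.0, shape=0.1)
# Same location and scale as site_a, but a heavier tail.
site_c = dict(loc=1500.0, scale=400.0, shape=0.3)

for name, p in (("a", site_a), ("b", site_b), ("c", site_c)):
    # Print the 10-yr quantile and the shape-only tail ratio for each set.
    print(name, round(gev_quantile(10, **p)), round(tail_ratio(**p), 3))
```

Sets a and b differ strongly in their 10-yr quantile yet share the same tail ratio, while set c has a larger ratio. Scoring experts on an absolute quantile therefore tests something different from scoring them on such ratios.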
Detailed comments:
line 8: MCMC is just a tool. I would say here that you use Bayesian inference.
line 17: the 2021 peak at Borgharen is the highest but does not seem surprisingly high, looking at Figure 5. I think the same event has been much more surprising in other, smaller catchments. Even though it is surprising for the summer season, as I understand, your analysis later is not done accounting for seasonality. I would even expect that, if asked for the summer flood frequency curve, the experts would underestimate the probability of such an event.
line 30: here the text suggests that hydrological model simulations outperform statistical methods in flood frequency analysis. Has this been demonstrated in the literature? As far as I know, statistical models tailored for flood frequency analysis are more accurate than other methods both in gauged and ungauged basins (see Bloeschl et al., 2013, ISBN:9781107028180). Besides, despite some advantages, you clearly show limitations for the hydrological modelling approach in the discussion until line 45. Since the accurate estimate of the distribution tails is of interest, why don't you mention regional flood frequency analysis and inclusion of historical events as ways of increasing the robustness (and reducing the uncertainty) of the estimates? Besides, aren't design flows available from a regional frequency analysis in the area, e.g. to be used as a benchmark?
line 43: I don't get the factor 3 vs. 1.4 sentence. What is the "outcome"?
lines 65-68: spoiler alert! I would move this sentence after the results section.
line 79: I don't get the meaning of the sentence "The discharge estimates for this catchment are therefore only used for expert calibration, as the flow is part of the French Meuse flow".
line 85: I would add a table here in the main text summarizing the data provided to the experts.
line 107: not having some more details on the construction of the correlation matrices is a pity. It would have been wise to publish that paper first.
line 109: Each variable is modelled by a marginal distribution, it is not a distribution.
Section 3.2: I am not sure that the ability of the expert in providing a judgement on the flood frequency curve can be measured by her/his ability in guessing the 10-yr flood in absolute terms. If you want the expert to help in reducing uncertainty in the tails of the distribution, she/he should inform you on how large floods may compare to small floods, by reasoning on the driving processes. In the end, it is the shape of the flood frequency distribution that's hard to get with local data, not the location. Your ranking seems to me tailored for getting the order of magnitude right, but not how surprising large extreme events can be compared to the more frequent ones. I know this cannot be done now but I would have asked the experts to guess the ratios between the 10-yr event and the mean event, and between the 100-yr event and the 10-yr event, and so on, in order to get their perception of the shape of the distribution. Maybe you could discuss the idea in the discussion section, if you see fit.
line 151: "a training exercise"
line 154: are the 26 questions made available somewhere?
line 173: the weakly informed prior in Appendix A is very peculiar to me. I imagine very strange parameter combinations, very far from what could be expected for floods, are given the same weight as more reasonable ones, and some reasonable ones are excluded because of the bound at 10000. Why not the usual priors for the GEV distribution when dealing with floods, i.e., unbounded uniform for location and for the log of the scale and the Martins and Stedinger (2000, doi:10.1029/1999WR900330) geophysical prior, or similar ones, for the shape parameter?
lines 185-195: here the expert information is accounted for as data (part of the likelihood). Why not account for it as prior information? That would be the natural way to do it: since the experts give their estimates without using discharge data, this can be considered prior information. For getting the prior distribution of the parameters from the prior assessment of the quantiles, one could use the procedure described in Renard et al. (2006, doi:10.1007/s00477-006-0047-4), for example. This would avoid the subjective choice of weights presented in lines 196-205, which actually control the fit of the tail of the flood frequency curves. Also, this would provide a more defendable prior than the one discussed in Appendix A.
line 196: log-likelihoods are summed
line 206: please indicate in which equation (and with what symbol) the "factor between the tributaries’ sum and the downstream discharge" has been introduced. Is it the one in Eq. (1)? And what are the observations to which a lognormal distribution is fitted? I am confused here.
Section 3.4: I am sorry but I don't understand the procedure at all. I wish I could suggest how to improve points 1 and 2, but I can't figure out what they do mean.
Lines 269-278: here it seems evident to me that the objective assigned to the expert is to guess a reasonable mean annual peak discharge, in m^3/s, but not so much the shape of the growth curve. Afterwards, Cooke's method values the experts on how well they get the order of magnitude of flood discharges right, more than the shape of the distribution. Is this what we need to inform our analysis about how extreme large floods can be?
Line 290: "not too steep"
Figure 5: if I have understood well, the points in the third column should all be grey because discharges at Borgharen are not used in the fit. Am I right?
Line 308: I don't get what the following sentence means: "Sampling from these wide uncertainty bounds will therefore (too) often result in a high discharge event".
Figure 6b-c: it seems peculiar that combining the pieces of information that individually result in the blue and yellow distributions leads to the red one (e.g., the red mode is lower than the blue and yellow ones). Can you comment on that?
line 323: why are the median values considered best estimates?
line 330: I don't understand the sentence.
line 340: but the experts knew about the 2021 event when doing the exercise and this has biased their estimates, I guess. How would their estimates have differed before 2021? That's hard to tell.
line 350: the following sentence doesn't mean anything to me: "were combined ... in ranges that are commonly 'in sample'".
line 360: since the tails of the distributions are controlled by the expert opinions, it seems to me obvious that they "seem credible". Couldn't they be compared to the outcomes of a more classical regional flood frequency analysis?
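The priors suggested in the comment on line 173 and the summation of log-likelihoods noted for line 196 can be sketched as follows. This is a minimal illustration, not the manuscript's model: the function names and the example data are hypothetical, the shape parameter is assumed to follow the convention in which positive values give a heavy upper tail (the opposite sign to the kappa of Martins and Stedinger, 2000), and the geophysical prior is taken as a Beta(6, 9) density for kappa on [-0.5, 0.5].

```python
import math

def log_geophysical_prior(shape):
    """Log density of a Beta(6, 9) prior on kappa in [-0.5, 0.5], with kappa = -shape."""
    u = -shape + 0.5  # map kappa from [-0.5, 0.5] to [0, 1]
    if not 0.0 < u < 1.0:
        return -math.inf
    log_norm = math.lgamma(15) - math.lgamma(6) - math.lgamma(9)
    return log_norm + 5.0 * math.log(u) + 8.0 * math.log(1.0 - u)

def gev_log_pdf(x, loc, scale, shape):
    """Log density of the GEV (shape > 0: heavy upper tail; shape == 0: Gumbel)."""
    z = (x - loc) / scale
    if abs(shape) < 1e-12:
        return -math.log(scale) - z - math.exp(-z)
    t = 1.0 + shape * z
    if t <= 0.0:
        return -math.inf  # x lies outside the support
    return -math.log(scale) - (1.0 + 1.0 / shape) * math.log(t) - t ** (-1.0 / shape)

def log_posterior(loc, log_scale, shape, annual_maxima):
    """Unnormalised log posterior: flat priors on loc and log(scale), Beta prior on shape."""
    log_prior = log_geophysical_prior(shape)
    if log_prior == -math.inf:
        return -math.inf
    scale = math.exp(log_scale)
    # Independent observations: log-likelihoods are summed (likelihoods multiplied).
    log_lik = sum(gev_log_pdf(x, loc, scale, shape) for x in annual_maxima)
    return log_prior + log_lik

# Hypothetical annual maxima (m^3/s), for illustration only.
peaks = [1200.0, 1850.0, 950.0, 2300.0, 1500.0, 1700.0, 3100.0, 1100.0]
print(log_posterior(1400.0, math.log(500.0), 0.1, peaks))
```

In this formulation, expert quantile assessments would enter through the prior term rather than as extra likelihood terms, so no relative weight between expert "data" and observations has to be chosen.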
Citation: https://doi.org/10.5194/egusphere-2023-39-RC1
AC1: 'Reply on RC1', Guus Rongen, 22 Apr 2023

RC2: 'Comment on egusphere-2023-39', Anonymous Referee #2, 15 Apr 2023
This study investigates through Cooke's method how scientific judgments by experts can assist flood-risk managers. In my opinion, there are several issues that need to be addressed so that the applied methods, justification, and results can be clearer and of practical use to other case studies. Please see several such comments and suggestions below:
1) The main point raised by the authors for someone to use the suggested method is that "...existing statistical and hydrological models that estimate these discharges often lack transparency regarding the uncertainty of their predictions..."; however, please note that the purpose of the probabilistic analysis is exactly this one (i.e., to estimate and take into consideration the uncertainty and variability of predictions of the input and output parameters of a flood model; see for example, a review, applications, and discussion on the uncertainty of flood parameters through benchmark examples in Dimitriadis et al., 2016). I would suggest not comparing with such methods (which are plenty in the literature), but focusing on the advantages and limitations of the proposed method.
Dimitriadis, P., A. Tegos, A. Oikonomou, V. Pagana, A. Koukouvinos, N. Mamassis, D. Koutsoyiannis, and A. Efstratiadis, Comparative evaluation of 1D and quasi-2D hydraulic models based on benchmark and real-world applications for uncertainty assessment in flood mapping, Journal of Hydrology, 534, 478–492, doi:10.1016/j.jhydrol.2016.01.020, 2016.
2) The fact that "...the devastating flood event that occurred in July 2021... was not captured by the existing model for estimating design discharges.", is not for the statistical methods to blame (or replace), but a more appropriate analysis by experts should have been performed. For example, there is an application shown in Figure 10 (in Dimitriadis et al., 2016), where there was a certain flooded area that could not be captured by a 1D model (due to the 1D nature of the model that cannot account for a 180 degrees turn of the water, since only 1 direction is possible within a cross-section), whereas this area can be captured if a 2D (or quasi-2D) model is applied. However, only an expert in flood modeling could identify this (e.g., the authors state that "The study demonstrates that utilizing hydrological experts in this manner can provide plausible results with a relatively limited effort, even in situations where measurements are scarce or unavailable."). If this is what the authors are trying to highlight in this work (i.e., that the flood models should not be blindly applied by non-experts), then this is a strong and important statement, which however needs to be further discussed.
3) Please consider rephrasing the sentence "Quantifying events that are more extreme than ever measured (i.e., with return levels that are longer than the time period of representative measurements), requires extrapolating from available data or knowledge.", since it is not exactly true. The return period T corresponds to a probability of occurrence (i.e., on average, a storm event is expected to occur in T years) and not a deterministic occurrence that involves any kind of extrapolations or specific (i.e., 5th, 95th etc.) quantiles (please see the mathematical definitions and methods for extreme analysis and probability fitting in a recent work by Koutsoyiannis, 2022).
Koutsoyiannis, D., Replacing histogram with smooth empirical probability density function estimated by K-moments, Sci, 4 (4), 50, doi:10.3390/sci4040050, 2022.
4) The application of Cooke's method to the specific study is not very clear to me. For example, the authors state that "A simple statistical model was developed for the river basin, consisting of correlated GEV distributions for discharges in upstream subcatchments. The model was fitted to expert judgments, measurements, and the combination of both, using Markov chain Monte Carlo. Results from the model fitted only to measurements were accurate for more frequent events, but less certain for extreme events." Since they were all experts and applied the same model, how come they came up with different results? Did they use different methods, and what are these methods? On what did the experts base their replies: did they also perform simulations, or just probabilistic fitting?
5) In my opinion, it is not very appropriate to apply a Monte Carlo method with so few samples; please consider including more samples. Also, how come "The combined approach provided the most plausible results, with Cooke’s method reducing the uncertainty by appointing most weight to two of the seven experts."? Why have the authors selected these two scientists; were these two more expert than the other scientists?
6) More details are required to back up the statement "The discharge at the Dutch border exceeded the flood events of 1926, 1993, and 1995. Contrary to those events, this flood occurred during summer, a season that is (or was) often considered irrelevant for extreme discharges on the Meuse."; please perform a proper statistical analysis and identify for each season the appropriate probability distribution to show at what discharge the probability of occurrence in the summer season exceeds the selected return period.
7) Regarding the comments "The event was thus surprising in multiple ways. This might happen when we experience a new extreme, but given that Dutch flood risk has safety standards up to once per 100,000 years (Ministry of Infrastructure and Environment, 2016) one would have hoped this to be less of a surprise." and "While most studies aimed at obtaining better estimates of discharge extremes use hydrological or statistical modeling, some follow the approach of using expert judgment (EJ).", please note that this is essential in every scientific application, since when non-experts apply methods they do not understand, it could lead to failure regardless of the magnitude of the selected return period.
8) It is mentioned that "For the Dutch rivers Meuse and Rhine, the GRADE instrument is used for this. It generates 50,000 years of rainfall and discharges."; please give more details on this model and how it generates such long rainfall and discharge time series (does it use a stochastic simulation approach for the rainfall annual extremes and input these to a hydraulic model to produce the discharge at a specific location in the area of interest?).
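The probabilistic reading of the return period in comment 3 can be made concrete with a short calculation. Assuming independent years with a constant annual exceedance probability 1/T (an assumption made here for illustration; the function name is hypothetical), the chance of at least one exceedance of the T-year discharge within n years is 1 - (1 - 1/T)^n.

```python
def prob_at_least_one_exceedance(T, n_years):
    """Probability that the T-year level is exceeded at least once in n_years.

    Assumes independent years, each with exceedance probability 1/T.
    """
    return 1.0 - (1.0 - 1.0 / T) ** n_years

for T, n in ((100, 100), (100, 30), (100_000, 100)):
    print(T, n, round(prob_at_least_one_exceedance(T, n), 4))
```

For T = 100 and n = 100 the probability is about 0.63, so under these assumptions a record of roughly a century is more likely than not to contain an event at or above the 100-year level.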
Citation: https://doi.org/10.5194/egusphere-2023-39-RC2
AC2: 'Reply on RC2', Guus Rongen, 22 Apr 2023