Using structured expert judgment to estimate extreme river discharges: a case study of the Meuse River
Abstract. Accurate estimation of extreme discharges in rivers, such as the Meuse, is crucial for effective flood risk assessment. However, existing statistical and hydrological models that estimate these discharges often lack transparency regarding the uncertainty of their predictions, as evidenced by the devastating flood event that occurred in July 2021 which was not captured by the existing model for estimating design discharges. This article proposes an alternative approach with a central role for expert judgment, using Cooke’s method. A simple statistical model was developed for the river basin, consisting of correlated GEV-distributions for discharges in upstream sub-catchments. The model was fitted to expert judgments, measurements, and the combination of both, using Markov chain Monte Carlo. Results from the model fitted only to measurements were accurate for more frequent events, but less certain for extreme events. Using expert judgment reduced uncertainty for these extremes but was less accurate for more frequent events. The combined approach provided the most plausible results, with Cooke's method reducing the uncertainty by appointing most weight to two of the seven experts. The study demonstrates that utilizing hydrological experts in this manner can provide plausible results with a relatively limited effort, even in situations where measurements are scarce or unavailable.
Guus Rongen et al.
Status: open (until 03 May 2023)
- RC1: 'Referee Comment on egusphere-2023-39', Anonymous Referee #1, 29 Mar 2023
The paper presents the results of an interesting experiment in which flood experts are asked to guess the flood frequency curves for several sites in a region without access to discharge data. This information is then used together with observed annual maximum flood peaks to improve the "credibility" of the estimated tails of the distributions. A procedure is developed to take expert opinions on several tributaries and transform them into an estimate for a downstream gauge.
The paper is original, as far as I can tell, and deals with an important issue in flood risk assessment, i.e., the formal use of expert opinion in flood frequency analysis. Even though I liked reading the paper, there are some parts that, in my opinion, need to be improved, clarified, better explained or discussed before publication. My main concerns are:
1) I am not sure that an expert's ability to provide a judgement on the flood frequency curve can be measured by her/his ability to guess the 10-yr flood in absolute terms. If one wants the expert to help reduce uncertainty in the tails of the distribution, she/he should inform us on how large floods may compare to small floods, by reasoning on the driving processes. In the end, it is the shape of the flood frequency distribution that is hard to get with local data, not the location. The proposed method seems to be tailored for getting the order of magnitude right, i.e., the flood magnitude in m^3/s, but not how surprising large extreme events can be compared to the more frequent ones.
2) The expert information is accounted for as data (part of the likelihood) using an ad-hoc procedure, which seems to me inconsistent with the Bayesian way. Why not account for expert judgement as prior information? That would be the natural Bayesian way to do it: since the experts give their estimates without using discharge data, this can be considered as prior information.
3) Given the proposed procedure, the tail of the distribution is controlled by expert judgement with a strength that depends on the subjective choice of the weight given to the expert "data" relative to the observed data. The result of the procedure is then assessed as credible/reasonable, but how could it not be so? From what I have understood, the procedure offers a way to subjectively tweak the shape of the flood frequency distribution.
4) The results are not assessed against a benchmark. Why not use regional flood frequency analysis as a benchmark?
5) Some of the methodological steps are unclear and should be properly explained (see the detailed comments below).
line 8: MCMC is just a tool. I would say here that you use Bayesian inference.
line 17: the 2021 peak at Borgharen is the highest but does not seem surprisingly high, looking at Figure 5. I think the same event has been much more surprising in other, smaller catchments. Even though it is surprising for the summer season, as I understand, your later analysis does not account for seasonality. I would even expect that, if asked for the summer flood frequency curve, the experts would underestimate the probability of such an event.
line 30: here the text suggests that hydrological model simulations outperform statistical methods in flood frequency analysis. Has this been demonstrated in the literature? As far as I know, statistical models tailored for flood frequency analysis are more accurate than other methods both in gauged and ungauged basins (see Bloeschl et al., 2013, ISBN:9781107028180). Besides, despite some advantages, you clearly show limitations for the hydrological modelling approach in the discussion until line 45. Since the accurate estimate of the distribution tails is of interest, why don't you mention regional flood frequency analysis and inclusion of historical events as ways of increasing the robustness (and reducing the uncertainty) of the estimates? Besides, aren't design flows available from a regional frequency analysis in the area, e.g. to be used as a benchmark?
line 43: I don't get the factor 3 vs. 1.4 sentence. What is the "outcome"?
lines 65-68: spoiler alert! I would move this sentence after the results section.
line 79: I don't get the meaning of the sentence "The discharge estimates for this catchment are therefore only used for expert calibration, as the flow is part of the French Meuse flow".
line 85: I would add a table here in the main text summarizing the data provided to the experts.
line 107: not having some more details on the construction of the correlation matrices is a pity. It would have been wise to publish that paper first.
line 109: Each variable is modelled by a marginal distribution, it is not a distribution.
Section 3.2: I am not sure that an expert's ability to provide a judgement on the flood frequency curve can be measured by her/his ability to guess the 10-yr flood in absolute terms. If you want the expert to help reduce uncertainty in the tails of the distribution, she/he should inform you on how large floods may compare to small floods, by reasoning on the driving processes. In the end, it is the shape of the flood frequency distribution that is hard to get with local data, not the location. Your ranking seems to me tailored for getting the order of magnitude right, but not how surprising large extreme events can be compared to the more frequent ones. I know this cannot be done now, but I would have asked the experts to guess the ratios between the 10-yr event and the mean event, between the 100-yr event and the 10-yr event, and so on, in order to get their perception of the shape of the distribution. Maybe you could discuss the idea in the discussion section, if you see fit.
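To illustrate the difference between magnitude and shape, here is a minimal sketch that computes the 100-yr to 10-yr growth ratio for a GEV distribution. The parameter values are hypothetical and not taken from the manuscript, and the shape parameter follows the convention in which positive values give a heavy upper tail.

```python
import math

def gev_quantile(T, loc, scale, shape):
    """Discharge with return period T (years) for a GEV distribution.

    Convention assumed here: shape > 0 means a heavy upper tail.
    """
    # y is minus the log of the annual non-exceedance probability
    y = -math.log(1.0 - 1.0 / T)
    if abs(shape) < 1e-12:
        # Gumbel limit of the GEV
        return loc - scale * math.log(y)
    return loc + scale / shape * (y ** (-shape) - 1.0)

def growth_ratio(T_high, T_low, loc, scale, shape):
    """Ratio between the T_high-year and the T_low-year discharge."""
    return (gev_quantile(T_high, loc, scale, shape)
            / gev_quantile(T_low, loc, scale, shape))

# Hypothetical location and scale in m^3/s, two candidate shapes
for shape in (0.0, 0.2):
    q10 = gev_quantile(10, 1000.0, 300.0, shape)
    ratio = growth_ratio(100, 10, 1000.0, 300.0, shape)
    print(f"shape={shape:.1f}: Q10={q10:.0f} m^3/s, Q100/Q10={ratio:.2f}")
```

In this sketch the two 10-yr discharges differ by roughly 10 percent, while the growth ratio changes much more strongly with the shape parameter. That ratio is the quantity an elicitation on ratios would target directly.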
line 151: "a training exercise"
line 154: are the 26 questions made available somewhere?
line 173: the weakly informed prior in Appendix A seems very peculiar to me. I imagine that very strange parameter combinations, very far from what could be expected for floods, are given the same weight as more reasonable ones, and that some reasonable ones are excluded because of the bound at 10000. Why not use the usual priors for the GEV distribution when dealing with floods, i.e., unbounded uniform for the location and for the log of the scale, and the Martins and Stedinger (2000, doi:10.1029/1999WR900330) geophysical prior, or similar ones, for the shape parameter?
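A minimal sketch of such a prior is given below. It assumes the Hosking sign convention for the shape parameter kappa (negative values give a heavy upper tail) and, as far as I recall from Martins and Stedinger (2000), a Beta(6, 9) density on kappa within [-0.5, 0.5]; both assumptions should be checked against that paper. The function names are hypothetical.

```python
import math

def log_beta_pdf(u, p, q):
    """Log density of a Beta(p, q) distribution at u in (0, 1)."""
    if not 0.0 < u < 1.0:
        return -math.inf
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
    return log_norm + (p - 1.0) * math.log(u) + (q - 1.0) * math.log(1.0 - u)

def log_prior(loc, log_scale, kappa, p=6.0, q=9.0):
    """Unnormalised log prior for GEV parameters.

    Flat (improper) in loc and in log(scale); Beta(p, q) on kappa + 0.5,
    so kappa is restricted to (-0.5, 0.5). The values p=6, q=9 are an
    assumption to be verified against Martins and Stedinger (2000).
    """
    if not (math.isfinite(loc) and math.isfinite(log_scale)):
        return -math.inf
    return log_beta_pdf(kappa + 0.5, p, q)

# The prior favours moderately heavy tails and excludes |kappa| >= 0.5
print(log_prior(1000.0, math.log(300.0), -0.1))
print(log_prior(1000.0, math.log(300.0), 0.3))
print(log_prior(1000.0, math.log(300.0), 0.6))
```

Sampling the scale on the log axis makes the flat prior scale-invariant, so no upper bound such as 10000 is needed.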
lines 185-195: here the expert information is accounted for as data (part of the likelihood). Why not account for it as prior information? That would be the natural way to do it: since the experts give their estimates without using discharge data, this can be considered prior information. To obtain the prior distribution of the parameters from the prior assessment of the quantiles, one could use the procedure described in Renard et al. (2006, doi:10.1007/s00477-006-0047-4), for example. This would avoid the subjective choice of weights presented in lines 196-205, which actually control the fit of the tail of the flood frequency curves. It would also provide a more defensible prior than the one discussed in Appendix A.
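The structure I have in mind can be sketched as follows. This is a simplified stand-in, not the procedure of Renard et al. (2006): the expert's 5%, 50% and 95% judgments for a T-year discharge are matched to a lognormal density on the implied GEV quantile, and the change of variables from quantile to parameters is ignored. All names and numbers are hypothetical.

```python
import math

def gev_quantile(T, loc, scale, shape):
    """T-year discharge of a GEV (shape > 0 means a heavy upper tail)."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(shape) < 1e-12:
        return loc - scale * math.log(y)
    return loc + scale / shape * (y ** (-shape) - 1.0)

def gev_logpdf(x, loc, scale, shape):
    """Log density of the GEV at x; -inf outside the support."""
    if scale <= 0.0:
        return -math.inf
    z = (x - loc) / scale
    if abs(shape) < 1e-12:
        return -math.log(scale) - z - math.exp(-z)
    t = 1.0 + shape * z
    if t <= 0.0:
        return -math.inf
    return -math.log(scale) - (1.0 + 1.0 / shape) * math.log(t) - t ** (-1.0 / shape)

def expert_log_prior(loc, scale, shape, T, q05, q50, q95):
    """Lognormal log density on the T-year discharge implied by the parameters.

    Assumption: median q50 and a symmetric 5-95% range on the log axis.
    """
    q = gev_quantile(T, loc, scale, shape)
    if q <= 0.0:
        return -math.inf
    mu = math.log(q50)
    sigma = (math.log(q95) - math.log(q05)) / (2.0 * 1.6449)
    resid = (math.log(q) - mu) / sigma
    return -math.log(q * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * resid ** 2

def log_posterior(params, peaks, judgment):
    """Expert judgment enters as prior, observed peaks as likelihood."""
    loc, scale, shape = params
    if scale <= 0.0:
        return -math.inf
    log_prior = expert_log_prior(loc, scale, shape, *judgment)
    # Log-likelihoods of independent annual maxima are summed
    log_lik = sum(gev_logpdf(x, loc, scale, shape) for x in peaks)
    return log_prior + log_lik

# Hypothetical annual maxima (m^3/s) and one expert judgment (T, q05, q50, q95)
peaks = [1200.0, 1500.0, 900.0, 1800.0, 1100.0]
judgment = (100, 2500.0, 3500.0, 5000.0)
print(log_posterior((1100.0, 350.0, 0.1), peaks, judgment))
```

With this structure the relative influence of the experts follows from the width of their stated ranges and the number of observations, not from a separately chosen weight.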
line 196: log-likelihoods are summed
line 206: please indicate in which equation (and with what symbol) the "factor between the tributaries’ sum and the downstream discharge" has been introduced. Is it the one in Eq. (1)? And what are the observations to which a log-normal distribution is fitted? I am confused here.
Section 3.4: I am sorry but I don't understand the procedure at all. I wish I could suggest how to improve points 1 and 2, but I can't figure out what they do mean.
Lines 269-278: here it seems evident to me that the objective assigned to the expert is to guess a reasonable mean annual peak discharge, in m^3/s, but not so much the shape of the growth curve. Afterwards, Cooke's method values the experts by how well they get the order of magnitude of flood discharges right, more than the shape of the distribution. Is this what we need to inform our analysis about how extreme large floods can be?
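For reference, a minimal sketch of the calibration score in Cooke's classical model is given below, assuming experts state 5%, 50% and 95% quantiles for each calibration question (the elicited quantiles in the manuscript may differ). The score only counts in which inter-quantile bin each realization falls, so it rewards absolute magnitudes being statistically consistent with the stated ranges. The example experts and realizations are hypothetical.

```python
import math

# Expected share of realizations per inter-quantile bin for 5/50/95% quantiles
P_BINS = (0.05, 0.45, 0.45, 0.05)

def chi2_cdf_3dof(x):
    """Closed-form CDF of the chi-square distribution with 3 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return math.erf(math.sqrt(x / 2.0)) - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def calibration_score(assessments, realizations):
    """Calibration score: 1 - chi2_cdf(2 * N * I(s, p)) with 3 degrees of freedom.

    assessments: list of (q05, q50, q95) per question; realizations: true values.
    """
    counts = [0, 0, 0, 0]
    for (q05, q50, q95), x in zip(assessments, realizations):
        if x <= q05:
            counts[0] += 1
        elif x <= q50:
            counts[1] += 1
        elif x <= q95:
            counts[2] += 1
        else:
            counts[3] += 1
    n = len(realizations)
    s = [c / n for c in counts]
    # Relative information of the empirical bin frequencies s w.r.t. P_BINS
    info = sum(si * math.log(si / pi) for si, pi in zip(s, P_BINS) if si > 0.0)
    return 1.0 - chi2_cdf_3dof(2.0 * n * info)

# Hypothetical 10-yr discharges (m^3/s) for ten tributaries
realizations = [100.0 * (i + 1) for i in range(10)]
# Expert A: wide ranges, realizations alternate below and above the median
expert_a = [(0.5 * x, (0.9 if i % 2 else 1.1) * x, 2.0 * x)
            for i, x in enumerate(realizations)]
# Expert B: narrow ranges that systematically underestimate
expert_b = [(0.3 * x, 0.4 * x, 0.5 * x) for x in realizations]
print(calibration_score(expert_a, realizations))
print(calibration_score(expert_b, realizations))
```

Two experts with identical growth curves but different absolute levels would receive very different scores here, which is the concern raised above.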
Line 290: "not too steep"
Figure 5: if I have understood correctly, the points in the third column should all be grey because discharges at Borgharen are not used in the fit. Am I right?
Line 308: I don't get what the following sentence means: "Sampling from these wide uncertainty bounds will therefore (too) often result in a high discharge event".
Figure 6bc: it seems peculiar that combining the pieces of information that individually result in the blue and yellow distributions leads to the red one (e.g., the red mode is lower than the blue and yellow ones). Can you comment on that?
line 323: why are the median values considered best estimates?
line 330: I don't understand the sentence.
line 340: but the experts knew about the 2021 event when doing the exercise and this has biased their estimates, I guess. How would their estimates have differed before 2021? That's hard to tell.
line 350: the following sentence doesn't mean anything to me: "were combined ... in ranges that are commonly 'in sample'".
line 360: since the tails of the distributions are controlled by the expert opinions, it seems to me obvious that they "seem credible". Couldn't they be compared to the outcomes of a more classical regional flood frequency analysis?