This work is distributed under the Creative Commons Attribution 4.0 License.
A probabilistic view of extreme sea level events in the Baltic Sea
Abstract. Accounting for extreme sea level events is an integral part of any risk mitigation strategy that aims to protect present and future coastal infrastructure. Extreme value theory (EVT) provides a probabilistic framework for studying such events. However, the conventional methods used in applications of EVT are often restrictive: they are generally confined to locations with sufficiently long tide gauge records, while simultaneously failing to obtain good estimates of lower-probability events.
In this article, we use the Bayesian hierarchical modelling paradigm together with the Block Maxima method from EVT to estimate extreme sea level events that occur, on average, once every 1000 years, say. Four novel models are presented; each incorporates both missing values and a spatial dependency structure of varying complexity to obtain estimates of such extreme events. In addition, two of the models (Hilbert and Latent) allow for the estimation of extreme sea level events at both gauged and ungauged locations.
The results of this study show that Hilbert and Latent obtain good estimates with a reduced uncertainty range for both higher- and lower-probability events. The in-sample and out-of-sample evaluations show that these two models outperform the conventional method of combining maximum likelihood estimates with bootstrapping, in terms of the uncertainty range, for estimates of extreme sea level events that occur on average once every 100 to 100,000 years.
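For reference (standard EVT, not specific to this paper), the T-year event under the Block Maxima approach is the return level of the fitted generalised extreme value (GEV) distribution, i.e. the level exceeded by the annual maximum with probability 1/T (see, e.g., Coles, 2001):

```latex
z_T = \mu + \frac{\sigma}{\xi}\Bigl[\bigl(-\log(1 - 1/T)\bigr)^{-\xi} - 1\Bigr]
      \quad (\xi \neq 0), \qquad
z_T = \mu - \sigma \log\bigl(-\log(1 - 1/T)\bigr) \quad (\xi = 0).
```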
Status: final response (author comments only)
- RC1: 'Comment on egusphere-2025-1257', Anonymous Referee #1, 16 May 2025
- RC2: 'Comment on egusphere-2025-1257', Anonymous Referee #2, 25 Aug 2025
I believe that the present paper is unsuitable for publication. There may be something useful here, but in its present state referees/readers spend too much time attempting to see past presentational problems (errors of grammar, expression, syntax, inconsistent notation, typos, poor graphics, incorrect references, …) that undermine any understanding of the appropriateness of the methods and reliability of the substantive results.
A general comment: while it is good that the authors provide extensive code to document what they did, I do not think it reasonable to expect reviewers to have to parse code (possibly in a language they do not know) when refereeing a paper. A well-prepared and readable (and not too long!) supplement in English is also needed. I did not read all the code, but there seem to be many points treated in it that are not mentioned in the paper itself, and which might if properly explained reassure the reader that all is well (despite concerns generated by the paper).
From what I can understand (but I may be wrong for the reasons mentioned above), the paper applies standard methods from the statistics of extremes (or extreme-value theory, EVT) and Bayesian hierarchical modelling (BHM) to data series of varying lengths on extreme sea levels in parts of the Baltic and North Seas. It is claimed that hierarchical models that take account of spatial relationships between the tide gauge sites perform better than other such models, in terms of predicting the results from individual maximum likelihood fits to the tide series maxima (which are treated as ground truth). This is hardly surprising, even if the extent of the improvement is marked (but see below) and the application to this particular region may be novel. At line 519 the authors claim that what is novel is the use of coordinate-dependent random functions and random coefficients using priors based on kernel density estimates taken from the same data. If this statement is true, then the random functions and coefficients might not be identifiable (but so far as I can see only the coefficients, and not the functions themselves, are random), and the gain in precision would be illusory because it would partly stem from (improperly) using the data twice (or three times, if the comparison against the 'baseline' maximum likelihood fit is included).
Here are some of the apparent problems with the science:
(a) on page 3 we are told that $\zeta(s)$ is a weakly stationary (not weak stationary!) series corresponding to the observed sea level. However the sea levels are not weakly stationary, as they are subject to tidal effects, seasonality and the lunar cycle — indeed, the authors say this at line 75. The block maxima used may be stationary (though we are told in the initial sentences on page 1 that the mean sea level is rising), but the underlying data certainly are not. It would suffice here to simply say that the maxima are treated as realisations of (conditionally) independent GEV random variables, not invoke clearly incorrect reasoning to justify this. It would be necessary to check this assumption using Q-Q plots or other suitable methods (not density plots, see below) for the assessment of fit; a minimal sketch of such a check follows this point.
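To make the suggested check concrete, here is a minimal sketch (ours, not the authors' code) of a maximum likelihood GEV fit and Q-Q plot for one station, with `annual_maxima` a hypothetical stand-in for a real series of block maxima:

```python
# Fit a GEV to one station's annual maxima by maximum likelihood and draw a
# Q-Q plot of the fit. Note that scipy's shape parameter c equals -xi in the
# usual EVT sign convention. `annual_maxima` is simulated stand-in data.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import genextreme

rng = np.random.default_rng(0)
annual_maxima = genextreme.rvs(c=-0.1, loc=100.0, scale=15.0, size=80,
                               random_state=rng)

c, loc, scale = genextreme.fit(annual_maxima)   # ML estimates

# Plotting positions (i - 0.5)/n against the fitted model quantiles.
x = np.sort(annual_maxima)
p = (np.arange(1, x.size + 1) - 0.5) / x.size
q = genextreme.ppf(p, c, loc=loc, scale=scale)

plt.scatter(q, x, s=12)
plt.plot([q.min(), q.max()], [q.min(), q.max()], "k--")
plt.xlabel("fitted GEV quantiles")
plt.ylabel("observed annual maxima")
plt.title("Q-Q plot of GEV fit")
plt.show()
```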
(b) the fitted models treat the annual maxima as independent, conditional on the parameter surfaces. However this is untrue, because there may be common causes for annual maxima (e.g., particular tidal or meteorological conditions). This dependence is mentioned (line 90) but the opposite statement is also made (lines 217, 223). It is not clear whether this matters in terms of point estimation (it appears that the goal is improved estimation of return levels at individual locations, as well as the possibility of prediction at ungauged locations), but the authors don't seem to realise that not taking account of this dependence will mean that confidence intervals for the return levels are too short, since the equivalent amount of 'independent information' in the data may be smaller than is assumed by their models. I say 'may be' because I could find no attempt to check this in the paper or supplement; a simple diagnostic is sketched below;
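One quick diagnostic for this concern (our suggestion, not something in the paper) is to correlate the detrended annual maxima of station pairs over their common years; strong positive correlations would indicate that the effective amount of independent information is smaller than the nominal sample size:

```python
# `maxima` is a hypothetical pandas DataFrame: rows = years, columns =
# stations, NaN where a station's annual maximum is missing. Pairs are
# correlated only over years where both stations have data.
import pandas as pd

def pairwise_max_corr(maxima: pd.DataFrame, min_years: int = 20) -> pd.DataFrame:
    """Pearson correlations of annual maxima, pairs with >= min_years overlap."""
    return maxima.corr(min_periods=min_years)
```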
(c) the parameters of the priors used in the hierarchical specification seem to be estimated from the individual maximum likelihood estimates. It is not clear from the text how this is done (Figure 2 seems to have something to do with it) but this amounts to using the data twice. The authors claim that this is an empirical Bayes approach, but such an approach would not generally specify both the mean and the variance of the prior using the original data, as is done in this paper. (It is not uncommon in BHM to use a prior in which the mean is close to that in the data, but the variance is very large.) Moreover the assigned priors seem to treat the parameters as independent a priori, when they have been estimated from common data using maximum likelihood, again giving the impression of higher information content than the data actually contain;
The overall effect of (b) and (c) is to give tighter confidence sets than would be justified by a more appropriate analysis. Since this is a key selling point of the results, it is difficult to regard them as entirely reliable.
Why not simply use a BHM with vague priors, not those taken from the data?
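For illustration, here is a single-site sketch of such a fit with deliberately vague, data-independent priors, using a plain random-walk Metropolis sampler. This is only meant to show that wide priors are straightforward to specify; a real analysis would use a hierarchical model across stations and a proper probabilistic programming framework:

```python
# Bayesian GEV fit for one station with vague priors (nothing estimated from
# the data), via random-walk Metropolis. scipy's shape parameter c = -xi.
import numpy as np
from scipy.stats import genextreme, norm

def log_post(theta, y):
    mu, log_sigma, xi = theta
    sigma = np.exp(log_sigma)
    # Vague priors: N(0, 100^2) on mu and log sigma, N(0, 0.5^2) on xi.
    lp = (norm.logpdf(mu, 0, 100) + norm.logpdf(log_sigma, 0, 100)
          + norm.logpdf(xi, 0, 0.5))
    ll = genextreme.logpdf(y, c=-xi, loc=mu, scale=sigma).sum()
    return lp + ll if np.isfinite(ll) else -np.inf

def metropolis(y, n_iter=20000, step=0.05, seed=1):
    rng = np.random.default_rng(seed)
    theta = np.array([y.mean(), np.log(y.std()), 0.0])  # crude starting point
    lp = log_post(theta, y)
    samples = np.empty((n_iter, 3))
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(3) * [y.std(), 1.0, 1.0]
        lp_prop = log_post(prop, y)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            theta, lp = prop, lp_prop
        samples[i] = theta
    return samples  # posterior draws of (mu, log sigma, xi) after burn-in
```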
(d) The handling of missing data is unclear. According to line 241 missing data are added — does this mean that known values are deleted (and if so why?), or that missing ones are imputed? What imputation technique, if any, was used? We are told that different series are of different lengths, but this is a common problem. What’s unusual is to have around 70% of the maxima missing (line 247), presumably because some series are much longer than others;
(e) The representativeness of the test/train split is unclear. It appears from Figure 1 that long stretches of coastline have no test stations. Why?
(f) Line 273 mentions a bootstrap. There are many possible bootstraps: what was used, precisely? Parametric? Nonparametric? Treating all observations as independent? Treating years as independent, but allowing for spatial dependence? Resampling blocks of years?
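For concreteness, one reasonable variant (our guess, not necessarily what the authors did) is a nonparametric bootstrap that resamples years with replacement and refits the GEV each time; resampling whole years jointly across stations would additionally preserve spatial dependence:

```python
# Nonparametric bootstrap CI for the T-year return level at one station:
# resample years with replacement, refit, read off the return level.
import numpy as np
from scipy.stats import genextreme

def return_level(c, loc, scale, T):
    # Level exceeded by the annual maximum with probability 1/T.
    return genextreme.ppf(1.0 - 1.0 / T, c, loc=loc, scale=scale)

def bootstrap_rl(y, T=100, n_boot=2000, seed=2):
    rng = np.random.default_rng(seed)
    rls = np.empty(n_boot)
    for b in range(n_boot):
        yb = rng.choice(y, size=y.size, replace=True)  # resample years
        c, loc, scale = genextreme.fit(yb)
        rls[b] = return_level(c, loc, scale, T)
    return np.percentile(rls, [2.5, 97.5])  # 95% bootstrap interval
```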
(g) The authors choose to use kernel density estimates (KDEs) to compare datasets. This is a statistically poor approach, both because it does not show the individual data and because KDEs have poor tail behaviour and are unreliable unless based on sample sizes of hundreds of independent data. In particular, the authors claim (lines 276, 277) that the training data adequately match the regional block maxima, but this claim is unjustified without some indication of the uncertainty of the estimates. Q-Q plots or two-sample tests (sketched below) would be more appropriate, even if themselves inadequate due to the dependence in the data.
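A sketch of the suggested alternative, with `train` and `test` as hypothetical arrays of maxima; note that the KS p-value also assumes independent observations, so it should be read cautiously given the spatial dependence:

```python
# Two-sample Kolmogorov-Smirnov test plus quantile-quantile pairs for a
# train-vs-test comparison of block maxima.
import numpy as np
from scipy.stats import ks_2samp

def compare_samples(train, test):
    stat, p = ks_2samp(train, test)
    q = np.linspace(0.05, 0.95, 19)
    qq = np.column_stack([np.quantile(train, q), np.quantile(test, q)])
    return stat, p, qq  # plot qq[:, 0] vs qq[:, 1] against the 45-degree line
```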
(h) Amongst the problems with the model specification mentioned above, I found the Hilbert model particularly problematic. First, at line 317 it is not clear what $x_1, x_2, x_3$ represent. It appears (line 346) that 2640 basis functions are being used to model variation in three parameters at 3083 elements of a matrix (lines 246-7), which seems excessive. There also seems to be no attempt to assess the sensitivity of the results to the choice of priors and/or their parameters, and the many other apparently arbitrary choices made when fitting the models; this holds true also for the other models. So we have no idea whether the results depend heavily (or not) on the prior specifications and other choices made. Hence the results cannot be regarded as robust.
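For orientation only: a "Hilbert" basis expansion of this kind usually refers to the reduced-rank Gaussian-process construction of Solin and Särkkä (2020); whether the paper uses exactly this form is unclear from the text. In one dimension on the domain [-L, L] it looks as follows:

```python
# Laplacian eigenfunction basis for a reduced-rank (Hilbert-space) GP
# approximation on [-L, L]: phi_j(x) = sqrt(1/L) sin(pi j (x + L) / (2L)),
# with eigenvalues (pi j / (2L))^2.
import numpy as np

def hsgp_basis(x, L, m):
    """m sinusoidal basis functions evaluated at points x in [-L, L]."""
    j = np.arange(1, m + 1)
    phi = np.sqrt(1.0 / L) * np.sin(np.pi * j * (x[:, None] + L) / (2.0 * L))
    sqrt_eigvals = np.pi * j / (2.0 * L)
    return phi, sqrt_eigvals

# A GP-distributed surface f(x) is then approximated by phi @ weights, where
# weight j has prior variance given by the covariance kernel's spectral
# density evaluated at sqrt_eigvals[j].
```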
(i) The authors claim to be able to estimate 10000-year return levels accurately. However the preliminary processing of the data involves fitting a 20-year centred moving average to remove trend. So, looking ahead to the estimation of such return levels for (say) the year 2040: how can the results be used, since the 20-year centred moving average for 2040 is (as yet) unavailable?
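The point made concrete (a hypothetical illustration, not the authors' processing code): a 20-year centred window is incomplete for roughly the last ten years of any record, so the detrended series is undefined there, and a fortiori for any future year such as 2040:

```python
# Centred 20-year moving average of a toy annual series: the entries near
# both ends of the record come out NaN because the window is incomplete.
import numpy as np
import pandas as pd

years = np.arange(1900, 2020)
sl = pd.Series(np.random.default_rng(3).normal(size=years.size), index=years)
trend = sl.rolling(window=20, center=True).mean()
print(trend.tail(12))  # the last ~10 entries are NaN: no trend estimate yet
```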
Here are some other problems:
(i) as mentioned above, the English grammar and syntax are quite poor, with many typos and sentences that are incomplete and/or grammatically incorrect, to the point where the meaning can be unclear;
(ii) the authors should distinguish passive and active citation of references;
(iii) the literature review is seriously deficient. The earliest paper on spatial (strictly, coastal) modelling of annual maximum sea levels of which I'm aware is Coles and Tawn (1990, Phil Trans R Soc Lond A, Statistics of Coastal Flood Prevention), and there are many other related papers in the literature since, a similar one being the 2005 paper in the same journal by the same authors. In particular a close look at many other papers by Jonathan Tawn (and his collaborators) would be warranted. It would also be useful for the authors to read the relevant chapters in the Handbook of Environmental and Ecological Statistics (particularly chapters 8, 31, 32):
https://www.routledge.com/Handbook-of-Environmental-and-Ecological-Statistics/Gelfand-Fuentes-Hoeting-Smith/p/book/9780367731786
Several of the references in the bibliography are wrong or incomplete — for example, Coles (2001) has just one author (there is no et al.); Davison et al. (2012) appeared in the journal Statistical Science (the details are missing); Beirlant et al. was published in 2004; Dudley was published in 2002; Robert has only one author (no et al.), etc. Unfortunately the authors seem to rely on sources such as Google Scholar (full of errors) for the bibliography, rather than checking the references themselves.
(iv) Various statements are false. For example, it is claimed (line 109) that maximum likelihood shape parameter estimates are consistent and asymptotically normal when they are greater than $-1/2$. This is false: the statement applies when the unknown underlying shape parameter, not the estimate, is greater than $-1/2$. Or (line 314) it is claimed that the Common and Separate models give no Bayesian way to interpolate to new locations. Again, this is possible (but admittedly not very useful): new parameter values are simply generated from the posterior distributions and used for the new locations. There are numerous other apparent misunderstandings (I say apparent because it is possible that they will disappear when the grammar and syntax are fixed).
Citation: https://doi.org/10.5194/egusphere-2025-1257-RC2