the Creative Commons Attribution 4.0 License.
Technical note: Extending sea level time series for extremes analysis with machine learning and neighbouring station data
Abstract. Extreme sea levels may cause damage and disruption of activities in coastal areas. Thus, predicting extreme sea levels is essential for coastal management. Statistical inference of robust return level estimates critically depends on the length and quality of the observed time series. Here we compare two different methods for extending a very short (~10 years) time series of tide gauge measurements using a longer time series from a neighbouring tide gauge: Linear Regression and Quantile Regression Forest machine learning. Both methods are applied to stations located in the Kattegat basin between Denmark and Sweden. Reasonable results are obtained using both techniques with the machine learning method providing a better reconstruction of the observed extremes. Generating a set of stochastic time series reflecting uncertainty estimates from the machine learning model and subsequently estimating the corresponding return levels using extreme value theory, the spread of the return levels is found to agree with results derived from more physically-based methods.
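As a rough illustration of the workflow described in the abstract, the sketch below fits a least-squares linear regression between overlapping records at two stations, uses it to extend the short record, and estimates return levels from the annual maxima. It is not the authors' implementation: the data are synthetic, all function and variable names are hypothetical, the Quantile Regression Forest step is omitted, and a Gumbel distribution fitted by the method of moments is assumed here in place of whatever extreme value model the paper uses.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_linear_extension(neighbour_overlap, target_overlap):
    """Least-squares fit: target = slope * neighbour + intercept."""
    slope, intercept = np.polyfit(neighbour_overlap, target_overlap, deg=1)
    return slope, intercept


def gumbel_return_level(annual_maxima, return_period_years):
    """Return level from a Gumbel fit by the method of moments (assumed model)."""
    x = np.asarray(annual_maxima, dtype=float)
    scale = np.std(x, ddof=1) * np.sqrt(6.0) / np.pi
    loc = x.mean() - EULER_GAMMA * scale
    non_exceedance = 1.0 - 1.0 / return_period_years
    return float(loc - scale * np.log(-np.log(non_exceedance)))


# Synthetic daily maxima in cm: 40 years at the neighbour station (assumption).
rng = np.random.default_rng(0)
n_years, days_per_year = 40, 365
neighbour = rng.gumbel(loc=20.0, scale=15.0, size=n_years * days_per_year)

# Hypothetical "true" target station; only the last 10 years are treated as observed.
target_true = 0.9 * neighbour + 5.0 + rng.normal(0.0, 3.0, size=neighbour.size)
overlap = slice(-10 * days_per_year, None)

# Fit on the overlap period, then reconstruct the full 40 years.
slope, intercept = fit_linear_extension(neighbour[overlap], target_true[overlap])
extended = slope * neighbour + intercept
extended[overlap] = target_true[overlap]  # keep real observations where they exist

# Annual maxima of the extended series feed the return level estimate.
annual_maxima = extended.reshape(n_years, days_per_year).max(axis=1)
rl10 = gumbel_return_level(annual_maxima, 10)
rl100 = gumbel_return_level(annual_maxima, 100)

print(f"slope={slope:.3f}, intercept={intercept:.2f}")
print(f"10-year level={rl10:.1f} cm, 100-year level={rl100:.1f} cm")
```

The abstract reports that the machine learning method reconstructed the observed extremes better than linear regression, so a sketch of this kind is closer to the baseline than to the preferred method.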
Notice on discussion status
The requested preprint has a corresponding peer-reviewed final revised paper. You are encouraged to refer to the final revised version.
Journal article(s) based on this preprint
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2023-1159', Anonymous Referee #1, 07 Jul 2023
General Remarks on the Article:
- The article is well formed and addresses a significant issue. There are many water level stations throughout the world, and many of them have missing data or long gaps. The proposed method can help fill these gaps, especially with neighboring water level stations.
- One general negative comment is that, while talking about extreme sea levels, the authors do not talk about storm surges or similar phenomena. More generally, it is unclear which extreme sea level events the authors are dealing with.
- I would suggest a change in the title of the manuscript. The current title suggests that the main focus is going to be on Machine Learning. However, when the overall manuscript is considered, it feels more statistical (as per the topic) than ML-focused.
- As mentioned below, I believe the geographical location of the stations is very important. Hornbeck and Viken stations are constricted in a channel. In a tidal setting this will change how the water level behaves. This might be a big difference even in the characteristics of the water level time series. I believe this should be mentioned in the manuscript (even if it is not considered in the analysis).
- Between L39-50 the authors mention many different methods and analyses. It would be good to mention how the presented method compares with some of these studies.
Small Remarks on the Article:
- In the abstract there are many vague words that have to do with the definition of the time series or the quality of the outcome. For example, "Reasonable" is one of them. It would be better to define the quantity and the statistical measure.
- Between L30-40 there is a short definition of the data time series. Although the length is defined, there is no indication of the interval of the data until Section 2.1. It would be better to define the interval of the data, since it will also provide insight into the number of data points.
- Also in the same part, the highest record of 235 cm is given. It would be a good idea to explain the event, as mentioned in the previous comment. Was it a storm surge, or did it happen during a spring tide, etc.?
- In the methods part it is not clear which data are used for the LR and QRF methods. Is it the hourly data or the daily data? If it is the hourly data, how good a fit is obtained using the LR method on tidally harmonic data?
- In L100 the sentence says the LR model is trained, but since it is a Least Squares Method, I don't think "trained" is the correct word. It would be better to say "the LR model is fitted".
- If Figure 3 is showing Setup Period 1 (as far as I understand), this should be noted in the caption.
- In Section 3.1 one of the metrics is RMSE. Although it is a good metric, a 6 cm RMSE means something quite different for a 200 cm water level than for a 30 cm water level. I suggest either using a normalized RMSE or giving the range of the water level in Table 2.
- In general, and discussed in between L150-160, there are two sets of stations with very significant geographical differences. Hornbeck and Viken stations lie inside of a channel (almost at the entrance). However, compared to these stations, other stations are on the open coast. Maybe this is what’s meant in L159-160 by the physical grounds, but this might be a big difference even in the characteristics of the water level time series.
- In L194 and Table 3, Andersson is given two different dates (2001, 2021).
- In Figure 4 the colors of the stations are washed out by the shadow colors. A different color scheme or a change in the line properties might help.
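The RMSE remark above can be made concrete with a small worked example. The sketch below normalises RMSE by the observed range, which is only one of several possible conventions (the mean or the standard deviation are also used); the function names and the synthetic series are hypothetical and are not taken from the manuscript.

```python
import numpy as np


def rmse(observed, predicted):
    """Root-mean-square error between two equally long series."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def range_normalised_rmse(observed, predicted):
    """RMSE divided by the observed range (assumed normalisation)."""
    observed = np.asarray(observed, dtype=float)
    return rmse(observed, predicted) / float(observed.max() - observed.min())


# Two synthetic stations with the same constant 6 cm error but different ranges.
observed_large = np.linspace(0.0, 200.0, 101)  # 200 cm range
observed_small = np.linspace(0.0, 30.0, 101)   # 30 cm range

nrmse_large = range_normalised_rmse(observed_large, observed_large + 6.0)
nrmse_small = range_normalised_rmse(observed_small, observed_small + 6.0)

print(f"NRMSE, 200 cm range: {nrmse_large:.2f}")
print(f"NRMSE, 30 cm range: {nrmse_small:.2f}")
```

With identical absolute errors, the normalised value is about 0.03 for the 200 cm range and about 0.20 for the 30 cm range, which is the contrast the remark points to.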
Citation: https://doi.org/10.5194/egusphere-2023-1159-RC1
- AC2: 'Reply on RC1', Kévin Dubois, 27 Oct 2023
RC2: 'Comment on egusphere-2023-1159', Anonymous Referee #2, 08 Sep 2023
All my comments are available in the pdf file.
- AC1: 'Reply on RC2', Kévin Dubois, 27 Oct 2023
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
226 | 81 | 26 | 333 | 16 | 15 |
Kévin André Daniel Dubois
Morten Andreas Dahl Larsen
Martin Drews
Erik Nilsson
Anna Rutgersson