$\huge{\textit{Supplement of}}$ $\Huge{\textbf{A probabilistic view of extreme sea level events in the Baltic Sea}}$

This notebook is part of the supplementary material for the article: $\textbf{A probabilistic view of extreme sea level events in the Baltic Sea}$

For the time-series (TS) data exploration, we consider only the stations present in the training data. The exploration is done with respect to visualizable properties of the TS-data. In particular, we study the stationarity of the TS-data, for each of the following detrending methods:

  1. The non-detrended TS-data
  2. 10-year-centerd-moving-average-detrended TS-data
  3. 20-year-centerd-moving-average-detrended TS-data
  4. 1-degree-polynomial-detrended TS-data
  5. 2-degree-polynomial-detrended TS-data
  6. 3-degree-polynomial-detrended TS-data

From there, we consider the different threshold levels w.r.t the Block maxima matrix. Given the different threshold levels in {0, 10, 20, 30, 40}, we study comparatively the "min", "max", "median", "mean", "std" w.r.t the GEV parameters (xi_i:$\xi$), (mu_i:$\mu$), (sigma_i:$\sigma$), the number observations, and the upper bound of the support for GEV. Moreover, we study the KDE plots for each of the GEV parameters w.r.t the different threshold levels.

$\huge{\text{Python packages used:}}$

In [ ]:
import numpy as np
import pandas as pd 
#np.bool = np.bool_
import matplotlib
import matplotlib.ticker as ticker
import matplotlib.pyplot as plt
plt.style.use('tableau-colorblind10')
import seaborn as sns
In [110]:
print("numpy:",np.__version__)
print("pandas:",pd.__version__)
print("matplotlib:",matplotlib.__version__)
print("seaborn:",sns.__version__)
numpy: 1.26.4
pandas: 2.1.1
matplotlib: 3.7.2
seaborn: 0.12.2

$\huge{\text{Modules used:}}$

In [3]:
from CleanGeslaBM import BlockMaxima, BlockMaximaRawTsData
from BaselineModel import Baseline
from DataAnalysisUtil import train_test_split_with_info

$\huge{\text{Set-up}}$

The paths of the directories for the GESLA-3 meta, and data files used in the article. Here we have already constructed a directory containing only the CMEMs uploaded data, for the stations in the region of interest.

In [4]:
#GeslaDataset object, where datapath contains all GESLA3 data with respect to some region 
metafile = "/home/sm_seuba/Documents/GESLA3_ALL_2.csv" #GESLA metafile path
region_datapath = "/home/sm_seuba/Documents/GESLA3 DATA BALTIC NORTH CMEMS/" #Path to directory contaning only GESLA-3 CMEMS Baltic North ts-data  

$\huge{\text{Constructing a Block-Maxima matrix (BM):}}$

The BM contains the yearly (July-1th to June-30th) for all of the CMEM stations in the region of interest (Baltic-sea and parts of the North-sea).

The region_BM is detrended using a 20-year centered-moving-average (CMA) trend, the same one is to derive the BM and training data used in the article.

In [5]:
region_BM,_ =  BlockMaxima(metafile, region_datapath, start=1850, detrend=20, window=True, center=True)

The region_BM is detrended using a 20-year centered-moving-average (CMA) trend, the same one is to derive the BM and training data used in the article. Note that due to the shifted calendar-year and the non-uniformness of the observations period for the tide gauges time-series, we might have years containing no valid observed yearly maximum. Therefore, we remove any stations where the total number of observed year-maximums is 0. We illustrate the before and after by displaying the shape of the two BM.

In [8]:
print("Shape of BM before removing stations where #obs=0:",region_BM.shape)
region_BM.dropna(axis=1, how="all", inplace=True) #Remove the stations where #obs=0
print("Shape of BM after removing stations where #obs=0:",region_BM.shape)
Shape of BM before removing stations where #obs=0: (135, 98)
Shape of BM after removing stations where #obs=0: (135, 98)

Recall that we only consider the TS-data for the stations contained in the training data. Now, we extract the stations coordinates (column names).

In [9]:
region_base_BM = region_BM.dropna(axis=1, thresh= 20)
train, train_info, test, test_info = train_test_split_with_info(region_base_BM,rows=55, thresh=35, case=0)
train_stations = list(train.columns)
In [109]:
print("Shape of BM before removing stations where #obs<20:",region_BM.shape)
print("Shape of BM before removing stations where #obs<20:",region_base_BM.shape)
print("Shape of the training data:",train.shape)
Shape of BM before removing stations where #obs<20: (135, 98)
Shape of BM before removing stations where #obs<20: (135, 51)
Shape of the training data: (79, 36)

Next we detrend the data using the different methods listed above.

In [10]:
raw_dt_region_BM =   BlockMaximaRawTsData(metafile, region_datapath, detrend=None)
pre_10dt_region_BM= BlockMaximaRawTsData(metafile, region_datapath, detrend=10, window=True, center=True)
pre_20dt_region_BM= BlockMaximaRawTsData(metafile, region_datapath, detrend=20, window=True, center=True)
In [12]:
poly1_pre_dt_swe_smhi_BM = BlockMaximaRawTsData(metafile,region_datapath, detrend=1, window=False)
poly2_pre_dt_swe_smhi_BM = BlockMaximaRawTsData(metafile,region_datapath, detrend=2, window=False)
poly3_pre_dt_swe_smhi_BM= BlockMaximaRawTsData(metafile, region_datapath, detrend=3, window=False) 

The TS-plots for the different detrending methods are displayed below, where the i:th row corresponds to i:th station in the training data.

In [13]:
colors = [f"C{i}" for i in range(6)]
fig, axes = plt.subplots(nrows=36, ncols=6, figsize=(50, 80),  constrained_layout=True);
ts_data_names = ["non-detrend-TS", "10-CMA-detrended-TS", "20-CMA-detrended-TS", "Poly-deg1-detrended-TS", "Poly-deg2-detrended-TS","Poly-deg3-detrended-TS"]
i = 0
for sites in train_stations:
    raw_dt_region_BM[sites].plot(ax=axes[i,0],color=colors[0], kind="line",  legend=True, ).axhline(0, c='r'),
    pre_10dt_region_BM[sites].plot(ax=axes[i,1],color=colors[1],kind="line",  legend=True,).axhline(0, c='r'),
    pre_20dt_region_BM[sites].plot(ax=axes[i,2],color=colors[2],kind="line",  legend=True,).axhline(0, c='r'),
    poly1_pre_dt_swe_smhi_BM[sites].plot(ax=axes[i,3],color=colors[3],kind="line",  legend=True,).axhline(0, c='r'),
    poly2_pre_dt_swe_smhi_BM[sites].plot(ax=axes[i,4],color=colors[4],kind="line",  legend=True,).axhline(0, c='r'),
    poly3_pre_dt_swe_smhi_BM[sites].plot(ax=axes[i,5],color=colors[5],kind="line",  legend=True,).axhline(0, c='r'),
    if i == 0:
        for col in range(6):
            axes[i,col].set_title(ts_data_names[col],  fontdict={"fontsize":20});
    axes[i,0].set_ylabel(f"{i},{sites}", fontsize=10)
    i += 1
fig.suptitle("Comparative Time Series data (CMEMS), for both raw and detrended data", fontsize=30)
plt.style.use('tableau-colorblind10')
plt.show()
No description has been provided for this image

From the above plot we see that there are similarities between the 10-year-CMA, 20-year_CMA, 1-degree Ploy, and 2-degree Poly detrending methods. Therefore, we consider the KDE plots of 1-dim array, where the entries are the observed yearly-maxima’s.

To do this we must first generate the respective BM for the above TS-data.

In [14]:
region_BM_dt_10,_ =  BlockMaxima(metafile, region_datapath, start=1850, detrend=10, window=True, center=True)
region_BM_dt_10 = region_BM_dt_10.dropna(axis=1, how="all")
In [15]:
region_BM_dt_poly1,_ =  BlockMaxima(metafile, region_datapath, start=1850, detrend=1, window=False)
region_BM_dt_poly1 = region_BM_dt_poly1.dropna(axis=1, how="all")
In [16]:
region_BM_dt_poly2,_ =  BlockMaxima(metafile, region_datapath, start=1850, detrend=2, window=False)
region_BM_dt_poly2 = region_BM_dt_poly2.dropna(axis=1, how="all")
In [119]:
compar_detrened_BM_list = [region_BM, region_BM_dt_10, region_BM_dt_poly1, region_BM_dt_poly2]
names_dt = ["20-CMA", "10-CMA", "1-deg-Poly", "2-deg-Poly"]
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20,10))
for i, data in enumerate(compar_detrened_BM_list):
    sns.kdeplot(data=data.to_numpy(dtype="float", na_value=np.nan).flatten(),legend=True,label=f"{names_dt[i]}",ax=axes[0])
    sns.histplot(data=data.to_numpy(dtype="float", na_value=np.nan).flatten(),legend=True,label=f"{names_dt[i]}", element="step",ax=axes[1])
    axes[0].legend()
    axes[1].legend()
fig.suptitle("Comparative KDE plots for the different detrended BM data")
plt.show()
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
No description has been provided for this image
In [120]:
compar_detrened_BM_list = [region_BM, region_BM_dt_10, region_BM_dt_poly1, region_BM_dt_poly2]
names_of_dt = ["20-CMA", "10-CMA", "1-deg-Poly", "2-deg-Poly"]
info_data_dict = {}
for i, data in enumerate(compar_detrened_BM_list):
    info_data = [data.min(axis=None),data.max(axis=None), data.mean(axis=None), np.nanstd(data.to_numpy(dtype="float", na_value=np.nan).flatten()), data.notna().sum().sum(), data.isna().sum().sum(), data.shape]
    info_data_dict.update({names_of_dt[i]:info_data})
info_data_df = pd.DataFrame(info_data_dict, index=["min", "max", "mean","std","notNaN", "NaNs","shape"])
print("Comparative descriptive table for the different detrended BMs")
display(info_data_df)
Comparative descriptive table for the different detrended BMs
20-CMA 10-CMA 1-deg-Poly 2-deg-Poly
min 0.101978 0.09081 0.008546 0.000793
max 9.822584 9.758118 9.777326 9.737304
mean 0.923406 0.922245 0.914252 0.915351
std 0.485336 0.479252 0.46941 0.474232
notNaN 3420 3546 3784 3784
NaNs 9810 10899 17816 17816
shape (135, 98) (135, 107) (135, 160) (135, 160)

From the above table, we se that there are differences in minimum value, the number of both not NaNs and NaNs. In particular, the shape of each BM is different. This is due to the larger BMs contains stations with on or two not-NaNs over the entire period. Next, we study the reduced cases.

In [121]:
region_BM_base, region_BM_dt_10_base, region_BM_dt_poly1_base, region_BM_dt_poly2_base =region_BM.dropna(axis=1, thresh=20), region_BM_dt_10.dropna(axis=1, thresh=20), region_BM_dt_poly1.dropna(axis=1, thresh=20), region_BM_dt_poly2.dropna(axis=1, thresh=20)
In [122]:
compar_detrened_BM_base_list = [region_BM_base, region_BM_dt_10_base, region_BM_dt_poly1_base, region_BM_dt_poly2_base]
names_dt = ["20-CMA", "10-CMA", "1-deg-Poly", "2-deg-Poly"]
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20,10))
for i, data in enumerate(compar_detrened_BM_base_list):
    sns.kdeplot(data=data.to_numpy(dtype="float", na_value=np.nan).flatten(),legend=True,label=f"{names_dt[i]}",ax=axes[0])
    sns.histplot(data=data.to_numpy(dtype="float", na_value=np.nan).flatten(),legend=True,label=f"{names_dt[i]}", element="step",ax=axes[1])
    axes[0].legend()
    axes[1].legend()
fig.suptitle("Comparative KDE plots for the different detrended BM data with a threshold of 20")
plt.show()
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if pd.api.types.is_categorical_dtype(vector):
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
No description has been provided for this image
In [123]:
compar_detrened_BM_list_base = [region_BM_base, region_BM_dt_10_base, region_BM_dt_poly1_base, region_BM_dt_poly2_base]
names_of_dt = ["20-CMA", "10-CMA", "1-deg-Poly", "2-deg-Poly"]
info_data_base_dict = {}
for i, data in enumerate(compar_detrened_BM_list_base):
    info_data = [data.min(axis=None),data.max(axis=None), data.mean(axis=None), np.nanstd(data.to_numpy(dtype="float", na_value=np.nan).flatten()), data.notna().sum().sum(), data.isna().sum().sum(), data.shape]
    info_data_base_dict.update({names_of_dt[i]:info_data})
info_data_base_df = pd.DataFrame(info_data_base_dict, index=["min", "max", "mean","std","notNaN", "NaNs","shape"])
print("Comparative descriptive table for the different detrended BMs with a threshold of 20")
display(info_data_base_df)
Comparative descriptive table for the different detrended BMs with a threshold of 20
20-CMA 10-CMA 1-deg-Poly 2-deg-Poly
min 0.101978 0.09081 0.053476 0.01823
max 2.077028 2.082404 2.078986 2.078163
mean 0.845888 0.847582 0.846886 0.847066
std 0.274355 0.274686 0.274895 0.274569
notNaN 2929 2929 2929 2929
NaNs 3956 3956 3956 3956
shape (135, 51) (135, 51) (135, 51) (135, 51)

online at: From the above analysis, it follows that there is a difference in the detrending method for range of values in the BM matrices. In particular, the "10-CMA", "1-deg-Poly", "2-deg-Poly" datasets gives lower bounds on the minimum of the annual maxima’s. Therefore, giving additional motivation for the use of the "20-CMA" data.

Next, we consider the thresholding, we do this by compare the values and KDE plots of the GEV parameters, subject to different lower bounds on the total number of observed annual maxima's at each station.

We first study the dimensions of the matrices.

In [76]:
region_base_BM_thresh_10, region_base_BM_thresh_20, region_base_BM_thresh_30, region_base_BM_thresh_40 = region_BM.dropna(axis=1, thresh=10), region_BM.dropna(axis=1, thresh=20), region_BM.dropna(axis=1, thresh=30),region_BM.dropna(axis=1, thresh=40)  
In [77]:
print("The shape of the tresh=0 (baseline): ",region_BM.shape)
print("The shape of the tresh=10 case dataset: ", region_base_BM_thresh_10.shape)
print("The shape of the tresh=20 case dataset: ", region_base_BM_thresh_20.shape)
print("The shape of the tresh=30 case dataset: ", region_base_BM_thresh_30.shape)
print("The shape of the tresh=30 case dataset: ", region_base_BM_thresh_40.shape)
The shape of the tresh=0 (baseline):  (135, 98)
The shape of the tresh=10 case dataset:  (135, 76)
The shape of the tresh=20 case dataset:  (135, 51)
The shape of the tresh=30 case dataset:  (135, 47)
The shape of the tresh=30 case dataset:  (135, 37)
In [78]:
Baseline_region_BM, Baseline_region_BM_10, Baseline_region_BM_20, Baseline_region_BM_30, Baseline_region_BM_40 = Baseline(region_BM), Baseline(region_base_BM_thresh_10), Baseline(region_base_BM_thresh_20), Baseline(region_base_BM_thresh_30), Baseline(region_base_BM_thresh_40)
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/scipy/stats/_continuous_distns.py:3200: RuntimeWarning: invalid value encountered in power
  np.sign(c)*(-g3 + (g2 + 2*g2mg12)*g1)/g2mg12**1.5,
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/scipy/stats/_continuous_distns.py:3200: RuntimeWarning: invalid value encountered in power
  np.sign(c)*(-g3 + (g2 + 2*g2mg12)*g1)/g2mg12**1.5,

Now, we consider some descriptive statistics along with the min and max.

In [83]:
print("\t\t\tThreshold=0")
Baseline_region_BM.agg(
   {
        "xi_i": ["min", "max", "median", "mean", "std"],
        "mu_i": ["min", "max", "median", "mean", "std"],
        "sigma_i": ["min", "max", "median", "mean", "std"],
        "supp_GEV": ["min", "max", "median", "mean", "std"],
        "notNaNs":["min", "max", "median", "mean", "std"],
    }
)
			Threshold=0
Out[83]:
xi_i mu_i sigma_i supp_GEV notNaNs
min -1.299513 0.048808 0.032844 -3.081994 7.000000
max 1.949632 4.110465 1.764236 48.047890 135.000000
median -0.181720 0.902482 0.207078 1.739640 22.500000
mean -0.278116 1.022005 0.360290 2.759569 34.897959
std 0.558207 0.703048 0.362787 5.773938 32.065169
In [85]:
print("\t\t\tThreshold=10")
Baseline_region_BM_10.agg(
   {
        "xi_i": ["min", "max", "median", "mean", "std"],
        "mu_i": ["min", "max", "median", "mean", "std"],
        "sigma_i": ["min", "max", "median", "mean", "std"],
        "supp_GEV": ["min", "max", "median", "mean", "std"],
        "notNaNs":["min", "max", "median", "mean", "std"],
    }
)
			Threshold=10
Out[85]:
xi_i mu_i sigma_i supp_GEV notNaNs
min -1.212554 0.048808 0.065624 -3.081994 10.0000
max 0.626062 1.141825 1.764236 48.047890 135.0000
median -0.180042 0.809895 0.201138 1.708872 35.5000
mean -0.280942 0.770527 0.348539 2.888956 42.7500
std 0.412268 0.245899 0.388792 6.491913 32.4235
In [87]:
print("\t\t\tThreshold=20")
Baseline_region_BM_20.agg(
  {
        "xi_i": ["min", "max", "median", "mean", "std"],
        "mu_i": ["min", "max", "median", "mean", "std"],
        "sigma_i": ["min", "max", "median", "mean", "std"],
        "supp_GEV": ["min", "max", "median", "mean", "std"],
        "notNaNs":["min", "max", "median", "mean", "std"],
    }
)
			Threshold=20
Out[87]:
xi_i mu_i sigma_i supp_GEV notNaNs
min -1.212554 0.048808 0.118150 1.000733 20.000000
max -0.069073 1.098399 1.764236 3.617721 135.000000
median -0.220723 0.687577 0.208164 1.752803 51.000000
mean -0.406572 0.690888 0.426940 1.860389 57.431373
std 0.400502 0.247038 0.453911 0.572481 30.062771
In [88]:
print("\t\t\tThreshold=30")
Baseline_region_BM_30.agg(
  {
        "xi_i": ["min", "max", "median", "mean", "std"],
        "mu_i": ["min", "max", "median", "mean", "std"],
        "sigma_i": ["min", "max", "median", "mean", "std"],
        "supp_GEV": ["min", "max", "median", "mean", "std"],
        "notNaNs":["min", "max", "median", "mean", "std"],
    }
)
			Threshold=30
Out[88]:
xi_i mu_i sigma_i supp_GEV notNaNs
min -1.212554 0.048808 0.118150 1.000733 30.000000
max -0.069073 1.098399 1.764236 3.617721 135.000000
median -0.223230 0.684212 0.205435 1.705471 51.000000
mean -0.423907 0.676872 0.441573 1.800957 60.404255
std 0.410421 0.249757 0.470066 0.522260 29.442397
In [89]:
print("\t\t\tThreshold=40")
Baseline_region_BM_40.agg(
   {
        "xi_i": ["min", "max", "median", "mean", "std"],
        "mu_i": ["min", "max", "median", "mean", "std"],
        "sigma_i": ["min", "max", "median", "mean", "std"],
        "supp_GEV": ["min", "max", "median", "mean", "std"],
        "notNaNs":["min", "max", "median", "mean", "std"],
    }
)
			Threshold=40
Out[89]:
xi_i mu_i sigma_i supp_GEV notNaNs
min -1.111285 0.048808 0.118150 1.000733 40.000000
max -0.084855 1.098399 1.764236 3.376771 135.000000
median -0.200752 0.707437 0.195369 1.788991 51.000000
mean -0.311584 0.708211 0.315218 1.801795 68.270270
std 0.316569 0.234127 0.365744 0.484726 28.404468
In [90]:
Baselines_list = [Baseline_region_BM, Baseline_region_BM_10, Baseline_region_BM_20, Baseline_region_BM_30, Baseline_region_BM_40]
Baseline_names = ["no-thresh", "10-thresh","20-thresh", "30-thresh", "40-thresh"]
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(30,30),constrained_layout=True);
gev_parms = ["xi_i","mu_i", "sigma_i"]
for i, param in enumerate(gev_parms):
    for j, Baseline_bm in enumerate(Baselines_list):
        Baseline_bm.iloc[0:,i].plot(kind="kde", ax=axes[i], legend=True,label=f"{Baseline_names[j]}", linewidth=5)
        axes[i].xaxis.set_major_locator(ticker.MultipleLocator(0.2))
        axes[i].xaxis.set_minor_locator(ticker.MultipleLocator(0.1))
        axes[i].tick_params(axis="x", labelsize=15)
        axes[i].legend(fontsize=20)
    axes[i].set_title(f"KDE of the {param} for the GEV",  fontdict={"fontsize":30});
fig.suptitle("Comparative GEV parameter KDE plots about the 5 thresholdbounds", fontsize=30)
plt.show()
No description has been provided for this image

From the above plots, we see that there is a difference in the KDE plots for the different BM datasets. In particular, the shape and the scale parameters greater bounds for the non-thresh, and the thresh=10 BMs. Therefore, we see that the BMs with lower or no-threshold results in suspicions values for the GEV parameters.

In [ ]: