$\huge{\textit{Supplement of}}$ $\Huge{\textbf{A probabilistic view of extreme sea level events in the Baltic Sea}}$
This notebook is part of the supplementary material for the article: $\textbf{A probabilistic view of extreme sea level events in the Baltic Sea}$
For the time-series (TS) data exploration, we consider only the stations present in the training data. The exploration is done with respect to visualizable properties of the TS-data. In particular, we study the stationarity of the TS-data, for each of the following detrending methods:
- The non-detrended TS-data
- 10-year-centerd-moving-average-detrended TS-data
- 20-year-centerd-moving-average-detrended TS-data
- 1-degree-polynomial-detrended TS-data
- 2-degree-polynomial-detrended TS-data
- 3-degree-polynomial-detrended TS-data
From there, we consider the different threshold levels w.r.t the Block maxima matrix. Given the different threshold levels in {0, 10, 20, 30, 40}, we study comparatively the "min", "max", "median", "mean", "std" w.r.t the GEV parameters (xi_i:$\xi$), (mu_i:$\mu$), (sigma_i:$\sigma$), the number observations, and the upper bound of the support for GEV. Moreover, we study the KDE plots for each of the GEV parameters w.r.t the different threshold levels.
$\huge{\text{Python packages used:}}$
import numpy as np
import pandas as pd
#np.bool = np.bool_
import matplotlib
import matplotlib.ticker as ticker
import matplotlib.pyplot as plt
plt.style.use('tableau-colorblind10')
import seaborn as sns
print("numpy:",np.__version__)
print("pandas:",pd.__version__)
print("matplotlib:",matplotlib.__version__)
print("seaborn:",sns.__version__)
numpy: 1.26.4 pandas: 2.1.1 matplotlib: 3.7.2 seaborn: 0.12.2
$\huge{\text{Modules used:}}$
from CleanGeslaBM import BlockMaxima, BlockMaximaRawTsData
from BaselineModel import Baseline
from DataAnalysisUtil import train_test_split_with_info
$\huge{\text{Set-up}}$
The paths of the directories for the GESLA-3 meta, and data files used in the article. Here we have already constructed a directory containing only the CMEMs uploaded data, for the stations in the region of interest.
#GeslaDataset object, where datapath contains all GESLA3 data with respect to some region
metafile = "/home/sm_seuba/Documents/GESLA3_ALL_2.csv" #GESLA metafile path
region_datapath = "/home/sm_seuba/Documents/GESLA3 DATA BALTIC NORTH CMEMS/" #Path to directory contaning only GESLA-3 CMEMS Baltic North ts-data
$\huge{\text{Constructing a Block-Maxima matrix (BM):}}$
The BM contains the yearly (July-1th to June-30th) for all of the CMEM stations in the region of interest (Baltic-sea and parts of the North-sea).
The region_BM is detrended using a 20-year centered-moving-average (CMA) trend, the same one is to derive the BM and training data used in the article.
region_BM,_ = BlockMaxima(metafile, region_datapath, start=1850, detrend=20, window=True, center=True)
The region_BM is detrended using a 20-year centered-moving-average (CMA) trend, the same one is to derive the BM and training data used in the article. Note that due to the shifted calendar-year and the non-uniformness of the observations period for the tide gauges time-series, we might have years containing no valid observed yearly maximum. Therefore, we remove any stations where the total number of observed year-maximums is 0. We illustrate the before and after by displaying the shape of the two BM.
print("Shape of BM before removing stations where #obs=0:",region_BM.shape)
region_BM.dropna(axis=1, how="all", inplace=True) #Remove the stations where #obs=0
print("Shape of BM after removing stations where #obs=0:",region_BM.shape)
Shape of BM before removing stations where #obs=0: (135, 98) Shape of BM after removing stations where #obs=0: (135, 98)
Recall that we only consider the TS-data for the stations contained in the training data. Now, we extract the stations coordinates (column names).
region_base_BM = region_BM.dropna(axis=1, thresh= 20)
train, train_info, test, test_info = train_test_split_with_info(region_base_BM,rows=55, thresh=35, case=0)
train_stations = list(train.columns)
print("Shape of BM before removing stations where #obs<20:",region_BM.shape)
print("Shape of BM before removing stations where #obs<20:",region_base_BM.shape)
print("Shape of the training data:",train.shape)
Shape of BM before removing stations where #obs<20: (135, 98) Shape of BM before removing stations where #obs<20: (135, 51) Shape of the training data: (79, 36)
Next we detrend the data using the different methods listed above.
raw_dt_region_BM = BlockMaximaRawTsData(metafile, region_datapath, detrend=None)
pre_10dt_region_BM= BlockMaximaRawTsData(metafile, region_datapath, detrend=10, window=True, center=True)
pre_20dt_region_BM= BlockMaximaRawTsData(metafile, region_datapath, detrend=20, window=True, center=True)
poly1_pre_dt_swe_smhi_BM = BlockMaximaRawTsData(metafile,region_datapath, detrend=1, window=False)
poly2_pre_dt_swe_smhi_BM = BlockMaximaRawTsData(metafile,region_datapath, detrend=2, window=False)
poly3_pre_dt_swe_smhi_BM= BlockMaximaRawTsData(metafile, region_datapath, detrend=3, window=False)
The TS-plots for the different detrending methods are displayed below, where the i:th row corresponds to i:th station in the training data.
colors = [f"C{i}" for i in range(6)]
fig, axes = plt.subplots(nrows=36, ncols=6, figsize=(50, 80), constrained_layout=True);
ts_data_names = ["non-detrend-TS", "10-CMA-detrended-TS", "20-CMA-detrended-TS", "Poly-deg1-detrended-TS", "Poly-deg2-detrended-TS","Poly-deg3-detrended-TS"]
i = 0
for sites in train_stations:
raw_dt_region_BM[sites].plot(ax=axes[i,0],color=colors[0], kind="line", legend=True, ).axhline(0, c='r'),
pre_10dt_region_BM[sites].plot(ax=axes[i,1],color=colors[1],kind="line", legend=True,).axhline(0, c='r'),
pre_20dt_region_BM[sites].plot(ax=axes[i,2],color=colors[2],kind="line", legend=True,).axhline(0, c='r'),
poly1_pre_dt_swe_smhi_BM[sites].plot(ax=axes[i,3],color=colors[3],kind="line", legend=True,).axhline(0, c='r'),
poly2_pre_dt_swe_smhi_BM[sites].plot(ax=axes[i,4],color=colors[4],kind="line", legend=True,).axhline(0, c='r'),
poly3_pre_dt_swe_smhi_BM[sites].plot(ax=axes[i,5],color=colors[5],kind="line", legend=True,).axhline(0, c='r'),
if i == 0:
for col in range(6):
axes[i,col].set_title(ts_data_names[col], fontdict={"fontsize":20});
axes[i,0].set_ylabel(f"{i},{sites}", fontsize=10)
i += 1
fig.suptitle("Comparative Time Series data (CMEMS), for both raw and detrended data", fontsize=30)
plt.style.use('tableau-colorblind10')
plt.show()
From the above plot we see that there are similarities between the 10-year-CMA, 20-year_CMA, 1-degree Ploy, and 2-degree Poly detrending methods. Therefore, we consider the KDE plots of 1-dim array, where the entries are the observed yearly-maxima’s.
To do this we must first generate the respective BM for the above TS-data.
region_BM_dt_10,_ = BlockMaxima(metafile, region_datapath, start=1850, detrend=10, window=True, center=True)
region_BM_dt_10 = region_BM_dt_10.dropna(axis=1, how="all")
region_BM_dt_poly1,_ = BlockMaxima(metafile, region_datapath, start=1850, detrend=1, window=False)
region_BM_dt_poly1 = region_BM_dt_poly1.dropna(axis=1, how="all")
region_BM_dt_poly2,_ = BlockMaxima(metafile, region_datapath, start=1850, detrend=2, window=False)
region_BM_dt_poly2 = region_BM_dt_poly2.dropna(axis=1, how="all")
compar_detrened_BM_list = [region_BM, region_BM_dt_10, region_BM_dt_poly1, region_BM_dt_poly2]
names_dt = ["20-CMA", "10-CMA", "1-deg-Poly", "2-deg-Poly"]
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20,10))
for i, data in enumerate(compar_detrened_BM_list):
sns.kdeplot(data=data.to_numpy(dtype="float", na_value=np.nan).flatten(),legend=True,label=f"{names_dt[i]}",ax=axes[0])
sns.histplot(data=data.to_numpy(dtype="float", na_value=np.nan).flatten(),legend=True,label=f"{names_dt[i]}", element="step",ax=axes[1])
axes[0].legend()
axes[1].legend()
fig.suptitle("Comparative KDE plots for the different detrended BM data")
plt.show()
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True):
compar_detrened_BM_list = [region_BM, region_BM_dt_10, region_BM_dt_poly1, region_BM_dt_poly2]
names_of_dt = ["20-CMA", "10-CMA", "1-deg-Poly", "2-deg-Poly"]
info_data_dict = {}
for i, data in enumerate(compar_detrened_BM_list):
info_data = [data.min(axis=None),data.max(axis=None), data.mean(axis=None), np.nanstd(data.to_numpy(dtype="float", na_value=np.nan).flatten()), data.notna().sum().sum(), data.isna().sum().sum(), data.shape]
info_data_dict.update({names_of_dt[i]:info_data})
info_data_df = pd.DataFrame(info_data_dict, index=["min", "max", "mean","std","notNaN", "NaNs","shape"])
print("Comparative descriptive table for the different detrended BMs")
display(info_data_df)
Comparative descriptive table for the different detrended BMs
20-CMA | 10-CMA | 1-deg-Poly | 2-deg-Poly | |
---|---|---|---|---|
min | 0.101978 | 0.09081 | 0.008546 | 0.000793 |
max | 9.822584 | 9.758118 | 9.777326 | 9.737304 |
mean | 0.923406 | 0.922245 | 0.914252 | 0.915351 |
std | 0.485336 | 0.479252 | 0.46941 | 0.474232 |
notNaN | 3420 | 3546 | 3784 | 3784 |
NaNs | 9810 | 10899 | 17816 | 17816 |
shape | (135, 98) | (135, 107) | (135, 160) | (135, 160) |
From the above table, we se that there are differences in minimum value, the number of both not NaNs and NaNs. In particular, the shape of each BM is different. This is due to the larger BMs contains stations with on or two not-NaNs over the entire period. Next, we study the reduced cases.
region_BM_base, region_BM_dt_10_base, region_BM_dt_poly1_base, region_BM_dt_poly2_base =region_BM.dropna(axis=1, thresh=20), region_BM_dt_10.dropna(axis=1, thresh=20), region_BM_dt_poly1.dropna(axis=1, thresh=20), region_BM_dt_poly2.dropna(axis=1, thresh=20)
compar_detrened_BM_base_list = [region_BM_base, region_BM_dt_10_base, region_BM_dt_poly1_base, region_BM_dt_poly2_base]
names_dt = ["20-CMA", "10-CMA", "1-deg-Poly", "2-deg-Poly"]
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20,10))
for i, data in enumerate(compar_detrened_BM_base_list):
sns.kdeplot(data=data.to_numpy(dtype="float", na_value=np.nan).flatten(),legend=True,label=f"{names_dt[i]}",ax=axes[0])
sns.histplot(data=data.to_numpy(dtype="float", na_value=np.nan).flatten(),legend=True,label=f"{names_dt[i]}", element="step",ax=axes[1])
axes[0].legend()
axes[1].legend()
fig.suptitle("Comparative KDE plots for the different detrended BM data with a threshold of 20")
plt.show()
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1498: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead if pd.api.types.is_categorical_dtype(vector): /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True):
compar_detrened_BM_list_base = [region_BM_base, region_BM_dt_10_base, region_BM_dt_poly1_base, region_BM_dt_poly2_base]
names_of_dt = ["20-CMA", "10-CMA", "1-deg-Poly", "2-deg-Poly"]
info_data_base_dict = {}
for i, data in enumerate(compar_detrened_BM_list_base):
info_data = [data.min(axis=None),data.max(axis=None), data.mean(axis=None), np.nanstd(data.to_numpy(dtype="float", na_value=np.nan).flatten()), data.notna().sum().sum(), data.isna().sum().sum(), data.shape]
info_data_base_dict.update({names_of_dt[i]:info_data})
info_data_base_df = pd.DataFrame(info_data_base_dict, index=["min", "max", "mean","std","notNaN", "NaNs","shape"])
print("Comparative descriptive table for the different detrended BMs with a threshold of 20")
display(info_data_base_df)
Comparative descriptive table for the different detrended BMs with a threshold of 20
20-CMA | 10-CMA | 1-deg-Poly | 2-deg-Poly | |
---|---|---|---|---|
min | 0.101978 | 0.09081 | 0.053476 | 0.01823 |
max | 2.077028 | 2.082404 | 2.078986 | 2.078163 |
mean | 0.845888 | 0.847582 | 0.846886 | 0.847066 |
std | 0.274355 | 0.274686 | 0.274895 | 0.274569 |
notNaN | 2929 | 2929 | 2929 | 2929 |
NaNs | 3956 | 3956 | 3956 | 3956 |
shape | (135, 51) | (135, 51) | (135, 51) | (135, 51) |
online at: From the above analysis, it follows that there is a difference in the detrending method for range of values in the BM matrices. In particular, the "10-CMA", "1-deg-Poly", "2-deg-Poly" datasets gives lower bounds on the minimum of the annual maxima’s. Therefore, giving additional motivation for the use of the "20-CMA" data.
Next, we consider the thresholding, we do this by compare the values and KDE plots of the GEV parameters, subject to different lower bounds on the total number of observed annual maxima's at each station.
We first study the dimensions of the matrices.
region_base_BM_thresh_10, region_base_BM_thresh_20, region_base_BM_thresh_30, region_base_BM_thresh_40 = region_BM.dropna(axis=1, thresh=10), region_BM.dropna(axis=1, thresh=20), region_BM.dropna(axis=1, thresh=30),region_BM.dropna(axis=1, thresh=40)
print("The shape of the tresh=0 (baseline): ",region_BM.shape)
print("The shape of the tresh=10 case dataset: ", region_base_BM_thresh_10.shape)
print("The shape of the tresh=20 case dataset: ", region_base_BM_thresh_20.shape)
print("The shape of the tresh=30 case dataset: ", region_base_BM_thresh_30.shape)
print("The shape of the tresh=30 case dataset: ", region_base_BM_thresh_40.shape)
The shape of the tresh=0 (baseline): (135, 98) The shape of the tresh=10 case dataset: (135, 76) The shape of the tresh=20 case dataset: (135, 51) The shape of the tresh=30 case dataset: (135, 47) The shape of the tresh=30 case dataset: (135, 37)
Baseline_region_BM, Baseline_region_BM_10, Baseline_region_BM_20, Baseline_region_BM_30, Baseline_region_BM_40 = Baseline(region_BM), Baseline(region_base_BM_thresh_10), Baseline(region_base_BM_thresh_20), Baseline(region_base_BM_thresh_30), Baseline(region_base_BM_thresh_40)
/home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/scipy/stats/_continuous_distns.py:3200: RuntimeWarning: invalid value encountered in power np.sign(c)*(-g3 + (g2 + 2*g2mg12)*g1)/g2mg12**1.5, /home/sm_seuba/.conda/envs/pymc_env/lib/python3.11/site-packages/scipy/stats/_continuous_distns.py:3200: RuntimeWarning: invalid value encountered in power np.sign(c)*(-g3 + (g2 + 2*g2mg12)*g1)/g2mg12**1.5,
Now, we consider some descriptive statistics along with the min and max.
print("\t\t\tThreshold=0")
Baseline_region_BM.agg(
{
"xi_i": ["min", "max", "median", "mean", "std"],
"mu_i": ["min", "max", "median", "mean", "std"],
"sigma_i": ["min", "max", "median", "mean", "std"],
"supp_GEV": ["min", "max", "median", "mean", "std"],
"notNaNs":["min", "max", "median", "mean", "std"],
}
)
Threshold=0
xi_i | mu_i | sigma_i | supp_GEV | notNaNs | |
---|---|---|---|---|---|
min | -1.299513 | 0.048808 | 0.032844 | -3.081994 | 7.000000 |
max | 1.949632 | 4.110465 | 1.764236 | 48.047890 | 135.000000 |
median | -0.181720 | 0.902482 | 0.207078 | 1.739640 | 22.500000 |
mean | -0.278116 | 1.022005 | 0.360290 | 2.759569 | 34.897959 |
std | 0.558207 | 0.703048 | 0.362787 | 5.773938 | 32.065169 |
print("\t\t\tThreshold=10")
Baseline_region_BM_10.agg(
{
"xi_i": ["min", "max", "median", "mean", "std"],
"mu_i": ["min", "max", "median", "mean", "std"],
"sigma_i": ["min", "max", "median", "mean", "std"],
"supp_GEV": ["min", "max", "median", "mean", "std"],
"notNaNs":["min", "max", "median", "mean", "std"],
}
)
Threshold=10
xi_i | mu_i | sigma_i | supp_GEV | notNaNs | |
---|---|---|---|---|---|
min | -1.212554 | 0.048808 | 0.065624 | -3.081994 | 10.0000 |
max | 0.626062 | 1.141825 | 1.764236 | 48.047890 | 135.0000 |
median | -0.180042 | 0.809895 | 0.201138 | 1.708872 | 35.5000 |
mean | -0.280942 | 0.770527 | 0.348539 | 2.888956 | 42.7500 |
std | 0.412268 | 0.245899 | 0.388792 | 6.491913 | 32.4235 |
print("\t\t\tThreshold=20")
Baseline_region_BM_20.agg(
{
"xi_i": ["min", "max", "median", "mean", "std"],
"mu_i": ["min", "max", "median", "mean", "std"],
"sigma_i": ["min", "max", "median", "mean", "std"],
"supp_GEV": ["min", "max", "median", "mean", "std"],
"notNaNs":["min", "max", "median", "mean", "std"],
}
)
Threshold=20
xi_i | mu_i | sigma_i | supp_GEV | notNaNs | |
---|---|---|---|---|---|
min | -1.212554 | 0.048808 | 0.118150 | 1.000733 | 20.000000 |
max | -0.069073 | 1.098399 | 1.764236 | 3.617721 | 135.000000 |
median | -0.220723 | 0.687577 | 0.208164 | 1.752803 | 51.000000 |
mean | -0.406572 | 0.690888 | 0.426940 | 1.860389 | 57.431373 |
std | 0.400502 | 0.247038 | 0.453911 | 0.572481 | 30.062771 |
print("\t\t\tThreshold=30")
Baseline_region_BM_30.agg(
{
"xi_i": ["min", "max", "median", "mean", "std"],
"mu_i": ["min", "max", "median", "mean", "std"],
"sigma_i": ["min", "max", "median", "mean", "std"],
"supp_GEV": ["min", "max", "median", "mean", "std"],
"notNaNs":["min", "max", "median", "mean", "std"],
}
)
Threshold=30
xi_i | mu_i | sigma_i | supp_GEV | notNaNs | |
---|---|---|---|---|---|
min | -1.212554 | 0.048808 | 0.118150 | 1.000733 | 30.000000 |
max | -0.069073 | 1.098399 | 1.764236 | 3.617721 | 135.000000 |
median | -0.223230 | 0.684212 | 0.205435 | 1.705471 | 51.000000 |
mean | -0.423907 | 0.676872 | 0.441573 | 1.800957 | 60.404255 |
std | 0.410421 | 0.249757 | 0.470066 | 0.522260 | 29.442397 |
print("\t\t\tThreshold=40")
Baseline_region_BM_40.agg(
{
"xi_i": ["min", "max", "median", "mean", "std"],
"mu_i": ["min", "max", "median", "mean", "std"],
"sigma_i": ["min", "max", "median", "mean", "std"],
"supp_GEV": ["min", "max", "median", "mean", "std"],
"notNaNs":["min", "max", "median", "mean", "std"],
}
)
Threshold=40
xi_i | mu_i | sigma_i | supp_GEV | notNaNs | |
---|---|---|---|---|---|
min | -1.111285 | 0.048808 | 0.118150 | 1.000733 | 40.000000 |
max | -0.084855 | 1.098399 | 1.764236 | 3.376771 | 135.000000 |
median | -0.200752 | 0.707437 | 0.195369 | 1.788991 | 51.000000 |
mean | -0.311584 | 0.708211 | 0.315218 | 1.801795 | 68.270270 |
std | 0.316569 | 0.234127 | 0.365744 | 0.484726 | 28.404468 |
Baselines_list = [Baseline_region_BM, Baseline_region_BM_10, Baseline_region_BM_20, Baseline_region_BM_30, Baseline_region_BM_40]
Baseline_names = ["no-thresh", "10-thresh","20-thresh", "30-thresh", "40-thresh"]
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(30,30),constrained_layout=True);
gev_parms = ["xi_i","mu_i", "sigma_i"]
for i, param in enumerate(gev_parms):
for j, Baseline_bm in enumerate(Baselines_list):
Baseline_bm.iloc[0:,i].plot(kind="kde", ax=axes[i], legend=True,label=f"{Baseline_names[j]}", linewidth=5)
axes[i].xaxis.set_major_locator(ticker.MultipleLocator(0.2))
axes[i].xaxis.set_minor_locator(ticker.MultipleLocator(0.1))
axes[i].tick_params(axis="x", labelsize=15)
axes[i].legend(fontsize=20)
axes[i].set_title(f"KDE of the {param} for the GEV", fontdict={"fontsize":30});
fig.suptitle("Comparative GEV parameter KDE plots about the 5 thresholdbounds", fontsize=30)
plt.show()
From the above plots, we see that there is a difference in the KDE plots for the different BM datasets. In particular, the shape and the scale parameters greater bounds for the non-thresh, and the thresh=10 BMs. Therefore, we see that the BMs with lower or no-threshold results in suspicions values for the GEV parameters.