seastats
is a simple package to compare and analyse two time series. We use the following convention in this repo:
- `sim`: modelled surge time series
- `obs`: observed surge time series
The main function is:
```python
def get_stats(
    sim: Series,
    obs: Series,
    metrics: Sequence[str] = SUGGESTED_METRICS,
    quantile: float = 0,
    cluster: int = 72,
    round: int = -1,
) -> dict[str, float]
```
Calculates various statistical metrics between the simulated and observed time series data.
- `sim` (pd.Series): The simulated time series data.
- `obs` (pd.Series): The observed time series data.
- `metrics` (list[str]): (Optional) The list of statistical metrics to calculate. If `metrics = ["all"]`, all items in `SUPPORTED_METRICS` will be calculated. Default is all items in `SUGGESTED_METRICS`.
- `quantile` (float): (Optional) Quantile used to calculate the metrics. Default is `0` (no selection).
- `cluster` (int): (Optional) Cluster duration for grouping storm events, in hours. Default is `72`.
- `round` (int): (Optional) Number of decimals to round the results to. Default is `-1` (no rounding).
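For example, a minimal call on two aligned `pd.Series` could look like this (a sketch with synthetic data, just to show the call signature):

```python
import numpy as np
import pandas as pd

from seastats import get_stats

# synthetic example: two hourly surge-like series on a common index
idx = pd.date_range("2022-01-01", periods=1000, freq="h")
rng = np.random.default_rng(0)
obs = pd.Series(0.3 * np.sin(np.linspace(0, 60, 1000)) + 0.05 * rng.normal(size=1000), index=idx)
sim = obs + 0.02 * rng.normal(size=1000)

stats = get_stats(sim, obs, round=3)  # default SUGGESTED_METRICS, no PoT selection
print(stats)
```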
`get_stats` returns a dictionary containing the calculated metrics and their corresponding values. Two types of metrics are returned:
- The "general" metrics: all the basic metrics needed for signal comparison (RMSE, RMS, correlation, etc.). See details below:
  - `bias`: Bias
  - `rmse`: Root Mean Square Error
  - `rms`: Root Mean Square
  - `rms_95`: Root Mean Square for data points above the 95th percentile
  - `sim_mean`: Mean of simulated values
  - `obs_mean`: Mean of observed values
  - `sim_std`: Standard deviation of simulated values
  - `obs_std`: Standard deviation of observed values
  - `mae`: Mean Absolute Error
  - `mse`: Mean Square Error
  - `nse`: Nash-Sutcliffe Efficiency
  - `lamba`: Lambda index
  - `cr`: Pearson Correlation coefficient
  - `cr_95`: Pearson Correlation coefficient for data points above the 95th percentile
  - `slope`: Slope of Model/Obs correlation
  - `intercept`: Intercept of Model/Obs correlation
  - `slope_pp`: Slope of Model/Obs correlation of percentiles
  - `intercept_pp`: Intercept of Model/Obs correlation of percentiles
  - `mad`: Mean Absolute Deviation
  - `madp`: Mean Absolute Deviation of percentiles
  - `madc`: `mad + madp`
  - `kge`: Kling–Gupta Efficiency
- The storm metrics: a PoT selection is done on the observed signal (using the `match_extremes()` function), which returns the decreasing extreme event peak values for the observed and modelled signals (and the time lag between events). See details below:
  - `R1`: Difference between observed and modelled values for the biggest storm
  - `R1_norm`: Normalized `R1` (`R1` divided by the observed value)
  - `R3`: Average difference between observed and modelled values for the three biggest storms
  - `R3_norm`: Normalized `R3` (`R3` divided by the observed value)
  - `error`: Average difference between observed and modelled values for all storms
  - `error_norm`: Normalized error (`error` divided by the observed value)
Performance Score (PS) or Nash-Sutcliffe Efficiency (NSE):

$$1 - \frac{\langle (x_c - x_m)^2 \rangle}{\langle (x_m - x_R)^2 \rangle}$$

Kling–Gupta Efficiency (KGE):

$$1 - \sqrt{(r - 1)^2 + b^2 + (g - 1)^2}$$

with:

- `r` the correlation
- `b` the modified bias term (see ref): $\frac{\langle x_c \rangle - \langle x_m \rangle}{\sigma_m}$
- `g` the std dev term: $\frac{\sigma_c}{\sigma_m}$

Lambda index (`lamba`):

$$\lambda = 1 - \frac{\sum{(x_c - x_m)^2}}{\sum{(x_m - \overline{x}_m)^2} + \sum{(x_c - \overline{x}_c)^2} + n(\overline{x}_m - \overline{x}_c)^2 + \kappa}$$

- with `kappa`: $2 \cdot \left| \sum{((x_m - \overline{x}_m) \cdot (x_c - \overline{x}_c))} \right|$
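To make the notation concrete, here is a small numpy sketch of these scores, assuming `x_c` is the modelled signal (`sim`), `x_m` the observed one (`obs`) and `x_R` the observed mean. It is an illustration of the formulas above, not necessarily the package's exact implementation:

```python
import numpy as np

x_c = sim.to_numpy()  # modelled signal
x_m = obs.to_numpy()  # observed signal
n = len(x_m)

# NSE / PS, taking the reference x_R as the observed mean
nse = 1 - np.mean((x_c - x_m) ** 2) / np.mean((x_m - x_m.mean()) ** 2)

# KGE from the correlation r, modified bias b and std dev ratio g
r = np.corrcoef(x_c, x_m)[0, 1]
b = (x_c.mean() - x_m.mean()) / x_m.std()
g = x_c.std() / x_m.std()
kge = 1 - np.sqrt((r - 1) ** 2 + b ** 2 + (g - 1) ** 2)

# Lambda index, with kappa as defined above
kappa = 2 * abs(np.sum((x_m - x_m.mean()) * (x_c - x_c.mean())))
lamba = 1 - np.sum((x_c - x_m) ** 2) / (
    np.sum((x_m - x_m.mean()) ** 2)
    + np.sum((x_c - x_c.mean()) ** 2)
    + n * (x_m.mean() - x_c.mean()) ** 2
    + kappa
)
```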
For the storm metrics, the function uses `match_extremes()` (detailed below) and returns:

- `R1`: the error for the biggest storm
- `R3`: the mean error for the 3 biggest storms
- `error`: the mean error for all the storms above the threshold
- `R1_norm` / `R3_norm` / `error_norm`: same methodology, but the values are normalised (in %) relative to the observed peaks
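In terms of the `match_extremes()` output (the DataFrame shown further down), these metrics can be read as simple aggregations of the per-storm `error` and `error_norm` columns. A sketch of that interpretation, assuming `extremes_df` is sorted by decreasing observed peak as in the example below:

```python
# extremes_df: DataFrame returned by match_extremes(), one row per matched
# storm, sorted by decreasing observed peak (see the table below)
R1 = extremes_df["error"].iloc[0]            # biggest storm
R3 = extremes_df["error"].iloc[:3].mean()    # three biggest storms
error = extremes_df["error"].mean()          # all storms above the threshold
R1_norm = extremes_df["error_norm"].iloc[0]  # normalised counterpart
```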
The `storm_metrics()` function might return:

```python
{'R1': np.nan,
 'R1_norm': np.nan,
 'R3': np.nan,
 'R3_norm': np.nan,
 'error': np.nan,
 'error_norm': np.nan}
```
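Downstream code may want to guard against this case. A minimal check could look like this (a sketch, using `get_stats` and `STORM_METRICS` as in the example at the end of this README):

```python
import math

from seastats import get_stats, STORM_METRICS

storm = get_stats(sim, obs, quantile=0.99, metrics=STORM_METRICS)
if math.isnan(storm["R1"]):
    # no concomitant storm peaks were found above the threshold,
    # so the storm metrics are undefined for this pair of series
    print("no matched storm events above the 0.99 quantile")
```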
Example of implementation of `match_extremes()`:

```python
from seastats.storms import match_extremes

extremes_df = match_extremes(sim, obs, 0.99, cluster=72)
extremes_df
```
The modelled peaks are matched with the observed peaks. The function returns a `pd.DataFrame` of the observed storm peaks in decreasing order, as follows:
| time observed | observed | time observed | model | time model | diff | error | error_norm | tdiff |
|---|---|---|---|---|---|---|---|---|
| 2022-01-29 19:30:00 | 0.803 | 2022-01-29 19:30:00 | 0.565 | 2022-01-29 17:00:00 | -0.237 | 0.237 | 0.296 | -2.5 |
| 2022-02-20 20:30:00 | 0.639 | 2022-02-20 20:30:00 | 0.577 | 2022-02-20 20:00:00 | -0.062 | 0.062 | 0.0963 | -0.5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2022-11-27 15:30:00 | 0.386 | 2022-11-27 15:30:00 | 0.400 | 2022-11-27 17:00:00 | 0.014 | 0.014 | 0.036 | 1.5 |
with:

- `diff`: the difference between modelled and observed peaks
- `error`: the absolute difference between modelled and observed peaks
- `tdiff`: the time difference between modelled and observed peaks
NB: the function uses `pyextremes` in the background, with the PoT method, using the `quantile` value of the observed signal as the physical threshold and passing the `cluster_duration` argument.
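For reference, that PoT selection is roughly equivalent to a `pyextremes` call along these lines (a sketch, not the package's exact invocation):

```python
from pyextremes import get_extremes

quantile = 0.99
cluster_duration = 72  # hours

# PoT peaks of the observed signal above its 0.99 quantile,
# de-clustered with a 72-hour window
obs_extremes = get_extremes(
    obs,
    method="POT",
    threshold=obs.quantile(quantile),
    r=f"{cluster_duration}h",
)
```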
NaN values (as in the `storm_metrics()` dictionary shown above) are returned when `storms/match_extremes.py` couldn't find concomitant storms in the observed and modelled time series.

See the notebook for details.
Get all metrics in a 3-liner:

```python
import pandas as pd

from seastats import get_stats, GENERAL_METRICS, STORM_METRICS

general = get_stats(sim, obs, metrics=GENERAL_METRICS)
storm = get_stats(sim, obs, quantile=0.99, metrics=STORM_METRICS)  # we use a different quantile for the PoT selection
pd.DataFrame(dict(general, **storm), index=['abed'])
```
| | bias | rmse | rms | rms_95 | sim_mean | obs_mean | sim_std | obs_std | nse | lamba | cr | cr_95 | slope | intercept | slope_pp | intercept_pp | mad | madp | madc | kge | R1 | R1_norm | R3 | R3_norm | error | error_norm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| abed | -0.007 | 0.086 | 0.086 | 0.088 | -0 | 0.007 | 0.142 | 0.144 | 0.677 | 0.929 | 0.817 | 0.542 | 0.718 | -0.005 | 1.401 | -0.028 | 0.052 | 0.213 | 0.265 | 0.81 | 0.237364 | 0.295719 | 0.147163 | 0.207019 | 0.0938142 | 0.177533 |
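The same pattern extends naturally to several stations by collecting one dictionary per station. A sketch, where `stations` is a hypothetical mapping from station name to a `(sim, obs)` pair of Series:

```python
import pandas as pd

from seastats import get_stats, GENERAL_METRICS, STORM_METRICS

# `stations` is a hypothetical mapping: station name -> (sim, obs) Series pair
rows = {}
for name, (sim, obs) in stations.items():
    general = get_stats(sim, obs, metrics=GENERAL_METRICS)
    storm = get_stats(sim, obs, quantile=0.99, metrics=STORM_METRICS)
    rows[name] = dict(general, **storm)

stats_df = pd.DataFrame.from_dict(rows, orient="index")
```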