changes to seasonal decomposition and stationarity check functions #32

Merged
merged 1 commit into from
Oct 27, 2024
2 changes: 1 addition & 1 deletion README.md
@@ -12,7 +12,7 @@ First column in the dataframe must be a date column ('YYYY-MM-DD') and the last
 * built_seasonal_plot(df): Build seasonal plot (additive, multiplicative, IQR) for a given dataframe.
 * build_monthwise_plot(df): Build month-wise plot for a given dataframe.
 * build_decomposition_results(df): Get seasonal decomposition results for a given dataframe.
-* conduct_stationarity_check(series): Conduct stationarity check for a feature (dataframe column).
+* conduct_stationarity_check(df): Conduct stationarity check (trend) for a feature (dataframe's feature or count column).
 
 
 
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "pycatcher"
-version = "0.0.10"
+version = "0.0.11"
 authors = [
 {name="Aseem Anand", email="[email protected]"},
 ]
4 changes: 2 additions & 2 deletions src/pycatcher/catch.py
@@ -261,8 +261,8 @@ def _decompose_and_detect(df_pandas: pd.DataFrame) -> Union[pd.DataFrame, str]:
     df_pandas = df_pandas.set_index(df_pandas.columns[0]).asfreq('D').dropna()
 
     # Decompose the series using both additive and multiplicative models
-    decomposition_add = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='additive')
-    decomposition_mul = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='multiplicative')
+    decomposition_add = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='additive',extrapolate_trend='freq')
+    decomposition_mul = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='multiplicative',extrapolate_trend='freq')
 
     # Get residuals from both decompositions
     residuals_add: pd.Series = get_residuals(decomposition_add)
22 changes: 17 additions & 5 deletions src/pycatcher/diagnostics.py
@@ -123,27 +123,39 @@ def build_monthwise_plot(df):
         df_pandas = df
 
     df_pandas['Month-Year'] = pd.to_datetime(df_pandas.iloc[:, 0]).dt.to_period('M')
-    df_pandas['Count'] = pd.to_numeric(df_pandas.iloc[:, 1])
+    df_pandas['Count'] = pd.to_numeric(df_pandas.iloc[:, -1])
     plt.figure(figsize=(30, 4))
     sns.boxplot(x='Month-Year', y='Count', data=df_pandas).set_title("Month-wise Box Plot")
     plt.show()
 
 
-def conduct_stationarity_check(series):
+def conduct_stationarity_check(df):
 
     """
     Args:
-        series: Pandas dataframe with feature column
+        df (pd.DataFrame): A Pandas DataFrame with time-series data.
+            First column must be a date column ('YYYY-MM-DD')
+            and last column should be a count/feature column.
 
     Returns:
-        ADF statistics, Stationarity check. Time series are stationary if they
+        ADF statistics, Stationarity check (trend only). Time series are stationary if they
         do not have trend or seasonal effects.
         Summary statistics calculated on the time series are consistent over time,
         like the mean or the variance of the observations.
     """
+    # Check whether the argument is Pandas dataframe
+    if not isinstance(df, pd.DataFrame):
+        # Convert to Pandas dataframe for easy manipulation
+        df_pandas = df.toPandas()
+    else:
+        df_pandas = df
+
+    # Ensure the last column is numeric
+    df_pandas.iloc[:, -1] = pd.to_numeric(df_pandas.iloc[:, -1])
+
     logger.info("Building stationarity check")
 
-    result = sm.tsa.stattools.adfuller(series.values)
+    result = sm.tsa.stattools.adfuller(df_pandas.iloc[:, -1].values)
 
     print('ADF Statistic: %f' % result[0])
     print('p-value: %f' % result[1])
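In `build_monthwise_plot`, the count is now read from `iloc[:, -1]` instead of `iloc[:, 1]`, matching the "last column is the count" convention; with an extra middle column, `iloc[:, 1]` would pick the wrong data. One ordering detail is worth checking: the preceding line assigns `df_pandas['Month-Year']`, and assigning a new column in pandas appends it at the right-hand end, so when `iloc[:, -1]` is evaluated the last column is `Month-Year` rather than the original count column (unless the frame already had a `Month-Year` column). A minimal sketch that reads the count column before adding helper columns follows. The function name `prepare_monthwise_frame` and the sample data are hypothetical, and the seaborn plotting step is left out.

```python
import pandas as pd


def prepare_monthwise_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: build the Month-Year / Count columns for a box plot.

    Assumes the first column holds dates and the last column holds counts.
    """
    out = df.copy()  # avoid mutating the caller's frame
    # Read the count column *before* appending helper columns, so that
    # iloc[:, -1] still refers to the original last column.
    counts = pd.to_numeric(out.iloc[:, -1])
    out["Month-Year"] = pd.to_datetime(out.iloc[:, 0]).dt.to_period("M")
    out["Count"] = counts
    return out


raw = pd.DataFrame({
    "date": ["2024-01-15", "2024-01-20", "2024-02-03"],
    "region": ["A", "B", "A"],  # extra middle column: iloc[:, 1] would pick this
    "count": ["5", "7", "3"],
})

# Pitfall: once Month-Year is appended, it becomes the last column.
tmp = raw.copy()
tmp["Month-Year"] = pd.to_datetime(tmp.iloc[:, 0]).dt.to_period("M")
print(tmp.columns[-1])

print(prepare_monthwise_frame(raw)[["Month-Year", "Count"]])
```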