Jp changes 20241027 #36

Closed
wants to merge 3 commits into from
2 changes: 1 addition & 1 deletion README.md
@@ -12,7 +12,7 @@ First column in the dataframe must be a date column ('YYYY-MM-DD') and the last
* build_seasonal_plot(df): Build seasonal plot (additive, multiplicative, IQR) for a given dataframe.
* build_monthwise_plot(df): Build month-wise plot for a given dataframe.
* build_decomposition_results(df): Get seasonal decomposition results for a given dataframe.
- * conduct_stationarity_check(series): Conduct stationarity check for a feature (dataframe column).
+ * conduct_stationarity_check(df): Conduct stationarity check (trend) for a feature (dataframe's feature or count column).



143 changes: 143 additions & 0 deletions notebooks/example_notebook.ipynb
@@ -0,0 +1,143 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "ca4fc2d5-f1cc-4605-a33f-91a8c9d7b703",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pycatcher in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (0.0.6)\n",
"Collecting pycatcher\n",
" Downloading pycatcher-0.0.11-py3-none-any.whl.metadata (2.3 kB)\n",
"Requirement already satisfied: pandas==2.2.3 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from pycatcher) (2.2.3)\n",
"Requirement already satisfied: statsmodels==0.14.4 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from pycatcher) (0.14.4)\n",
"Requirement already satisfied: pyod==2.0.2 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from pycatcher) (2.0.2)\n",
"Requirement already satisfied: seaborn==0.13.2 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from pycatcher) (0.13.2)\n",
"Requirement already satisfied: numpy>=1.26.0 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from pandas==2.2.3->pycatcher) (2.0.2)\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from pandas==2.2.3->pycatcher) (2.9.0.post0)\n",
"Requirement already satisfied: pytz>=2020.1 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from pandas==2.2.3->pycatcher) (2024.2)\n",
"Requirement already satisfied: tzdata>=2022.7 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from pandas==2.2.3->pycatcher) (2024.2)\n",
"Requirement already satisfied: joblib in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from pyod==2.0.2->pycatcher) (1.4.2)\n",
"Requirement already satisfied: matplotlib in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from pyod==2.0.2->pycatcher) (3.9.2)\n",
"Requirement already satisfied: numba>=0.51 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from pyod==2.0.2->pycatcher) (0.60.0)\n",
"Requirement already satisfied: scipy>=1.5.1 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from pyod==2.0.2->pycatcher) (1.14.1)\n",
"Requirement already satisfied: scikit-learn>=0.22.0 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from pyod==2.0.2->pycatcher) (1.5.2)\n",
"Requirement already satisfied: patsy>=0.5.6 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from statsmodels==0.14.4->pycatcher) (0.5.6)\n",
"Requirement already satisfied: packaging>=21.3 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from statsmodels==0.14.4->pycatcher) (24.1)\n",
"Requirement already satisfied: contourpy>=1.0.1 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from matplotlib->pyod==2.0.2->pycatcher) (1.3.0)\n",
"Requirement already satisfied: cycler>=0.10 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from matplotlib->pyod==2.0.2->pycatcher) (0.12.1)\n",
"Requirement already satisfied: fonttools>=4.22.0 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from matplotlib->pyod==2.0.2->pycatcher) (4.54.1)\n",
"Requirement already satisfied: kiwisolver>=1.3.1 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from matplotlib->pyod==2.0.2->pycatcher) (1.4.7)\n",
"Requirement already satisfied: pillow>=8 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from matplotlib->pyod==2.0.2->pycatcher) (11.0.0)\n",
"Requirement already satisfied: pyparsing>=2.3.1 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from matplotlib->pyod==2.0.2->pycatcher) (3.2.0)\n",
"Requirement already satisfied: llvmlite<0.44,>=0.43.0dev0 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from numba>=0.51->pyod==2.0.2->pycatcher) (0.43.0)\n",
"Requirement already satisfied: six in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from patsy>=0.5.6->statsmodels==0.14.4->pycatcher) (1.16.0)\n",
"Requirement already satisfied: threadpoolctl>=3.1.0 in /Users/sarika/Documents/GitHub/pycatcher/venv/lib/python3.12/site-packages (from scikit-learn>=0.22.0->pyod==2.0.2->pycatcher) (3.5.0)\n",
"Downloading pycatcher-0.0.11-py3-none-any.whl (7.9 kB)\n",
"Installing collected packages: pycatcher\n",
" Attempting uninstall: pycatcher\n",
" Found existing installation: pycatcher 0.0.6\n",
" Uninstalling pycatcher-0.0.6:\n",
" Successfully uninstalled pycatcher-0.0.6\n",
"Successfully installed pycatcher-0.0.11\n",
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install pycatcher --upgrade"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "fce496a9-7356-4629-bc19-e8003220168b",
"metadata": {},
"outputs": [],
"source": [
"from pycatcher.catch import find_outliers_iqr"
]
},
{
"cell_type": "markdown",
"id": "0fc75884-6ddb-4f4b-92e6-edc7035cf648",
"metadata": {},
"source": [
"### Example 1 - Finding Outliers Using IQR"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "4bbda689-a2d9-45b4-8a32-a78bb6e658a0",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-10-27 10:21:41,211 - INFO - Detecting outliers using the IQR method.\n",
"2024-10-27 10:21:41,220 - INFO - Outliers detected: 1 rows.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"100\n"
]
}
],
"source": [
"import pandas as pd\n",
"\n",
"data = {\n",
" 'ID': [1, 2, 3, 4, 5],\n",
" 'Value': [10, 12, 14, 100, 15]\n",
" }\n",
"\n",
"df = pd.DataFrame(data)\n",
"\n",
"outliers = find_outliers_iqr(df)\n",
"\n",
"print(outliers['Value'].iloc[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3bf8d294-d016-4e10-85fa-16e140093585",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
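
For readers who want to see what the IQR rule in Example 1 amounts to, here is a minimal sketch in plain pandas. It is not pycatcher's implementation: the helper name `iqr_outliers` and the 1.5 × IQR fence multiplier are assumptions made for illustration, and `find_outliers_iqr` may differ in its details (for example, in how it treats the first column).

```python
import pandas as pd


def iqr_outliers(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose last-column value lies outside the 1.5 * IQR fences.

    Hypothetical helper for illustration only; not part of pycatcher.
    """
    values = df.iloc[:, -1]
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Keep only the rows that fall below the lower or above the upper fence
    return df[(values < lower) | (values > upper)]


# Same toy data as in the notebook cell
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'Value': [10, 12, 14, 100, 15],
})

outliers = iqr_outliers(df)
print(outliers['Value'].iloc[0])
```

On this toy data the quartiles are 12 and 15, so the fences sit at 7.5 and 19.5 and only the value 100 falls outside them, which agrees with the single outlier row reported in the notebook log.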
5 changes: 3 additions & 2 deletions pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "pycatcher"
- version = "0.0.10"
+ version = "0.0.12"
authors = [
{name="Aseem Anand", email="[email protected]"},
]
@@ -33,7 +33,8 @@ dev = [
"pytest>=8.3.3",
"pytest-mock>=3.14.0",
"coverage>=7.6.1",
- "prospector>=1.12.1"
+ "prospector>=1.12.1",
+ "notebook>=7.2.2"
]

test = [
4 changes: 2 additions & 2 deletions src/pycatcher/catch.py
@@ -261,8 +261,8 @@ def _decompose_and_detect(df_pandas: pd.DataFrame) -> Union[pd.DataFrame, str]:
df_pandas = df_pandas.set_index(df_pandas.columns[0]).asfreq('D').dropna()

# Decompose the series using both additive and multiplicative models
- decomposition_add = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='additive')
- decomposition_mul = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='multiplicative')
+ decomposition_add = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='additive',extrapolate_trend='freq')
+ decomposition_mul = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='multiplicative',extrapolate_trend='freq')

# Get residuals from both decompositions
residuals_add: pd.Series = get_residuals(decomposition_add)
28 changes: 20 additions & 8 deletions src/pycatcher/diagnostics.py
@@ -76,10 +76,10 @@ def build_seasonal_plot(df):
# throughout the time series. This is often seen in indexed time series where the
# absolute value is growing but changes stay relative.

- decomposition_add = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='additive')
+ decomposition_add = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='additive', extrapolate_trend='freq')
residuals_add = get_residuals(decomposition_add)

- decomposition_mul = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='multiplicative')
+ decomposition_mul = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='multiplicative', extrapolate_trend='freq')
residuals_mul = get_residuals(decomposition_mul)

# Get ACF values for both Additive and Multiplicative models
@@ -129,21 +129,33 @@ def build_monthwise_plot(df):
plt.show()


- def conduct_stationarity_check(series):
+ def conduct_stationarity_check(df):

"""
Args:
- series: Pandas dataframe with feature column
+ df (pd.DataFrame): A Pandas DataFrame with time-series data.
+ First column must be a date column ('YYYY-MM-DD')
+ and last column should be a count/feature column.

Returns:
- ADF statistics, Stationarity check. Time series are stationary if they
+ ADF statistics, Stationarity check (trend only). Time series are stationary if they
do not have trend or seasonal effects.
Summary statistics calculated on the time series are consistent over time,
like the mean or the variance of the observations.
"""
+ # Check whether the argument is Pandas dataframe
+ if not isinstance(df, pd.DataFrame):
+ # Convert to Pandas dataframe for easy manipulation
+ df_pandas = df.toPandas()
+ else:
+ df_pandas = df
+
+ # Ensure the last column is numeric
+ df_pandas.iloc[:, -1] = pd.to_numeric(df_pandas.iloc[:, -1])
+
logger.info("Building stationarity check")

- result = sm.tsa.stattools.adfuller(series.values)
+ result = sm.tsa.stattools.adfuller(df_pandas.iloc[:, -1].values)

print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
@@ -192,10 +204,10 @@ def build_decomposition_results(df):
# throughout the time series. This is often seen in indexed time series where the absolute value is
# growing but changes stay relative.

- decomposition_add = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='additive')
+ decomposition_add = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='additive',extrapolate_trend='freq')
residuals_add = get_residuals(decomposition_add)

- decomposition_mul = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='multiplicative')
+ decomposition_mul = sm.tsa.seasonal_decompose(df_pandas.iloc[:, -1], model='multiplicative',extrapolate_trend='freq')
residuals_mul = get_residuals(decomposition_mul)

# Get ACF values for both Additive and Multiplicative models