
Commit

Merge pull request #103 from aseemanand/aanand_123124
Fix for Matplot issue
jpamarthi authored Jan 1, 2025
2 parents 9dbf0c3 + f94cb7b commit fce33a3
Showing 5 changed files with 461 additions and 384 deletions.
11 changes: 8 additions & 3 deletions README.md
@@ -15,10 +15,15 @@ quarter level time-series data.
pip install pycatcher
```

### DataFrame Arguments
* First column in the dataframe must be a date column ('YYYY-MM-DD') and the last column a numeric column
(sum or total count for the time period) to detect outliers using Seasonal Decomposition algorithms.
### Basic Requirements
* PyCatcher expects a Pandas DataFrame as an input for various outlier detection methods. It can convert Spark DataFrame
to Pandas DataFrame at the data processing stage.
* First column in the dataframe must be a time period column (date in 'YYYY-MM-DD'/month in 'YYYY-MM'/year in 'YYYY'
format) and the last column a numeric column (sum or total count for the time period) to detect outliers using
Seasonal Decomposition algorithms.
* Last column must be a numeric column to detect outliers using Interquartile Range (IQR) and Moving Average algorithms.
* There is no need for any labeled observations (ground truth). Outliers are detected solely through
underlying algorithms (for example, seasonal-trend decomposition and dispersion methods like MAD or Z-Score).

<hr style="border:1.25px solid gray">
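
The requirements listed in the updated README can be checked before a DataFrame is handed to any detection method. The sketch below is only an illustration under stated assumptions: `check_basic_requirements` and `PERIOD_FORMATS` are hypothetical names that are not part of PyCatcher, and the check covers only what the README states (first column in 'YYYY-MM-DD', 'YYYY-MM' or 'YYYY' format, last column numeric).

```python
from datetime import datetime

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Time period formats named in the README: day, month and year level.
PERIOD_FORMATS = ("%Y-%m-%d", "%Y-%m", "%Y")


def _all_match(values, fmt):
    """Return True if every value parses with the given strptime format."""
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            return False
    return True


def check_basic_requirements(df: pd.DataFrame) -> list:
    """Hypothetical helper (not a PyCatcher function).

    Returns a list of human-readable problems; an empty list means the
    DataFrame satisfies the README's basic requirements.
    """
    problems = []
    if df.shape[1] < 2:
        return ["expected at least two columns (time period first, numeric last)"]

    # First column: all values must share one of the supported period formats.
    first = df.iloc[:, 0].astype(str).tolist()
    if not any(_all_match(first, fmt) for fmt in PERIOD_FORMATS):
        problems.append("first column is not in YYYY-MM-DD, YYYY-MM or YYYY format")

    # Last column: must be numeric (sum or total count for the period).
    if not is_numeric_dtype(df.iloc[:, -1]):
        problems.append("last column is not numeric")

    return problems


# Example: a month-level series that meets the requirements.
sample = pd.DataFrame({"month": ["2024-01", "2024-02", "2024-03"],
                       "total": [120, 135, 128]})
print(check_basic_requirements(sample))
```

A DataFrame that passes this check matches the shape described in the docstring of `detect_outliers_classic` in `src/pycatcher/catch.py`; whether the library performs equivalent validation internally is not shown in this commit.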

828 changes: 450 additions & 378 deletions notebooks/Example Notebook.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "pycatcher"
-version = "0.0.61"
+version = "0.0.62"
description = "This package identifies outlier(s) for a given time-series dataset in simple steps. It supports day, week, month and quarter level time-series data."
authors = ["Aseem Anand <[email protected]>"]
maintainers = ["Jagadish Pamarthi <[email protected]>"]
2 changes: 1 addition & 1 deletion src/pycatcher/catch.py
@@ -492,7 +492,7 @@ def detect_outliers_classic(df: pd.DataFrame) -> Union[pd.DataFrame, str]:
Args:
df (pd.DataFrame): A Pandas DataFrame with time-series data.
-First column must be a date column ('YYYY-MM-DD')
+First column must be a date ('YYYY-MM-DD) /month (YYYY-MM) /year (YYYY) column
and last column should be a count/feature column.
Returns:
2 changes: 1 addition & 1 deletion src/pycatcher/diagnostics.py
@@ -144,7 +144,7 @@ def build_seasonal_plot_classic(df) -> str:
logging.info("Duplicate date index values. Check your data.")


-def generate_seasonal_plot_classic(df, detected_period) -> Union[str, plt]:
+def generate_seasonal_plot_classic(df, detected_period) -> str:
"""
Build seasonal plot for a given dataframe using classic seasonal decomposition
Args:
