Skip to content
This repository has been archived by the owner on Sep 16, 2024. It is now read-only.

Latest commit

 

History

History
132 lines (80 loc) · 4.19 KB

README.md

File metadata and controls

132 lines (80 loc) · 4.19 KB

pyIDS

pyIDS is a custom implementation of IDS (Interpretable Decision Sets) algorithm introduced in

LAKKARAJU, Himabindu; BACH, Stephen H.; LESKOVEC, Jure. Interpretable decision sets: A joint framework for description and prediction. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 2016. p. 1675-1684.

If you find this package useful in your research, please cite our paper on this Interpretable Decision Sets Implementation:

Jiri Filip, Tomas Kliegr. PyIDS - Python Implementation of Interpretable Decision Sets Algorithm by Lakkaraju et al, 2016. RuleML+RR2019@Rule Challenge 2019. http://ceur-ws.org/Vol-2438/paper8.pdf

Installation

The pyarc, pandas, scipy and numpy packages need to be installed before using pyIDS.

All of these packages can be installed using pip.

For pyarc, please refer to the Installation section of its README file.

Examples

training a simple IDS model

import pandas as pd
from pyids.data_structures.ids_classifier import IDS, mine_CARs

from pyarc.qcba.data_structures import QuantitativeDataFrame


df = pd.read_csv("./data/iris0.csv")
cars = mine_CARs(df, rule_cutoff=50)
lambda_array = [1, 1, 1, 1, 1, 1, 1]

quant_dataframe = QuantitativeDataFrame(df)

ids = IDS()
ids.fit(quant_dataframe=quant_dataframe, class_association_rules=cars, lambda_array=lambda_array, debug=False)

acc = ids.score(quant_dataframe)

training a One-vs-all IDS model

import pandas as pd
from pyids.data_structures.ids_classifier import IDSOneVsAll, mine_CARs

from pyarc.qcba.data_structures import QuantitativeDataFrame


df = pd.read_csv("./data/iris0.csv")

quant_dataframe = QuantitativeDataFrame(df)

ids = IDSOneVsAll()
ids.fit(quant_dataframe=quant_dataframe, debug=False)

acc = ids.score_auc(quant_dataframe)

optimizing for best lambda parameters using coordinate ascent, as described in the original paper

import pandas as pd
from pyids.data_structures.ids_classifier import IDS, mine_IDS_ruleset
from pyids.model_selection import CoordinateAscentOptimizer, train_test_split_pd

from pyarc.qcba.data_structures import QuantitativeDataFrame


df = pd.read_csv("./data/titanic.csv")
df_train, df_test = train_test_split_pd(df, prop=0.2)

ids_ruleset = mine_IDS_ruleset(df_train, rule_cutoff=50)

quant_dataframe_train = QuantitativeDataFrame(df_train)
quant_dataframe_test = QuantitativeDataFrame(df_test)

coordinate_ascent = CoordinateAscentOptimizer(IDS(), debug=True, maximum_delta_between_iterations=200, maximum_score_estimation_iterations=3)
coordinate_ascent.fit(ids_ruleset, quant_dataframe_train, quant_dataframe_test)

best_lambda_array = coordinate_ascent.current_best_params

or optimizing a One-vs-all IDS model

import pandas as pd
from pyids.data_structures.ids_classifier import IDSOneVsAll, mine_IDS_ruleset
from pyids.model_selection import CoordinateAscentOptimizer, train_test_split_pd

from pyarc.qcba.data_structures import QuantitativeDataFrame


df = pd.read_csv("./data/iris0.csv")
df_train, df_test = train_test_split_pd(df, prop=0.2)

ids_ruleset = mine_IDS_ruleset(df_train, rule_cutoff=50)

quant_dataframe_train = QuantitativeDataFrame(df_train)
quant_dataframe_test = QuantitativeDataFrame(df_test)

coordinate_ascent = CoordinateAscentOptimizer(IDSOneVsAll(), debug=True, maximum_delta_between_iterations=200, maximum_score_estimation_iterations=3)
coordinate_ascent.fit(ids_ruleset, quant_dataframe_train, quant_dataframe_test)

best_lambda_array = coordinate_ascent.current_best_params

using k-fold cross validation with AUC score

import pandas as pd
from pyids.data_structures.ids_classifier import IDSOneVsAll

dataframes = [ pd.read_csv("./data/iris{}.csv".format(i)) for i in range(10)]

kfold = KFoldCV(IDSOneVsAll(), dataframes, score_auc=True)
scores = kfold.fit(rule_cutoff=50)

using k-fold cross validation with accuracy score

import pandas as pd
from pyids.ids_classifier import IDS

dataframes = [ pd.read_csv("./data/iris{}.csv".format(i)) for i in range(10)]

kfold = KFoldCV(IDS(), dataframes)
scores = kfold.fit(rule_cutoff=50)