Implementation of "Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods"

MarcSpeckmann/Fooling-LIME-and-SHAP

Fooling LIME and SHAP

Post-hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, cannot be relied on to reveal systematic errors and underlying biases. This project re-implements the scaffolding technique from Slack et al., which is designed to hide the biases of any given classifier.
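The core idea of the scaffolding can be sketched in a few lines. This is a minimal illustration, not the code of this repository: the class name `ScaffoldedModel`, the three toy columns, and the hand-written `is_in_distribution` rule are all assumptions made for the example. In the paper the detector is a trained classifier that separates real rows from perturbed ones; a fixed rule stands in for it here so the sketch stays dependency-free.

```python
import random


class ScaffoldedModel:
    """Answer with the biased model on realistic rows and the innocuous one otherwise."""

    def __init__(self, biased_predict, innocuous_predict, is_in_distribution):
        self.biased_predict = biased_predict
        self.innocuous_predict = innocuous_predict
        self.is_in_distribution = is_in_distribution

    def predict(self, rows):
        return [
            self.biased_predict(row) if self.is_in_distribution(row)
            else self.innocuous_predict(row)
            for row in rows
        ]


# Toy columns (all hypothetical): 0 = sensitive, 1 = proxy tied to 0, 2 = unrelated.
def biased_predict(row):
    return int(row[0] > 0.5)  # decision depends only on the sensitive feature


def innocuous_predict(row):
    return int(row[2] > 0.0)  # decision depends only on the unrelated feature


def is_in_distribution(row):
    # Hand-written stand-in for a trained detector: real rows keep the proxy
    # close to the sensitive feature, perturbed rows usually do not.
    return abs(row[1] - row[0]) <= 0.2


model = ScaffoldedModel(biased_predict, innocuous_predict, is_in_distribution)

rng = random.Random(0)
real_rows = []
for _ in range(200):
    sensitive = float(rng.randint(0, 1))
    real_rows.append(
        [sensitive, sensitive + rng.uniform(-0.1, 0.1), rng.uniform(-1.0, 1.0)]
    )

# Fidelity: share of real rows on which the scaffold behaves like the biased model.
matches = [p == biased_predict(r) for p, r in zip(model.predict(real_rows), real_rows)]
fidelity = sum(matches) / len(matches)
```

On real rows the scaffold reproduces the biased decisions, while a row that breaks the proxy relationship, such as `[1.0, -3.0, -2.0]`, is answered by the innocuous model. An explainer that mostly queries such off-distribution rows would therefore mostly see the innocuous behaviour.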

Installation

  1. Clone the repository

    git clone https://github.com/automl-classroom/iml-ws21-projects-fool_the_lemon.git
  2. Create the environment

    cd iml-ws21-projects-fool_the_lemon
    conda env create -f "environment.yml"
    conda activate iML-project
  3. Run experiments

    To run the experiment notebooks, start a JupyterLab server.

    JupyterLab installation instructions: https://github.com/jupyterlab/jupyterlab

    jupyter-lab .

    To change the seed for the experiments, edit the seed value at the beginning of the notebook.
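As a rough illustration of why a single seed at the top of a notebook is enough to make runs repeatable, the snippet below seeds a random generator and compares draws. The name `SEED` and the helper `seeded_sample` are hypothetical; the notebooks may use a different variable name and may seed other libraries as well.

```python
import random

SEED = 42  # hypothetical name; use whatever the first notebook cell defines


def seeded_sample(seed, n=3):
    """Draw n pseudo-random numbers from a generator initialised with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# The same seed reproduces the same draws; a different seed changes them.
same = seeded_sample(SEED) == seeded_sample(SEED)
different = seeded_sample(SEED) != seeded_sample(SEED + 1)
```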

Experiments

Reproduction (10)

Implement the approach by writing a simple interface/framework and confirm your implementation by using any (tabular) dataset with a known racial bias (e.g. Boston Housing).

Extension (10)

In addition to LIME and SHAP, incorporate PDP and analyse whether it can be fooled, too.
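For reference, a partial dependence curve can be computed as sketched below. This is a generic, dependency-free version, not the implementation used in the notebooks; the function name `partial_dependence` and the toy model are assumptions. Note that forcing one feature to a grid value while keeping the others fixed can create feature combinations that never occur in the data, which is the same kind of query a scaffolded model may be able to detect.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Copy every row and overwrite only the feature of interest.
        modified = [row[:feature] + [value] + row[feature + 1:] for row in rows]
        preds = predict(modified)
        curve.append(sum(preds) / len(preds))
    return curve


# Toy model (hypothetical): the prediction is 2 * x0 + x1.
def toy_predict(rows):
    return [2.0 * r[0] + r[1] for r in rows]


rows = [[0.0, 1.0], [1.0, 3.0]]
curve = partial_dependence(toy_predict, rows, feature=0, grid=[0.0, 1.0])
```

For this toy model the curve rises by 2 per unit of the first feature, matching its coefficient.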

Analysis (5)

Use different perturbation approaches and compare the impact on being fooled.
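To make the comparison concrete, here are two generic perturbation strategies in plain Python. They are illustrative assumptions only and are not claimed to be the algorithms implemented in this project: `gaussian_perturb` adds noise around each row, while `marginal_perturb` resamples every feature independently and so breaks correlations between features.

```python
import random
import statistics


def gaussian_perturb(rows, rng, scale=1.0):
    """Add zero-mean Gaussian noise, scaled by each column's standard deviation."""
    stds = [statistics.pstdev(col) for col in zip(*rows)]
    return [
        [x + rng.gauss(0.0, scale * s) for x, s in zip(row, stds)]
        for row in rows
    ]


def marginal_perturb(rows, rng):
    """Resample every feature independently from its own observed values."""
    columns = [list(col) for col in zip(*rows)]
    return [[rng.choice(col) for col in columns] for _ in rows]


rng = random.Random(0)
# Toy data (hypothetical): two perfectly correlated features.
rows = [[float(i), float(i)] for i in range(10)]
noisy = gaussian_perturb(rows, rng, scale=0.5)
shuffled = marginal_perturb(rows, rng)
```

How far the perturbed rows drift from the real data differs between strategies, which is one plausible reason why they could differ in how easily the explainer is fooled.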

Hyperparameter Sensitivity (10)

Analyze the impact of the hyperparameters of LIME and SHAP (e.g., hyperparameters of the local model and of the perturbation algorithms).
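One hyperparameter of the local model is the kernel width that turns the distance between a perturbed sample and the explained instance into a sample weight. The sketch below uses an exponential kernel of the form commonly associated with LIME; the exact formula, the function name `exponential_kernel`, and the chosen widths are assumptions for illustration.

```python
import math


def exponential_kernel(distance, kernel_width):
    """Weight of a perturbed sample at `distance` from the explained instance."""
    return math.sqrt(math.exp(-(distance ** 2) / kernel_width ** 2))


distances = [0.0, 1.0, 2.0]
narrow = [exponential_kernel(d, 0.5) for d in distances]
wide = [exponential_kernel(d, 2.0) for d in distances]
```

A narrow kernel gives almost no weight to distant samples, whereas a wide kernel lets them influence the local model. Since distant samples are the ones most likely to look unrealistic, the kernel width might change how much of the scaffolded model's innocuous behaviour ends up in the explanation.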

New Datasets (5)

Find at least two further (tabular) datasets with a risk of discrimination (that are not mentioned in the paper) and study the impact of fooling on them.

Datasets

Limitations / Further improvement

  • The current framework can only handle regression and binary classification tasks
  • Only one biased input feature can be hidden
  • Only numerical features are considered
  • Currently, 3 perturbation algorithms are implemented
