
Explainer API for Local Classifier per parent node #minor #106

Merged Mar 27, 2024 · 68 commits
Changes from 58 commits
62c218d
added initial implementation of explainer api for lcppn
Jan 9, 2024
ea1fff8
fixed lints
Jan 10, 2024
c4d75c5
fixed lints
Jan 14, 2024
299af62
added an _explain_lcppn implementation and some tests provided
iwan-tee Jan 14, 2024
0ea8956
modified docstrings
Jan 14, 2024
1efd946
explainer for lcpn implemented + tests added and some cases fixed
iwan-tee Jan 14, 2024
7dcb52f
Merge branch 'explainer_api_lcpn' into explainer_api
iwan-tee Jan 14, 2024
1de360c
tests added + some bugs fixed
iwan-tee Jan 15, 2024
933b1f6
base
iwan-tee Jan 15, 2024
c57abed
basic implementation
iwan-tee Jan 18, 2024
a829ce0
LCPL explanator implementation + test
iwan-tee Jan 23, 2024
33f2cbc
added tests for hierarchy without roots
Jan 26, 2024
c06d8a7
check on root node added
iwan-tee Jan 26, 2024
c597fce
minor updates
Jan 26, 2024
b79e5f4
codestyling
iwan-tee Jan 26, 2024
8a643f1
codestyling
iwan-tee Jan 26, 2024
606c1eb
Merge branch 'explainer_master' into explainer_api_lcpl
ashishpatel16 Jan 26, 2024
ca6c654
Update Explainer.py
ashishpatel16 Jan 26, 2024
9936dc3
Merge pull request #1 from ashishpatel16/explainer_api_lcpl
ashishpatel16 Jan 26, 2024
82573be
Merge pull request #2 from ashishpatel16/explainer_api_lcpn
ashishpatel16 Jan 26, 2024
d53e8d9
added support for xarray for lcppn
Jan 29, 2024
2449928
Merge branch 'explainer_master' into explainer_api
ashishpatel16 Jan 29, 2024
759489f
Update Explainer.py
ashishpatel16 Jan 29, 2024
0771c08
Update Explainer.py
ashishpatel16 Jan 29, 2024
4eb6f5c
fixed errors with classifier with single class
Jan 30, 2024
3955521
updated test cases and removed cached explainers
Feb 1, 2024
7c2f4d2
removed cached explainers
Feb 1, 2024
b12bdc3
modified predict proba to return dict
Feb 1, 2024
986b61c
Merge branch 'main' into explainer_api
ashishpatel16 Feb 2, 2024
eb11c0e
updated get_predict_proba to return only traversed prediction probabi…
Feb 3, 2024
8c700e4
updated fork
Feb 3, 2024
53a90a0
separate test file for explainer
Feb 3, 2024
5e74762
Update Explainer.py
ashishpatel16 Feb 5, 2024
b1f3656
_get_traversed_nodes edited
iwan-tee Feb 6, 2024
2a12087
fixed lints
Feb 12, 2024
84f6e39
fixed conflicts
Feb 12, 2024
b09f8da
refactored and cleaned up code
Feb 12, 2024
aecdd96
updated test cases and isolated lcppn code
Feb 12, 2024
9658c4a
Merge branch 'main' into lcppn_explainer
ashishpatel16 Feb 13, 2024
9a73b6c
added support for lcpn
Feb 16, 2024
c5b5a68
Merge branch 'main' into lcpn_explainer
ashishpatel16 Mar 14, 2024
dc99b44
updated explainer and tests, added docstrings
ashishpatel16 Mar 15, 2024
06acdca
updated readthedocs
ashishpatel16 Mar 15, 2024
7da5779
updated README with Explainer example
ashishpatel16 Mar 15, 2024
139ad11
fixed imports
ashishpatel16 Mar 16, 2024
707d51e
removed unecessary files
ashishpatel16 Mar 20, 2024
33e1548
added tests, updated dependencies in setup.py and docs/requirements.txt
ashishpatel16 Mar 22, 2024
2a19d40
fixed lints
ashishpatel16 Mar 22, 2024
53288f2
isolated lcpn code and removed lcppn code from explainer
ashishpatel16 Mar 22, 2024
253a8de
fixed shap version
ashishpatel16 Mar 24, 2024
59ba63c
merged lcpn_epxlainer
ashishpatel16 Mar 25, 2024
c2378bb
separated code for lcppn and added tests
ashishpatel16 Mar 25, 2024
10bad49
Update plot_lcppn_explainer.py
ashishpatel16 Mar 25, 2024
81817c3
removed get_predict_proba() method from LocalClassifierPerParentNode
ashishpatel16 Mar 25, 2024
9ba6f28
removed redundant dependencies from pipfile
ashishpatel16 Mar 25, 2024
04a9ed8
Merge remote-tracking branch 'origin/lcppn_explainer' into lcppn_expl…
ashishpatel16 Mar 25, 2024
1b90da8
used masking approach to calculate traversed nodes
ashishpatel16 Mar 25, 2024
d914fd4
handled cases for imbalanced hierarchy
ashishpatel16 Mar 25, 2024
17c96ef
removed hiclass separator from output
ashishpatel16 Mar 26, 2024
fde3040
Update tests/test_LocalClassifierPerParentNode.py
ashishpatel16 Mar 26, 2024
01bf32e
Update hiclass/Explainer.py
ashishpatel16 Mar 26, 2024
ada4e88
refactored _get_traversed_nodes, will be three distinct methods for …
ashishpatel16 Mar 26, 2024
8dbf6e5
Merge remote-tracking branch 'origin/lcppn_explainer' into lcppn_expl…
ashishpatel16 Mar 26, 2024
a0d3f59
fixed xarray dependency version
ashishpatel16 Mar 26, 2024
ae48bf0
updated documentation and fixed typos
ashishpatel16 Mar 26, 2024
820c1a5
updated plot_lcppn_explainer to use platypus dataset
ashishpatel16 Mar 26, 2024
af7e83a
updated README
ashishpatel16 Mar 27, 2024
3c93a98
updated url for platypus dataset
ashishpatel16 Mar 27, 2024
2 changes: 1 addition & 1 deletion Pipfile
@@ -19,5 +19,5 @@ sphinx-rtd-theme = "0.5.2"

[extras]
ray = "*"
-shap = "*"
+shap = "0.44.1"
xarray = "*"
81 changes: 78 additions & 3 deletions Pipfile.lock


32 changes: 32 additions & 0 deletions README.md
@@ -199,6 +199,38 @@ pipeline.fit(X_train, Y_train)
predictions = pipeline.predict(X_test)
```

## Explaining Hierarchical Classifiers
Hierarchical classifiers can provide additional insights when combined with explainability methods such as SHAP values. Below is a simple example to demonstrate how to calculate hierarchical SHAP values:
```python
from hiclass import LocalClassifierPerParentNode, Explainer
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Define data
X_train = np.array([[1], [2], [3], [4]])
X_test = np.array([[4], [3], [2], [1]])
Y_train = np.array([
    ['Animal', 'Mammal', 'Sheep'],
    ['Animal', 'Mammal', 'Cow'],
    ['Animal', 'Reptile', 'Snake'],
    ['Animal', 'Reptile', 'Lizard'],
])

# Use random forest classifiers for every node
rf = RandomForestClassifier()
classifier = LocalClassifierPerParentNode(local_classifier=rf, replace_classifiers=False)

# Train local classifier per node
classifier.fit(X_train, Y_train)

# Predict
predictions = classifier.predict(X_test)

# Explain
explainer = Explainer(classifier, data=X_train, mode="tree")
explanations = explainer.explain(X_test)
```

## Step-by-step walk-through

A step-by-step walk-through is available on our documentation hosted on [Read the Docs](https://hiclass.readthedocs.io/en/latest/index.html).
45 changes: 45 additions & 0 deletions docs/examples/plot_lcppn_explainer.py
@@ -0,0 +1,45 @@
# -*- coding: utf-8 -*-
"""
============================================
Explaining Local Classifier Per Parent Node
============================================

A minimalist example showing how to use the HiClass Explainer to obtain SHAP values for an LCPPN model.
A detailed summary of the Explainer class is given in the Algorithms Overview section under :ref:`Hierarchical Explainability`.
"""
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from hiclass import LocalClassifierPerParentNode, Explainer

# Define data
X_train = np.array(
    [
        [40.7, 1.0, 1.0, 2.0, 5.0, 2.0, 1.0, 5.0, 34.3],
        [39.2, 0.0, 2.0, 4.0, 1.0, 3.0, 1.0, 2.0, 34.1],
        [40.6, 0.0, 3.0, 1.0, 4.0, 5.0, 0.0, 6.0, 27.7],
        [36.5, 0.0, 3.0, 1.0, 2.0, 2.0, 0.0, 2.0, 39.9],
    ]
)
X_test = np.array([[35.5, 0.0, 1.0, 1.0, 3.0, 3.0, 0.0, 2.0, 37.5]])
Y_train = np.array(
    [
        ["Gastrointestinal", "Norovirus", ""],
        ["Respiratory", "Covid", ""],
        ["Allergy", "External", "Bee Allergy"],
        ["Respiratory", "Cold", ""],
    ]
)

# Use random forest classifiers for every node
rfc = RandomForestClassifier()
classifier = LocalClassifierPerParentNode(
    local_classifier=rfc, replace_classifiers=False
)

# Train local classifier per node
classifier.fit(X_train, Y_train)

# Define Explainer
explainer = Explainer(classifier, data=X_train, mode="tree")
explanations = explainer.explain(X_test)
print(explanations)
2 changes: 2 additions & 0 deletions docs/requirements.txt
@@ -9,3 +9,5 @@ pandas==1.4.2
ray==1.13.0
numpy
git+https://github.com/charles9n/bert-sklearn.git@master
shap==0.44.1
xarray
Binary file added docs/source/algorithms/explainer-indexing.png
135 changes: 135 additions & 0 deletions docs/source/algorithms/explainer.rst
@@ -0,0 +1,135 @@
.. _explainer-overview:

===========================
Hierarchical Explainability
===========================
HiClass also provides support for eXplainable AI (XAI) using SHAP values. This section demonstrates the Explainer class along with examples and design principles.

++++++++++++++++++++++++++
Motivation
++++++++++++++++++++++++++

Explainability in machine learning refers to the ability to understand and interpret how a model arrives at a particular decision. Several explainability methods are described in the literature and have found use across a wide range of machine learning applications.

SHAP values are one such method: they provide a unified measure of feature importance that accounts for the contribution of each feature to the model's prediction. Grounded in cooperative game theory, they distribute credit for the prediction fairly among the features.

Integrating explainability methods into hierarchical classifiers can yield promising results depending on the application domain. Hierarchical explainability extends the concept of SHAP values to hierarchical classification models.

++++++++++++++++++++++++++
Dataset overview
++++++++++++++++++++++++++
For the remainder of this section, we will use a synthetically generated dataset representing platypus diseases. This tabular dataset was created to visualize and test explainability methods on hierarchical models. The diagram below illustrates the hierarchical structure of the dataset. Using nine symptoms as features (fever, diarrhea, stomach pain, skin rash, cough, sniffles, shortness of breath, headache, and body size), the objective is to predict the disease from these feature values.

.. figure:: ../algorithms/platypus_diseases_hierarchy.svg
   :align: center
   :width: 100%

   Hierarchical structure of the synthetic dataset representing platypus diseases.

++++++++++++++++++++++++++
Background
++++++++++++++++++++++++++
This section introduces two main concepts: hierarchical classification and SHAP values. Hierarchical classification leverages the hierarchical structure of data, breaking down the classification task into manageable sub-tasks using models organized in a DAG structure.

SHAP values, adapted from game theory, show the impact of features on model predictions, thus aiding model interpretation. The SHAP library offers practical implementation of these methods, supporting various machine learning algorithms for explanation generation.

To demonstrate how SHAP values provide insights into model prediction, consider the following sample from the platypus disease dataset.

.. code-block:: python

    test_sample = np.array([[35.5, 0.0, 1.0, 1.0, 3.0, 3.0, 0.0, 2.0, 37.5]])
    sample_target = np.array([['Respiratory', 'Cold', '']])

We can calculate SHAP values using the SHAP Python package and visualize them. SHAP values tell us how much each symptom "contributes" to the model's decision about which disease a platypus might have. The following diagram illustrates how SHAP values can be visualized using :literal:`shap.force_plot`.


.. figure:: ../algorithms/shap_explanation.png
   :align: center
   :width: 100%

   Force plot illustrating the influence of symptoms on predicting platypus diseases using SHAP values. Each bar represents a symptom, with its length indicating the magnitude of impact on disease prediction.
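
To build intuition for how these contributions are computed, here is a toy exact Shapley-value calculation for a two-feature additive model, using only NumPy and the classical permutation formula. This is illustrative only; the SHAP library computes these values efficiently for real models, and the model, background data, and sample here are made up for the example.

```python
import itertools
import math
import numpy as np

# Toy additive model: f(x) = x0 + 2 * x1
def f(x):
    return x[0] + 2 * x[1]

background = np.array([[0.0, 0.0], [2.0, 2.0]])  # reference/background data
x = np.array([1.0, 3.0])                         # sample to explain

def value(subset):
    # Expected model output when the features in `subset` are fixed to x
    # and the remaining features are drawn from the background data.
    z = background.copy()
    for i in subset:
        z[:, i] = x[i]
    return float(np.mean([f(row) for row in z]))

n = len(x)
phi = np.zeros(n)
for perm in itertools.permutations(range(n)):
    fixed = []
    for i in perm:
        before = value(fixed)
        fixed.append(i)
        phi[i] += (value(fixed) - before) / math.factorial(n)

# phi == [0.0, 4.0] here, and the attributions sum to the model output
# minus the expected output: phi.sum() == f(x) - value([]) == 7 - 3 == 4
```

The key property visible even in this toy setting is additivity: the per-feature attributions always sum to the difference between the model's output for the sample and its expected output over the background data.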


++++++++++++++++++++++++++
API Design
++++++++++++++++++++++++++

Designing an API for hierarchical classifiers and SHAP value computation presents several challenges: complex data structures, difficulty accessing the SHAP values that correspond to a particular classifier, and slow computation. We address these issues by using an xarray dataset for efficient organization, filtering, and storage of SHAP values, and by parallelizing computation with joblib. These enhancements provide a streamlined, user-friendly experience when working with hierarchical classifiers and SHAP values.

.. figure:: ../algorithms/explainer-indexing.png
   :align: center
   :width: 75%

   Pictorial representation of the dimensions along which indexing of hierarchical SHAP values is required.
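
The joblib parallelization can be sketched as follows. The per-node function below is a hypothetical stand-in for fitting a SHAP explainer on one local classifier and computing its SHAP values; it is not HiClass's actual internals, and the node names and data are made up for the example.

```python
import numpy as np
from joblib import Parallel, delayed

def explain_node(node_name, X):
    # Placeholder for the real work: fitting a SHAP explainer on the
    # local classifier at this node and computing its SHAP values.
    return node_name, X.sum(axis=1)

nodes = ["Root", "Respiratory", "Allergy"]   # hypothetical hierarchy nodes
X = np.ones((4, 9))                          # 4 samples, 9 symptom features

# Each local classifier can be explained independently, so the work
# parallelizes cleanly across the nodes of the hierarchy.
results = Parallel(n_jobs=2)(delayed(explain_node)(n, X) for n in nodes)
```

Because the local classifiers are independent of each other once the model is fitted, this kind of per-node parallelism scales with the number of nodes rather than the depth of the hierarchy.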

The Explainer class takes a fitted HiClass model, training data, and some named parameters as input. After creating an instance of the Explainer, the explain method can be called by providing the samples for which SHAP values need to be calculated.

.. code-block:: python

    explainer = Explainer(fitted_hiclass_model, data=training_data)

The Explainer returns an xarray.Dataset object, which allows users to intuitively access, filter, slice, and plot SHAP values. This Explanation dataset can also be explored interactively within a Jupyter notebook environment. The Explanation object, along with its attributes, is depicted in the following UML diagram.

.. figure:: ../algorithms/hiclass-uml.png
   :align: center
   :width: 100%

   UML diagram showing the relationship between the HiClass Explainer and the returned Explanation object.

The Explanation object can be obtained by calling the :literal:`explain` method of the Explainer.

.. code-block:: python

    explanations = explainer.explain(sample_data)


++++++++++++++++++++++++++
Code sample
++++++++++++++++++++++++++

.. code-block:: python

    from sklearn.ensemble import RandomForestClassifier
    import numpy as np
    from hiclass import LocalClassifierPerParentNode, Explainer

    rfc = RandomForestClassifier()
    lcppn = LocalClassifierPerParentNode(local_classifier=rfc, replace_classifiers=False)

    x_train = np.array([
        [40.7, 1.0, 1.0, 2.0, 5.0, 2.0, 1.0, 5.0, 34.3],
        [39.2, 0.0, 2.0, 4.0, 1.0, 3.0, 1.0, 2.0, 34.1],
        [40.6, 0.0, 3.0, 1.0, 4.0, 5.0, 0.0, 6.0, 27.7],
        [36.5, 0.0, 3.0, 1.0, 2.0, 2.0, 0.0, 2.0, 39.9],
    ])
    y_train = np.array([
        ['Gastrointestinal', 'Norovirus', ''],
        ['Respiratory', 'Covid', ''],
        ['Allergy', 'External', 'Bee Allergy'],
        ['Respiratory', 'Cold', ''],
    ])

    x_test = np.array([[35.5, 0.0, 1.0, 1.0, 3.0, 3.0, 0.0, 2.0, 37.5]])

    lcppn.fit(x_train, y_train)
    explainer = Explainer(lcppn, data=x_train, mode="tree")
    explanations = explainer.explain(x_test)


++++++++++++++++++++++++++
Filtering and Manipulation
++++++++++++++++++++++++++

The Explanation object returned by the Explainer is built on an xarray dataset, so any xarray Dataset operation can be applied to it. For example, specific values can easily be filtered out. To illustrate, suppose we have SHAP values stored in an Explanation object named :literal:`explanations`.

A common use case is to extract SHAP values for only the predicted nodes. In the Local Classifier per Parent Node approach, every node except the leaf nodes represents a classifier, so we can select the prediction up to the penultimate element to obtain the corresponding SHAP values.
To achieve this, we can use xarray's :literal:`.sel()` method:

.. code-block:: python

    mask = {'class': lcppn.predict(x_test).flatten()[:-1]}
    x = explanations.sel(mask).shap_values

More advanced usage and capabilities can be found at the `Xarray.Dataset <https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html>`_ documentation.
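
As a self-contained illustration of this kind of filtering, consider a small synthetic stand-in for the Explanation dataset. The dimension and variable names here (:literal:`class`, :literal:`sample`, :literal:`feature`, :literal:`shap_values`) are assumptions chosen to mirror the example above, not HiClass's exact schema.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the Explanation dataset: SHAP values for
# 3 classes, 1 sample, and 9 symptom features.
rng = np.random.default_rng(0)
explanations = xr.Dataset(
    {"shap_values": (("class", "sample", "feature"), rng.random((3, 1, 9)))},
    coords={
        "class": ["Respiratory", "Cold", "Covid"],
        "sample": [0],
        "feature": [f"symptom_{i}" for i in range(9)],
    },
)

# Select the SHAP values for the classifiers along a predicted path,
# exactly as with the .sel() call shown above.
predicted_path = ["Respiratory", "Cold"]
subset = explanations.sel({"class": predicted_path}).shap_values
```

Selecting by coordinate labels in this way keeps the remaining dimensions intact, so :literal:`subset` still carries the sample and feature axes for further slicing or plotting.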


Binary file added docs/source/algorithms/hiclass-uml.png
1 change: 1 addition & 0 deletions docs/source/algorithms/index.rst
@@ -16,3 +16,4 @@ HiClass provides implementations for the most popular machine learning models fo
local_classifier_per_level
multi_label
metrics
explainer
1 change: 1 addition & 0 deletions docs/source/algorithms/platypus_diseases_hierarchy.svg
Binary file added docs/source/algorithms/shap_explanation.png
10 changes: 10 additions & 0 deletions docs/source/api/explainer_api.rst
@@ -0,0 +1,10 @@
.. _explainer_api:

Explainer
========================

Explainer
-----------------------
.. autoclass:: Explainer.Explainer
   :members:
   :special-members: __init__
1 change: 1 addition & 0 deletions docs/source/api/index.rst
@@ -13,3 +13,4 @@ This is done in order to provide a complete list of the callable functions for e

classifiers
utilities
explainer_api