auto extend mixin classes
Automatically load the RandomForestExplainer and XGBExplainer mixin
classes whenever a RandomForest or xgboost model is passed.
oegedijk committed Sep 27, 2020
1 parent 40e6ad0 commit 6af36e7
Showing 10 changed files with 145 additions and 42 deletions.
Binary file modified .DS_Store
Binary file not shown.
11 changes: 5 additions & 6 deletions RELEASE_NOTES.md
@@ -11,23 +11,22 @@
XGBClassifierExplainer and XGBRegressionExplainer.
- new parameter n_jobs for calculations that can be parallelized (e.g. permutation importances)
- contrib_df, plot_shap_contributions: can order by global shap feature
-'importance' (as well as 'abs', 'high-to-low', 'low-to-high')
+'importance' (as well as 'abs', 'high-to-low' and 'low-to-high')
- added actual outcome to plot_trees

### Bug Fixes
-
-

### Improvements

- added selenium integration tests for dashboards (also working with github actions)
- added tests for multiclass classification, DecisionTree and ExtraTrees models
- added proper docstrings to explainer_methods.py
- optimized code for calculating permutation importance, adding the possibility to calculate in parallel
- shap dependence component: if no color col selected, output standard blue dots instead of ignoring update

### Other Changes
-
- added selenium integration tests for dashboards (also working with github actions)
- added tests for multiclass classification, DecisionTree and ExtraTrees models
- added tests for XGBExplainers
- added proper docstrings to explainer_methods.py

## Version 0.2.2

5 changes: 0 additions & 5 deletions TODO.md
@@ -27,7 +27,6 @@
- add target name
- add plain language explanations


## notebooks:
- add binder/colab links on github

@@ -48,11 +47,9 @@
- test model_output='probability' and 'raw' or 'logodds' separately
- write tests for explainer_methods
- write tests for explainer_plots
- write tests for XGBoostExplainers
- add test coverage

## Docs:
- add documentation for XGBExplainers
- add docstrings to explainer_plots
- add screenshots of components to docs
- move screenshots to separate folder
@@ -62,8 +59,6 @@


## Library level:
- Add launch from colab option (mode='colab'?):
- https://amitness.com/2020/06/google-colaboratory-tips/?s=03
- Add Altair (vega) plots for easy inclusion in websites or fastpages blogs
- Long term: add option to load from directory with pickled model, data csv and config file
- add more screenshots to README with https://postimages.org/
Binary file modified docs/.DS_Store
Binary file not shown.
14 changes: 8 additions & 6 deletions docs/source/dashboards.rst
@@ -94,22 +94,23 @@ However it would be easy to turn this custom ``FeatureListTab`` into a proper
Starting a multitab dashboard
-----------------------------

-Besided using the booleans as described above, you can also pass a list of
+Besides the single page dashboard above, you can also pass a list of
``ExplainerComponents`` to construct multiple tabs. These can be a mix of
the different types discussed above. E.g.::

ExplainerDashboard(explainer, [ImportancesTab, imp_tab, "importances", features]).run()

This would start a dashboard with three importances tabs, plus our custom
-feature list tab. (not sure why you would do that, but you get the point :)
+feature list tab. (not sure why you would do that, but hopefully you get the point :)


-Using Dash or JupyterDash
--------------------------
+Using explainerdashboard inside Jupyter notebook or google colab
+----------------------------------------------------------------

You can start the dashboard with the standard ``dash.Dash()`` server or with the
-new notebook friendly ``jupyter_dash`` library server. The latter will allow you
+new notebook friendly ``jupyter_dash`` server. The latter will allow you
to keep working interactively in your notebook while the dashboard is running.
Also, this allows you to run an explainerdashboard from within google colab!

The default dash server is started with ``mode='dash'``. There are three different
options for ``jupyter_dash``: ``mode='inline'`` for running the dashboard in an
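(The rest of this passage is collapsed in the diff. A minimal sketch of how the
modes would be used; ``mode='external'`` is an assumption based on
``jupyter_dash``'s documented modes)::

    ExplainerDashboard(explainer, mode='dash').run()      # standard dash server
    ExplainerDashboard(explainer, mode='inline').run()    # render inside the notebook cell
    ExplainerDashboard(explainer, mode='external').run()  # serve in a separate browser tab (assumed)
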
@@ -136,7 +137,7 @@ it with the ``external_stylesheets`` parameter. Additional info on styling boots
layout can be found at: https://dash-bootstrap-components.opensource.faculty.ai/docs/themes/

You can add a theme by putting it in an ``/assets/`` subfolder, or by linking to it directly.
-`dash_bootstrap_components` offer a convenient way of inserting these::
+``dash_bootstrap_components`` offers a convenient way of inserting these::

import dash_bootstrap_components as dbc
ExplainerDashboard(explainer, ["contributions", "model_summary"],
@@ -190,6 +191,7 @@ You then start the dashboard on the commandline with::

gunicorn dashboard:server

See the deployment section for more info on using explainerdashboard in production.

ExplainerDashboard documentation
--------------------------------
42 changes: 35 additions & 7 deletions docs/source/deployment.rst
@@ -7,7 +7,7 @@ server but use more robust and scalable options like ``gunicorn`` and ``nginx``.
Deploying a single dashboard instance
=====================================

-``Dash`` is built on top of ``Flask``, and so the dashbaord instance
+``Dash`` is built on top of ``Flask``, and so the dashboard instance
contains a Flask server. You can simply expose this server to host your dashboard.

The server can be found in ``ExplainerDashboard().app.server`` or with
@@ -21,21 +21,17 @@ The code below is from `the deployed example to heroku <https://github.com/oeged
from explainerdashboard.dashboards import *
from explainerdashboard.datasets import *

-print('loading data...')
X_train, y_train, X_test, y_test = titanic_survive()
train_names, test_names = titanic_names()

-print('fitting model...')
model = RandomForestClassifier(n_estimators=50, max_depth=5)
model.fit(X_train, y_train)

-print('building Explainer...')
explainer = RandomForestClassifierExplainer(model, X_test, y_test,
                            cats=['Sex', 'Deck', 'Embarked'],
                            idxs=test_names,
                            labels=['Not survived', 'Survived'])

-print('Building ExplainerDashboard...')
db = ExplainerDashboard(explainer)

server = db.app.server
@@ -55,7 +51,6 @@ to preload the app before starting::
gunicorn -w 3 --preload -b localhost:8050 dashboard:server



Deploying dashboard as part of Flask app on specific route
==========================================================

@@ -84,7 +79,40 @@ Now you can start the dashboard by::

And you can visit the dashboard on ``http://localhost:8050/dashboard``.
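The mounting code itself is collapsed in this diff. A minimal sketch of the
usual pattern (the ``server`` and ``url_base_pathname`` arguments are an
assumption here, based on later releases)::

    from flask import Flask
    from explainerdashboard.dashboards import ExplainerDashboard

    app = Flask(__name__)

    # hypothetical: mount the dashboard's Dash app on the existing Flask server
    db = ExplainerDashboard(explainer, server=app, url_base_pathname="/dashboard/")

    # run the Flask app; the dashboard is then served under /dashboard/
    app.run(port=8050)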

Avoid timeout by precalculating explainers and loading with joblib
==================================================================

Some of the calculations needed to generate e.g. the SHAP values and permutation
importances can take quite a long time (especially shap interaction values).
Long enough to break the startup timeout of gunicorn. Therefore it is better
to first calculate all these values, save the explainer to disk, and then load
the explainer when starting the dashboard::

import joblib
from explainerdashboard.explainer import ClassifierExplainer
explainer = ClassifierExplainer(model, X_test, y_test,
                                cats=['Sex', 'Deck', 'Embarked'],
                                labels=['Not survived', 'Survived'])
explainer.calculate_properties()
joblib.dump(explainer, "explainer.pkl")

Then in ``dashboard.py`` load the explainer and start the dashboard::

import joblib
from explainerdashboard.dashboards import ExplainerDashboard

explainer = joblib.load("explainer.pkl")
db = ExplainerDashboard(explainer)
server = db.app.server

And start it with gunicorn::

gunicorn -b localhost:8050 dashboard:server


Deploying as part of a multipage dash app
=========================================

**Under Construction**

74 changes: 59 additions & 15 deletions docs/source/explainers.rst
@@ -5,13 +5,15 @@ Simple example
==============

In order to start an ``ExplainerDashboard`` you first need to construct an
-``Explainer`` instance. They come in four flavours and at its most basic they
+``Explainer`` instance. They come in six flavours and at its most basic they
only need a model and a test set X and y::

explainer = ClassifierExplainer(model, X_test, y_test)
explainer = RegressionExplainer(model, X_test, y_test)
explainer = RandomForestClassifierExplainer(model, X_test, y_test)
explainer = RandomForestRegressionExplainer(model, X_test, y_test)
explainer = XGBClassifierExplainer(model, X_test, y_test)
explainer = XGBRegressionExplainer(model, X_test, y_test)

This is enough to launch an ExplainerDashboard::

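# collapsed in this diff; the minimal launch, as shown in dashboards.rst
ExplainerDashboard(explainer).run()
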
@@ -78,7 +80,7 @@ For the titanic example this would be:
So you would pass ``cats=['Sex', 'Deck', 'Embarked']``. You can now use these
categorical features directly as input for plotting methods, e.g.
``explainer.plot_shap_dependence("Deck")``. For other methods you can pass
-a parameter ``shap=True``, to indicate that you'd like to group the categorical
+a parameter ``cats=True``, to indicate that you'd like to group the categorical
features in your output.
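For example (the dependence plot call is quoted from the text above;
``plot_importances(cats=True)`` is an assumed illustration of passing
``cats=True`` to another method)::

    explainer = ClassifierExplainer(model, X_test, y_test,
                                    cats=['Sex', 'Deck', 'Embarked'])
    explainer.plot_shap_dependence("Deck")   # plot the grouped categorical feature
    explainer.plot_importances(cats=True)    # group categorical features in the output (assumed)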

idxs
@@ -89,7 +91,7 @@ If you pass these to the Explainer object,
numerical index, e.g. ``explainer.contrib_df(0)`` for the first row, or using the
identifier, e.g. ``explainer.contrib_df("Braund, Mr. Owen Harris")``.

-The proper name or idx will be use used in all ``ExplainerComponents`` that
+The proper name or idxs will be used in all ``ExplainerComponents`` that
allow index selection.

descriptions
@@ -169,7 +171,7 @@ LogisticRegression::
permutation_cv
--------------

-Normally permuation importances get calculated over a single fold (assuming the
+Normally permutation importances get calculated over a single fold (assuming the
data is the test set). However if you pass the training set to the explainer,
you may wish to cross-validate the permutation importance calculation. In that
case pass the number of folds to ``permutation_cv``.
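For example, assuming the explainer gets fitted on the training set::

    explainer = ClassifierExplainer(model, X_train, y_train,
                                    permutation_cv=5)  # 5-fold cross-validated permutation importances
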
@@ -179,7 +181,7 @@ na_fill

If you fill missing values with some extreme value such as ``-999`` (typical for
tree based methods), these can mess with the horizontal axis of your plots.
-In order to filter these out, you need to tell the expaliner what the extreme value
+In order to filter these out, you need to tell the explainer what the extreme value
is that you used to fill. Defaults to ``-999``.
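For example, if you filled missing values with ``-999`` before training::

    explainer = RegressionExplainer(model, X_test, y_test, na_fill=-999)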

Plots
@@ -336,12 +338,12 @@ plot_residuals_vs_feature
.. automethod:: explainerdashboard.explainers.RegressionExplainer.plot_residuals_vs_feature


-RandomForest Plots
+DecisionTree Plots
------------------

-There is an additional mixin class specifically for ``sklearn`` ``RandomForests``
-that defines additional methods and plots to investigate and visualize
-individual decision trees within the random forest. ``RandomForestExplainer``
+There are additional mixin classes specifically for ``sklearn`` ``RandomForests``
+and for xgboost models that define additional methods and plots to investigate
+and visualize individual decision trees within the ensemble. This functionality
uses the ``dtreeviz`` library to visualize individual decision trees.

You can get a pd.DataFrame summary of the path that a specific index row took
@@ -350,7 +352,7 @@ You can also plot the individual predictions of each individual tree for
a specific row in your data identified by ``index``::

explainer.decisiontree_df(tree_idx, index)
-explainer.decisiontree_df_summary(tree_idx, index)
+explainer.decisiontree_summary_df(tree_idx, index)
explainer.plot_trees(index)

And for dtreeviz visualization of individual decision trees (svg format)::
@@ -359,10 +361,14 @@
explainer.decision_path_file(tree_idx, index)
explainer.decision_path_encoded(tree_idx, index)

-This also works with classifiers and regression models::
+These methods are not available with the standard ``ClassifierExplainer`` and
+``RegressionExplainer`` classes, so you need to use the specific
+Explainer classes for these models::

explainer = RandomForestClassifierExplainer(model, X, y)
explainer = RandomForestRegressionExplainer(model, X, y)
explainer = XGBClassifierExplainer(model, X, y)
explainer = XGBRegressionExplainer(model, X, y)

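Putting the tree inspection methods above together (the ``tree_idx`` and
``index`` values are arbitrary examples)::

    explainer = RandomForestClassifierExplainer(model, X, y)
    explainer.plot_trees(index=0)                           # every tree's prediction for row 0
    explainer.decisiontree_summary_df(tree_idx=5, index=0)  # path row 0 takes through tree 5
    explainer.decision_path(tree_idx=5, index=0)            # dtreeviz visualization of that path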

plot_trees
@@ -514,7 +520,7 @@ RandomForest outputs
For ``RandomForestExplainer``::

decisiontree_df(tree_idx, index, pos_label=None)
-decisiontree_df_summary(tree_idx, index, round=2, pos_label=None)
+decisiontree_summary_df(tree_idx, index, round=2, pos_label=None)
decision_path_file(tree_idx, index)


@@ -523,10 +529,10 @@ decisiontree_df

.. automethod:: explainerdashboard.explainers.RandomForestExplainer.decisiontree_df

-decisiontree_df_summary
+decisiontree_summary_df
^^^^^^^^^^^^^^^^^^^^^^^

-.. automethod:: explainerdashboard.explainers.RandomForestExplainer.decisiontree_df_summary
+.. automethod:: explainerdashboard.explainers.RandomForestExplainer.decisiontree_summary_df

decision_path_file
^^^^^^^^^^^^^^^^^^
@@ -654,8 +660,14 @@ More examples in the `notebook on the github repo. <https://github.com/oegedijk/
RandomForestExplainer
=====================

The ``RandomForestExplainer`` mixin class provides additional functionality
in order to explore individual decision trees within the RandomForest.
This can be very useful for showing stakeholders that a RandomForest is
indeed just a collection of simple decision trees that you then calculate
the average of.

.. autoclass:: explainerdashboard.explainers.RandomForestExplainer
-:members: decisiontree_df, decisiontree_df_summary, plot_trees, decision_path
+:members: decisiontree_df, decisiontree_summary_df, plot_trees, decision_path
:member-order: bysource
:exclude-members:
:noindex:
@@ -677,4 +689,36 @@ RandomForestRegressionExplainer
:noindex:


XGBExplainer
============

The ``XGBExplainer`` mixin class provides additional functionality
in order to explore individual decision trees within an xgboost ensemble model.
This can be very useful for showing stakeholders that an xgboost model is
indeed just a collection of simple decision trees that get summed together.


.. autoclass:: explainerdashboard.explainers.XGBExplainer
:members: decisiontree_df, decisiontree_summary_df, plot_trees, decision_path
:member-order: bysource
:exclude-members:
:noindex:

XGBClassifierExplainer
----------------------

.. autoclass:: explainerdashboard.explainers.XGBClassifierExplainer
:member-order: bysource
:exclude-members: __init__
:noindex:

XGBRegressionExplainer
----------------------

.. autoclass:: explainerdashboard.explainers.XGBRegressionExplainer
:member-order: bysource
:exclude-members: __init__
:noindex:



2 changes: 1 addition & 1 deletion docs/source/inline.rst
@@ -4,7 +4,7 @@ InlineExplainer
As a data scientist you often work inside a notebook environment where you
like to quickly and interactively explore your data. The ``InlineExplainer`` allows
you to do this by running ``ExplainerComponents`` (or whole tabs) inline
-inside your Jupyter notebook.
+inside your Jupyter notebook (also works in google colab!).

.. image:: inline_screenshot.png

