Merge branch 'release-0.9'
bbengfort committed Nov 14, 2018
2 parents f8f96d2 + ebd0924 commit ce77386
Showing 328 changed files with 33,237 additions and 2,430 deletions.
5 changes: 2 additions & 3 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -52,9 +52,6 @@ coverage.xml
# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

@@ -124,3 +121,5 @@ fabric.properties
# Data downloaded from Yellowbrick
data/
.vscode/settings.json

yellowbrick/datasets/fixtures
4 changes: 4 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -43,6 +43,10 @@ The typical workflow for a contributor to the codebase is as follows:

We believe that *contribution is collaboration* and therefore emphasize *communication* throughout the open source process. We rely heavily on GitHub's social coding tools to allow us to do this.

Ideally, any pull request should be capable of resolution within 6 weeks of being opened. This timeline helps to keep our pull request queue small and allows Yellowbrick to maintain a robust release schedule to give our users the best experience possible. However, the most important thing is to keep the dialogue going! If you're unsure whether you can complete your idea within 6 weeks, go ahead and open a PR anyway; we will be happy to help you scope it down as needed.

If we have comments or questions when we evaluate your pull request and receive no response, we will also close the PR after this period of time. Please know that this does not mean we don't value your contribution, just that things go stale. If in the future you want to pick it back up, feel free to address our original feedback and to reference the original PR in a new pull request.

### Forking the Repository

The first step is to fork the repository into your own account. This will create a copy of the codebase that you can edit and write to. Do so by clicking the **"fork"** button in the upper right corner of the Yellowbrick GitHub page.
7 changes: 7 additions & 0 deletions DESCRIPTION.rst
Original file line number Diff line number Diff line change
@@ -42,6 +42,7 @@ Classification Visualization
- **Class Prediction Error**: shows error and support in classification
- **Classification Report**: visual representation of precision, recall, and F1
- **ROC/AUC Curves**: receiver operating characteristic curves and area under the curve
- **Precision-Recall Curves**: precision vs recall for different probability thresholds
- **Confusion Matrices**: visual description of class decision making
- **Discrimination Threshold**: find a threshold that best separates binary classes
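As a rough illustration of what the ROC/AUC entry summarizes, the sketch below sweeps a decision threshold from the highest classifier score down, records the false positive rate and true positive rate at each step, and integrates the resulting curve with the trapezoidal rule. It is a dependency-free sketch assuming binary labels encoded as 0/1 and real-valued scores; ``roc_points`` and ``trapezoid_auc`` are hypothetical names, not Yellowbrick or scikit-learn API.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points.

    The threshold sweeps from the highest score down; instances with tied
    scores are added together, so a tie produces a single diagonal step.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Rank instances by descending score
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every instance whose score equals the current threshold
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as x-sorted (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(trapezoid_auc(roc_points(y_true, scores)))
```

A perfect ranking of positives above negatives yields an area of 1.0, while tied scores for every instance collapse to the diagonal and an area of 0.5.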

@@ -57,6 +58,7 @@ Clustering Visualization

- **K-Elbow Plot**: select k using the elbow method and various metrics
- **Silhouette Plot**: select k by visualizing silhouette coefficient values
- **Intercluster Distance Maps**: show relative distance and size of clusters
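The silhouette coefficient behind the Silhouette Plot compares, for each instance, the mean distance ``a`` to the other members of its own cluster with the smallest mean distance ``b`` to the members of any other cluster, giving ``s = (b - a) / max(a, b)``. Values near 1 indicate well-separated clusters and negative values suggest misassigned instances. A minimal pure-Python sketch, assuming Euclidean distance and scoring singleton clusters as 0 (the convention scikit-learn uses); ``silhouette_values`` is a hypothetical helper, not Yellowbrick's code.

```python
import math


def silhouette_values(points, labels):
    """Per-instance silhouette coefficients using Euclidean distance."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    values = []
    for i, p in enumerate(points):
        own = labels[i]
        # a: mean distance to the other members of the same cluster
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == own and j != i]
        if not same:
            values.append(0.0)  # singleton cluster convention
            continue
        a = sum(same) / len(same)

        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != own
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_values(points, [0, 0, 1, 1])
bad = silhouette_values(points, [0, 1, 0, 1])
print(sum(good) / len(good), sum(bad) / len(bad))
```

Selecting k with a silhouette plot amounts to repeating this for each candidate clustering and preferring the one whose coefficients are high and evenly distributed across clusters.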

Model Selection Visualization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -71,6 +73,11 @@ Text Visualization
- **t-SNE Corpus Visualization**: use stochastic neighbor embedding to project documents
- **Dispersion Plot**: visualize how key terms are dispersed throughout a corpus
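A lexical dispersion plot places a mark at each position where a key term occurs, one row per term, with the token offset in the corpus on the x-axis. The sketch below computes those marks from a single token stream; it assumes whitespace tokenization and case-insensitive matching, and ``dispersion_points`` is a hypothetical name rather than the ``DispersionPlot`` implementation.

```python
def dispersion_points(tokens, terms):
    """Return (offset, row) pairs: offset is the token index, row the term index."""
    rows = {term.lower(): row for row, term in enumerate(terms)}
    points = []
    for offset, token in enumerate(tokens):
        # Case-insensitive lookup of the token among the key terms
        row = rows.get(token.lower())
        if row is not None:
            points.append((offset, row))
    return points


text = "The cat sat on the mat and the dog sat too"
print(dispersion_points(text.split(), ["the", "sat"]))
```

Each pair can be drawn directly as a scatter point, which is what makes clustering or even spread of a term through the text visible.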

Target Visualization
~~~~~~~~~~~~~~~~~~~~

- **Feature Correlation**: visualize the correlation between the features (independent variables) and the target
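For a numeric target, one common way to score each feature is the Pearson correlation coefficient between that feature's column and the target. The sketch below assumes Pearson correlation (the visualizer may offer other measures) and uses only the standard library; ``feature_correlations`` is a hypothetical helper, and a constant column, whose correlation is undefined, is reported as ``nan``.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined for a constant sequence
    return cov / (sd_x * sd_y)


def feature_correlations(X, y, names):
    """Map each feature name to the correlation of its column with y."""
    columns = zip(*X)  # transpose the rows of X into feature columns
    return {name: pearson(col, y) for name, col in zip(names, columns)}


X = [[1, 10, 3], [2, 8, 3], [3, 6, 3], [4, 4, 3]]
y = [2, 4, 6, 8]
print(feature_correlations(X, y, ["rising", "falling", "constant"]))
```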

... and more! Visualizers are being added all the time; be sure to check the examples_ (or even the develop_ branch) and feel free to contribute your ideas for new Visualizers!

.. _examples: http://www.scikit-yb.org/en/latest/api/index.html
12 changes: 10 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -3,7 +3,9 @@
[![Build Status](https://travis-ci.com/DistrictDataLabs/yellowbrick.svg?branch=develop)](https://travis-ci.com/DistrictDataLabs/yellowbrick)
[![Build status](https://ci.appveyor.com/api/projects/status/11abg00ollbdf4oy?svg=true)](https://ci.appveyor.com/project/districtdatalabs/yellowbrick)
[![Coverage Status](https://coveralls.io/repos/github/DistrictDataLabs/yellowbrick/badge.svg?branch=master)](https://coveralls.io/github/DistrictDataLabs/yellowbrick?branch=master)
[![Code Health](https://landscape.io/github/DistrictDataLabs/yellowbrick/master/landscape.svg?style=flat)](https://landscape.io/github/DistrictDataLabs/yellowbrick/master)
[![Total Alerts](https://img.shields.io/lgtm/alerts/g/DistrictDataLabs/yellowbrick.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/DistrictDataLabs/yellowbrick/alerts/)
[![Language Grade: Python](https://img.shields.io/lgtm/grade/python/g/DistrictDataLabs/yellowbrick.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/DistrictDataLabs/yellowbrick/context:python)

[![PyPI version](https://badge.fury.io/py/yellowbrick.svg)](https://badge.fury.io/py/yellowbrick)
[![Documentation Status](https://readthedocs.org/projects/yellowbrick/badge/?version=latest)](http://yellowbrick.readthedocs.io/en/latest/?badge=latest)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1206239.svg)](https://doi.org/10.5281/zenodo.1206239)
@@ -44,6 +46,7 @@ Visualizers are estimators (objects that learn from data) whose primary objectiv
- **Class Prediction Error**: shows error and support in classification
- **Classification Report**: visual representation of precision, recall, and F1
- **ROC/AUC Curves**: receiver operating characteristic curves and area under the curve
- **Precision-Recall Curves**: precision vs recall for different probability thresholds
- **Confusion Matrices**: visual description of class decision making
- **Discrimination Threshold**: find a threshold that best separates binary classes

@@ -57,6 +60,7 @@ Visualizers are estimators (objects that learn from data) whose primary objectiv

- **K-Elbow Plot**: select k using the elbow method and various metrics
- **Silhouette Plot**: select k by visualizing silhouette coefficient values
- **Intercluster Distance Maps**: show relative distance and size of clusters

#### Model Selection Visualization

@@ -69,6 +73,10 @@ Visualizers are estimators (objects that learn from data) whose primary objectiv
- **t-SNE Corpus Visualization**: use stochastic neighbor embedding to project documents
- **Dispersion Plot**: visualize how key terms are dispersed throughout a corpus

#### Target Visualization

- **Feature Correlation**: visualize the correlation between the features (independent variables) and the target

And more! Visualizers are being added all the time, so be sure to check the examples (or even the develop branch) and feel free to contribute your ideas for Visualizers!

## Installing Yellowbrick
@@ -168,7 +176,7 @@ $ python -m tests.images -C tests/test_visualizer.py
Glob syntax can be used to move multiple files. For example, to reset all the classifier tests:

```
$ python -m tests.images tests/test_classifier/*
```

That said, targeting specific test cases is recommended over updating entire directories.
14 changes: 13 additions & 1 deletion docs/about.rst
Original file line number Diff line number Diff line change
@@ -23,6 +23,18 @@ However, model selection is a bit more nuanced than simply picking the "right" o

The **model selection triple** was first described in a 2015 SIGMOD_ paper by Kumar et al. In their paper, which concerns the development of next-generation database systems built to anticipate predictive modeling, the authors cogently express that such systems are badly needed due to the highly experimental nature of machine learning in practice. "Model selection," they explain, "is iterative and exploratory because the space of [model selection triples] is usually infinite, and it is generally impossible for analysts to know a priori which [combination] will yield satisfactory accuracy and/or insights."


Who is Yellowbrick for?
-----------------------

Yellowbrick ``Visualizers`` have multiple use cases:

- For data scientists, they can help evaluate the stability and predictive value of machine learning models and improve the speed of the experimental workflow.
- For data engineers, Yellowbrick provides visual tools for monitoring model performance in real-world applications.
- For users of models, Yellowbrick provides visual interpretation of the behavior of the model in high-dimensional feature space.
- For teachers and students, Yellowbrick is a framework for teaching and understanding a large variety of algorithms and methods.


Name Origin
-----------
The Yellowbrick package gets its name from the fictional element in the 1900 children's novel **The Wonderful Wizard of Oz** by American author L. Frank Baum. In the book, the yellow brick road is the path that the protagonist, Dorothy Gale, must travel in order to reach her destination in the Emerald City.
@@ -68,7 +80,7 @@ Jupyter Notebooks:
- `Data Science Delivered: ML Regression Predications <https://github.com/ianozsvald/data_science_delivered/blob/master/ml_explain_regression_prediction.ipynb>`_

Slides:
- `Machine Learning Libraries You'd Wish You'd Known About (PyData Budapest 2017) <https://speakerdeck.com/ianozsvald/machine-learning-libraries-youd-wish-youd-known-about-1>`_
- `Visualizing the Model Selection Process <https://www.slideshare.net/BenjaminBengfort/visualizing-the-model-selection-process>`_
- `Visualizing Model Selection with Scikit-Yellowbrick <https://www.slideshare.net/BenjaminBengfort/visualizing-model-selection-with-scikityellowbrick-an-introduction-to-developing-visualizers>`_
- `Visual Pipelines for Text Analysis (Data Intelligence 2017) <https://speakerdeck.com/dataintelligence/visual-pipelines-for-text-analysis>`_
29 changes: 0 additions & 29 deletions docs/api/classifier/class_balance.py

This file was deleted.

51 changes: 0 additions & 51 deletions docs/api/classifier/class_balance.rst

This file was deleted.

4 changes: 2 additions & 2 deletions docs/api/classifier/class_prediction_error.rst
Original file line number Diff line number Diff line change
@@ -23,7 +23,7 @@ The class prediction error chart provides a way to quickly understand how good y
.. code:: python
from sklearn.ensemble import RandomForestClassifier
from yellowbrick.classifier import ClassPredictionError
# Instantiate the classification model and visualizer
@@ -45,7 +45,7 @@ The class prediction error chart provides a way to quickly understand how good y
API Reference
-------------

.. automodule:: yellowbrick.classifier.class_balance
.. automodule:: yellowbrick.classifier.class_prediction_error
:members: ClassPredictionError
:undoc-members:
:show-inheritance:
6 changes: 3 additions & 3 deletions docs/api/classifier/classification_report.rst
Original file line number Diff line number Diff line change
@@ -18,9 +18,9 @@ The classification report visualizer displays the precision, recall, F1, and sup
]
classes = ["unoccupied", "occupied"]
# Extract the numpy arrays from the data frame
X = data[features].as_matrix()
y = data.occupancy.as_matrix()
# Extract the instances and target
X = data[features]
y = data.occupancy
# Create the train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
39 changes: 26 additions & 13 deletions docs/api/classifier/confusion_matrix.py
Original file line number Diff line number Diff line change
@@ -1,23 +1,36 @@
-from sklearn.datasets import load_digits
+from sklearn.datasets import load_digits, load_iris
 from sklearn.linear_model import LogisticRegression
-from sklearn.model_selection import train_test_split
+from sklearn.model_selection import train_test_split as tts
 
 from yellowbrick.classifier import ConfusionMatrix
 
 
 if __name__ == '__main__':
-    # Load the regression data set
     digits = load_digits()
-    X = digits.data
-    y = digits.target
-
-    X_train, X_test, y_train, y_test = train_test_split(X,y, test_size =0.2, random_state=11)
 
+    digit_X = digits.data
+    digit_y = digits.target
+    d_X_train, d_X_test, d_y_train, d_y_test = tts(
+        digit_X, digit_y, test_size=0.2
+    )
     model = LogisticRegression()
+    digit_cm = ConfusionMatrix(model, classes=[0,1,2,3,4,5,6,7,8,9])
+    digit_cm.fit(d_X_train, d_y_train)
+    digit_cm.score(d_X_test, d_y_test)
+    d = digit_cm.poof(outpath="images/confusion_matrix_digits.png")
 
-    #The ConfusionMatrix visualizer taxes a model
-    cm = ConfusionMatrix(model, classes=[0,1,2,3,4,5,6,7,8,9])
-
-    cm.fit(X_train, y_train)  # Fit the training data to the visualizer
-    cm.score(X_test, y_test)  # Evaluate the model on the test data
-    g = cm.poof(outpath="images/confusion_matrix.png")  # Draw/show/poof the data
+    iris = load_iris()
+    iris_X = iris.data
+    iris_y = iris.target
+    iris_classes = iris.target_names
+    i_X_train, i_X_test, i_y_train, i_y_test = tts(
+        iris_X, iris_y, test_size=0.2
+    )
+    model = LogisticRegression()
+    iris_cm = ConfusionMatrix(
+        model, classes=iris_classes,
+        label_encoder={0: 'setosa', 1: 'versicolor', 2: 'virginica'}
+    )
+    iris_cm.fit(i_X_train, i_y_train)
+    iris_cm.score(i_X_test, i_y_test)
+    i = iris_cm.poof(outpath="images/confusion_matrix_iris.png")
31 changes: 30 additions & 1 deletion docs/api/classifier/confusion_matrix.rst
Original file line number Diff line number Diff line change
@@ -51,8 +51,37 @@ scikit-learn documentation on `confusion matrices <http://scikit-learn.org/stabl
cm.poof()
.. image:: images/confusion_matrix_digits.png

.. image:: images/confusion_matrix.png

Plotting with Class Names
#########################

Class names can be added to a ``ConfusionMatrix`` plot using the ``label_encoder`` argument. The ``label_encoder`` can be a `sklearn.preprocessing.LabelEncoder <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html>`_ (or any object with an ``inverse_transform`` method that performs the mapping) or a ``dict`` that maps each encoded value to its class name, as in the example below:

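.. code:: python

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split as tts

    from yellowbrick.classifier import ConfusionMatrix

    iris = load_iris()
    X = iris.data
    y = iris.target
    classes = iris.target_names
    X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2)

    model = LogisticRegression()
    # Map the encoded targets 0, 1, 2 to the iris species names
    iris_cm = ConfusionMatrix(
        model, classes=classes,
        label_encoder={0: 'setosa', 1: 'versicolor', 2: 'virginica'}
    )
    iris_cm.fit(X_train, y_train)
    iris_cm.score(X_test, y_test)
    iris_cm.poof()

.. image:: images/confusion_matrix_iris.png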
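To make the ``label_encoder`` behaviour concrete, here is a dependency-free sketch that counts a confusion matrix (rows are actual classes and columns are predicted classes, the orientation scikit-learn uses) and converts encoded classes to display names from either a ``dict`` or any object with an ``inverse_transform`` method. The helpers ``decode_labels`` and ``confusion_counts`` and the stand-in ``ListEncoder`` are hypothetical; this is not how ``ConfusionMatrix`` is implemented internally.

```python
from collections import Counter


def decode_labels(labels, label_encoder=None):
    """Turn encoded class labels into display names (hypothetical helper).

    label_encoder may be a dict (encoded value -> name) or any object with
    an inverse_transform method, such as sklearn's LabelEncoder.
    """
    if label_encoder is None:
        return [str(label) for label in labels]
    if isinstance(label_encoder, dict):
        return [label_encoder[label] for label in labels]
    if hasattr(label_encoder, "inverse_transform"):
        return list(label_encoder.inverse_transform(labels))
    raise TypeError("label_encoder must be a dict or define inverse_transform()")


def confusion_counts(y_true, y_pred, classes):
    """Count (actual, predicted) pairs; rows are actual, columns predicted."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(actual, pred)] for pred in classes] for actual in classes]


class ListEncoder:
    """Stand-in encoder for the demo: maps index i to names[i]."""

    def __init__(self, names):
        self.names = list(names)

    def inverse_transform(self, labels):
        return [self.names[i] for i in labels]


classes = [0, 1, 2]
matrix = confusion_counts([0, 1, 2, 2], [0, 2, 2, 1], classes)
names = decode_labels(classes, {0: "setosa", 1: "versicolor", 2: "virginica"})
for name, row in zip(names, matrix):
    print(f"{name:>10}", row)
```

Accepting anything with ``inverse_transform`` (duck typing) lets a fitted ``LabelEncoder`` and a plain ``dict`` be used interchangeably.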


API Reference
Binary file removed docs/api/classifier/images/class_balance.png
Binary file not shown.
Binary file removed docs/api/classifier/images/confusion_matrix.png
Binary file not shown.
Binary file added docs/api/classifier/images/rocauc_binary.png
Binary file added docs/api/classifier/images/rocauc_multiclass.png
18 changes: 11 additions & 7 deletions docs/api/classifier/index.rst
Original file line number Diff line number Diff line change
@@ -8,15 +8,16 @@ Classification models attempt to predict a target in a discrete space, that is a
- :doc:`classification_report`: A visual classification report that displays precision, recall, and F1 per-class as a heatmap.
- :doc:`confusion_matrix`: A heatmap view of the confusion matrix of pairs of classes in multi-class classification.
- :doc:`rocauc`: Graphs the receiver operating characteristics and area under the curve.
- :doc:`class_balance`: Visual inspection of the target to show the support of each class to the final estimator.
- :doc:`prcurve`: Plots the precision and recall for different probability thresholds.
- :doc:`../target/class_balance`: Visual inspection of the target to show the support of each class to the final estimator.
- :doc:`class_prediction_error`: An alternative to the confusion matrix that shows both support and the difference between actual and predicted classes.
- :doc:`threshold`: Shows precision, recall, f1, and queue rate over all thresholds for binary classifiers that use a discrimination probability or score.
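The ``prcurve`` and ``threshold`` entries both rest on recomputing metrics as the decision threshold moves. Below is a dependency-free sketch for a binary classifier. It assumes 0/1 labels, scores where higher means more likely positive, and that an instance is predicted positive when its score is at least the threshold. Here, "queue rate" is the fraction of instances predicted positive, and precision is reported as 0.0 when nothing is flagged; both are assumptions of this sketch. ``threshold_metrics`` is a hypothetical name, not Yellowbrick API.

```python
def threshold_metrics(y_true, scores, thresholds):
    """Precision, recall, F1 and queue rate at each decision threshold."""
    n = len(y_true)
    rows = []
    for t in thresholds:
        # Predict positive when the score reaches the threshold
        y_pred = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for p, y in zip(y_pred, y_true) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(y_pred, y_true) if p == 1 and y == 0)
        fn = sum(1 for p, y in zip(y_pred, y_true) if p == 0 and y == 1)

        precision = tp / (tp + fp) if tp + fp else 0.0  # assumed 0.0 when nothing is flagged
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        queue_rate = sum(y_pred) / n  # share of instances sent to the "queue"
        rows.append({"threshold": t, "precision": precision, "recall": recall,
                     "f1": f1, "queue_rate": queue_rate})
    return rows


for row in threshold_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], [0.3, 0.5]):
    print(row)
```

Raising the threshold shrinks the queue and trades recall for precision; a precision-recall curve plots that trade-off, and a discrimination-threshold plot shows all four metrics against the threshold itself.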

Estimator score visualizers wrap scikit-learn estimators and expose the
Estimator API such that they have fit(), predict(), and score() methods
that call the appropriate estimator methods under the hood. Score
Estimator API such that they have ``fit()``, ``predict()``, and ``score()``
methods that call the appropriate estimator methods under the hood. Score
visualizers can wrap an estimator and be passed in as the final step in
a Pipeline or VisualPipeline.
a ``Pipeline`` or ``VisualPipeline``.
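The delegation pattern described above can be sketched without scikit-learn. ``ScoreVisualizerSketch`` and the toy ``MajorityClassifier`` are hypothetical classes written for illustration. The wrapper forwards ``fit()``, ``predict()``, and ``score()`` to the wrapped estimator and, at the point where a real visualizer would draw, merely records the score.

```python
class MajorityClassifier:
    """Toy estimator: always predicts the most common training label."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

    def score(self, X, y):
        # Mean accuracy, as scikit-learn classifiers report
        predicted = self.predict(X)
        return sum(p == t for p, t in zip(predicted, y)) / len(y)


class ScoreVisualizerSketch:
    """Wraps an estimator and forwards fit/predict/score to it (hypothetical)."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self  # returning self allows chaining, as estimators do

    def predict(self, X):
        return self.estimator.predict(X)

    def score(self, X, y):
        self.score_ = self.estimator.score(X, y)
        # A real visualizer would draw here; this sketch only records the value.
        return self.score_


viz = ScoreVisualizerSketch(MajorityClassifier()).fit([[0], [1], [2]], ["a", "a", "b"])
print(viz.predict([[5], [6]]), viz.score([[0], [1]], ["a", "b"]))
```

Because the wrapper exposes the same methods as the estimator it wraps, it can stand wherever that estimator was expected, which is what allows a score visualizer to sit at the end of a pipeline.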

.. code:: python
@@ -27,8 +28,11 @@ a Pipeline or VisualPipeline.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from yellowbrick.classifier import ClassificationReport, ROCAUC
from yellowbrick.classifier import ClassBalance, ClassPredictionError
from yellowbrick.target import ClassBalance
from yellowbrick.classifier import ROCAUC
from yellowbrick.classifier import PrecisionRecallCurve
from yellowbrick.classifier import ClassificationReport
from yellowbrick.classifier import ClassPredictionError
from yellowbrick.classifier import DiscriminationThreshold
.. toctree::
@@ -37,6 +41,6 @@ a Pipeline or VisualPipeline.
classification_report
confusion_matrix
rocauc
class_balance
prcurve
class_prediction_error
threshold