Revert "Added JointPlotVisualizer - Issue 101 (#174)" (#184)
This reverts commit fba3212.
rebeccabilbro authored Mar 29, 2017
1 parent fba3212 commit 84e6ad1
Showing 42 changed files with 744 additions and 4,713 deletions.
58 changes: 0 additions & 58 deletions CONTRIBUTING.md

This file was deleted.

15 changes: 5 additions & 10 deletions DESCRIPTION.txt
@@ -1,20 +1,15 @@
Yellowbrick is a suite of visual analysis and diagnostic tools designed to facilitate machine learning with Scikit-Learn. The library implements a new core API object, the "Visualizer" that is an Scikit-Learn estimator: an object that learns from data. Like transformers or models, visualizers learn from data by creating a visual representation of the model selection workflow.

Visualizers allow users to steer the model selection process, building intuition around feature engineering, algorithm selection, and hyperparameter tuning. For example, visualizers can help diagnose common problems surrounding model complexity and bias, heteroscedasticity, underfit and overtraining, or class balance issues. By applying visualizers to the model selection workflow, Yellowbrick allows you to steer predictive models to more successful results, faster.
Yellowbrick is a suite of visual analysis and diagnostic tools designed to facilitate machine learning with Scikit-Learn. The package includes visualizations that can help users navigate the feature selection process, build intuition around model selection, diagnose common problems like bias, heteroscedasticity, underfit, and overtraining, and support hyperparameter tuning to steer predictive models toward more successful results.

Some of the available tools include:

- pairwise feature ranking
- histograms
- scatter plot matrices
- parallel coordinates
- radial visualization
- jointplots
- ROC curves
- classification heatmaps
- residual plots
- prediction error plots
- alpha selection plots
- validation curves
- gridsearch heatmaps
- text frequency distributions
- tsne corpus visualization

And much more! Please see the full documentation at: http://scikit-yb.org/
For more, please see the full documentation at: http://yellowbrick.readthedocs.org/en/latest/
76 changes: 34 additions & 42 deletions README.md
@@ -7,56 +7,50 @@
[![Stories in Ready](https://badge.waffle.io/DistrictDataLabs/yellowbrick.png?label=ready&title=Ready)](https://waffle.io/DistrictDataLabs/yellowbrick)


**Visual analysis and diagnostic tools to facilitate machine learning model selection.**
A suite of visual analysis and diagnostic tools to facilitate feature selection, model selection, and parameter tuning for machine learning.


![Follow the yellow brick road](docs/images/yellowbrickroad.jpg)
Image by [Quatro Cinco](https://flic.kr/p/2Yj9mj), used with permission, Flickr Creative Commons.

This README is a guide for developers, if you're new to Yellowbrick, get started at our [documentation](http://www.scikit-yb.org/).

## What is Yellowbrick?

Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the Scikit-Learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines Scikit-Learn with Matplotlib in the best tradition of the Scikit-Learn documentation, but to produce visualizations for _your_ models!

![Visualizers](docs/images/visualizers.png)

### Visualizers

Visualizers are estimators (objects that learn from data) whose primary objective is to create visualizations that allow insight into the model selection process. In Scikit-Learn terms, they can be similar to transformers when visualizing the data space or wrap an model estimator similar to how the "ModelCV" (e.g. RidgeCV, LassoCV) methods work. The primary goal of Yellowbrick is to create a sensical API similar to Scikit-Learn. Some of our most popular visualizers include:

#### Feature Visualization

- Rank2D: pairwise ranking of features to detect relationships
- Parallel Coordinates: horizontal visualization of instances
- Radial Visualization: separation of instances around a circular plot

#### Classification Visualization

- Class Balance: see how the distribution of classes affects the model
- Classification Report: visual representation of precision, recall, and F1
- ROC/AUC Curves: receiver operator characteristics and area under the curve

#### Regression Visualization

- Prediction Error Plots: find model breakdowns along the domain of the target
- Residuals Plot: show the difference in residuals of training and test data
- Alpha Selection: show how the choice of alpha influences regularization

#### Text Visualization

- Term Frequency: visualize the frequency distribution of terms in the corpus
- TSNE: use stochastic neighbor embedding to project documents.

And more! Visualizers are being added all the time, be sure to check the examples (or even the develop branch) and feel free to contribute your ideas for Visualizers!
# What is Yellowbrick?
Yellowbrick is a suite of visual analysis and diagnostic tools to facilitate feature selection, model selection, and parameter tuning for machine learning. All visualizations are generated in Matplotlib. Custom `yellowbrick` visualization tools include:

## Tools for feature analysis and selection
- Boxplots (box-and-whisker plots)
- Violinplots
- Histograms
- Scatter plot matrices (sploms)
- Radial visualizations (radviz)
- Parallel coordinates
- Jointplots
- Rank 1D
- Rank 2D

## Tools for model evaluation
### Classification
- ROC-AUC curves
- Classification heatmaps
- Class balance chart

### Regression
- Prediction error plots
- Residual plots
- Most informative features

### Clustering
- Silhouettes
- Density measures

## Tools for parameter tuning
- Validation curves
- Gridsearch heatmaps

## Using Yellowbrick

The Yellowbrick API is specifically designed to play nicely with Scikit-Learn. Here is an example of a typical workflow sequence with Scikit-Learn and Yellowbrick:

### Feature Visualization

In this example, we see how Rank2D performs pairwise comparisons of each feature in the data set with a specific metric or algorithm, then returns them ranked as a lower left triangle diagram.

```python
from yellowbrick.features import Rank2D

@@ -67,9 +61,7 @@ visualizer.poof() # Draw/show/poof the data
```

### Model Visualization

In this example, we instantiate a Scikit-Learn classifier, and then we use Yellowbrick's ROCAUC class to visualize the tradeoff between the classifier's sensitivity and specificity.

```python
from sklearn.svm import LinearSVC
from yellowbrick.classifier import ROCAUC
3 changes: 1 addition & 2 deletions docs/api/modules.rst
@@ -5,7 +5,6 @@ API Reference
:maxdepth: 4

yellowbrick
yellowbrick.features
yellowbrick.regressor
yellowbrick.text
yellowbrick.style
yellowbrick.features
29 changes: 0 additions & 29 deletions docs/api/yellowbrick.regressor.rst

This file was deleted.

Binary file removed docs/images/alphaselect/elasticnet.png
Binary file not shown.
Binary file removed docs/images/alphaselect/lasso.png
Binary file not shown.
Binary file removed docs/images/alphaselect/lassolars.png
Binary file not shown.
Binary file removed docs/images/alphaselect/ridge.png
Binary file not shown.
Binary file removed docs/images/visualizers.png
Binary file not shown.
91 changes: 40 additions & 51 deletions docs/index.rst
@@ -3,73 +3,62 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
===========================================
Yellowbrick: Machine Learning Visualization
===========================================
=======================================
Welcome to yellowbrick's documentation!
=======================================

.. image:: images/visualizers.png
Yellowbrick is a suite of visual analysis and diagnostic tools to facilitate feature selection, model selection, and parameter tuning for machine learning. All visualizations are generated in Matplotlib.

Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the Scikit-Learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines Scikit-Learn with Matplotlib in the best tradition of the Scikit-Learn documentation, but to produce visualizations for *your* models! For more on Yellowbrick, please see the :doc:`introduction`.
Custom ``yellowbrick`` visualization tools include:

If you're new to Yellowbrick, checkout the :doc:`setup` or skip ahead to the :doc:`examples/examples`. Yellowbrick is a rich library with many Visualizers being added on a regular basis. For details on specific Visualizers and extended usage head over to the :doc:`api/modules`. If you've signed up to do user testing, checkout the :doc:`evaluation`.
Tools for feature analysis and selection
----------------------------------------

Visualizers
-----------
- Boxplots ("box-and-whisker" plots)
- Violinplots
- Histograms
- Scatter plot matrices ("sploms")
- Radial visualizations ("radviz")
- Parallel coordinates
- Jointplots
- Rank 1D
- Rank 2D

Visualizers are estimators (objects that learn from data) whose primary objective is to create visualizations that allow insight into the model selection process. In Scikit-Learn terms, they can be similar to transformers when visualizing the data space or wrap an model estimator similar to how the "ModelCV" (e.g. RidgeCV, LassoCV) methods work. The primary goal of Yellowbrick is to create a sensical API similar to Scikit-Learn. Some of our most popular visualizers include:

Feature Visualization
~~~~~~~~~~~~~~~~~~~~~
Tools for model evaluation
--------------------------

- Rank2D: pairwise ranking of features to detect relationships
- Parallel Coordinates: horizontal visualization of instances
- Radial Visualization: separation of instances around a circular plot
Classification
^^^^^^^^^^^^^^
- ROC-AUC curves
- Classification heatmaps
- Class balance charts

Classification Visualization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Regression
^^^^^^^^^^
- Prediction error plots
- Residual plots
- Most informative features

- Class Balance: see how the distribution of classes affects the model
- Classification Report: visual representation of precision, recall, and F1
- ROC/AUC Curves: receiver operator characteristics and area under the curve
Clustering
^^^^^^^^^^
- Silhouettes
- Density measures

Regression Visualization
~~~~~~~~~~~~~~~~~~~~~~~~

- Prediction Error Plots: find model breakdowns along the domain of the target
- Residuals Plot: show the difference in residuals of training and test data
- Alpha Selection: show how the choice of alpha influences regularization
Tools for parameter tuning
--------------------------

Text Visualization
~~~~~~~~~~~~~~~~~~
- Validation curves
- Gridsearch heatmap

- Term Frequency: visualize the frequency distribution of terms in the corpus
- TSNE: use stochastic neighbor embedding to project documents.

And more! Visualizers are being added all the time, be sure to check the examples (or even the develop branch) and feel free to contribute your ideas for Visualizers!

Getting Help
------------

Yellowbrick is welcoming, inclusive project in the tradition of Matplotlib and Scikit-Learn. Similar to those projects, we try to follow the `Python Software Foundation Code of Conduct <http://www.python.org/psf/codeofconduct/>`_. Please don't hesitate to reach out to us for help or if you have any contributions or bugs to report!

We're still in the initial stages of the project, and don't necessarily have a mailing list or FAQ put together (but with your help we can build one). Ask questions on `Stack Overflow <http://stackoverflow.com/questions/tagged/yellowbrick>`_ and tag them with "yellowbrick". Or you can add issues on GitHub. You can also tweet or direct message us on Twitter `@DistrictDataLab <https://twitter.com/districtdatalab>`_.

Open Source
-----------

The Yellowbrick `license <https://github.com/DistrictDataLabs/yellowbrick/blob/master/LICENSE.txt>`_ is an open source `Apache 2.0 <http://www.apache.org/licenses/LICENSE-2.0>`_ license. Yellowbrick enjoys a very active developer community, join them and please consider `contributing <https://github.com/DistrictDataLabs/yellowbrick/blob/develop/CONTRIBUTING.md>`_!

Yellowbrick is hosted on `GitHub <https://github.com/DistrictDataLabs/yellowbrick/>`_. `Issues <https://github.com/DistrictDataLabs/yellowbrick/issues/>`_ and `Pull Requests <https://github.com/DistrictDataLabs/yellowbrick/pulls>`_ are tracked there.


=================
Table of Contents
=================

The following is a complete listing of the Yellowbrick documentation for this version of the library:
=========
Contents:
=========

.. toctree::
:maxdepth: 2
:maxdepth: 4

introduction
setup