Merge branch 'release-0.4.1'
bbengfort committed May 22, 2017
2 parents b20705a + 5e406aa commit bb51d77
Showing 71 changed files with 21,104 additions and 976 deletions.
1 change: 1 addition & 0 deletions .travis.yml
@@ -2,6 +2,7 @@ language: python
python:
- '2.7'
- '3.5'
- '3.6'

before_install:
- sudo apt-get build-dep python-scipy
62 changes: 62 additions & 0 deletions DESCRIPTION.rst
@@ -0,0 +1,62 @@
.. -*- mode: rst -*-
|Visualizers|_

.. |Visualizers| image:: http://www.scikit-yb.org/en/latest/_images/visualizers.png
.. _Visualizers: http://scikit-yb.org/

Yellowbrick
===========

Yellowbrick is a suite of visual analysis and diagnostic tools designed to facilitate machine learning with Scikit-Learn. The library implements a new core API object, the "Visualizer", which is a Scikit-Learn estimator: an object that learns from data. Like transformers or models, visualizers learn from data by creating a visual representation of the model selection workflow.

Visualizers allow users to steer the model selection process, building intuition around feature engineering, algorithm selection, and hyperparameter tuning. For example, visualizers can help diagnose common problems surrounding model complexity and bias, heteroscedasticity, underfit and overtraining, or class balance issues. By applying visualizers to the model selection workflow, Yellowbrick allows you to steer predictive models to more successful results, faster.

Please see the full documentation at: http://scikit-yb.org/

Visualizers
-----------

Visualizers are estimators (objects that learn from data) whose primary objective is to create visualizations that allow insight into the model selection process. In Scikit-Learn terms, they can be similar to transformers when visualizing the data space or wrap a model estimator similar to how the “ModelCV” (e.g. RidgeCV_, LassoCV_) methods work. The primary goal of Yellowbrick is to create a sensible API similar to Scikit-Learn. Some of our most popular visualizers include:

.. _RidgeCV: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html
.. _LassoCV: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html

Feature Visualization
~~~~~~~~~~~~~~~~~~~~~

- **Rank2D**: pairwise ranking of features to detect relationships
- **Parallel Coordinates**: horizontal visualization of instances
- **Radial Visualization**: separation of instances around a circular plot

Classification Visualization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- **Class Balance**: see how the distribution of classes affects the model
- **Classification Report**: visual representation of precision, recall, and F1
- **ROC/AUC Curves**: receiver operating characteristic and area under the curve
- **Confusion Matrices**: visual description of class decision making

Regression Visualization
~~~~~~~~~~~~~~~~~~~~~~~~

- **Prediction Error Plots**: find model breakdowns along the domain of the target
- **Residuals Plot**: show the difference in residuals of training and test data
- **Alpha Selection**: show how the choice of alpha influences regularization

Clustering Visualization
~~~~~~~~~~~~~~~~~~~~~~~~

- **K-Elbow Plot**: select k using the elbow method and various metrics
- **Silhouette Plot**: select k by visualizing silhouette coefficient values

Text Visualization
~~~~~~~~~~~~~~~~~~

- **Term Frequency**: visualize the frequency distribution of terms in the corpus
- **TSNE**: use t-distributed stochastic neighbor embedding to project documents

... and more! Visualizers are being added all the time; be sure to check the examples_ (or even the develop_ branch) and feel free to contribute your ideas for new Visualizers!

.. _examples: http://www.scikit-yb.org/en/latest/examples/examples.html
.. _develop: https://github.com/districtdatalabs/yellowbrick/tree/develop
20 changes: 0 additions & 20 deletions DESCRIPTION.txt

This file was deleted.

4 changes: 3 additions & 1 deletion MAINTAINERS.md
@@ -6,13 +6,16 @@ When creating a pull request, your contribution will be reviewed by one or proba

For more about how to develop visualizers and contribute features to Yellowbrick, see our [contributor's guide](CONTRIBUTING.md) and the documentation.

For everyone who has [contributed](https://github.com/DistrictDataLabs/yellowbrick/graphs/contributors) in big and in small ways, **thank you!** Yellowbrick is intended to be a community project, welcoming to new and experienced developers alike. If you would like to become a core contributor, you must simply submit a pull request that shows core knowledge of the Yellowbrick library. Usually new Visualizers meet this standard; let the maintainers know you'd like to join the team, and they'll help you work toward it!

## Maintainers

This is a list of the primary project maintainers. Feel free to @ message them in issues and converse with them directly.

- [bbengfort](https://github.com/bbengfort)
- [NealHumphrey](https://github.com/NealHumphrey)
- [jkeung](https://github.com/jkeung)
- [ndanielsen](https://github.com/ndanielsen)

## Core Contributors

@@ -21,7 +24,6 @@ This is a list of the core-contributors of the project. Core contributors set th
- [rebeccabilbro](https://github.com/rebeccabilbro)
- [mattandahalfew](https://github.com/mattandahalfew)
- [pdamodaran](https://github.com/pdamodaran)
- [ndanielsen](https://github.com/ndanielsen)
- [tuulihill](https://github.com/tuulihill)
- [balavenkatesan](https://github.com/balavenkatesan)
- [morganmendis](https://github.com/morganmendis)
20 changes: 19 additions & 1 deletion README.md
@@ -55,6 +55,24 @@ Visualizers are estimators (objects that learn from data) whose primary objectiv

And more! Visualizers are being added all the time; be sure to check the examples (or even the develop branch) and feel free to contribute your ideas for Visualizers!

## Installing Yellowbrick

Yellowbrick is compatible with Python 2.7 or later, but Python 3.5 or later is recommended to take full advantage of all functionality. Yellowbrick also depends on Scikit-Learn 0.18 or later and Matplotlib 1.5 or later. The simplest way to install Yellowbrick is from PyPI with pip, Python's preferred package installer.

$ pip install yellowbrick

Note that Yellowbrick is an active project and routinely publishes new releases with more visualizers and updates. In order to upgrade Yellowbrick to the latest version, use pip as follows.

$ pip install -U yellowbrick

You can also use the `-U` flag to update Scikit-Learn, matplotlib, or any other third party utilities that work well with Yellowbrick to their latest versions.

If you're using Windows or Anaconda, you can take advantage of the conda utility to install Yellowbrick:

conda install -c districtdatalabs yellowbrick

Note, however, that there is a [known bug](https://github.com/DistrictDataLabs/yellowbrick/issues/205) installing Yellowbrick on Linux with Anaconda.
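
To confirm that an environment meets the minimum versions listed above, a small comparison helper can be handy. The following is a minimal sketch, not part of Yellowbrick: `version_tuple` and `meets_minimum` are hypothetical names, and pre-release suffixes such as `rc1` are simply ignored.

```python
import re


def version_tuple(version):
    """Turn a version string such as '0.18.1' into a tuple of ints: (0, 18, 1).

    Parsing stops at the first piece that does not start with a digit, and
    any trailing suffix on a piece (for example 'rc1') is ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    have, need = version_tuple(installed), version_tuple(minimum)
    # Pad with zeros so that '0.18' and '0.18.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# Hypothetical usage, assuming the packages are installed:
#   import sklearn, matplotlib
#   meets_minimum(sklearn.__version__, "0.18")
#   meets_minimum(matplotlib.__version__, "1.5")
```

If either check returns `False`, upgrading the dependency with pip as described above is the likely fix.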

## Using Yellowbrick

The Yellowbrick API is specifically designed to play nicely with Scikit-Learn. Here is an example of a typical workflow sequence with Scikit-Learn and Yellowbrick:
@@ -89,7 +107,7 @@ visualizer.poof()

For additional information on getting started with Yellowbrick, check out our [examples notebook](https://github.com/DistrictDataLabs/yellowbrick/blob/develop/examples/examples.ipynb).

We also have a [quick start guide](https://github.com/DistrictDataLabs/yellowbrick/blob/master/docs/setup.rst).
We also have a [quick start guide](https://github.com/DistrictDataLabs/yellowbrick/blob/master/docs/quickstart.rst).

## Contributing to Yellowbrick

39 changes: 39 additions & 0 deletions docs/changelog.rst
@@ -2,6 +2,45 @@
Changelog
=========

Version 0.4.1
-------------
This release is an intermediate version bump in anticipation of the PyCon 2017 sprints.

The primary goals of this version were to (1) update the Yellowbrick dependencies, (2) enhance the Yellowbrick documentation to help orient new users and contributors, and (3) make several small additions and upgrades (e.g. pulling the Yellowbrick utils into a standalone module).

We have updated the Scikit-Learn and SciPy dependencies from version 0.17.1 or later to 0.18 or later. This primarily entails moving from ``from sklearn.cross_validation import train_test_split`` to ``from sklearn.model_selection import train_test_split``.
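
For code that still has to run against both older and newer Scikit-Learn releases, one option is an import fallback. The sketch below is illustrative only: ``first_importable`` is a hypothetical helper, not part of Yellowbrick or Scikit-Learn, and it simply tries each module path in order.

```python
import importlib


def first_importable(*module_names):
    """Return the first module in module_names that imports successfully."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This module path is unavailable; try the next candidate.
            continue
    raise ImportError(
        "none of these modules could be imported: " + ", ".join(module_names)
    )


# Hypothetical usage, assuming Scikit-Learn is installed:
#   selection = first_importable("sklearn.model_selection",
#                                "sklearn.cross_validation")
#   train_test_split = selection.train_test_split
```

Listing the newer module path first means the older one is only used when the newer one is missing.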

The updates to the documentation include new Quickstart and Installation guides as well as updates to the Contributors documentation, which is modeled on the Scikit-Learn contributing documentation.

This version also includes upgrades to the KMeans visualizer, which now supports not only ``silhouette_score`` but also ``distortion_score`` and ``calinski_harabaz_score``. The ``distortion_score`` computes the distortion as the sum of the squared distances between each observation and its closest centroid. This is the metric that K-Means attempts to minimize as it is fitting the model. The ``calinski_harabaz_score`` is defined as the ratio of the between-cluster dispersion to the within-cluster dispersion.
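
To make the two new metrics concrete, here is a minimal pure-Python sketch of the quantities involved. It is not the Yellowbrick or Scikit-Learn implementation: the function names are hypothetical, each point is measured against the centroid of its assigned cluster (assumed to be its closest centroid, as in a fitted K-Means model), and the Calinski-Harabasz ratio is taken as between-cluster dispersion over within-cluster dispersion, each scaled by its degrees of freedom.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def group_by_label(X, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for point, label in zip(X, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def distortion(X, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for members in group_by_label(X, labels).values():
        center = centroid(members)
        total += sum(squared_distance(p, center) for p in members)
    return total


def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion.

    Assumes at least two clusters, more points than clusters, and a
    non-zero within-cluster dispersion.
    """
    clusters = group_by_label(X, labels)
    n, k = len(X), len(clusters)
    overall = centroid(X)
    between = sum(
        len(members) * squared_distance(centroid(members), overall)
        for members in clusters.values()
    )
    within = distortion(X, labels)
    return (between / (k - 1)) / (within / (n - k))


# Two tight, well separated clusters.
X = [[0, 0], [0, 2], [10, 0], [10, 2]]
labels = [0, 0, 1, 1]
print(distortion(X, labels))         # 4.0
print(calinski_harabasz(X, labels))  # 50.0
```

Lower distortion indicates tighter clusters, while a higher Calinski-Harabasz value indicates clusters that are both compact and well separated.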

Finally, this release includes a prototype of the ``VisualPipeline``, which extends Scikit-Learn's ``Pipeline`` class, allowing multiple Visualizers to be chained or sequenced together.

* Tag: v0.4.1_
* Deployed: Monday, May 22, 2017
* Contributors: Benjamin Bengfort, Rebecca Bilbro, Nathan Danielsen

Changes:
- Score and model visualizers now wrap estimators as proxies so that all methods on the estimator can be directly accessed from the visualizer
- Updated Scikit-learn dependency from >=0.17.1 to >=0.18
- Replaced ``sklearn.cross_validation`` with ``model_selection``
- Updated SciPy dependency from >=0.17.1 to >=0.18
- ScoreVisualizer now subclasses ModelVisualizer, a step toward allowing both fitted and unfitted models to be passed to Visualizers
- Added CI tests for Python 3.6 compatibility
- Added new quickstart guide and install instructions
- Updates to the contributors documentation
- Added ``distortion_score`` and ``calinski_harabaz_score`` computations and visualizations to KMeans visualizer.
- Replaced the ``self.ax`` property on all of the individual ``draw`` methods with a new property on the ``Visualizer`` class that ensures all visualizers automatically have axes.
- Refactored the utils module into a package
- Continuing to update the docstrings to conform to Sphinx
- Added a prototype visual pipeline class that extends the Scikit-learn pipeline class to ensure that visualizers get called correctly.
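
The proxy behavior in the first change above can be pictured with Python's ``__getattr__`` hook, which is only consulted when normal attribute lookup fails. This is a rough sketch of the general pattern rather than the Yellowbrick source: ``EstimatorProxy`` and ``StubModel`` are hypothetical names used for illustration.

```python
class StubModel(object):
    """Hypothetical stand-in for a Scikit-Learn estimator."""

    def __init__(self):
        self.coef_ = [1.0, 2.0]

    def predict(self, X):
        # Always predict class 0; enough to show that the call is forwarded.
        return [0 for _ in X]


class EstimatorProxy(object):
    """Wrap an estimator and forward unknown attributes to it."""

    def __init__(self, estimator):
        self.estimator = estimator

    def __getattr__(self, name):
        # Called only when normal lookup on the proxy fails. Guard against
        # infinite recursion if 'estimator' itself has not been set yet.
        if name == "estimator":
            raise AttributeError(name)
        return getattr(self.estimator, name)


proxy = EstimatorProxy(StubModel())
print(proxy.predict([1, 2, 3]))  # [0, 0, 0], forwarded to the wrapped model
print(proxy.coef_)               # [1.0, 2.0], read from the wrapped model
```

Attributes defined on the proxy itself still take precedence, so a visualizer built this way can add its own methods without hiding the estimator's.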

Bug Fixes:
- Fixed title bug in Rank2D FeatureVisualizer

.. _v0.4.1: https://github.com/DistrictDataLabs/yellowbrick/releases/tag/v0.4.1


Version 0.4
-----------
This release is the culmination of the Spring 2017 DDL Research Labs that focused on developing Yellowbrick as a community effort guided by a sprint/agile workflow. We added several more visualizers, did a lot of user testing and bug fixes, updated the documentation, and generally discovered how best to make Yellowbrick a friendly project to contribute to.
14 changes: 7 additions & 7 deletions docs/contributing.rst
@@ -4,7 +4,7 @@ Contributing

Yellowbrick is an open source project that is supported by a community who will gratefully and humbly accept any contributions you might make to the project. Large or small, any contribution makes a big difference; and if you've never contributed to an open source project before, we hope you will start with Yellowbrick!

Principally, Yellowbrick development is about the addition and creation of *visualizers* --- objects that learn from data and create a visual representation of the data or model. Visualizers integrate with Scikit-Learn estimators, transformers, and pipelines for specific purposes and as a result, can be simple to build and deploy. The most common contribution is therefore a new visualizer for a specific model or model family. We'll discuss in detail how to build visualizers later.
Principally, Yellowbrick development is about the addition and creation of *visualizers* --- objects that learn from data and create a visual representation of the data or model. Visualizers integrate with Scikit-Learn estimators, transformers, and pipelines for specific purposes and as a result, can be simple to build and deploy. The most common contribution is a new visualizer for a specific model or model family. We'll discuss in detail how to build visualizers later.

Beyond creating visualizers, there are many ways to contribute:

@@ -28,15 +28,15 @@ Yellowbrick is hosted on GitHub at https://github.com/DistrictDataLabs/yellowbri
The typical workflow for a contributor to the codebase is as follows:

1. **Discover** a bug or a feature by using Yellowbrick.
2. **Discuss** with the core contributes by `adding an issue <https://github.com/DistrictDataLabs/yellowbrick/issues>`_.
2. **Discuss** with the core contributors by `adding an issue <https://github.com/DistrictDataLabs/yellowbrick/issues>`_.
3. **Assign** yourself the task by pulling a card from our `Waffle Kanban <https://waffle.io/DistrictDataLabs/yellowbrick>`_
4. **Fork** the repository into your own GitHub account.
5. Create a **Pull Request** first thing to `connect with us <https://github.com/DistrictDataLabs/yellowbrick/pulls>`_ about your task.
6. **Code** the feature, write the documentation, add your contribution.
6. **Code** the feature, write the tests and documentation, add your contribution.
7. **Review** the code with core contributors who will guide you to a high quality submission.
8. **Merge** your contribution into the Yellowbrick codebase.

.. note:: Create a pull request as soon as possible, even before you've started coding. This will allow the core contributors to give you advice about where to add your code or utilities and discuss other style choices and implementation details as you go. Don't wait!
.. note:: Please create a pull request as soon as possible, even before you've started coding. This will allow the core contributors to give you advice about where to add your code or utilities and discuss other style choices and implementation details as you go. Don't wait!

We believe that *contribution is collaboration* and therefore emphasize *communication* throughout the open source process. We rely heavily on GitHub's social coding tools to allow us to do this.

@@ -66,7 +66,7 @@ Once forked, use the following steps to get your development environment set up

$ pip install -r requirements.txt

Note that there may be other dependencies required for development and testing, you can simply install them with ``pip``.
Note that there may be other dependencies required for development and testing; you can simply install them with ``pip``.

4. Switch to the develop branch.

@@ -131,12 +131,12 @@ Tag the release in GitHub::
$ git tag -a vx.x
$ git push origin vx.x

You'll have to go to the release_ page to edit the release with similar information as added to the changelog. Once done, push the release to PyPI:
You'll have to go to the release_ page to edit the release with similar information as added to the changelog. Once done, push the release to PyPI::

$ make build
$ make deploy

Check that the PyPI page is updated with the correct version and that ``pip install -U yellowbrick`` updates the version and works correctly. Also check the documentation on PyHosted, ReadTheDocs, and on our website to make sure that it was correctly updated. Finally merge the release into develop and clean up:
Check that the PyPI page is updated with the correct version and that ``pip install -U yellowbrick`` updates the version and works correctly. Also check the documentation on PyHosted, ReadTheDocs, and on our website to make sure that it was correctly updated. Finally merge the release into develop and clean up::

$ git checkout develop
$ git merge --no-ff --no-edit release-x.x
File renamed without changes.
6 changes: 3 additions & 3 deletions docs/examples/methods.rst
@@ -282,7 +282,7 @@ the final step in a ``Pipeline`` or ``VisualPipeline``.
# Regression Evaluation Imports
from sklearn.linear_model import Ridge, Lasso
from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split
from yellowbrick.regressor import PredictionError, ResidualsPlot
@@ -392,7 +392,7 @@ a Pipeline or VisualPipeline.
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split
from yellowbrick.classifier import ClassificationReport, ROCAUC, ClassBalance
@@ -456,7 +456,7 @@ Scikit-Learn documentation on `confusion matrices <http://scikit-learn.org/stabl
import yellowbrick
from sklearn.datasets import load_digits
from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import ConfusionMatrix
Binary file added docs/images/quickstart/bikeshare_rank2d.png
Binary file added docs/images/quickstart/bikeshare_ridge_alphas.png
6 changes: 3 additions & 3 deletions docs/index.rst
@@ -11,7 +11,7 @@ Yellowbrick: Machine Learning Visualization

Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the Scikit-Learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines Scikit-Learn with Matplotlib in the best tradition of the Scikit-Learn documentation, but to produce visualizations for *your* models! For more on Yellowbrick, please see the :doc:`introduction`.

If you're new to Yellowbrick, checkout the :doc:`setup` or skip ahead to the :doc:`examples/examples`. Yellowbrick is a rich library with many Visualizers being added on a regular basis. For details on specific Visualizers and extended usage head over to the :doc:`api/modules`. If you've signed up to do user testing, checkout the :doc:`evaluation` (and thank you!).
If you're new to Yellowbrick, check out the :doc:`quickstart` or skip ahead to the :doc:`examples/index`. Yellowbrick is a rich library with many Visualizers being added on a regular basis. For details on specific Visualizers and extended usage head over to the :doc:`api/modules`. If you've signed up to do user testing, check out the :doc:`evaluation` (and thank you!).

Visualizers
-----------
@@ -79,8 +79,8 @@ The following is a complete listing of the Yellowbrick documentation for this ve
:maxdepth: 2

introduction
setup
examples/examples
quickstart
examples/index
api/modules
about
evaluation