Merge branch 'release-0.4.1'

DistrictDataLabs · May 22, 2017 · bb51d77 · bb51d77
2 parents b20705a + 5e406aa
commit bb51d77

Showing 71 changed files with 21,104 additions and 976 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -2,6 +2,7 @@ language: python
 python:
   - '2.7'
   - '3.5'
+  - '3.6'
 
 before_install:
   - sudo apt-get build-dep python-scipy

diff --git a/DESCRIPTION.rst b/DESCRIPTION.rst
@@ -0,0 +1,62 @@
+.. -*- mode: rst -*-
+
+|Visualizers|_
+
+.. |Visualizers| image:: http://www.scikit-yb.org/en/latest/_images/visualizers.png
+.. _Visualizers: http://scikit-yb.org/
+
+Yellowbrick
+===========
+
+Yellowbrick is a suite of visual analysis and diagnostic tools designed to facilitate machine learning with Scikit-Learn. The library implements a new core API object, the "Visualizer", which is a Scikit-Learn estimator: an object that learns from data. Like transformers or models, visualizers learn from data by creating a visual representation of the model selection workflow.
+
+Visualizers allow users to steer the model selection process, building intuition around feature engineering, algorithm selection, and hyperparameter tuning. For example, visualizers can help diagnose common problems surrounding model complexity and bias, heteroscedasticity, underfit and overtraining, or class balance issues. By applying visualizers to the model selection workflow, Yellowbrick allows you to steer predictive models to more successful results, faster.
+
+Please see the full documentation at: http://scikit-yb.org/
+
+Visualizers
+-----------
+
+Visualizers are estimators (objects that learn from data) whose primary objective is to create visualizations that allow insight into the model selection process. In Scikit-Learn terms, they can be similar to transformers when visualizing the data space, or they can wrap a model estimator, similar to how the “ModelCV” (e.g. RidgeCV_, LassoCV_) methods work. The primary goal of Yellowbrick is to create a sensible API similar to Scikit-Learn. Some of our most popular visualizers include:
+
+.. _RidgeCV: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html
+.. _LassoCV: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html
+
+Feature Visualization
+~~~~~~~~~~~~~~~~~~~~~
+
+- **Rank2D**: pairwise ranking of features to detect relationships
+- **Parallel Coordinates**: horizontal visualization of instances
+- **Radial Visualization**: separation of instances around a circular plot
+
+Classification Visualization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **Class Balance**: see how the distribution of classes affects the model
+- **Classification Report**: visual representation of precision, recall, and F1
+- **ROC/AUC Curves**: receiver operating characteristic and area under the curve
+- **Confusion Matrices**: visual description of class decision making
+
+Regression Visualization
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **Prediction Error Plots**: find model breakdowns along the domain of the target
+- **Residuals Plot**: show the difference in residuals of training and test data
+- **Alpha Selection**: show how the choice of alpha influences regularization
+
+Clustering Visualization
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **K-Elbow Plot**: select k using the elbow method and various metrics
+- **Silhouette Plot**: select k by visualizing silhouette coefficient values
+
+Text Visualization
+~~~~~~~~~~~~~~~~~~
+
+- **Term Frequency**: visualize the frequency distribution of terms in the corpus
+- **TSNE**: use t-distributed stochastic neighbor embedding to project documents
+
+... and more! Visualizers are being added all the time; be sure to check the examples_ (or even the develop_ branch) and feel free to contribute your ideas for new Visualizers!
+
+.. _examples: http://www.scikit-yb.org/en/latest/examples/examples.html
+.. _develop: https://github.com/districtdatalabs/yellowbrick/tree/develop
diff --git a/DESCRIPTION.txt b/DESCRIPTION.txt
diff --git a/MAINTAINERS.md b/MAINTAINERS.md
@@ -6,13 +6,16 @@ When creating a pull request, your contribution will be reviewed by one or proba
 
 For more about how to develop visualizers and contribute features to Yellowbrick, see our [contributor's guide](CONTRIBUTING.md) and the documentation.
 
+For everyone who has [contributed](https://github.com/DistrictDataLabs/yellowbrick/graphs/contributors) in big and in small ways, **thank you!** Yellowbrick is intended to be a community project, welcoming to new and experienced developers alike. If you would like to become a core contributor, simply submit a pull request that shows core knowledge of the Yellowbrick library. Usually new Visualizers meet this standard; let the maintainers know you'd like to join the team, and they'll help you work toward it!
+
 ## Maintainers
 
 This is a list of the primary project maintainers. Feel free to @ message them in issues and converse with them directly.
 
 - [bbengfort](https://github.com/bbengfort)
 - [NealHumphrey](https://github.com/NealHumphrey)
 - [jkeung](https://github.com/jkeung)
+- [ndanielsen](https://github.com/ndanielsen)
 
 ## Core Contributors
 
@@ -21,7 +24,6 @@ This is a list of the core-contributors of the project. Core contributors set th
 - [rebeccabilbro](https://github.com/rebeccabilbro)
 - [mattandahalfew](https://github.com/mattandahalfew)
 - [pdamodaran](https://github.com/pdamodaran)
-- [ndanielsen](https://github.com/ndanielsen)
 - [tuulihill](https://github.com/tuulihill)
 - [balavenkatesan](https://github.com/balavenkatesan)
 - [morganmendis](https://github.com/morganmendis)
diff --git a/README.md b/README.md
@@ -55,6 +55,24 @@ Visualizers are estimators (objects that learn from data) whose primary objectiv
 
 And more! Visualizers are being added all the time, be sure to check the examples (or even the develop branch) and feel free to contribute your ideas for Visualizers!
 
+## Installing Yellowbrick
+
+Yellowbrick is compatible with Python 2.7 or later, but Python 3.5 or later is preferred to take full advantage of all functionality. Yellowbrick also depends on Scikit-Learn 0.18 or later and Matplotlib 1.5 or later. The simplest way to install Yellowbrick is from PyPI with pip, Python's preferred package installer.
+
+    $ pip install yellowbrick
+
+Note that Yellowbrick is an active project and routinely publishes new releases with more visualizers and updates. In order to upgrade Yellowbrick to the latest version, use pip as follows.
+
+    $ pip install -U yellowbrick
+
+You can also use the `-U` flag to update Scikit-Learn, matplotlib, or any other third party utilities that work well with Yellowbrick to their latest versions.
+
+If you're using Windows or Anaconda, you can take advantage of the conda utility to install Yellowbrick:
+
+    conda install -c districtdatalabs yellowbrick
+
+Note, however, that there is a [known bug](https://github.com/DistrictDataLabs/yellowbrick/issues/205) installing Yellowbrick on Linux with Anaconda.
+
 ## Using Yellowbrick
 
 The Yellowbrick API is specifically designed to play nicely with Scikit-Learn. Here is an example of a typical workflow sequence with Scikit-Learn and Yellowbrick:
@@ -89,7 +107,7 @@ visualizer.poof()
 
 For additional information on getting started with Yellowbrick, check out our [examples notebook](https://github.com/DistrictDataLabs/yellowbrick/blob/develop/examples/examples.ipynb).
 
-We also have a [quick start guide](https://github.com/DistrictDataLabs/yellowbrick/blob/master/docs/setup.rst).
+We also have a [quick start guide](https://github.com/DistrictDataLabs/yellowbrick/blob/master/docs/quickstart.rst).
 
 ## Contributing to Yellowbrick
 

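The minimum versions named in the new README install section (Scikit-Learn 0.18, Matplotlib 1.5) can be checked programmatically. The following is a minimal sketch, not part of Yellowbrick: the names `MINIMUMS`, `version_tuple`, and `meets_minimum` are hypothetical, and the parser only compares the leading integer of each dotted component.

```python
import re

# Minimum versions quoted in the README install section.
MINIMUMS = {"scikit-learn": "0.18", "matplotlib": "1.5"}


def version_tuple(version):
    """Turn '0.18.1' or '1.5.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    have, need = version_tuple(installed), version_tuple(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that '0.18' and '0.18.0' compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

In practice one might pass `sklearn.__version__` and `matplotlib.__version__` as the `installed` argument, for example `meets_minimum(sklearn.__version__, MINIMUMS["scikit-learn"])`.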
diff --git a/docs/changelog.rst b/docs/changelog.rst
@@ -2,6 +2,45 @@
 Changelog
 =========
 
+Version 0.4.1
+-------------
+This release is an intermediate version bump in anticipation of the PyCon 2017 sprints.
+
+The primary goals of this version were to (1) update the Yellowbrick dependencies, (2) enhance the Yellowbrick documentation to help orient new users and contributors, and (3) make several small additions and upgrades (e.g. pulling the Yellowbrick utils into a standalone module).
+
+We have updated the Scikit-Learn and SciPy dependencies from version 0.17.1 or later to 0.18 or later. This primarily entails moving from ``from sklearn.cross_validation import train_test_split`` to ``from sklearn.model_selection import train_test_split``.
+
+The updates to the documentation include new Quickstart and Installation guides as well as updates to the Contributors documentation, which is modeled on the Scikit-Learn contributing documentation.
+
+This version also included upgrades to the KMeans visualizer, which now supports not only ``silhouette_score`` but also ``distortion_score`` and ``calinski_harabaz_score``. The ``distortion_score`` computes the mean distortion of all samples as the sum of the squared distances between each observation and its closest centroid. This is the metric that K-Means attempts to minimize as it is fitting the model. The ``calinski_harabaz_score`` is defined as the ratio of the between-cluster dispersion to the within-cluster dispersion.
+
+Finally, this release includes a prototype of the ``VisualPipeline``, which extends Scikit-Learn's ``Pipeline`` class, allowing multiple Visualizers to be chained or sequenced together.
+
+* Tag: v0.4.1_
+* Deployed: Monday, May 22, 2017
+* Contributors: Benjamin Bengfort, Rebecca Bilbro, Nathan Danielsen
+
+Changes:
+   - Score and model visualizers now wrap estimators as proxies so that all methods on the estimator can be directly accessed from the visualizer
+   - Updated Scikit-learn dependency from >=0.17.1 to >=0.18
+   - Replaced ``sklearn.cross_validation`` with ``model_selection``
+   - Updated SciPy dependency from >=0.17.1 to >=0.18
+   - ScoreVisualizer now subclasses ModelVisualizer, a step toward allowing both fitted and unfitted models to be passed to Visualizers
+   - Added CI tests for Python 3.6 compatibility
+   - Added new quickstart guide and install instructions
+   - Updates to the contributors documentation
+   - Added ``distortion_score`` and ``calinski_harabaz_score`` computations and visualizations to KMeans visualizer.
+   - Replaced the ``self.ax`` property on all of the individual ``draw`` methods with a new property on the ``Visualizer`` class that ensures all visualizers automatically have axes.
+   - Refactored the utils module into a package
+   - Continuing to update the docstrings to conform to Sphinx
+   - Added a prototype visual pipeline class that extends the Scikit-learn pipeline class to ensure that visualizers get called correctly.
+
+Bug Fixes:
+   - Fixed title bug in Rank2D FeatureVisualizer
+
+.. _v0.4.1: https://github.com/DistrictDataLabs/yellowbrick/releases/tag/v0.4.1
+
+
 Version 0.4
 -----------
 This release is the culmination of the Spring 2017 DDL Research Labs that focused on developing Yellowbrick as a community effort guided by a sprint/agile workflow. We added several more visualizers, did a lot of user testing and bug fixes, updated the documentation, and generally discovered how best to make Yellowbrick a friendly project to contribute to.

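The distortion described in the 0.4.1 changelog entry (the sum of squared distances between each observation and its closest centroid) can be illustrated in a few lines of plain Python. This is a sketch under stated assumptions, not Yellowbrick's implementation: the function names and the toy data are hypothetical, and the centroids are supplied by hand rather than fitted by K-Means.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(points, centroids):
    """Sum of squared distances from each point to its closest centroid."""
    return sum(
        min(squared_distance(point, centroid) for centroid in centroids)
        for point in points
    )


# Toy data: two tight pairs of points that are far apart from each other.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Hand-picked centroids for k=1 and k=2 (assumed, not fitted).
one_cluster = [(5.0, 1.0)]
two_clusters = [(0.0, 1.0), (10.0, 1.0)]

print(distortion(points, one_cluster))   # 104.0
print(distortion(points, two_clusters))  # 4.0
```

The sharp drop from k=1 to k=2 is the kind of bend an elbow plot makes visible when distortion is plotted against k.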
diff --git a/docs/contributing.rst b/docs/contributing.rst
@@ -4,7 +4,7 @@ Contributing
 
 Yellowbrick is an open source project that is supported by a community who will gratefully and humbly accept any contributions you might make to the project. Large or small, any contribution makes a big difference; and if you've never contributed to an open source project before, we hope you will start with Yellowbrick!
 
-Principally, Yellowbrick development is about the addition and creation of *visualizers* --- objects that learn from data and create a visual representation of the data or model. Visualizers integrate with Scikit-Learn estimators, transformers, and pipelines for specific purposes and as a result, can be simple to build and deploy. The most common contribution is therefore a new visualizer for a specific model or model family. We'll discuss in detail how to build visualizers later.
+Principally, Yellowbrick development is about the addition and creation of *visualizers* --- objects that learn from data and create a visual representation of the data or model. Visualizers integrate with Scikit-Learn estimators, transformers, and pipelines for specific purposes and as a result, can be simple to build and deploy. The most common contribution is a new visualizer for a specific model or model family. We'll discuss in detail how to build visualizers later.
 
 Beyond creating visualizers, there are many ways to contribute:
 
@@ -28,15 +28,15 @@ Yellowbrick is hosted on GitHub at https://github.com/DistrictDataLabs/yellowbri
 The typical workflow for a contributor to the codebase is as follows:
 
 1. **Discover** a bug or a feature by using Yellowbrick.
-2. **Discuss** with the core contributes by `adding an issue <https://github.com/DistrictDataLabs/yellowbrick/issues>`_.
+2. **Discuss** with the core contributors by `adding an issue <https://github.com/DistrictDataLabs/yellowbrick/issues>`_.
 3. **Assign** yourself the task by pulling a card from our `Waffle Kanban <https://waffle.io/DistrictDataLabs/yellowbrick>`_
 4. **Fork** the repository into your own GitHub account.
 5. Create a **Pull Request** first thing to `connect with us <https://github.com/DistrictDataLabs/yellowbrick/pulls>`_ about your task.
-6. **Code** the feature, write the documentation, add your contribution.
+6. **Code** the feature, write the tests and documentation, add your contribution.
 7. **Review** the code with core contributors who will guide you to a high quality submission.
 8. **Merge** your contribution into the Yellowbrick codebase.
 
-.. note:: Create a pull request as soon as possible, even before you've started coding. This will allow the core contributors to give you advice about where to add your code or utilities and discuss other style choices and implementation details as you go. Don't wait!
+.. note:: Please create a pull request as soon as possible, even before you've started coding. This will allow the core contributors to give you advice about where to add your code or utilities and discuss other style choices and implementation details as you go. Don't wait!
 
 We believe that *contribution is collaboration* and therefore emphasize *communication* throughout the open source process. We rely heavily on GitHub's social coding tools to allow us to do this.
 
@@ -66,7 +66,7 @@ Once forked, use the following steps to get your development environment set up
 
         $ pip install -r requirements.txt
 
-    Note that there may be other dependencies required for development and testing, you can simply install them with ``pip``.
+    Note that there may be other dependencies required for development and testing; you can simply install them with ``pip``.
 
 4. Switch to the develop branch.
 
@@ -131,12 +131,12 @@ Tag the release in GitHub::
     $ git tag -a vx.x
     $ git push origin vx.x
 
-You'll have to go to the release_ page to edit the release with similar information as added to the changelog. Once done, push the release to PyPI:
+You'll have to go to the release_ page to edit the release with similar information as added to the changelog. Once done, push the release to PyPI::
 
     $ make build
     $ make deploy
 
-Check that the PyPI page is updated with the correct version and that ``pip install -U yellowbrick`` updates the version and works correctly. Also check the documentation on PyHosted, ReadTheDocs, and on our website to make sure that it was correctly updated. Finally merge the release into develop and clean up:
+Check that the PyPI page is updated with the correct version and that ``pip install -U yellowbrick`` updates the version and works correctly. Also check the documentation on PyHosted, ReadTheDocs, and on our website to make sure that it was correctly updated. Finally merge the release into develop and clean up::
 
     $ git checkout develop
     $ git merge --no-ff --no-edit release-x.x

diff --git a/docs/examples/examples.rst → docs/examples/index.rst b/docs/examples/examples.rst → docs/examples/index.rst
diff --git a/docs/examples/methods.rst b/docs/examples/methods.rst
@@ -282,7 +282,7 @@ the final step in a ``Pipeline`` or ``VisualPipeline``.
     # Regression Evaluation Imports
 
     from sklearn.linear_model import Ridge, Lasso
-    from sklearn.cross_validation import train_test_split
+    from sklearn.model_selection import train_test_split
 
     from yellowbrick.regressor import PredictionError, ResidualsPlot
 
@@ -392,7 +392,7 @@ a Pipeline or VisualPipeline.
     from sklearn.naive_bayes import GaussianNB
     from sklearn.linear_model import LogisticRegression
     from sklearn.ensemble import RandomForestClassifier
-    from sklearn.cross_validation import train_test_split
+    from sklearn.model_selection import train_test_split
 
     from yellowbrick.classifier import ClassificationReport, ROCAUC, ClassBalance
 
@@ -456,7 +456,7 @@ Scikit-Learn documentation on `confusion matrices <http://scikit-learn.org/stabl
     import yellowbrick
 
     from sklearn.datasets import load_digits
-    from sklearn.cross_validation import train_test_split
+    from sklearn.model_selection import train_test_split
     from sklearn.linear_model import LogisticRegression
 
     from yellowbrick.classifier import ConfusionMatrix

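The examples above move `train_test_split` from `sklearn.cross_validation` to `sklearn.model_selection`. Code that still has to run against both older and newer Scikit-Learn releases could resolve the import at runtime. The helper below is a hypothetical sketch using only the standard library; `import_first` is not a Yellowbrick or Scikit-Learn function.

```python
import importlib


def import_first(module_names, attribute):
    """Return `attribute` from the first importable module that defines it."""
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module not installed in this environment; try the next
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError(
        "%s not found in any of: %s" % (attribute, ", ".join(module_names))
    )


# Intended use: prefer the new location, fall back to the pre-0.18 one.
# train_test_split = import_first(
#     ["sklearn.model_selection", "sklearn.cross_validation"],
#     "train_test_split",
# )
```

Since this release requires Scikit-Learn 0.18 or later, the plain `from sklearn.model_selection import train_test_split` shown in the diff is sufficient for Yellowbrick itself; the fallback only matters for downstream code that must support both.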
diff --git a/docs/images/quickstart/bikeshare_ols_residuals.png b/docs/images/quickstart/bikeshare_ols_residuals.png
diff --git a/docs/images/quickstart/bikeshare_rank2d.png b/docs/images/quickstart/bikeshare_rank2d.png
diff --git a/docs/images/quickstart/bikeshare_ridge_alphas.png b/docs/images/quickstart/bikeshare_ridge_alphas.png
diff --git a/docs/images/quickstart/bikeshare_ridge_prediction_error.png b/docs/images/quickstart/bikeshare_ridge_prediction_error.png
diff --git a/docs/images/quickstart/temp_feelslike_jointplot.png b/docs/images/quickstart/temp_feelslike_jointplot.png
diff --git a/docs/index.rst b/docs/index.rst
@@ -11,7 +11,7 @@ Yellowbrick: Machine Learning Visualization
 
 Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the Scikit-Learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines Scikit-Learn with Matplotlib in the best tradition of the Scikit-Learn documentation, but to produce visualizations for *your* models! For more on Yellowbrick, please see the :doc:`introduction`.
 
-If you're new to Yellowbrick, checkout the :doc:`setup` or skip ahead to the :doc:`examples/examples`. Yellowbrick is a rich library with many Visualizers being added on a regular basis. For details on specific Visualizers and extended usage head over to the :doc:`api/modules`. If you've signed up to do user testing, checkout the :doc:`evaluation` (and thank you!).
+If you're new to Yellowbrick, check out the :doc:`quickstart` or skip ahead to the :doc:`examples/index`. Yellowbrick is a rich library with many Visualizers being added on a regular basis. For details on specific Visualizers and extended usage head over to the :doc:`api/modules`. If you've signed up to do user testing, check out the :doc:`evaluation` (and thank you!).
 
 Visualizers
 -----------
@@ -79,8 +79,8 @@ The following is a complete listing of the Yellowbrick documentation for this ve
    :maxdepth: 2
 
    introduction
-   setup
-   examples/examples
+   quickstart
+   examples/index
    api/modules
    about
    evaluation