diff --git a/.travis.yml b/.travis.yml index ad48c9055..459d22b1b 100644 --- a/.travis.yml +++ b/.travis.yml @@ -28,12 +28,14 @@ env: - TEST=issue PANDAS="<1" - TEST=console PANDAS="<1" - TEST=examples PANDAS="<1" - - TEST=unit PANDAS=">=1" - - TEST=issue PANDAS=">=1" - - TEST=console PANDAS=">=1" - - TEST=examples PANDAS=">=1" - - TEST=lint PANDAS=">=1" - - TEST=typing PANDAS=">=1" + - TEST=unit PANDAS="==1.0.5" + - TEST=issue PANDAS="==1.0.5" + - TEST=unit PANDAS=">=1.1" + - TEST=issue PANDAS=">=1.1" + - TEST=console PANDAS=">=1.1" + - TEST=examples PANDAS=">=1.1" + - TEST=lint PANDAS=">=1.1" + - TEST=typing PANDAS=">=1.1" before_install: - pip install --upgrade pip setuptools wheel diff --git a/README.md b/README.md index e2360a668..f486db83b 100644 --- a/README.md +++ b/README.md @@ -27,32 +27,16 @@ For each column the following statistics - if relevant for the column type - are ## Announcements -### Version v2.8.0 released +### Version v2.9.0 released -News for users working with image datasets: ``pandas-profiling`` now has build-in supports for Files and Images. -Moreover, the text analysis features have also been reworked, providing more informative statistics. +The release candidate for v2.9.0 was already out for a while, now v2.9.0 is finally released. See the changelog below to know what has changed. -For a better feel, have a look at the [examples](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/examples.html#showcasing-specific-features) section in the docs or read the changelog for a complete view of the changes. +### Spark backend in progress -### Version v2.7.0 released +We can happily announce that we're working on a Spark backend for generating profile reports. +Stay tuned. -#### Performance - -There were several performance regressions pointed out to me recently when comparing 1.4.1 to 2.6.0. -To that end, we benchmarked the code and found several minor features introducing disproportionate computational complexity. 
-Version 2.7.0 optimizes these, giving significant performance improvements! -Moreover, the default configuration is tweaked for towards the needs of the average user. - -#### Phased builds and lazy loading - -A report is built in phases, which allows for new exciting features such as caching, only re-rendering partial reports and lazily computing the report. -Moreover, the progress bar provides more information on the building phase and step. - -#### Documentation - -This version introduces [more elaborate documentation](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/index.html) powered by Sphinx. The previously used pdoc3 has been adequate initially, however misses functionality and extensibility. Several recurring topics are now documented, for instance the configuration parameters are documented and there are pages on big datasets, sensitive data, integrations and resources. - -#### Support `pandas-profiling` +### Support `pandas-profiling` The development of ``pandas-profiling`` relies completely on contributions. If you find value in the package, we welcome you to support the project through [GitHub Sponsors](https://github.com/sponsors/sbrugman)! 
@@ -60,18 +44,17 @@ It's extra exciting that GitHub **matches your contribution** for the first year Find more information here: - - [Changelog v2.7.0](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html#changelog-v2-7-0) - - [Changelog v2.8.0](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html#changelog-v2-8-0) + - [Changelog v2.9.0](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html#changelog-v2-9-0) - [Sponsor the project on GitHub](https://github.com/sponsors/sbrugman) - *May 7, 2020 💘* + *September 2, 2020 💘* --- _Contents:_ **[Examples](#examples)** | **[Installation](#installation)** | **[Documentation](#documentation)** | **[Large datasets](#large-datasets)** | **[Command line usage](#command-line-usage)** | -**[Advanced usage](#advanced-usage)** | +**[Advanced usage](#advanced-usage)** | **[Support](#supporting-open-source)** | **[Types](#types)** | **[How to contribute](#contributing)** | **[Editor Integration](#editor-integration)** | **[Dependencies](#dependencies)** @@ -97,7 +80,7 @@ Specific features: * [Orange prices](https://pandas-profiling.github.io/pandas-profiling/examples/master/features/united_report.html) and [Coal prices](https://pandas-profiling.github.io/pandas-profiling/examples/master/features/flatly_report.html) (showcases report themes) Tutorials: -* [Tutorial: report structure using Kaggle data (advanced)](https://pandas-profiling.github.io/pandas-profiling/examples/master/tutorials/modify_report_structure.ipynb) (modify the report's structure) [![Open In Colab](https://camo.githubusercontent.com/52feade06f2fecbf006889a904d221e6a730c194/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/github/pandas-profiling/pandas-profiling/blob/master/examples/kaggle/modify_report_structure.ipynb) 
[![Binder](https://camo.githubusercontent.com/483bae47a175c24dfbfc57390edd8b6982ac5fb3/68747470733a2f2f6d7962696e6465722e6f72672f62616467655f6c6f676f2e737667)](https://mybinder.org/v2/gh/pandas-profiling/pandas-profiling/master?filepath=examples%2Fkaggle%2Fmodify_report_structure.ipynb) +* [Tutorial: report structure using Kaggle data (advanced)](https://pandas-profiling.github.io/pandas-profiling/examples/master/tutorials/modify_report_structure.ipynb) (modify the report's structure) [![Open In Colab](https://camo.githubusercontent.com/52feade06f2fecbf006889a904d221e6a730c194/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/github/pandas-profiling/pandas-profiling/blob/master/examples/tutorials/modify_report_structure.ipynb) [![Binder](https://camo.githubusercontent.com/483bae47a175c24dfbfc57390edd8b6982ac5fb3/68747470733a2f2f6d7962696e6465722e6f72672f62616467655f6c6f676f2e737667)](https://mybinder.org/v2/gh/pandas-profiling/pandas-profiling/master?filepath=examples%2Ftutorials%2Fmodify_report_structure.ipynb) ## Installation @@ -237,19 +220,36 @@ profile = df.profile_report(title='Pandas Profiling Report', plot={'histogram': profile.to_file("output.html") ``` +# Supporting open source + +Maintaining and developing the open-source code for pandas-profiling, with millions of downloads and thousands of users, would not be possible without the support of our gracious sponsors. + + + + + + +
+ +Lambda Labs + + + +[Lambda workstations](https://lambdalabs.com/), servers, laptops, and cloud services power engineers and researchers at Fortune 500 companies and 94% of the top 50 universities. [Lambda Cloud](https://lambdalabs.com/service/gpu-cloud) offers 4 & 8 GPU instances starting at $1.50 / hr. Pre-installed with TensorFlow, PyTorch, Ubuntu, CUDA, and cuDNN. + +
+ +We would like to thank our generous GitHub Sponsors supporters who make pandas-profiling possible: + + Martin Sotir, Joseph Yuen, Brian Lee, Stephanie Rivera, nscsekhar, abdulAziz + +More info if you would like to appear here: [GitHub Sponsor page](https://github.com/sponsors/sbrugman) + + ## Types Types are a powerful abstraction for effective data analysis, that goes beyond the logical data types (integer, float etc.). -`pandas-profiling` currently recognizes the following types: - -- Boolean -- Numerical -- Date -- Categorical -- URL -- Path -- File -- Image +`pandas-profiling` currently recognizes the following types: _Boolean, Numerical, Date, Categorical, URL, Path, File_ and _Image_. We have developed a type system for Python, tailored for data analysis: [visions](https://github.com/dylan-profiler/visions). Selecting the right typeset drastically reduces the complexity the code of your analysis. diff --git a/docsrc/assets/lambda-labs.png b/docsrc/assets/lambda-labs.png new file mode 100644 index 000000000..0ce557a4d Binary files /dev/null and b/docsrc/assets/lambda-labs.png differ diff --git a/docsrc/source/_static/streamlit-integration.gif b/docsrc/source/_static/streamlit-integration.gif new file mode 100644 index 000000000..6f0ddc257 Binary files /dev/null and b/docsrc/source/_static/streamlit-integration.gif differ diff --git a/docsrc/source/pages/changelog.rst b/docsrc/source/pages/changelog.rst index 24a95c361..91808a905 100644 --- a/docsrc/source/pages/changelog.rst +++ b/docsrc/source/pages/changelog.rst @@ -2,6 +2,8 @@ Changelog ========= +.. include:: changelog/v2_9_0.rst + .. include:: changelog/v2_9_0rc1.rst .. 
include:: changelog/v2_8_0.rst diff --git a/docsrc/source/pages/changelog/v2_9_0.rst b/docsrc/source/pages/changelog/v2_9_0.rst new file mode 100644 index 000000000..befb88ca9 --- /dev/null +++ b/docsrc/source/pages/changelog/v2_9_0.rst @@ -0,0 +1,24 @@ +Changelog v2.9.0 +---------------- + +🎉 Features +^^^^^^^^^^^ +- Descriptions per variable are now possible (see the metadata page or the Census example). + +🐛 Bug fixes +^^^^^^^^^^^^ +- Fixed bug for small DataFrames with unused categories. +- Fixed bug where parallelization would have side effects. +- Removed warning where colormap was modified in place. +- Distinguish between unique and distinct correctly. + +📖 Documentation +^^^^^^^^^^^^^^^^ +- Extended documentation for frequent issues. +- Extended documentation for Streamlit and Panel. +- Provide visibility to our supporters. + +⬆️ Dependencies +^^^^^^^^^^^^^^^^^^ +- Pandas 1.1.0 contains bugs that make it incompatible. Please up- or downgrade. +- Upgraded visions to 0.5.0. \ No newline at end of file diff --git a/docsrc/source/pages/installation.rst b/docsrc/source/pages/installation.rst index 48d9d5290..671dfba2d 100644 --- a/docsrc/source/pages/installation.rst +++ b/docsrc/source/pages/installation.rst @@ -57,7 +57,7 @@ This creates a new conda environment containing the module. .. hint:: - Don't forget to specify the ``conda-forge`` channel. Omitting it won't result in an error, as an outdated package lives on the main channel. + Don't forget to specify the ``conda-forge`` channel. Omitting it won't result in an error, as an outdated package lives on the main channel. See `frequent issues `_ Jupyter notebook/lab -------------------- diff --git a/docsrc/source/pages/integrations.rst b/docsrc/source/pages/integrations.rst index c10966201..be2aa5ffd 100644 --- a/docsrc/source/pages/integrations.rst +++ b/docsrc/source/pages/integrations.rst @@ -101,11 +101,37 @@ Ensure to install ``pyqt5``. 
Via pip use the extras ``app``: pip install pandas-profiling[app] +Streamlit +~~~~~~~~~ -Streamlit / Panel -~~~~~~~~~~~~~~~~~ +`Streamlit ` is an open-source Python library made to build web-apps for machine learning and data science. -For more information of how to use ``pandas-profiling`` with Streamlit or Panel, see the https://github.com/streamlit/streamlit/issues/693 and https://github.com/pandas-profiling/pandas-profiling/issues/491. +.. image:: ../_static/streamlit-integration.gif + +.. code-block:: python + + import pandas as pd + import pandas_profiling + import streamlit as st + from streamlit_pandas_profiling import st_profile_report + + df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv") + pr = df.profile_report() + + st.title("Pandas Profiling in Streamlit") + st.write(df) + st_profile_report(pr) + +You can install this `Pandas Profiling component ` for Streamlit with pip: + +.. code-block:: console + + pip install streamlit-pandas-profiling + +Panel +~~~~~ + +For more information on how to use ``pandas-profiling`` in Panel, see https://github.com/pandas-profiling/pandas-profiling/issues/491 and the Pandas Profiling example at https://awesome-panel.org. Cloud Integrations ------------------ @@ -133,12 +159,14 @@ Kaggle Pipeline Integrations --------------------- -With the Python, Command-line and Jupyter interfaces, `pandas-profiling` integrates seamlessly with DAG execution tools as Airflow, dagser, Kedro, prefect and any other you can think of. -Integration with `dagser `_ or `prefect `_ can be achieved in a similar way as Airflow. +With Python, command-line and Jupyter interfaces, `pandas-profiling` integrates seamlessly with DAG execution tools like Airflow, Dagster, Kedro and Prefect. + +Integration with `Dagster `_ or `Prefect `_ can be achieved in a similar way as with Airflow. Airflow ~~~~~~~ + Integration with Airflow can be easily achieved through the `BashOperator `_ or the `PythonOperator `_. .. 
code-block:: python diff --git a/docsrc/source/pages/metadata.rst b/docsrc/source/pages/metadata.rst index a6dac12b4..9d1c8e0fb 100644 --- a/docsrc/source/pages/metadata.rst +++ b/docsrc/source/pages/metadata.rst @@ -2,6 +2,8 @@ Metadata ======== +Dataset metadata +---------------- When sharing reports with coworkers or publishing online, you might want to include metadata of the dataset, such as author, copyright holder or a description. The supported properties are inspired by `https://schema.org/Dataset `_. Currently supported are: "description", "creator", "author", "url", "copyright_year", "copyright_holder". The following example generates a report with a "description", "copyright_holder" and "copyright_year", "creator" and "url". @@ -19,3 +21,59 @@ You can find these properties in the "Overview" section under the "About" tab. ), ) report.to_file(Path("stata_auto_report.html")) + +Descriptions per variable +------------------------- +In addition to providing dataset details, users often would like to include column-specific descriptions when sharing reports with team members and stakeholders. This section provides two code examples of how to do this in pandas-profiling. + +.. code-block:: python + :caption: Generate a report with descriptions per variable + + profile = df.profile_report( + variables={ + 'descriptions': + { + 'files': 'Files in the filesystem', + 'datec': 'Creation date', + 'datem': 'Modification date', + } + } + ) + + profile.to_file("report.html") + + +This alternative example demonstrates how you could load the definitions from a JSON file. +By default, the descriptions are presented in the overview tab and next to each variable. + +.. code-block:: json + :caption: dataset_column_definition.json + + { + "column name 1": "column 1 definition", + "column name 2": "column 2 definition" + } + +.. 
code-block:: python + :caption: Generate a report with descriptions per variable from a definitions file + + import json + import pandas as pd + import pandas_profiling + + definition_file = 'dataset_column_definition.json' + + # Read the variable descriptions + with open(definition_file, 'r') as f: + definitions = json.load(f) + + # By default, the descriptions are presented in the overview tab and next to each variable + report = df.profile_report(variables=dict(descriptions=definitions)) + + # We can disable showing the descriptions next to each variable + report = df.profile_report( + variables=dict(descriptions=definitions), + show_variable_description=False + ) + + report.to_file('report.html') \ No newline at end of file diff --git a/docsrc/source/pages/support.rst b/docsrc/source/pages/support.rst index 99db1a78b..2b04a7fa0 100644 --- a/docsrc/source/pages/support.rst +++ b/docsrc/source/pages/support.rst @@ -10,9 +10,20 @@ First, we need to know whether a problem is actually a bug in the code, or that Frequent issues ~~~~~~~~~~~~~~~ -- This thread discusses `conda installing older versions `_ of the package. +Conda install defaults to v1.4.1 +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -- When in a Jupyter environment, you see some text, such as ``IntSlider(value=0)`` or interactive ``(children=(IntSlider(value=0, description='x', max=1), Output()), _dom_classes=('widget-interact',))``, then the Jupyter Widgets are not activated. The :doc:`installation` page contains instructions on how to resolve this problem. +Some users experience that ``conda install -c conda-forge pandas-profiling`` defaults to 1.4.1. + +More details, `here `_, `here `__ and `here `__. + +Jupyter "IntSlider(value=0)" +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +When in a Jupyter environment, you see some text, such as ``IntSlider(value=0)`` or interactive ``(children=(IntSlider(value=0, description='x', max=1), Output()), _dom_classes=('widget-interact',))``, then the Jupyter Widgets are not activated. 
The :doc:`installation` page contains instructions on how to resolve this problem. + + +Help on Stackoverflow +--------------------- Users with a request for help on how to use `pandas-profiling` should consider asking their question on stackoverflow. There is a specific tag for `pandas-profiling`: diff --git a/examples/census/census.py b/examples/census/census.py index 93d4044e2..e63460dda 100644 --- a/examples/census/census.py +++ b/examples/census/census.py @@ -1,3 +1,4 @@ +import json from pathlib import Path import numpy as np @@ -38,5 +39,24 @@ # Prepare missing values df = df.replace("\\?", np.nan, regex=True) + # Initialize the report profile = ProfileReport(df, title="Census Dataset", explorative=True) + + # show column definition + definitions = json.load(open(f"census_column_definition.json")) + profile.set_variable( + "dataset", + { + "description": 'Predict whether income exceeds $50K/yr based on census data. Also known as "Census Income" dataset. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1)&& (HRSWK>0)). 
Prediction task is to determine whether a person makes over 50K a year.', + "copyright_year": "1996", + "author": "Ronny Kohavi and Barry Becker", + "creator": "Barry Becker", + "url": "https://archive.ics.uci.edu/ml/datasets/adult", + }, + ) + profile.set_variable("variables.descriptions", definitions) + + # Only show the descriptions in the overview + profile.set_variable("show_variable_description", False) + profile.to_file(Path("./census_report.html")) diff --git a/examples/census/census_column_definition.json b/examples/census/census_column_definition.json new file mode 100644 index 000000000..c2c51ae23 --- /dev/null +++ b/examples/census/census_column_definition.json @@ -0,0 +1,16 @@ +{ + "age": "definition 0", + "workclass": "definition 1", + "fnlwgt": "definition 2", + "education": "definition 3", + "education-num": "definition 4", + "marital-status": "definition 5", + "occupation": "definition 6", + "relationship": "definition 7", + "race": "definition 8", + "sex": "definition 9", + "capital-gain": "definition 10", + "capital-loss": "definition 11", + "hours-per-week": "definition 12", + "native-country": "definition 13" +} \ No newline at end of file diff --git a/examples/meteorites/meteorites.ipynb b/examples/meteorites/meteorites.ipynb index 3aaf7816b..862a9f46d 100644 --- a/examples/meteorites/meteorites.ipynb +++ b/examples/meteorites/meteorites.ipynb @@ -8,6 +8,48 @@ "Source of data: https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The autoreload instruction reloads modules automatically before code execution, which is helpful for the update below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%load_ext autoreload\n", + "%autoreload 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Make sure that we have the latest version of pandas-profiling." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import sys\n", + "!{sys.executable} -m pip install -U pandas-profiling[notebook]\n", + "!jupyter nbextension enable --py widgetsnbextension" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You might want to restart the kernel now." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -122,9 +164,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "scrolled": false - }, + "metadata": {}, "outputs": [], "source": [ "profile_report = df.profile_report(explorative=True, html={'style': {'full_width': True}})\n", @@ -164,9 +204,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.6" + "version": "3.7.3" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/examples/titanic/titanic.ipynb b/examples/titanic/titanic.ipynb index 4c51c91fa..1910d02f8 100644 --- a/examples/titanic/titanic.ipynb +++ b/examples/titanic/titanic.ipynb @@ -1,8 +1,50 @@ { "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The autoreload instruction reloads modules automatically before code execution, which is helpful for the update below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%load_ext autoreload\n", + "%autoreload 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Make sure that we have the latest version of pandas-profiling." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import sys\n", + "!{sys.executable} -m pip install -U pandas-profiling[notebook]\n", + "!jupyter nbextension enable --py widgetsnbextension" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You might want to restart the kernel now." 
+ ] + }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -20,7 +62,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -34,178 +76,9 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f55a3d55e80a44cc880299eeb6d871a5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='variables', max=12.0, style=ProgressStyle(description_wid…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "026a28431f4d491bae3d0702f3a864d0", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='correlations', max=6.0, style=ProgressStyle(description_w…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "dd8198fe966b4aacad4491a6d4c11082", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='interactions [continuous]', max=25.0, style=ProgressStyle…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f812ab363c5343d691563d369ed04238", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='table', max=1.0, 
style=ProgressStyle(description_width='i…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "bebe44defe3a404da6d653358f60bdb5", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='missing', max=4.0, style=ProgressStyle(description_width=…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d4daa4b7a51b42f6854a542a21ed900d", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='warnings', max=3.0, style=ProgressStyle(description_width…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "0f257a5e5e244a0888def801edd34802", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='package', max=1.0, style=ProgressStyle(description_width=…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "acf570f6b69d439b9119bdae0ad33079", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='build report structure', max=1.0, style=ProgressStyle(des…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], + "outputs": [], "source": [ "# Generate the 
Profiling Report\n", "profile = ProfileReport(df, title=\"Titanic Dataset\", html={'style': {'full_width': True}}, sort=\"None\")" @@ -213,36 +86,9 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d46e593fab414029bdfe2ec82674f2dc", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Tab(children=(Tab(children=(GridBox(children=(VBox(children=(GridspecLayout(children=(HTML(value='Number of va…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Report generated with pandas-profiling." - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "# The Notebook Widgets Interface\n", "profile.to_widgets()" @@ -250,23725 +96,9 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "# Or use the HTML report in an iframe\n", "profile" diff --git a/examples/tutorials/modify_report_structure.ipynb b/examples/tutorials/modify_report_structure.ipynb index 64c89cf8d..98e35a319 100644 --- a/examples/tutorials/modify_report_structure.ipynb +++ b/examples/tutorials/modify_report_structure.ipynb @@ -8,9 +8,51 @@ "In this notebook we have a look at two use cases in which we modify the existing report structure: splitting up large reports and reordering the report sections. Both use cases are based on actual user inquiries. The datasets used in this notebook are obtained using the `kaggle` api. If you haven't done so already, you should set up the [api credentials](https://github.com/Kaggle/kaggle-api#api-credentials)." 
] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The autoreload instruction reloads modules automatically before code execution, which is helpful for the update below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%load_ext autoreload\n", + "%autoreload 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Make sure that we have the latest version of pandas-profiling." + ] + }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import sys\n", + "!{sys.executable} -m pip install -U pandas-profiling[notebook]\n", + "!jupyter nbextension enable --py widgetsnbextension" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You might want to restart the kernel now." + ] + }, + { + "cell_type": "code", + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -34,7 +76,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -57,178 +99,9 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "673acaa3312846d8b9fa9fc3d9bc287b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='variables', max=25.0, style=ProgressStyle(description_wid…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "91c50df6d25b4333b4e098c7758eb3f2", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='correlations', max=6.0, style=ProgressStyle(description_w…" - ] - }, - 
"metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "f01038c422374f69bac07178e5948b86", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='interactions [continuous]', max=36.0, style=ProgressStyle…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "db47e7b4731b414eb470a5fbd3cfce38", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='table', max=1.0, style=ProgressStyle(description_width='i…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "d2a169db30184faa9af8fe91ed82562b", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='missing', max=4.0, style=ProgressStyle(description_width=…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "474e8282604c42f18f409b7d4bc8ecd7", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='warnings', max=3.0, style=ProgressStyle(description_width…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": 
"4dd1ed45d9434906bd3af3db2036f4bf", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='package', max=1.0, style=ProgressStyle(description_width=…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "a8e5c44d80494a15838c7a6470ce65c4", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "HBox(children=(FloatProgress(value=0.0, description='build report structure', max=1.0, style=ProgressStyle(des…" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], + "outputs": [], "source": [ "import pandas as pd\n", "from pandas_profiling import ProfileReport\n", @@ -249,17 +122,9 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Sequence(name=Report)\n" - ] - } - ], + "outputs": [], "source": [ "print(repr(vehicles_report.report))" ] @@ -273,17 +138,9 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'items': [Sequence(name=Overview), Sequence(name=Variables), Sequence(name=Interactions), Collapse, Sequence(name=Missing values), Sequence(name=Sample)], 'name': 'Report'}\n" - ] - } - ], + "outputs": [], "source": [ "print(vehicles_report.report.content)" ] @@ -304,7 +161,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -321,31532 +178,16 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "