Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 6fd99d0, commit 0941e46
Showing 119 changed files with 852,702 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
|
||
# C extensions | ||
*.so | ||
|
||
# Distribution / packaging | ||
.Python | ||
env/ | ||
build/ | ||
develop-eggs/ | ||
dist/ | ||
downloads/ | ||
eggs/ | ||
.eggs/ | ||
lib/ | ||
lib64/ | ||
parts/ | ||
sdist/ | ||
var/ | ||
*.egg-info/ | ||
.installed.cfg | ||
*.egg | ||
|
||
# PyInstaller | ||
# Usually these files are written by a python script from a template | ||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
*.manifest | ||
*.spec | ||
|
||
# Installer logs | ||
pip-log.txt | ||
pip-delete-this-directory.txt | ||
|
||
# Unit test / coverage reports | ||
htmlcov/ | ||
.tox/ | ||
.coverage | ||
.coverage.* | ||
.cache | ||
nosetests.xml | ||
coverage.xml | ||
*,cover | ||
|
||
# Translations | ||
*.mo | ||
*.pot | ||
|
||
# Django stuff: | ||
*.log | ||
|
||
# Sphinx documentation | ||
docs/_build/ | ||
|
||
# PyBuilder | ||
target/ | ||
|
||
*~ | ||
*/.*~ | ||
.*/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
## Data Wrangling with Python | ||
|
||
Welcome to the code repository for [Data Wrangling with Python](http://shop.oreilly.com/product/0636920032861.do)! We hope you find the code and data here useful. If you have any questions, reach out to @kjam or @JackieKazil on Twitter or GitHub. | |
|
||
### Code Structure | ||
|
||
We've kept all of the code samples in folders separated by chapter and the data in a similar fashion. You'll likely want to 'undo' this work. It will be far more useful for you to have the data all in one folder so you can easily import and use it. Remember, storing data in a repository is usually not a good idea; we've done it here so you can replicate the work you see in the book. | |
|
||
### Code Examples | ||
|
||
We have not included every code sample you've found in the book, but we have included a majority of the finished scripts. Although these are included, we encourage you to write out each code sample on your own and use these only as a reference. | ||
|
||
We've also included some of the data investigation and IPython exploration used to first determine what to explore with the book. If you have any questions about the code you see in the book or the exploration conclusions, please reach out. Most of the exploration was performed using an older version of some of the libraries, so it might not work without modification. | ||
|
||
#### Scraping Data Folders | ||
|
||
In case the web pages the book uses change significantly, we've included copies of the web pages as they are now. You can use them and tell the scraping library to access them via a File URI (normally `file://file_name.html`). | ||
|
||
* Note: this has already occurred with the Fairphone page. Please see `data/chp11/fairphone.html` to see the old page as it was in the book. | |
|
||
|
||
### Firefox Issues | ||
|
||
Depending on your version of Firefox and Selenium, you may run into JavaScript errors. Here are some fixes: | ||
* Use an older version of Firefox | ||
* Upgrade Selenium to >=3.0.2 and download the [geckodriver](https://github.com/mozilla/geckodriver/releases). Make sure geckodriver can be found on your PATH. You can do this by adding this line to your `.bashrc` or `.bash_profile`. (Wondering what these are? Please read Appendix C on learning the command line.) | |
* Use [PhantomJS](http://phantomjs.org/) with Selenium (change your browser line to `webdriver.PhantomJS('path/to/your/phantomjs/installation')`) | ||
* Use Chrome, Internet Explorer, or any other [supported browser](http://www.seleniumhq.org/about/platforms.jsp) | |
|
||
Feel free to reach out if you have any questions! | ||
|
||
### Corrections? | ||
|
||
If you find any issues in these code examples, feel free to submit an Issue or Pull Request. We appreciate your input! | ||
|
||
### Questions? | ||
|
||
Reach out to @kjam and @JackieKazil on Twitter or GitHub. @kjam is also often on freenode. :) |
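
The README's note on reading saved pages through a File URI can be sketched with the standard library alone. This is a minimal Python 3 illustration, not the book's code: `read_local_page` and `saved_page.html` are hypothetical names, and a temporary file stands in for a saved copy such as `data/chp11/fairphone.html`. Note that a file URI normally carries an absolute path, so the sketch resolves the path before converting it.

```python
import tempfile
import urllib.request
from pathlib import Path


def read_local_page(path):
    """Return (file URI, page text) for a saved HTML page on disk."""
    # as_uri() only accepts absolute paths, so resolve the path first.
    file_uri = Path(path).resolve().as_uri()
    with urllib.request.urlopen(file_uri) as response:
        return file_uri, response.read().decode("utf-8")


# Hypothetical stand-in for a saved copy of a web page.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_page = Path(tmp_dir) / "saved_page.html"
    saved_page.write_text("<html><title>Saved copy</title></html>",
                          encoding="utf-8")
    uri, html = read_local_page(saved_page)

print(uri)
print(html)
```

The same URI string can be handed to a scraping library that accepts URLs in place of the live address, assuming the library supports the `file://` scheme.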
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
import json | ||
|
||
# NOTE: you will need the 'ranked' table we first created in Chapter 9. | ||
|
||
country_codes = json.loads(open('../../data/chp10/iso-2-cleaned.json', 'rb').read()) | ||
country_dict = {} | ||
|
||
for c in country_codes: | |
    country_dict[c.get('name')] = c.get('alpha-2') | |
|
||
def get_country_code(row): | |
    return country_dict.get(row['Countries and areas']) | |
|
||
ranked = ranked.compute([(agate.Formula(text_type, get_country_code), | |
                          'country_code')]) | |
|
||
for r in ranked.where(lambda x: x.get('country_code') is None).rows: | |
    print(r['Countries and areas']) |
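
The lookup-and-report pattern in the script above can be tried without agate or the Chapter 9 table. The following is a minimal Python 3 sketch under stated assumptions: the JSON string and the `rows` list are made-up sample data that only mimic the shape of `iso-2-cleaned.json` and the `ranked` table, and `coded` and `unmatched` are hypothetical names.

```python
import json

# Made-up sample in the shape the script reads: a list of objects
# with 'name' and 'alpha-2' keys.
country_codes = json.loads(
    '[{"name": "Kenya", "alpha-2": "KE"}, {"name": "Ghana", "alpha-2": "GH"}]'
)

# Hypothetical rows standing in for the 'ranked' table.
rows = [
    {"Countries and areas": "Kenya"},
    {"Countries and areas": "Ghana"},
    {"Countries and areas": "Bolivia (Plurinational State of)"},
]

# Map each country name to its two-letter code.
country_dict = {c["name"]: c["alpha-2"] for c in country_codes}


def get_country_code(row):
    # dict.get returns None when the name has no exact match.
    return country_dict.get(row["Countries and areas"])


# Add a 'country_code' field to every row, then list the misses.
coded = [dict(row, country_code=get_country_code(row)) for row in rows]
unmatched = [r["Countries and areas"] for r in coded
             if r["country_code"] is None]

print(unmatched)
```

Names that differ between the two sources, even slightly, end up in `unmatched`, which is what the final loop of the script prints for manual cleanup.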
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
from bokeh.plotting import figure, show, output_file | ||
|
||
# NOTE: You'll need to have 'africa_cpi_cl' table from Chapter 9 to use this | ||
# code. | ||
|
||
|
||
def scatter_point(chart, x, y, marker_type): | |
    chart.scatter(x, y, marker=marker_type, line_color="#6666ee", | |
                  fill_color="#ee6666", fill_alpha=0.7, size=10) | |
|
||
chart = figure(title="Perceived Corruption and Child Labor in Africa") | ||
output_file("scatter_plot.html") | ||
|
||
for row in africa_cpi_cl.rows: | |
    scatter_point(chart, float(row['CPI 2013 Score']), | |
                  float(row['Total (%)']), 'circle') | |
|
||
show(chart) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
from bokeh.plotting import ColumnDataSource, figure, show, output_file | ||
from bokeh.models import HoverTool | ||
|
||
# NOTE: For this chart, you will also need the 'africa_cpi_cl' table from | ||
# Chapter 9. | ||
|
||
|
||
TOOLS = "pan,reset,hover" | ||
|
||
|
||
def scatter_point(chart, x, y, source, marker_type): | |
    chart.scatter(x, y, source=source, marker=marker_type, | |
                  line_color="#6666ee", fill_color="#ee6666", | |
                  fill_alpha=0.7, size=10) | |
|
||
chart = figure(title="Perceived Corruption and Child Labor in Africa", | |
               tools=TOOLS) | |
|
||
output_file("scatter_int_plot.html") | ||
|
||
for row in africa_cpi_cl.rows: | |
    column_source = ColumnDataSource( | |
        data={'country': [row['Country / Territory']]}) | |
    scatter_point(chart, float(row['CPI 2013 Score']), | |
                  float(row['Total (%)']), column_source, 'circle') | |
|
||
hover = chart.select(dict(type=HoverTool)) | |
hover.tooltips = [ | |
    ("Country", "@country"), | |
    ("CPI Score", "$x"), | |
    ("Child Labor (%)", "$y"), | |
] | |
|
||
show(chart) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
import matplotlib.pyplot as plt | ||
|
||
# NOTE: You'll need to have the 'africa_cpi_cl' table and 'highest_cpi_cl' | ||
# table we worked on in Chapter 9. | ||
|
||
plt.plot(africa_cpi_cl.columns['CPI 2013 Score'], | |
         africa_cpi_cl.columns['Total (%)']) | |
plt.xlabel('CPI Score - 2013') | ||
plt.ylabel('Child Labor Percentage') | ||
plt.title('CPI & Child Labor Correlation') | ||
plt.show() | ||
|
||
|
||
plt.plot(highest_cpi_cl.columns['CPI 2013 Score'], | |
         highest_cpi_cl.columns['Total (%)']) | |
plt.xlabel('CPI Score - 2013') | ||
plt.ylabel('Child Labor Percentage') | ||
plt.title('CPI & Child Labor Correlation') | ||
plt.show() |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
import pygal | ||
|
||
# NOTE: you'll need the 'ranked' table from Chp 9 with the ISO codes added | ||
# (see: add_iso_data.py) | ||
|
||
worldmap_chart = pygal.Worldmap() | ||
worldmap_chart.title = 'Child Labor Worldwide' | ||
|
||
cl_dict = {} | ||
for r in ranked.rows: | |
    cl_dict[r.get('country_code_complete').lower()] = r.get('Total (%)') | |
|
||
worldmap_chart.add('Total Child Labor (%)', cl_dict) | ||
worldmap_chart.render() |
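
The script above lowercases each country code before handing the dictionary to the chart, which fails with an `AttributeError` if any row still has no code. A defensive variant of that loop might look like the sketch below. It is plain Python 3 with made-up rows and values standing in for the `ranked` table; `skipped` is a hypothetical name, and the chart calls are left out so the block runs without pygal.

```python
# Hypothetical rows standing in for the 'ranked' table; the
# percentages are invented for illustration only.
rows = [
    {"country_code_complete": "KE", "Total (%)": 25.9},
    {"country_code_complete": "GH", "Total (%)": 33.9},
    {"country_code_complete": None, "Total (%)": 10.0},
]

cl_dict = {}
skipped = []

for r in rows:
    code = r.get("country_code_complete")
    if code is None:
        # No ISO code yet: keep the row aside instead of calling
        # .lower() on None.
        skipped.append(r)
        continue
    cl_dict[code.lower()] = r.get("Total (%)")

print(cl_dict)
print(len(skipped))
```

The resulting `cl_dict` has the same shape as the one the script passes to `worldmap_chart.add`, and `skipped` shows which rows still need a code.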