Applied-Data-Science-with-Python #1

Open
wants to merge 51 commits into
base: master
51 commits
ecdd653
add slides
MaxPoon Mar 12, 2017
ff0b189
make directories
MaxPoon Mar 12, 2017
508992d
add assignment
MaxPoon Mar 12, 2017
0d9d64a
add slides
MaxPoon Mar 12, 2017
0c3fb36
add demo
MaxPoon Mar 14, 2017
faed386
add assignment solution
MaxPoon Mar 15, 2017
34768b8
add demo
MaxPoon Mar 16, 2017
aa42428
add practice assignment
MaxPoon Mar 17, 2017
7b552b0
add assignment code
MaxPoon Mar 17, 2017
5f6c2ea
add demo
MaxPoon Mar 18, 2017
d18f2f4
add slide
MaxPoon Mar 18, 2017
988935f
add assignment codes
MaxPoon Mar 18, 2017
6d1fc03
add certificate
MaxPoon Mar 19, 2017
14cb176
add ipython module
MaxPoon Jun 11, 2017
cf560c9
run the module
MaxPoon Jun 12, 2017
d593888
finish assignment
MaxPoon Jun 12, 2017
b923b74
add modules
MaxPoon Jun 13, 2017
4d19311
finish assignment
MaxPoon Jun 13, 2017
99e1c8d
add module
MaxPoon Jun 14, 2017
5c01415
finish assignment
MaxPoon Jun 14, 2017
0ca6e6c
add module
MaxPoon Jun 15, 2017
47075b2
add module
MaxPoon Jun 15, 2017
bf917e0
finish assignment
MaxPoon Jun 15, 2017
53b81da
add certificate
MaxPoon Jun 15, 2017
6f505c8
add week1 practice file
MaxPoon Oct 20, 2017
15dea51
add practice
MaxPoon Oct 20, 2017
9d4fc0f
borrow solutions...
MaxPoon Oct 20, 2017
1767e8c
add ipython file
MaxPoon Oct 21, 2017
b866b0c
finish assignment2
MaxPoon Oct 21, 2017
b540b68
add practice file
MaxPoon Oct 21, 2017
10a3b91
finish assignment3
MaxPoon Oct 21, 2017
dace701
add slides
MaxPoon Oct 21, 2017
ebeb783
finish assignment4
MaxPoon Oct 21, 2017
ea5edd7
add certificate
MaxPoon Oct 21, 2017
da24a96
add slides
MaxPoon Oct 24, 2017
d4e4df5
add practice file
MaxPoon Oct 25, 2017
4523f37
add solution to assignment1
MaxPoon Oct 25, 2017
7f45d1a
add slides
MaxPoon Oct 25, 2017
627018d
add practice file
MaxPoon Oct 25, 2017
02e08b3
add solutions to assignment2
MaxPoon Oct 25, 2017
a6a6969
add slides
MaxPoon Oct 25, 2017
204c397
add solution to assignment3
MaxPoon Oct 25, 2017
281ee03
add slides
MaxPoon Oct 26, 2017
ff01784
add practice file
MaxPoon Oct 26, 2017
fa618f8
add solution to assignment4
MaxPoon Oct 26, 2017
a8cc150
add certificate
MaxPoon Oct 26, 2017
68fb869
add certificate
MaxPoon Oct 26, 2017
d202fd1
Delete jupyter_notebooks.txt
MaxPoon Oct 27, 2017
8a4eb0e
Update Assignment2.ipynb
Vipul115 Nov 23, 2017
4c968b4
Merge pull request #4 from Vipul115/patch-1
MaxPoon Nov 24, 2017
c34860e
update solution
MaxPoon Dec 19, 2017
Binary file not shown.
5,254 changes: 5,254 additions & 0 deletions Applied-Machine-Learning-In-Python/week1/Assignment1.ipynb

Large diffs are not rendered by default.

4,444 changes: 4,444 additions & 0 deletions Applied-Machine-Learning-In-Python/week1/Module1.ipynb

Large diffs are not rendered by default.

2,212 changes: 2,212 additions & 0 deletions Applied-Machine-Learning-In-Python/week2/Assignment2.ipynb

Large diffs are not rendered by default.

9,027 changes: 9,027 additions & 0 deletions Applied-Machine-Learning-In-Python/week2/ClassifierVisualization.ipynb

Large diffs are not rendered by default.

21,408 changes: 21,408 additions & 0 deletions Applied-Machine-Learning-In-Python/week2/Module2.ipynb

Large diffs are not rendered by default.

2,821 changes: 2,821 additions & 0 deletions Applied-Machine-Learning-In-Python/week3/Assignment3.ipynb

Large diffs are not rendered by default.

8,374 changes: 8,374 additions & 0 deletions Applied-Machine-Learning-In-Python/week3/Module3.ipynb

Large diffs are not rendered by default.

2,017 changes: 2,017 additions & 0 deletions Applied-Machine-Learning-In-Python/week4/Assignment4.ipynb

Large diffs are not rendered by default.

13,386 changes: 13,386 additions & 0 deletions Applied-Machine-Learning-In-Python/week4/Module4.ipynb

Large diffs are not rendered by default.

10,831 changes: 10,831 additions & 0 deletions Applied-Machine-Learning-In-Python/week4/UnsupervisedLearning.ipynb

Large diffs are not rendered by default.

Binary file not shown.
Binary file not shown.
Binary file not shown.

Large diffs are not rendered by default.

@@ -0,0 +1,121 @@

# coding: utf-8

# # Assignment 2
#
# Before working on this assignment please read these instructions fully. In the submission area, you will notice that you can click the link to **Preview the Grading** for each step of the assignment. These are the criteria that will be used for peer grading. Please familiarize yourself with the criteria before beginning the assignment.
#
# An NOAA dataset has been stored in the file `data/C2A2_data/BinnedCsvs_d100/4e86d2106d0566c6ad9843d882e72791333b08be3d647dcae4f4b110.csv`. The data for this assignment comes from a subset of The National Centers for Environmental Information (NCEI) [Daily Global Historical Climatology Network](https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt) (GHCN-Daily). The GHCN-Daily is comprised of daily climate records from thousands of land surface stations across the globe.
#
# Each row in the assignment datafile corresponds to a single observation.
#
# The following variables are provided to you:
#
# * **id** : station identification code
# * **date** : date in YYYY-MM-DD format (e.g. 2012-01-24 = January 24, 2012)
# * **element** : indicator of element type
#     * TMAX : Maximum temperature (tenths of degrees C)
#     * TMIN : Minimum temperature (tenths of degrees C)
# * **value** : data value for element (tenths of degrees C)
#
# For this assignment, you must:
#
# 1. Read the documentation and familiarize yourself with the dataset, then write some python code which returns a line graph of the record high and record low temperatures by day of the year over the period 2005-2014. The area between the record high and record low temperatures for each day should be shaded.
# 2. Overlay a scatter of the 2015 data for any points (highs and lows) for which the ten-year (2005-2014) record high or record low was broken in 2015.
# 3. Watch out for leap days (i.e. February 29th); it is reasonable to remove these points from the dataset for the purpose of this visualization.
# 4. Make the visual nice! Leverage principles from the first module in this course when developing your solution. Consider issues such as legends, labels, and chart junk.
#
# The data you have been given is near **None, None, Singapore**, and the stations the data comes from are shown on the map below.

# In[1]:

import matplotlib.pyplot as plt
import mplleaflet
import pandas as pd

def leaflet_plot_stations(binsize, hashid):

    df = pd.read_csv('data/C2A2_data/BinSize_d{}.csv'.format(binsize))

    station_locations_by_hash = df[df['hash'] == hashid]

    lons = station_locations_by_hash['LONGITUDE'].tolist()
    lats = station_locations_by_hash['LATITUDE'].tolist()

    plt.figure(figsize=(8,8))

    plt.scatter(lons, lats, c='r', alpha=0.7, s=200)

    return mplleaflet.display()

leaflet_plot_stations(100,'4e86d2106d0566c6ad9843d882e72791333b08be3d647dcae4f4b110')


# In[2]:

df = pd.read_csv('data/C2A2_data/BinnedCsvs_d100/4e86d2106d0566c6ad9843d882e72791333b08be3d647dcae4f4b110.csv')


# In[3]:

df.sort_values(['ID','Date']).head()  # DataFrame.sort was removed from pandas; sort_values replaces it


# In[4]:

df['Year'], df['Month-Date'] = zip(*df['Date'].apply(lambda x: (x[:4], x[5:])))
df = df[df['Month-Date'] != '02-29']
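
Review note: the string slicing in cell 4 works because `Date` is in `YYYY-MM-DD` form, as the assignment text states. A minimal alternative sketch, assuming that same format, parses the column as real dates first. The `sample` frame and its values are made up for illustration, and note that `Year` comes out as an integer here rather than the string used in the later cells.

```python
import pandas as pd

# Made-up rows for illustration only; the real data comes from the assignment CSV.
sample = pd.DataFrame({
    'Date': ['2008-02-28', '2008-02-29', '2015-03-01'],
    'Element': ['TMAX', 'TMAX', 'TMIN'],
    'Data_Value': [310, 305, 240],
})

# Parse once, then derive the year and the month-day key from the parsed dates.
dates = pd.to_datetime(sample['Date'], format='%Y-%m-%d')
sample['Year'] = dates.dt.year                      # integer year, not a string
sample['Month-Date'] = dates.dt.strftime('%m-%d')   # e.g. '02-28'

# Drop leap days so every year contributes the same 365 day-of-year keys.
is_leap_day = (dates.dt.month == 2) & (dates.dt.day == 29)
sample = sample[~is_leap_day].reset_index(drop=True)
```

If this variant were adopted, the later filters would need to compare against `2015` rather than `'2015'`.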


# In[5]:

import numpy as np
temp_min = df[(df['Element'] == 'TMIN') & (df['Year'] != '2015')].groupby('Month-Date').aggregate({'Data_Value':np.min})
temp_max = df[(df['Element'] == 'TMAX') & (df['Year'] != '2015')].groupby('Month-Date').aggregate({'Data_Value':np.max})


# In[6]:

temp_min.head()


# In[7]:

temp_min_15 = df[(df['Element'] == 'TMIN') & (df['Year'] == '2015')].groupby('Month-Date').aggregate({'Data_Value':np.min})
temp_max_15 = df[(df['Element'] == 'TMAX') & (df['Year'] == '2015')].groupby('Month-Date').aggregate({'Data_Value':np.max})


# In[8]:

broken_min = np.where(temp_min_15['Data_Value'] < temp_min['Data_Value'])[0]
broken_max = np.where(temp_max_15['Data_Value'] > temp_max['Data_Value'])[0]
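
Review note: the `np.where` comparison in cell 8 depends on the 2015 frames and the 2005-2014 frames having identical `Month-Date` indexes, since pandas refuses to compare Series whose labels differ. A hedged sketch of a label-aligned variant follows; the series names and values are hypothetical and stand in for the `Data_Value` columns above.

```python
import pandas as pd

# Hypothetical record lows for three days and 2015 lows for only two of them.
record_low = pd.Series([200, 210, 205], index=['01-01', '01-02', '01-03'])
low_2015 = pd.Series([215, 195], index=['01-02', '01-03'])  # '01-01' missing in 2015

# Reindex the 2015 series onto the record index. Missing days become NaN,
# and a comparison against NaN is False, so those days are never flagged.
aligned = low_2015.reindex(record_low.index)
broken_low = aligned[aligned < record_low]
```

The result keeps the `Month-Date` labels, so the broken records can be located on the x-axis by label instead of by integer position.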


# In[9]:

broken_max, broken_min


# In[10]:

temp_min_15.head()


# In[11]:

plt.figure()
plt.plot(temp_min.values, 'b', label = 'record low')
plt.plot(temp_max.values, 'r', label = 'record high')
plt.scatter(broken_min, temp_min_15.iloc[broken_min], s = 10, c = 'g', label = 'broken low')
plt.scatter(broken_max, temp_max_15.iloc[broken_max], s = 10, c = 'm', label = 'broken high')
plt.gca().axis([-5, 370, -150, 650])
plt.xticks(range(0, len(temp_min), 20), temp_min.index[range(0, len(temp_min), 20)], rotation = 45)  # numeric rotation; newer matplotlib rejects the string '45'
plt.xlabel('Day of the Year')
plt.ylabel('Temperature (Tenths of Degrees C)')
plt.title('Temperature Summary Plot near Singapore')
plt.legend(loc = 4, frameon = False)
plt.gca().fill_between(range(len(temp_min)), temp_min['Data_Value'], temp_max['Data_Value'], facecolor = 'yellow', alpha = 0.5)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.show()
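
Review note: the y-axis is labelled in tenths of degrees C because that is how the data file stores values. One possible refinement, sketched below with a hypothetical helper name and made-up values, is to convert to whole degrees before plotting so the axis reads more naturally.

```python
import pandas as pd

def tenths_to_celsius(values):
    """Convert temperatures stored as tenths of degrees C to degrees C."""
    return values / 10.0

# Made-up record highs in tenths of degrees C, keyed by month-day.
record_high = pd.Series([312, 305], index=['01-01', '01-02'])
record_high_c = tenths_to_celsius(record_high)
```

With converted values, the axis limits passed to `axis([...])` and the y-axis label would need the same scaling.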
