Commit

Added some explanations
jessegeerts committed Apr 20, 2018
1 parent 7400f80 commit ec199ef
Showing 2 changed files with 186 additions and 47 deletions.
136 changes: 112 additions & 24 deletions .ipynb_checkpoints/python_stats_intro-checkpoint.ipynb
@@ -24,7 +24,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas is a Python package for easy to use data structures and analysis tools. "
"pandas is a Python package that provides easy-to-use data structures and analysis tools. Its main data structure is the DataFrame, which is very similar to R's data.frame and well suited to data exploration."
]
},
{
@@ -33,7 +33,7 @@
"metadata": {},
"outputs": [],
"source": [
"# IQ and brain size data\n",
"# Load a dataset containing participants' IQ and brain size, among other characteristics\n",
"data = pd.read_csv('data/brain_size.csv', sep=';', na_values='.')"
]
},
@@ -43,6 +43,7 @@
"metadata": {},
"outputs": [],
"source": [
"# The head() method lets you inspect the first few rows of your DataFrame.\n",
"data.head()"
]
},
@@ -75,13 +76,6 @@
"females.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
@@ -102,6 +96,8 @@
"metadata": {},
"outputs": [],
"source": [
"# The groupby method lets you compute summary statistics per level of a categorical variable.\n",
"# For example, the mean of all numeric variables grouped by gender:\n",
"data.groupby('Gender').mean()"
]
},
@@ -253,6 +249,13 @@
"stats.ttest_rel(data['FSIQ'], data['PIQ'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Wilcoxon signed-rank test is a close sibling of the dependent samples t-test. The dependent samples t-test analyzes whether the average difference between two repeated measures is zero, and it requires metric (interval or ratio), normally distributed data. The Wilcoxon signed-rank test instead works on ranked or ordinal data, so it is a common alternative to the dependent samples t-test when that test's assumptions are not met."
]
},
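The ranking step that the markdown cell above describes can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the helper name `signed_rank_statistic` and the sample scores are made up, and dropping zero differences is only one common convention. For real analyses, `scipy.stats.wilcoxon` computes the statistic together with a p-value.

```python
def signed_rank_statistic(x, y):
    """Return (W_plus, W_minus) for paired samples x and y."""
    # Paired differences; zero differences are dropped (one common convention).
    diffs = [a - b for a, b in zip(x, y) if a != b]
    # Sort indices by absolute difference so ranks can be assigned in order.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based; ties share the average
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Sum the ranks separately for positive and negative differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Made-up paired scores, for illustration only
before = [10, 12, 9, 14, 11]
after = [12, 11, 12, 14, 15]
w_plus, w_minus = signed_rank_statistic(before, after)
print(w_plus, w_minus)  # 1.0 9.0
```

Only the signs and ranks of the differences enter the statistic, which is why the test does not need normally distributed data.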
{
"cell_type": "code",
"execution_count": null,
Expand Down Expand Up @@ -280,13 +283,6 @@
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Wilcoxon signed-rank test is a close sibling of the dependent samples t-test. The dependent samples t-test analyzes whether the average difference between two repeated measures is zero, and it requires metric (interval or ratio), normally distributed data. The Wilcoxon signed-rank test instead works on ranked or ordinal data, so it is a common alternative to the dependent samples t-test when that test's assumptions are not met."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -419,19 +415,14 @@
"iris_data.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plotting.scatter_matrix can colour the points by category. Build a pandas.Categorical from the\n",
"# 'name' column, then pass its integer codes to the 'c' keyword argument.\n",
"categories = pd.Categorical(iris_data['name'])\n",
"categories"
]
@@ -442,6 +433,7 @@
"metadata": {},
"outputs": [],
"source": [
"# That way, we can plot our variables in separate colours for the different flower types\n",
"plotting.scatter_matrix(iris_data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']], c=categories.codes)\n",
"plt.show()"
]
@@ -452,6 +444,7 @@
"metadata": {},
"outputs": [],
"source": [
"# statsmodels lets you define a multiple regression model with an R-style formula like this:\n",
"model = ols('sepal_width ~ name + petal_length + sepal_length', iris_data).fit()"
]
},
@@ -488,6 +481,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Testing for interactions is as simple as using the * operator in the model formula.\n",
"# 'a * b' expands to 'a + b + a:b', so the model estimates both main effects and their interaction.\n",
"model = ols('sepal_width ~ name + petal_length * petal_width', iris_data).fit()"
]
},
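How the `*` operator in an R-style formula expands into main effects and interaction terms can be sketched without any statistics library. The helper `expand_star` below is hypothetical and exists only for illustration; it is not part of statsmodels or patsy, which perform this expansion internally when parsing the formula.

```python
from itertools import combinations


def expand_star(factors):
    """List the terms implied by 'f1 * f2 * ...' in an R-style formula."""
    terms = []
    # Order 1 gives the main effects; higher orders give the interactions.
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


terms = expand_star(["petal_length", "petal_width"])
print(" + ".join(terms))
# petal_length + petal_width + petal_length:petal_width
```

To fit the interaction alone, without the main effects, the formula would use `petal_length:petal_width` instead of `*`.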
@@ -1173,13 +1168,57 @@
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Converting variables from R to Python"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Convert rpy2 objects to native Python types\n",
"# Author: Charly\n",
"\n",
"from collections import OrderedDict\n",
"\n",
"import numpy as np\n",
"from pandas import DataFrame as PdDF\n",
"from rpy2.robjects.vectors import Matrix, Array, DataFrame, FloatVector, IntVector, StrVector, ListVector\n",
"\n",
"known_r_types = Matrix, Array, DataFrame, FloatVector, IntVector, StrVector, ListVector\n",
"\n",
"# Maps each native Python target to the rpy2 types that convert into it\n",
"python_to_r_types = {\n",
"    'list': (StrVector, ),\n",
"    'dict': (ListVector, ),\n",
"    'np_array': (FloatVector, IntVector, Array, Matrix),\n",
"    'pandas_df': (DataFrame, )\n",
"}\n",
"\n",
"\n",
"def is_r_type(obj):\n",
"    # Assumed helper: the original cell calls is_r_type() without defining it\n",
"    return isinstance(obj, known_r_types)\n",
"\n",
"\n",
"def recursive_r_to_py(data):\n",
"    \"\"\"\n",
"    The recursive function to convert from rpy2 objects to native python\n",
"    \"\"\"\n",
"    dtype = type(data)\n",
"    if dtype in python_to_r_types['dict']:\n",
"        return OrderedDict(zip(data.names, [recursive_r_to_py(d) for d in data]))\n",
"    elif dtype in python_to_r_types['list']:\n",
"        return [recursive_r_to_py(d) for d in data]\n",
"    elif dtype in python_to_r_types['np_array']:\n",
"        array = np.array(data)\n",
"        if array.size == 1:\n",
"            return array[0]\n",
"        else:\n",
"            return array\n",
"    elif dtype in python_to_r_types['pandas_df']:\n",
"        return PdDF(data)\n",
"    else:\n",
"        if is_r_type(data):  # An unknown r class\n",
"            raise NotImplementedError('Could not proceed, type {} is not defined. '\n",
"                                      'Recognised types are: {}'.format(dtype, known_r_types))\n",
"        else:\n",
"            return data  # We reached the end of recursion"
]
},
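The recursive dispatch used in the cell above can be tried without an R installation. The sketch below mirrors the same pattern on plain Python objects; `NamedList`, `Vector` and `to_native` are hypothetical stand-ins written for this illustration and are not rpy2 classes or functions.

```python
from collections import OrderedDict


class NamedList:
    """Hypothetical stand-in for an R named list (the role ListVector plays)."""

    def __init__(self, names, values):
        self.names = names
        self._values = values

    def __iter__(self):
        return iter(self._values)


class Vector:
    """Hypothetical stand-in for an R atomic vector."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)


def to_native(obj):
    """Recursively convert the stand-in containers to native Python values."""
    if isinstance(obj, NamedList):
        # Named lists become ordered dicts, converting each element in turn.
        return OrderedDict((name, to_native(v)) for name, v in zip(obj.names, obj))
    if isinstance(obj, Vector):
        values = [to_native(v) for v in obj]
        # Length-one vectors collapse to a scalar, as in the cell above.
        return values[0] if len(values) == 1 else values
    return obj  # Already native: end of recursion


nested = NamedList(
    ["n", "scores", "fit"],
    [Vector([3]), Vector([1.5, 2.5, 4.0]), NamedList(["slope"], [Vector([0.7])])],
)
result = to_native(nested)
print(result["fit"]["slope"])  # 0.7
```

The base case matters most here: anything that is not a recognised container is returned unchanged, which is what stops the recursion.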
{
@@ -1189,6 +1228,55 @@
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,