
Commit

update
gboeing committed Apr 3, 2024
1 parent 9685ffc commit 67d4f32
Showing 1 changed file with 19 additions and 8 deletions.
27 changes: 19 additions & 8 deletions modules/12-unsupervised-learning/lecture.ipynb
@@ -81,7 +81,7 @@
"source": [
"## 1. Linear discriminant analysis\n",
"\n",
"Dimensionality reduction lets us reduce the number of features (variables) in our data set with minimal loss of information. This data compression is called **feature extraction**. Feature extraction is similar to feature selection in that they both reduce the total number of variables in your analysis. In feature selection, we use domain theory or an algorithm to select important variables for our model. Feature extraction instead projects your features onto a lower-dimension space, creating new features rather than just selecting a subset of existing ones.\n",
"Dimensionality reduction lets us reduce the number of features (variables) in our data set with minimal loss of information. This data compression is called **feature extraction**. Feature extraction is similar to feature selection in that they both reduce the total number of variables in your analysis. In feature selection, we use domain theory or an algorithm to select important variables for our model. Feature extraction instead projects your features onto a lower-dimension space, creating wholly new features rather than just selecting a subset of existing ones.\n",
"\n",
"LDA is *supervised* dimensionality reduction, providing a link between supervised learning and dimensionality reduction. It uses a categorical response and continuous features to identify features that account for the most variance between classes (ie, maximum separability). It can be used as a classifier, similar to what we saw last week, or it can be used for dimensionality reduction by projecting the features in the most discriminative directions.\n",
"\n",
@@ -157,7 +157,7 @@
"metadata": {},
"outputs": [],
"source": [
"# reduce data from n dimensions to 2\n",
"# reduce data from original n dimensions to 2\n",
"lda = LinearDiscriminantAnalysis(n_components=2)\n",
"X_reduced = lda.fit_transform(X, y)\n",
"X_reduced.shape"
@@ -170,6 +170,7 @@
"metadata": {},
"outputs": [],
"source": [
"# scatter plot the 2 new dimensions\n",
"fig, ax = plt.subplots(figsize=(6, 6))\n",
"for county_name in data[\"county_name\"].unique():\n",
" mask = y == county_name\n",
@@ -234,9 +235,17 @@
"\n",
"PCA is used 1) to fix multicollinearity problems and 2) for dimensionality reduction. In the former, it converts a set of original, correlated features into a new set of orthogonal features, which is useful in regression and cluster analysis. In the latter, it summarizes a set of original, correlated features with a smaller number of features that still explain most of the variance in your data (data compression).\n",
"\n",
"PCA identifies the combinations of features (directions in feature space) that account for the most variance in the dataset. These orthogonal axes of maximum variance are called principal components. A **principal component** is an eigenvector (direction of maximum variance) of the features' covariance matrix, and the corresponding eigenvalue is its magnitude (factor by which it is \"stretched\"). An eigenvector is the cosine of the angle between a feature and a component. Its corresponding eigenvalue represents the share of variance it accounts for. PCA takes your (standardized) features' covariance matrix, decomposes it into its eigenvectors/eigenvalues, sorts them by eigenvalue magnitude, constructs a projection matrix $W_k$ from the corresponding top $k$ eigenvectors, then transforms the features using the projection matrix to get the new $k$-dimensional feature subspace. Always standardize your data before PCA because it is sensitive to features' scale.\n",
"PCA identifies the combinations of features (directions in feature space) that account for the most variance in the dataset. These orthogonal axes of maximum variance are called principal components. A **principal component** is an eigenvector (direction of maximum variance) of the features' covariance matrix, and the corresponding eigenvalue is its magnitude (factor by which it is \"stretched\"). An eigenvector is the cosine of the angle between a feature and a component. Its corresponding eigenvalue represents the share of variance it accounts for. Always standardize your data before PCA because it is sensitive to features' scale.\n",
"\n",
"We will reduce our feature set to fewer dimensions."
"How does PCA work? It...\n",
"\n",
"- calculates your (standardized) features' covariance matrix\n",
"- decomposes it into its eigenvectors/eigenvalues\n",
"- sorts them by eigenvalue magnitude\n",
"- constructs a projection matrix $W_k$ from the corresponding top $k$ eigenvectors\n",
"- transforms the features using the projection matrix to get the new $k$-dimensional feature subspace\n",
"\n",
"Let's practice reducing our feature set to fewer dimensions with PCA."
]
},
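For reference, the steps listed above can be sketched directly with NumPy before turning to scikit-learn. This is a minimal illustration rather than the lecture's own code, and it assumes a standardized feature matrix named `X_std`:

```python
import numpy as np

# minimal PCA-by-hand sketch (assumes X_std is an (n_samples, n_features)
# standardized feature matrix; the name X_std is illustrative)
cov = np.cov(X_std, rowvar=False)        # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues/eigenvectors of the symmetric matrix
order = np.argsort(eigvals)[::-1]        # sort by eigenvalue magnitude, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
W_k = eigvecs[:, :k]                     # projection matrix from the top-k eigenvectors
X_new = X_std @ W_k                      # features projected onto the k-dimensional subspace
explained_share = eigvals[:k] / eigvals.sum()  # share of total variance each component explains
```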
{
@@ -400,7 +409,9 @@
"id": "improving-thanksgiving",
"metadata": {},
"source": [
"We often refer to these projected data as \"principal component scores\" or a \"score matrix\", $T_k$, where $T_k = XW_k$ and $X$ is your original feature matrix and $W_k$ is the projection matrix, that is, a matrix containing the first $k$ principal components (ie, the $k$ eigenvectors with the largest corresponding eigenvalues). In our case, $k=2$. We can calculate this manually:"
"We often refer to these projected data as \"principal component scores\" or a \"score matrix\", $T_k$, where $T_k = XW_k$ and $X$ is your original feature matrix and $W_k$ is the projection matrix, that is, a matrix containing the first $k$ principal components (ie, the $k$ eigenvectors with the largest corresponding eigenvalues). In our case, $k=2$.\n",
"\n",
"We can calculate this manually:"
]
},
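A hedged sketch of that manual calculation, assuming (as in the cells above) that `X` is the standardized feature matrix and that the fitted estimator is named `pca`; scikit-learn stores the eigenvectors as the rows of `pca.components_`:

```python
# score matrix T_k = X W_k computed by hand from the fitted PCA
# (assumes X is the standardized, mean-centered feature matrix used to fit pca)
W_k = pca.components_.T   # (n_features, k) projection matrix of the top-k eigenvectors
T_k = X @ W_k             # (n_samples, k) principal component scores
# this should match pca.transform(X) up to floating-point error
```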
{
Expand Down Expand Up @@ -565,7 +576,7 @@
"outputs": [],
"source": [
"# cluster the data\n",
"km = KMeans(n_clusters=5).fit(X_reduced)"
"km = KMeans(n_clusters=5, n_init=\"auto\").fit(X_reduced)"
]
},
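After fitting, the clusterer's standard scikit-learn attributes can be inspected. A small usage sketch, assuming only the `km` name defined above:

```python
# inspect the fitted k-means model
print(km.labels_[:10])        # cluster assignment of the first 10 observations
print(km.cluster_centers_)    # centroid coordinates in the reduced feature space
print(km.inertia_)            # within-cluster sum of squared distances ("distortion")
```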
{
Expand Down Expand Up @@ -646,14 +657,14 @@
"metadata": {},
"outputs": [],
"source": [
"# create an elbow plot\n",
"# create an elbow plot: distortion vs cluster count\n",
"fig, ax = plt.subplots()\n",
"ax.set_xlabel(\"Number of clusters\")\n",
"ax.set_ylabel(\"Distortion\")\n",
"kvals = range(1, 15)\n",
"distortions = []\n",
"for k in kvals:\n",
" km = KMeans(n_clusters=k).fit(X_reduced)\n",
" km = KMeans(n_clusters=k, n_init=\"auto\").fit(X_reduced)\n",
" distortions.append(km.inertia_)\n",
"ax.plot(kvals, distortions, marker=\"o\")\n",
"_ = ax.grid(True)"
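Once a value of k is chosen from the elbow in the plot above, a natural follow-up is to refit at that k and attach the labels to the observations. A hedged sketch, assuming k=5 as used earlier and that the rows of `data` align with `X_reduced`:

```python
# refit at the chosen k and attach cluster labels to the original rows
km = KMeans(n_clusters=5, n_init="auto").fit(X_reduced)
data["cluster"] = km.labels_
print(data["cluster"].value_counts())  # how many observations fall in each cluster
```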
