An implementation and visualization of the k-means and k-means++ clustering algorithms.

ajcheon/kmeans-exploration

kmeans-exploration

An implementation and visualization of the k-means and k-means++ clustering algorithms.

Clusters

The problem of clustering comes up often in machine learning, where we want to assign a category or group to each point in a data set so that similar points share a group. If we imagine the x-y plane with points scattered around, it might be easy to visually group these points into a number of clusters based on proximity: points that are close to each other are assigned to the same cluster. Similarly, in 3D space, points (or in this case, vectors) that are close to each other should belong together.

While our eyes can cluster points in 2D or even 3D space with relative ease, what happens when we have a large number of points, say, a few million, or if the "points" we are working with are really p-dimensional vectors? We need an algorithm that can efficiently cluster points for us, regardless of scale or dimension.

The k-means algorithm is one solution to the clustering problem: given n p-dimensional vectors, assign each vector to one of k clusters, choosing the cluster whose centroid is nearest in Euclidean distance. Each cluster has a centroid, which is simply the mean of the vectors assigned to that cluster.
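As a rough illustration of the assign-then-average loop described above, here is a minimal pure-Python sketch. It is not the code from this repository's notebook: the function names (`nearest`, `mean_vector`, `kmeans`), the `max_iters` parameter, and the choice to seed the centroids with the first k points are all assumptions made for the example. In practice, random or k-means++ seeding would normally replace the deterministic seeding used here.

```python
import math

def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def mean_vector(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def kmeans(points, k, max_iters=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Assumption for illustration: seed centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids will too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_vector(members)
    return labels, centroids

labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [[0.0, 0.5], [10.0, 10.5]]
```

Because nothing in the sketch depends on the length of each vector, the same code works for p-dimensional points without modification.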

See the notebook for a deep dive into the k-means clustering algorithm.
