kmeans-exploration

An implementation and visualization of the k-means and k-means++ clustering algorithms.

The problem of clustering comes up often in machine learning, where we want to assign a category or group to data points that are similar. If we imagine the x-y plane with points scattered around, it might be easy to visually group these points into a number of clusters based on proximity: points that are close to each other are assigned to the same cluster. Similarly, in 3D space, points (or in this case, vectors) that are close to each other should belong together.

While our eyes can cluster points in 2D or even 3D space with relative ease, what happens when we have a large number of points, say, a few million, or if the "points" we are working with are really p-dimensional vectors? We need an algorithm that can efficiently cluster points for us, regardless of scale or dimension.

The k-means algorithm is one solution to the clustering problem: Given n p-dimensional vectors, assign each vector to one of k clusters based on Euclidean distance. Each cluster has a centroid, which is simply the mean vector of that cluster.
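As a rough illustration of that description, here is a minimal pure-Python sketch of the usual alternating procedure: assign each vector to its nearest centroid, then recompute each centroid as the mean of its cluster. This is not the notebook's code. The function name `kmeans`, its parameters, the uniform random choice of starting centroids, and the handling of empty clusters are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster points (equal-length tuples of floats) into k groups.

    Returns (centroids, labels), where labels[i] is the cluster index
    assigned to points[i].
    """
    rng = random.Random(seed)
    # Assumption: start from k input points chosen uniformly at random.
    centroids = rng.sample(points, k)
    labels = None

    for _ in range(max_iters):
        # Assignment step: each point joins the cluster whose centroid
        # is nearest in Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so centroids are stable
        labels = new_labels

        # Update step: move each centroid to the mean vector of its cluster.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels = kmeans(data, k=2)
    print(centroids, labels)
```

The result depends on the starting centroids, which is the part k-means++ changes: it replaces the uniform random choice with a seeding step that favors spread-out starting points, while the assignment and update steps stay the same.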

See the notebook (kmeans.ipynb) for a deep dive into the k-means clustering algorithm.

ajcheon/kmeans-exploration

Folders and files

Latest commit

History

Repository files navigation

kmeans-exploration

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages