GitHub - 0xmikelee/Same-Size-K-Means-Clustering: This is a variation of the k-means clustering that produces equally sized clusters.

Same Size Clustering

This is a variation of the k-means clustering that produces equally sized clusters.

The algorithm consists of two phases:

#1. Initialization:

Compute the desired cluster size: n/k

Initialize means with k-means

Order points by the biggest benefit (delta distance) of best over worst assignment

Assign points to their prefered cluster until the cluster is full, then resort remaining
objects by excluding the clusters that are full.

#2. Refinement of clustering: This is done in an iterative fashion, until there is no change in clustering or the max number of iterations have been reached.

Interation:

    Compute current cluster means.

    For each object, compute the distance to the cluster means.

    Sort elements based on the delta of the current assignment and the alternative best 
    possible assignment, sort in descending order.

    For each element by priority:

        For each cluster to whom it doesn't belong, by element gain:

            if there is an element on the transfer list of the cluster, and swapping the 
            two element yields improvement, swap the two elements;

        End For

        If the element is not swapped, add to outgoing transfer list

    End For

    If no transfers were done, or max iteration was reached, terminate.

Since any transfer must decrease variance, thus the clustering will converge.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
sscluster.py		sscluster.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Same Size Clustering

About

Releases

Packages

Languages

0xmikelee/Same-Size-K-Means-Clustering

Folders and files

Latest commit

History

Repository files navigation

Same Size Clustering

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages