This library allows to clusterize a dataset of purely categorical data by using as a measure of fitness the Category Utility.
You can read more about this value on wikipedia. In a nutshell, it tries to maximize the similarity between items in the same cluster, and minimize the one between different clusters.
The function clusterize
accepts up to 4 parameters and returns an array of arrays of Point
:
The array of objects with categorical values to be clusterized. It won't be modified.
The desired number of clusters. They all will start with one item each.
You can set the weight for each object key, so that while clustering you'll be able to force the function to rely more on some values than others.
Just like setting the weights for those values to 0, it will not use those object keys while clusterizing.
import { clusterize } from 'category-utility'
const data = [
{ id: '1', size: 'small', color: 'red' },
{ id: '2', size: 'medium', color: 'blue' },
{ id: '3', size: 'large', color: 'red' },
{ id: '4', size: 'medium', color: 'red' },
{ id: '5', size: 'medium', color: 'red' },
{ id: '6', size: 'medium', color: 'blue' },
{ id: '7', size: 'small', color: 'blue' },
]
const weights = { size: 0.5, color: 2 }
const filteredValues = ['id']
const clusters = clusterize(data, 2, weights, filteredValues)
// [ [ { id: '1', size: 'small', color: 'red' },
// { id: '3', size: 'large', color: 'red' },
// { id: '4', size: 'medium', color: 'red' },
// { id: '5', size: 'medium', color: 'red' } ],
// [ { id: '2', size: 'medium', color: 'blue' },
// { id: '6', size: 'medium', color: 'blue' },
// { id: '7', size: 'small', color: 'blue' } ] ]