This repository was archived by the owner on Jul 16, 2021. It is now read-only.

Initialize GMM parameters with k-means #150

Open
andrewcsmith opened this issue Oct 8, 2016 · 3 comments

Comments

@andrewcsmith
Contributor

The general practice seems to be to use GMM as a refinement of k-means. The GMM initializer should therefore use k-means to set the initial parameters, with GMM training then fine-tuning them.

@AtheMathmo
Owner

Yes, this is indeed standard. I had thought about implementing this before, but I'm not sure exactly how it should look. Certainly I'd like GMM to remain usable without the K-Means initialization. I see two approaches:

  • Add a new initialization trait/enum to GMM, similar to the one K-Means has
  • Allow a GMM model to be created from_init_clusters.

I think that it might even be worth implementing both of these - with the second suggestion coming first. The second would let us use K-Means to initialize GMM.

To give a little more information on the K-Means initialization: the idea (to my knowledge) is to use K-Means to determine the locations of the clusters (their means) and the data points that belong to each. We then compute the covariance matrix within each cluster (similar to what we do now for the whole data set). This gives a sensible initial mixture of Gaussians over the whole data set, which can then be fine-tuned.
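As a rough sketch of that step (not the rusty-machine API): given the data and the hard cluster labels that K-Means produced, the initial mixture weights, means and covariances could be computed as below. The names `Component` and `init_from_assignments` are hypothetical, data is held in plain `Vec`s rather than the library's matrix type, and the covariance is the maximum-likelihood estimate (divided by the cluster size).

```rust
/// Hypothetical container for one mixture component's initial parameters.
#[derive(Debug, Clone)]
struct Component {
    weight: f64,        // fraction of samples assigned to this cluster
    mean: Vec<f64>,     // cluster centroid
    cov: Vec<Vec<f64>>, // within-cluster covariance (dim x dim)
}

/// Estimate initial GMM parameters from hard assignments.
/// `data` holds one row per sample; `labels[i]` is the cluster index of row `i`.
fn init_from_assignments(data: &[Vec<f64>], labels: &[usize], k: usize) -> Vec<Component> {
    assert_eq!(data.len(), labels.len());
    assert!(!data.is_empty());
    let dim = data[0].len();
    let n = data.len() as f64;

    // Per-cluster sample counts and coordinate sums.
    let mut counts = vec![0usize; k];
    let mut means = vec![vec![0.0; dim]; k];
    for (row, &c) in data.iter().zip(labels) {
        counts[c] += 1;
        for (m, x) in means[c].iter_mut().zip(row) {
            *m += x;
        }
    }
    // Turn sums into means; an empty cluster keeps a zero mean.
    for (mean, &count) in means.iter_mut().zip(&counts) {
        if count > 0 {
            for m in mean.iter_mut() {
                *m /= count as f64;
            }
        }
    }

    // Accumulate outer products of deviations from the cluster mean.
    let mut covs = vec![vec![vec![0.0; dim]; dim]; k];
    for (row, &c) in data.iter().zip(labels) {
        for i in 0..dim {
            for j in 0..dim {
                covs[c][i][j] += (row[i] - means[c][i]) * (row[j] - means[c][j]);
            }
        }
    }

    let mut components = Vec::with_capacity(k);
    for c in 0..k {
        let mut cov = covs[c].clone();
        if counts[c] > 0 {
            for row in cov.iter_mut() {
                for v in row.iter_mut() {
                    *v /= counts[c] as f64;
                }
            }
        }
        components.push(Component {
            weight: counts[c] as f64 / n,
            mean: means[c].clone(),
            cov,
        });
    }
    components
}
```

Small or degenerate clusters can yield a singular covariance here, so a real implementation would probably need some regularization before handing these parameters to the GMM.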


References

Simple Methods for Initializing the EM Algorithm of GMMs
Small (but enlightening) SO question thread

@andrewcsmith
Contributor Author

My opinion for the API here is basically the same as it is in #153. We should create a ClusterInitializer trait that has whatever methods we need, and then RandomInitializer and KMeansInitializer structs to do the proper calculations accordingly.
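To make the proposal concrete, here is one possible shape for it. This is only a sketch: the single `init_means` method, the `seed` and `iters` fields, and the use of plain `Vec`s instead of the library's matrix type are all assumptions, and the actual trait in #153/#155 may look quite different. The random initializer uses a small hand-rolled generator only so the example needs no external crates.

```rust
/// Hypothetical trait: produces `k` initial cluster means from the data.
trait ClusterInitializer {
    fn init_means(&self, data: &[Vec<f64>], k: usize) -> Vec<Vec<f64>>;
}

/// Picks `k` distinct samples as the initial means.
struct RandomInitializer {
    seed: u64,
}

impl ClusterInitializer for RandomInitializer {
    fn init_means(&self, data: &[Vec<f64>], k: usize) -> Vec<Vec<f64>> {
        assert!(k <= data.len());
        let mut state = self.seed;
        let mut indices: Vec<usize> = (0..data.len()).collect();
        // Partial Fisher-Yates shuffle: only the first k positions are needed.
        for i in 0..k {
            // Simple linear congruential step (illustrative, not high quality).
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            let j = i + ((state >> 33) as usize) % (data.len() - i);
            indices.swap(i, j);
        }
        indices[..k].iter().map(|&i| data[i].clone()).collect()
    }
}

/// Starts from the first `k` samples and refines them with a fixed
/// number of Lloyd (k-means) iterations.
struct KMeansInitializer {
    iters: usize,
}

fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl ClusterInitializer for KMeansInitializer {
    fn init_means(&self, data: &[Vec<f64>], k: usize) -> Vec<Vec<f64>> {
        assert!(k >= 1 && k <= data.len());
        let dim = data[0].len();
        let mut means: Vec<Vec<f64>> = data[..k].to_vec();
        for _ in 0..self.iters {
            let mut sums = vec![vec![0.0; dim]; k];
            let mut counts = vec![0usize; k];
            for row in data {
                // Assignment step: find the nearest current mean.
                let mut best = 0;
                for c in 1..k {
                    if sq_dist(row, &means[c]) < sq_dist(row, &means[best]) {
                        best = c;
                    }
                }
                counts[best] += 1;
                for (s, x) in sums[best].iter_mut().zip(row) {
                    *s += x;
                }
            }
            // Update step: move each non-empty cluster's mean to its centroid.
            for c in 0..k {
                if counts[c] > 0 {
                    means[c] = sums[c].iter().map(|s| s / counts[c] as f64).collect();
                }
            }
        }
        means
    }
}
```

With a trait like this, a GMM could hold a `Box<dyn ClusterInitializer>` (or be generic over the initializer), so the K-Means path stays optional.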

@andrewcsmith
Contributor Author

Same here: this issue is also fixed by #155.
