diff --git a/NEWS.md b/NEWS.md
index f0a3308f..a7ed1fae 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,6 +1,6 @@
 # uwot 0.0.0.9007
 
-## New features
+## New features (December 9 2018)
 
 * New parameter `pca`: set this to a positive integer to reduce matrix of
 data frames to that number of columns using PCA. Only works if
@@ -8,6 +8,12 @@ data frames to that number of columns using PCA. Only works if
 improve the speed of the nearest neighbor search. t-SNE implementations often
 set this value to 50.
 
+## Bug fixes and minor improvements
+
+* Laplacian Eigenmap initialization convergence failure is now correctly
+detected.
+* C++ code was over-writing data passed from R as a function argument.
+
 # uwot 0.0.0.9006 (December 5 2018)
 
 ## New features
diff --git a/README.md b/README.md
index 4f243932..356e7e9f 100644
--- a/README.md
+++ b/README.md
@@ -11,6 +11,13 @@ the basic method. Translated from the
 
 ## News
 
+*December 9 2018*. Added a `pca` argument that will reduce `X` to the specified
+number of dimensions (e.g. 50, commonly used in t-SNE routines). This should
+give a big speed up to the nearest neighbor search if you are using Euclidean
+distance metric and you have lots of features (where lots might be as little as
+100-1000), for instance
+[COIL-100](http://www.cs.columbia.edu/CAVE/software/softlib/coil-100.php).
+
 *December 5 2018*. Some deeply experimental mixed data type features are now
 available: you can now mix different metrics (e.g. euclidean for some columns
 and cosine for others). The type of data that can be used with `y`
@@ -89,6 +96,10 @@ iris_umap <- umap(iris, metric = list("euclidean" = c("Sepal.Length", "Sepal.Wid
 iris_umap <- umap(iris, metric = list("euclidean" = c("Sepal.Length", "Sepal.Width"),
                                       "euclidean" = c("Petal.Length", "Petal.Width"),
                                       "categorical" = "Species"))
+
+# MNIST with PCA reduction to 50 dimensions can speed up calculation without
+# affecting results much
+mnist_umap <- umap(mnist, pca = 50)
 ```
 
 ## Documentation