Hi! Thanks a lot for this package. I am interested in how best to choose values of the hyperparameters. There are five of them that seem particularly relevant:
d: the number of hash functions, used to initialize the LSH forest data structure, by default 128.
l: the number of prefix trees, used to initialize the LSH forest data structure, by default 8.
k: the number of nearest neighbors used to create the k-nearest neighbor graph, by default 10.
$k_c$: the scalar by which k is multiplied before querying the LSH forest, by default 10.
p: the size of the nodes, which affects the magnitude of their repelling force, by default 1/65.
The first two parameters are from tmap.LSHForest and their default values are defined here. The remaining parameters are from tmap.layout_from_lsh_forest and their default values are defined here.
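To make the question concrete, here is a minimal sketch of the kind of sweep over p I have in mind. It uses only the Python standard library. The `TmapHyperparams` container and the `node_size_sweep` helper are hypothetical names of my own, not part of tmap, and the candidate values for p are arbitrary placeholders rather than recommendations. The defaults are the ones listed above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TmapHyperparams:
    """Hypothetical container for the five hyperparameters listed above."""
    d: int = 128       # number of hash functions (LSH forest)
    l: int = 8         # number of prefix trees (LSH forest)
    k: int = 10        # nearest neighbors in the k-nearest neighbor graph
    kc: int = 10       # scalar by which k is multiplied before querying
    p: float = 1 / 65  # node size, which affects the repelling force


def node_size_sweep(base, denominators):
    """Return one configuration per candidate node size p = 1/denominator."""
    return [replace(base, p=1 / den) for den in denominators]


defaults = TmapHyperparams()
# Placeholder candidates around the default 1/65 (an assumption, not advice).
candidates = node_size_sweep(defaults, [20, 40, 65, 100, 200])

for c in candidates:
    print(f"d={c.d} l={c.l} k={c.k} kc={c.kc} p=1/{round(1 / c.p)}")
```

Each candidate would then be used to build the LSH forest and compute the layout. How the fields map onto the actual arguments of tmap is deliberately left out of the sketch.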
From the supplement (https://ndownloader.figstatic.com/files/21710592) it seems that p is particularly important (cf. figures S1, S2, S3, and S7). I often see tmap visualizations that are too sparse: some branches are very long, while others (e.g., the ones with the leaves) are very short. The paper and its analysis of the hyperparameters are already 4 years old. Has anyone who has used this tool extensively and experimented with these hyperparameters developed rules of thumb for tuning them, especially p, for example as a function of the number of data points and perhaps also of the approximate number of suspected clusters?