
Hyperparameters optimisation #65

Open
wiktorolszowy opened this issue Oct 2, 2024 · 0 comments

wiktorolszowy commented Oct 2, 2024

Hi! Thanks a lot for this package. I am interested in how best to choose values of the hyperparameters. There are five of them that seem particularly relevant:

  1. d: the number of hash functions used to initialize the LSH forest data structure; 128 by default.
  2. l: the number of prefix trees used to initialize the LSH forest data structure; 8 by default.
  3. k: the number of nearest neighbors used to create the k-nearest-neighbor graph; 10 by default.
  4. k_c: the scalar by which k is multiplied before querying the LSH forest; 10 by default.
  5. p: the size of the nodes, which affects the magnitude of their repelling force; 1/65 by default.

The first two parameters are from tmap.LSHForest and their default values are defined here. The remaining parameters are from tmap.layout_from_lsh_forest and their default values are defined here.

From the supplement (https://ndownloader.figstatic.com/files/21710592), p seems particularly important (cf. figures S1+S2+S3+S7). I often see tmap visualizations that are too sparse; in particular, some branches are very long while others are very short (e.g., at the leaves). The paper and its hyperparameter analysis are already 4 years old. I am wondering whether someone who has used this tool extensively and has experimented with these hyperparameters has developed rules of thumb for optimizing them, especially p, for example depending on the number of data points, and perhaps also on the approximate number of suspected clusters.
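For concreteness, here is a minimal sketch of what a grid search over these five hyperparameters might look like. The grid values below are illustrative guesses, not recommendations (the `10 / n_points` entry for p just encodes the intuition that larger datasets may need a smaller per-node repelling force), and the tmap calls are shown only as comments based on the package's documented names:

```python
# Hedged sketch: enumerating candidate hyperparameter settings for tmap.
# Parameter names follow the list above: d, l, k, k_c (here "kc"), p ("node_size").
from itertools import product

def candidate_configs(n_points):
    """Yield dicts of hyperparameter settings around the tmap defaults.

    The grids are illustrative, not tuned recommendations.
    """
    d_grid = [64, 128, 256]           # number of hash functions (default 128)
    l_grid = [8, 32]                  # number of prefix trees (default 8)
    k_grid = [10, 50]                 # k-nearest neighbors (default 10)
    kc_grid = [10]                    # query multiplier for k (default 10)
    p_grid = [1 / 65, 10 / n_points]  # node size / repelling force (default 1/65)
    for d, l, k, kc, p in product(d_grid, l_grid, k_grid, kc_grid, p_grid):
        yield {"d": d, "l": l, "k": k, "kc": kc, "node_size": p}

# Each candidate would then be applied roughly as (assuming tmap's API):
#   lsh = tmap.LSHForest(cfg["d"], cfg["l"])
#   ...batch_add MinHash vectors..., then lsh.index()
#   layout_cfg = tmap.LayoutConfiguration()
#   layout_cfg.k = cfg["k"]; layout_cfg.kc = cfg["kc"]
#   layout_cfg.node_size = cfg["node_size"]
#   x, y, s, t, _ = tmap.layout_from_lsh_forest(lsh, layout_cfg)
```

The open question is what objective to score each layout against; any pointers on that, or on narrowing these grids, would be appreciated.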
