[#24465] Fix Usearch / Hnswlib precision discrepancy
Summary:
Fixing some inconsistencies in index parameters that are causing a discrepancy between Usearch and Hnswlib performance:
- Correctly specifying connectivity for hnswlib as num_neighbors_per_vertex instead of max_neighbors_per_vertex.
- Passing the ef option into hnswlib configuration.

Adding internal statistics introspection to Usearch and Hnswlib index wrappers.
PR for hnswlib changes: nmslib/hnswlib#594.
PR for usearch changes: unum-cloud/usearch#508

Also allow specifying multiple values of k to pass in as input, as long as they are not greater than the precomputed ground truth result list size.

Updating hnsw_tool to always convert uint8_t coordinates to float32 when using Hnswlib to have a fair comparison with Usearch on the SIFT1B dataset. Usearch does not currently support the uint8_t type natively.

The changes to src/inline-thirdparty will be pushed as separate commits generated by `build-support/thirdparty_tool --sync-inline-thirdparty`.

Test Plan:
Jenkins

Manual testing using hnsw_tool

- hnswlib: https://gist.githubusercontent.com/mbautin/d21580dcac0b51ad2d7bc9fc130c5f9e/raw
```
Hnswlib index with 5 levels
max_elements: 1000000
M: 16
maxM: 16
maxM0: 32
ef_construction: 128
ef: 10
mult: 0.360674
Level 0: 1000000 nodes, 21613828 edges, 21.61 average edges per node
Level 1: 62323 nodes, 885027 edges, 14.20 average edges per node
Level 2: 3855 nodes, 50515 edges, 13.10 average edges per node
Level 3: 238 nodes, 2543 edges, 10.68 average edges per node
Level 4: 17 nodes, 244 edges, 14.35 average edges per node
Totals: 1066433 nodes, 22552157 edges, 21.15 average edges per node
i-recall @ 50, i=1..10:
1-recall @ 50: 0.9695000052
2-recall @ 50: 0.9645000100
3-recall @ 50: 0.9604333043
4-recall @ 50: 0.9568499923
5-recall @ 50: 0.9541400075
6-recall @ 50: 0.9504333138
7-recall @ 50: 0.9467428327
8-recall @ 50: 0.9435999990
9-recall @ 50: 0.9406333566
10-recall @ 50: 0.9377999902
```
- usearch: https://gist.githubusercontent.com/mbautin/74948b310780562e74831eb29e43cb13/raw
```
Usearch index with 4 levels
connectivity: 16
connectivity_base: 32
expansion_add: 128
expansion_search: 10
inverse_log_connectivity: 0.360674
Level 0: 1000000 nodes, 20973352 edges, 20.97 average edges per node
Level 1: 64036 nodes, 890428 edges, 13.91 average edges per node
Level 2: 5090 nodes, 66295 edges, 13.02 average edges per node
Level 3: 481 nodes, 5304 edges, 11.03 average edges per node
Totals: 1069607 nodes, 21935379 edges, 20.51 average edges per node
i-recall@50, i=1..10:
1-recall @ 40: 0.9305999875
2-recall @ 40: 0.9201999903
3-recall @ 40: 0.9141333103
4-recall @ 40: 0.9085000157
5-recall @ 40: 0.9036399722
6-recall @ 40: 0.8987166882
7-recall @ 40: 0.8932142854
8-recall @ 40: 0.8890249729
9-recall @ 40: 0.8852999806
10-recall @ 40: 0.8813199997
```

Reviewers: sergei, aleksandr.ponomarenko

Reviewed By: sergei, aleksandr.ponomarenko

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D38977
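
Note on the metric: the commit message does not show how hnsw_tool computes "i-recall @ k" or how it validates the requested values of k. The sketch below illustrates one plausible reading, assuming that i-recall @ k is the fraction of the i true nearest neighbors that appear among the top k returned results, averaged over all queries. The function names `validate_k_values` and `i_recall_at_k` are hypothetical and are not taken from the YugabyteDB sources.

```python
from typing import Sequence


def validate_k_values(k_values: Sequence[int], ground_truth_size: int) -> None:
    """Reject any k that exceeds the precomputed ground truth list size.

    Mirrors the constraint stated in the summary; the real check in
    hnsw_tool may differ (assumption).
    """
    for k in k_values:
        if k <= 0 or k > ground_truth_size:
            raise ValueError(
                f"k={k} must be in the range [1, {ground_truth_size}]")


def i_recall_at_k(
    results: Sequence[Sequence[int]],
    ground_truth: Sequence[Sequence[int]],
    i: int,
    k: int,
) -> float:
    """Average fraction of the i true nearest neighbors found in the top k.

    ASSUMPTION: this definition is inferred from the output labels only.
    results[q] holds the vertex ids returned for query q, best first;
    ground_truth[q] holds the exact nearest neighbors for query q, best first.
    """
    if not 1 <= i <= k:
        raise ValueError("expected 1 <= i <= k")
    if len(results) != len(ground_truth) or not results:
        raise ValueError("results and ground_truth must be non-empty and aligned")

    total = 0.0
    for found, truth in zip(results, ground_truth):
        top_k = set(found[:k])
        hits = sum(1 for vertex_id in truth[:i] if vertex_id in top_k)
        total += hits / i
    return total / len(results)
```

Under this reading, the value can only stay level or drop as i grows if the hardest-to-find neighbors are the more distant ones, which matches the monotonically decreasing numbers in both outputs above.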
Showing 17 changed files with 608 additions and 115 deletions.