sparse RangeSearch/AnnIterator to return raw distance #944

Merged (2 commits) on Nov 14, 2024

Conversation

zhengbuqian
Collaborator

Makes the distance returned by sparse RangeSearch and AnnIterator the raw distance instead of the quantized distance.

/kind improvement

Collaborator

@alexanderguzhva alexanderguzhva left a comment

The idea is clear (return raw distances when a threshold is used to remove small values), but please consider altering the names a bit to avoid confusion while reading. I'd change quantization to some other name.

@@ -502,6 +515,11 @@ class InvertedIndex : public BaseInvertedIndex<T> {
        return n_cols_internal();
    }

    [[nodiscard]] virtual bool
    HasQuantization() const override {
Collaborator

According to what I understand, this is not Quantization, this is Pruning or maybe Whitening. Correct me if I'm wrong, but this drop_during_build_ indicates whether a threshold is used for removing small values.

Collaborator Author

Changed to IsApproximated. I wanted the name to express whether the index is exact kNN or ANN, so I avoided a name that reflects how the approximation is done.
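
For readers following the thread, here is a minimal sketch of the idea under discussion. It is not the knowhere implementation: `ToyInvertedIndex`, `SparseRow`, `RawIP`, and the way the threshold is applied are all hypothetical, and the choice to filter candidates on the pruned score before reporting the raw inner product is an assumption made for illustration. It shows an index that drops small values at build time, exposes an `IsApproximated()` flag, and reports the raw distance from range search when that flag is set.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not knowhere's real classes.
using SparseRow = std::vector<std::pair<uint32_t, float>>;  // (dimension, value)

class ToyInvertedIndex {
 public:
    explicit ToyInvertedIndex(float drop_threshold) : drop_threshold_(drop_threshold) {
        assert(drop_threshold_ >= 0.0f);
    }

    void Add(const SparseRow& row) {
        const uint32_t id = static_cast<uint32_t>(raw_rows_.size());
        raw_rows_.push_back(row);  // keep the unpruned row so raw distances can be computed later
        for (const auto& [dim, val] : row) {
            if (std::fabs(val) < drop_threshold_) {
                continue;  // small value dropped during build (pruning)
            }
            postings_[dim].emplace_back(id, val);
        }
    }

    // True when values may have been dropped, so index scores can differ from raw ones.
    bool IsApproximated() const { return drop_threshold_ > 0.0f; }

    // Returns (id, inner product) for rows whose index score exceeds `radius`.
    // Assumption: candidates are filtered on the pruned score, then the raw
    // inner product is reported if the index is approximated.
    std::vector<std::pair<uint32_t, float>> RangeSearch(const SparseRow& query, float radius) const {
        std::map<uint32_t, float> approx;  // id -> score from pruned postings, ordered by id
        for (const auto& [dim, qval] : query) {
            auto it = postings_.find(dim);
            if (it == postings_.end()) {
                continue;
            }
            for (const auto& [id, val] : it->second) {
                approx[id] += qval * val;
            }
        }
        std::vector<std::pair<uint32_t, float>> out;
        for (const auto& [id, score] : approx) {
            if (score <= radius) {
                continue;
            }
            out.emplace_back(id, IsApproximated() ? RawIP(query, raw_rows_[id]) : score);
        }
        return out;
    }

 private:
    // Exact inner product over the unpruned rows (quadratic; fine for a toy).
    static float RawIP(const SparseRow& a, const SparseRow& b) {
        float sum = 0.0f;
        for (const auto& [da, va] : a) {
            for (const auto& [db, vb] : b) {
                if (da == db) {
                    sum += va * vb;
                }
            }
        }
        return sum;
    }

    float drop_threshold_;
    std::vector<SparseRow> raw_rows_;
    std::unordered_map<uint32_t, std::vector<std::pair<uint32_t, float>>> postings_;
};
```

In this sketch a row whose values all fall below the threshold never becomes a candidate, which is why a flag named for "approximated or not" describes the behavior without committing to pruning, quantization, or any other specific mechanism.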

@zhengbuqian zhengbuqian changed the title RangeSearch/AnnIterator to return raw distance sparse RangeSearch/AnnIterator to return raw distance Nov 14, 2024
@buqian-zilliz buqian-zilliz force-pushed the sparse-refined-iterator branch from c53e0d0 to 67dbe9b Compare November 14, 2024 03:01
…be the raw instead of the quantized distance

Signed-off-by: Buqian Zheng <[email protected]>
@buqian-zilliz buqian-zilliz force-pushed the sparse-refined-iterator branch from 037653d to b2e3baa Compare November 14, 2024 04:39
Collaborator

@foxspy foxspy left a comment

/lgtm

@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: foxspy, zhengbuqian


codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.22%. Comparing base (3c46f4c) to head (b2e3baa).
Report is 248 commits behind head on main.

@@            Coverage Diff            @@
##           main     #944       +/-   ##
=========================================
+ Coverage      0   74.22%   +74.22%     
=========================================
  Files         0       82       +82     
  Lines         0     6615     +6615     
=========================================
+ Hits          0     4910     +4910     
- Misses        0     1705     +1705     

see 82 files with indirect coverage changes

@mergify mergify bot added the ci-passed label Nov 14, 2024
@sre-ci-robot sre-ci-robot merged commit f8ab12c into zilliztech:main Nov 14, 2024
12 of 14 checks passed
@zhengbuqian zhengbuqian deleted the sparse-refined-iterator branch November 14, 2024 05:55
foxspy pushed a commit to foxspy/knowhere that referenced this pull request Nov 18, 2024
* sparse: make the distance returned by RangeSearch and AnnIterator to be the raw instead of the quantized distance

Signed-off-by: Buqian Zheng <[email protected]>

* sparse: remove mutex in the index: we now use CC index if concurrent read/write is needed

Signed-off-by: Buqian Zheng <[email protected]>

---------

Signed-off-by: Buqian Zheng <[email protected]>
foxspy pushed a commit to foxspy/knowhere that referenced this pull request Nov 18, 2024
* sparse: make the distance returned by RangeSearch and AnnIterator to be the raw instead of the quantized distance

Signed-off-by: Buqian Zheng <[email protected]>

* sparse: remove mutex in the index: we now use CC index if concurrent read/write is needed

Signed-off-by: Buqian Zheng <[email protected]>

---------

Signed-off-by: Buqian Zheng <[email protected]>
Signed-off-by: xianliang.li <[email protected]>
sre-ci-robot pushed a commit that referenced this pull request Nov 18, 2024
* update raft to 24.10 (#914)

Signed-off-by: yusheng.ma <[email protected]>
Signed-off-by: xianliang.li <[email protected]>

* fix Index parameters handling and anniterator (#913)

Signed-off-by: Alexandr Guzhva <[email protected]>
Signed-off-by: xianliang.li <[email protected]>

* add range check (#915)

Signed-off-by: xianliang.li <[email protected]>

* fix knowhere ut (#918)

Signed-off-by: xianliang.li <[email protected]>

* raft index supports cosine similarity by normalizing the input data. (#924)

Signed-off-by: yusheng.ma <[email protected]>
Signed-off-by: xianliang.li <[email protected]>

* compensate for the missing acceleration functions in ARM NEON. (#922)

Signed-off-by: yusheng.ma <[email protected]>
Signed-off-by: xianliang.li <[email protected]>

* improve sparse vector index mmap: to mmap almost everything (#928)

Signed-off-by: Buqian Zheng <[email protected]>
Signed-off-by: xianliang.li <[email protected]>

* move sparse index Add to build pool (#933)

Add SparseInvertedIndexNodeCC to allow being thread safe growing index

Signed-off-by: Buqian Zheng <[email protected]>
Signed-off-by: xianliang.li <[email protected]>

* sparse mmap on disk (#935)

Signed-off-by: Buqian Zheng <[email protected]>
Signed-off-by: xianliang.li <[email protected]>

* use MAP_PRIVATE for mmapped file (#938)

Signed-off-by: Buqian Zheng <[email protected]>
Signed-off-by: xianliang.li <[email protected]>

* sparse RangeSearch/AnnIterator to return raw distance (#944)

* sparse: make the distance returned by RangeSearch and AnnIterator to be the raw instead of the quantized distance

Signed-off-by: Buqian Zheng <[email protected]>

* sparse: remove mutex in the index: we now use CC index if concurrent read/write is needed

Signed-off-by: Buqian Zheng <[email protected]>

---------

Signed-off-by: Buqian Zheng <[email protected]>
Signed-off-by: xianliang.li <[email protected]>

* Add optimized distance functions for PowerPC (#894)

Added the PowerPC vector functions in src/simd/distances_powerpc.cc,
src/simd/distances_powerpc.h.  The hooks to the PowerPC functions are
added in src/simd/hook.cc.

Signed-off-by: Carl Love <[email protected]>
Co-authored-by: Carl Love <[email protected]>
Signed-off-by: xianliang.li <[email protected]>

* enhance: optimize get norms function (#950)

Signed-off-by: cqy123456 <[email protected]>
Signed-off-by: xianliang.li <[email protected]>

---------

Signed-off-by: yusheng.ma <[email protected]>
Signed-off-by: xianliang.li <[email protected]>
Signed-off-by: Alexandr Guzhva <[email protected]>
Signed-off-by: Buqian Zheng <[email protected]>
Signed-off-by: Carl Love <[email protected]>
Signed-off-by: cqy123456 <[email protected]>
Co-authored-by: presburger <[email protected]>
Co-authored-by: Alexander Guzhva <[email protected]>
Co-authored-by: Buqian Zheng <[email protected]>
Co-authored-by: carll99 <[email protected]>
Co-authored-by: Carl Love <[email protected]>
Co-authored-by: cqy123456 <[email protected]>