Set norm to 1.0 for all-0 vectors #803

Merged

sre-ci-robot merged 1 commit into zilliztech:main from cydrain:caiyd_dont_norm_all_zero_vectors_sol2

Sep 3, 2024

Collaborator

cydrain commented Aug 30, 2024

Issue: milvus-io/milvus#35594

sre-ci-robot requested review from chasingegg and hhy3

August 30, 2024 08:45

Collaborator

sre-ci-robot commented Aug 30, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cydrain

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [cydrain]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sre-ci-robot added the approved label

Collaborator Author

cydrain commented Aug 30, 2024

/kind improvement
/hold

sre-ci-robot added size/M kind/improvement do-not-merge/hold labels

mergify bot added the dco-passed label

cydrain mentioned this pull request

Set norm to 1.0 for all-0 vectors #799

Closed

codecov bot commented Aug 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.66%. Comparing base (3c46f4c) to head (e328437).
Report is 169 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##           main     #803       +/-   ##
=========================================
+ Coverage      0   61.66%   +61.66%     
=========================================
  Files         0       84       +84     
  Lines         0     6149     +6149     
=========================================
+ Hits          0     3792     +3792     
- Misses        0     2357     +2357

see 84 files with indirect coverage changes

Collaborator

alexanderguzhva commented Aug 30, 2024

/hold

alexanderguzhva requested changes

thirdparty/faiss/faiss/IVFlib.cpp Outdated

@@ -508,6 +508,7 @@ void ivf_residual_add_from_flat_codes(
                               // ok
                               index->rq.decode(tmp_code.data(), tmp.data(), 1);
                               float norm = fvec_norm_L2sqr(tmp.data(), rq.d);
+                              norm = (norm == 0.0 ? 1.0 : norm);

Collaborator

alexanderguzhva Aug 30, 2024

No, because this norm is used in the distance = x^2 - 2xy + y^2 calculation, not in cosine-related code.
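
The distinction drawn in this comment can be illustrated with a minimal sketch. This is not the faiss code: `norm_l2sqr`, `inner_product`, and `l2sqr_expanded` are hypothetical helpers written only for this example. When the squared L2 distance is computed from the expansion ||x||^2 - 2<x, y> + ||y||^2, replacing a zero norm with 1.0 shifts every distance that involves an all-zero vector:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Squared L2 norm of a vector (hypothetical scalar stand-in, not faiss's fvec_norm_L2sqr).
float norm_l2sqr(const std::vector<float>& x) {
    float s = 0.0f;
    for (float v : x) {
        s += v * v;
    }
    return s;
}

// Plain inner product <x, y>; both vectors must have the same length.
float inner_product(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    float s = 0.0f;
    for (std::size_t i = 0; i < x.size(); i++) {
        s += x[i] * y[i];
    }
    return s;
}

// Squared L2 distance via the expansion ||x||^2 - 2<x, y> + ||y||^2.
// When override_zero_norm is true, a zero query norm is replaced by 1.0,
// mimicking the change the reviewer rejected for the L2 code paths.
float l2sqr_expanded(const std::vector<float>& x,
                     const std::vector<float>& y,
                     bool override_zero_norm) {
    float x_norm = norm_l2sqr(x);
    if (override_zero_norm) {
        x_norm = (x_norm == 0.0f ? 1.0f : x_norm);
    }
    return x_norm - 2.0f * inner_product(x, y) + norm_l2sqr(y);
}
```

With x = (0, 0) and y = (3, 4) the true squared distance is 25, but with the override the expansion yields 26, so the substitution is only safe where the norm is used as a divisor.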

thirdparty/faiss/faiss/IndexAdditiveQuantizer.cpp Outdated

@@ -93,6 +93,7 @@ struct AQDistanceComputerLUT : FlatCodesDistanceComputer {
                           bias = 0;
                       } else {
                           bias = fvec_norm_L2sqr(x, d);
+                          bias = (bias == 0.0 ? 1.0 : bias);

Collaborator

alexanderguzhva Aug 30, 2024

No, because this norm is used in the distance = x^2 - 2xy + y^2 calculation, not in cosine-related code.

thirdparty/faiss/faiss/IndexAdditiveQuantizer.cpp Outdated

@@ -174,6 +175,7 @@ void search_with_LUT(
                       if (!is_IP) { // the LUT function returns ||y||^2 - 2 * <x, y>, need to
                                     // add ||x||^2
                           bias = fvec_norm_L2sqr(xq + q * d, d);
+                          bias = (bias == 0.0 ? 1.0 : bias);

Collaborator

alexanderguzhva Aug 30, 2024

No, because this norm is used in the distance = x^2 - 2xy + y^2 calculation, not in cosine-related code.

thirdparty/faiss/faiss/IndexFlat.cpp Outdated

@@ -354,6 +354,7 @@ struct FlatL2WithNormsDis : FlatCodesDistanceComputer {
                   void set_query(const float* x) override {
                       q = x;
                       query_l2norm = fvec_norm_L2sqr(q, d);
+                      query_l2norm = (query_l2norm == 0.0 ? 1.0 : query_l2norm);

Collaborator

alexanderguzhva Aug 30, 2024

No, because this norm is used in the distance = x^2 - 2xy + y^2 calculation, not in cosine-related code.

thirdparty/faiss/faiss/IndexIVFAdditiveQuantizer.cpp Outdated

@@ -212,6 +212,7 @@ struct AQInvertedListScannerLUT : AQInvertedListScanner {
                       AQInvertedListScanner::set_query(query_vector);
                       if (!is_IP && !ia.by_residual) {
                           distance_bias = fvec_norm_L2sqr(query_vector, ia.d);
+                          distance_bias = (distance_bias == 0.0 ? 1.0 : distance_bias);

Collaborator

alexanderguzhva Aug 30, 2024

No, because this norm is used in the distance = x^2 - 2xy + y^2 calculation, not in cosine-related code.

thirdparty/faiss/faiss/impl/AdditiveQuantizer.cpp Outdated

               #pragma omp for
                       for (int64_t i = 0; i < ntotal; i++) {
                           decode_64bit(i, tmp.data());
-                          norms[i] = fvec_norm_L2sqr(tmp.data(), d);
+                          float norm = fvec_norm_L2sqr(tmp.data(), d);
+                          norms[i] = (norm == 0.0 ? 1.0 : norm);

Collaborator

alexanderguzhva Aug 30, 2024

No, because this norm is used in the distance = x^2 - 2xy + y^2 calculation, not in cosine-related code.

thirdparty/faiss/faiss/utils/distances.cpp Outdated

@@ -65,7 +65,9 @@ void fvec_norms_L2(
                       size_t nx) {
               #pragma omp parallel for if (nx > 10000)
                   for (int64_t i = 0; i < nx; i++) {
-                      nr[i] = sqrtf(fvec_norm_L2sqr(x + i * d, d));
+                      auto norm = fvec_norm_L2sqr(x + i * d, d);

Collaborator

alexanderguzhva Aug 30, 2024

No, because this norm might be used in the distance = x^2 - 2xy + y^2 calculation, not in cosine-related code.

thirdparty/faiss/faiss/utils/distances.cpp Outdated

-                      nr[i] = fvec_norm_L2sqr(x + i * d, d);
+                  for (int64_t i = 0; i < nx; i++) {
+                      float norm = fvec_norm_L2sqr(x + i * d, d);
+                      nr[i] = (norm == 0.0 ? 1.0 : norm);

Collaborator

alexanderguzhva Aug 30, 2024

No, because this norm might be used in the distance = x^2 - 2xy + y^2 calculation, not in cosine-related code.

thirdparty/faiss/faiss/utils/distances.cpp

                                   (y_norms != nullptr) ?
                                       y_norms[j] :
                                       sqrtf(fvec_norm_L2sqr(y + j * d, d));
+                              norm = (norm == 0.0 ? 1.0 : norm);

Collaborator

alexanderguzhva Aug 30, 2024

yes

thirdparty/faiss/faiss/utils/distances.cpp

                               (y_norms != nullptr) ?
                                   y_norms[idsi[j]] :
                                   sqrtf(fvec_norm_L2sqr(y + d * idsi[j], d));
+                          norm = (norm == 0.0 ? 1.0 : norm);

Collaborator

alexanderguzhva Aug 30, 2024

yes
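
By contrast, in the two hunks approved here the norm is a divisor, so an all-zero vector would otherwise produce 0/0. The following is a minimal sketch of that guard, assuming IEEE 754 floating-point behavior; `ip_over_norm` is a hypothetical helper for illustration and not the faiss function:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Inner product of x and y divided by the L2 norm of y, as in a
// cosine-style computation. When guard_zero_norm is true, a zero norm
// is replaced by 1.0 (the same ternary as in the approved hunks), so an
// all-zero y yields 0 instead of NaN.
float ip_over_norm(const std::vector<float>& x,
                   const std::vector<float>& y,
                   bool guard_zero_norm) {
    assert(x.size() == y.size());
    float ip = 0.0f;
    float y_norm_sqr = 0.0f;
    for (std::size_t i = 0; i < y.size(); i++) {
        ip += x[i] * y[i];
        y_norm_sqr += y[i] * y[i];
    }
    float norm = std::sqrt(y_norm_sqr);
    if (guard_zero_norm) {
        norm = (norm == 0.0f ? 1.0f : norm);
    }
    return ip / norm;
}
```

Because the inner product with an all-zero vector is already 0, dividing by the substituted 1.0 leaves the result at 0 and does not change results for any vector with a nonzero norm.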

cydrain force-pushed the caiyd_dont_norm_all_zero_vectors_sol2 branch from 83a7cea to ae055f2 Compare

September 2, 2024 02:28

sre-ci-robot added size/S and removed size/M labels

cydrain force-pushed the caiyd_dont_norm_all_zero_vectors_sol2 branch 2 times, most recently from a3be7c2 to 13416c7 Compare

September 2, 2024 03:58


Set norm to 1.0 for all-0 vectors

e328437

Signed-off-by: Cai Yudong <[email protected]>

cydrain force-pushed the caiyd_dont_norm_all_zero_vectors_sol2 branch from 13416c7 to e328437 Compare

September 2, 2024 06:34

mergify bot added the ci-passed label

Collaborator

alexanderguzhva commented Sep 3, 2024

/lgtm

sre-ci-robot assigned alexanderguzhva

sre-ci-robot added the lgtm label

Collaborator

alexanderguzhva commented Sep 3, 2024

@cydrain does it solve milvus-io/milvus#35594?

Collaborator

alexanderguzhva commented Sep 3, 2024

/unhold

sre-ci-robot removed the do-not-merge/hold label

sre-ci-robot merged commit 14818a1 into zilliztech:main

13 checks passed

Collaborator Author

cydrain commented Sep 4, 2024

> @cydrain does it solve milvus-io/milvus#35594?

Yes, milvus-io/milvus#35594 will be fixed by this PR.

cydrain deleted the caiyd_dont_norm_all_zero_vectors_sol2 branch

September 4, 2024 02:01

Labels

approved ci-passed dco-passed kind/improvement lgtm size/S