Set norm to 1.0 for all-0 vectors #803

Merged

cydrain
Collaborator

@cydrain cydrain commented Aug 30, 2024

@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cydrain

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cydrain
Collaborator Author

cydrain commented Aug 30, 2024

/kind improvement
/hold

codecov bot commented Aug 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.66%. Comparing base (3c46f4c) to head (e328437).
Report is 169 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##           main     #803       +/-   ##
=========================================
+ Coverage      0   61.66%   +61.66%     
=========================================
  Files         0       84       +84     
  Lines         0     6149     +6149     
=========================================
+ Hits          0     3792     +3792     
- Misses        0     2357     +2357     

see 84 files with indirect coverage changes

@alexanderguzhva
Collaborator

/hold

@@ -508,6 +508,7 @@ void ivf_residual_add_from_flat_codes(
// ok
index->rq.decode(tmp_code.data(), tmp.data(), 1);
float norm = fvec_norm_L2sqr(tmp.data(), rq.d);
+ norm = (norm == 0.0 ? 1.0 : norm);
Collaborator

No. This norm is used in the L2 distance computation, distance = x^2 - 2xy + y^2, not in cosine-related code.
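
The objection can be spelled out with the standard expansion of the squared L2 distance. The notation below is an illustration and is not taken from the PR:

$$
\|x - y\|^2 = \|x\|^2 - 2\langle x, y\rangle + \|y\|^2
$$

For an all-zero $x$ the true value is $\|y\|^2$. If $\|x\|^2 = 0$ were replaced by $1.0$, the result would become $1 + \|y\|^2$, so every distance involving a zero vector would be off by exactly 1.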

@@ -93,6 +93,7 @@ struct AQDistanceComputerLUT : FlatCodesDistanceComputer {
bias = 0;
} else {
bias = fvec_norm_L2sqr(x, d);
+ bias = (bias == 0.0 ? 1.0 : bias);
Collaborator

No. This norm is used in the L2 distance computation, distance = x^2 - 2xy + y^2, not in cosine-related code.

@@ -174,6 +175,7 @@ void search_with_LUT(
if (!is_IP) { // the LUT function returns ||y||^2 - 2 * <x, y>, need to
// add ||x||^2
bias = fvec_norm_L2sqr(xq + q * d, d);
+ bias = (bias == 0.0 ? 1.0 : bias);
Collaborator

No. This norm is used in the L2 distance computation, distance = x^2 - 2xy + y^2, not in cosine-related code.
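
A minimal sketch of why patching the bias here would change L2 results, assuming the structure described in the hunk's comment (a partial term `||y||^2 - 2 * <x, y>` plus a query bias `||x||^2`). The helper names `norm_l2sqr`, `lut_partial_distance`, and `l2sqr_with_bias` are hypothetical and are not knowhere or faiss functions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a library function): squared L2 norm of x.
float norm_l2sqr(const float* x, std::size_t d) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < d; i++) {
        sum += x[i] * x[i];
    }
    return sum;
}

// Hypothetical stand-in for the LUT result: ||y||^2 - 2 * <x, y>.
float lut_partial_distance(const float* x, const float* y, std::size_t d) {
    float ip = 0.0f;
    for (std::size_t i = 0; i < d; i++) {
        ip += x[i] * y[i];
    }
    return norm_l2sqr(y, d) - 2.0f * ip;
}

// Squared L2 distance: the partial term plus the query bias ||x||^2.
// patch_zero_bias mimics the change under review (a bias of 0.0 becomes 1.0).
float l2sqr_with_bias(
        const float* x,
        const float* y,
        std::size_t d,
        bool patch_zero_bias) {
    float bias = norm_l2sqr(x, d);
    if (patch_zero_bias && bias == 0.0f) {
        bias = 1.0f;
    }
    return bias + lut_partial_distance(x, y, d);
}
```

With `x = (0, 0, 0)` and `y = (1, 2, 2)`, the unpatched call returns 9, which is the true squared distance, while the patched call returns 10.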

@@ -354,6 +354,7 @@ struct FlatL2WithNormsDis : FlatCodesDistanceComputer {
void set_query(const float* x) override {
q = x;
query_l2norm = fvec_norm_L2sqr(q, d);
+ query_l2norm = (query_l2norm == 0.0 ? 1.0 : query_l2norm);
Collaborator

No. This norm is used in the L2 distance computation, distance = x^2 - 2xy + y^2, not in cosine-related code.

@@ -212,6 +212,7 @@ struct AQInvertedListScannerLUT : AQInvertedListScanner {
AQInvertedListScanner::set_query(query_vector);
if (!is_IP && !ia.by_residual) {
distance_bias = fvec_norm_L2sqr(query_vector, ia.d);
+ distance_bias = (distance_bias == 0.0 ? 1.0 : distance_bias);
Collaborator

No. This norm is used in the L2 distance computation, distance = x^2 - 2xy + y^2, not in cosine-related code.

@@ -322,7 +322,8 @@ void AdditiveQuantizer::compute_centroid_norms(float* norms) const {
#pragma omp for
for (int64_t i = 0; i < ntotal; i++) {
decode_64bit(i, tmp.data());
- norms[i] = fvec_norm_L2sqr(tmp.data(), d);
+ float norm = fvec_norm_L2sqr(tmp.data(), d);
+ norms[i] = (norm == 0.0 ? 1.0 : norm);
Collaborator

No. This norm is used in the L2 distance computation, distance = x^2 - 2xy + y^2, not in cosine-related code.

@@ -65,7 +65,9 @@ void fvec_norms_L2(
size_t nx) {
#pragma omp parallel for if (nx > 10000)
for (int64_t i = 0; i < nx; i++) {
- nr[i] = sqrtf(fvec_norm_L2sqr(x + i * d, d));
+ auto norm = fvec_norm_L2sqr(x + i * d, d);
Collaborator

No. This norm might be used in an L2 distance computation, distance = x^2 - 2xy + y^2, not in cosine-related code.

- nr[i] = fvec_norm_L2sqr(x + i * d, d);
+ for (int64_t i = 0; i < nx; i++) {
+ float norm = fvec_norm_L2sqr(x + i * d, d);
+ nr[i] = (norm == 0.0 ? 1.0 : norm);
Collaborator

No. This norm might be used in an L2 distance computation, distance = x^2 - 2xy + y^2, not in cosine-related code.

(y_norms != nullptr) ?
y_norms[j] :
sqrtf(fvec_norm_L2sqr(y + j * d, d));

+ norm = (norm == 0.0 ? 1.0 : norm);
Collaborator

yes

(y_norms != nullptr) ?
y_norms[idsi[j]] :
sqrtf(fvec_norm_L2sqr(y + d * idsi[j], d));
+ norm = (norm == 0.0 ? 1.0 : norm);
Collaborator

yes
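
The two hunks answered with "yes" appear to be the cosine-related paths, where the norm acts as a divisor rather than as an additive term. The sketch below illustrates why a guard helps there, assuming a plain cosine similarity; `guarded_l2_norm` and `cosine_similarity` are hypothetical names, not knowhere or faiss functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: L2 norm of x, with an all-zero vector mapped to 1.0
// so that the norm can safely be used as a divisor.
float guarded_l2_norm(const float* x, std::size_t d) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < d; i++) {
        sum += x[i] * x[i];
    }
    float norm = std::sqrt(sum);
    return norm == 0.0f ? 1.0f : norm;
}

// Cosine similarity <x, y> / (||x|| * ||y||) using the guarded norms.
// Without the guard, an all-zero input would produce 0 / 0, which is NaN.
float cosine_similarity(const float* x, const float* y, std::size_t d) {
    float ip = 0.0f;
    for (std::size_t i = 0; i < d; i++) {
        ip += x[i] * y[i];
    }
    return ip / (guarded_l2_norm(x, d) * guarded_l2_norm(y, d));
}
```

With the guard, comparing `(3, 4)` against an all-zero vector yields a similarity of 0 instead of NaN, and the result for non-zero vectors is unchanged because their norms are never replaced.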

@cydrain cydrain force-pushed the caiyd_dont_norm_all_zero_vectors_sol2 branch from 83a7cea to ae055f2 Compare September 2, 2024 02:28
@sre-ci-robot sre-ci-robot added size/S and removed size/M labels Sep 2, 2024
@cydrain cydrain force-pushed the caiyd_dont_norm_all_zero_vectors_sol2 branch 2 times, most recently from a3be7c2 to 13416c7 Compare September 2, 2024 03:58
@cydrain cydrain force-pushed the caiyd_dont_norm_all_zero_vectors_sol2 branch from 13416c7 to e328437 Compare September 2, 2024 06:34
@mergify mergify bot added the ci-passed label Sep 3, 2024
@alexanderguzhva
Collaborator

/lgtm

@alexanderguzhva
Collaborator

@cydrain does it solve milvus-io/milvus#35594?

@alexanderguzhva
Collaborator

/unhold

@sre-ci-robot sre-ci-robot merged commit 14818a1 into zilliztech:main Sep 3, 2024
13 checks passed
@cydrain
Collaborator Author

cydrain commented Sep 4, 2024

> @cydrain does it solve milvus-io/milvus#35594?

Yes, milvus#35594 will be fixed by this PR.

@cydrain cydrain deleted the caiyd_dont_norm_all_zero_vectors_sol2 branch September 4, 2024 02:01