Add a Multi-Vector Similarity Function #13991
I am thinking we can leverage the It'll need some work: a mechanism to address each vector value directly, and corresponding changes in VectorValues. I'm thinking maybe an "ordinal" for the multi-vector, and a "sub-ordinal" for values within the multi-vector. Both ints can be packed into a long for node value? Since I haven't chalked out all the details yet, I decided to remove the cc: @cpoerschke, @benwtrent
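To make the ordinal / sub-ordinal idea concrete, here is a minimal sketch of packing two ints into one long. The class and method names (`PackedOrdinal`, `pack`, `ordinal`, `subOrdinal`) are hypothetical and are not part of this PR or of Lucene; the layout (ordinal in the high 32 bits, sub-ordinal in the low 32 bits) is just one possible choice.

```java
/** Hypothetical helper, not Lucene API: packs (ordinal, subOrdinal) into a single long. */
final class PackedOrdinal {
  private PackedOrdinal() {}

  /** Ordinal goes in the high 32 bits, sub-ordinal in the low 32 bits. */
  static long pack(int ordinal, int subOrdinal) {
    // Mask the low half so a negative subOrdinal does not sign-extend into the high half.
    return ((long) ordinal << 32) | (subOrdinal & 0xFFFFFFFFL);
  }

  /** Recovers the multi-vector ordinal from the high 32 bits. */
  static int ordinal(long packed) {
    return (int) (packed >>> 32);
  }

  /** Recovers the position within the multi-vector from the low 32 bits. */
  static int subOrdinal(long packed) {
    return (int) packed;
  }
}
```

With this layout, `pack(7, 3)` round-trips to ordinal 7 and sub-ordinal 3, and packed values sort by ordinal first as long as ordinals are non-negative.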
for (float[] o : outerList) {
  float maxSim = Float.MIN_VALUE;
  for (float[] i : innerList) {
    maxSim = Float.max(maxSim, vectorSimilarityFunction.compare(o, i));
If we add another compare method with start and end indexes for both inner and outer, I guess we won't need to copy the array?
Right, but it needs to go all the way down to VectorUtilSupport, which I think should be a PR of its own.
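For illustration, a range-based compare could look roughly like the sketch below. It is a plain scalar dot product over sub-ranges of two arrays, so a packed multi-vector can be compared without copying each component vector out. The class `RangeDotProduct` and its signature are assumptions made for this example; no such overload is claimed to exist in VectorUtilSupport.

```java
import java.util.Objects;

/** Hypothetical sketch, not Lucene API: dot product over sub-ranges of two arrays. */
final class RangeDotProduct {
  private RangeDotProduct() {}

  /**
   * Computes the dot product of a[aOffset .. aOffset + length) and
   * b[bOffset .. bOffset + length) without copying either range.
   */
  static float dotProduct(float[] a, int aOffset, float[] b, int bOffset, int length) {
    // Fail fast on out-of-range offsets instead of part-way through the loop.
    Objects.checkFromIndexSize(aOffset, length, a.length);
    Objects.checkFromIndexSize(bOffset, length, b.length);
    float sum = 0f;
    for (int k = 0; k < length; k++) {
      sum += a[aOffset + k] * b[bOffset + k];
    }
    return sum;
  }
}
```

A real change would also need matching range-based variants in the vectorized code paths, which is why it is suggested as a separate PR.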
}

@Override
public float aggregate(
It's so unfortunate that Java generics don't support primitive arrays, so we have to duplicate this code.
This is a small first change towards adding support for multi-vectors. We start by adding a MultiVectorSimilarityFunction that can handle (late) interaction across multiple vector values. This is the first of a series of splits for the larger prototype PR #13525.
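As a rough illustration of late interaction, the sketch below scores two multi-vectors by taking, for each outer vector, its best match among the inner vectors and summing those maxima. Summing is an assumption here (it is the ColBERT-style choice); the aggregation used in this PR may differ. The `Similarity` interface and the `SumMaxSim` class are hypothetical stand-ins, not the PR's MultiVectorSimilarityFunction. The sketch starts each maximum at `Float.NEGATIVE_INFINITY` so that negative similarities are handled too, since `Float.MIN_VALUE` is the smallest positive float.

```java
import java.util.List;

/** Hypothetical sketch of sum-of-max late interaction; not the PR's implementation. */
final class SumMaxSim {
  private SumMaxSim() {}

  /** Stand-in for a per-vector similarity function (assumed, not Lucene's interface). */
  interface Similarity {
    float compare(float[] a, float[] b);
  }

  /** For each outer vector, takes the best-matching inner vector, then sums the maxima. */
  static float sumMax(List<float[]> outer, List<float[]> inner, Similarity sim) {
    if (inner.isEmpty()) {
      throw new IllegalArgumentException("inner multi-vector must not be empty");
    }
    float total = 0f;
    for (float[] o : outer) {
      // NEGATIVE_INFINITY keeps the max correct even if every similarity is negative.
      float maxSim = Float.NEGATIVE_INFINITY;
      for (float[] i : inner) {
        maxSim = Math.max(maxSim, sim.compare(o, i));
      }
      total += maxSim;
    }
    return total;
  }

  /** Simple dot product, usable as a Similarity via a method reference. */
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int k = 0; k < a.length; k++) {
      sum += a[k] * b[k];
    }
    return sum;
  }
}
```

A call such as `SumMaxSim.sumMax(queryVectors, docVectors, SumMaxSim::dot)` would score a query multi-vector against a document multi-vector; `queryVectors` and `docVectors` are placeholder names.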