Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNDB-12922: Implement rerank_k in SAI ANN queries #1562

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

michaeljmarshall
Copy link
Member

@michaeljmarshall michaeljmarshall commented Feb 11, 2025

What is the issue

Fixes https://github.com/riptano/cndb/issues/12922

What does this PR fix and why was it fixed

This PR follows up on #1525 by integrating the CQL configured rerank_k in to SAI so that the ANN query can take the user input and let it influence query execution.

Key changes:

  • Bump MessagingService to VERSION_DS_11
  • Fixes implementation of MessagingService.instance().endpointsWithVersionBelow by using the keyspace variant, which is necessary for CNDB.
  • When configured, the rerank_k is used as the graph's rerankK parameter. The only detail worth mentioning here is that we ignore the segment proportionality computation if rerank_k is provided. This diverges from the smart default design, but not too greatly.
  • Added a guardrail for rerank_k to prevent it from exceeding 4 times the cassandra.sai.vector_search.max_top_k. This is debatable and easily changed, so please let me know if we want something different.
  • Made the in memory brute force cost comparison logic use the rerankK value instead of the limit. I am pretty sure this is correct, but please review closely.

Instead of validating rerank_k's value within the query, I added a test that ensures that as we increase the rerank k, we increase the recall.

CNDB pr: https://github.com/riptano/cndb/pull/13095

Before this, we had minor inconcistencies where
the in memory and the on disk indexes diverged.
This might have been intentional, but I'm not
certain, so changing this here to force the
discussion in the PR review.

Now, everything is consistently rerankK when
estimating the cost to search the graph in order
to determine if brute force is cheaper.
Copy link

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

Comment on lines 171 to 174
// Default to no warning and failure at 4 times the maxTopK value
int maxTopK = CassandraRelevantProperties.SAI_VECTOR_SEARCH_MAX_TOP_K.getInt();
enforceDefault(sai_ann_rerank_k_warn_threshold, v -> sai_ann_rerank_k_warn_threshold = v, -1, -1);
enforceDefault(sai_ann_rerank_k_failure_threshold, v -> sai_ann_rerank_k_failure_threshold = v, 4 * maxTopK, 4 * maxTopK);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another approach could be to cap the ratio of LIMIT to rerank_k.

Copy link

@adelapena adelapena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! I have dropped some comments about the messaging version bump and testing.

@adelapena
Copy link

@michaeljmarshall for the CNDB PR, besides the messaging version bump, we will need to add the new guardrail to this test (I learned it the hard way)

Comment on lines +82 to +86
public static ANNOptions fromMap(String keyspace, Map<String, String> map)
{
assert keyspace != null;
// ensure that all nodes in the cluster are in a version that supports ANN options, including this one
Set<InetAddressAndPort> badNodes = MessagingService.instance().endpointsWithVersionBelow(MessagingService.VERSION_DS_11);
Set<InetAddressAndPort> badNodes = MessagingService.instance().endpointsWithVersionBelow(keyspace, MessagingService.VERSION_DS_11);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adelapena - what do you think about moving this validation to the validate method? It seems odd to pass the keyspace in to this method since it is only needed for validation and not for building the object from the map. I left it as is for now since it is a minor detail.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, +1

@michaeljmarshall
Copy link
Member Author

@adelapena - will you please re-review this PR? Thanks!

Copy link

@adelapena adelapena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, with a few trivial suggestions. I haven't looked at the CNDB PR yet.

@cassci-bot
Copy link

❌ Build ds-cassandra-pr-gate/PR-1562 rejected by Butler


3 new test failure(s) in 7 builds
See build details here


Found 3 new test failures

Test Explanation Branch history Upstream history
...n.HandshakeTest.testSendCompatibleFutureVersion regression 🔴🔵🔵🔵🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵
...ngTest.shouldThrowOnOverloadLargeMessages[3/v3] regression 🔴🔵🔵🔵🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵
o.a.c.u.b.BinLogTest.testTruncationReleasesLogS... regression 🔴🔵🔵🔴🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵

Found 15 known test failures

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants