[Bug] query_namespaces can handle single result #421

jhamon · 2024-12-06T18:49:07Z

Problem

In order to merge results across multiple queries, the SDK must know which similarity metric an index is using. For dotproduct and cosine indexes, a larger score is better, while for euclidean a smaller score is better. Unfortunately, the data plane API does not currently expose the metric type, and a separate call to the control plane to find out seems undesirable from a resiliency and performance perspective.

As a workaround, the initial implementation of query_namespaces inferred the similarity metric needed to merge results by checking whether the scores of query results were ascending or descending. This worked well, but imposed an implicit limitation: at least 2 results must be returned.

We initially believed this would not be a problem, but have since learned that applications using filtering can sometimes filter out all or most results. An approach in which the user explicitly tells the SDK which similarity metric is being used is therefore preferred, since it handles these edge cases with 1 or 0 results.
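To illustrate the limitation, here is a minimal sketch of order-based inference. The function name `infer_larger_is_better` is hypothetical and is not the SDK's actual code; it only shows why fewer than two scores leave the sort direction undetermined.

```python
from typing import Optional, Sequence


def infer_larger_is_better(scores: Sequence[float]) -> Optional[bool]:
    """Guess the sort direction from scores in the order they were returned.

    Returns True if scores descend (dotproduct/cosine style), False if they
    ascend (euclidean style), and None when the direction cannot be
    determined. Hypothetical helper for illustration only.
    """
    for previous, current in zip(scores, scores[1:]):
        if previous > current:
            return True
        if previous < current:
            return False
    # Zero or one result, or all scores tied: no ordering information.
    return None
```

With a heavily filtered query that returns a single match, the loop body never runs and the function has nothing to base a decision on, which is the edge case described above.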

Solution

- Add a required kwarg to query_namespaces to specify the index similarity metric.
- Modify QueryResultsAggregator to use this similarity metric, and strip out the code that inferred whether results were ascending or descending.
- Adjust integration tests to pass the new metric kwarg. Except for adding the new kwarg, the query_namespaces integration tests did not need to change, which indicates the underlying behavior still works as before.
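The effect of passing the metric explicitly can be pictured with a minimal sketch of merging per-namespace results. The names `Match` and `merge_matches` are hypothetical stand-ins, not the actual QueryResultsAggregator implementation; the only facts taken from the description above are that larger scores are better for dotproduct and cosine and smaller scores are better for euclidean.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Match(NamedTuple):
    """Hypothetical stand-in for a single query match."""

    id: str
    score: float
    namespace: str


def merge_matches(
    per_namespace: Iterable[List[Match]], metric: str, top_k: int
) -> List[Match]:
    """Combine matches from several namespaces and keep the best top_k."""
    if metric not in ("cosine", "dotproduct", "euclidean"):
        raise ValueError(f"unsupported metric: {metric!r}")
    all_matches = [m for matches in per_namespace for m in matches]
    if metric == "euclidean":
        # Smaller distance is better.
        return heapq.nsmallest(top_k, all_matches, key=lambda m: m.score)
    # Larger similarity is better for cosine and dotproduct.
    return heapq.nlargest(top_k, all_matches, key=lambda m: m.score)
```

Because the direction comes from the `metric` argument rather than from the data, the merge behaves the same whether the namespaces return many matches, one match, or none.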

Type of Change

Bug fix (non-breaking change which fixes an issue)

jhamon · 2024-12-06T19:02:55Z

tests/integration/data/test_query_namespaces.py

@@ -145,22 +147,7 @@ def test_query_namespaces(self, idx):
        assert len(results6.matches) == 0
        assert results6.usage.read_units > 0

-    def test_invalid_top_k(self, idx):


Don't need this test anymore since top_k of 1 is now valid.

austin-denoble

LGTM, this seems like a reasonable way to do it. Plus, if the user has stored off their own index configuration details, or fetched them previously, they could just thread them through.

rohanshah18

Tested locally for sorting and duplicates order. LGTM!

Add type hints for metric kwarg

6d6f63e

Fix integration test

d7f406f

jhamon marked this pull request as ready for review December 6, 2024 20:43

jhamon requested review from austin-denoble, haruska, rohanshah18 and aulorbe December 6, 2024 20:44

austin-denoble approved these changes Dec 6, 2024

rohanshah18 approved these changes Dec 6, 2024

jhamon merged commit 5453aab into main Dec 6, 2024
85 checks passed

jhamon deleted the jhamon/query_namespaces_specify_metric branch December 6, 2024 23:18
