Add query_namespaces #409

Merged
jhamon merged 7 commits into main from jhamon/query_namespaces_threadpool2
Nov 13, 2024

jhamon (Collaborator) commented Oct 30, 2024

Problem

Users sometimes want to run a single query across multiple namespaces and get back one combined set of results.

Solution

Run a query against each namespace in parallel, then merge the per-namespace results using a heap.

from pinecone import Pinecone
import random

pc = Pinecone(api_key='api-key')

index = pc.Index(
    host="https://indexhost/",
    pool_threads=10
)

dimension = 1536  # placeholder value: use the dimension of your index
query_vec = [random.random() for _ in range(dimension)]

combined_results = index.query_namespaces(
    vector=query_vec,
    namespaces=["ns1", "ns2", "ns3", "ns4"],
    include_values=False,
    include_metadata=True,
    filter={"publication_date": {"$eq":"Last3Months"}},
    top_k=100
)
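The approach described under Solution (query each namespace in parallel, then merge with a heap) can be sketched in a few lines. This is a minimal illustration, not the code in this PR: `fake_query`, `FAKE_RESULTS`, and `query_and_merge` are hypothetical names, the per-namespace query is replaced by an in-memory stub, and it assumes each namespace returns matches already sorted best-first with higher scores being better.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical in-memory data standing in for per-namespace query results.
FAKE_RESULTS = {
    "ns1": [{"id": "a", "score": 0.90}, {"id": "b", "score": 0.40}],
    "ns2": [{"id": "c", "score": 0.80}, {"id": "d", "score": 0.70}],
    "ns3": [{"id": "e", "score": 0.95}],
}


def fake_query(namespace, top_k):
    """Stand-in for one per-namespace query; returns matches sorted best-first."""
    matches = FAKE_RESULTS.get(namespace, [])[:top_k]
    # Tag each match with its namespace so the origin survives the merge.
    return [{**m, "namespace": namespace} for m in matches]


def query_and_merge(namespaces, top_k, query_fn=fake_query, max_workers=10):
    """Query every namespace on a thread pool, then heap-merge the sorted lists."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_namespace = list(pool.map(lambda ns: query_fn(ns, top_k), namespaces))
    # heapq.merge lazily merges already-sorted inputs; reverse=True expects
    # each input to be in descending order of the key.
    merged = heapq.merge(*per_namespace, key=lambda m: m["score"], reverse=True)
    return list(islice(merged, top_k))


top = query_and_merge(["ns1", "ns2", "ns3"], top_k=3)
print([(m["id"], m["namespace"], m["score"]) for m in top])
```

Because the merge is lazy, only the first `top_k` items are pulled from the heap. For a distance metric where lower scores are better, the sort direction would need to be flipped.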

TODO

A gRPC implementation of this will follow in a separate PR. I have WIP on it, but some mypy type issues were causing me headaches and I'd rather land this first.

Type of Change

  • New feature (non-breaking change which adds functionality)

Test Plan

Added integration tests

@jhamon jhamon force-pushed the jhamon/query_namespaces_threadpool2 branch from 88be474 to 7941ee8 Compare November 11, 2024 13:44
@jhamon jhamon force-pushed the jhamon/query_namespaces_threadpool2 branch from 7fc2038 to 5da2610 Compare November 11, 2024 15:19
@jhamon jhamon marked this pull request as ready for review November 11, 2024 18:33
@jhamon jhamon requested a review from haruska November 11, 2024 18:33
rohanshah18 (Contributor) left a comment

Nice work! Started an internal discussion. LGTM!

austin-denoble (Contributor) left a comment

Overall this felt easier to review than I thought it would be. Nice job organizing things around these specific classes and around how the query_namespaces function itself handles things.

I left a comment about the code added to the generated core.

import time
import random

def retry_api_call(
Contributor

Is this just manually stubbing out retries in the generated code for now? I'm also curious about the print statement down there and whether it should be uncommented.

Collaborator Author

This was just me throwing in something basic to get started. It is used to wrap __call_api, but I still need to tune the constants so the sleep intervals land at sensible levels.

Also re: this being generated code, very soon I will be moving these elsewhere and not generating them, since the ApiClient and a couple other classes don't contain any generated content.
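For readers unfamiliar with the pattern under discussion, here is a minimal sketch of a retry wrapper with exponential backoff and jitter. It is not the `retry_api_call` from this PR, whose signature and constants are not shown here. The name `retry_with_backoff`, its parameters, and the default values are all assumptions chosen for illustration.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=3.0,
                       sleep=time.sleep):
    """Call fn(); on failure, wait with exponential backoff and retry.

    All names and defaults are illustrative assumptions, not the values
    used by the client in this PR.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts:
                raise
            # The cap doubles each attempt (0.1, 0.2, 0.4, ...) up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # "Full jitter": sleep a random amount in [0, cap] so concurrent
            # callers do not all retry at the same instant.
            sleep(random.uniform(0, cap))
```

The `sleep` parameter is injectable so the delays can be observed in tests without actually waiting. A production version would likely retry only on specific error types rather than on every `Exception`.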

@jhamon jhamon merged commit e668c89 into main Nov 13, 2024
85 checks passed
@jhamon jhamon deleted the jhamon/query_namespaces_threadpool2 branch November 13, 2024 11:15