-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Config / init refactor for state encapsulation #220
Conversation
QueryResult = None | ||
"""@private""" | ||
FetchResult = None | ||
"""@private""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have context on these, but I can't imagine any scenario where returning a bunch of None
objects meaningfully preserves backwards compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I'm not really sure what this was for. I think it's probably fine to pull them out. Otherwise it would just be someone defining behavior themselves for these things anyways.
try: | ||
from .grpc.index_grpc import * | ||
from .data.grpc.index_grpc import * | ||
except ImportError: | ||
pass # ignore for non-[grpc] installations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These try
import statements led to some pretty confusing debug situations when stuff started failing due to me moving packages from one location to another since the intention is really only to ignore failures caused by people installing without grpc extras. We should revisit this pattern and make sure we're testing stuff both with and without grpc deps loaded.
pinecone/config/logging.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved these constants out of old utils.py
since they were only being used in one place.
The rest of this I pulled out of We need a whole ticket to revisit and overhaul the logging approach in a holistic way.
pinecone/config/openapi.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything in here was pretty much copy/pasta out of the old config.py but wrapped inside a new factory class. I haven't done anything to verify these settings for openapi, but they worked in the past so hopefully they are sufficient for now.
|
||
class PineconeConfig(Config): | ||
def __init__(self, api_key: str = None, host: str = None, **kwargs): | ||
host = host or kwargs.get("host") or os.getenv("PINECONE_CONTROLLER_HOST") or DEFAULT_CONTROLLER_HOST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we want to keep this or not, but in the past there was an env var called PINECONE_CONTROLLER_HOST
being used.
But as you can see, the only difference between a Config
instance and the PineconeConfig
subclass is a little bit of logic here about how to set the host value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not sure whether the env var should be maintained, but I suppose it's a nice convenience and not too much trouble to keep around.
class SingletonMeta(type): | ||
_instances = {} | ||
|
||
def __call__(cls, *args, **kwargs): | ||
if cls not in cls._instances: | ||
instance = super().__call__(*args, **kwargs) | ||
cls._instances[cls] = instance | ||
return cls._instances[cls] | ||
|
||
|
||
class IndexHostStore(metaclass=SingletonMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confess, I had help from ChatGPT on this. I still need to add tests to verify the singleton is working as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure I was using GPT for examples of Python singletons and this is the approach it gave me as well.
@@ -0,0 +1,7 @@ | |||
class CollectionDescription(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulled this out of legacy manage.py
. Not sure if we really want to keep it or not long term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unsure of this also. I also wasn't sure why it's doing an anonymous mapping to a dict rather than having a defined set of keys like in IndexDescription(NamedTuple)
.
Would these classes ultimately be useful to external users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether it's a dictionary or an extention of NamedTuple
, the usual motivation for a presenter object like is just to have greater control over what is shown to the user and discard any unwanted fields that may be emitted by the underlying generated code. I don't see why that's needed in this case though. Probably I will delete it soon but not in this diff.
@@ -0,0 +1,13 @@ | |||
from typing import NamedTuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulled this out of legacy manage.py
. Not sure if we really want to keep it or not long term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the logic in this file moved from manage.py
pinecone/control/pinecone.py
Outdated
db = response.database | ||
host = response.status.host | ||
|
||
self.index_host_store.set_host(self.config, name, "https://" + host) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We proactively stick this host url in the singleton store if they happen to call describe_index before doing any index operations. Just to save an additional round trip.
@@ -134,7 +134,7 @@ def configure_index_with_http_info( | |||
:rtype: tuple(IndexMeta, status_code(int), headers(HTTPHeaderDict)) | |||
""" | |||
|
|||
_hosts = ["https://controller.{environment}.pinecone.io"] | |||
_hosts = ["https://api.pinecone.io"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes in this file are generated, derived from changes to the servers
field in our openapi spec stored in the other repository.
@@ -30,7 +30,7 @@ class IndexMetaDatabase(BaseModel): | |||
""" | |||
|
|||
name: StrictStr = Field(...) | |||
dimension: StrictStr = Field(...) | |||
dimension: StrictInt = Field(...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also a generated change, based on an updated openapi spec.
@@ -0,0 +1,192 @@ | |||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved these out of the giant index.grpc.py
file. No changes.
@@ -0,0 +1,33 @@ | |||
from .retry import RetryConfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved these out of the giant index.grpc.py
file. No changes.
@@ -0,0 +1,34 @@ | |||
from grpc._channel import _MultiThreadedRendezvous |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved these out of the giant index.grpc.py
file. No changes.
from functools import wraps | ||
from importlib.util import find_spec | ||
from typing import NamedTuple, Optional, Dict, Iterable, Union, List, Tuple, Any | ||
from typing import Optional, Dict, Iterable, Union, List, Tuple, Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted a bunch of supporting classes and functions into smaller files. Removed dependencies no longer in use by the class defined in this file. Otherwise, no changes.
@@ -0,0 +1,107 @@ | |||
import uuid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved these grpc-related utils out of a general utils.py
file so there wouldn't be any try/except import statements needed in here, and also so they are closer to where they are actually being used in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file got renamed and modified. Now it is tests/unit/test_control.py
@@ -0,0 +1,10 @@ | |||
import requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from pinecone/utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from pinecone/utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from pinecone/utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from pinecone/utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from pinecone/utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved some constants closer to where they are actually being used, since they were only used in one spot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from pinecone/utils.py
@@ -0,0 +1,88 @@ | |||
import re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftovers, hopefully to be deleted soon pending investigation on vector_column_service.proto
def __enter__(self): | ||
return self | ||
|
||
def __exit__(self, exc_type, exc_value, traceback): | ||
self._api_client.close() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These methods were the only thing I needed to add back as a consequence changing Index
to not subclass ApiClient
to satisfy one test (using with Index(...) as index
syntax). The python language docs show tthese must be implemented in order to use the with
syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's very helpful.
def __init__(self, api_key: str, host: str, pool_threads=1, **kwargs): | ||
api_key = api_key or kwargs.get("api_key") | ||
host = host or kwargs.get('host') | ||
pool_threads = pool_threads or kwargs.get("pool_threads") | ||
|
||
self._config = Config(api_key=api_key, host=host, **kwargs) | ||
|
||
api_client = ApiClient(configuration=self._config.OPENAPI_CONFIG, pool_threads=pool_threads) | ||
api_client.user_agent = get_user_agent() | ||
self._api_client = api_client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a substantive change to adopt the new Config
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for taking this on! Organizationally and architecturally this feels much, much better to me. I think the refactor of the singleton approach and the ability for users to be more flexible when using the client is a big win.
Overall this looks good to go I think. I'd like to do some detailed testing around the changes here but will not have time this evening.
Definitely would like to get this merged to the feature branch and start iterating on it.
@@ -0,0 +1,7 @@ | |||
class CollectionDescription(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unsure of this also. I also wasn't sure why it's doing an anonymous mapping to a dict rather than having a defined set of keys like in IndexDescription(NamedTuple)
.
Would these classes ultimately be useful to external users?
QueryResult = None | ||
"""@private""" | ||
FetchResult = None | ||
"""@private""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I'm not really sure what this was for. I think it's probably fine to pull them out. Otherwise it would just be someone defining behavior themselves for these things anyways.
openapi_config | ||
or kwargs.pop("openapi_config", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that openapi_config
would be passed in via method arg + also be in kwargs
with the same name? I guess this question is also for other areas where we do something similar, like in pinecone/data/index.py
in the __init__
and I'm just somewhat unclear on kwargs
.
class SingletonMeta(type): | ||
_instances = {} | ||
|
||
def __call__(cls, *args, **kwargs): | ||
if cls not in cls._instances: | ||
instance = super().__call__(*args, **kwargs) | ||
cls._instances[cls] = instance | ||
return cls._instances[cls] | ||
|
||
|
||
class IndexHostStore(metaclass=SingletonMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure I was using GPT for examples of Python singletons and this is the approach it gave me as well.
self._indexHosts = {} | ||
|
||
def _key(self, config: Config, index_name: str) -> str: | ||
return ":".join([config.API_KEY, index_name]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hyper-nit: Using a delimiter + join
seems fine here, I suppose you could also use a template string for possibly easier readability, but obviously just imo:
return ":".join([config.API_KEY, index_name]) | |
return f"{config.API_KEY}:{index_name}" |
import copy | ||
|
||
__all__ = [ | ||
"Index", | ||
"FetchResponse", | ||
"ProtobufAny", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guessing this was removed as it doesn't seem necessarily useful being exposed for end-users, but lemme know if otherwise.
def __enter__(self): | ||
return self | ||
|
||
def __exit__(self, exc_type, exc_value, traceback): | ||
self._api_client.close() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's very helpful.
|
||
class PineconeConfig(Config): | ||
def __init__(self, api_key: str = None, host: str = None, **kwargs): | ||
host = host or kwargs.get("host") or os.getenv("PINECONE_CONTROLLER_HOST") or DEFAULT_CONTROLLER_HOST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not sure whether the env var should be maintained, but I suppose it's a nice convenience and not too much trouble to keep around.
def validate_and_convert_errors(func): | ||
@wraps(func) | ||
def inner_func(*args, **kwargs): | ||
Config.validate() # raises exceptions in case of invalid config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This happens within the init of the Config
itself, right? Not really necessary to have it here.
assert Config.CONTROLLER_HOST == controller_host | ||
|
||
|
||
def test_resolution_order_kwargs_over_config_file(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we lost the tests around resolution order over a config file, but I'd assume that's fine given we'll need to add new testing around the config singleton and stuff anyways, and the config file setup isn't really documented.
Problem
Singleton config limits how people can use this SDK and makes testing more difficult. Also, we no longer should fetch projectId and construct data url out of configuration parts, to keep the client agnostic to the format of the data plane url.
Solution
Config
and a subclassPineconeConfig
.Config
is a vessel for api_key, host, and openapi configuration. It is host-agnostic because it is used by both the control plane (Pinecone
) and the data plane (Index
)PineconeConfig
is a convenience class to set the controller host forPinecone
.IndexHostStore
means we won't make unnecessary and repetitious calls to describe_index if a user creates multipleIndex
instances referring to the same index. In the old paradigm, this was accomplished with a piece of global state storing off projectId.Index
class constructor now requires host url instead of an index name. If you use the helper method on aPinecone
class instance to instantiate the object, it will manage fetching the host url for you. E.g.p = Pinecone(api_key='foo'); index = p.Index('my-index')
.manage.py
got moved and renamed topinecone/control/pinecone.py
. The functions such ascreate_index
,describe_index
, etc that previously were top-level package exports relying on global state to access config information have been turned into class methods on a newPinecone
class defined in this file.pinecone/index.py
got moved topinecone/data/index.py
. Thepinecone/data
package is a home for all classes used to access the data plane.grpc
related got shoved intopinecone/data/grpc
.pinecone/utils.py
and moved topinecone/data/grpc/utils.py
.deprecated
package. Probably will be deleted soon.pinecone/index_grpc.py
to split up into many smaller files insidepinecone/data/grpc/
package.pinecone/utils.py
got broken up into many smaller files inpinecone/utils/
Using it
Mostly working
describe_index
(call succeeds but need to follow up on source_collection field, would ideally return different object for different capacity_modes)Somewhat working:
create_index
(API call succeeds, but client blows up when response is not the expected shape)delete_index
(only when timeout=-1, since otherwise it has dependency onlist_indexes
)Broken:
list_indexes
(response not expected shape)Type of Change
Test Plan
Automated tests will be expanded in a future PR. A bunch of code moved around to different files or got split from big files into small files and I want to merge this part of the change to unlock parallel work by Austin various follow-ups.