-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[IO-1746] Introduction of Pagination queries and changes to item_ids endpoint #707
Conversation
IO-1746 Change get item ids to use updated endpoint
|
@@ -166,21 +170,6 @@ def headers(self) -> Dict[str, str]: | |||
http_headers["Authorization"] = f"ApiKey {self.config.api_key}" | |||
return http_headers | |||
|
|||
@overload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally unsure why I even had this overload, it didn't matter for the typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly just experimenting with overloads etc. and got left in?
@@ -83,3 +85,6 @@ def __init__(self, value: Dict[str, str]) -> None: | |||
|
|||
def __str__(self) -> str: | |||
return "?" + "&".join(f"{k}={v}" for k, v in self.value.items()) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Useful for combining parameters in the core functions when required for pagination which I treat seperately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, nice addition actually. Gotta love a good dunder.
@@ -104,6 +106,12 @@ def _from_kwarg(cls, key: str, value: str) -> QueryFilter: | |||
modifier = None | |||
return QueryFilter(name=key, param=value, modifier=modifier) | |||
|
|||
def to_dict(self, ignore_modifier: bool = True) -> Dict[str, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly so they can be swapped in and out of QueryStrings for Paginated endpoints so that .where() syntax still works, for now ignores modifiers. I think this is not the most elegant solution though as eventually I think we'll want to probably do both server side query filter and client side once collected
@@ -154,17 +162,16 @@ def __init__( | |||
self.meta_params: dict = meta_params or {} | |||
self.client = client | |||
self.filters = filters or [] | |||
self.results: Optional[List[T]] = None | |||
self._changed_since_last: bool = True | |||
self.results: dict[int, T] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the core change that allows for Pagination, but all existing functions have been swapped over to preserve the outputs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, so you went with the int hashtable style approach. 👍🏻
if "dataset_ids" in self.meta_params | ||
else self.meta_params["dataset_id"] | ||
) | ||
params: QueryString = reduce( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How I use the aforementioned QueryString + QueryString syntax, map reduce them with the page object and all the filters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that python has itertools
, but I'd prefer a fluent interface like any sensible language like Rust
Anyway, good stuff!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments to address before merge but good stuff
@@ -166,21 +170,6 @@ def headers(self) -> Dict[str, str]: | |||
http_headers["Authorization"] = f"ApiKey {self.config.api_key}" | |||
return http_headers | |||
|
|||
@overload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly just experimenting with overloads etc. and got left in?
api_client: ClientCore, | ||
team_slug: str, | ||
dataset_id: Union[str, int], | ||
params: QueryString = QueryString({}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should probably refactor QueryString
to assume {}
at some point.
@@ -83,3 +85,6 @@ def __init__(self, value: Dict[str, str]) -> None: | |||
|
|||
def __str__(self) -> str: | |||
return "?" + "&".join(f"{k}={v}" for k, v in self.value.items()) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, nice addition actually. Gotta love a good dunder.
@@ -154,17 +162,16 @@ def __init__( | |||
self.meta_params: dict = meta_params or {} | |||
self.client = client | |||
self.filters = filters or [] | |||
self.results: Optional[List[T]] = None | |||
self._changed_since_last: bool = True | |||
self.results: dict[int, T] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, so you went with the int hashtable style approach. 👍🏻
self.client, filters=[*self.filters, filter], meta_params=self.meta_params | ||
) | ||
self.filters.append(filter) | ||
return self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this better, it's much clearer and nicer to read for devs, but is the functionality not slightly different? Calling __class__
will be calling the initialiser and returning a newly (m)allocated class, and return self
will return the current class as ref.
I assume, basically, that the difference doesn't matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is slightly different yeah, it is modification in place rather than re-allocation, but I think that's a much more sensible approach here as most people won't be using the add dunder functionality except us anyway as internal functions, and it the modification in place saves time and memory.
darwin/future/core/types/query.py
Outdated
self.results = list(self._collect()) | ||
return len(self.results) | ||
self.results = {**self.results, **self._collect()} | ||
return len(self.results.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confident you've done this for a reason, but with self.results
being of type Dict[int, T]
, len(self.results) == len(self.results.keys()
doesn't it?
I might be missing something here, or maybe it's a STFU to mypy or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nah you're right that's a simpler call and have pushed a fix
@@ -205,12 +212,12 @@ def __next__(self) -> T: | |||
|
|||
def __getitem__(self, index: int) -> T: | |||
if not self.results: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this now need to also take into account self._changed_since_last
or are we only using that for results we have never had, and assuming they haven't changed in DB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.collect() already accounts for it, which is a wrapper that runs the _collect function that has to be written per meta class but does all the _changed_since_last and other logic
def __init__( | ||
self, | ||
client: ClientCore, | ||
filters: List[QueryFilter] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use this style of typing union whilst still meeting our version needs? It was introduced in 3.10.
I'm happy if other versions will just ignore this though, because frankly it looks a significant amount nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can if we use the from future import annotations at the top, which I do anyway for some imports around classes that return themselves.
darwin/future/core/types/query.py
Outdated
def __len__(self) -> int: | ||
if not self.completed: | ||
self.collect_all() | ||
return len(self.results.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same Q as before
if "dataset_ids" in self.meta_params | ||
else self.meta_params["dataset_id"] | ||
) | ||
params: QueryString = reduce( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that python has itertools
, but I'd prefer a fluent interface like any sensible language like Rust
Anyway, good stuff!
Problem
item ids collection uses an older endpoint, need to swap it over to the newer list_item_ids endpoint, which then needs to filter into changes on the meta objects that implement these. This however necessitates the creation of a solution to pagination on both the Core/Meta levels.
Solution
Introduce some major changes to accomodate Pagination, including
darwin.future.data_objects.page.Page
which has defaults that we use on the APIPaginatedQuery[76]
will detect the required page to send and collect results if not storedlen(PaginatedQuery)
will collect_all and is something to note behaviour wise, this is a requirement to allowlist(PaginatedQuery)
to work, otherwise I would have made it Raise on an uncollected objectGeneric[MetaBase]
requirement and then just using UUID was a solution here, but causes lots of typing issues and would require a major change to how we structure the code, so was opted for a wrapper approachChangelog
New Item_ids behaviour and introduction of PaginatedQuery Objects inheritable for default pagination behaviour.