[IO-1746] Introduction of Pagination queries and changes to item_ids endpoint #707

Nathanjp91 · 2023-11-02T16:52:56Z

Problem

Item ID collection uses an older endpoint and needs to be swapped over to the newer list_item_ids endpoint, a change that then has to filter through to the meta objects that implement it. This, however, requires a pagination solution at both the Core and Meta levels.

Solution

Introduces some major changes to accommodate pagination, including:

Core data object for keeping track of pages, darwin.future.data_objects.page.Page, which has the defaults we use on the API
PaginatedQuery extension, which sets up pagination and changes the relevant dunders to support it
- Paginated objects are lazy-loading and iterable, and they remember the current page, only collecting when needed
- Includes some new helper functions, such as collect_all, to force a full collection
- Indexing still works: PaginatedQuery[76] will work out which page to request and collect the results if they are not already stored
- len(PaginatedQuery) calls collect_all, which is worth noting behaviour-wise. This is required for list(PaginatedQuery) to work; otherwise I would have made it raise on an uncollected object
- Changes to the underlying Query object were required to allow mapping. These are reflected in the dunder functions but not in their behaviour, which remains the same. Results are now stored as a dictionary map of {index: result}, where the key is the item's offset
V7ID object is now a wrapper for UUID
- Required to interact with Query objects, despite being overkill for a list of UUIDs
- This approach allows for other interactions later, such as deleting objects via their IDs, plus any extensions we want to add
- Removing the Generic[MetaBase] requirement and just using UUID was one possible solution, but it causes lots of typing issues and would require a major change to how we structure the code, so a wrapper approach was chosen instead
Anything that previously used item_ids now uses ItemIDQuery, the first PaginatedQuery object
Helper functions added to QueryString objects for adding and combining parameters
Dogfooding changes that make objects easier to instantiate and test, creating default clients and other necessary objects when needed
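
A minimal, self-contained sketch of the pagination behaviour listed above, assuming a hypothetical fetch_page(offset, page_size) callable in place of the real API client and plain integer offsets in place of the Page object. It is not the darwin-py implementation (for instance, it records which pages have been fetched rather than a single current page), but it shows results keyed by offset, indexing that fetches only the page it needs, and len() forcing a full collection. list() asks __len__ for a size hint before iterating, so a __len__ that raised on an uncollected query would break list() too.

```python
from __future__ import annotations

from typing import Callable, Dict, Generic, Iterator, List, TypeVar

T = TypeVar("T")


class PaginatedQuery(Generic[T]):
    """Illustrative lazy paginated query; results are keyed by absolute offset."""

    def __init__(self, fetch_page: Callable[[int, int], List[T]], page_size: int = 500) -> None:
        self._fetch_page = fetch_page  # hypothetical: (offset, page_size) -> items
        self.page_size = page_size  # 500 is an arbitrary illustrative default
        self.results: Dict[int, T] = {}
        self._page_counts: Dict[int, int] = {}  # page offset -> items returned
        self.completed = False

    def _fetch(self, offset: int) -> None:
        items = self._fetch_page(offset, self.page_size)
        self._page_counts[offset] = len(items)
        for i, item in enumerate(items):
            self.results[offset + i] = item

    def collect_all(self) -> None:
        # Walk pages in order, skipping any already fetched by indexing
        offset = 0
        while True:
            if offset not in self._page_counts:
                self._fetch(offset)
            if self._page_counts[offset] < self.page_size:
                break  # short (or empty) page: every earlier page was full
            offset += self.page_size
        self.completed = True

    def __getitem__(self, index: int) -> T:
        if index < 0:
            index += len(self)  # negative indices need the total count
            if index < 0:
                raise IndexError(index)
        if index not in self.results:
            offset = (index // self.page_size) * self.page_size
            if offset not in self._page_counts:
                self._fetch(offset)  # fetch only the page containing `index`
        if index not in self.results:
            raise IndexError(index)
        return self.results[index]

    def __len__(self) -> int:
        if not self.completed:
            self.collect_all()
        return len(self.results)

    def __iter__(self) -> Iterator[T]:
        index = 0
        while True:
            try:
                item = self[index]  # lazily pulls in pages as iteration reaches them
            except IndexError:
                return
            yield item
            index += 1
```

With 120 items and a page size of 50, q[76] triggers a single request for the page at offset 50; a later len(q) fetches only the missing pages at offsets 0 and 100.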

Changelog

New item_ids behaviour, and introduction of PaginatedQuery objects that can be inherited from to get default pagination behaviour.

linear · 2023-11-02T16:52:59Z

IO-1746 Change get item ids to use updated endpoint

get_item_ids currently uses an old endpoint. This should be updated.

Nathanjp91 · 2023-11-02T17:49:58Z

darwin/future/core/client.py

@@ -166,21 +170,6 @@ def headers(self) -> Dict[str, str]:
            http_headers["Authorization"] = f"ApiKey {self.config.api_key}"
        return http_headers

-    @overload


Totally unsure why I even had this overload; it didn't matter for the typing.

Possibly just experimenting with overloads etc. and got left in?

Nathanjp91 · 2023-11-02T17:51:56Z

darwin/future/core/types/common.py

@@ -83,3 +85,6 @@ def __init__(self, value: Dict[str, str]) -> None:

    def __str__(self) -> str:
        return "?" + "&".join(f"{k}={v}" for k, v in self.value.items())
+


Useful for combining parameters in the core functions when required for pagination, which I treat separately.

Yeah, nice addition actually. Gotta love a good dunder.
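
A minimal sketch of how a QueryString + QueryString dunder can merge parameters. The class is a simplified stand-in that reuses the __str__ shown in the diff; returning a new object, letting the right-hand operand win on a key collision, and the parameter names in the example are all illustrative assumptions rather than darwin-py's confirmed behaviour.

```python
from __future__ import annotations

from typing import Dict


class QueryString:
    """Simplified stand-in; __str__ matches the one in the diff above."""

    def __init__(self, value: Dict[str, str]) -> None:
        self.value = value

    def __add__(self, other: QueryString) -> QueryString:
        # Build a new object; on a key collision the right-hand value wins
        return QueryString({**self.value, **other.value})

    def __str__(self) -> str:
        return "?" + "&".join(f"{k}={v}" for k, v in self.value.items())
```

For example, str(QueryString({"offset": "0"}) + QueryString({"size": "500"})) renders as ?offset=0&size=500, and neither operand is modified.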

Nathanjp91 · 2023-11-02T17:55:29Z

darwin/future/core/types/query.py

@@ -104,6 +106,12 @@ def _from_kwarg(cls, key: str, value: str) -> QueryFilter:
            modifier = None
        return QueryFilter(name=key, param=value, modifier=modifier)

+    def to_dict(self, ignore_modifier: bool = True) -> Dict[str, str]:


Mostly so they can be swapped in and out of QueryStrings for paginated endpoints, so that the .where() syntax still works; for now it ignores modifiers. I don't think this is the most elegant solution, though, as eventually we'll probably want to do both server-side query filtering and client-side filtering once results are collected.
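
A sketch of a to_dict that drops the modifier by default, plus a hypothetical filters_to_params helper that flattens .where() filters into a single params dict. The modifier is modelled as a plain string for brevity, and raising when a modifier cannot be ignored is an assumption for illustration; silently dropping a modifier changes the filter's meaning, so surfacing it somewhere seems worth considering.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QueryFilter:
    """Simplified, hypothetical stand-in for the real QueryFilter."""

    name: str
    param: str
    modifier: Optional[str] = None  # plain str here for brevity

    def to_dict(self, ignore_modifier: bool = True) -> Dict[str, str]:
        if self.modifier is not None and not ignore_modifier:
            # Assumption: modifiers cannot be expressed as query params yet
            raise NotImplementedError("modifiers cannot be sent as query params")
        return {self.name: self.param}


def filters_to_params(filters: List[QueryFilter]) -> Dict[str, str]:
    """Flatten filters into one params dict; later filters win on collision."""
    params: Dict[str, str] = {}
    for f in filters:
        params.update(f.to_dict())
    return params
```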

Nathanjp91 · 2023-11-02T17:56:34Z

darwin/future/core/types/query.py

@@ -154,17 +162,16 @@ def __init__(
        self.meta_params: dict = meta_params or {}
        self.client = client
        self.filters = filters or []
-        self.results: Optional[List[T]] = None
-        self._changed_since_last: bool = True
+        self.results: dict[int, T] = {}


This is the core change that enables pagination; all existing functions have been updated so that their outputs are preserved.

Ah ok, so you went with the int hashtable style approach. 👍🏻

Nathanjp91 · 2023-11-02T17:59:34Z

darwin/future/meta/queries/item_id.py

+            if "dataset_ids" in self.meta_params
+            else self.meta_params["dataset_id"]
+        )
+        params: QueryString = reduce(


This is how I use the aforementioned QueryString + QueryString syntax: reducing them together with the page object and all the filters.

I like that python has itertools, but I'd prefer a fluent interface like any sensible language ~~like Rust~~

Anyway, good stuff!
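
The reduce pattern can be sketched as follows, assuming a minimal QueryString with a non-mutating __add__ (a hypothetical stand-in) and plain QueryStrings for the page and each filter; build_params is an illustrative name. Passing the page parameters as reduce's initial value means an empty filter list simply returns them. For comparison, the built-in sum accepts any start object that supports +, which reads a little closer to a fluent chain.

```python
from __future__ import annotations

import operator
from functools import reduce
from typing import Dict, List


class QueryString:
    """Minimal hypothetical stand-in with a non-mutating +."""

    def __init__(self, value: Dict[str, str]) -> None:
        self.value = value

    def __add__(self, other: QueryString) -> QueryString:
        return QueryString({**self.value, **other.value})

    def __str__(self) -> str:
        return "?" + "&".join(f"{k}={v}" for k, v in self.value.items())


def build_params(page: QueryString, filters: List[QueryString]) -> QueryString:
    # Equivalent to page + filters[0] + filters[1] + ...
    return reduce(operator.add, filters, page)


def build_params_sum(page: QueryString, filters: List[QueryString]) -> QueryString:
    # Same result via sum(), using the page params as the start value
    return sum(filters, page)
```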

owencjones

Some comments to address before merge, but good stuff

owencjones · 2023-11-06T09:46:49Z

darwin/future/core/client.py

@@ -166,21 +170,6 @@ def headers(self) -> Dict[str, str]:
            http_headers["Authorization"] = f"ApiKey {self.config.api_key}"
        return http_headers

-    @overload


Possibly just experimenting with overloads etc. and got left in?

owencjones · 2023-11-06T09:48:16Z

darwin/future/core/items/get.py

+    api_client: ClientCore,
+    team_slug: str,
+    dataset_id: Union[str, int],
+    params: QueryString = QueryString({}),


I should probably refactor QueryString to assume {} at some point.
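
If QueryString is refactored to assume {}, the usual Python idiom is a None default rather than a literal {} default, because default values are evaluated once, when the function is defined, and then shared between calls. The same applies to a default like params: QueryString = QueryString({}): that single instance is reused by every call that omits params, which is harmless only as long as nothing mutates it. A sketch with hypothetical class names contrasting the two:

```python
from typing import Dict, Optional


class SharedDefaultQS:
    # Pitfall: this {} is created once and shared by every instance
    def __init__(self, value: Dict[str, str] = {}) -> None:
        self.value = value


class QueryString:
    """Hypothetical refactor: omitting value gives each instance its own {}."""

    def __init__(self, value: Optional[Dict[str, str]] = None) -> None:
        self.value: Dict[str, str] = {} if value is None else value

    def __str__(self) -> str:
        return "?" + "&".join(f"{k}={v}" for k, v in self.value.items())
```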

owencjones · 2023-11-06T09:50:19Z

darwin/future/core/types/common.py

@@ -83,3 +85,6 @@ def __init__(self, value: Dict[str, str]) -> None:

    def __str__(self) -> str:
        return "?" + "&".join(f"{k}={v}" for k, v in self.value.items())
+


Yeah, nice addition actually. Gotta love a good dunder.

owencjones · 2023-11-06T09:57:55Z

darwin/future/core/types/query.py

@@ -154,17 +162,16 @@ def __init__(
        self.meta_params: dict = meta_params or {}
        self.client = client
        self.filters = filters or []
-        self.results: Optional[List[T]] = None
-        self._changed_since_last: bool = True
+        self.results: dict[int, T] = {}


Ah ok, so you went with the int hashtable style approach. 👍🏻

owencjones · 2023-11-06T10:06:51Z

darwin/future/core/types/query.py

-            self.client, filters=[*self.filters, filter], meta_params=self.meta_params
-        )
+        self.filters.append(filter)
+        return self


I like this better; it's much clearer and nicer for devs to read, but isn't the functionality slightly different? Calling __class__ will call the initialiser and return a newly (m)allocated instance, whereas return self returns the current instance by reference.

I assume, basically, that the difference doesn't matter?

It is slightly different, yeah: it modifies in place rather than re-allocating. But I think that's a much more sensible approach here, as most people won't be using the add dunder functionality anyway (apart from us, in internal functions), and modifying in place saves time and memory.
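
The difference discussed above can be shown with two toy classes (hypothetical names, with filters simplified to strings): one rebuilds the query via self.__class__, the other appends and returns self.

```python
from __future__ import annotations

from typing import List


class CopyingQuery:
    """Old style (hypothetical): .where() builds a new query object each call."""

    def __init__(self, filters: List[str] | None = None) -> None:
        self.filters = filters or []

    def where(self, f: str) -> CopyingQuery:
        return self.__class__(filters=[*self.filters, f])


class InPlaceQuery:
    """New style (hypothetical): .where() appends to self and returns it."""

    def __init__(self, filters: List[str] | None = None) -> None:
        self.filters = filters or []

    def where(self, f: str) -> InPlaceQuery:
        self.filters.append(f)
        return self
```

Both support chaining such as q.where("a").where("b"). The visible difference is aliasing: with the in-place version, a query kept around as a reusable base accumulates every filter added through any reference to it.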

owencjones · 2023-11-06T10:11:48Z

darwin/future/core/types/query.py

-            self.results = list(self._collect())
-        return len(self.results)
+            self.results = {**self.results, **self._collect()}
+        return len(self.results.keys())


I'm confident you've done this for a reason, but with self.results being of type Dict[int, T], len(self.results) == len(self.results.keys()), doesn't it?

I might be missing something here, or maybe it's a STFU to mypy or similar?

Nah, you're right, that's a simpler call; I've pushed a fix.

owencjones · 2023-11-06T10:28:23Z

darwin/future/core/types/query.py

@@ -205,12 +212,12 @@ def __next__(self) -> T:

    def __getitem__(self, index: int) -> T:
        if not self.results:


Does this now need to also take into account self._changed_since_last or are we only using that for results we have never had, and assuming they haven't changed in DB?

self.collect() already accounts for it: it's a wrapper that runs the _collect function (which has to be written per meta class) and handles all the _changed_since_last and other logic.
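
The wrapper arrangement described above is a template-method pattern. A sketch under stated assumptions: the Query base, mark_changed, the force flag and the NumberQuery subclass are hypothetical, and the caching rule shown is only one plausible reading of "the _changed_since_last and other logic".

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Dict, Generic, TypeVar

T = TypeVar("T")


class Query(ABC, Generic[T]):
    """Hypothetical base: shared bookkeeping in collect(), fetching in _collect()."""

    def __init__(self) -> None:
        self.results: Dict[int, T] = {}
        self._changed_since_last = True

    @abstractmethod
    def _collect(self) -> Dict[int, T]:
        """Implemented per meta class: fetch results as {index: result}."""

    def mark_changed(self) -> None:
        # e.g. called when a filter is added
        self._changed_since_last = True

    def collect(self, force: bool = False) -> Dict[int, T]:
        # Only hit the subclass fetch when something changed (or when forced)
        if force or self._changed_since_last:
            self.results = self._collect()
            self._changed_since_last = False
        return self.results


class NumberQuery(Query[int]):
    """Toy subclass that counts how often it actually fetches."""

    def __init__(self) -> None:
        super().__init__()
        self.fetches = 0

    def _collect(self) -> Dict[int, int]:
        self.fetches += 1
        return {0: 10, 1: 20}
```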

owencjones · 2023-11-06T10:36:50Z

darwin/future/core/types/query.py

+    def __init__(
+        self,
+        client: ClientCore,
+        filters: List[QueryFilter] | None = None,


Can we use this style of typing union whilst still meeting our version needs? It was introduced in 3.10.

I'm happy if other versions will just ignore this though, because frankly it looks a significant amount nicer.

I believe we can if we use from __future__ import annotations at the top, which I do anyway for some imports around classes that return themselves.
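
A small check of that claim: with from __future__ import annotations (PEP 563, available since Python 3.7), annotations are stored as strings and are not evaluated when the function is defined, so the X | None form is accepted before 3.10. QueryFilter and make_query are placeholders for the example. The caveat is that anything evaluating annotations at runtime, such as typing.get_type_hints, would still fail on 3.9 and earlier.

```python
from __future__ import annotations

from typing import List


class QueryFilter:
    """Placeholder type for the example."""


def make_query(filters: List[QueryFilter] | None = None) -> int:
    # The annotation above is kept as the string "List[QueryFilter] | None"
    # and never evaluated here, so the | syntax does not need Python 3.10
    return len(filters or [])
```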

owencjones · 2023-11-06T10:56:12Z

darwin/future/core/types/query.py

+    def __len__(self) -> int:
+        if not self.completed:
+            self.collect_all()
+        return len(self.results.keys())


Same Q as before

owencjones · 2023-11-06T11:15:24Z

darwin/future/meta/queries/item_id.py

+            if "dataset_ids" in self.meta_params
+            else self.meta_params["dataset_id"]
+        )
+        params: QueryString = reduce(


I like that python has itertools, but I'd prefer a fluent interface like any sensible language ~~like Rust~~

Anyway, good stuff!

Nathanjp91 and others added 11 commits October 31, 2023 15:54

basic pagination

df0bd12

changes for meta pagination

633f040

paginated id query

6236927

WIP changes for pagination

d8d614b

pagination object [untested]

1ea5a9f

pagination objects completed

f762cca

test fixes

feb3c46

sensible defaults + test changes

aafae1f

base pagination collects all test

680ce1d

meta pagination tests

386838c

tweaks to useage

a38cf7d

Nathan Perkins added 3 commits November 2, 2023 16:56

removal of no longer needed exception

d506e53

linting changes

c5acd04

reverting 'sensible' defaults

0974fb1

Nathanjp91 commented Nov 2, 2023


owencjones approved these changes Nov 6, 2023


len changes

5707c97

Nathanjp91 merged commit d102e24 into master Nov 6, 2023
13 checks passed
