Revise Pagination Logic #107

Open
wants to merge 1 commit into
base: master
from

Conversation

acannistra

Hi! Great library, very useful for us.

The Element84 STAC API /search endpoint in particular is returning malformed pagination URLs when used via POST. Here's an example of a URL present in the links object within the returned JSON:

https://earth-search.aws.element84.com/v0/search?datetime=2020-09-01%2F2020-10-14&intersects=%5Bobject%20Object%5D&collections[]=sentinel-s2-l2a-cogs&page=2&limit=500

This URL was returned after invoking search.Search.search(url=URL, collections=['sentinel-s2-l2a-cogs'], datetime="2020-09-01/2020-10-14", bbox=[-127.084,31.128,-106.699,49.8379]).items() from satsearch.
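Decoding the query string of the URL above makes the problem easier to see. This is only a quick inspection sketch using the standard library; the variable names are mine. The `intersects` parameter decodes to the literal text `[object Object]`, which is what a JavaScript object becomes when coerced to a string, so the geometry of the original query does not appear to survive in the link.

```python
from urllib.parse import urlsplit, parse_qs

# The "next" link exactly as it appeared in the /search response.
next_url = (
    "https://earth-search.aws.element84.com/v0/search"
    "?datetime=2020-09-01%2F2020-10-14"
    "&intersects=%5Bobject%20Object%5D"
    "&collections[]=sentinel-s2-l2a-cogs&page=2&limit=500"
)

# parse_qs percent-decodes each value and groups values by parameter name.
params = parse_qs(urlsplit(next_url).query)

for key, values in params.items():
    print(f"{key} = {values[0]}")

# The spatial filter is a stringified object, not a geometry.
print(params["intersects"][0] == "[object Object]")
```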

These malformed URLs fail to return the correct result set: the current logic simply retrieves the first page of results N times, where N = ceil({total result set} / {max page limit}). For example, I had a result set of 7159 images (from found()), but inspecting the results of items() revealed the same 500 IDs repeated 15 times (15 = ceil(7159/500)).
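For anyone who wants to check whether they are hitting the same symptom, here is a rough sketch of the arithmetic and of a duplicate check. The helper names are hypothetical and the ID list is synthetic, built to mimic what I saw (one page of 500 IDs returned for every request); it is not real output from the API.

```python
import math
from collections import Counter

def expected_requests(total_found, page_limit):
    """Number of page requests needed to cover the whole result set."""
    return math.ceil(total_found / page_limit)

def duplicate_report(item_ids):
    """Return (number of unique IDs, highest repeat count of any single ID)."""
    counts = Counter(item_ids)
    return len(counts), max(counts.values(), default=0)

n_requests = expected_requests(7159, 500)
print(n_requests)  # 15

# Synthetic stand-in for the buggy behaviour: the same first page of
# 500 IDs comes back for each of the 15 requests.
first_page = [f"item-{i}" for i in range(500)]
buggy_ids = first_page * n_requests

unique, max_repeats = duplicate_report(buggy_ids)
print(unique, max_repeats)  # 500 15
```

With real data, the same check would take the IDs of whatever items() returned; a healthy run should report each ID once.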

To remedy this, I propose ignoring the malformed pagination URLs returned from the API and instead including the page parameter in the body of the request, incrementing it at each iteration. This "auto-paginates" and ensures that the context of the query is preserved throughout the pagination process. Some experimentation with the API reveals that querying for pages outside the domain of the data (e.g. page 3 of a 1000-result query with a 500-item page limit) still returns a 200 response code, so to stop the pagination I simply check whether features in the /search response is empty.
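The idea can be sketched roughly as below. This is an illustration of the approach, not the exact code in this PR: `iter_features`, `fake_post`, and the `max_pages` safety cap are names I made up for the example, and I am assuming pages are numbered from 1 (the first "next" link pointed at page=2). The HTTP call is passed in as a function so the loop can be exercised without a network connection.

```python
def iter_features(post, body, limit=500, max_pages=10_000):
    """Yield features page by page, sending the page number in the POST body.

    `post` is any callable that takes a request body (dict) and returns the
    decoded JSON response (dict). Pagination stops at the first response
    whose "features" list is empty, because out-of-range pages still
    come back with a 200 status.
    """
    page = 1  # assumption: the API numbers pages starting at 1
    while page <= max_pages:  # safety cap so a misbehaving server cannot loop forever
        response = post({**body, "page": page, "limit": limit})
        features = response.get("features", [])
        if not features:
            break
        yield from features
        page += 1

# Stand-in for the real endpoint: 1200 fake items, sliced by page and limit.
_ITEMS = [{"id": f"item-{i}"} for i in range(1200)]

def fake_post(body):
    start = (body["page"] - 1) * body["limit"]
    return {"features": _ITEMS[start:start + body["limit"]]}

query = {"collections": ["sentinel-s2-l2a-cogs"], "datetime": "2020-09-01/2020-10-14"}
results = list(iter_features(fake_post, query))
print(len(results), len({f["id"] for f in results}))  # 1200 1200
```

Against a live endpoint, `post` could be something like `lambda b: requests.post(url, json=b).json()`, assuming the server accepts page and limit in the JSON body as described above. The cost of stopping on an empty page is one extra request at the end of every search.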

I can't find much documentation for stac-server, so I don't have much information about the inner workings of the API. However, this fix works for our use case (running queries with large result sets against the STAC API).

@pablotcarreira

Worked for me. Thx for the fix.

@gjoseph92 mentioned this pull request on Mar 8, 2021