Explorer STAC API search issue: only returning max of 20 items #575

Closed
robbibt opened this issue Feb 16, 2024 · 1 comment · Fixed by #578

robbibt commented Feb 16, 2024

A user on LinkedIn and @alexgleith have encountered a possible bug in our Explorer STAC search API (see link here).

If you do a super simple query of DEA's Sentinel-2 data from December 2023 to Feb 2024, you only get back data up to January 17, despite the data definitely existing:

import pystac_client, odc.stac

client = pystac_client.Client.open("https://explorer.sandbox.dea.ga.gov.au/stac")

# Search for items in the collection
collections = ["ga_s2am_ard_3", "ga_s2bm_ard_3"]
query = client.search(
    collections=collections,
    bbox=[146.04, -34.30, 146.05, -34.28],
    datetime="2023-12-01/2024-02-28",
)

# Search the STAC catalog for all items matching the query
[i.properties["datetime"] for i in query.get_items()]

It seems that, by default, the search returns only the first 20 matching items. To get any more data, the user has to manually provide a high limit, e.g.:

query = client.search(
    collections=collections,
    bbox=[146.04, -34.30, 146.05, -34.28],
    datetime="2023-12-01/2024-02-28",
    limit=1000,
)

This isn't typical behavior for STAC loading: normally, pystac_client automatically follows "next" page links to give the user all datasets matching their query, so the user definitely isn't limited to a tiny number like 20.
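To make the expected paging behavior concrete, here is a minimal, self-contained sketch that uses canned in-memory pages instead of a real endpoint. The `PAGES` dictionary, the `/search?page=N` hrefs and `iter_items` are illustrative assumptions, not Explorer's or pystac_client's actual code.

```python
from typing import Dict, Iterator, Optional

# Hypothetical canned responses standing in for a STAC API: 45 items split
# into pages of 20, each page pointing at the following one via a "next" link.
PAGES: Dict[str, dict] = {}
_ids = [f"item-{n}" for n in range(45)]
for page_no, start in enumerate(range(0, len(_ids), 20), start=1):
    links = []
    if start + 20 < len(_ids):
        links.append({"rel": "next", "href": f"/search?page={page_no + 1}"})
    PAGES[f"/search?page={page_no}"] = {
        "features": [{"id": i} for i in _ids[start : start + 20]],
        "links": links,
    }


def iter_items(href: Optional[str] = "/search?page=1") -> Iterator[dict]:
    """Yield every item by following "next" links until none is left."""
    while href is not None:
        response = PAGES[href]  # a real client would issue an HTTP request here
        yield from response["features"]
        href = next(
            (link["href"] for link in response["links"] if link["rel"] == "next"),
            None,
        )


first_page_only = PAGES["/search?page=1"]["features"]
everything = list(iter_items())
print(len(first_page_only), len(everything))  # 20 45
```

If the server never emits a "next" link, a client written this way stops after the first page, which would match the truncated result described in this issue.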

It looks to me that Explorer might be using the DEFAULT_PAGE_SIZE of 20 to define the absolute limit of datasets returned. This doesn't appear to follow the correct STAC API approach (see Slack conversation here and STAC API docs here). I can see this line, which seems like it might be the source of the issue: it seems to use DEFAULT_PAGE_SIZE if no limit is provided:

limit = request_args.get("limit", default=DEFAULT_PAGE_SIZE, type=int)

As it is, I think the current functionality is confusing to our users - they will naturally expect to get back all items matching their query (at least up to some sensibly high limit, definitely not 20), and only getting back half the time series is pretty unexpected.
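One possible reading of the approach argued for above is that `limit` sizes a single page rather than capping the whole search. The sketch below assumes that interpretation. `search_page`, `MAX_PAGE_SIZE` and the offset-based `next_offset` are hypothetical names for illustration; they are not taken from Explorer's code or from the fix in #578.

```python
from typing import Optional

DEFAULT_PAGE_SIZE = 20  # name from the issue; 20 is the value reported there
MAX_PAGE_SIZE = 1000    # hypothetical upper bound on a single page


def search_page(matches: list, limit: Optional[int] = None, offset: int = 0) -> dict:
    """Serve one page of `matches`; `limit` sizes the page, not the search."""
    page_size = DEFAULT_PAGE_SIZE if limit is None else limit
    # Clamp the requested page size to a sane range.
    page_size = max(1, min(page_size, MAX_PAGE_SIZE))
    page = matches[offset : offset + page_size]
    has_more = offset + page_size < len(matches)
    return {
        "features": page,
        # A real API would return a "next" link here; an offset keeps it simple.
        "next_offset": offset + page_size if has_more else None,
    }


matches = list(range(45))
first = search_page(matches)
last = search_page(matches, offset=40)
print(len(first["features"]), first["next_offset"])  # 20 20
print(len(last["features"]), last["next_offset"])    # 5 None
```

With this shape, omitting `limit` still yields pages of 20, but the remaining 25 matches stay reachable through the follow-up pages.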

robbibt commented Feb 16, 2024

For reference, doing a similar search on either Element 84's Earth Search or Microsoft Planetary Computer's STAC APIs successfully returns all relevant datasets with no restrictive limit:

import pystac_client, odc.stac

# Either catalogue can be used; swap the comment to test the other one
# catalogue = "https://planetarycomputer.microsoft.com/api/stac/v1"
catalogue = "https://earth-search.aws.element84.com/v1"

client = pystac_client.Client.open(catalogue)

# Search for items in the collection
collections = ["sentinel-2-l2a"]
query = client.search(
    collections=collections,
    bbox=[146.04, -34.30, 146.05, -34.28],
    datetime="2023-12-01/2024-02-28",
)

# Search the STAC catalog for all items matching the query
[i.properties["datetime"] for i in query.get_items()]

Ariana-B linked a pull request on Mar 5, 2024 that will close this issue