Explorer STAC API search issue: only returning max of 20 items #575

robbibt · 2024-02-16T01:09:09Z

A user on LinkedIn and @alexgleith have encountered a possible bug in our Explorer STAC search API (see link here).

If you do a super simple query of DEA's Sentinel-2 data from December 2023 to Feb 2024, you only get back data up to January 17, despite the data definitely existing:

import pystac_client

client = pystac_client.Client.open("https://explorer.sandbox.dea.ga.gov.au/stac")

# Search for items in the collection
collections = ["ga_s2am_ard_3", "ga_s2bm_ard_3"]
query = client.search(
    collections=collections,
    bbox=[146.04, -34.30, 146.05, -34.28],
    datetime="2023-12-01/2024-02-28",
)

# Search the STAC catalog for all items matching the query
[i.properties["datetime"] for i in query.get_items()]

It seems that by default, the query is only returning the first 20 items from the query. To get any extra data, the user has to manually provide a high limit, e.g.:

query = client.search(
    collections=collections,
    bbox=[146.04, -34.30, 146.05, -34.28],
    datetime="2023-12-01/2024-02-28",
    limit=1000,
)

This isn't typical behavior for STAC loading: normally pystac_client automatically follows "next" page links to give the user every dataset matching their query, so the user definitely isn't limited to a tiny number like 20.
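To illustrate what "following next links" means, here is a minimal sketch of the client-side loop, not pystac_client's actual implementation. The `fetch` callable, the `iter_all_items` and `next_href` helpers, and the `FAKE_PAGES` data are all hypothetical stand-ins so the example runs without a network connection. It also assumes GET-style "next" links that carry a plain href; real STAC APIs may instead return POST links with a request body.

```python
from typing import Callable, Iterator, Optional


def next_href(page: dict) -> Optional[str]:
    """Return the href of the page's rel="next" link, or None if absent."""
    for link in page.get("links", []):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def iter_all_items(fetch: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Yield every feature from every page by following "next" links."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        yield from page.get("features", [])
        # Stop when the server no longer advertises a further page
        url = next_href(page)


# Hypothetical two-page response standing in for a real STAC API
FAKE_PAGES = {
    "page1": {
        "features": [{"id": "a"}, {"id": "b"}],
        "links": [{"rel": "next", "href": "page2"}],
    },
    "page2": {"features": [{"id": "c"}], "links": []},
}

item_ids = [item["id"] for item in iter_all_items(FAKE_PAGES.__getitem__, "page1")]
```

If the first page comes back without a "next" link, a loop like this stops after that page, which would be consistent with seeing only 20 items.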

It looks to me that Explorer might be using the DEFAULT_PAGE_SIZE of 20 to define the absolute limit of datasets returned. This doesn't appear to follow the correct STAC API approach (see Slack conversation here and STAC API docs here). I can see this line, which seems like it might be the source of the issue - it seems to use DEFAULT_PAGE_SIZE if no limit is provided:

datacube-explorer/cubedash/_stac.py

Line 433 in 3cdcf98

limit = request_args.get("limit", default=DEFAULT_PAGE_SIZE, type=int)
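For comparison, here is a rough sketch of how a server could treat the default as a page size rather than a cap on total results. This is not Explorer's code: `build_page`, the `offset` parameter, and the `/search?limit=...&offset=...` link scheme are hypothetical, and a real API might use opaque paging tokens instead.

```python
from typing import Optional, Sequence

DEFAULT_PAGE_SIZE = 20  # items per page only, not a cap on total results


def build_page(matches: Sequence[str], offset: int = 0, limit: Optional[int] = None) -> dict:
    """Return one page of results, plus a "next" link when more remain."""
    page_size = DEFAULT_PAGE_SIZE if limit is None else limit
    features = list(matches[offset : offset + page_size])
    links = []
    next_offset = offset + page_size
    if next_offset < len(matches):
        # Hypothetical query-string scheme for the following page
        links.append(
            {"rel": "next", "href": f"/search?limit={page_size}&offset={next_offset}"}
        )
    return {"type": "FeatureCollection", "features": features, "links": links}


# 45 matching items: pages of 20, 20 and 5
matches = [f"item-{i}" for i in range(45)]
first = build_page(matches)
last = build_page(matches, offset=40)
```

With the "next" link present on every page except the last, a client that follows links would retrieve all 45 items without having to pass a large `limit`.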

As it is, I think the current functionality is confusing to our users - they will naturally expect to get back all items matching their query (at least up to some sensibly high limit, definitely not 20), and only getting back half the time series is pretty unexpected.


robbibt · 2024-02-16T01:25:26Z

For reference, doing a similar search on either Microsoft Planetary Computer's or Element 84's Earth Search STAC APIs successfully returns all relevant datasets with no restrictive limit:

import pystac_client

# Use either catalogue; swap the comment to try the other one
# catalogue = "https://planetarycomputer.microsoft.com/api/stac/v1"
catalogue = "https://earth-search.aws.element84.com/v1"

client = pystac_client.Client.open(catalogue)

# Search for items in the collection
collections = ["sentinel-2-l2a"]
query = client.search(
    collections=collections,
    bbox=[146.04, -34.30, 146.05, -34.28],
    datetime="2023-12-01/2024-02-28",
)

# Search the STAC catalog for all items matching the query
[i.properties["datetime"] for i in query.get_items()]

robbibt assigned Ariana-B Feb 16, 2024

Ariana-B linked a pull request Mar 5, 2024 that will close this issue

Fix pagination with pystac-client >= 0.7.4 #578

Merged

Ariana-B closed this as completed in #578 Mar 5, 2024
