A user on LinkedIn and @alexgleith have encountered a possible bug in our Explorer STAC search API (see link here).
If you do a super simple query of DEA's Sentinel-2 data from December 2023 to Feb 2024, you only get back data up to January 17, despite the data definitely existing:
import pystac_client

client = pystac_client.Client.open("https://explorer.sandbox.dea.ga.gov.au/stac")

# Search for items in the collections
collections = ["ga_s2am_ard_3", "ga_s2bm_ard_3"]
query = client.search(
    collections=collections,
    bbox=[146.04, -34.30, 146.05, -34.28],
    datetime="2023-12-01/2024-02-28",
)

# List the datetime of every item returned for the query
# (items() is the current name for the older get_items())
[i.properties["datetime"] for i in query.items()]
It seems that by default, the query is only returning the first 20 matching items. To get any more data, the user has to manually provide a high limit in the search.
This isn't typical behavior for STAC loading: normally, pystac_client will automatically follow "next" page links to provide the user with all datasets matching their query - the user definitely isn't limited to a tiny amount like 20.
It looks to me that Explorer might be using the DEFAULT_PAGE_SIZE of 20 to define the absolute limit of datasets returned. This doesn't appear to follow the correct STAC API approach (see Slack conversation here and STAC API docs here). I can see a line which seems like it might be the source of the issue (datacube-explorer/cubedash/_stac.py, line 433 in 3cdcf98) - it seems to use DEFAULT_PAGE_SIZE if no limit is provided.
As it is, I think the current functionality is confusing to our users - they will naturally expect to get back all items matching their query (at least up to some sensibly high limit, definitely not 20), and only getting back half the time series is pretty unexpected.
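To make the distinction concrete, here is a minimal offline sketch of the difference between treating 20 as a page size and treating it as an absolute cap. It is a toy model, not Explorer's code or pystac_client's implementation: `fetch_page`, `iterate_items` and the integer "next" offset are hypothetical names invented for illustration. Only the name and value of `DEFAULT_PAGE_SIZE` come from the issue above.

```python
DEFAULT_PAGE_SIZE = 20  # name and value taken from the issue; the rest is hypothetical


def fetch_page(all_items, limit=None, offset=0):
    """Toy stand-in for a search endpoint: return one page plus a 'next' pointer."""
    page_size = limit if limit is not None else DEFAULT_PAGE_SIZE
    end = offset + page_size
    # Advertise a next page only while more matching items remain
    next_offset = end if end < len(all_items) else None
    return {"features": all_items[offset:end], "next": next_offset}


def iterate_items(all_items, limit=None):
    """Toy client: keep following 'next' until the server stops providing one."""
    offset = 0
    while offset is not None:
        response = fetch_page(all_items, limit=limit, offset=offset)
        yield from response["features"]
        offset = response["next"]


# Pretend 45 items match the query
matching = [f"item-{i}" for i in range(45)]

# Page size used as a hard cap: only the first page ever comes back
first_page_only = fetch_page(matching)["features"]

# Page size used as a page size: the client walks every page
everything = list(iterate_items(matching))

print(len(first_page_only), len(everything))
```

In this model, the behaviour I'd expect corresponds to `iterate_items`, where the limit only controls how many items arrive per request, while what users currently see looks more like `first_page_only`.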
For reference, doing a similar search on either Element 84's Earth Search or Microsoft Planetary Computer's STAC APIs successfully returns all relevant datasets with no restrictive limit:
import pystac_client

# Use one catalogue or the other; swap the comment to try Planetary Computer
# catalogue = "https://planetarycomputer.microsoft.com/api/stac/v1"
catalogue = "https://earth-search.aws.element84.com/v1"
client = pystac_client.Client.open(catalogue)

# Search for items in the collection
collections = ["sentinel-2-l2a"]
query = client.search(
    collections=collections,
    bbox=[146.04, -34.30, 146.05, -34.28],
    datetime="2023-12-01/2024-02-28",
)

# List the datetime of every item returned for the query
# (items() is the current name for the older get_items())
[i.properties["datetime"] for i in query.items()]