Collection search #735

Merged
merged 25 commits into from
Oct 15, 2024
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -620,3 +620,4 @@ $RECYCLE.BIN/
# Windows shortcuts
*.lnk

uv.lock
11 changes: 11 additions & 0 deletions docs/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,16 @@ endpoint, if supported.
:members:
:undoc-members:

Collection Search
-----------------

The `CollectionSearch` class represents a search of collections in a STAC API.

.. autoclass:: pystac_client.CollectionSearch
:members:
:undoc-members:
:member-order: bysource

Item Search
-----------

Expand All @@ -39,6 +49,7 @@ The `ItemSearch` class represents a search of a STAC API.
:undoc-members:
:member-order: bysource


STAC API IO
-----------

Expand Down
54 changes: 52 additions & 2 deletions docs/quickstart.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,8 @@ Python library.
CLI
~~~

Use the CLI to quickly make searches and output or save the results.
Use the CLI to quickly make item- or collection-level searches and
output or save the results.

The ``--matched`` switch performs a search with limit=1 so does not get
any Items, but gets the total number of matches which will be output to
Expand All @@ -18,6 +19,15 @@ the screen (if supported by the STAC API).
$ stac-client search https://earth-search.aws.element84.com/v1 -c sentinel-2-l2a --bbox -72.5 40.5 -72 41 --matched
3141 items matched

The ``--matched`` flag can also be used for collection search to get
the total number of collections that match your search terms.


.. code-block:: console

$ stac-client collections https://emc.spacebel.be --q sentinel-2 --matched
76 collections matched

If the same URL is to be used over and over, define an environment
variable to be used in the CLI call:

Expand Down Expand Up @@ -87,6 +97,26 @@ than once to use additional operators.
$ stac-client search ${STAC_API_URL} -c sentinel-2-l2a --bbox -72.5 40.5 -72 41 --datetime 2020-01-01/2020-01-31 --query "eo:cloud_cover<10" "eo:cloud_cover>5" --matched
4 items matched


Collection searches can also combine multiple filters. This example
searches for collections that include the term ``"biomass"`` and have
a spatial extent that intersects Scandinavia.

.. code-block:: console

$ stac-client collections https://emc.spacebel.be --q biomass --bbox 0.09 54.72 33.31 71.36 --matched
43 collections matched

Since most STAC APIs have not yet implemented the `collection search
extension <https://github.com/stac-api-extensions/collection-search>`_,
``pystac-client`` will perform a limited client-side
filter on the full list of collections using only the ``bbox``,
``datetime``, and ``q`` (free-text search) parameters.
In the case that the STAC API does not support collection search, a
warning will be displayed to inform you that the filter is being
applied client-side.
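
As a rough illustration of what a client-side filter on ``bbox`` and ``q`` could look like, here is a minimal sketch over plain collection dictionaries. It is not ``pystac-client``'s actual implementation: the helper names (``bboxes_intersect``, ``matches_text``, ``filter_collections``), the choice of text fields, and the sample data are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Sequence


def bboxes_intersect(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def matches_text(collection: Dict[str, Any], term: str) -> bool:
    """Case-insensitive substring match on a few text fields (an assumption)."""
    haystack = " ".join(
        [
            collection.get("id", ""),
            collection.get("title") or "",
            collection.get("description") or "",
            " ".join(collection.get("keywords") or []),
        ]
    ).lower()
    return term.lower() in haystack


def filter_collections(
    collections: List[Dict[str, Any]],
    q: Optional[str] = None,
    bbox: Optional[Sequence[float]] = None,
) -> List[Dict[str, Any]]:
    """Keep only the collections that satisfy every filter that was given."""
    kept = []
    for collection in collections:
        if q is not None and not matches_text(collection, q):
            continue
        if bbox is not None:
            extents = collection.get("extent", {}).get("spatial", {}).get("bbox", [])
            if not any(bboxes_intersect(bbox, extent) for extent in extents):
                continue
        kept.append(collection)
    return kept


# Hypothetical sample data, not taken from any real STAC API.
sample = [
    {
        "id": "forest-biomass",
        "description": "Above-ground biomass",
        "extent": {"spatial": {"bbox": [[5.0, 55.0, 30.0, 70.0]]}},
    },
    {
        "id": "tropical-biomass",
        "description": "Biomass in the tropics",
        "extent": {"spatial": {"bbox": [[-80.0, -20.0, -40.0, 10.0]]}},
    },
    {
        "id": "nordic-dem",
        "description": "Elevation model",
        "extent": {"spatial": {"bbox": [[4.0, 54.0, 32.0, 72.0]]}},
    },
]

filtered = filter_collections(sample, q="biomass", bbox=[0.09, 54.72, 33.31, 71.36])
print([collection["id"] for collection in filtered])
```

Only the first sample collection passes both filters: the second mentions biomass but lies outside the bounding box, and the third intersects the box but does not match the search term.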


Python
~~~~~~

Expand All @@ -99,7 +129,7 @@ specific STAC API (use the root URL):

client = Client.open("https://earth-search.aws.element84.com/v1")

Create a search:
Create an item-level search:

.. code-block:: python

Expand All @@ -125,3 +155,23 @@ The ``ItemCollection`` can then be saved as a GeoJSON FeatureCollection.

item_collection = search.item_collection()
item_collection.save_object('my_itemcollection.json')


Create a collection-level search:

.. code-block:: python

collection_search = client.collection_search(
q='"sentinel-2" OR "sentinel-1"',
)
print(f"{collection_search.matched()} collections found")


The ``collections()`` iterator method can be used to iterate through all
resulting collections.

.. code-block:: python

for collection in collection_search.collections():
    print(collection.id)

50 changes: 47 additions & 3 deletions docs/tutorials/pystac-client-introduction.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,9 @@
"cell_type": "code",
"execution_count": null,
"id": "98942e75",
"metadata": {},
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# STAC API root URL\n",
Expand Down Expand Up @@ -74,6 +76,48 @@
" print(collection)"
]
},
{
"cell_type": "markdown",
"id": "ebab2724-cab3-4fba-b25b-fdfb4e537014",
"metadata": {},
"source": [
"# Collection Search\n",
"\n",
"Sometimes, it can be challenging to identify which collection you want to work with. The `collection_search` method allows you to discover collections by applying search filters that will help you find the specific collection(s) you need. Since many STAC APIs have not implemented the collection search extension, `pystac-client` will perform a limited client-side filter if the API does not conform to the collection search spec."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a23a53ec-5b5f-421d-9f0e-01dbde8c3697",
"metadata": {},
"outputs": [],
"source": [
"collection_search = cat.collection_search(\n",
" q=\"ASTER\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "90b3d014-9c8f-4c5b-a94e-bfb7f17380ad",
"metadata": {},
"source": [
"The `collections` method lets you iterate through the results of the search so you can inspect the details of matching collections."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "006f13fd-5e58-4f3f-bd5a-707cd830caa1",
"metadata": {},
"outputs": [],
"source": [
"for result in collection_search.collections():\n",
"    print(result.id, f\"{result.description}\", sep=\"\\n\")\n",
"    print(\"\\n\")"
]
},
{
"cell_type": "code",
"execution_count": null,
Expand Down Expand Up @@ -233,7 +277,7 @@
"hash": "6b6313dbab648ff537330b996f33bf845c0da10ea77ae70864d6ca8e2699c7ea"
},
"kernelspec": {
"display_name": "Python 3.9.11 ('.venv': venv)",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
Expand All @@ -247,7 +291,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.11"
"version": "3.12.3"
}
},
"nbformat": 4,
Expand Down
114 changes: 106 additions & 8 deletions docs/usage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -229,10 +229,10 @@ creating your :class:`Client<pystac_client.Client>`.
CollectionClient
++++++++++++++++

STAC APIs may optionally implement a ``/collections`` endpoint as describe in the
STAC APIs may optionally implement a ``/collections`` endpoint as described in the
`STAC API - Collections spec
<https://github.com/radiantearth/stac-api-spec/tree/master/collections>`__. This endpoint
allows clients to search or inspect items within a particular collection.
<https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/ogcapi-features#stac-api---collections>`__.
This endpoint allows clients to search or inspect items within a particular collection.

.. code-block:: python

Expand All @@ -245,7 +245,7 @@ allows clients to search or inspect items within a particular collection.
PySTAC will get items by iterating through all children until it gets to an ``item`` link.
PySTAC client will use the API endpoint instead: `/collections/<collection_id>/items`
(as long as `STAC API - Item Search spec
<https://github.com/radiantearth/stac-api-spec/tree/master/item-search>`__ is supported).
<https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/item-search>`__ is supported).

.. code-block:: python

Expand All @@ -254,15 +254,113 @@ PySTAC client will use the API endpoint instead: `/collections/<collection_id>/i
Note that calling list on this iterator will take a really long time since it will be retrieving
every item for the whole ``"sentinel-2-l2a"`` collection.

CollectionSearch
++++++++++++++++

STAC API services may optionally implement a ``/collections`` endpoint as described in the
`STAC API - Collections spec
<https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/ogcapi-features#stac-api---collections>`__.
The ``/collections`` endpoint can be extended with the
`STAC API - Collection Search Extension <https://github.com/stac-api-extensions/collection-search>`__
which adds the capability to apply filter parameters to the collection-level metadata.
See the `Query Parameters and Fields
<https://github.com/stac-api-extensions/collection-search?tab=readme-ov-file#query-parameters-and-fields>`__
from that spec for details on the meaning of each parameter.

The :meth:`pystac_client.Client.collection_search` method provides an interface for making
requests to a service's "collections" endpoint. This method returns a
:class:`pystac_client.CollectionSearch` instance.

.. code-block:: python

>>> from pystac_client import Client
>>> catalog = Client.open('https://planetarycomputer.microsoft.com/api/stac/v1')
>>> results = catalog.collection_search(
... q="biomass",
... datetime="2022/.."
... )

Instances of :class:`~pystac_client.CollectionSearch` have a handful of methods for
getting matching collections as Python objects. The right method to use depends on
how many of the matches you want to consume (a single collection at a time, a
page at a time, or everything) and whether you want plain Python dictionaries
representing the collections, or :class:`pystac.Collection` objects.

The following table shows the :class:`~pystac_client.CollectionSearch` methods for fetching
matches, according to which set of matches to return and whether to return them as
``pystac`` objects or plain dictionaries.

====================== ======================================================= ===============================================================
Matches to return PySTAC objects Plain dictionaries
====================== ======================================================= ===============================================================
**Single collections** :meth:`~pystac_client.CollectionSearch.collections` :meth:`~pystac_client.CollectionSearch.collections_as_dicts`
**Pages** :meth:`~pystac_client.CollectionSearch.pages` :meth:`~pystac_client.CollectionSearch.pages_as_dicts`
**Everything** :meth:`~pystac_client.CollectionSearch.collection_list` :meth:`~pystac_client.CollectionSearch.collection_list_as_dict`
====================== ======================================================= ===============================================================
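
The three rows of the table differ only in how much of the result set is consumed at once. The sketch below shows that relationship with plain generators over dictionaries. The function names (``iter_pages``, ``iter_collections``, ``collect_all``) and the page contents are hypothetical stand-ins, not the library's methods.

```python
from typing import Any, Dict, Iterator

# Hypothetical pages, shaped like responses from a /collections endpoint.
PAGES = [
    {"collections": [{"id": "fia"}, {"id": "modis-13Q1-061"}]},
    {"collections": [{"id": "modis-13A1-061"}]},
]


def iter_pages() -> Iterator[Dict[str, Any]]:
    """Yield one page of results at a time."""
    yield from PAGES


def iter_collections() -> Iterator[Dict[str, Any]]:
    """Flatten the pages into a stream of single collections."""
    for page in iter_pages():
        yield from page["collections"]


def collect_all() -> Dict[str, Any]:
    """Consume every page and return all collections in one dictionary."""
    return {"collections": list(iter_collections())}


print([len(page["collections"]) for page in iter_pages()])
print([collection["id"] for collection in iter_collections()])
print(len(collect_all()["collections"]))
```

The page-level and single-collection variants are lazy, so a caller can stop early without fetching further pages, whereas the "everything" variant has to exhaust the iterator before it returns.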

Additionally, the ``matched`` method can be used to access result metadata about
how many total collections matched the query:

* :meth:`CollectionSearch.matched <pystac_client.CollectionSearch.matched>`: returns the number
of hits (collections) for this search. If the API supports the STAC API Context Extension this
value will be returned directly from a search result with ``limit=1``. Otherwise ``pystac-client``
will count the results and return a value with an associated warning.
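
The fallback described above can be sketched as follows: use the total reported by the server when the first page carries one, otherwise count the collections and warn. This is an illustration only. The ``count_matches`` helper is hypothetical, and the ``numberMatched`` field name is an assumption, since the field that carries the total depends on which extension the API implements.

```python
import warnings
from typing import Any, Dict, Iterable


def count_matches(
    pages: Iterable[Dict[str, Any]], reported_key: str = "numberMatched"
) -> int:
    """Return the server-reported total if present, else count client-side."""
    total = 0
    for index, page in enumerate(pages):
        # Trust the total on the first page when the server provides one.
        if index == 0 and page.get(reported_key) is not None:
            return int(page[reported_key])
        total += len(page.get("collections", []))
    warnings.warn(
        "Total not reported by the server; counted results client-side.",
        UserWarning,
    )
    return total


# Hypothetical responses: one that reports a total and one that does not.
reporting = [{"numberMatched": 76, "collections": [{"id": "a"}]}]
silent = [
    {"collections": [{"id": "a"}, {"id": "b"}]},
    {"collections": [{"id": "c"}]},
]

reported_total = count_matches(reporting)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    counted_total = count_matches(silent)

print(reported_total, counted_total, len(caught))
```

Counting client-side requires walking every page, which is why the fast path matters for large catalogs.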

.. code-block:: python

>>> for collection in results.collections():
...     print(collection.id)
fia
modis-13Q1-061
modis-13A1-061
sentinel-3-olci-lfr-l2-netcdf

The :meth:`~pystac_client.CollectionSearch.collections` and related methods handle retrieval of
successive pages of results
by finding links with a ``"rel"`` type of ``"next"`` and parsing them to construct the
next request. The default
implementation of this ``"next"`` link parsing assumes that the link follows the spec for
an extended STAC link as
described in the
`STAC API - Collections: Collection Paging <https://github.com/radiantearth/stac-api-spec/blob/main/ogcapi-features/README.md#collection-pagination>`__
section.
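
To make the paging behaviour concrete, here is a minimal sketch of following ``"next"`` links until none remain. It uses an in-memory dictionary in place of HTTP requests and only handles a plain ``href``, so any extra fields an extended link may carry are ignored. The URLs, ``next_href``, and ``iter_pages`` are hypothetical and do not reflect the library's internals.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical in-memory "server": maps a URL to its response body.
FAKE_RESPONSES: Dict[str, Dict[str, Any]] = {
    "https://example.com/collections": {
        "collections": [{"id": "fia"}, {"id": "modis-13Q1-061"}],
        "links": [
            {"rel": "next", "href": "https://example.com/collections?page=2"}
        ],
    },
    "https://example.com/collections?page=2": {
        "collections": [{"id": "modis-13A1-061"}],
        "links": [
            {"rel": "self", "href": "https://example.com/collections?page=2"}
        ],
    },
}


def next_href(page: Dict[str, Any]) -> Optional[str]:
    """Return the href of the first link with rel == "next", if any."""
    for link in page.get("links", []):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def iter_pages(
    start_url: str, fetch: Callable[[str], Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield pages, following "next" links until a page has none."""
    url: Optional[str] = start_url
    while url is not None:
        page = fetch(url)
        yield page
        url = next_href(page)


ids = [
    collection["id"]
    for page in iter_pages("https://example.com/collections", FAKE_RESPONSES.__getitem__)
    for collection in page["collections"]
]
print(ids)
```

Passing the fetch function in as a parameter keeps the paging loop independent of the transport, which also makes it easy to exercise without network access.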

Alternatively, the collections can be retrieved one page at a time, where
each page is a list of the collections returned by a single request:

.. code-block:: python

>>> for page in results.pages():
...     for collection in page:
...         print(collection.id)
fia
modis-13Q1-061
modis-13A1-061
sentinel-3-olci-lfr-l2-netcdf

If you do not need the :class:`pystac.Collection` instances, you can instead use
:meth:`CollectionSearch.collections_as_dicts <pystac_client.CollectionSearch.collections_as_dicts>`
to retrieve dictionary representations of the collections, without incurring the cost of
creating the :class:`pystac.Collection` objects.

.. code-block:: python

>>> for collection_dict in results.collections_as_dicts():
...     print(collection_dict["id"])
fia
modis-13Q1-061
modis-13A1-061
sentinel-3-olci-lfr-l2-netcdf

ItemSearch
++++++++++

STAC API services may optionally implement a ``/search`` endpoint as described in the
`STAC API - Item Search spec
<https://github.com/radiantearth/stac-api-spec/tree/master/item-search>`__. This
<https://github.com/radiantearth/stac-api-spec/tree/main/item-search>`__. This
endpoint allows clients to query STAC Items across the entire service using a variety
of filter parameters. See the `Query Parameter Table
<https://github.com/radiantearth/stac-api-spec/tree/master/item-search#query-parameter-table>`__
<https://github.com/radiantearth/stac-api-spec/tree/main/item-search#query-parameter-table>`__
from that spec for details on the meaning of each parameter.

The :meth:`pystac_client.Client.search` method provides an interface for making
Expand All @@ -280,10 +378,10 @@ requests to a service's "search" endpoint. This method returns a
... )

Instances of :class:`~pystac_client.ItemSearch` have a handful of methods for
getting matching items into Python objects. The right method to use depends on
getting matching items as Python objects. The right method to use depends on
how many of the matches you want to consume (a single item at a time, a
page at a time, or everything) and whether you want plain Python dictionaries
representing the items, or proper ``pystac`` objects.
representing the items, or :class:`pystac.Item` objects.

The following table shows the :class:`~pystac_client.ItemSearch` methods for fetching
matches, according to which set of matches to return and whether to return them as
Expand Down
2 changes: 2 additions & 0 deletions pystac_client/__init__.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
__all__ = [
"Client",
"CollectionClient",
"CollectionSearch",
"ConformanceClasses",
"ItemSearch",
"Modifiable",
Expand All @@ -10,6 +11,7 @@
from pystac_client._utils import Modifiable
from pystac_client.client import Client
from pystac_client.collection_client import CollectionClient
from pystac_client.collection_search import CollectionSearch
from pystac_client.conformance import ConformanceClasses
from pystac_client.item_search import ItemSearch
from pystac_client.version import __version__