Collection search (#735)
* create BaseSearch class

* add collection search functionality

* update tests

* replace Z with +00:00 in datetime strings for Python 3.10

* move ItemSearch back to search.py

* reject extra args if client-side filter

* fix matched method

* fix error for collection search support check

* moar tests!

* add warning expectation to test

* add collection search functionality to cli

* add collection search examples to quickstart

* quote search tokens with special characters

* add collection search example to intro notebook

* add CollectionSearch entry to api docs

* clean up client docs a bit

* add collection_list_as_dict method

* add CollectionSearch section to usage docs

* fix lint error

* reinstate item_search.py

* clean up warnings, tests

* address review comments

* improve matched logic

* actually clean up matched logic

* update changelog
hrodmn authored Oct 15, 2024
1 parent 44aa3a5 commit 3fe2670
Showing 34 changed files with 25,741 additions and 595 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -620,3 +620,4 @@ $RECYCLE.BIN/
# Windows shortcuts
*.lnk

uv.lock
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased]

### Added

- Support for collection search via `CollectionSearch` class and associated client methods [#735](https://github.com/stac-utils/pystac-client/pull/735)

### Removed

- Python 3.9 support [#724](https://github.com/stac-utils/pystac-client/pull/724)
11 changes: 11 additions & 0 deletions docs/api.rst
@@ -29,6 +29,16 @@ endpoint, if supported.
   :members:
   :undoc-members:

Collection Search
-----------------

The `CollectionSearch` class represents a search of collections in a STAC API.

.. autoclass:: pystac_client.CollectionSearch
   :members:
   :undoc-members:
   :member-order: bysource

Item Search
-----------

@@ -39,6 +49,7 @@ The `ItemSearch` class represents a search of a STAC API.
   :undoc-members:
   :member-order: bysource


STAC API IO
-----------

54 changes: 52 additions & 2 deletions docs/quickstart.rst
@@ -7,7 +7,8 @@ Python library.
CLI
~~~

Use the CLI to quickly make searches and output or save the results.
Use the CLI to quickly make item- or collection-level searches and
output or save the results.

The ``--matched`` switch performs a search with limit=1 so does not get
any Items, but gets the total number of matches which will be output to
@@ -18,6 +19,15 @@ the screen (if supported by the STAC API).

    $ stac-client search https://earth-search.aws.element84.com/v1 -c sentinel-2-l2a --bbox -72.5 40.5 -72 41 --matched
    3141 items matched

The ``--matched`` flag can also be used for collection search to get
the total number of collections that match your search terms.


.. code-block:: console

    $ stac-client collections https://emc.spacebel.be --q sentinel-2 --matched
    76 collections matched

If the same URL is to be used over and over, define an environment
variable to be used in the CLI call:

@@ -87,6 +97,26 @@ than once to use additional operators.

    $ stac-client search ${STAC_API_URL} -c sentinel-2-l2a --bbox -72.5 40.5 -72 41 --datetime 2020-01-01/2020-01-31 --query "eo:cloud_cover<10" "eo:cloud_cover>5" --matched
    4 items matched

Collection searches can also combine multiple filters. This example
searches for collections that include the term ``"biomass"`` and whose
spatial extent intersects Scandinavia.

.. code-block:: console

    $ stac-client collections https://emc.spacebel.be --q biomass --bbox 0.09 54.72 33.31 71.36 --matched
    43 collections matched

Most STAC APIs have not yet implemented the `collection search
extension <https://github.com/stac-api-extensions/collection-search>`_.
When an API does not support it, ``pystac-client`` falls back to a limited
client-side filter on the full list of collections using only the ``bbox``,
``datetime``, and ``q`` (free-text search) parameters, and displays a
warning to inform you that the filter is being applied client-side.
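To make the client-side fallback more concrete, the following is a minimal sketch of filtering a list of collection dictionaries by ``bbox``, ``datetime``, and ``q``. It is not the ``pystac-client`` implementation: the helper names (``parse_dt``, ``bbox_intersects``, ``interval_intersects``, ``matches``) and the sample collections are hypothetical, ``q`` is treated as a plain case-insensitive substring, and bounding boxes that cross the antimeridian are ignored.

```python
from datetime import datetime


def parse_dt(value):
    """Parse an RFC 3339 timestamp; None means an open-ended bound."""
    if value is None:
        return None
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def interval_intersects(a, b):
    """True if two (start, end) intervals overlap; None is an open bound."""
    a_start, a_end = a
    b_start, b_end = b
    starts_in_time = a_start is None or b_end is None or a_start <= b_end
    ends_in_time = a_end is None or b_start is None or a_end >= b_start
    return starts_in_time and ends_in_time


def matches(collection, bbox=None, interval=None, q=None):
    """Hypothetical client-side test of one collection dict against the filters."""
    extent = collection["extent"]
    if bbox is not None and not any(
        bbox_intersects(bbox, b) for b in extent["spatial"]["bbox"]
    ):
        return False
    if interval is not None:
        wanted = (parse_dt(interval[0]), parse_dt(interval[1]))
        if not any(
            interval_intersects(wanted, (parse_dt(s), parse_dt(e)))
            for s, e in extent["temporal"]["interval"]
        ):
            return False
    if q is not None:
        text = " ".join(
            [collection.get("title") or "", collection.get("description") or ""]
            + collection.get("keywords", [])
        ).lower()
        if q.lower() not in text:
            return False
    return True


def make_collection(id_, description, bbox):
    """Build a made-up collection dict with an open-ended temporal extent."""
    return {
        "id": id_,
        "description": description,
        "extent": {
            "spatial": {"bbox": [bbox]},
            "temporal": {"interval": [["2015-01-01T00:00:00Z", None]]},
        },
    }


collections = [
    make_collection("nordic-biomass", "Forest biomass maps", [4.0, 55.0, 31.0, 71.0]),
    make_collection("amazon-biomass", "Tropical biomass maps", [-75.0, -15.0, -45.0, 5.0]),
    make_collection("nordic-dem", "Elevation model", [4.0, 55.0, 31.0, 71.0]),
]

hits = [
    c["id"]
    for c in collections
    if matches(c, bbox=[0.09, 54.72, 33.31, 71.36], q="biomass")
]
print(hits)  # prints ['nordic-biomass']
```

Because every collection has to be downloaded before it can be tested, a filter like this is only practical for catalogs with a modest number of collections.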


Python
~~~~~~

@@ -99,7 +129,7 @@ specific STAC API (use the root URL):

    client = Client.open("https://earth-search.aws.element84.com/v1")

Create a search:
Create an item-level search:

.. code-block:: python
@@ -125,3 +155,23 @@ The ``ItemCollection`` can then be saved as a GeoJSON FeatureCollection.

    item_collection = search.item_collection()
    item_collection.save_object('my_itemcollection.json')

Create a collection-level search:

.. code-block:: python

    collection_search = client.collection_search(
        q='"sentinel-2" OR "sentinel-1"',
    )
    print(f"{collection_search.matched()} collections found")

The ``collections()`` iterator method can be used to iterate through all
resulting collections.

.. code-block:: python

    for collection in collection_search.collections():
        print(collection.id)
50 changes: 47 additions & 3 deletions docs/tutorials/pystac-client-introduction.ipynb
@@ -40,7 +40,9 @@
"cell_type": "code",
"execution_count": null,
"id": "98942e75",
"metadata": {},
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# STAC API root URL\n",
@@ -74,6 +76,48 @@
" print(collection)"
]
},
{
"cell_type": "markdown",
"id": "ebab2724-cab3-4fba-b25b-fdfb4e537014",
"metadata": {},
"source": [
"# Collection Search\n",
"\n",
"Sometimes, it can be challenging to identify which collection you want to work with. The `collection_search` method allows you to discover collections by applying search filters that will help you find the specific collection(s) you need. Since many STAC APIs have not implemented the collection search extension, `pystac-client` will perform a limited client-side filter if the API does not conform to the collection search spec."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a23a53ec-5b5f-421d-9f0e-01dbde8c3697",
"metadata": {},
"outputs": [],
"source": [
"collection_search = cat.collection_search(\n",
" q=\"ASTER\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "90b3d014-9c8f-4c5b-a94e-bfb7f17380ad",
"metadata": {},
"source": [
"The `collections` method lets you iterate through the results of the search so you can inspect the details of matching collections."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "006f13fd-5e58-4f3f-bd5a-707cd830caa1",
"metadata": {},
"outputs": [],
"source": [
"for result in collection_search.collections():\n",
"    print(result.id, f\"{result.description}\", sep=\"\\n\")\n",
"    print(\"\\n\")"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -233,7 +277,7 @@
"hash": "6b6313dbab648ff537330b996f33bf845c0da10ea77ae70864d6ca8e2699c7ea"
},
"kernelspec": {
"display_name": "Python 3.9.11 ('.venv': venv)",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -247,7 +291,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.11"
"version": "3.12.3"
}
},
"nbformat": 4,
114 changes: 106 additions & 8 deletions docs/usage.rst
@@ -229,10 +229,10 @@ creating your :class:`Client<pystac_client.Client>`.
CollectionClient
++++++++++++++++

STAC APIs may optionally implement a ``/collections`` endpoint as describe in the
STAC APIs may optionally implement a ``/collections`` endpoint as described in the
`STAC API - Collections spec
<https://github.com/radiantearth/stac-api-spec/tree/master/collections>`__. This endpoint
allows clients to search or inspect items within a particular collection.
<https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/ogcapi-features#stac-api---collections>`__.
This endpoint allows clients to search or inspect items within a particular collection.

.. code-block:: python
@@ -245,7 +245,7 @@ allows clients to search or inspect items within a particular collection.
PySTAC will get items by iterating through all children until it gets to an ``item`` link.
PySTAC client will use the API endpoint instead: `/collections/<collection_id>/items`
(as long as `STAC API - Item Search spec
<https://github.com/radiantearth/stac-api-spec/tree/master/item-search>`__ is supported).
<https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/item-search>`__ is supported).

.. code-block:: python
@@ -254,15 +254,113 @@ PySTAC client will use the API endpoint instead: `/collections/<collection_id>/i
Note that calling list on this iterator will take a really long time since it will be retrieving
every item for the whole ``"sentinel-2-l2a"`` collection.

CollectionSearch
++++++++++++++++

STAC API services may optionally implement a ``/collections`` endpoint as described in the
`STAC API - Collections spec
<https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/ogcapi-features#stac-api---collections>`__.
The ``/collections`` endpoint can be extended with the
`STAC API - Collection Search Extension <https://github.com/stac-api-extensions/collection-search>`__
which adds the capability to apply filter parameters to the collection-level metadata.
See the `Query Parameters and Fields
<https://github.com/stac-api-extensions/collection-search?tab=readme-ov-file#query-parameters-and-fields>`__
from that spec for details on the meaning of each parameter.

The :meth:`pystac_client.Client.collection_search` method provides an interface for making
requests to a service's "collections" endpoint. This method returns a
:class:`pystac_client.CollectionSearch` instance.

.. code-block:: python

    >>> from pystac_client import Client
    >>> catalog = Client.open('https://planetarycomputer.microsoft.com/api/stac/v1')
    >>> results = catalog.collection_search(
    ...     q="biomass",
    ...     datetime="2022/.."
    ... )

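The ``datetime="2022/.."`` argument above is an interval with an open end: in the STAC API ``datetime`` parameter, ``..`` (or an empty string) on either side of the ``/`` leaves that bound open. The sketch below only illustrates that convention. The ``split_interval`` helper is hypothetical, and it does not expand partial dates such as ``2022`` into full timestamps or reproduce how ``pystac-client`` normalizes the value.

```python
def split_interval(value):
    """Split a STAC ``datetime`` string into (start, end).

    ``None`` marks an open bound; a single value is returned as both ends.
    """
    if "/" not in value:
        return (value, value)
    start, end = value.split("/", 1)
    open_markers = ("", "..")
    return (
        None if start in open_markers else start,
        None if end in open_markers else end,
    )


print(split_interval("2022/.."))                # prints ('2022', None)
print(split_interval("2020-01-01/2020-01-31"))  # prints ('2020-01-01', '2020-01-31')
```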
Instances of :class:`~pystac_client.CollectionSearch` have a handful of methods for
getting matching collections as Python objects. The right method to use depends on
how many of the matches you want to consume (a single collection at a time, a
page at a time, or everything) and whether you want plain Python dictionaries
representing the collections, or :class:`pystac.Collection` objects.

The following table shows the :class:`~pystac_client.CollectionSearch` methods for fetching
matches, according to which set of matches to return and whether to return them as
``pystac`` objects or plain dictionaries.

====================== ======================================================= ===============================================================
Matches to return      PySTAC objects                                          Plain dictionaries
====================== ======================================================= ===============================================================
**Single collections** :meth:`~pystac_client.CollectionSearch.collections`     :meth:`~pystac_client.CollectionSearch.collections_as_dicts`
**Pages**              :meth:`~pystac_client.CollectionSearch.pages`           :meth:`~pystac_client.CollectionSearch.pages_as_dicts`
**Everything**         :meth:`~pystac_client.CollectionSearch.collection_list` :meth:`~pystac_client.CollectionSearch.collection_list_as_dict`
====================== ======================================================= ===============================================================

Additionally, the ``matched`` method can be used to access result metadata about
how many collections in total matched the query:

* :meth:`CollectionSearch.matched <pystac_client.CollectionSearch.matched>`: returns the number
  of hits (collections) for this search. If the API supports the STAC API Context Extension this
  value will be returned directly from a search result with ``limit=1``. Otherwise ``pystac-client``
  will count the results and return a value with an associated warning.
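The fallback described above can be sketched as follows. This is an illustration under assumptions, not the library's code: the standalone ``matched`` function and its two arguments are hypothetical, and it assumes the server-reported total appears either as ``numberMatched`` (OGC API style) or as ``context.matched`` (Context Extension style) in the response body.

```python
import warnings


def matched(first_page, all_collections):
    """Return the total hit count for a search.

    first_page: parsed JSON body of a ``limit=1`` response (a dict).
    all_collections: callable returning an iterator over every match.
    """
    # Prefer a total reported by the server when the response carries one.
    count = first_page.get("numberMatched")
    if count is None:
        count = first_page.get("context", {}).get("matched")
    if count is not None:
        return count

    # Otherwise count the results ourselves and say so.
    warnings.warn(
        "No total in the response; counting results client-side.",
        UserWarning,
    )
    return sum(1 for _ in all_collections())


print(matched({"numberMatched": 76}, lambda: iter(())))  # prints 76
```

Counting client-side means walking every page of results, so it can be slow on large catalogs, which is a good reason to surface the warning.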

.. code-block:: python

    >>> for collection in results.collections():
    ...     print(collection.id)
    fia
    modis-13Q1-061
    modis-13A1-061
    sentinel-3-olci-lfr-l2-netcdf

The :meth:`~pystac_client.CollectionSearch.collections` and related methods handle
retrieval of successive pages of results by finding links with a ``"rel"`` type of
``"next"`` and parsing them to construct the next request. The default implementation
of this ``"next"`` link parsing assumes that the link follows the spec for an extended
STAC link as described in the
`STAC API - Collections: Collection Paging <https://github.com/radiantearth/stac-api-spec/blob/main/ogcapi-features/README.md#collection-pagination>`__
section.
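The paging behaviour described above can be sketched with an in-memory stand-in for the API. Everything here is illustrative: ``iter_pages``, ``FAKE_API``, and the ``example.com`` URLs are hypothetical, and the sketch only follows the ``href`` of a ``"next"`` link with a GET-style lookup, whereas an extended STAC link may also carry a method, headers, and a body.

```python
def iter_pages(first_url, fetch):
    """Yield each page of a response by following rel="next" links."""
    url = first_url
    while url is not None:
        page = fetch(url)
        yield page
        # Pick the href of the first "next" link, or stop if there is none.
        url = next(
            (
                link["href"]
                for link in page.get("links", [])
                if link.get("rel") == "next"
            ),
            None,
        )


# Two fake pages keyed by URL, standing in for HTTP responses.
FAKE_API = {
    "https://example.com/collections": {
        "collections": [{"id": "fia"}, {"id": "modis-13Q1-061"}],
        "links": [
            {"rel": "next", "href": "https://example.com/collections?token=2"}
        ],
    },
    "https://example.com/collections?token=2": {
        "collections": [{"id": "modis-13A1-061"}],
        "links": [],
    },
}

ids = [
    collection["id"]
    for page in iter_pages("https://example.com/collections", FAKE_API.__getitem__)
    for collection in page["collections"]
]
print(ids)  # prints ['fia', 'modis-13Q1-061', 'modis-13A1-061']
```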

Alternatively, the collections can be retrieved one page at a time, where each
page is a list of the collections returned by a single request:

.. code-block:: python

    >>> for page in results.pages():
    ...     for collection in page:
    ...         print(collection.id)
    fia
    modis-13Q1-061
    modis-13A1-061
    sentinel-3-olci-lfr-l2-netcdf

If you do not need the :class:`pystac.Collection` instances, you can instead use
:meth:`CollectionSearch.collections_as_dicts <pystac_client.CollectionSearch.collections_as_dicts>`
to retrieve dictionary representations of the collections, without incurring the cost of
creating the Collection objects.

.. code-block:: python

    >>> for collection_dict in results.collections_as_dicts():
    ...     print(collection_dict["id"])
    fia
    modis-13Q1-061
    modis-13A1-061
    sentinel-3-olci-lfr-l2-netcdf

ItemSearch
++++++++++

STAC API services may optionally implement a ``/search`` endpoint as described in the
`STAC API - Item Search spec
<https://github.com/radiantearth/stac-api-spec/tree/master/item-search>`__. This
<https://github.com/radiantearth/stac-api-spec/tree/main/item-search>`__. This
endpoint allows clients to query STAC Items across the entire service using a variety
of filter parameters. See the `Query Parameter Table
<https://github.com/radiantearth/stac-api-spec/tree/master/item-search#query-parameter-table>`__
<https://github.com/radiantearth/stac-api-spec/tree/main/item-search#query-parameter-table>`__
from that spec for details on the meaning of each parameter.

The :meth:`pystac_client.Client.search` method provides an interface for making
@@ -280,10 +378,10 @@ requests to a service's "search" endpoint. This method returns a
    ... )

Instances of :class:`~pystac_client.ItemSearch` have a handful of methods for
getting matching items into Python objects. The right method to use depends on
getting matching items as Python objects. The right method to use depends on
how many of the matches you want to consume (a single item at a time, a
page at a time, or everything) and whether you want plain Python dictionaries
representing the items, or proper ``pystac`` objects.
representing the items, or :class:`pystac.Item` objects.

The following table shows the :class:`~pystac_client.ItemSearch` methods for fetching
matches, according to which set of matches to return and whether to return them as
2 changes: 2 additions & 0 deletions pystac_client/__init__.py
@@ -1,6 +1,7 @@
__all__ = [
    "Client",
    "CollectionClient",
    "CollectionSearch",
    "ConformanceClasses",
    "ItemSearch",
    "Modifiable",
@@ -10,6 +11,7 @@
from pystac_client._utils import Modifiable
from pystac_client.client import Client
from pystac_client.collection_client import CollectionClient
from pystac_client.collection_search import CollectionSearch
from pystac_client.conformance import ConformanceClasses
from pystac_client.item_search import ItemSearch
from pystac_client.version import __version__