Add Xarray sub-package #1013

vincentsarago · 2024-10-29T21:18:25Z

Takes over #1007

To Do

reviews
add Docs

src/titiler/xarray/titiler/xarray/main.py

vincentsarago · 2024-10-29T21:21:23Z

src/titiler/xarray/titiler/xarray/io.py

+    if "x" not in da.dims and "y" not in da.dims:
+        try:
+            latitude_var_name = next(
+                x for x in ["lat", "latitude", "LAT", "LATITUDE", "Lat"] if x in da.dims


Do we need to support other variable names?

I'd say no. Datasets with other names will likely not be regular lat/lon grids and will fail in other ways.
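Below is a minimal, stdlib-only sketch of the dimension detection discussed above. It assumes the goal is to map lat/lon-style dimension names onto the `y`/`x` names that the rest of the reader expects. The latitude candidates come from the diff. The longitude list and the `spatial_dim_rename` helper are hypothetical. With xarray, you could pass the returned mapping to `DataArray.rename`.

```python
from typing import Dict, Sequence

# Latitude candidates copied from the diff; the longitude list is an assumption.
LAT_NAMES = ("lat", "latitude", "LAT", "LATITUDE", "Lat")
LON_NAMES = ("lon", "longitude", "LON", "LONGITUDE", "Lon")


def spatial_dim_rename(dims: Sequence[str]) -> Dict[str, str]:
    """Return a {old: new} mapping that renames lat/lon dimensions to y/x.

    Hypothetical helper: returns an empty mapping when x or y already exist,
    mirroring the `if "x" not in da.dims and "y" not in da.dims` guard.
    """
    if "x" in dims or "y" in dims:
        return {}

    lat = next((d for d in dims if d in LAT_NAMES), None)
    lon = next((d for d in dims if d in LON_NAMES), None)
    if lat is None or lon is None:
        # Other names are unlikely to be regular lat/lon grids, so fail loudly.
        raise ValueError(f"no latitude/longitude dimensions found in {tuple(dims)}")

    return {lat: "y", lon: "x"}


# Usage with xarray (not executed here): da = da.rename(spatial_dim_rename(da.dims))
```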

src/titiler/xarray/titiler/xarray/dependencies.py

vincentsarago · 2024-10-29T21:23:39Z

src/titiler/xarray/titiler/xarray/factory.py

+
+
+@define(kw_only=True)
+class TilerFactory(BaseTilerFactory):


By subclassing titiler.core.factory.TilerFactory, we avoid rewriting code.

But do we need to redefine all of the class attributes (e.g. stats_dependency) that are already defined in titiler.core.factory.TilerFactory?

vincentsarago · 2024-10-29T21:24:10Z

src/titiler/xarray/titiler/xarray/factory.py

+
+    # remove some attribute from init
+    img_preview_dependency: Type[DefaultDependency] = field(init=False)
+    add_preview: bool = field(init=False)


We remove those two attributes because we don't support the /preview endpoints.

src/titiler/xarray/pyproject.toml

vincentsarago · 2024-10-29T21:26:46Z

src/titiler/xarray/titiler/xarray/factory.py

+            return Response(content, media_type=media_type)
+
+    # custom /statistics endpoints (remove /statistics - GET)
+    def statistics(self):


☝️ IMO having a full-dataset /statistics endpoint is a bit dangerous (as with the /preview endpoints), which is why we only support statistics for GeoJSON features.
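Here is a rough sketch of why feature statistics are the safer option: the amount of data involved is bounded by the feature's extent rather than by the whole dataset. For brevity, the sketch selects grid cells whose coordinates fall inside the polygon's bounding box. A real implementation would also mask cells outside the polygon itself. `polygon_bbox` and `feature_statistics` are hypothetical helpers.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

BBox = Tuple[float, float, float, float]


def polygon_bbox(feature: dict) -> BBox:
    """Bounding box (minx, miny, maxx, maxy) of a GeoJSON Polygon's outer ring."""
    ring = feature["geometry"]["coordinates"][0]
    xs = [pt[0] for pt in ring]
    ys = [pt[1] for pt in ring]
    return min(xs), min(ys), max(xs), max(ys)


def feature_statistics(
    values: List[List[float]], xs: Sequence[float], ys: Sequence[float], feature: dict
) -> Dict[str, float]:
    """Min/max/mean/count over grid cells inside the feature's bounding box.

    values[j][i] is the cell at (xs[i], ys[j]). Only cells inside the bbox are
    collected; a real reader would use the bbox to read just that window.
    """
    minx, miny, maxx, maxy = polygon_bbox(feature)
    selected = [
        values[j][i]
        for j, y in enumerate(ys)
        if miny <= y <= maxy
        for i, x in enumerate(xs)
        if minx <= x <= maxx
    ]
    if not selected:
        raise ValueError("feature does not intersect the grid")
    return {"min": min(selected), "max": max(selected), "mean": mean(selected), "count": len(selected)}
```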

maxrjones

Thanks for all your work here, @vincentsarago!

This is a very opinionated take, but I think titiler-xarray would be best off with two separate routes, each with its own set of optional dependencies. The first route would be zarr, which would open Zarr and virtual Zarr datasets using xarray.open_zarr. The second route would be md, which would open any dataset readable by xarray.open_dataset.

The primary reason I think we should do this is that it would enable us to incentivize virtualizing datasets into zarr, which would lead to much faster tile generation. We could do this by:

Making every query parameter in the zarr route specific to open_zarr, simplifying API usage.
Automatically detecting virtual datasets, removing the need for the reference parameter.
Reducing the image size for Zarr-only titiler-xarray deployments, because other readers would not be installed (and eventually obstore and/or icechunk could replace the fsspec dependencies).

This would also simplify non-zarr usage for the following reasons:

Zarr-specific parameters (e.g., group, consolidated) would not be included in the md route's endpoints
We could use xarray's automatic backend detection rather than writing our own in titiler/xarray/io.py

I also think isolating Zarr usage would simplify the eventual support of the GeoZarr and multiscales specifications.

src/titiler/xarray/tests/fixtures/pyramid.zarr/.zgroup

vincentsarago · 2024-10-31T09:03:07Z

Thanks @maxrjones 🙏

I see what you're saying. The goal of having a single Reader was to handle all non-COG datasets, so splitting it into two separate readers/sets of endpoints would not meet that goal.

This is a very opinionated take, but I think titiler-xarray would be best off with two separate routes, each with its own set of optional dependencies. The first route would be zarr, which would open Zarr and virtual Zarr datasets using xarray.open_zarr. The second route would be md, which would open any dataset readable by xarray.open_dataset.

We can absolutely use xarray.open_zarr instead of xarray.open_dataset here when reading a zarr
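The following sketch shows one way to use `xarray.open_zarr` for Zarr stores while keeping `xarray.open_dataset` for everything else inside a single reader. The suffix heuristic and `choose_opener` are assumptions for illustration, not what the PR implements. The openers are injected so that the dispatch logic can be tested without xarray installed.

```python
from typing import Any, Callable

Opener = Callable[..., Any]


def choose_opener(src_path: str, open_zarr: Opener, open_dataset: Opener) -> Opener:
    """Pick a Zarr-specific opener for Zarr stores and a generic one otherwise.

    Assumption: a path ending in ".zarr" (optionally with a trailing slash)
    is a Zarr store. Virtual/reference datasets would need their own check.
    """
    if src_path.rstrip("/").lower().endswith(".zarr"):
        return open_zarr
    return open_dataset


# With xarray installed (not executed here):
#   import xarray
#   opener = choose_opener(path, xarray.open_zarr, xarray.open_dataset)
#   ds = opener(path)
```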

We could use xarray's automatic backend detection rather than writing our own in titiler/xarray/io.py

How so? https://github.com/developmentseed/titiler/pull/1013/files#diff-dd6fab5d1e55a1d860ff8bd2190f145f2574100f734d382cee56c48bd7a7f1f5R43-R49 ?

If I follow your thinking, it seems we would need separate titiler.multidim and titiler.zarr packages 🤷

What if we make the dependencies optional? I'm going to open a PR on top of this one to try some things

hrodmn

This is great! The concept of creating pyramids in a zarr store was new to me, then I googled around and found @maxrjones's notebook 😆.

It is great to have the io methods standardized here so we can import them in titiler.cmr and other applications.

src/titiler/xarray/tests/test_factory.py

src/titiler/xarray/titiler/xarray/io.py

vincentsarago · 2024-11-04T22:00:22Z

src/titiler/xarray/titiler/xarray/extensions.py

+        ):
+            """return available variables."""
+            with self.dataset_opener(src_path, **io_params.as_dict()) as ds:
+                return list(ds.data_vars)  # type: ignore


It's easier to put this into an extension than to add a list_variable class method, because the dataset_opener is customizable (the extension uses whichever opener is configured).
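This stdlib-only sketch illustrates the pattern. The extension calls whichever `dataset_opener` it was given, so a custom opener automatically applies to the variables listing too. `VariablesExtension`, `fake_opener` and `_FakeDataset` are hypothetical stand-ins. A real opener that returns an `xarray.Dataset` fits the same `with` block, because datasets support the context-manager protocol.

```python
from contextlib import contextmanager
from typing import Any, Callable, ContextManager, Iterator, List


class _FakeDataset:
    """Stand-in for xarray.Dataset: only `data_vars` is needed here."""

    def __init__(self, names: List[str]) -> None:
        self.data_vars = {name: None for name in names}


@contextmanager
def fake_opener(src_path: str, **kwargs: Any) -> Iterator[_FakeDataset]:
    """Hypothetical opener; a real one would return an xarray.Dataset."""
    yield _FakeDataset(["tas", "pr"])


class VariablesExtension:
    """Hypothetical extension that lists variables with whichever opener is configured."""

    def __init__(self, dataset_opener: Callable[..., ContextManager[Any]] = fake_opener) -> None:
        self.dataset_opener = dataset_opener

    def list_variables(self, src_path: str, **io_params: Any) -> List[str]:
        # Same body as the diff: open the dataset, then return its data variable names.
        with self.dataset_opener(src_path, **io_params) as ds:
            return list(ds.data_vars)
```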

vincentsarago · 2024-11-04T22:01:59Z

src/titiler/xarray/titiler/xarray/io.py

+        ...
+
+
+def xarray_open_dataset(  # noqa: C901


I've moved everything into the xarray_open_dataset function to make customization easier, and also because I've made some dependencies optional.

so this is a fine default option for some applications, but it is easier to customize now because all the user has to do is write a new dataset_opener - nice!

vincentsarago · 2024-11-04T22:04:48Z

src/titiler/xarray/titiler/xarray/io.py

+    variable: str = attr.ib()
+
+    # xarray.Dataset options
+    opener: Callable[..., xarray.Dataset] = attr.ib(default=xarray_open_dataset)


For now the opener MUST be a callable that takes 4 arguments (cache_client has since been removed, leaving 3):

src_path: str

group: Any

decode_times: bool

~~cache_client~~

we might change this in the future
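Here is a sketch of a custom opener that satisfies this contract, plus a small signature check. `zarr_only_opener` and `follows_opener_contract` are hypothetical names. Whether the reader passes these arguments by keyword or by position is an assumption, so check the reader before relying on it. Presumably the opener would be plugged in through the `opener` attribute shown in the diff.

```python
import inspect
from typing import Any


def zarr_only_opener(src_path: str, group: Any = None, decode_times: bool = True):
    """Hypothetical custom opener following the (src_path, group, decode_times) contract."""
    import xarray  # lazy import: only needed when the opener is actually called

    return xarray.open_zarr(src_path, group=group, decode_times=decode_times)


# Argument names taken from the list above (cache_client dropped).
EXPECTED_PARAMS = ["src_path", "group", "decode_times"]


def follows_opener_contract(func: Any) -> bool:
    """Check that a callable exposes exactly the expected parameter names, in order."""
    return list(inspect.signature(func).parameters) == EXPECTED_PARAMS
```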

hrodmn

This all looks great, thanks for all of your work moving xarray support further upstream! I left a few small comments but nothing blocking.

hrodmn · 2024-11-05T17:29:15Z

src/titiler/xarray/titiler/xarray/io.py

+        ...
+
+
+def xarray_open_dataset(  # noqa: C901


so this is a fine default option for some applications, but it is easier to customize now because all the user has to do is write a new dataset_opener - nice!

hrodmn · 2024-11-05T17:45:46Z

src/titiler/xarray/titiler/xarray/dependencies.py

+            description="RasterIO resampling algorithm. Defaults to `nearest`.",
+        ),
+    ] = None


Does the default behavior get set somewhere else or do we need to set it to something besides None here?

It falls back to rio-tiler's default (nearest).
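This sketch shows the "None means use the library default" behaviour described in the reply. It assumes the dependency forwards only the options that were actually set. `ResamplingParams` and `read` are hypothetical. They only illustrate why leaving the field as `None` still ends up as `nearest`.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ResamplingParams:
    """Hypothetical dependency: an unset option stays None."""

    resampling_method: Optional[str] = None

    def as_kwargs(self) -> Dict[str, Any]:
        # Drop None values so the downstream library applies its own default.
        return {k: v for k, v in asdict(self).items() if v is not None}


def read(resampling_method: str = "nearest") -> str:
    """Hypothetical stand-in for a rio-tiler call whose default is "nearest"."""
    return resampling_method
```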

hrodmn · 2024-11-05T17:49:58Z

src/titiler/xarray/titiler/xarray/factory.py

+    dataset_dependency: Type[DefaultDependency] = DatasetParams
+
+    # Tile/Tilejson/WMTS Dependencies  (Not used in titiler.xarray)
+    tile_dependency: Type[DefaultDependency] = DefaultDependency


should this default to TileParams instead of DefaultDependency?

No, because XarrayReader's tile method has neither buffer nor padding options.

hrodmn · 2024-11-05T17:51:09Z

src/titiler/xarray/titiler/xarray/factory.py

+
+
+@define(kw_only=True)
+class TilerFactory(BaseTilerFactory):


But do we need to redefine all of the class attributes (e.g. stats_dependency) that are already defined in titiler.core.factory.TilerFactory?

vincentsarago · 2024-11-05T19:01:40Z

But do we need to redefine all of the class attributes (e.g. stats_dependency) that are already defined in titiler.core.factory.TilerFactory?

@hrodmn for most of them we don't have to

j08lue

Reviewed the io logic, please see comments below. Nothing blocking.

src/titiler/xarray/titiler/xarray/io.py

j08lue · 2024-11-06T10:32:38Z

src/titiler/xarray/titiler/xarray/io.py

+    if "x" not in da.dims and "y" not in da.dims:
+        try:
+            latitude_var_name = next(
+                x for x in ["lat", "latitude", "LAT", "LATITUDE", "Lat"] if x in da.dims


I'd say no. Datasets with other names will likely not be regular lat/lon grids and will fail in other ways.

src/titiler/xarray/titiler/xarray/io.py

maxrjones

I'm a bit confused by the pyramid and reference components of the code. Is the current plan to include those in this PR or save them for later development? FWIW, I would suggest the latter, since there's been movement in GeoZarr multiscales and virtualizarr/icechunk since the original extension was developed.

src/titiler/xarray/pyproject.toml

vincentsarago · 2024-11-06T23:10:55Z

@maxrjones we've removed the reference part but kept the group option. There is no pyramid support per se, just some test fixtures, which are used to test the group option.

vincentsarago · 2024-11-07T09:23:09Z

src/titiler/xarray/titiler/xarray/io.py

+                    "consolidated": False,
+                    "backend_kwargs": {"consolidated": False},
+                }
+            )


@maxrjones oh you were talking about this! I think we can let this go 🙏

I'll create a new issue to add kerchunk reference support!

sketch

13351bb

vincentsarago commented Oct 29, 2024

src/titiler/xarray/titiler/xarray/main.py Outdated

vincentsarago commented Oct 29, 2024

src/titiler/xarray/titiler/xarray/dependencies.py Outdated

vincentsarago commented Oct 29, 2024

src/titiler/xarray/pyproject.toml

vincentsarago commented Oct 29, 2024

This comment was marked as outdated.

vincentsarago added 2 commits October 30, 2024 13:18

add tests

faaaa3e

add pyramid tests

d9ea7d2

This comment was marked as resolved.

remove multiscale option

80f3350

vincentsarago marked this pull request as ready for review October 30, 2024 15:05

maxrjones reviewed Oct 30, 2024

src/titiler/xarray/tests/fixtures/pyramid.zarr/.zgroup

hrodmn reviewed Oct 31, 2024

src/titiler/xarray/tests/test_factory.py

src/titiler/xarray/titiler/xarray/io.py

src/titiler/xarray/titiler/xarray/io.py

vincentsarago and others added 4 commits October 31, 2024 18:12

Update src/titiler/xarray/tests/test_factory.py

691eeed

Co-authored-by: Henry Rodman

use xarray.open_zarr and make aiohttp and s3fs optional (#1016)

d0804ec

* use xarray.open_zarr and make aiohttp and s3fs optional * add support for references * tests prefixed protocol * use tmp_dir for reference * add parquet support * remove kerchunk support

Merge branch 'main' of https://github.com/developmentseed/titiler into feature/add-xarray-package

0cc3a58

create variable extension

7312a82

vincentsarago commented Nov 4, 2024

add aiohttp

f50b400

vincentsarago commented Nov 4, 2024

remove cache layer (#1019)

df7bdf6

* remove cache layer * Update src/titiler/xarray/README.md Co-authored-by: Aimee Barciauskas * add tile example --------- Co-authored-by: Aimee Barciauskas

vincentsarago requested review from maxrjones, abarciauskas-bgse and hrodmn November 5, 2024 17:06

hrodmn approved these changes Nov 5, 2024

j08lue reviewed Nov 6, 2024

vincentsarago and others added 6 commits November 6, 2024 13:00

Update src/titiler/xarray/titiler/xarray/io.py

eb7aec5

Co-authored-by: Jonas

Update src/titiler/xarray/titiler/xarray/io.py

7472f88

Co-authored-by: Jonas

Update src/titiler/xarray/titiler/xarray/io.py

67a05ae

Co-authored-by: Jonas

lint

b284019

Merge branch 'main' of https://github.com/developmentseed/titiler into feature/add-xarray-package

021f609

fix zarr pyramid tests

b69591c

vincentsarago force-pushed the feature/add-xarray-package branch from da6ce9b to b69591c November 6, 2024 12:29

maxrjones reviewed Nov 6, 2024

src/titiler/xarray/pyproject.toml

Update src/titiler/xarray/pyproject.toml

115cfc8

Co-authored-by: Max Jones

j08lue mentioned this pull request Nov 7, 2024

Support for Google Cloud Storage URLs #1021

Closed

vincentsarago commented Nov 7, 2024

vincentsarago added 2 commits November 7, 2024 10:48

refactor dependencies

97eb648

update docs

af7bede

j08lue linked an issue Nov 7, 2024 that may be closed by this pull request

Support for Google Cloud Storage URLs #1021

Closed

vincentsarago merged commit d33d60a into main Nov 7, 2024
10 checks passed

vincentsarago deleted the feature/add-xarray-package branch November 7, 2024 10:11

hrodmn mentioned this pull request Nov 26, 2024

use factory from titiler.xarray developmentseed/titiler-xarray#72

Open
