From 5ae45d6664301bb2fb17d75b68b92ae6300a4286 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Fri, 10 Nov 2023 13:10:32 +0100 Subject: [PATCH 01/14] add design doc (wip) --- design_doc.md | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) create mode 100644 design_doc.md diff --git a/design_doc.md b/design_doc.md new file mode 100644 index 0000000..a2d5513 --- /dev/null +++ b/design_doc.md @@ -0,0 +1,71 @@ +# XDGGS - Design document + +Xarrays extension for DGGS. Technical specifiactions. + +## Goals + +The goal of the `xddgs` library is to facilitate working with multiple Discrete Global Grid Systems (DGGSs) via a unified, high-level and user-friendly API that is deeply integrated with [Xarray](https://xarray.dev). + +Examples of common DGGS features that `xdggs` should provide or facilitate: + +- convert a DGGS from/to another grid (e.g., a DGGS, a latitude/longitude rectilinear grid, a raster grid, an unstructured mesh) +- convert a DGGS from/to vector data (points, lines, polygons) +- convert between different cell id representations of a same DGGS (e.g., uint64 vs. string) +- select data on a DGGS by cell ids or by geometries (spatial indexing) +- change DGGS resolution (upgrade or downgrade) +- re-organize cell ids (e.g., spatial shuffling / partitioning) +- plotting + +Conversion between DGGS and other grids or vector features may requires specific interpolation or regridding methods. + +`xdggs` should leverage the current recommended Xarray extension mechanisms ([apply_ufunc](https://docs.xarray.dev/en/stable/examples/apply_ufunc_vectorize_1d.html), [accessors](https://docs.xarray.dev/en/stable/internals/extending-xarray.html), [custom indexes](https://docs.xarray.dev/en/stable/internals/how-to-create-custom-index.html)) and possibly the future ones (e.g., variable encoders/decoders) to provide DGGS-specific functionality on top of Xarray core features. 
+ +`xdggs` should facitiltate interoperability with other existing Xarray extensions (e.g., [xvec](https://github.com/xarray-contrib/xvec) for vector data or [uxarray](https://github.com/UXARRAY/uxarray) for unstructured grids). + +`xdggs` should also leverage the existing implementation of various DGGSs exposed to Python via 3rd-party libraries, here below referred as "backends". Preferrably, those backends would expose DGGS functionality in an efficient way as vectorized functions (e.g., working with NumPy arrays). + +`xdggs` should try to follow standards and/or conventions defined for DGGS (see below) but MAY depart from them for practical reasons (e.g., common practices in popular DGGS libraries). + +`xddgs` should also try to fulfill user needs in both GIS and Earth-System communities, which may each use DGGS in slightly different ways (see below). + +When possible, `xddgs` operations should scale to fine resolutions (millions of cells) leveraging Xarray interoperability with Dask. This might not be always possible, though. Some operations (spatial indexing) may be hard to support at scale, which shouldn't be considered as high-priority. + +## Non-Gloals + +`xdggs` should focus on providing the core DGGS functionality and operations that are listed above. Higher-level operations that can be implemented by combining together those core operations are out-of-scope and should be implemented in downstream libraries. + +`xddgs` should try not re-inventing the wheel and delegate to Xarray API when possible. + +`xdggs` does not implement any particular DGGS from scratch. `xdggs` does not aim at providing _all the functionality provided by each grid_ (e.g., some functionality may be very specific to one DGGS and not supported by other DGGSs, or some functionality may not be available yet in one DGGS Python backend). + +Although some DGGS may handle both the spatial and temporal domains in a joint fashion, `xdggs` focuses primarily on the spatial domain. 
The temporal domain is considered as orthogonal and already benefits from many core features provided by Xarray. + +## Discrete Global Grid Systems + +### Standards and Conventions + +### Backends (Python) + +## Representation of DGGS data in xdggs + +`xdggs` represents a DGGS as an Xarray Dataset or DataArray containing a 1-dimensional coordinate with cell ids as labels and with grid name & parameters as attributes. This coordinate is indexed using a custom, Xarray-compatible `DGGSIndex`. + +`xdggs` does not support a Dataset or DataArray with multiple coordinates indexed with a `DGGSIndex`. + +The cell ids in the 1-dimensional coordinate are all relative to the _exact same_ grid (i.e., same system, same parameter values and same resolution). For simplicity, `xdggs` does not support mixed-resolutions cell ids in the same coordinate. + +### DGGSIndex + +`xdggs.DGGSIndex` is the base class for all Xarray DGGS-aware indexes. It inherits from `xarray.indexes.Index` and has the following specifications: + +- It encapsulates an `xarray.indexes.PandasIndex` built from cell ids so that selection and alignment by cell id is possible +- It might also eventually encapsulate a spatial index (RTree, KDTree) to enable data selection by geometries, e.g., find nearest cell centroids, extract all cells intersecting a polygon, etc. + - Alternatively, spatial indexing might be enabled by explicit conversion of cells to vector geometries and then by reusing the Xarray spatial indexes available in [xvec](https://github.com/xarray-contrib/xvec) +- It partially implements the Xarray Index API to enable DGGS-aware alignment and selection + - Calls are most often redirected to the encapsulated `PandasIndex` + - Some pre/post checks or processing may be done, e.g., to prevent the alignment of two indexes that are not on the exact same grid. 
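The pre-alignment check mentioned in the last bullet could look roughly like the sketch below. It is plain Python and independent of Xarray; `GridInfo` and `check_same_grid` are hypothetical names used only for illustration and are not part of any existing API. The assumption is that "exact same grid" means equal grid name, resolution and parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridInfo:
    """Hypothetical description of the grid a cell id coordinate refers to."""

    grid_name: str
    resolution: int
    # optional extra grid parameters, stored as a tuple of (key, value) pairs
    parameters: tuple = ()


def check_same_grid(left: GridInfo, right: GridInfo) -> None:
    """Raise if two indexes are not defined on the exact same grid."""
    if left != right:
        raise ValueError(
            f"cannot align DGGS indexes on different grids: {left} vs. {right}"
        )
```

A real index would presumably run such a check before delegating the actual alignment to the encapsulated `PandasIndex`.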
+- + +## Conversion + +## Data Selection From 3bb295cd6bf2d6a120411b5070f5188274414762 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Fri, 10 Nov 2023 14:08:13 +0100 Subject: [PATCH 02/14] wip --- design_doc.md | 57 +++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 49 insertions(+), 8 deletions(-) diff --git a/design_doc.md b/design_doc.md index a2d5513..387cf55 100644 --- a/design_doc.md +++ b/design_doc.md @@ -9,7 +9,7 @@ The goal of the `xddgs` library is to facilitate working with multiple Discrete Examples of common DGGS features that `xdggs` should provide or facilitate: - convert a DGGS from/to another grid (e.g., a DGGS, a latitude/longitude rectilinear grid, a raster grid, an unstructured mesh) -- convert a DGGS from/to vector data (points, lines, polygons) +- convert a DGGS from/to vector data (points, lines, polygons, envelopes) - convert between different cell id representations of a same DGGS (e.g., uint64 vs. string) - select data on a DGGS by cell ids or by geometries (spatial indexing) - change DGGS resolution (upgrade or downgrade) @@ -24,15 +24,15 @@ Conversion between DGGS and other grids or vector features may requires specific `xdggs` should also leverage the existing implementation of various DGGSs exposed to Python via 3rd-party libraries, here below referred as "backends". Preferrably, those backends would expose DGGS functionality in an efficient way as vectorized functions (e.g., working with NumPy arrays). -`xdggs` should try to follow standards and/or conventions defined for DGGS (see below) but MAY depart from them for practical reasons (e.g., common practices in popular DGGS libraries). +`xdggs` should try to follow standards and/or conventions defined for DGGS (see below). However, we may need to depart from them for practical reasons (e.g., common practices in popular DGGS libraries that do not fit well with the proposed standards). Strict adherence to a standard is welcome but shouldn't be enforced by all means. 
`xddgs` should also try to fulfill user needs in both GIS and Earth-System communities, which may each use DGGS in slightly different ways (see below). -When possible, `xddgs` operations should scale to fine resolutions (millions of cells) leveraging Xarray interoperability with Dask. This might not be always possible, though. Some operations (spatial indexing) may be hard to support at scale, which shouldn't be considered as high-priority. +When possible, `xddgs` operations should scale to fine resolutions (millions of cells) leveraging Xarray interoperability with Dask. This might not be always possible, though. Some operations (spatial indexing) may be hard to support at scale and it shouldn't be a high development priority. ## Non-Gloals -`xdggs` should focus on providing the core DGGS functionality and operations that are listed above. Higher-level operations that can be implemented by combining together those core operations are out-of-scope and should be implemented in downstream libraries. +`xdggs` should focus on providing the core DGGS functionality and operations that are listed above. Higher-level operations that can be implemented by combining together those core operations are out-of-scope and should be implemented in downstream libraries. Likewise, there may be many ways of resampling a grid to a DGGS ; `xdggs` should support the most common methods but not try to support _all of them_. `xddgs` should try not re-inventing the wheel and delegate to Xarray API when possible. @@ -48,11 +48,11 @@ Although some DGGS may handle both the spatial and temporal domains in a joint f ## Representation of DGGS data in xdggs -`xdggs` represents a DGGS as an Xarray Dataset or DataArray containing a 1-dimensional coordinate with cell ids as labels and with grid name & parameters as attributes. This coordinate is indexed using a custom, Xarray-compatible `DGGSIndex`. 
+`xdggs` represents a DGGS as an Xarray Dataset or DataArray containing a 1-dimensional coordinate with cell ids as labels and with grid name, resolution & parameters (optional) as attributes. This coordinate is indexed using a custom, Xarray-compatible `DGGSIndex`. -`xdggs` does not support a Dataset or DataArray with multiple coordinates indexed with a `DGGSIndex`. +`xdggs` does not support a Dataset or DataArray with multiple coordinates indexed with a `DGGSIndex` (only one DGGS per object is supported). -The cell ids in the 1-dimensional coordinate are all relative to the _exact same_ grid (i.e., same system, same parameter values and same resolution). For simplicity, `xdggs` does not support mixed-resolutions cell ids in the same coordinate. +The cell ids in the 1-dimensional coordinate are all relative to the _exact same_ grid, i.e., same grid system, same grid parameter values and same grid resolution! For simplicity, `xdggs` does not support cell ids of mixed-resolutions in the same coordinate. ### DGGSIndex @@ -64,8 +64,49 @@ The cell ids in the 1-dimensional coordinate are all relative to the _exact same - It partially implements the Xarray Index API to enable DGGS-aware alignment and selection - Calls are most often redirected to the encapsulated `PandasIndex` - Some pre/post checks or processing may be done, e.g., to prevent the alignment of two indexes that are not on the exact same grid. -- +- The `DGGSIndex.__init__()` constructor only requires cell ids and the name of the cell (array) dimension +- The `DGGSIndex.from_variables()` factory method parses the attributes of the given cell ids coordinates and creates the right index object (subclass) accordingly +- It declares a few abstract methods for grid-aware operations (e.g., convert between cell id and lat/lon point or geometry, etc.) 
+ - They can be implemented in subclasses (see below) + - They are either called from within the DGGSIndex or from the `.dggs` Dataset/DataArray accessors + +Each DGGS supported in `xddgs` has its own subclass of `DGGSIndex`, e.g., + +- `HealpixIndex` for Healpix +- `H3Index` for H3 +- ... + +A DGGSIndex can be set directly from a cell ids coordinate using the Xarray API: + +```python +import xarray as xr +import xdggs + +ds = xr.Dataset( + coords={"cell": ("cell", [...], {"grid_name": "h3", "resolution": 3})} +) + +# auto-detect grid system and parameters +ds.set_xindex("cell", xdggs.DGGSIndex) + +# set the grid system and parameters manually +ds.set_xindex("cell", xdggs.H3Index, resolution=3) +``` + +The DGGSIndex is set automatically when converting a gridded or vector dataset to a DGGS dataset (see below). ## Conversion ## Data Selection + +## Plotting + +Three approaches are possible (non-mutually exclusive): + +1. convert cell data into gridded or raster data (choose grid/raster resolution depending on the resolution of the rendered figure) and then reuse existing python plotting libraries (matplotlib, cartopy) maybe through xarray plotting API +2. convert cell data into vector data and plot the latter via, e.g., [xvec](https://github.com/xarray-contrib/xvec) or [geopandas](https://github.com/geopandas/geopandas) API +3. leverage libraries that support plotting DGGS data, e.g., [lonboard](https://github.com/developmentseed/lonboard) enables interactive plotting in Jupyter via deck.gl, which has support of H3 and S2 cell data. + +The first and last approaches may be efficient in plotting large DGGS data. For approach 1, we might want to investigate using [datashader](https://github.com/holoviz/datashader) to set both the resolution and raster extent dynamically. For approach 3 (lonboard), we would only need to transfer cell ids (tokens) and cell data and then let deck.gl render the cells efficiently in the web browser using the GPU. 
+ +Although the second approach may not scale as best as the other ones, it is versatile and may produce nice looking graphics. From 949842f95fee30a27fc654c75678fb14c76525a2 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Fri, 10 Nov 2023 15:19:59 +0100 Subject: [PATCH 03/14] wip --- design_doc.md | 44 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 43 insertions(+), 1 deletion(-) diff --git a/design_doc.md b/design_doc.md index 387cf55..fdf6e25 100644 --- a/design_doc.md +++ b/design_doc.md @@ -26,7 +26,7 @@ Conversion between DGGS and other grids or vector features may requires specific `xdggs` should try to follow standards and/or conventions defined for DGGS (see below). However, we may need to depart from them for practical reasons (e.g., common practices in popular DGGS libraries that do not fit well with the proposed standards). Strict adherence to a standard is welcome but shouldn't be enforced by all means. -`xddgs` should also try to fulfill user needs in both GIS and Earth-System communities, which may each use DGGS in slightly different ways (see below). +`xddgs` should also try to support applications in both GIS and Earth-System communities, which may each use DGGS in slightly different ways (see examples below). When possible, `xddgs` operations should scale to fine resolutions (millions of cells) leveraging Xarray interoperability with Dask. This might not be always possible, though. Some operations (spatial indexing) may be hard to support at scale and it shouldn't be a high development priority. @@ -42,10 +42,52 @@ Although some DGGS may handle both the spatial and temporal domains in a joint f ## Discrete Global Grid Systems +A Discrete Global Grid System (DGGS) can be roughly defined as a partitioning or tesselation of the entire Earth's surface into a finite number of "cells" or "zones". The shape and the properties of these cells generally vary from one DGGS to another. 
Most DGGSs are also hierarchical, i.e., the cells are aranged on recursively on multiple levels or resolutions. Follow the links in the subsection below for a more strict and detailled definition of a DGGS. + +DGGSs may be used in various ways, e.g., + +- Applications in Earth-system modelling seem to use DGGS as grids of contiguous, fixed-resolution cells covering the entire Earth's surface or a region of interest (figure 1). This makes easier the analysis of simulation outputs on large extents of the Earth's surface. DGGS may also be used as pyramid data (multiple stacked datasets at different resolutions) +- Applications in GIS often consist of using DGGS to display aggregated (vector) data as a collection of cells with a more complex spatial distribution (sparse) and sometimes with mixed resolutions (figures 2 and 3). + +![figure1](https://user-images.githubusercontent.com/4160723/281698490-31cb5ce8-64db-4bbf-a0d9-a8d6597bb2df.png) +Figure 1: DGGS data as contiguous cells of fixed resolution ([source](https://danlooo.github.io/DGGS.jl/)) + +![figure2](https://github.com/benbovy/xdggs/assets/4160723/430fd646-220a-4027-8212-1d927bb339ba) + +Figure 2: Data aggreated on DGGS (H3) sparsely distributed cells of fixed resolution ([source](https://medium.com/@jesse.b.nestler/how-to-convert-h3-cell-boundaries-to-shapely-polygons-in-python-f7558add2f63)). + +![image](https://github.com/benbovy/xdggs/assets/4160723/f2e4ec02-d88e-475e-9067-e93cf185923e) + +Figure 3: Raster data converted as DGGS (H3) cells of mixed resolutions ([source](https://github.com/nmandery/h3ronpy)). + ### Standards and Conventions +There no released standard yet regarding DGGS. However, there is a group working on a draft of OGC API for DGGS: https://github.com/opengeospatial/ogcapi-discrete-global-grid-systems. + +Another draft of DGGS specification can be found here: https://github.com/danlooo/dggs-data-spec. 
+ ### Backends (Python) +Several Python packages are currently available for handling certain DGGSs. They mostly consist of Python bindings of DGGS implementations written in C/C++/Rust. Here is a list (probably incomplete): + +- [healpy](https://github.com/healpy/healpy): Python bindings of [HealPix](https://healpix.sourceforge.io/) + - mostly vectorized +- [rhealpixdggs-py](https://github.com/manaakiwhenua/rhealpixdggs-py): Python/Numpy implementation of rHEALPix +- [h3-py](https://github.com/uber/h3-py): "official" Python bindings of [H3](https://h3geo.org/) + - experimental and incomplete vectorized version of H3's API (removed in the forthcoming v4 release?) +- [h3pandas](https://github.com/DahnJ/H3-Pandas): integration of h3-py (non-vectorized) with pandas and geopandas +- [h3ronpy](https://github.com/nmandery/h3ronpy): Python bindings of [h3o](https://github.com/HydroniumLabs/h3o) (Rust implementation of H3) + - provides high-level features (conversion, etc.) working with arrow, numpy (?), pandas/geopandas and polars +- [s2geometry](https://github.com/google/s2geometry): Python bindings generated with SWIG + - not vectorized nor very "pythonic" + - plans to switch to pybind11 (no time frame given) +- [spherely](https://github.com/benbovy/spherely): Python bindings of S2, mostly copying shapely's API + - provides numpy-like universal functions + - not yest ready for use +- [dggrid4py](https://github.com/allixender/dggrid4py): Python wrapper for [DGGRID](https://github.com/sahrk/DGGRID) + - DGGRID implements many DGGS variants! + - DGGRID current design makes it hardly reusable from within Python in an optimal way (the dggrid wrapper communicates with DGGRID through OS processes and I/O generated files) + ## Representation of DGGS data in xdggs `xdggs` represents a DGGS as an Xarray Dataset or DataArray containing a 1-dimensional coordinate with cell ids as labels and with grid name, resolution & parameters (optional) as attributes. 
This coordinate is indexed using a custom, Xarray-compatible `DGGSIndex`. From df328c18292843c99f0d9e7ed3a63712f220d55e Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Fri, 10 Nov 2023 16:22:57 +0100 Subject: [PATCH 04/14] more wip --- design_doc.md | 118 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 115 insertions(+), 3 deletions(-) diff --git a/design_doc.md b/design_doc.md index fdf6e25..3273af2 100644 --- a/design_doc.md +++ b/design_doc.md @@ -13,6 +13,7 @@ Examples of common DGGS features that `xdggs` should provide or facilitate: - convert between different cell id representations of a same DGGS (e.g., uint64 vs. string) - select data on a DGGS by cell ids or by geometries (spatial indexing) - change DGGS resolution (upgrade or downgrade) +- operations between similar DGGS (with auto-alignment) - re-organize cell ids (e.g., spatial shuffling / partitioning) - plotting @@ -88,7 +89,7 @@ Several Python packages are currently available for handling certain DGGSs. They - DGGRID implements many DGGS variants! - DGGRID current design makes it hardly reusable from within Python in an optimal way (the dggrid wrapper communicates with DGGRID through OS processes and I/O generated files) -## Representation of DGGS data in xdggs +## Representation of DGGS Data in Xdggs `xdggs` represents a DGGS as an Xarray Dataset or DataArray containing a 1-dimensional coordinate with cell ids as labels and with grid name, resolution & parameters (optional) as attributes. This coordinate is indexed using a custom, Xarray-compatible `DGGSIndex`. @@ -137,9 +138,120 @@ ds.set_xindex("cell", xdggs.H3Index, resolution=3) The DGGSIndex is set automatically when converting a gridded or vector dataset to a DGGS dataset (see below). 
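The auto-detection of the grid system from the coordinate attributes (`grid_name`, `resolution`) could be sketched as a small subclass registry. This is only an illustration in plain Python, not the actual `xdggs` implementation: `DGGSIndexSketch`, `H3IndexSketch` and `from_attrs` are hypothetical stand-ins, and a real index would inherit from `xarray.indexes.Index` and do this dispatch in `from_variables()`.

```python
class DGGSIndexSketch:
    """Stand-in for the base index; holds cell ids and the dimension name."""

    _registry = {}  # maps a grid name to the matching subclass

    def __init__(self, cell_ids, dim):
        self.cell_ids = list(cell_ids)
        self.dim = dim

    def __init_subclass__(cls, grid_name=None, **kwargs):
        # register each subclass under the grid name given in its class statement
        super().__init_subclass__(**kwargs)
        if grid_name is not None:
            DGGSIndexSketch._registry[grid_name] = cls

    @classmethod
    def from_attrs(cls, cell_ids, dim, attrs):
        """Pick the subclass from the coordinate attributes (auto-detection)."""
        grid_name = attrs.get("grid_name")
        try:
            index_cls = cls._registry[grid_name]
        except KeyError:
            raise ValueError(f"unknown or missing grid name: {grid_name!r}") from None
        return index_cls(cell_ids, dim, resolution=attrs["resolution"])


class H3IndexSketch(DGGSIndexSketch, grid_name="h3"):
    def __init__(self, cell_ids, dim, resolution):
        super().__init__(cell_ids, dim)
        self.resolution = resolution


index = DGGSIndexSketch.from_attrs(
    [1, 2, 3], "cell", {"grid_name": "h3", "resolution": 3}
)
```

With a registry of this kind, adding support for a new grid system would only require defining a new subclass.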
-## Conversion +## Conversion from/to DGGS -## Data Selection +DGGS data may be created from various sources, e.g., + +- regridded from a latitude/longitude rectilinear grid +- regridded from an unstructured grid +- regridded and reprojected from a raster +- aggregated from vector point data +- filled from polygon data + +Conversely, DGGS data may be converted to various forms, e.g., + +- regridded on a latitude/longitude rectilinear grid +- rasterized (resampling / projection) +- conversion to vector point data (cell centroids) +- conversion to vector polygon data (cell boundaries) + +Here is a tentative API based on Dataset/DataArray `.dggs` accessors (note: other options are discussed in [this issue](https://github.com/benbovy/xdggs/issues/13)): + +```python +# "convert" directly from existing cell ids coordinate to DGGS +# basically an alias to ds.set_xindex(..., DGGSIndex) +ds.dggs.from_cell_ids(...) + +# convert from lat/lon grid +ds.dggs.from_latlon_grid(...) + +# convert from raster +ds.dggs.from_raster(...) + +# convert from point data +ds.dggs.from_points(...) + +# convert from point data (with aggregation) +ds.dggs.from_points_aggregate(...) + +# convert from point data (with aggregation using Xarray API) +ds.dggs.from_points(...).groupby(...).mean() + +# convert to lat/lon grid +ds.dggs.to_latlon_grid(...) + +# convert to raster +ds.dggs.to_raster(...) + +# convert to points (cell centroids) +ds.dggs.to_points(...) + +# convert to polygons (cell boundaries) +ds.dggs.to_polygons(...) +``` + +In the API methods above, the "dggs" accessor name serves as a prefix. 
+ +Those methods are all called from an existing xarray Dataset (DataArray) and should all return another Dataset (DataArray): + +- Xarray has built-in support for regular grids +- for rasters, we could return objects that are [rioxarray](https://github.com/corteva/rioxarray)-friendly +- for vector data, we could return objects that are [xvec](https://github.com/xarray-contrib/xvec)-friendly (coordinate of shapely objects) +- etc. + +## Extracting DGGS Cell Geometries + +DGGS cell geometries could be extracted using the conversion methods proposed above. Alternatively, it would be convenient to get them directly as xarray DataArrays so that we can for example manually assign them as coordinates. + +The API may look like: + +```python +# return a DataArray of DGGS cell centroids as shapely.POINT objects +ds.dggs.cell_centroids() + +# return two DataArrays of DGGS cell centroids as lat/lon coordinates +ds.dggs.cell_centroids_coords() + +# return a DataArray of DGGS cell boundaries as shapely.POLYGON objects +ds.dggs.cell_boundaries() + +# return a DataArray of DGGS cell envelopes as shapely.POLYGON objects +ds.dggs.cell_envelopes() +``` + +## Indexing and Selecting DGGS Data + +### Selection by Cell IDs + +The simplest way to select DGGS data is by cell ids. This can be done directly using Xarray's API (`.sel()`): + +```python +ds.sel(cell=value) +``` + +where `value` can be a single cell id (integer or string/token?) or an array-like of cell ids. This is easily supported by the DGGSIndex encapsulating a PandasIndex. We might also want to support other `value` types, e.g., + +- assuming that DGGS cell ids are defined such that contiguous cells in space have contiguous id values, we could provide a `slice` to define a range of cell ids. +- DGGSIndex might implement some DGGS-aware logic such that it auto-detects if the given input cells are parent cells (lower DGGS resolution) and then selects all child cells accordingly. 
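Both ideas above (ranges of contiguous ids and parent-to-children selection) can be illustrated with a minimal sketch. It assumes HEALPix-style nested ordering, where a cell `p` at one level has the four children `4p` to `4p + 3` at the next finer level; other grid systems such as H3 encode their hierarchy differently. The function names are hypothetical and not part of `xdggs`.

```python
def children_range(parent_id: int, parent_level: int, child_level: int) -> range:
    """Ids of all descendants of `parent_id` at `child_level` (nested ordering)."""
    if child_level < parent_level:
        raise ValueError("child_level must be >= parent_level")
    factor = 4 ** (child_level - parent_level)  # each cell splits into 4 children
    return range(parent_id * factor, (parent_id + 1) * factor)


def select_children(cell_ids, parent_id, parent_level, child_level):
    """Keep the cell ids of a coordinate that fall inside the parent cell."""
    wanted = children_range(parent_id, parent_level, child_level)
    return [cid for cid in cell_ids if cid in wanted]


# level-1 coordinate with 48 cells (12 * 4**1); parent cell 2 lives at level 0
cells = list(range(48))
print(select_children(cells, parent_id=2, parent_level=0, child_level=1))
# → [8, 9, 10, 11]
```

Because the descendants form one contiguous range of ids, the same selection could in principle be expressed as a `slice` passed to `.sel()`.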
+ +We might want to select cell neighbors (i.e., return a new Dataset/DataArray with a new neighbor dimension), probably via a specific API (`.dggs` accessors). + +### Selection by Geometries (Spatial Indexing) + +Another useful way of selecting DGGS data is from input geometries (spatial queries), e.g., + +- Select all cells that are the closest to a collection of data points +- Select all cells that intersects with or are fully contained in a polygon + +This kind of selection requires spatial indexes as this can not be done with a pandas index (see [this issue](https://github.com/benbovy/xdggs/issues/16)). + +If we support spatial indexing directly in `xdggs`, we can hardly reuse Xarray's `.sel()` for spatial queries as `ds.sel(cell=shapely.Polygon(...))` would look quite confusing. Perhaps better would be to align with [xvec](https://github.com/xarray-contrib/xvec) and have a separate `.dggs.query()` method. + +Alternatively, we could just get away with the conversion and cell geometry extraction methods proposed above and leave spatial indexes/queries to [xvec](https://github.com/xarray-contrib/xvec). + +## Operations between similar DGGS (alignment) + +TODO ## Plotting From 156b125a280701ce5f2fba08fea603d10839870b Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Fri, 10 Nov 2023 16:48:15 +0100 Subject: [PATCH 05/14] add comment on zone vs. cell vs. pixel (grid unit) --- design_doc.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/design_doc.md b/design_doc.md index 3273af2..4c7d178 100644 --- a/design_doc.md +++ b/design_doc.md @@ -67,6 +67,8 @@ There no released standard yet regarding DGGS. However, there is a group working Another draft of DGGS specification can be found here: https://github.com/danlooo/dggs-data-spec. +There are some discrepancies between the proposed standards and popular DGGS libraries (H3, S2, HealPIX). 
For example regarding the term used to define a grid unit: The two specifications above use "zone", S2/H3 use "cell" and HealPIX uses "pixel". Although in this document we use "cell", the term to choose for `xddgs` is still open for discussion. + ### Backends (Python) Several Python packages are currently available for handling certain DGGSs. They mostly consist of Python bindings of DGGS implementations written in C/C++/Rust. Here is a list (probably incomplete): From b0be9e7163ae6ab237e53f9b5b6e349fc144ac4b Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Fri, 10 Nov 2023 16:50:07 +0100 Subject: [PATCH 06/14] typo --- design_doc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design_doc.md b/design_doc.md index 4c7d178..459e728 100644 --- a/design_doc.md +++ b/design_doc.md @@ -63,7 +63,7 @@ Figure 3: Raster data converted as DGGS (H3) cells of mixed resolutions ([source ### Standards and Conventions -There no released standard yet regarding DGGS. However, there is a group working on a draft of OGC API for DGGS: https://github.com/opengeospatial/ogcapi-discrete-global-grid-systems. +There is no released standard yet regarding DGGS. However, there is a group working on a draft of OGC API for DGGS: https://github.com/opengeospatial/ogcapi-discrete-global-grid-systems. Another draft of DGGS specification can be found here: https://github.com/danlooo/dggs-data-spec. From 0a7f1ab3147c8956d76e597d1a55716ae38ff01e Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Mon, 13 Nov 2023 09:44:11 +0100 Subject: [PATCH 07/14] Update design_doc.md Co-authored-by: Justus Magin --- design_doc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design_doc.md b/design_doc.md index 459e728..0db9eb6 100644 --- a/design_doc.md +++ b/design_doc.md @@ -1,6 +1,6 @@ # XDGGS - Design document -Xarrays extension for DGGS. Technical specifiactions. +Xarrays extension for DGGS. Technical specifications. 
## Goals From 4699c2a8c9798dcce1fab21f23496c49b4054f89 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Mon, 13 Nov 2023 09:39:36 +0100 Subject: [PATCH 08/14] typos xddgs -> xdggs --- design_doc.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/design_doc.md b/design_doc.md index 0db9eb6..ef5b357 100644 --- a/design_doc.md +++ b/design_doc.md @@ -4,7 +4,7 @@ Xarrays extension for DGGS. Technical specifications. ## Goals -The goal of the `xddgs` library is to facilitate working with multiple Discrete Global Grid Systems (DGGSs) via a unified, high-level and user-friendly API that is deeply integrated with [Xarray](https://xarray.dev). +The goal of the `xdggs` library is to facilitate working with multiple Discrete Global Grid Systems (DGGSs) via a unified, high-level and user-friendly API that is deeply integrated with [Xarray](https://xarray.dev). Examples of common DGGS features that `xdggs` should provide or facilitate: @@ -27,15 +27,15 @@ Conversion between DGGS and other grids or vector features may requires specific `xdggs` should try to follow standards and/or conventions defined for DGGS (see below). However, we may need to depart from them for practical reasons (e.g., common practices in popular DGGS libraries that do not fit well with the proposed standards). Strict adherence to a standard is welcome but shouldn't be enforced by all means. -`xddgs` should also try to support applications in both GIS and Earth-System communities, which may each use DGGS in slightly different ways (see examples below). +`xdggs` should also try to support applications in both GIS and Earth-System communities, which may each use DGGS in slightly different ways (see examples below). -When possible, `xddgs` operations should scale to fine resolutions (millions of cells) leveraging Xarray interoperability with Dask. This might not be always possible, though. 
Some operations (spatial indexing) may be hard to support at scale and it shouldn't be a high development priority. +When possible, `xdggs` operations should scale to fine resolutions (millions of cells) leveraging Xarray interoperability with Dask. This might not be always possible, though. Some operations (spatial indexing) may be hard to support at scale and it shouldn't be a high development priority. ## Non-Gloals `xdggs` should focus on providing the core DGGS functionality and operations that are listed above. Higher-level operations that can be implemented by combining together those core operations are out-of-scope and should be implemented in downstream libraries. Likewise, there may be many ways of resampling a grid to a DGGS ; `xdggs` should support the most common methods but not try to support _all of them_. -`xddgs` should try not re-inventing the wheel and delegate to Xarray API when possible. +`xdggs` should try not re-inventing the wheel and delegate to Xarray API when possible. `xdggs` does not implement any particular DGGS from scratch. `xdggs` does not aim at providing _all the functionality provided by each grid_ (e.g., some functionality may be very specific to one DGGS and not supported by other DGGSs, or some functionality may not be available yet in one DGGS Python backend). @@ -67,7 +67,7 @@ There is no released standard yet regarding DGGS. However, there is a group work Another draft of DGGS specification can be found here: https://github.com/danlooo/dggs-data-spec. -There are some discrepancies between the proposed standards and popular DGGS libraries (H3, S2, HealPIX). For example regarding the term used to define a grid unit: The two specifications above use "zone", S2/H3 use "cell" and HealPIX uses "pixel". Although in this document we use "cell", the term to choose for `xddgs` is still open for discussion. +There are some discrepancies between the proposed standards and popular DGGS libraries (H3, S2, HealPIX). 
For example regarding the term used to define a grid unit: The two specifications above use "zone", S2/H3 use "cell" and HealPIX uses "pixel". Although in this document we use "cell", the term to choose for `xdggs` is still open for discussion. ### Backends (Python) @@ -115,7 +115,7 @@ The cell ids in the 1-dimensional coordinate are all relative to the _exact same - They can be implemented in subclasses (see below) - They are either called from within the DGGSIndex or from the `.dggs` Dataset/DataArray accessors -Each DGGS supported in `xddgs` has its own subclass of `DGGSIndex`, e.g., +Each DGGS supported in `xdggs` has its own subclass of `DGGSIndex`, e.g., - `HealpixIndex` for Healpix - `H3Index` for H3 From dec5cf87203ef4f655e131bc2821ff4c9a81dcff Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Mon, 13 Nov 2023 09:47:28 +0100 Subject: [PATCH 09/14] Update design_doc.md Co-authored-by: Justus Magin --- design_doc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design_doc.md b/design_doc.md index ef5b357..7bb59b7 100644 --- a/design_doc.md +++ b/design_doc.md @@ -43,7 +43,7 @@ Although some DGGS may handle both the spatial and temporal domains in a joint f ## Discrete Global Grid Systems -A Discrete Global Grid System (DGGS) can be roughly defined as a partitioning or tesselation of the entire Earth's surface into a finite number of "cells" or "zones". The shape and the properties of these cells generally vary from one DGGS to another. Most DGGSs are also hierarchical, i.e., the cells are aranged on recursively on multiple levels or resolutions. Follow the links in the subsection below for a more strict and detailled definition of a DGGS. +A Discrete Global Grid System (DGGS) can be roughly defined as a partitioning or tessellation of the entire Earth's surface into a finite number of "cells" or "zones". The shape and the properties of these cells generally vary from one DGGS to another. 
Most DGGSs are also hierarchical, i.e., the cells are arranged recursively on multiple levels or resolutions. Follow the links in the subsection below for a more strict and detailed definition of a DGGS. DGGSs may be used in various ways, e.g., From 16bce8734d8e5ea6b90b99e0151e3ee6dec83820 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Mon, 13 Nov 2023 09:47:46 +0100 Subject: [PATCH 10/14] Update design_doc.md Co-authored-by: Justus Magin --- design_doc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design_doc.md b/design_doc.md index 7bb59b7..c21ce8d 100644 --- a/design_doc.md +++ b/design_doc.md @@ -47,7 +47,7 @@ A Discrete Global Grid System (DGGS) can be roughly defined as a partitioning or DGGSs may be used in various ways, e.g., -- Applications in Earth-system modelling seem to use DGGS as grids of contiguous, fixed-resolution cells covering the entire Earth's surface or a region of interest (figure 1). This makes easier the analysis of simulation outputs on large extents of the Earth's surface. DGGS may also be used as pyramid data (multiple stacked datasets at different resolutions) +- Applications in Earth-system modelling seem to use DGGS as grids of contiguous, fixed-resolution cells covering the entire Earth's surface or a region of interest (figure 1). This makes the analysis of simulation outputs on large extents of the Earth's surface easier. DGGS may also be used as pyramid data (multiple stacked datasets at different resolutions) - Applications in GIS often consist of using DGGS to display aggregated (vector) data as a collection of cells with a more complex spatial distribution (sparse) and sometimes with mixed resolutions (figures 2 and 3).
![figure1](https://user-images.githubusercontent.com/4160723/281698490-31cb5ce8-64db-4bbf-a0d9-a8d6597bb2df.png) From 7928a145939cabbac74448a4518cabdd5c9d0110 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Mon, 13 Nov 2023 09:48:52 +0100 Subject: [PATCH 11/14] Update design_doc.md Co-authored-by: Justus Magin --- design_doc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design_doc.md b/design_doc.md index c21ce8d..52f058d 100644 --- a/design_doc.md +++ b/design_doc.md @@ -86,7 +86,7 @@ Several Python packages are currently available for handling certain DGGSs. They - plans to switch to pybind11 (no time frame given) - [spherely](https://github.com/benbovy/spherely): Python bindings of S2, mostly copying shapely's API - provides numpy-like universal functions - - not yest ready for use + - not yet ready for use - [dggrid4py](https://github.com/allixender/dggrid4py): Python wrapper for [DGGRID](https://github.com/sahrk/DGGRID) - DGGRID implements many DGGS variants! - DGGRID current design makes it hardly reusable from within Python in an optimal way (the dggrid wrapper communicates with DGGRID through OS processes and I/O generated files) From 9d2bc53b8116ec5bec0a0c75cad64c1bb6ae9e47 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Mon, 13 Nov 2023 10:53:43 +0100 Subject: [PATCH 12/14] clarify how vertical vs. horizontal scaling --- design_doc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/design_doc.md b/design_doc.md index 52f058d..f8dd4d8 100644 --- a/design_doc.md +++ b/design_doc.md @@ -29,7 +29,7 @@ Conversion between DGGS and other grids or vector features may requires specific `xdggs` should also try to support applications in both GIS and Earth-System communities, which may each use DGGS in slightly different ways (see examples below). -When possible, `xdggs` operations should scale to fine resolutions (millions of cells) leveraging Xarray interoperability with Dask. This might not be always possible, though. 
Some operations (spatial indexing) may be hard to support at scale and it shouldn't be a high development priority. +When possible, `xdggs` operations should scale to fine DGGS resolutions (millions of cells). This can be done vertically using backends with vectorized bindings of DGGS implementations written in low-level languages and/or horizontally leveraging Xarray interoperability with Dask. Some operations like spatial indexing may be hard to scale horizontally, though. For the latter, we should probably focus `xdggs` development first towards good vertical scaling before figuring out how they can be scaled horizontally (for reference, see [dask-geopandas](https://github.com/geopandas/dask-geopandas) and [spatialpandas](https://github.com/holoviz/spatialpandas)). ## Non-Gloals From 5a97e2c7c405bc8bc9cfdfd801b1cd330046a5ab Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Mon, 13 Nov 2023 11:10:27 +0100 Subject: [PATCH 13/14] rephrase plotting large DGGS data. --- design_doc.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/design_doc.md b/design_doc.md index f8dd4d8..66001da 100644 --- a/design_doc.md +++ b/design_doc.md @@ -263,6 +263,4 @@ Three approaches are possible (non-mutually exclusive): 2. convert cell data into vector data and plot the latter via, e.g., [xvec](https://github.com/xarray-contrib/xvec) or [geopandas](https://github.com/geopandas/geopandas) API 3. leverage libraries that support plotting DGGS data, e.g., [lonboard](https://github.com/developmentseed/lonboard) enables interactive plotting in Jupyter via deck.gl, which has support of H3 and S2 cell data. -The first and last approaches may be efficient in plotting large DGGS data. For approach 1, we might want to investigate using [datashader](https://github.com/holoviz/datashader) to set both the resolution and raster extent dynamically. 
For approach 3 (lonboard), we would only need to transfer cell ids (tokens) and cell data and then let deck.gl render the cells efficiently in the web browser using the GPU. - -Although the second approach may not scale as best as the other ones, it is versatile and may produce nice looking graphics. +The 3rd approach (lonboard) is efficient for plotting large DGGS data: we would only need to transfer cell ids (tokens) and cell data and then let deck.gl render the cells efficiently in the web browser using the GPU. For approach 1, we might want to investigate using [datashader](https://github.com/holoviz/datashader) to set both the resolution and raster extent dynamically. Likewise for approach 2, we could dynamically downgrade the DGGS resolution and aggregate the data before converting it into vector data in order to allow (interactive) plotting of large DGGS data. From f47739d6ee7ee63450a0cdf935b35b869c48ebb8 Mon Sep 17 00:00:00 2001 From: Benoit Bovy Date: Mon, 13 Nov 2023 11:51:27 +0100 Subject: [PATCH 14/14] add sections on hierarchical DGGS and alignment --- design_doc.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/design_doc.md b/design_doc.md index 66001da..02fe3dd 100644 --- a/design_doc.md +++ b/design_doc.md @@ -251,9 +251,17 @@ If we support spatial indexing directly in `xdggs`, we can hardly reuse Xarray's Alternatively, we could just get away with the conversion and cell geometry extraction methods proposed above and leave spatial indexes/queries to [xvec](https://github.com/xarray-contrib/xvec). +## Handling hierarchical DGGS + +Even though the DGGS coordinate of a Dataset (DataArray) is limited to cell ids of the same resolution (no mixed-resolutions), `xdggs` can still provide functionality to deal with the hierarchical aspect of DGGSs. + +Selection by parent cell ids may be one example (see section above).
Another example would be to have utility methods to explicitly change the grid resolution (see [issue #18](https://github.com/benbovy/xdggs/issues/18) for more details and discussion). + ## Operations between similar DGGS (alignment) -TODO +Computation involving multiple DGGS datasets (or dataarrays) often requires aligning them. Sometimes this can be trivial (same DGGS with same resolution and parameter values) but in other cases this can be complex (requires regridding or a change of DGGS resolution). + +In Xarray, alignment of datasets (dataarrays) is done primarily via their indexes. Since a DGGSIndex wraps a PandasIndex, it is easy to support alignment by cell ids (trivial case). At the very least, a DGGSIndex should raise an error when trying to align cell ids that do not refer to the exact same DGGS (i.e., same system, resolution and parameter values). For the complex cases, it may be preferable to handle them manually instead of trying to make the DGGSIndex perform the alignment automatically. Regridding and/or changing the resolution of a DGGS (+ data aggregation) often depends heavily on the use case, so it might be hard to find a default behavior. Also, performing those operations automatically and implicitly would probably feel too magical. That being said, in order to help alignment `xdggs` may provide some utility methods to change the grid resolution (see section above) and/or to convert from one DGGS to another. ## Plotting