Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design docs #20

Merged
merged 14 commits into from
Nov 13, 2023
Merged

Design docs #20

merged 14 commits into from
Nov 13, 2023

Conversation

benbovy
Copy link
Member

@benbovy benbovy commented Nov 10, 2023

I've tried to put together thoughts and ideas found in the issues of this repository and in the sprint doc (https://hackmd.io/UBM5L6YNRlG73e3eVo6vOg) in a design document.

This is still work in progress (although already quite detailed). Feedback and suggestions are very welcome!

Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial feedback with a focus on typos. Anything that requires more discussion we can move to the discussions thread you were planning to create.

Three approaches are possible (non-mutually exclusive):

1. convert cell data into gridded or raster data (choose grid/raster resolution depending on the resolution of the rendered figure) and then reuse existing python plotting libraries (matplotlib, cartopy) maybe through xarray plotting API
2. convert cell data into vector data and plot the latter via, e.g., [xvec](https://github.com/xarray-contrib/xvec) or [geopandas](https://github.com/geopandas/geopandas) API
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might become sufficiently performant (I think James Bednar had a demo of hvplot doing something like this somewhere), so we might also want to consider it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean hvplot plotting vector data like https://hvplot.holoviz.org/reference/geopandas/polygons.html?

Copy link
Collaborator

@keewis keewis Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant this: https://discourse.holoviz.org/t/dealing-high-resolution-vector-plots-xarray-holoviz/6310

Edit: but in that case it's a vector plot, not arbitrary geometries, so maybe not quite there yet?
Edit2: and this (scatter plot): https://discourse.holoviz.org/t/show-individual-points-when-zoomed-else-datashade/2204/14?u=ahuang11

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the links! Yes it looks like a similar approach to what I suggest by using Datashader with rasterization of DGGS to a dynamic resolution and extent. This could actually be applied for polygon plots: downgrade the DGGS resolution (#18), then aggregate data, then convert it to vector data, then plot it. It would be nice to have some high-level API to do that interactively instead of requiring the user doing all of those steps manually.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rephrased the paragraph below in 5a97e2c to reflect that comment.

design_doc.md Outdated Show resolved Hide resolved

Examples of common DGGS features that `xdggs` should provide or facilitate:

- convert a DGGS from/to another grid (e.g., a DGGS, a latitude/longitude rectilinear grid, a raster grid, an unstructured mesh)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the discussion on Thursday, we pretty much agreed that this doesn't need to live in xdggs (though as part of the work on xdggs we need to ensure they exist). xesmf and pyresample seem like good candidates.

Edit: maybe that's what you meant with the section in the "non-goals"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what I mean here is that xdggs should provide the API for doing these operations (or at least some core API that should facilitate it). xdggs does not necessarily need to implement the logic: if there is a well-maintained 3rd party library that does that well, it is preferable to rely on it instead of reinventing the wheel.

About xesmf and pyresample, do they handle complex cell geometries (non-rectilinear) on the sphere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1st sentence of the "goals" section is particularly important: "unified, high-level and user-friendly API that is deeply integrated with Xarray". It is all about the API so I expect that xdggs would mainly consist of adapter code that extend the Xarray API and calls routines from 3rd party libs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly a API design issue: does it make sense to provide a interpolate accessor method if we simply forward to another library? Oftentimes, using that library directly is more efficient, since at least in the case of xesmf it allows applying the same multiple times without a recomputation of weights. (I don't think there's a universal answer here, it depends on the library and how it works)

About xesmf and pyresample, do they handle complex cell geometries (non-rectilinear) on the sphere?

not sure, but the point is that if they don't we could (try to) extend them instead of implementing this ourselves. Note that those two libraries were examples, if there's a different library that does this already I wouldn't mind using that instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is all about the API so I expect that xdggs would mainly consist of adapter code that extend the Xarray API and calls routines from 3rd party libs.

this would certainly be very convenient, and I didn't think of alternatives, yet (just to have more options to choose from). That will take me some time, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly a API design issue: does it make sense to provide a interpolate accessor method if we simply forward to another library?

Perhaps not an interpolate accessor but conversion methods in a dggs accessor yes. I think it often makes sense since we're not only regridding but we're also conforming the output to some conventional (or standard?) DGGS representation used by xdggs (+ building/setting an index).

It is also about discoverability. We could certainly add tutorials showing how to regrid lat/lon 2D grids to DGGS by directly calling xesmf, pyreseample, etc., and then set the DGGS index. But I find it quite conventional to have some high-level API layer in xdggs to do the regridding and conversion all at once, since it is a very common operation.

Want to create a new DGGS Dataset from another Dataset? Just type ds.dggs.from_<tab-completion> and pick up the conversion method that is relevant for your dataset :).

at least in the case of xesmf it allows applying the same multiple times without a recomputation of weights.

I think we could arrange the xdggs API so that it allows reusing interpolation weights. That doesn't seem a big issue to me.

if there's a different library that does this already I wouldn't mind using that instead.

Totally agreed, we are on the same page!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I also tried to state in the doc (non-goals section): "Likewise, there may be many ways of resampling a grid to a DGGS ; xdggs should support the most common methods but not try to support all of them."

While I think we should really have conversion methods for converting lat/lon 2D grid to DGGS and vice-versa that handle 80% of the use-cases, it shouldn't prevent reusing directly xesmf, pyresample, etc. directly for the more advanced use cases.

design_doc.md Outdated

`xddgs` should also try to support applications in both GIS and Earth-System communities, which may each use DGGS in slightly different ways (see examples below).

When possible, `xddgs` operations should scale to fine resolutions (millions of cells) leveraging Xarray interoperability with Dask. This might not be always possible, though. Some operations (spatial indexing) may be hard to support at scale and it shouldn't be a high development priority.
Copy link
Collaborator

@keewis keewis Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's also the idea that whenever necessary (i.e. if the python implementation turns out to be too slow), slow operations need to be written in a lower-level language

Edit: though if we can delegate all the low-level stuff we might get away with a pure-python package here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for low-level languages / vectorized bindings I also tried to emphasize that in the "goals" and "backends" sections.

Here this sentence is more about horizontal scaling (I can clarify that). While there are some operations that are trivial to support with Dask, other operations like spatial indexing / partitioning / shuffling / sorting are much harder to implement in a distributed fashion. That is not impossible, though, I think we could for example look at dask-geopandas. This is pretty hard work, so I wanted to write that it is probably wiser to first try supporting spatial indexing for small amounts of data and then after see how we can scale it horizontally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this paragraph in 9d2bc53

design_doc.md Outdated Show resolved Hide resolved
design_doc.md Outdated Show resolved Hide resolved
design_doc.md Outdated Show resolved Hide resolved
design_doc.md Outdated Show resolved Hide resolved

`xdggs.DGGSIndex` is the base class for all Xarray DGGS-aware indexes. It inherits from `xarray.indexes.Index` and has the following specifications:

- It encapsulates an `xarray.indexes.PandasIndex` built from cell ids so that selection and alignment by cell id is possible
Copy link
Collaborator

@keewis keewis Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might need to change that for hierarchical access by cell id

Edit: this is probably discussed immediately below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean exactly by "hierarchical access by cell id"? Do you have an example in mind?

In the "selection by cell ids" section there's a small reference to selection using ids on another resolution. Maybe we could make it work with a PandasIndex? (i.e., get parent/child cell ids before/after passing ids as query to the pandas index).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, that's what I meant (hence the edit).

benbovy and others added 5 commits November 13, 2023 09:44
Co-authored-by: Justus Magin <[email protected]>
Co-authored-by: Justus Magin <[email protected]>
Co-authored-by: Justus Magin <[email protected]>
Co-authored-by: Justus Magin <[email protected]>
@keewis
Copy link
Collaborator

keewis commented Nov 13, 2023

it looks like the remaining points will take a while to discuss, so I don't mind merging this now and continuing the discussion elsewhere (your choice).

@benbovy
Copy link
Member Author

benbovy commented Nov 13, 2023

Many thanks @keewis for the feedback. I addressed all your comments that were easily fixable.

I'm +1 for merging this now and continue the discussion in the Discussions section of this repository. Follow-up PRs are also welcome to improve the document!

@keewis
Copy link
Collaborator

keewis commented Nov 13, 2023

great, this looks good to me, and let's proceed as you suggested.

Edit: though you'll have to merge yourself, as I don't have the permissions to do that

@benbovy benbovy merged commit 7f501fd into main Nov 13, 2023
@benbovy
Copy link
Member Author

benbovy commented Nov 13, 2023

Discussion opened there: #22

@benbovy benbovy deleted the design-docs branch November 27, 2023 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants