-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design docs #20
Design docs #20
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initial feedback with a focus on typos. Anything that requires more discussion we can move to the discussions thread you were planning to create.
Three approaches are possible (non-mutually exclusive): | ||
|
||
1. convert cell data into gridded or raster data (choose grid/raster resolution depending on the resolution of the rendered figure) and then reuse existing python plotting libraries (matplotlib, cartopy) maybe through xarray plotting API | ||
2. convert cell data into vector data and plot the latter via, e.g., [xvec](https://github.com/xarray-contrib/xvec) or [geopandas](https://github.com/geopandas/geopandas) API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might become sufficiently performant (I think James Bednar had a demo of hvplot
doing something like this somewhere), so we might also want to consider it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean hvplot plotting vector data like https://hvplot.holoviz.org/reference/geopandas/polygons.html?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant this: https://discourse.holoviz.org/t/dealing-high-resolution-vector-plots-xarray-holoviz/6310
Edit: but in that case it's a vector plot, not arbitrary geometries, so maybe not quite there yet?
Edit2: and this (scatter plot): https://discourse.holoviz.org/t/show-individual-points-when-zoomed-else-datashade/2204/14?u=ahuang11
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the links! Yes it looks like a similar approach to what I suggest by using Datashader with rasterization of DGGS to a dynamic resolution and extent. This could actually be applied for polygon plots: downgrade the DGGS resolution (#18), then aggregate data, then convert it to vector data, then plot it. It would be nice to have some high-level API to do that interactively instead of requiring the user doing all of those steps manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rephrased the paragraph below in 5a97e2c to reflect that comment.
|
||
Examples of common DGGS features that `xdggs` should provide or facilitate: | ||
|
||
- convert a DGGS from/to another grid (e.g., a DGGS, a latitude/longitude rectilinear grid, a raster grid, an unstructured mesh) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the discussion on Thursday, we pretty much agreed that this doesn't need to live in xdggs
(though as part of the work on xdggs
we need to ensure they exist). xesmf
and pyresample
seem like good candidates.
Edit: maybe that's what you meant with the section in the "non-goals"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, what I mean here is that xdggs
should provide the API for doing these operations (or at least some core API that should facilitate it). xdggs
does not necessarily need to implement the logic: if there is a well-maintained 3rd party library that does that well, it is preferable to rely on it instead of reinventing the wheel.
About xesmf
and pyresample
, do they handle complex cell geometries (non-rectilinear) on the sphere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 1st sentence of the "goals" section is particularly important: "unified, high-level and user-friendly API that is deeply integrated with Xarray". It is all about the API so I expect that xdggs
would mainly consist of adapter code that extend the Xarray API and calls routines from 3rd party libs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly a API design issue: does it make sense to provide a interpolate
accessor method if we simply forward to another library? Oftentimes, using that library directly is more efficient, since at least in the case of xesmf
it allows applying the same multiple times without a recomputation of weights. (I don't think there's a universal answer here, it depends on the library and how it works)
About
xesmf
andpyresample
, do they handle complex cell geometries (non-rectilinear) on the sphere?
not sure, but the point is that if they don't we could (try to) extend them instead of implementing this ourselves. Note that those two libraries were examples, if there's a different library that does this already I wouldn't mind using that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is all about the API so I expect that xdggs would mainly consist of adapter code that extend the Xarray API and calls routines from 3rd party libs.
this would certainly be very convenient, and I didn't think of alternatives, yet (just to have more options to choose from). That will take me some time, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly a API design issue: does it make sense to provide a interpolate accessor method if we simply forward to another library?
Perhaps not an interpolate accessor but conversion methods in a dggs accessor yes. I think it often makes sense since we're not only regridding but we're also conforming the output to some conventional (or standard?) DGGS representation used by xdggs
(+ building/setting an index).
It is also about discoverability. We could certainly add tutorials showing how to regrid lat/lon 2D grids to DGGS by directly calling xesmf, pyreseample, etc., and then set the DGGS index. But I find it quite conventional to have some high-level API layer in xdggs to do the regridding and conversion all at once, since it is a very common operation.
Want to create a new DGGS Dataset from another Dataset? Just type ds.dggs.from_<tab-completion>
and pick up the conversion method that is relevant for your dataset :).
at least in the case of xesmf it allows applying the same multiple times without a recomputation of weights.
I think we could arrange the xdggs
API so that it allows reusing interpolation weights. That doesn't seem a big issue to me.
if there's a different library that does this already I wouldn't mind using that instead.
Totally agreed, we are on the same page!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I also tried to state in the doc (non-goals section): "Likewise, there may be many ways of resampling a grid to a DGGS ; xdggs should support the most common methods but not try to support all of them."
While I think we should really have conversion methods for converting lat/lon 2D grid to DGGS and vice-versa that handle 80% of the use-cases, it shouldn't prevent reusing directly xesmf, pyresample, etc. directly for the more advanced use cases.
design_doc.md
Outdated
|
||
`xddgs` should also try to support applications in both GIS and Earth-System communities, which may each use DGGS in slightly different ways (see examples below). | ||
|
||
When possible, `xddgs` operations should scale to fine resolutions (millions of cells) leveraging Xarray interoperability with Dask. This might not be always possible, though. Some operations (spatial indexing) may be hard to support at scale and it shouldn't be a high development priority. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's also the idea that whenever necessary (i.e. if the python implementation turns out to be too slow), slow operations need to be written in a lower-level language
Edit: though if we can delegate all the low-level stuff we might get away with a pure-python package here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for low-level languages / vectorized bindings I also tried to emphasize that in the "goals" and "backends" sections.
Here this sentence is more about horizontal scaling (I can clarify that). While there are some operations that are trivial to support with Dask, other operations like spatial indexing / partitioning / shuffling / sorting are much harder to implement in a distributed fashion. That is not impossible, though, I think we could for example look at dask-geopandas. This is pretty hard work, so I wanted to write that it is probably wiser to first try supporting spatial indexing for small amounts of data and then after see how we can scale it horizontally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this paragraph in 9d2bc53
|
||
`xdggs.DGGSIndex` is the base class for all Xarray DGGS-aware indexes. It inherits from `xarray.indexes.Index` and has the following specifications: | ||
|
||
- It encapsulates an `xarray.indexes.PandasIndex` built from cell ids so that selection and alignment by cell id is possible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might need to change that for hierarchical access by cell id
Edit: this is probably discussed immediately below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean exactly by "hierarchical access by cell id"? Do you have an example in mind?
In the "selection by cell ids" section there's a small reference to selection using ids on another resolution. Maybe we could make it work with a PandasIndex? (i.e., get parent/child cell ids before/after passing ids as query to the pandas index).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, that's what I meant (hence the edit).
Co-authored-by: Justus Magin <[email protected]>
Co-authored-by: Justus Magin <[email protected]>
Co-authored-by: Justus Magin <[email protected]>
Co-authored-by: Justus Magin <[email protected]>
it looks like the remaining points will take a while to discuss, so I don't mind merging this now and continuing the discussion elsewhere (your choice). |
Many thanks @keewis for the feedback. I addressed all your comments that were easily fixable. I'm +1 for merging this now and continue the discussion in the |
great, this looks good to me, and let's proceed as you suggested. Edit: though you'll have to merge yourself, as I don't have the permissions to do that |
Discussion opened there: #22 |
I've tried to put together thoughts and ideas found in the issues of this repository and in the sprint doc (https://hackmd.io/UBM5L6YNRlG73e3eVo6vOg) in a design document.
This is still work in progress (although already quite detailed). Feedback and suggestions are very welcome!