WIP: Add array SVG image and table to _repr_html_ #9301
If jinja is too heavy of a dependency, the next best thing would probably be to see if this could go into https://github.com/benbovy/xarray-fancy-repr. Would love to hear any thoughts from @benbovy 🙏
Thanks @maxrjones ! I'd be +0.5 on this if folks think it makes the display clearer. For maintainability — to what extent is |
This is great @maxrjones!
FYI this is no longer a dask-specific question, as cubed also has these html array reprs (code taken from dask but adjusted enough that it doesn't make sense for them to live in the same place).
The primary modification was adding labels along axes for the dimension names and adjusting offsets to allow space for those labels. Although not used right now, I also added flexibility to the colors with the idea that it would be nice to use different colors for coords vs. data variables. All of that could possibly be handled by modifying the SVGs returned from Dask. I know cubed more directly vendors this code for its array |
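As an illustration of the modification described above (dimension-name labels along the axes, with an offset reserved for them and a configurable fill color), here is a minimal sketch for the 2D case. It is not the code from this PR or from Dask: the function name `svg_rect_with_labels` and the parameters `max_size`, `offset`, and `fill` are hypothetical and chosen only for this example.

```python
from html import escape


def svg_rect_with_labels(shape, dims, max_size=120.0, offset=24.0, fill="#ECB172"):
    """Draw a 2D array as a rectangle with one dimension label per axis.

    Hypothetical helper for illustration only; `offset` is the margin
    reserved on the top and left for the labels.
    """
    if len(shape) != 2 or len(dims) != 2:
        raise ValueError("this sketch only handles 2D arrays")
    nrows, ncols = shape
    # Scale the longest side to max_size and keep the aspect ratio.
    scale = max_size / max(nrows, ncols, 1)
    height, width = nrows * scale, ncols * scale
    row_dim, col_dim = (escape(str(d)) for d in dims)
    label_x = offset - 6
    mid_y = offset + height / 2
    return (
        f'<svg width="{width + offset:.0f}" height="{height + offset:.0f}" '
        'xmlns="http://www.w3.org/2000/svg">'
        # The rectangle is shifted by `offset` to leave room for the labels.
        f'<rect x="{offset:.1f}" y="{offset:.1f}" width="{width:.1f}" '
        f'height="{height:.1f}" fill="{fill}" stroke="black"/>'
        # Column dimension name, centered above the rectangle.
        f'<text x="{offset + width / 2:.1f}" y="{label_x:.1f}" '
        f'text-anchor="middle">{col_dim}</text>'
        # Row dimension name, rotated and centered along the left edge.
        f'<text x="{label_x:.1f}" y="{mid_y:.1f}" text-anchor="middle" '
        f'transform="rotate(-90 {label_x:.1f} {mid_y:.1f})">{row_dim}</text>'
        "</svg>"
    )


svg = svg_rect_with_labels((10, 20), ("y", "x"))
```

Passing a different `fill` for coordinates and data variables would be one way to get the color distinction mentioned above.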
😆 Tom, our wavelengths are way too similar today, but you're just faster! I swear I didn't see your comment before hitting enter. Helpful to know that you don't directly vendor it, though.
The PR which added the html repr to cubed is here: cubed-dev/cubed#216
Looking back at the PR now I have a bit lower confidence in this claim. Generally though I think having a separate repo just to avoid duplicating code in N=2 places seems like overkill, but if similar code lived in xarray for non-chunked arrays it would be N=3... |
iirc dask doesn't like to have any external dependencies, so right now the main potential users would be xarray and cubed. But maybe it'd be worthwhile asking somewhere if it could be useful as an optional NumPy / CuPy dependency that monkeypatches

I'm wondering what the plan was with the formatting modules and NamedArray, because the SVG representation could be useful for NamedArrays, which might motivate not including it in modules that would get left behind in Xarray.
This is just one template, I bet you can do it pretty easily without jinja.

My main concern is that at first glance, this looks like a dask array, and that's going to confuse intermediate-to-experienced users quite a bit. To that end: we should definitely change some colors, explicitly add the array type, and maybe change some aspect ratio parameters to make it look visually different.

It'd also be nice to figure out how to show the actual values, since we have them. It'd be nice not to have to expand one more level to see any small arrays.
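To make the "without jinja" suggestion concrete, here is a rough sketch that builds the Bytes / Shape / Data type table with f-strings and the standard library only, and adds an explicit array-type row as suggested above. The names `format_bytes` and `array_table_html` are hypothetical and do not come from this PR, Dask, or xarray.

```python
from html import escape


def format_bytes(nbytes: int) -> str:
    """Format a byte count with binary prefixes (hypothetical helper)."""
    if nbytes < 1024:
        return f"{nbytes} B"
    size = float(nbytes)
    for unit in ("kiB", "MiB", "GiB"):
        size /= 1024
        if size < 1024:
            return f"{size:.2f} {unit}"
    return f"{size / 1024:.2f} TiB"


def array_table_html(shape, dtype, nbytes, array_type) -> str:
    """Render the array summary table using plain string formatting."""
    rows = [
        ("Array type", str(array_type)),
        ("Bytes", format_bytes(nbytes)),
        ("Shape", str(tuple(shape))),
        ("Data type", str(dtype)),
    ]
    # Escape every cell so that unusual dtype or type names cannot break the HTML.
    body = "".join(
        f"<tr><th>{escape(label)}</th><td>{escape(value)}</td></tr>"
        for label, value in rows
    )
    return f"<table><tbody>{body}</tbody></table>"


html_table = array_table_html((10, 10), "float64", 800, "numpy.ndarray")
```

For a NumPy array `arr`, the arguments would be `arr.shape`, `arr.dtype`, and `arr.nbytes`; the sketch takes them separately so that it runs without NumPy installed.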
Agreed with @dcherian. Also things like dimension labels would be nice to have for dask-backed variables as well. That's probably beyond the scope of this PR, but I wonder if a more general solution would be to have an additional icon at the right of the variable inline repr (there is still some room for that) to show the same svg representation for all duck arrays, assuming these all have the common metadata we need for this representation: shape, etc. So we can leave the current "database" icon for showing the original repr of the wrapped array.
Thanks for all the feedback, everyone! I'll have some time next week to work on the next iteration based on these recommendations. My goals will be:
I will also explore the feasibility of two additional ideas:
I'd strongly support exploring this, actually! While the original reprs of Dask arrays, NumPy arrays, etc. have limitations in the context of Xarray, users are familiar with them. For array containers / wrappers like

In terms of UI/UX a separate icon button next to the other ones has the advantage of not introducing another nested level compared to the

In terms of implementation a separate icon is pretty easily maintainable. It shouldn't be hard to add a new one as the html formatting has been written in a pretty composable way already, e.g., it would look like adding those lines in

    def summarize_variable(name, var, is_index=False, dtype=None) -> str:
        ...
        cube_id = "cube-" + str(uuid.uuid4())
        ...
        cube_repr = svg_cube_repr_html(var)
        ...
        cube_icon = _icon("icon-cube")
        return (
            ...
            f"<input id='{cube_id}' class='xr-var-cube-in' type='checkbox'>"
            f"<label for='{cube_id}' title='Show/Hide data repr'>"
            f"{cube_icon}</label>"
            ...
            f"<div class='xr-var-cube'>{cube_repr}</div>"
        )

One concern I have with this approach, though, is the performance impact of generating and including another SVG image for each variable in the static html repr, which might become problematic for big Datasets with many variables. Such a UI element could be generated on demand, as it is hidden by default; however, there is no way to do that other than having a dynamic widget like in https://github.com/benbovy/xarray-fancy-repr.

That all being said, thanks @maxrjones for working on this PR. Overall I think it would be a nice improvement!
Thanks for all these thoughts! A solution to the performance impact could be to make adding that separate icon configurable via |
In the SciPy tutorial, I found that the lack of visual representation for NumPy backed arrays made it more difficult for new users to understand the Xarray data model. This PR uses Dask's approach (started in dask/dask#4794) to add a table with Bytes, Shape, and Data type, along with an SVG for any array whose backend doesn't already implement _repr_html_. The code's pretty rough (e.g., pre-commit fails) and the layout could use improvement, but I wanted to find out if it's of interest before working on it more.

Relates to #8690, but only for html representation.
Here's a notebook as an example and here's what the repr currently looks like:
cc @scottyhq @JessicaS11 @TomNicholas
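The dispatch rule in the description (use the backend's own `_repr_html_` when it exists, otherwise fall back to a generated repr) could be sketched roughly as follows. This is an assumption-laden illustration, not the PR's implementation: `data_repr_html`, `plain_fallback`, and the `WithHtml` class are hypothetical names, and the fallback here is a plain escaped `<pre>` block standing in for the table and SVG.

```python
from html import escape


def data_repr_html(data, fallback):
    """Prefer the wrapped array's own HTML repr; otherwise call `fallback`."""
    native = getattr(data, "_repr_html_", None)
    if callable(native):
        # Backends such as Dask already provide a rich repr, so reuse it.
        return native()
    # Anything else gets the generic repr supplied by the caller.
    return fallback(data)


def plain_fallback(data):
    """Stand-in for the table-plus-SVG repr: an escaped text repr."""
    return f"<pre>{escape(repr(data))}</pre>"


class WithHtml:
    """Toy object that implements _repr_html_, for demonstration only."""

    def _repr_html_(self):
        return "<b>native repr</b>"


native_html = data_repr_html(WithHtml(), plain_fallback)
fallback_html = data_repr_html([1, 2], plain_fallback)
```

Passing the fallback as a callable keeps the dispatch independent of how the generic repr is built, so the same check would work whether the fallback is static HTML or something else.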
whats-new.rst
api.rst