Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interactive plotting with lonboard #67

Merged
merged 32 commits into from
Oct 14, 2024
Merged

Conversation

keewis
Copy link
Collaborator

@keewis keewis commented Oct 2, 2024

As described in #11, this allows us to plot a single map using lonboard. This works by computing the cell boundaries in advance, then use lonboard.SolidPolygonLayer to visualize them.

In the future, we might be able to extend this by attaching sliders for any additional dimensions, as lonboard is really fast when it comes to changing the color of the polygons.

Also, as a future task, we should try to allow making use of the native support for H3 cell ids in deck.gl (and extend that to support healpix and maybe other DGGS), which should get rid of a lot of the data transfer between the python kernel and the browser, and computing cell boundaries is also very expensive.

Still missing:

  • tests

xdggs/accessor.py Outdated Show resolved Hide resolved

Returns
-------
map : lonboard.Map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like you have docs yet, but if you use sphinx or mkdocs (with mkdocstrings), lonboard implements intersphinx, so you should be able to set it up so the return value is a hyperlink to the lonboard docs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the hint. I'll probably use sphinx to set up the docs, so I'll definitely make use of that!

cmap="viridis",
center=None,
):
import geopandas as gpd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you have interest, it would be possible to implement this conversion in a way that doesn't require GeoPandas (or pyarrow), by using raw numpy buffers and arro3, a more minimal implementation of Arrow in Python that Lonboard adds as a dependency in the next version.

This would also be faster to render as it doesn't have to go through GeoPandas (and shapely geometries).

But it would be a little more complex code here. I can give you some pointers if you want.

Copy link
Collaborator Author

@keewis keewis Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd definitely be interested.

Right now, what we're doing is ask the user to pre-compute cell vertices from the cell ids (which can have various forms for different DGGS, but most often are int64 or uint64 arrays), then convert them into shapely polygons. This is one of the things that take the longest other than converting to geopandas and constructing the SolidPolygonLayer, so if we could instead extend deck.gl to directly support them (like it supports H3 ids), that would be best. However, I suppose that will either take a long time and require someone to actually do the work, especially since there's so many different kinds of DGGS (and I'm not confident in my TypeScript / JavaScript skills so I won't be able to work on this myself anytime soon).

So as the next best step, should we construct geoarrow polygons from the cell vertex coordinates, possibly using arro3? If you have any examples I'd be happy to try figuring out how to do that.

Copy link
Contributor

@kylebarron kylebarron Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the end of the day, we need to create vertex information in Lonboard/deck.gl for rendering. I believe that even deck.gl's H3 layer does not render the h3 cells on the GPU from an ID, but rather uses those IDs on the CPU in JS to create the vertex geometry.

For now, I think it's simplest to convert cell ids to Polygon geometries in Python, and then we can use the SolidPolygonLayer/PolygonLayer directly.

A GeoArrow Polygon array is just:

  • a contiguous buffer of all the coordinates in all the polygons
  • plus an offset array that points into the coordinates array and which says which coordinates are in each ring
  • plus an offset array that points into the previous ring offsets array, and which says how many rings are in each geometry.

Here's a quick code example that might help illustrate:

import arro3.core
import h3.api.numpy_int as h3
import lonboard
import numpy as np
from arro3.core import Array, Field, fixed_size_list_array, list_array

cell = h3.geo_to_h3(-10, 15, 4)
disk = h3.k_ring(cell, 2)

# Overallocates in case there are any pentagons
coords_per_hexagon = 7
total_coords = len(disk) * coords_per_hexagon
# Allocate coordinate buffer
coords = np.zeros((total_coords, 2), dtype=np.float64)
# Allocate ring offsets. The length should be one greater than the number of rings
ring_offsets = np.zeros(len(disk) + 1, dtype=np.int32)
# In the case of h3 cells, we know that every cell will be a simple polygon with a
# single ring, so we can just use `arange`
geom_offsets = np.arange(len(disk) + 1, dtype=np.int32)

coord_offset = 0
for i, h in enumerate(disk):
    boundary = h3.h3_to_geo_boundary(h, geo_json=True)
    num_coords = len(boundary)
    coords[coord_offset : coord_offset + num_coords, :] = boundary
    coord_offset += num_coords
    ring_offsets[i + 1] = coord_offset

# shrink the coordinates array, in case there are any pentagons
last_coord = ring_offsets[-1]
coords = coords[:last_coord, :]


polygon_array = list_array(geom_offsets, list_array(ring_offsets, coords))
# We need to tag the array with extension metadata (`geoarrow.polygon`) so that Lonboard knows that this is a geospatial column.
polygon_array_with_geo_meta = polygon_array.cast(
    polygon_array.field.with_metadata({"ARROW:extension:name": "geoarrow.polygon"})
)
lonboard.viz(polygon_array_with_geo_meta)

(Note that you need to use latest main of arro3, because I just implemented support for using numpy-backed memory for Rust Arrow arrays kylebarron/arro3#204, kylebarron/arro3#208)

image

(Note that this simple implementation probably breaks over the antimeridian.)

h3-py is not itself vectorized, so converting from h3 cell ids to polygon vertices is quite slow. In the longer term, I'd like to update https://github.com/nmandery/h3ronpy to have a fast way to convert from h3 cell ids to GeoArrow polygon array (or MultiPolygonArray in the case of cells that cross the antimeridian)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks great @kylebarron !

I believe that even deck.gl's H3 layer does not render the h3 cells on the GPU from an ID, but rather uses those IDs on the CPU in JS to create the vertex geometry.

Interesting. Do you know if deck.gl has some clever optimizations for plotting H3/S2 like, e.g., compute on the fly the cell polygon geometries and/or aggregating data (parent cells), depending on the current zoom level / display extent? This might still have some benefits for plotting very large datasets (common use case) interactively. Or maybe Lonboard allows doing such optimizations on the server side (widget callback)?

I just implemented support for using numpy-backed memory for Rust Arrow arrays

I guess this could really help better integration between arrow and xarray in general? 👀

Copy link
Collaborator Author

@keewis keewis Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the longer term, I'd like to update https://github.com/nmandery/h3ronpy to have a fast way to convert from h3 cell ids to GeoArrow polygon array

h3ronpy has h3ronpy.arrow.vector.cells_to_wkb_polygons which would require you to decode WKB, so not sure if that's better.

In any case, I've translated my example from #11 to use your example (while borrowing code from lonboard.viz to construct the arrow table and construct a SolidPolygonLayer):

import xarray as xr
import arro3.core
import healpy as hp
import lonboard
import numpy as np
from matplotlib import colormaps
from arro3.core import Array, Field, fixed_size_list_array, list_array, Schema, Table, ChunkedArray

resolution = 10

ds = (
    xr.Dataset(
        coords={
            "cell_ids": (
                "cells",
                np.arange(12 * 4**resolution),
                {
                    "grid_name": "healpix",
                    "resolution": resolution,
                    "indexing_scheme": "nested",
                },
            )
        }
    )
    .pipe(xdggs.decode)
    .pipe(lambda ds: ds.merge(ds.dggs.cell_centers()))
    .assign(data=lambda ds: np.sin(np.radians(ds["latitude"])))
)

level = 10
cell_ids = np.arange(12 * 4**level)


def cell_ids2vertices(cell_ids, level, indexing_scheme):
    nest = indexing_scheme == "nested"
    nside = 2**level

    boundary_vectors = hp.boundaries(nside, cell_ids, step=1, nest=nest)
    lon, lat = hp.vec2ang(np.moveaxis(boundary_vectors, 1, -1), lonlat=True)
    lon_ = (lon + 180) % 360 - 180
    boundaries = np.reshape(np.stack((lon_, lat), axis=-1), (-1, 4, 2))

    # fix the dateline / prime meridian issue
    lon_ = boundaries[..., 0]
    to_fix = abs(np.max(lon_, axis=-1) - np.min(lon_, axis=-1)) > 300
    fixed_lon = lon_[to_fix, :] + 360
    boundaries[to_fix, :, 0] = fixed_lon

    return boundaries


def cell_ids2cell_boundaries_geoarrow(cell_ids, level, indexing_scheme):
    vertices = cell_ids2vertices(cell_ids, level, indexing_scheme)
    boundaries = np.concatenate([vertices, vertices[:, :1, :]], axis=1)
    
    coords = np.reshape(boundaries, (-1, 2))
    coords_per_pixel = boundaries.shape[1]
    geom_offsets = np.arange(cell_ids.size + 1, dtype="int32")
    ring_offsets = geom_offsets * coords_per_pixel

    polygon_array = list_array(geom_offsets, list_array(ring_offsets, coords))
    # We need to tag the array with extension metadata (`geoarrow.polygon`) so that Lonboard knows that this is a geospatial column.
    polygon_array_with_geo_meta = polygon_array.cast(
        polygon_array.field.with_metadata({"ARROW:extension:name": "geoarrow.polygon"})
    )
    return polygon_array_with_geo_meta

polygons = cell_ids2cell_boundaries_geoarrow(cell_ids, level=level, indexing_scheme="nested")
values = np.linspace(0, 1, cell_ids.size)
cmap = colormaps["viridis"]
colors = apply_continuous_cmap(values, cmap)

array = Array.from_arrow(polygons)
field = array.field.with_name("geometry")
schema = Schema([field])
table = Table.from_arrays([array], schema=schema)

num_rows = len(array)
if num_rows <= np.iinfo(np.uint8).max:
    arange_col = Array.from_numpy(np.arange(num_rows, dtype=np.uint8))
elif num_rows <= np.iinfo(np.uint16).max:
    arange_col = Array.from_numpy(np.arange(num_rows, dtype=np.uint16))
elif num_rows <= np.iinfo(np.uint32).max:
    arange_col = Array.from_numpy(np.arange(num_rows, dtype=np.uint32))
else:
    arange_col = Array.from_numpy(np.arange(num_rows, dtype=np.uint64))

table = table.append_column("row_index", ChunkedArray([arange_col])).append_column("values", ChunkedArray([Array.from_numpy(values)]))

layer = lonboard.SolidPolygonLayer(table=table, get_fill_color=colors)
map = lonboard.Map(layer)

(this also doesn't fully resolve the dateline issue)

which takes ~19 s to complete, plus about 10s to render. The version that converts to geopandas takes ~54s plus another ~10s to render, so this is a huge improvement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's cool! I should figure out how to use pyinstrument. Was it a one-line thing to create that?

├─ 11.166 SolidPolygonLayer.__init__  lonboard/_layer.py:1498
│     [26 frames hidden]  lonboard, ipywidgets, pyarrow, <__arr...
│        8.384 ParquetWriter.write_table  pyarrow/parquet/core.py:1084
  1. That's a lot of time just to write Parquet. I wouldn't have expected it to be that slow. For a long time I've thought about having a toggle for the user to switch off parquet encoding (e.g. if the Python environment is on the same machine as the browser). We go through Parquet currently because it creates a much smaller download in the case of remote Python environments.
  2. I'm curious what those other frames are. It still shouldn't be 3 seconds of overhead

Copy link
Collaborator Author

@keewis keewis Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it a one-line thing to create that?

it has different modes for how you can run it (see the user guide). I put the code above in a python script, then ran it with python -m pyinstrument lonboard_example.py, but you could also use the context manager:

with pyinstrument.profile():
    ...

or the jupyter / ipython cell magic %%pyinstrument (load first with %load_ext pyinstrument)

Copy link
Collaborator Author

@keewis keewis Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what those other frames are. It still shouldn't be 3 seconds of overhead

the ones beneath SolidPolygonLayer? Here you go:

├─ 11.018 SolidPolygonLayer.__init__  lonboard/_layer.py:1498
│  └─ 11.018 SolidPolygonLayer.__init__  lonboard/_layer.py:266
│     ├─ 9.058 SolidPolygonLayer.__init__  lonboard/_layer.py:80
│     │  └─ 9.058 SolidPolygonLayer.__init__  lonboard/_base.py:16
│     │     └─ 9.058 SolidPolygonLayer.__init__  ipywidgets/widgets/widget.py:500
│     │           [1 frames hidden]  ipywidgets
│     │              8.991 SolidPolygonLayer.get_state  ipywidgets/widgets/widget.py:589
│     │              ├─ 8.330 serialize_table  lonboard/_serialization.py:72
│     │              │  └─ 8.330 serialize_table_to_parquet  lonboard/_serialization.py:25
│     │              │     └─ 8.321 ParquetWriter.write_batch  pyarrow/parquet/core.py:1068
│     │              │           [0 frames hidden]  
│     │              │              8.321 ParquetWriter.write_table  pyarrow/parquet/core.py:1084
│     │              └─ 0.661 serialize_accessor  lonboard/_serialization.py:58
│     │                 └─ 0.661 serialize_pyarrow_column  lonboard/_serialization.py:52
│     │                    └─ 0.661 serialize_table_to_parquet  lonboard/_serialization.py:25
│     │                       └─ 0.641 ParquetWriter.write_batch  pyarrow/parquet/core.py:1068
│     │                             [1 frames hidden]  pyarrow
│     └─ 1.958 default_geoarrow_viewport  lonboard/_layer.py:215
│        └─ 1.865 total_bounds  lonboard/_geoarrow/ops/bbox.py:36
│           └─ 1.865 _total_bounds_nest_2  lonboard/_geoarrow/ops/bbox.py:80
│              └─ 1.865 _coords_bbox  lonboard/_geoarrow/ops/bbox.py:55
│                 ├─ 0.939 amax  <__array_function__ internals>:177
│                 │     [3 frames hidden]  numpy, <built-in>
│                 └─ 0.926 amin  <__array_function__ internals>:177
│                       [3 frames hidden]  numpy, <built-in>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the most recent commits integrate the code we talked about into xdggs. If you have time to look at I'd appreciate another review, but either way thanks for all the help, this is already so much faster than the code I had before.

Copy link
Contributor

@kylebarron kylebarron Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the next version of geoarrow.rust.core will have constructor functions: geoarrow/geoarrow-rs#810. And so you can pass the arrays directly into geoarrow.rust.core.polygons. Using those functions would ensure that the arrays generated are fully spec-compatible.

(As it is, you're generating data whose physical buffers are the same as geoarrow arrays, but you're missing some metadata like the correct field names for nested arrays and the correct nullability flags for nested arrays. Lonboard allows that input but it's not 100% to the geoarrow spec, especially if we enforce it in issues like geoarrow/geoarrow#64)

Copy link
Collaborator Author

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot for the review comments, @kylebarron!

xdggs/accessor.py Outdated Show resolved Hide resolved

Returns
-------
map : lonboard.Map
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the hint. I'll probably use sphinx to set up the docs, so I'll definitely make use of that!

cmap="viridis",
center=None,
):
import geopandas as gpd
Copy link
Collaborator Author

@keewis keewis Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd definitely be interested.

Right now, what we're doing is ask the user to pre-compute cell vertices from the cell ids (which can have various forms for different DGGS, but most often are int64 or uint64 arrays), then convert them into shapely polygons. This is one of the things that take the longest other than converting to geopandas and constructing the SolidPolygonLayer, so if we could instead extend deck.gl to directly support them (like it supports H3 ids), that would be best. However, I suppose that will either take a long time and require someone to actually do the work, especially since there's so many different kinds of DGGS (and I'm not confident in my TypeScript / JavaScript skills so I won't be able to work on this myself anytime soon).

So as the next best step, should we construct geoarrow polygons from the cell vertex coordinates, possibly using arro3? If you have any examples I'd be happy to try figuring out how to do that.

@keewis keewis requested a review from benbovy October 8, 2024 14:04
xdggs/h3.py Outdated Show resolved Hide resolved
xdggs/h3.py Outdated Show resolved Hide resolved
xdggs/plotting.py Outdated Show resolved Hide resolved
xdggs/h3.py Outdated Show resolved Hide resolved
xdggs/h3.py Outdated Show resolved Hide resolved
@tinaok
Copy link
Contributor

tinaok commented Oct 8, 2024

Hi, thanks @keewis @kylebarron ,
We will try to use this functionality to solve this issue destination-earth/DestinE_ESA_GFTS#56
@keewis showed us the example notebook here https://github.com/tinaok/xdggs_examples/blob/main/visualisation/xdggs-explore-demo.ipynb
Apart from that I can't use bounding box of latitude and longitude with .sel , my main wish list is to be able to use the global view, it would be super cool if we can demo it with DestinE ClimateDT healpix global earth data...

cc: @yellowcap @olafveerman

ci/environment.yml Outdated Show resolved Hide resolved
@keewis
Copy link
Collaborator Author

keewis commented Oct 12, 2024

looks like the most recent commits got the tests to pass. I'll merge later this evening.

For future PRs:

  • figure out how to install arro3-core from conda-forge on osx-arm
  • add a legend to the map
  • implement sliders for multidimensional data (which replace get_fill_color / the table of the layer)
  • use the constructor methods of geoarrow.rust once they are available on a released version
  • support non-standard names for the cell id coordinate
  • allow customizing the coordinates included in the hover table

@kylebarron
Copy link
Contributor

  • figure out how to install arro3-core from conda-forge on osx-arm

I made conda-forge/arro3-core-feedstock#7

There must be something in the recipe that's preventing the recipe from building for osx-arm, but I don't know what it is. If you had interest in exploring why or how to fix it that would be great

@tinaok
Copy link
Contributor

tinaok commented Oct 13, 2024

looks like the most recent commits got the tests to pass. I'll merge later this evening.

Yeserday when i tried i saw index of healpix id in the figure, but i do not see it.
Is this normal? How can I have healpix id together with lat/lon/data value?

スクリーンショット 2024-10-13 19 04 36

https://github.com/tinaok/xdggs_examples/blob/main/visualisation/xdggs-explore-demo.ipynb

@keewis
Copy link
Collaborator Author

keewis commented Oct 14, 2024

good catch, @tinaok, I appear to have accidentally dropped the cell ids from the hover table. I've extended the list of tasks left for future PRs, but for now I'll go ahead and merge.

@keewis keewis merged commit 774821d into xarray-contrib:main Oct 14, 2024
10 checks passed
@keewis keewis deleted the lonboard branch October 14, 2024 19:08
@keewis keewis restored the lonboard branch October 14, 2024 19:33
@keewis keewis mentioned this pull request Oct 16, 2024
6 tasks
@keewis keewis deleted the lonboard branch October 18, 2024 21:18
@keewis keewis mentioned this pull request Oct 21, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants