Add example using xarray atmospheric data #32
@tennlee here's an example of grabbing the coordinate metadata from xarray and passing it to napari. Unfortunately, I'm finding the resampling in xarray to be suuuuper slow. Here's what I mean:
In [6]: ds = xr.open_dataset('spec_hum.nc', chunks={'time': 1})
/Users/jni/micromamba/envs/all/lib/python3.11/site-packages/xarray/core/dataset.py:282: UserWarning: The specified chunks separate the stored chunks along dimension "time" starting at index 1. This could degrade performance. Instead, consider rechunking after loading.
In [26]: ds_reg = ds.interp(coords={'time': np.arange(np.array(ds.time[0]), np.array(ds.time[-1]), np.array(np.diff(ds.time[:2]))[0])}, method='nearest')
In [39]: %timeit arr = np.asarray(ds_reg.spec_hum[0])
16.4 s ± 214 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [40]: %timeit arr = np.asarray(ds.spec_hum[0])
518 ms ± 94.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [41]: %timeit arr = np.asarray(ds.spec_hum[0, 0])
124 ms ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [42]: %timeit arr = np.asarray(ds_reg.spec_hum[0, 0])
34.4 s ± 7.88 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [43]: %%timeit
    ...: arr = np.asarray(ds.spec_hum[0, 0])
    ...: idxs = np.meshgrid(np.arange(arr.shape[0]), np.arange(arr.shape[1]), indexing='ij')
    ...: arr_res = ndi.map_coordinates(arr, idxs, order=0)
    ...:
486 ms ± 25.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
What's happening behind the scenes in napari is:
(it's also annoying that the model data starts at 0100 and the measurements at 0000 😂)
It's still usable if you turn on async mode in napari, which you can do either with the NAPARI_ASYNC=1 environment variable, or by setting the experimental > render images asynchronously checkbox in the preferences (Cmd+, on Mac when the viewer is in focus).
Overall though, this is a very cool dataset. I like that it shows off napari's ability to overlay data with different time steps and extents, too. We could also load the temperature volumes and treat them the same way, and make them invisible by default. (Pass visible=False to the layer.)
Ideally, I'd like to save (a) the model resampled at 1h intervals, and (b) the measurements as geozarr or some format that xarray can natively read backed by zarr. If we put that online somewhere useful, this could go into the napari sample gallery (must be able to be built without downloading a massive dataset). Any ideas? |
Ah, @tennlee I figured out why the sampling is uneven in the raw data, now that I displayed it without resampling in its own viewer. If you hit play on the first viewer (the grayscale one), you will see that partway through the playback it speeds up. So the time interval increases as you get further into the model — which I guess makes sense since the model wouldn't be able to do hourly precision by that point anyway? |
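For anyone following along, here is a minimal NumPy-only sketch of how one might detect the uneven time steps and map a regular grid back onto the original timepoints by nearest neighbour. It is only an illustration of the idea, not how xarray implements interp; the time axis is made up (hourly, then 3-hourly), and nearest_indices is a hypothetical helper name.

```python
import numpy as np


def nearest_indices(times, regular):
    """For each regular timepoint, return the index of the nearest original one.

    Hypothetical helper. Assumes `times` is sorted and has at least 2 entries.
    Ties go to the earlier timepoint.
    """
    t = times.astype("datetime64[s]").astype(np.int64)
    r = regular.astype("datetime64[s]").astype(np.int64)
    right = np.clip(np.searchsorted(t, r), 1, len(t) - 1)
    left = right - 1
    pick_right = (t[right] - r) < (r - t[left])
    return np.where(pick_right, right, left)


# Made-up time axis: hourly for the first few steps, then every 3 hours.
start = np.datetime64("2024-11-04T01:00")
offsets_h = np.array([0, 1, 2, 3, 6, 9, 12])
times = start + offsets_h.astype("timedelta64[h]")

# More than one distinct step means the sampling is uneven.
steps = np.unique(np.diff(times))
print("distinct steps:", steps)

# Regular grid at the smallest step, including the final timepoint.
regular = np.arange(times[0], times[-1] + steps[0], steps[0])
idx = nearest_indices(times, regular)
print(idx)
```

Indexing the original array with idx along the time axis would then give a regularly sampled view without any interpolation arithmetic.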
I might have to get to this on the weekend. Dealing with 3 days of backlog apparently takes time. :) |
Just a note for something I thought would be neat to try. If there's a callback option in napari, it would be interesting to try to install |
yes, this is easy to do! You can do Having said that, in this case I would probably define a dask array based on the scores computation and display that, rather than hook a lower-dimensional array up to the current point. |
(Also btw — no rush on this, other than the excitement of finding a cool reason to collaborate. 😃 As I mentioned over Signal, I'm happy to come in to Melb some day to continue sprinting, when the time is right for you!) |
Cool. Well I've done the setup and reproduced the issue, which is a good first step. |
The chunking is totally optional. I'm not surprised at those results @tennlee because without the chunks argument
At that point, you have a NumPy array and everything should be super fast indeed. But your process will be using heaps of RAM, which is undesirable. |
This seems relevant: https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes |
Oof, found the culprit:
from pydata/xarray#6799 |
Given this limitation I think the solution for making this example work nicely is to save the interpolated array to zarr and host it somewhere. |
Well this is exciting:
In [4]: ds = xr.open_zarr('https://zarr-data.xyz/thredds-20241104-spec_hum-resampled.zarr')
In [5]: ds
Out[5]:
<xarray.Dataset> Size: 12GB
Dimensions:    (theta_lvl: 4, lat: 1536, lon: 2048, time: 239)
Coordinates:
  * lat        (lat) float64 12kB 89.94 89.82 89.71 ... -89.71 -89.82 -89.94
  * lon        (lon) float64 16kB 0.08789 0.2637 0.4395 ... 359.6 359.7 359.9
  * theta_lvl  (theta_lvl) float32 16B 20.0 53.34 100.0 160.0
  * time       (time) datetime64[ns] 2kB 2024-11-04T01:00:00 ... 2024-11-13T2...
Data variables:
    A_theta    (theta_lvl) float32 16B dask.array<chunksize=(1,), meta=np.ndarray>
    B_theta    (theta_lvl) float32 16B dask.array<chunksize=(1,), meta=np.ndarray>
    spec_hum   (time, theta_lvl, lat, lon) float32 12GB dask.array<chunksize=(1, 1, 1536, 2048), meta=np.ndarray>
Attributes:
    Conventions:   CF-1.5,ACDD-1.3
    base_date:     20241104
    base_time:     0
    date_created:  20241104
    expt_id:       0001
    institution:   Australian Bureau of Meteorology
    modl_vrsn:     ACCESS-G
    source:        APS3
    summary:       forecast data
    title:         forecast data |
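As a quick sanity check on the numbers in that repr, the total size and the per-chunk size follow directly from the shape, the chunk shape, and the 4-byte float32 item size. This is just back-of-the-envelope arithmetic with the values copied from the output above, not anything read from the store itself.

```python
import math

# Values copied from the spec_hum line of the repr above.
shape = (239, 4, 1536, 2048)   # (time, theta_lvl, lat, lon)
chunks = (1, 1, 1536, 2048)    # one lat/lon plane per chunk
itemsize = 4                   # bytes per float32 value

total_bytes = math.prod(shape) * itemsize
chunk_bytes = math.prod(chunks) * itemsize
n_chunks = math.prod(s // c for s, c in zip(shape, chunks))

print(f"total: {total_bytes / 1e9:.2f} GB")   # total: 12.03 GB
print(f"chunk: {chunk_bytes / 1e6:.2f} MB")   # chunk: 12.58 MB
print(f"chunks: {n_chunks}")                  # chunks: 956
```

So displaying a single plane should only need to fetch one chunk of about 12.6 MB (before compression) rather than the whole 12 GB array.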
Example to view xarray atmospheric model data in napari.
The example data is too large to display in our gallery, so I'm making the PR
against my own repo for discussion. We can try the same approach with a smaller
dataset, or with a dataset in a cloud-native format like zarr, so that the
gallery would only need to load a single timepoint.
Code to download the data in this example:
As noted in the example itself, napari's axes ([plane], row, column), with the
origin at the top left, match NumPy arrays but are unsuitable for latitude
data, which starts at 90 at the top and ends at -90 at the bottom. The polarity
of the latitude is therefore inverted for plotting, but we must add the ability
in napari to describe the geometry of the world space relative to canvas space.
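
A minimal sketch of that polarity flip, assuming evenly spaced coordinates: it derives a per-axis scale and translate from a 1D coordinate array and negates both when the coordinate decreases with the index, so that the world coordinate always grows with the row index. The helper name axis_scale_translate is hypothetical, and the lat/lon arrays are reconstructed from the grid size and the rounded values printed in the conversation rather than read from the file.

```python
import numpy as np


def axis_scale_translate(coord):
    """Return (scale, translate) for an evenly spaced 1D coordinate.

    Hypothetical helper. If the coordinate decreases with the index
    (like latitude running from 90 down to -90), both values are negated,
    so index i maps to -coord[i] and the axis increases downward.
    """
    coord = np.asarray(coord, dtype=float)
    step = float(coord[1] - coord[0])
    sign = -1.0 if step < 0 else 1.0
    return sign * step, sign * float(coord[0])


# Assumed grid: 1536 latitudes (north to south) and 2048 longitudes,
# with values at cell centres.
lat = 90 - (np.arange(1536) + 0.5) * 180 / 1536
lon = (np.arange(2048) + 0.5) * 360 / 2048

lat_scale, lat_translate = axis_scale_translate(lat)
lon_scale, lon_translate = axis_scale_translate(lon)
print(lat_scale, lat_translate)
print(lon_scale, lon_translate)
```

The resulting pairs could be passed as the scale and translate arguments of a napari layer for the (row, column) axes; the displayed latitude then has the opposite sign to the true one, which is the limitation described above.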