Gain familiarity with zarr #5

Open
6 tasks
tjcrone opened this issue Feb 16, 2018 · 0 comments

Comments

Owner

tjcrone commented Feb 16, 2018

Zarr is a storage layer for multidimensional numerical arrays. It allows array data to be chunked, compressed, and read from / written to a range of storage media. Via gcsfs, zarr can store its data directly in Google Cloud Storage buckets. Xarray recently gained the ability to read from and write to zarr stores. We will need to figure out how to use zarr on Azure.

The following combination of packages will likely provide an ideal way to work with CAMHD datasets in the cloud: xarray + dask + zarr + gcsfs (or whatever the equivalent is on Azure).

To get spun up on zarr, take the following steps:

  • Skim the zarr documentation
  • Skim the gcsfs documentation
  • Practice creating zarr arrays and storing them to disk
  • Practice writing zarr arrays to, and reading them from, Google Cloud Storage via gcsfs
  • Practice using xarray to write / read datasets to / from zarr stores on disk
  • Practice using xarray to write / read datasets to / from zarr stores in Google Cloud Storage via gcsfs