temperature on Flask Glacier for UNAVCO #1
This notebook @ecglazer wrote does the first item above. Something to be careful about before we write it to the Google bucket is the chunk sizes:
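For reference, a minimal sketch of one way to inspect the chunk sizes when the data is opened lazily with xarray (the filename and the chunks argument below are placeholders, not the notebook's actual values):

```python
import numpy as np
import xarray as xr

# placeholder filename and chunk sizes; opening with `chunks` makes the variables dask-backed
ds = xr.open_dataset("t2m_daily.nc", chunks={"time": 100})
da = ds["t2m"]

print(ds.chunks)                                                # chunk sizes along each dimension
print(da.data.chunksize)                                        # shape of a single chunk of t2m
print(np.prod(da.data.chunksize) * da.dtype.itemsize / 2**20)   # approximate MiB per chunk
```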
@ecglazer, what do you get when you run ... ?
and I'll send you the token separately, so it doesn't go into this public repo.
Just made a pull request with a new version of the notebook that splits the dataset into appropriately sized chunks - they're each about 79 MiB. When I try to write to the Google bucket, it crashes the kernel. The daily resolution dataset is ~94 GB, and the 3-hourly resolution dataset is ~7 GB. This is when using the Pangeo cloud.
Thanks for working on this! It's useful to link to the pull request here so we can see the code. Why is the higher resolution dataset smaller in volume? Are you using a dask cluster?
No problem! Here is the pull request: #3. The higher res dataset is smaller because it only covers a few years (2016-2021) and only includes t2m, while the daily dataset covers 1979-2018 and includes several variables. I'm not familiar with how to use a dask cluster, but I can look into it.
@ecglazer just showed me that the dask cluster is crashing when trying to write even small versions of the RACMO data to zarr. This could be due to the dataset being made up of many, many DataArrays. But I think the most likely issue is that the netCDFs being read are stored in @ecglazer's Pangeo notebook workspace. A better option is probably to put the netCDFs in the Google bucket first, then read them from there. As described here, I will also need to add you as a user in the Google Cloud account.
You will also need to download and install the Google Cloud command-line interface: https://cloud.google.com/sdk/gcloud#download_and_install_the
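Once that's set up, one way to copy the local netCDFs into the bucket from Python is something along these lines (just a sketch; the token path and file names are placeholders):

```python
import gcsfs

# sketch only: the token path and file names are placeholders
fs = gcsfs.GCSFileSystem(token="ldeo-glaciology-token.json")   # service-account token sent separately
fs.put("t2m_daily.nc", "ldeo-glaciology/RACMO/RACMO_daily_by_var/t2m_daily.nc")
```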
Thanks for the help, @jkingslake. I put all of the daily data in the Google bucket here: https://console.cloud.google.com/storage/browser/ldeo-glaciology/RACMO/RACMO_daily_by_var
That's great. Have you managed to lazily load them?
Yes, I'm able to lazily load each dataset in Jupyter, but the notebook memory still fills up quickly when I try to save to zarr (I tried with just 100 timesteps of one DataArray).
Hmm, OK. The only thing I have found so far that might help is that the following fails in the same way you were seeing.
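Something along these lines (a sketch only; the file name, engine, and output path are placeholders rather than the exact code):

```python
import gcsfs
import xarray as xr

# sketch: open a netCDF straight from the bucket with no `chunks` argument,
# so each variable is a single in-memory array, then try to write it to zarr
fs = gcsfs.GCSFileSystem()
f = fs.open("ldeo-glaciology/RACMO/RACMO_daily_by_var/t2m_daily.nc")   # placeholder file name
ds = xr.open_dataset(f, engine="h5netcdf")                             # no chunks -> one big chunk
ds.to_zarr(fs.get_mapper("ldeo-glaciology/RACMO/t2m_test.zarr"))       # this is the step that dies
```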
This loads the data as one big chunk, which I think is why it fails. The following, meanwhile, successfully provides the mean value of t2m (265.21317 K).
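Roughly like this (again a sketch; the file name and chunk sizes are illustrative, not the exact code):

```python
import gcsfs
import xarray as xr
from dask.distributed import Client

client = Client()   # or a dask-gateway cluster with multiple workers

fs = gcsfs.GCSFileSystem()
f = fs.open("ldeo-glaciology/RACMO/RACMO_daily_by_var/t2m_daily.nc")   # placeholder file name
ds = xr.open_dataset(f, engine="h5netcdf", chunks={"time": 100})       # chunk on load -> dask-backed

mean_t2m = ds["t2m"].mean().compute()   # the computation is spread across the workers
print(mean_t2m.values)                  # ~265.21 K
```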
This chunks the data as it is loaded and makes it possible to spread the computation between multiple workers - I used 20 in this test and it took a couple of minutes.
Update:
Incidentally, taking the mean of all the t2m data with ...
A notebook demonstrating all this can be found here: https://github.com/ldeo-glaciology/AntPen_NSF_NERC/blob/chunk_edits/RACMO_Loading_JK.ipynb
P.S. I am using the LEAP Pangeo.
@ecglazer, did you end up making any more progress on this?
@ecglazer, I have added a notebook on writing the full AP RACMO data to zarr: https://github.com/ldeo-glaciology/AntPen_NSF_NERC/blob/racmo_zarr_JK/merging_RACMO_vars.ipynb The full AP RACMO dataset can be loaded with
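something along these lines (the zarr store path below is an assumption; the notebook linked above has the exact code):

```python
import gcsfs
import xarray as xr

# assumed store path; check the bucket / notebook for the real one
fs = gcsfs.GCSFileSystem()
store = fs.get_mapper("ldeo-glaciology/RACMO/RACMO_AP_full.zarr")
ds = xr.open_zarr(store)   # lazy load; chunking comes from the zarr metadata
```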
When you get a chance, could you go through the directories in https://console.cloud.google.com/storage/browser/ldeo-glaciology/RACMO/ and delete what you don't need anymore?
@ecglazer, did you get a chance to go through the directories in https://console.cloud.google.com/storage/browser/ldeo-glaciology/RACMO/ and delete what you don't need anymore?
Use AP RACMO output to compute an annual average air temperature and a seasonal climatology.
We can use this as impetus to put the AP RACMO data that @ecglazer has into the Google bucket.
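A minimal sketch of those two computations with xarray, assuming the daily data is already open lazily as `ds` with a `t2m` variable (the names are assumptions):

```python
# minimal sketch, assuming `ds` is the lazily loaded daily AP RACMO dataset with a `t2m` variable
annual_mean = ds["t2m"].groupby("time.year").mean("time")       # annual average air temperature
seasonal_clim = ds["t2m"].groupby("time.season").mean("time")   # DJF/MAM/JJA/SON climatology

annual_mean.load()       # trigger the dask-backed computation
seasonal_clim.load()
```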