Refactor Data Guide #162
👋 Thanks for opening this PR! The Cookbook will be automatically built with GitHub Actions. To see the status of your deployment, click below.
Ok, I finally finished this major refactor of the docs. @SammyAgrawal @norlandrhagen, if you could review the preview above (once it is done building), that would be tremendously helpful. If you think the Catalog and Ingestion parts in particular are helpful enough, we can finally unblock the official release of the catalog!
Nice work @jbusecke! A few things:
Maybe it can link directly to the pangeo-forge-recipes docs. Same with the Zarr link.
I really like the addition of the data guide, I think that is really awesome!
There are some small edits I want to make (language cleanup), and I will circle back on some organizational stuff. Overall I think this is great though!
Thanks for all the feedback @norlandrhagen and @SammyAgrawal. An overall issue is that some of the text is hella old, as you pointed out, and we should work to replace it gradually. I'm not sure we want to do it all in one go, though?
I personally see the library = catalog + data storage. Is there some way to highlight this?
This is sort of emblematic of that issue: we need to find a way to reconcile the old 'vision' language with what is actually implemented. For now I kept most of the old language, but I agree this is a bit confusing.
@norlandrhagen It would be great if you could add an admonition with some detail on that here!
Good question here. I was thinking that we might add a short description in the reference at some point, plus a link. That might be better for people who want to quickly look things up while staying on the same website. But on the other hand, I see your point... not quite sure how to handle this, TBH.
Fixed. Thanks for spotting this.
Certainly room for improvement, but I would love to split that into another PR. Could you lead that effort (maybe just start with an issue linked to this?).
I have some ideas but am not 100% sure about them; still, I am happy to work on this. I think of these more as "how do we deal with data", e.g. always ask before using someone's data, always try to be as open as possible, etc.
This is really relevant! Can you be a bit more specific about what you find confusing? Inline comments/suggestions on the code/text would be most helpful here. Thank you.
Can we merge this?
book/guides/data_guide.md
To start ingesting a dataset, follow these steps:

1. Add a new [dataset_request](https://github.com/leap-stc/data-management/issues/new?assignees=&labels=dataset&projects=&template=new_dataset.yaml&title=New+Dataset+%5BDataset+Name%5D) in the [data-management](https://github.com/leap-stc/data-management) repo so there is a central place where people can suggest datasets for ingestion and follow progress.
1. Start a feedstock for your dataset. We organize any kind of data that is part of the [](explanation.architecture.data-library) in its own repository under the `leap-stc` GitHub organization. Please use [our Template](https://github.com/leap-stc/LEAP_template_feedstock) to get started. Based on the 3 [types of data](explanation.data-policy.types) we host in the [](explanation.architecture.data-library), there are different ways of ingesting data (with specific instructions provided in the [template feedstock](https://github.com/leap-stc/LEAP_template_feedstock)):
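The first step relies on a GitHub "new issue" link with prefilled query parameters. As a rough sketch, assuming only the parameters visible in that link (the helper name `dataset_request_url` is hypothetical and not part of any LEAP tooling), the same link can be generated for an arbitrary dataset name with Python's standard library:

```python
from urllib.parse import urlencode

# Base URL for opening a new issue in the leap-stc/data-management repo
NEW_ISSUE_URL = "https://github.com/leap-stc/data-management/issues/new"


def dataset_request_url(dataset_name: str) -> str:
    """Build a prefilled 'New Dataset' issue link (hypothetical helper)."""
    params = {
        "assignees": "",
        "labels": "dataset",
        "projects": "",
        "template": "new_dataset.yaml",
        "title": f"New Dataset [{dataset_name}]",
    }
    # urlencode uses quote_plus: spaces become '+', brackets become %5B/%5D
    return f"{NEW_ISSUE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(dataset_request_url("My Dataset"))
```

Passing the placeholder name `"Dataset Name"` reproduces the link used in the guide, so a request for a concrete dataset only differs in the `title` parameter.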
I think this is an information-dense paragraph, with almost every sentence linking to something. Could we break it up into maybe two bullet points with more explanation?
Like: "start a feedstock for your dataset. A feedstock is ____".
I think too many links also make things scarier? It creates the impression that there are lots of moving parts. Maybe we could explain/link to "data library" at the top and then not have the text linked.
I would suggest sentence structure like:
- Start a feedstock for your dataset. Any kind of data that is part of the Data Library is organized via its own repository under the `leap-stc` GitHub organization. Pangeo Forge recipes are deployed via GitHub Actions in these repositories, referred to as feedstocks. Please use our Template to get started.
(Separate the data-specific parameters into a third bullet point.)
3. Based on the 3 types of data we host in the Data Library, there are different ways of ingesting data.
- LEAP Curated: data from an existing (public, egress-free) ARCO dataset should be linked to the
- LEAP ingested: data which exists in legacy formats (e.g. netcdf) that is transformed into an ARCO copy on . The preferred method to do that is to use .
- (Work in Progress): Creating a virtual Zarr store from existing publicly hosted legacy-format data (e.g. netCDF)
- (with specific instructions provided in the template feedstock):
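The three ingestion paths in the suggestion above boil down to a small decision rule. This is a hypothetical sketch, not part of any LEAP tooling: the names `IngestionPath` and `choose_ingestion_path` are made up for illustration, and sending ARCO data that is not public and egress-free down the "LEAP ingested" path is an assumption rather than something the guide states.

```python
from enum import Enum


class IngestionPath(Enum):
    """The three data types described above (labels are illustrative)."""
    LEAP_CURATED = "link an existing public, egress-free ARCO dataset"
    LEAP_INGESTED = "transform legacy files (e.g. netCDF) into an ARCO copy"
    VIRTUAL_ZARR = "reference legacy files via a virtual Zarr store (WIP)"


def choose_ingestion_path(
    is_arco: bool,
    is_public: bool,
    egress_free: bool,
    try_virtual: bool = False,
) -> IngestionPath:
    """Pick an ingestion path from a few properties of the source data.

    Hypothetical rule of thumb: curated data is already ARCO, public and
    egress-free; the virtual route only applies to publicly hosted legacy
    files; everything else (assumption) falls back to an ingested copy.
    """
    if is_arco and is_public and egress_free:
        return IngestionPath.LEAP_CURATED
    if not is_arco and is_public and try_virtual:
        return IngestionPath.VIRTUAL_ZARR
    return IngestionPath.LEAP_INGESTED


if __name__ == "__main__":
    # Publicly hosted netCDF files, default settings -> ingested ARCO copy
    print(choose_ingestion_path(is_arco=False, is_public=True, egress_free=True))
```

Keeping the virtual Zarr route behind an explicit opt-in flag mirrors its "Work in Progress" status: by default, legacy-format data is routed to the established ingestion path.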
Merging this now.
This is an effort to update the data-guide with some new instructions reflecting the pgf ingestion pipeline and the now available catalog.
Closes #169 (Define LEAP Affiliated course and link in EDU guide).
- Reorganize content to follow the Divio documentation categories more strictly (might factor that out into another PR). A sketched-out plan:

  (Image: IMG_3380, sketch of the planned reorganization)

- Break up the Data and Compute Guide
- Create all the empty links referenced in here
- Break up and move the auth/transfer instructions