request review of metadata #1

Open
MichaelTiemannOSC opened this issue Oct 8, 2021 · 4 comments

@MichaelTiemannOSC
Contributor

Please review and comment on the metadata implementation here:

https://github.com/os-climate/wri-gppd-ingestion-pipeline/blob/master/notebooks/WRI-gppd-ingest.ipynb

Relates to: os-climate/os_c_data_commons#48

@MichaelTiemannOSC
Contributor Author

I have pushed a new branch that adapts the pyarrow ideas that Vincent shared yesterday: https://github.com/os-climate/wri-gppd-ingestion-pipeline/tree/metadata-v1

Please have a look and comment.

@caldeirav
Contributor

I have now moved the metadata implementation to DBT pipelines, since OpenMetadata can ingest metadata from the catalog.json file that is generated and versioned whenever the DBT documentation is built.
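For anyone reviewing, here is a rough sketch of the kind of per-column metadata that a catalog.json carries and that a consumer such as OpenMetadata could pick up. It assumes the usual layout of the DBT catalog artifact (top-level `nodes` and `sources`, each entry holding `metadata` and `columns`). The sample content, the `power_plants` table, its columns, and the `list_columns` helper are all hypothetical and are not taken from this repository.

```python
import json

# Hypothetical, trimmed-down catalog.json content for illustration only.
# A real file is written by `dbt docs generate` and contains many more fields.
SAMPLE_CATALOG = """
{
  "metadata": {"generated_at": "2022-08-06T00:00:00Z"},
  "nodes": {
    "model.wri_gppd.power_plants": {
      "metadata": {"schema": "wri_gppd", "name": "power_plants", "type": "BASE TABLE"},
      "columns": {
        "country": {"type": "varchar", "index": 1, "name": "country",
                    "comment": "ISO 3166-1 alpha-3 country code"},
        "capacity_mw": {"type": "double", "index": 2, "name": "capacity_mw",
                        "comment": null}
      }
    }
  },
  "sources": {}
}
"""


def list_columns(catalog):
    """Return (schema, table, column, type, comment) tuples for every catalog entry."""
    rows = []
    for section in ("nodes", "sources"):
        for node in catalog.get(section, {}).values():
            schema = node["metadata"]["schema"]
            table = node["metadata"]["name"]
            # Columns are keyed by name; sort by their position in the table.
            for col in sorted(node["columns"].values(), key=lambda c: c["index"]):
                rows.append((schema, table, col["name"], col["type"], col.get("comment") or ""))
    return rows


catalog = json.loads(SAMPLE_CATALOG)
for row in list_columns(catalog):
    print(row)
```

A listing like this makes it easy to spot columns that still lack a description before the catalog is handed to a metadata tool.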

@MichaelTiemannOSC
Contributor Author

I just reviewed the notebook and see that it is unchanged since October 2021. It needs a complete overhaul: the credentials.env variable names and the use of osc-ingest-tools, among other things. What's the best way to bring this notebook up to current standards and also align it with the new DBT+OpenMetadata approach?

@caldeirav
Contributor

caldeirav commented Aug 6, 2022

I suggest deprecating the older notebook, as I am essentially rebuilding the pipeline from scratch. But keep it around so you can have a look when I complete the end-to-end flow, as you may want to make some functional changes (note: most of the data processing that was in the notebook should be in DBT now).

I have already checked in the notebooks for extraction and loading, with the data transformation now being shifted to DBT together with metadata ingestion.
