This repository contains code for sending metadata from Mozilla-specific platforms to a DataHub instance.
The production instance is https://mozilla.acryl.io
The recipes that handle Looker and BigQuery metadata are managed via UI Ingestion and stored by SRE. Ask in #data-help for assistance.
To bootstrap the custom platforms we ingest metadata for, run the platform_recipe.dhub.yaml recipe:
$ DATAHUB_GMS_URL=... DATAHUB_GMS_TOKEN=... datahub ingest -c recipes/platform_recipe.dhub.yaml
All other recipes can be found in the recipes directory and can be run similarly using the datahub ingest command.
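To run several recipes in one go, a small shell helper along these lines could work. This is only a sketch: the function name run_recipes and the DATAHUB_CMD override are hypothetical and not part of this repository or the DataHub CLI. It also assumes every recipe file ends in .dhub.yaml and that running them in glob (alphabetical) order is acceptable.

```shell
#!/bin/sh
# Sketch: run every *.dhub.yaml recipe in a directory, one after another.
# DATAHUB_CMD is a hypothetical override (handy for dry runs, e.g. DATAHUB_CMD=echo);
# it is not an option of the DataHub CLI itself.
run_recipes() {
  _dir="${1:-recipes}"
  _cmd="${DATAHUB_CMD:-datahub}"
  _failed=0
  for _recipe in "$_dir"/*.dhub.yaml; do
    # Skip the unexpanded glob pattern when the directory has no recipes.
    [ -e "$_recipe" ] || continue
    echo "Ingesting $_recipe"
    # Keep going after a failure, but remember that one happened.
    $_cmd ingest -c "$_recipe" || _failed=1
  done
  return "$_failed"
}
```

With DATAHUB_GMS_URL and DATAHUB_GMS_TOKEN exported in the current shell, calling run_recipes recipes would ingest each recipe in turn and return a non-zero status if any of them failed.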
├── recipes (.dhub.yaml recipe files - https://datahubproject.io/docs/metadata-ingestion#recipes)
├── sync (source code for metadata fetching and ingestion)
│ └── datahub (source code for DataHub utils and custom Ingestion Sources - https://datahubproject.io/docs/metadata-ingestion/adding-source)
└── tests (source code and sample data for tests)
To install a local instance of DataHub, see DataHub's Quickstart guide.
Start a DataHub instance locally: launch Docker Desktop, then run datahub docker quickstart.
The initial run will install various packages and can take well over 30 minutes. DataHub will keep running in the background.
Ingest data from a specific source: DATAHUB_GMS_URL="http://localhost:8080" DATAHUB_GMS_TOKEN=None datahub ingest -c recipes/<ingestion_source>.dhub.yaml
By default, the local DataHub instance is accessible at http://localhost:9002/
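Because the first quickstart run can take a long time, it may help to wait for the instance to respond before ingesting anything. The following is a minimal sketch, assuming curl is installed and that the frontend answers plain HTTP requests at the URL above. The function name wait_for_url and its default retry values are hypothetical.

```shell
#!/bin/sh
# Sketch: poll a URL until it responds or the attempts run out.
# Arguments: URL, number of attempts (default 30), seconds between attempts (default 10).
wait_for_url() {
  _url="$1"
  _attempts="${2:-30}"
  _delay="${3:-10}"
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    # -s silences progress output; -f makes HTTP errors (status 400 and above) exit non-zero.
    if curl -sf -o /dev/null "$_url"; then
      echo "$_url is up"
      return 0
    fi
    # Only pause when another attempt is still to come.
    if [ "$_i" -lt "$_attempts" ]; then sleep "$_delay"; fi
    _i=$((_i + 1))
  done
  echo "$_url did not respond after $_attempts attempts" >&2
  return 1
}
```

For example, wait_for_url http://localhost:9002/ would check up to 30 times, 10 seconds apart, and return a non-zero status if the instance never answers.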
- Create a virtual environment:
  $ python -m venv venv
- Activate the virtual environment:
  $ source venv/bin/activate
- Install project dependencies (this should include the DataHub CLI):
  $ pip install -r requirements.txt
- Install the module locally:
  $ pip install -e .
To check whether the code conforms to the linting rules, run make lint, which checks Python and YAML styles. Running make format will auto-format the code according to the style rules.
DataHub - https://datahubproject.io/
Recipe - https://datahubproject.io/docs/metadata-ingestion#recipes