A Python package to manage Google Cloud Data Catalog tags, loading metadata from external sources. Currently supports the CSV file format.
- 1. Environment setup
- 2. Manage Tags
- 3. How to contribute
Using virtualenv is optional, but strongly recommended unless you use Docker.
Creating a dedicated folder is recommended so that all related files reside in the same place, making the next instructions easier to follow.
mkdir ./datacatalog-tag-manager
cd ./datacatalog-tag-manager
All paths starting with ./ in the next steps are relative to the datacatalog-tag-manager folder.
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
pip install --upgrade datacatalog-tag-manager
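To confirm the setup before moving on, a small check like the one below may help. It is a minimal sketch, assuming Python 3.8+ (for importlib.metadata) and a virtualenv 20+ or venv environment, where sys.prefix differs from sys.base_prefix while the environment is active. The check_environment helper is hypothetical and not part of the package.

```python
import sys
from importlib import metadata


def check_environment(dist_name="datacatalog-tag-manager"):
    """Hypothetical helper: report whether a virtualenv is active and the installed package version."""
    # Inside an active virtualenv/venv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base interpreter.
    in_virtualenv = sys.prefix != sys.base_prefix
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # Not installed for the current interpreter.
    return in_virtualenv, version


if __name__ == "__main__":
    active, version = check_environment()
    print(f"virtualenv active: {active}")
    print(f"datacatalog-tag-manager: {version or 'not installed'}")
```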
Docker may be used as an alternative way to run datacatalog-tag-manager. In this case, please disregard the virtualenv setup instructions above and get the source code instead, since the image is built from the repository:
git clone https://github.com/ricardolsmendes/datacatalog-tag-manager
cd ./datacatalog-tag-manager
Create a service account and grant it the following roles:
- BigQuery Metadata Viewer
- Data Catalog TagTemplate User
- A custom role with the bigquery.datasets.updateTag and bigquery.tables.updateTag permissions
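The custom role only needs the two permissions listed above. The sketch below builds a role definition as JSON, assuming the field names of the IAM Role resource (title, description, stage, includedPermissions); the title and description values are hypothetical placeholders.

```python
import json

# Hypothetical title and description; adjust them to your own naming conventions.
CUSTOM_ROLE = {
    "title": "Data Catalog Tag Manager",
    "description": "Update Tags on BigQuery datasets and tables",
    "stage": "GA",
    "includedPermissions": [
        "bigquery.datasets.updateTag",
        "bigquery.tables.updateTag",
    ],
}


def role_definition_json(role=CUSTOM_ROLE):
    """Serialize the custom role definition so it can be saved to a file."""
    return json.dumps(role, indent=2)


if __name__ == "__main__":
    # Redirect the output to a file, e.g. tag-manager-role.json (hypothetical name).
    print(role_definition_json())
```

Assuming the output is saved as tag-manager-role.json, the role could then be created with gcloud iam roles create ROLE_ID --project=PROJECT_ID --file=tag-manager-role.json, where ROLE_ID and PROJECT_ID are placeholders for your own values.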
Download a JSON key for the service account and save it as ./credentials/datacatalog-tag-manager.json.
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the key file. This step may be skipped if you're using Docker.
export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-tag-manager.json
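Before running the commands, it can be worth checking that the variable points at a readable service account key. The sketch below is a hypothetical helper, not part of the package; it assumes the key is the JSON file downloaded in the previous step, which carries "type": "service_account" and a client_email field.

```python
import json
import os


def check_credentials(environ=None):
    """Hypothetical helper: return the client_email of the key GOOGLE_APPLICATION_CREDENTIALS points to."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"credentials file not found: {path}")
    with open(path, encoding="utf-8") as f:
        key = json.load(f)  # Raises ValueError if the file is not valid JSON.
    # Service account JSON keys are marked with "type": "service_account".
    if key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")
    return key.get("client_email")
```

Calling check_credentials() with no arguments reads the current process environment, so it reflects the export above only when run from the same shell session.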
- SCHEMA
The CSV schema for creating or updating Tags is described below. Each row sets one field of a Tag, so use as many rows as needed to describe all the Tags and fields you need.
Column | Description | Mandatory |
---|---|---|
linked_resource OR entry_name | Full name of the BigQuery or Pub/Sub asset the Entry refers to, or the Entry name if you are working with Custom Entries | ✓ |
template_name | Resource name of the Tag Template for the Tag | ✓ |
column | Column of the Entry schema to attach the Tag to; leave it empty to tag the Entry itself | ✗ |
field_id | ID of the Tag field | ✓ |
field_value | Value of the Tag field | ✓ |
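A CSV file following this schema can be produced with Python's csv module. The sketch below writes rows with the columns from the table and checks a header for the mandatory columns; the project, dataset, table, and Tag Template names are hypothetical, and the helpers are illustrations rather than part of the package.

```python
import csv
import io

# Column names from the upsert schema; linked_resource may be replaced by entry_name.
UPSERT_COLUMNS = ["linked_resource", "template_name", "column", "field_id", "field_value"]
MANDATORY_COLUMNS = {"template_name", "field_id", "field_value"}

# Hypothetical project, dataset, table, and Tag Template names.
TABLE = "//bigquery.googleapis.com/projects/my-project/datasets/sales/tables/orders"
TEMPLATE = "projects/my-project/locations/us-central1/tagTemplates/quality"
ROWS = [
    # One row per Tag field; an empty column tags the table Entry itself.
    {"linked_resource": TABLE, "template_name": TEMPLATE, "column": "",
     "field_id": "owner", "field_value": "data-team@example.com"},
    {"linked_resource": TABLE, "template_name": TEMPLATE, "column": "customer_id",
     "field_id": "verified", "field_value": "true"},
]


def missing_columns(header):
    """Return the mandatory columns absent from a CSV header (empty set if none)."""
    missing = MANDATORY_COLUMNS - set(header)
    # The Entry is identified by either linked_resource or entry_name.
    if not {"linked_resource", "entry_name"} & set(header):
        missing.add("linked_resource OR entry_name")
    return missing


def rows_to_csv(rows, columns=UPSERT_COLUMNS):
    """Render rows as CSV text, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

When saving the text to disk, opening the file with newline="" avoids extra blank lines on Windows.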
- SAMPLE INPUT
- See sample-input/upsert-tags for reference;
- The Data Catalog Sample Tags spreadsheet (Google Sheets) may help you create and export a CSV file.
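Because each row carries a single field, rows that share the same Entry, Tag Template, and column together describe one Tag. The sketch below illustrates that grouping; it assumes every row has its identifying columns filled in, and it is not the package's own parsing code, which may handle rows differently. All names in the sample are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical Entry and Tag Template names.
TABLE = "//bigquery.googleapis.com/projects/my-project/datasets/sales/tables/orders"
TEMPLATE = "projects/my-project/locations/us-central1/tagTemplates/quality"
SAMPLE = (
    "linked_resource,template_name,column,field_id,field_value\n"
    f"{TABLE},{TEMPLATE},,owner,data-team@example.com\n"
    f"{TABLE},{TEMPLATE},,verified,true\n"
    f"{TABLE},{TEMPLATE},customer_id,verified,false\n"
)


def group_tags(csv_text):
    """Group one-field-per-row records into {(entry, template, column): {field_id: field_value}}."""
    tags = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Accept either identifier column from the schema.
        entry = row.get("linked_resource") or row.get("entry_name")
        # An empty column cell means the Tag targets the Entry itself.
        key = (entry, row["template_name"], row.get("column") or None)
        tags[key][row["field_id"]] = row["field_value"]
    return dict(tags)
```

With the sample above, group_tags returns two Tags: one on the table with the owner and verified fields, and one on the customer_id column with a single verified field.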
- COMMANDS
Python + virtualenv
datacatalog-tags upsert --csv-file <CSV-FILE-PATH>
Docker
docker build --rm --tag datacatalog-tag-manager .
docker run --rm --tty \
--volume <CREDENTIALS-FILE-DIR>:/credentials --volume <CSV-FILE-DIR>:/data \
datacatalog-tag-manager upsert --csv-file /data/<CSV-FILE-PATH>
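When the commands are scripted, for example in a scheduled job, the argument lists can be assembled in Python. The sketch below mirrors the commands shown in this section for both the virtualenv and the Docker case; build_command and run are hypothetical helpers, and the Docker variant assumes the image was built with the tag used above.

```python
import os
import subprocess


def build_command(action, csv_file, docker_image=None, credentials_dir=None):
    """Return the argv for an upsert/delete run, either local or through Docker."""
    if action not in ("upsert", "delete"):
        raise ValueError(f"unsupported action: {action}")
    if docker_image is None:
        # Python + virtualenv: call the installed datacatalog-tags command directly.
        return ["datacatalog-tags", action, "--csv-file", csv_file]
    if credentials_dir is None:
        raise ValueError("credentials_dir is required when running through Docker")
    # Mount the CSV file's folder at /data, as in the docker run commands above.
    csv_dir, csv_name = os.path.split(os.path.abspath(csv_file))
    return [
        "docker", "run", "--rm", "--tty",
        "--volume", f"{os.path.abspath(credentials_dir)}:/credentials",
        "--volume", f"{csv_dir}:/data",
        docker_image, action, "--csv-file", f"/data/{csv_name}",
    ]


def run(argv):
    """Run the command, raising CalledProcessError on a non-zero exit code."""
    return subprocess.run(argv, check=True)
```

For instance, run(build_command("upsert", "tags.csv")) would be equivalent to the virtualenv command above, and passing docker_image="datacatalog-tag-manager" with credentials_dir="./credentials" would produce the Docker variant. Host paths are made absolute because bind mounts expect absolute paths.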
- SCHEMA
The CSV schema for deleting Tags is described below. Use as many rows as needed to delete all the Tags you want.
Column | Description | Mandatory |
---|---|---|
linked_resource OR entry_name | Full name of the BigQuery or Pub/Sub asset the Entry refers to, or the Entry name if you are working with Custom Entries | ✓ |
template_name | Resource name of the Tag Template of the Tag | ✓ |
column | Column of the Entry schema whose Tags should be deleted | ✗ |
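Since the delete schema is a subset of the upsert schema, a delete file can be derived from an existing upsert file by keeping one row per Entry, Tag Template, and column. The sketch below is a hypothetical helper under that assumption; it reuses whichever identifier column (linked_resource or entry_name) the upsert file has.

```python
import csv
import io


def delete_csv_from_upsert(upsert_csv_text):
    """Keep one row per (entry, template_name, column) found in an upsert CSV."""
    reader = csv.DictReader(io.StringIO(upsert_csv_text))
    # Reuse whichever Entry identifier the upsert file uses.
    id_col = "entry_name" if "entry_name" in (reader.fieldnames or []) else "linked_resource"
    columns = [id_col, "template_name", "column"]
    # dict.fromkeys removes duplicates while keeping the first-seen order.
    keys = dict.fromkeys(tuple(row.get(c) or "" for c in columns) for row in reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(keys)
    return out.getvalue()
```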
- SAMPLE INPUT
- See sample-input/delete-tags for reference;
- The Data Catalog Sample Tags spreadsheet (Google Sheets) may help you create and export a CSV file.
- COMMANDS
Python + virtualenv
datacatalog-tags delete --csv-file <CSV-FILE-PATH>
Docker
docker build --rm --tag datacatalog-tag-manager .
docker run --rm --tty \
--volume <CREDENTIALS-FILE-DIR>:/credentials --volume <CSV-FILE-DIR>:/data \
datacatalog-tag-manager delete --csv-file /data/<CSV-FILE-PATH>
Please make sure to take a moment and read the Code of Conduct.
Please report bugs and suggest features via GitHub Issues.
Before opening an issue, search the tracker for possible duplicates. If you find a duplicate, please add a comment saying that you encountered the problem as well.
Please make sure to read the Contributing Guide before making a pull request.