
Add dbt-cloud integration command to dp cli #99

Open
wants to merge 10 commits into base: develop

Conversation

rdziadosz

Adds a configure-cloud command to the dp CLI that creates a dbt Cloud project with a configured connection to BigQuery.
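The conversation below does not show the exact invocation; presumably the command is run as follows (any flags it may take are not shown in this thread):

dp configure-cloud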

(Several outdated, resolved review threads on data_pipelines_cli/cli_commands/cloud.py and data_pipelines_cli/dbt_cloud_api_client.py.)
@p-pekala (Collaborator)

We will also need to update the documentation in the docs dir in this repo.

@p-pekala (Collaborator)

I don't see any documentation. I have no idea what dbtcloud.yml should look like.
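For later readers: piecing together the fields discussed in the threads below, dbtcloud.yml presumably ends up looking roughly like this sketch. The key names (in particular config_dir, the renamed bq_config_dir) are assumptions, since the final file is never shown in this conversation:

project_name: my-pipeline             # name of the dbt Cloud project to create
environments:
  - name: dev                         # environment name created in dbt Cloud
    type: development                 # "development" or "deployment"
    dbt_version: "1.4.0"              # dbt version used in this environment
    config_dir: dev                   # dp config dir; GCP project/dataset are read from its bigquery.yml
  - name: prod
    type: deployment
    dbt_version: "1.4.0"
    config_dir: prod
    schedule_interval: "0 12 * * *"   # cron for the example job (deployment environments only)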

data_pipelines_cli/cli_commands/dbtcloud.py (outdated review thread, resolved)
    credentials_id = client.create_credentials(environment["dataset"], project_id)
else:
    credentials_id = None
environment_id = client.create_environment(project_id, environment["type"], environment["name"],
@p-pekala (Collaborator)

I would extract the if + environment creation into a separate method create_environment.

@rdziadosz (Author)

Done
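Judging from the truncated excerpt above and the call site later in this conversation (create_environment(client, environment, project_id)), the extracted helper is presumably close to the sketch below; the guarding condition and the trailing arguments of client.create_environment are assumptions, since the diff is cut off:

def create_environment(client, environment, project_id):
    # Assumed condition: only deployment environments get BigQuery credentials;
    # the excerpt above starts mid-if.
    if environment["type"] == "deployment":
        credentials_id = client.create_credentials(environment["dataset"], project_id)
    else:
        credentials_id = None
    # The remaining arguments are truncated in the diff; credentials_id is
    # assumed to be among them.
    return client.create_environment(project_id, environment["type"],
                                     environment["name"], credentials_id)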

data_pipelines_cli/cli_commands/dbtcloud.py (review thread, resolved)
data_pipelines_cli/dbt_cloud_api_client.py (outdated review thread, resolved)
     - string
     - The cron expression with which the example job will be run
   * - default_gcp_project
@p-pekala (Collaborator)

I think there shouldn't be a default one; it should always be taken from bigquery.yml.

@rdziadosz (Author)

Done

     - string
     - Target dataset for this environment
   * - dbt_version
@p-pekala (Collaborator)

Why is it per environment? Can we make it global?

@rdziadosz (Author)

In dbt Cloud this is set per environment: https://docs.getdbt.com/docs/collaborate/environments/dbt-cloud-environments#common-environment-settings. We could make the setting the same for all environments, but then we would be limited to a single version. I assume someone might want to test the code in a separate environment, e.g. before upgrading dbt for the whole project. Are you sure I should make this change?

     - string
     - The dbt version used in this environment
   * - bq_config_dir
@p-pekala (Collaborator)

I would either remove the "bq_" prefix or use the env name, e.g. dev/prod.

@rdziadosz (Author)

Done

     - string
     - Name of the environment that will be created in dbt Cloud
   * - dataset
@p-pekala (Collaborator)

The dataset should also be taken from bigquery.yml.

@rdziadosz (Author)

Done

     - Details of the environments to be created in dbt Cloud

Configuration of the environments:
@p-pekala (Collaborator)

The field "type" is missing.

@rdziadosz (Author)

Done

    bq_config = read_bigquery_config(environment["bq_config_dir"])
    environments_projects[environment["name"]] = bq_config["project"]

client.create_environment_variable(project_id, dbtcloud_config["default_gcp_project"],
@p-pekala (Collaborator)

You don't need default_gcp_project. This default is in the "base" config dir, and you already read it in the read_bigquery_config method.

@rdziadosz (Author)

Okay, I changed it to be taken from the "base" config. I assume the only use of this value would be if someone added another environment and did not update this environment variable.

dbtcloud_config = read_dbtcloud_config()
with open(keyfile) as file:  # context manager so the keyfile handle is closed
    keyfile_data = json.load(file)
project_id = client.create_project(dbtcloud_config["project_name"])
@p-pekala (Collaborator)

I would change the name of the variable to dbtcloud_project_id.

@rdziadosz (Author)

Done

for environment in dbtcloud_config["environments"]:
    environment_id = create_environment(client, environment, project_id)
    if environment["type"] == "deployment":
        client.create_job(project_id, environment_id, dbtcloud_config["schedule_interval"],
@p-pekala (Collaborator)

The schedule interval could be per environment, I think.

@rdziadosz (Author)

Done
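Presumably the fix moves the interval onto each environment entry, so the create_job call above would become something like this (the key placement is an assumption; the final diff is not shown):

        client.create_job(project_id, environment_id, environment["schedule_interval"],
                          "Job - " + environment["name"])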

new_env = {
    "env_var": env_var
}
print(new_env)
@p-pekala (Collaborator)

Please remove the print or replace it with logging.

@rdziadosz (Author)

Done
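A minimal sketch of the logging variant, assuming a module-level logger (the PR's actual replacement is not shown in this thread):

import logging

logger = logging.getLogger(__name__)

new_env = {
    "env_var": env_var  # env_var is built by the surrounding code in the PR
}
logger.debug("Environment variable payload: %s", new_env)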

        client.create_job(project_id, environment_id, dbtcloud_config["schedule_interval"],
                          "Job - " + environment["name"])
    bq_config = read_bigquery_config(environment["bq_config_dir"])
    environments_projects[environment["name"]] = bq_config["project"]
@p-pekala (Collaborator)

Does it resolve the project properly? In bigquery.yml we currently have project: "{{ env_var('GCP_PROJECT') }}". It should be taken from the env during deployment, shouldn't it?

@rdziadosz (Author)

I added resolving of env vars using the dbt show command.
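For readers unfamiliar with the expression in question: the PR shells out to dbt, but a standalone equivalent of resolving the env_var() expression from bigquery.yml could look like this sketch (illustrative only, not the PR's implementation):

import os
from jinja2 import Template

def resolve_env_vars(raw: str) -> str:
    # Render dbt-style "{{ env_var('GCP_PROJECT') }}" expressions
    # against the current process environment.
    return Template(raw).render(
        env_var=lambda name, default=None: os.environ.get(name, default)
    )

# resolve_env_vars("{{ env_var('GCP_PROJECT') }}") -> value of $GCP_PROJECT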

# remove resolving jinja / env vars
Labels: none yet · Projects: none yet · 2 participants