Add dbt-cloud integration command to dp cli #99
base: develop
Conversation
We will also need to update the documentation in the docs dir in this repo.
I don't see any documentation. I have no idea what dbtcloud.yml should look like.
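For illustration, a hypothetical sketch of what dbtcloud.yml might look like, inferred from the fields discussed in this review (project_name, environments, name, type, dbt_version, schedule_interval); the exact structure is an assumption, not the final format:

```yaml
# Hypothetical example only -- field names are inferred from this review
# thread and may not match the final dbtcloud.yml format.
project_name: my-data-pipelines
environments:
  - name: dev
    type: development
    dbt_version: "1.0.0"
  - name: prod
    type: deployment
    dbt_version: "1.0.0"
    schedule_interval: "0 6 * * *"  # cron expression for the example job
```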
    credentials_id = client.create_credentials(environment["dataset"], project_id)
else:
    credentials_id = None
environment_id = client.create_environment(project_id, environment["type"], environment["name"],
I would extract the if statement and the environment creation into a separate method, create_environment.
Done
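A sketch of the suggested extraction, assuming the client methods shown in the diff above; the position of the credentials_id argument in create_environment is a guess, since the original call is truncated:

```python
def create_environment(client, environment, project_id):
    # Create credentials only for deployment environments; development
    # environments are created without attached credentials.
    if environment["type"] == "deployment":
        credentials_id = client.create_credentials(environment["dataset"], project_id)
    else:
        credentials_id = None
    return client.create_environment(
        project_id, environment["type"], environment["name"], credentials_id
    )
```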
docs/configuration.rst
- string
- The cron expression with which the example job will be run
* - default_gcp_project
I think there shouldn't be a default one; it should always be taken from bigquery.yml.
Done
- string
- Target dataset for this environment
* - dbt_version
Why is it per environment? Can we make it global?
In dbt Cloud this is set for each environment: https://docs.getdbt.com/docs/collaborate/environments/dbt-cloud-environments#common-environment-settings. We could make the setting the same for all environments, but then we would be limited to one version. I assume someone might want to test the code on a separate environment, e.g. before upgrading dbt for the whole project. Are you sure I should make this change?
docs/configuration.rst
- string
- The dbt version used in this environment
* - bq_config_dir
I would either remove the "bq_" prefix or use the env name, e.g. dev/prod.
Done
docs/configuration.rst
- string
- Name of the environment that will be created in dbt Cloud
* - dataset
The dataset should also be taken from bigquery.yml.
Done
- Details of the environments to be created in dbt Cloud

Configuration of the environments:
The field "type" is missing.
Done
bq_config = read_bigquery_config(environment["bq_config_dir"])
environments_projects[environment["name"]] = bq_config["project"]

client.create_environment_variable(project_id, dbtcloud_config["default_gcp_project"],
You don't need default_gcp_project. This default is in the "base" config dir, and you already read it in the read_bigquery_config method.
Okay, I changed it to be taken from the "base" config. I assume the only use of this value would be if someone added another environment and did not update this environment variable.
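A minimal sketch of that change, assuming a read_bigquery_config helper that returns a dict with a "project" key (both names are taken from the surrounding diff, not verified against the dp codebase):

```python
def get_default_gcp_project(read_bigquery_config):
    # Hypothetical helper: take the default GCP project from the "base"
    # BigQuery config dir instead of a separate default_gcp_project setting.
    base_config = read_bigquery_config("base")
    return base_config["project"]
```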
dbtcloud_config = read_dbtcloud_config()
file = open(keyfile)
keyfile_data = json.load(file)
project_id = client.create_project(dbtcloud_config["project_name"])
I would rename the variable to dbtcloud_project_id.
Done
for environment in dbtcloud_config["environments"]:
    environment_id = create_environment(client, environment, project_id)
    if environment["type"] == "deployment":
        client.create_job(project_id, environment_id, dbtcloud_config["schedule_interval"],
I think the schedule interval could be per environment.
Done
new_env = {
    "env_var": env_var
}
print(new_env)
Please remove the print or replace it with logging.
Done
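A hedged sketch of the logging variant of the snippet above; the helper name build_env_var_payload is hypothetical:

```python
import logging

logger = logging.getLogger(__name__)

def build_env_var_payload(env_var):
    # Mirrors the diff above, but logs the payload at debug level
    # instead of printing it to stdout.
    new_env = {"env_var": env_var}
    logger.debug("Creating environment variable: %s", new_env)
    return new_env
```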
client.create_job(project_id, environment_id, dbtcloud_config["schedule_interval"],
                  "Job - " + environment["name"])
bq_config = read_bigquery_config(environment["bq_config_dir"])
environments_projects[environment["name"]] = bq_config["project"]
Does it resolve the project properly? In bigquery.yml we currently have project: "{{ env_var('GCP_PROJECT') }}". It should be taken from the env during deployment, shouldn't it?
I added resolving env vars using the dbt show command.
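For illustration, a minimal way to resolve such placeholders directly, without shelling out to dbt; the function name and the regex-based approach are assumptions for this sketch, not what the PR actually implements:

```python
import os
import re

# Matches dbt-style placeholders such as {{ env_var('GCP_PROJECT') }}.
_ENV_VAR_PATTERN = re.compile(r"\{\{\s*env_var\(\s*'([^']+)'\s*\)\s*\}\}")

def resolve_env_vars(value):
    # Substitute each placeholder with the corresponding value from the
    # process environment; unset variables become empty strings.
    return _ENV_VAR_PATTERN.sub(lambda m: os.environ.get(m.group(1), ""), value)
```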
# Environment variable
# Code formatting
# Documentation
# remove resolving jinja / env vars
# code formatting
Adds a configure-cloud command to the dp CLI that creates a dbt Cloud project with a configured connection to BigQuery.