-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create new feature for running and logging dbt commands #841
Conversation
Recent updates to dbt enable direct execution of dbt commands in python without needing to open a shell process. Using that dbt feature would be an improvement to this method. Docs on the new dbt feature here. |
6070ec9
to
5feee3c
Compare
This looks great @austinweisgrau! Although I'm unfamiliar with dbt myself and hoping someone else will be able to also review this. Some more general questions/thoughts:
Thank you again! |
Good questions!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some structural thoughts around this which I've tried to convey in comments, but I'm also happy to hop on a call if you're up for that - it might make explaining easier. I can also take a stab at doing the refactoring myself if you don't have the time to spend on making this a little more generalizable.
I think probably this should go in the utilities
folder. Which also answers the question of where the docs should go. The utilities are currently terribly underdocumented, but you can add a general description to this file, plus the autodocs syntax so it will grab your docstrings. Let me know if you need more instruction.
parsons/dbt/dbt.py
Outdated
logger = logging.getLogger(__name__) | ||
|
||
|
||
class dbtLogger: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm more concerned with abstracting from "Redshift" to "works with all databases" but we could also abstract from "Slack" to "works with any notification system".
To be clear, I'm not asking you to build that capability, just thinking on how we could structure this so it's easy for someone else to come build that capability. In particular, I'm thinking about dependency injection/inversion of control. I don't have much CS background - do you? My understanding is that in cases like these we'd want to create a logger that accepts a notification/messaging object and calls message(
on it and a runner that accepts a database connector and calls query(
on it but doesn't care about the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through this whole thing, I realize now that you're not really using any of the DB connectors directly so you're not calling query(
. Instead, you're using the dbt_redshift
dependency and sending connection info and commands directly to it.
So the challenge becomes whether we can abstract over the dbt_redshift
interface vs the dbt_bigquery
interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this is resolved with 088f974, which allows for this all to work with any database engine that works with dbt. This is actually handled at the level of package dependencies rather than in the code itself. So now the dbt extra can be installed with any database-specific version of dbt. e.g. pip install parsons[dbt-redshift]
, pip install parsons[dbt-bigquery]
etc
parsons/dbt/dbt.py
Outdated
full_log_message += f"\n*Duration:* {duration_time_str}\n\n" | ||
full_log_message += "\n".join(self.log_messages) | ||
|
||
if self.slack_webhook: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the main (only?) place where the logger is calling Slack directly, so it should be straightforward to lift out. I think the main challenge is handling the conditional here, on whether to use the webhook or the API key.
Also :red_circle:
etc is going to be platform-specific but I don't think it's a big deal if a notification has :red_circle:
instead of an actual red circle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As written, if no slack configs are provided (no API key or webhook is passed in), this if/else block will not execute on any code and logging will remain limited to stdout.
Parsons has a few other notifications interfaces, but for a few reasons I don't think it's important to generalize the notifications of this particular utility over them at this time:
- We discussed this PR in the TMC engineering working group, and no one had a use case besides Slack for notifications
- Parsons itself doesn't have a generic notifications API, making it a bit cumbersome to try and build one into this PR
parsons/dbt/dbt.py
Outdated
def run(self) -> None: | ||
for command in self.commands: | ||
self.dbt_command(command) | ||
self.dbt_logger.send_to_slack() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd also want this line to be generic, like send_notification
instead of Slack specifically.
a9273f6
to
0d67f5f
Compare
80f490d
to
32269ff
Compare
Ok I finally looped back and made some updates:
|
I'm planning on re-reviewing this in a few weeks, after we've done the DBT Fundamentals cohort and I have a better mental sense of what DBT is and how it fits with Parsons. Just wanted to update this PR so no one reads it and wonders why it's been sitting un-merged for over a month. |
Prior implementation had the arguments in the wrong order
Hi @austinweisgrau - sorry I'm just now getting back to this. This still looks good to me, and after taking the dbt course I better understand the value of this utility. I'm going to update the branch and merge it. |
dbt is an increasingly popular tool in the data analytics community, for very good reason!
dbt's primary interface is through shell commands, which makes interacting with dbt commands programmatically somewhat difficult.
This new feature enables executing dbt commands from python, logs results to the python logger and optionally can also send formatted results to slack. The slack formatted message follows the formatting convention that dbt cloud uses for its slack messages.