

Multithread support for stage_external_sources #280

Closed
azdoherty opened this issue Apr 8, 2024 · 2 comments
Labels
enhancement New feature or request wontfix This will not be worked on

Comments

@azdoherty

Describe the feature

`stage_external_sources` would run much faster if it staged multiple tables in parallel using multiple threads.

Describe alternatives you've considered

I previously used a pre-hook on each model that referenced an external table; because the hooks were part of the models, they did run in parallel. This implementation was a bit messy, though: the external table did not appear in the DAG, and you had to include a CREATE OR REPLACE EXTERNAL TABLE ... statement in your model.

Additional context

I have only used this in BigQuery.

Who will this benefit?

Anyone with a lot of external tables to stage before each build. I have 10, staging them takes over a minute, and the time will scale linearly with the number of external tables.
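To make the linear-scaling point concrete, here is a minimal Python sketch. It is not part of dbt or dbt-external-tables: `stage_table` is a hypothetical stand-in that sleeps to simulate the latency of one external-table DDL call. Staged serially, 10 tables at 0.1 s each take about 1 s; with a pool of 4 threads they finish in roughly ceil(10/4) = 3 rounds, so about 0.3 s plus overhead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def stage_table(name: str, latency_s: float = 0.1) -> str:
    # Hypothetical stand-in for staging one external table. A real version
    # would issue DDL (e.g. CREATE OR REPLACE EXTERNAL TABLE ...) and block
    # until the warehouse finishes; here we only sleep to simulate latency.
    time.sleep(latency_s)
    return name


def stage_serially(tables):
    # One table at a time: total time grows linearly with len(tables).
    return [stage_table(t) for t in tables]


def stage_in_parallel(tables, threads=4):
    # Up to `threads` tables are staged concurrently; pool.map keeps input order.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(stage_table, tables))


if __name__ == "__main__":
    tables = [f"ext_table_{i}" for i in range(10)]
    for label, fn in [("serial", stage_serially), ("parallel", stage_in_parallel)]:
        start = time.perf_counter()
        fn(tables)
        print(f"{label}: {time.perf_counter() - start:.2f}s")
```

Threads suit this workload because each call spends nearly all its time waiting on the warehouse rather than on the CPU.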

@azdoherty azdoherty added enhancement New feature or request triage labels Apr 8, 2024
@azdoherty
Author

Should I close this due to the discussion here? dbt-labs/dbt-adapters#92

@jeremyyeo jeremyyeo added wontfix This will not be worked on and removed triage labels Apr 8, 2024
@jeremyyeo
Collaborator

jeremyyeo commented Apr 8, 2024

Hey @azdoherty, definitely move that discussion over there. FWIW, this is probably a dbt-core library issue: it's not possible to run SQL statements in parallel today, with the dbt-external-tables package or otherwise. I've suggested the same workarounds you used: hooks, since models can run in parallel, and some other funky patterns using custom materializations (https://gist.github.com/jeremyyeo/b61655a3e5a52eb27640363650c79a1e). The idea is the same, though: models run in parallel (up to the threads config), so use that mechanism to do parallel run operations instead.

However, this is primarily a dbt-core / dbt-adapters library issue, IMHO.

Additionally, this is likely a duplicate of #109.
