
[Feature] support copy multiple tables in parallel using copy_partitions #1237

Open
3 tasks done
Klimmy opened this issue May 15, 2024 · 0 comments · May be fixed by #1413
Labels
enhancement New feature or request python_models

Comments


Klimmy commented May 15, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

The Python BigQuery client supports asynchronous copy jobs, while the dbt-bigquery adapter sends BigQuery requests one by one (when using incremental_strategy = 'insert_overwrite' with copy_partitions = true).

We could achieve better performance by sending requests in small batches of partitions.

dbt-bigquery already supports parallel execution in the copy_bq_table function, but the bq_copy_partitions macro sends partitions one at a time.
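
As a minimal sketch of what the asynchronous client API makes possible (this is not the adapter's actual code, and the project, dataset, table, and partition names below are made up):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.CopyJobConfig(write_disposition="WRITE_TRUNCATE")

# Hypothetical partition decorators, for illustration only
partitions = ["20240513", "20240514", "20240515"]

# copy_table returns a CopyJob without waiting for it to finish,
# so several partition copies can be in flight at the same time
jobs = [
    client.copy_table(
        f"my_project.my_dataset.my_table__dbt_tmp${p}",  # hypothetical source
        f"my_project.my_dataset.my_table${p}",           # hypothetical destination
        job_config=job_config,
    )
    for p in partitions
]

# Wait for all jobs only after they have all been submitted
for job in jobs:
    job.result()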

We could probably implement this feature by introducing a batch_size argument to the config:

{{ config(
    materialized = 'incremental',
    incremental_strategy = 'insert_overwrite',
    partition_by = {
      "field": "day",
      "data_type": "date",
      "copy_partitions": true,
      "batch_size": 5
    }
) }}

The default value would be 1. The bq_copy_partitions macro would then send a list of partitions to copy_bq_table, where the size of the list equals batch_size.
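
To make the proposal concrete, here is a rough sketch of the batching behaviour (batch_size stands in for the new config value, and start_copy_job is a hypothetical helper that submits one copy job per partition, as in the snippet above):

batch_size = 5  # would come from partition_by in the model config

def chunks(items, size):
    # split the full partition list into groups of at most batch_size
    for i in range(0, len(items), size):
        yield items[i:i + size]

for batch in chunks(partitions, batch_size):
    jobs = [start_copy_job(p) for p in batch]  # hypothetical helper
    for job in jobs:
        job.result()  # wait for the whole batch before starting the next one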

Describe alternatives you've considered

No response

Who will this benefit?

Anyone who has a large number of heavy BigQuery partitions.

Are you interested in contributing this feature?

Definitely, I just need a green light to proceed.

Anything else?

No response

@Klimmy Klimmy added enhancement New feature or request triage labels May 15, 2024
@amychen1776 amychen1776 added python Pull requests that update Python code python_models and removed triage python Pull requests that update Python code labels Aug 27, 2024
AxelThevenot pushed a commit to AxelThevenot/dbt-bigquery that referenced this issue Nov 25, 2024
@AxelThevenot AxelThevenot linked a pull request Nov 25, 2024 that will close this issue