Improve performance of `delete+insert` incremental strategy #151

ataft · 2024-04-10T18:09:04Z

resolves #150
resolves #364

Problem

The delete query generated for the `delete+insert` incremental strategy is very inefficient when `unique_key` has two or more columns. In many cases it hangs and never returns, even when deleting small amounts of data (<100K rows).

Solution

Improve the query by switching to a much more efficient delete strategy:

```
delete from table1
where (col1, col2) in (
    select distinct col1, col2 from table1_tmp
)
```
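
For readers who want to try the statement, here is a minimal sketch that runs the same row-value `in` delete against an in-memory SQLite database through Python's `sqlite3` module. SQLite is only a stand-in chosen for illustration (row values need SQLite 3.15 or newer), the helper name `delete_matching_rows` and the sample rows are hypothetical, and none of this is the macro code changed by the PR.

```python
import sqlite3


def delete_matching_rows(conn, target, source, keys):
    """Delete rows from `target` whose key tuple also appears in `source`.

    Hypothetical helper for illustration; identifiers are interpolated
    directly, so it must only be called with trusted table/column names.
    """
    cols = ", ".join(keys)
    conn.execute(
        f"delete from {target} "
        f"where ({cols}) in (select distinct {cols} from {source})"
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table table1 (col1 integer, col2 text, val text);
    create table table1_tmp (col1 integer, col2 text, val text);
    insert into table1 values (1, 'a', 'old'), (1, 'b', 'old'), (2, 'a', 'old');
    -- (1, 'a') appears twice in the temp table; (2, 'b') has no match in table1
    insert into table1_tmp values (1, 'a', 'new'), (1, 'a', 'newer'), (2, 'b', 'new');
    """
)

delete_matching_rows(conn, "table1", "table1_tmp", ["col1", "col2"])

# Only rows whose (col1, col2) pair is absent from table1_tmp survive.
remaining = conn.execute(
    "select col1, col2 from table1 order by col1, col2"
).fetchall()
print(remaining)
```

Only the row with the key pair `(1, 'a')` is deleted. Rows that match on just one of the two columns are left in place, which is the behaviour the multi-column `unique_key` needs.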

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development, and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX

Fleid · 2024-04-18T19:38:20Z

Hey @ataft, thank you for opening this here. Would you be comfortable writing tests for this PR?

ataft · 2024-04-22T16:03:32Z

@Fleid The existing tests should cover this. However, the issue with the original logic is that it technically works, but only for small amounts of data. Therefore, the tests do not catch the issue. To truly test, you need a database and ~100K rows. I'm not sure what dbt's strategy is for this.
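
As a rough illustration of what a volume check could look like, the sketch below loads ~100K rows into an in-memory SQLite database, runs the row-value `in` delete, and reports the row counts and elapsed time. Everything here is an assumption made for illustration: SQLite stands in for a real warehouse, the helper names `build_tables` and `run_delete` are hypothetical, and the timing says nothing about how Snowflake, Postgres, or any other adapter would behave. It is not dbt's testing strategy.

```python
import sqlite3
import time


def build_tables(conn, n_rows):
    """Create table1 with n_rows unique key pairs and a half-overlapping table1_tmp."""
    conn.executescript(
        """
        create table table1 (col1 integer, col2 integer, val text);
        create table table1_tmp (col1 integer, col2 integer, val text);
        """
    )
    # Each i maps to a distinct (col1, col2) pair.
    conn.executemany(
        "insert into table1 values (?, ?, 'old')",
        ((i // 100, i % 100) for i in range(n_rows)),
    )
    # The temp table shares the upper half of table1's keys and adds new ones.
    half = n_rows // 2
    conn.executemany(
        "insert into table1_tmp values (?, ?, 'new')",
        ((i // 100, i % 100) for i in range(half, half + n_rows)),
    )


def run_delete(conn):
    """Run the row-value delete and return (rows deleted, elapsed seconds)."""
    start = time.perf_counter()
    cursor = conn.execute(
        "delete from table1 "
        "where (col1, col2) in (select distinct col1, col2 from table1_tmp)"
    )
    return cursor.rowcount, time.perf_counter() - start


N_ROWS = 100_000
conn = sqlite3.connect(":memory:")
build_tables(conn, N_ROWS)
deleted, seconds = run_delete(conn)
remaining = conn.execute("select count(*) from table1").fetchone()[0]
print(f"deleted {deleted} rows in {seconds:.3f}s, {remaining} remain")
```

A check like this verifies correctness at volume, but catching the performance regression itself would still require running against the target database.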

dbeatty10 · 2024-05-30T14:58:27Z

dbt/include/global_project/macros/materializations/models/incremental/merge.sql

-            using {{ source }}
-            where (
-                {% for key in unique_key %}
-                    {{ source }}.{{ key }} = {{ target }}.{{ key }}


@peterallenwebb and I took a look at this PR today.

This line of code is updated by #110, so we think that PR should be reviewed/merged prior to reviewing this PR further.

dbeatty10 · 2024-12-12T02:18:10Z

Here are the commands I'm using to test this PR:

```
gh pr checkout 151
git push origin ataft/main
```

This created the branch `ataft/main` in the dbt Labs org.

Then we can use that branch within individual GHA workflows for each adapter by following the process described here: #372 (comment).

Here is the result:

ataft mentioned this pull request Apr 10, 2024

Fix incremental delete+insert SQL dbt-labs/dbt-core#9459

Closed

5 tasks

dbeatty10 added the performance label May 15, 2024

dbeatty10 changed the title ~~Fix incremental delete+insert SQL~~ Improve performance of delete+insert incremental strategy May 15, 2024

dbeatty10 assigned dbeatty10 and peterallenwebb May 30, 2024

dbeatty10 reviewed May 30, 2024

colin-rogers-dbt added the incremental Incremental modeling with dbt label Aug 9, 2024

Merge branch 'main' into main

b8b8a7f

colin-rogers-dbt requested a review from a team as a code owner August 9, 2024 00:16

cla-bot bot added the cla:yes label Aug 9, 2024

colin-rogers-dbt added the community This PR is from a community member label Aug 9, 2024

Merge branch 'main' into main

66facbc

dbeatty10 mentioned this pull request Nov 26, 2024

[Bug] Microbatch: Using backfill results in inefficient delete statements (Snowflake adapter) #364

Open

2 tasks

jtcohen6 mentioned this pull request Nov 28, 2024

[Feature] Optimize incremental 'insert_overwrite' strategy dbt-labs/dbt-bigquery#1409

Open

3 tasks

Merge branch 'main' into main

5e89033
