[GLUTEN-3582][CORE][VL][CH] Refactor filter pushdown logic #4582

Merged: 3 commits, merged on Jan 31, 2024


@baibaichen (Contributor) commented Jan 30, 2024

What changes were proposed in this pull request?

Vanilla Spark pushes down only part of the filter condition into the scan, whereas Gluten can, in some cases, push down all filters (implemented in #4132). However, the ClickHouse backend cannot yet do this for the Parquet file format; we can add this later.

This PR adds postProcessPushDownFilter to SparkPlanExecApi; CHSparkPlanExecApi overrides this function to add its own logic.

Note for CH

Consider TPC-H Q22: c_acctbal > (select avg(c_acctbal) from customer where ...). This filter could be pushed down before this PR. I have disabled this for the CH backend, because I want to make its push-down behavior match vanilla Spark first.
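A minimal Scala sketch of the hook described above, under heavily simplified assumptions: `Expr`, `Attr`, `Lit`, `GreaterThan`, `ScalarSubquery`, `VeloxLikeApi` and `CHLikeApi` are hypothetical stand-ins for Catalyst expressions and the real backend classes, and the actual `postProcessPushDownFilter` signature in Gluten may take different parameters. The default keeps every filter; the ClickHouse-style override drops filters that contain a scalar subquery, such as the TPC-H Q22 predicate, so that push-down stays closer to vanilla Spark.

```scala
// Hypothetical, simplified expression tree; Gluten itself works on Catalyst Expressions.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Double) extends Expr
final case class GreaterThan(left: Expr, right: Expr) extends Expr
final case class ScalarSubquery(sql: String) extends Expr

// Sketch of a backend hook; the real method's signature may differ.
trait SparkPlanExecApi {
  // Default behavior: keep every filter proposed for push-down.
  def postProcessPushDownFilter(filters: Seq[Expr]): Seq[Expr] = filters
}

// A backend that relies on the default.
object VeloxLikeApi extends SparkPlanExecApi

// A ClickHouse-like backend that drops filters containing a scalar subquery.
object CHLikeApi extends SparkPlanExecApi {
  private def hasSubquery(e: Expr): Boolean = e match {
    case _: ScalarSubquery => true
    case GreaterThan(l, r) => hasSubquery(l) || hasSubquery(r)
    case _                 => false
  }

  override def postProcessPushDownFilter(filters: Seq[Expr]): Seq[Expr] =
    filters.filterNot(hasSubquery)
}

val q22Like = GreaterThan(Attr("c_acctbal"), ScalarSubquery("select avg(c_acctbal) from customer"))
val simple  = GreaterThan(Attr("c_acctbal"), Lit(0.0))

println(VeloxLikeApi.postProcessPushDownFilter(Seq(q22Like, simple)).size) // 2
println(CHLikeApi.postProcessPushDownFilter(Seq(q22Like, simple)))        // List(GreaterThan(Attr(c_acctbal),Lit(0.0)))
```

Keeping the default in the shared trait means only backends that need different behavior have to override it.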

Update for ScanTransformerFactory

Before this PR, the createXxxTransformer methods expected callers to pass only the extra filters. After this PR, we pass all push-down filters, because the transformer may need to remove unsupported filters.

```scala
// ScanTransformerFactory (excerpt; method bodies elided)
def createFileSourceScanTransformer(
    scanExec: FileSourceScanExec,
    reuseSubquery: Boolean,
    allPushDownFilters: Option[Seq[Expression]] = None, // was: extraFilters: Seq[Expression] = Seq.empty
    validation: Boolean = false): FileSourceScanExecTransformer = ???

def createBatchScanTransformer(
    batchScan: BatchScanExec,
    reuseSubquery: Boolean,
    allPushDownFilters: Option[Seq[Expression]] = None, // was: pushdownFilters: Seq[Expression] = Seq.empty
    validation: Boolean = false): SparkPlan = ???
```
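To see why passing all push-down filters matters, here is a minimal sketch with predicates modeled as plain strings (a hypothetical simplification; Gluten compares Catalyst expressions semantically via `ExpressionSet`). `remainingFilters` mirrors the set difference used in `FilterHandler`; `PushDownSketch`, `oldStyle`, `newStyle` and the `supported` predicate are illustrative names, not Gluten's API. When the full list reaches the transformer, an unsupported filter that the scan already carried can be dropped too, which is not possible when only the extra filters are passed.

```scala
object PushDownSketch {
  // Filters from the Filter node that the scan does not already carry,
  // analogous to (ExpressionSet(filters) -- ExpressionSet(scanFilters)).toSeq.
  // Order is preserved here for readability.
  def remainingFilters(filters: Seq[String], scanFilters: Seq[String]): Seq[String] = {
    val inScan = scanFilters.toSet
    filters.filterNot(inScan)
  }

  // Old shape: only the extra filters reach the transformer, so an unsupported
  // filter already present in scanFilters passes through untouched.
  def oldStyle(filters: Seq[String], scanFilters: Seq[String], supported: String => Boolean): Seq[String] =
    scanFilters ++ remainingFilters(filters, scanFilters).filter(supported)

  // New shape: the transformer receives every push-down filter and can drop any unsupported one.
  def newStyle(filters: Seq[String], scanFilters: Seq[String], supported: String => Boolean): Seq[String] =
    (scanFilters ++ remainingFilters(filters, scanFilters)).filter(supported)
}

val scanFilters = Seq("a > 1", "udf(b)")
val filterNode  = Seq("a > 1", "c = 2")
val isSupported = (f: String) => !f.startsWith("udf")

println(PushDownSketch.oldStyle(filterNode, scanFilters, isSupported)) // List(a > 1, udf(b), c = 2)
println(PushDownSketch.newStyle(filterNode, scanFilters, isSupported)) // List(a > 1, c = 2)
```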

How was this patch tested?

Using existing unit tests (UT).


Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:


Run Gluten Clickhouse CI

@baibaichen changed the title from [CORE][VL][CH] Refactoer filter pushdown logic to [CORE][VL][CH] Refactor filter pushdown logic on Jan 30, 2024
@baibaichen force-pushed the feature/refactor_push_down branch from 1ae57ab to 51d0c2c on January 30, 2024 07:09
@baibaichen (Contributor, Author):

@liujiayi771, please review it.

@baibaichen force-pushed the feature/refactor_push_down branch from 51d0c2c to df72fa7 on January 30, 2024 08:26

@baibaichen force-pushed the feature/refactor_push_down branch from 2a104e5 to 7fcdee7 on January 30, 2024 10:09

@baibaichen force-pushed the feature/refactor_push_down branch from 7fcdee7 to 932d2ca on January 30, 2024 12:32

@zhztheplayer (Member):

/Benchmark Velox

@GlutenPerfBot (Contributor):

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| Query | log/native_4582_time.csv | log/native_master_01_29_2024_f040fe8d6_time.csv | Difference (master - PR) | Percentage (master / PR) |
|---|---|---|---|---|
| q1 | 33.61 | 33.80 | 0.191 | 100.57% |
| q2 | 24.67 | 25.67 | 1.002 | 104.06% |
| q3 | 38.57 | 38.07 | -0.493 | 98.72% |
| q4 | 38.11 | 40.59 | 2.473 | 106.49% |
| q5 | 71.83 | 69.30 | -2.534 | 96.47% |
| q6 | 7.36 | 7.04 | -0.328 | 95.55% |
| q7 | 82.52 | 85.84 | 3.313 | 104.01% |
| q8 | 82.64 | 86.69 | 4.047 | 104.90% |
| q9 | 122.29 | 121.85 | -0.443 | 99.64% |
| q10 | 42.74 | 43.51 | 0.775 | 101.81% |
| q11 | 20.19 | 19.98 | -0.210 | 98.96% |
| q12 | 24.66 | 29.14 | 4.474 | 118.14% |
| q13 | 45.36 | 45.40 | 0.047 | 100.10% |
| q14 | 19.46 | 15.65 | -3.806 | 80.44% |
| q15 | 27.01 | 29.66 | 2.649 | 109.81% |
| q16 | 14.68 | 14.28 | -0.395 | 97.31% |
| q17 | 100.45 | 102.13 | 1.678 | 101.67% |
| q18 | 151.04 | 149.99 | -1.045 | 99.31% |
| q19 | 14.81 | 13.92 | -0.885 | 94.02% |
| q20 | 28.58 | 26.90 | -1.678 | 94.13% |
| q21 | 225.26 | 225.67 | 0.409 | 100.18% |
| q22 | 13.89 | 13.48 | -0.405 | 97.09% |
| total | 1229.73 | 1238.57 | 8.836 | 100.72% |
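The two derived columns can be reproduced from the time columns: the difference is the master time minus this PR's time, and the percentage is the master time divided by this PR's time. A small sketch follows, with a few rows copied from the report; `diff` and `pct` are illustrative helper names, not the report's own tooling. Because the displayed times are rounded, recomputed values can differ from the report in the last digit or two. Lower times are better, so a positive difference or a percentage above 100% favors this PR.

```scala
// Illustrative helpers; the report's own tooling is not shown in this PR.
def diff(pr: Double, master: Double): Double = master - pr      // > 0 means this PR was faster
def pct(pr: Double, master: Double): Double = master / pr * 100 // > 100 means this PR was faster

// (query, this PR's time, master's time), copied from the report.
val rows = Seq(
  ("q1", 33.61, 33.80),
  ("q12", 24.66, 29.14),
  ("q14", 19.46, 15.65),
  ("total", 1229.73, 1238.57)
)

for ((q, pr, master) <- rows)
  println(f"$q%-6s ${diff(pr, master)}%8.3f ${pct(pr, master)}%8.2f%%")
```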

@baibaichen force-pushed the feature/refactor_push_down branch from 932d2ca to 4f3f4e0 on January 30, 2024 14:19

@baibaichen force-pushed the feature/refactor_push_down branch from 4f3f4e0 to 3eb78f2 on January 31, 2024 04:32

@zhztheplayer (Member):

Changes to the common module look good to me. Thanks.

```diff
@@ -384,30 +384,27 @@ object FilterHandler extends PredicateHelper {
 (ExpressionSet(filters) -- ExpressionSet(scanFilters)).toSeq
 
 // Separate and compare the filter conditions in Scan and Filter.
-// Push down the remaining conditions in Filter into Scan.
+// Try push down the remaining conditions in Filter into Scan.
```
Contributor:

nit: Try to push down.

Contributor (Author):

Let's fix it in a follow-up PR.

@liujiayi771 (Contributor):

LGTM. Thanks.

@zhouyuan (Contributor) left a comment:

👍

@baibaichen changed the title from [CORE][VL][CH] Refactor filter pushdown logic to [GLUTEN-3582][CORE][VL][CH] Refactor filter pushdown logic on Jan 31, 2024

#3582

@baibaichen merged commit 512b69a into apache:main on Jan 31, 2024 (22 checks passed)
@baibaichen deleted the feature/refactor_push_down branch on January 31, 2024 09:45