-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve Filter pushdown to Join #5770
Conversation
@yahoNanJing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the plan changes and the code carefully -- nice work @mingmwang
| | Aggregate: groupBy=[[]], aggr=[[SUM(lineitem.l_extendedprice)]] | | ||
| | Projection: lineitem.l_extendedprice | | ||
| | Inner Join: part.p_partkey = __scalar_sq_1.l_partkey Filter: CAST(lineitem.l_quantity AS Decimal128(30, 15)) < CAST(__scalar_sq_1.__value AS Decimal128(30, 15)) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a better plan because the redundant
Filter: part.p_partkey = lineitem.l_partkey AND lineitem.l_partkey = part.p_partkey
, which is already done by the earlier joins, is removed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also pushes the filter
| | Filter: CAST(lineitem.l_quantity AS Decimal128(30, 15)) < CAST(__scalar_sq_1.__value AS Decimal128(30, 15)) AND __scalar_sq_1.l_partkey = lineitem.l_partkey |
``
Into the Join which seems like a win to me (avoid generating output)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a better plan because the redundant
Filter: part.p_partkey = lineitem.l_partkey AND lineitem.l_partkey = part.p_partkey
, which is already done by the earlier joins, is removed, right?
Yes, the duplicated filters are removed. Actually why the original plan include duplicate filters is because the push_down_filter
rule infers additional filters and try to pushdown them down. If they can not be pushed down, those inferred filters are added back to the Filters, this is unnecessary, need to differ the inferred filters and the original filters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for the explanation.
let cols = expr.to_columns()?; | ||
|
||
// Collect left & right field indices | ||
// Collect left & right field indices, the field indices are sorted in ascending order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the sort order is important for later stages, can you make a note about the rationale (so the comment explains why the sorting is important, in addition to noting the output is sorted)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
@mingmwang Could you share any performance numbers for the improvements for the affected queries? |
Sure, will do. unfortunately, the performance improvement is just a little. For q17, the major bottleneck is still the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job to me. Thanks @mingmwang .
This PR remind me. I also notice some optimization. We can do a such as |
Which issue does this PR close?
Closes #.
Rationale for this change
Improve some TPCH query performance, simply the generate logical plan and physical plan.
What changes are included in this PR?
tpch-q7, tpch-q17, tpch-q19, tpch-q20 are impacted by this PR.
Are these changes tested?
Are there any user-facing changes?