-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make swap_hash_join public API #10702
Conversation
fn swap_hash_join( | ||
/// This function is public so other downstream projects can use it | ||
/// to construct `HashJoinExec` with right side as the build side. | ||
pub fn swap_hash_join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spark supports build right in HashJoin operator internally. In planning stage, it decides which side is build side.
However, DataFusion HashJoin planning takes alternative approach. DataFusion HashJoin operator only supports build left. In optimization stage, once DataFusion finds build right is better, it will call this function to swap inputs for HashJoin to achieve build right effect.
So, to make Comet to use DataFusion HashJoin to support Spark build right HashJoin, we only need this function to be public so Comet can directly use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this input takes &HashJoinExec
as input, it seems like it might be easier to find / make more sense as a method on HashJoinExec
itself 🤔
So maybe we could update the code so this function is HashJoinExec::swap_inputs
or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. 👍
Let me update it accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, some functions used in swap_hash_join
are also used by other functions like swap_nl_join
. So I cannot simply move these functions into HashJoinExec
as private ones.
Only I can do is move these functions to joins/utils.rs in physical-plan
crate and make them as public APIs so both join_selection.rs and HashJoinExec
can use them.
It makes more APIs public actually, including swap_join_filter
, swap_reverting_projection
and swap_join_type
.
I think it is not so worth making more public APIs just for swap_hash_join
.
Seems the current approach is better?
WDYT?
fn swap_hash_join( | ||
/// This function is public so other downstream projects can use it | ||
/// to construct `HashJoinExec` with right side as the build side. | ||
pub fn swap_hash_join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this input takes &HashJoinExec
as input, it seems like it might be easier to find / make more sense as a method on HashJoinExec
itself 🤔
So maybe we could update the code so this function is HashJoinExec::swap_inputs
or something
@@ -157,7 +157,9 @@ fn swap_join_projection( | |||
} | |||
|
|||
/// This function swaps the inputs of the given join operator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// This function swaps the inputs of the given join operator. | |
/// This function swaps the inputs of the given join operator, to construct `HashJoinExec` with right side as the build side |
@@ -157,7 +157,9 @@ fn swap_join_projection( | |||
} | |||
|
|||
/// This function swaps the inputs of the given join operator. | |||
fn swap_hash_join( | |||
/// This function is public so other downstream projects can use it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// This function is public so other downstream projects can use it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a clear comment, it might be wrongly moved out of public APIs easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree but the entire idea of pub
modifier implies 3rd party users to use the pub method? 🤔
@@ -157,7 +157,9 @@ fn swap_join_projection( | |||
} | |||
|
|||
/// This function swaps the inputs of the given join operator. | |||
fn swap_hash_join( | |||
/// This function is public so other downstream projects can use it | |||
/// to construct `HashJoinExec` with right side as the build side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// to construct `HashJoinExec` with right side as the build side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a benefit in leaving the documentation that explains the rationale for why this is public
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @viirya
Thanks @andygrove @alamb @Dandandan @comphead |
Which issue does this PR close?
Closes #9603.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?