Spark already does filter pushdown as part of join optimization. If a conditional only involves one side of a join, it is pushed down below the join. Otherwise it cannot be pushed down, so the filter happens after the join. In fact, filtering after the join only works for inner joins, because for all other join types you may need to evaluate the conditional for every combination of matching keys to decide whether the condition allows or excludes the row from being created.

We are working with cudf to be able to push the conditional into the join itself, so we can reduce the amount of data that is materialized and extend conditional joins to non-inner joins. But that is not likely to show up until the 0.16 release of cudf at the earliest.
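If it helps to see where Catalyst places the filter, here is a minimal Scala sketch (table and column names are made up for illustration, not from any real workload). It contrasts a predicate that touches only one side of the join, which gets pushed below the join, with one that references columns from both sides, which can only be evaluated as part of, or after, the join:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object FilterPushdownSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("filter-pushdown-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical example tables.
    val orders = Seq((1, 100.0), (2, 250.0), (3, 75.0))
      .toDF("cust_id", "amount")
    val customers = Seq((1, "US", 120.0), (2, "DE", 300.0), (3, "US", 50.0))
      .toDF("cust_id", "country", "credit_limit")

    // Case 1: the predicate only involves the customers side,
    // so Catalyst pushes the Filter below the join.
    orders.join(customers, Seq("cust_id"), "inner")
      .filter(col("country") === "US")
      .explain()  // plan shows the Filter under the join, on the customers side

    // Case 2: the predicate references columns from both sides,
    // so it cannot be pushed below the join and is evaluated with
    // (or after) the join. For non-inner joins this post-join
    // filtering is not equivalent, which is the case the
    // conditional-join work in cudf is meant to handle.
    orders.join(customers,
        orders("cust_id") === customers("cust_id") &&
          orders("amount") > customers("credit_limit"),
        "inner")
      .explain()

    spark.stop()
  }
}
```

Running `.explain()` on both plans is the easiest way to confirm which side of the join the filter lands on in your own query.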