[GLUTEN-6834][CORE] feat: add other join types from the official Substrait #6835

EpsilonPrime · 2024-08-14T06:56:23Z

Adds the join types currently defined in Substrait to the Gluten copy. This is one of a large set of changes aimed at reducing the differences from the official version, in the hope that one day there will be none.

github-actions · 2024-08-14T06:56:40Z

#6834

github-actions · 2024-08-14T06:56:54Z

Run Gluten Clickhouse CI

github-actions · 2024-08-14T20:48:05Z

Run Gluten Clickhouse CI

PHILO-HE · 2024-08-15T01:56:00Z

@FelixYBW, @baibaichen

zhztheplayer · 2024-08-15T02:21:58Z

I'm not sure how much it actually helps to make such changes.

For example, neither Spark nor Velox has a Right Anti join type at the moment (regardless of whether they add it in the future). So it may look confusing to have it defined in the interchange protocol.

As commented in #6833 (comment), physical plans vary more than logical plans in Gluten. We already have to follow both Spark's and Velox/CH's plan protocols. Strictly following yet another one in the middle layer could start messing things up.
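
To make the concern concrete, here is a minimal sketch of what a consumer would need once the protocol carries a join type the backend lacks. Everything in it is hypothetical: `JoinType`, `backendSupports` and `validateJoin` are illustration names, not Gluten, Velox or Substrait APIs, and the only fact taken from this thread is that Right Anti has no backend counterpart today.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the join types carried by the interchange protocol.
enum class JoinType { kInner, kLeft, kRight, kLeftSemi, kLeftAnti, kRightAnti };

// Hypothetical capability check for one backend. Only Right Anti is marked
// unsupported, mirroring the example given in the comment above.
bool backendSupports(JoinType type) {
  return type != JoinType::kRightAnti;
}

// Returns an empty string when the join can be offloaded, otherwise a reason
// that a caller could use to fall back to the original engine.
std::string validateJoin(JoinType type) {
  if (!backendSupports(type)) {
    return "join type is defined in the protocol but not supported by the backend";
  }
  return "";
}

// Small usage demonstration.
void demoValidateJoin() {
  assert(validateJoin(JoinType::kLeftAnti).empty());
  assert(!validateJoin(JoinType::kRightAnti).empty());
}
```

The point of the sketch is only that a wider enum in the protocol pushes an explicit rejection path onto each consumer.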

EpsilonPrime · 2024-08-15T02:32:55Z

There are also physical relations in Substrait. My approach to unforking is to slowly chip away at the differences so we can swap the main instance in seamlessly. A swap of everything at once seems unlikely so I'm tackling what I can -- especially if there are minor code changes related to each small change.

lgbo-ustc · 2024-08-15T02:53:07Z

Could we add a new type for existence join? At present an existence join is transformed to a left semi join.
Existence join is now supported by both CH and Velox.
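
A minimal sketch of the mapping described above, assuming simplified enums: `SparkJoinType`, `SubstraitJoinType` and `toSubstraitJoinType` are hypothetical names for illustration and are not the actual Gluten conversion code. The one behaviour taken from this comment is that an existence join currently collapses into a left semi join.

```cpp
#include <cassert>

// Hypothetical, simplified view of the join types on the Spark side.
enum class SparkJoinType {
  kInner, kFullOuter, kLeftOuter, kRightOuter, kLeftSemi, kLeftAnti, kExistence
};

// Hypothetical, simplified view of the join types on the Substrait side.
enum class SubstraitJoinType {
  kUnspecified, kInner, kOuter, kLeft, kRight, kLeftSemi, kLeftAnti
};

// Maps a Spark join type to a Substrait join type. Without a dedicated
// existence type, kExistence shares kLeftSemi with a real left semi join,
// so the plan alone no longer tells the two apart.
SubstraitJoinType toSubstraitJoinType(SparkJoinType type) {
  switch (type) {
    case SparkJoinType::kInner:      return SubstraitJoinType::kInner;
    case SparkJoinType::kFullOuter:  return SubstraitJoinType::kOuter;
    case SparkJoinType::kLeftOuter:  return SubstraitJoinType::kLeft;
    case SparkJoinType::kRightOuter: return SubstraitJoinType::kRight;
    case SparkJoinType::kLeftSemi:   return SubstraitJoinType::kLeftSemi;
    case SparkJoinType::kLeftAnti:   return SubstraitJoinType::kLeftAnti;
    case SparkJoinType::kExistence:  return SubstraitJoinType::kLeftSemi;
  }
  return SubstraitJoinType::kUnspecified;
}

// Small usage demonstration.
void demoJoinTypeMapping() {
  assert(toSubstraitJoinType(SparkJoinType::kExistence) ==
         toSubstraitJoinType(SparkJoinType::kLeftSemi));
}
```

A dedicated type would let the last case map to its own value instead of reusing the left semi one.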

EpsilonPrime · 2024-08-15T02:58:57Z

> Could we add a new type for existence join? At present an existence join is transformed to a left semi join. Existence join is now supported by both CH and Velox.

We are in the process of adding new join types to the specification at the moment, so adding another is definitely a possibility. (I currently have a PR out to add mark join.)

github-actions · 2024-08-15T05:17:08Z

Run Gluten Clickhouse CI

zhztheplayer · 2024-08-17T15:03:05Z

@EpsilonPrime Before we decide to put effort into migrating to mainstream Substrait, we should study how much it would help our next moves.

The thing is, if we unfork Substrait, we should use it to support more backends in Gluten. Otherwise the cost outweighs the benefit.

Do you happen to know how far Substrait integration has progressed in other projects, for example DuckDB, Arrow and Datafusion? If Gluten decides to add support for these libraries, are their Substrait consumer implementations reliable enough for us to use?

FelixYBW · 2024-08-18T07:39:32Z

> As commented in #6833 (comment), physical plans vary more than logical plans in Gluten. We already have to follow both Spark's and Velox/CH's plan protocols. Strictly following yet another one in the middle layer could start messing things up.

In the long term we should have the same protocol. Substrait should be able to fit all requirements from different frameworks, libraries and accelerators; that is Substrait's goal. But for Gluten it's not urgent right now.

FelixYBW · 2024-08-18T07:42:16Z

@JkSelf @rui-mo Do you happen to have the list of our modifications in Gluten's Substrait? I remember we had several pending PRs to upstream Substrait, but there was no active review, so we paused there.

@EpsilonPrime you could start by reviewing the pending PRs in Substrait, if there are any, and add the missing features Gluten needs to upstream.

rui-mo · 2024-08-19T00:48:56Z

Gluten modifications to Substrait: https://github.com/apache/incubator-gluten/blob/main/docs/developers/SubstraitModifications.md

EpsilonPrime · 2024-08-19T05:29:13Z

> @JkSelf @rui-mo Do you happen to have the list of our modifications in Gluten's Substrait? I remember we had several pending PRs to upstream Substrait, but there was no active review, so we paused there.
>
> @EpsilonPrime you could start by reviewing the pending PRs in Substrait, if there are any, and add the missing features Gluten needs to upstream.

All of the pending changes were merged last year (about the time I became a reviewer). I have also been merging changes upstream (for instance, the equivalent of TextReadOptions was added last week).

EpsilonPrime · 2024-08-19T05:46:20Z

> @EpsilonPrime Before we decide to put effort into migrating to mainstream Substrait, we should study how much it would help our next moves.
>
> The thing is, if we unfork Substrait, we should use it to support more backends in Gluten. Otherwise the cost outweighs the benefit.
>
> Do you happen to know how far Substrait integration has progressed in other projects, for example DuckDB, Arrow and Datafusion? If Gluten decides to add support for these libraries, are their Substrait consumer implementations reliable enough for us to use?

DuckDB has the best Substrait support of the three. Datafusion has a few issues which I'm hoping are addressed in their next release. Acero is in maintenance mode; it has a working implementation, but it is very strict about what it accepts.

Other benefits include tools which run on Substrait (like the validator and text plan format) which aren't really being used by Gluten at the moment.

The proposal to move the Gluten communication logic (which may or may not have included Substrait) into Spark was the reason I started looking into this effort.

github-actions · 2024-08-23T00:53:46Z

Run Gluten Clickhouse CI

github-actions · 2024-08-23T00:54:58Z

Run Gluten Clickhouse CI

github-actions · 2024-08-23T06:25:44Z

Run Gluten Clickhouse CI

github-actions · 2024-09-06T06:59:02Z

Run Gluten Clickhouse CI

zhztheplayer · 2024-09-06T08:42:47Z

It seems the CH CI is failing at:

15:08:17  /home/jenkins/agent/workspace/gluten/gluten-ci/gluten/cpp-ch/local-engine/Parser/JoinRelParser.cpp:272:80: error: no member named 'JoinRel_JoinType_JOIN_TYPE_ANTI' in namespace 'substrait'
15:08:17    272 |         if (join_opt_info.is_null_aware_anti_join && join.type() == substrait::JoinRel_JoinType_JOIN_TYPE_ANTI)
15:08:17        |                                                                     ~~~~~~~~~~~^
15:08:17  1 error generated.
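
A sketch of the kind of change that resolves this error, assuming the enumerator was renamed from `JOIN_TYPE_ANTI` to `JOIN_TYPE_LEFT_ANTI` as the commit messages in this PR suggest. The `substrait` namespace below is a hand-written stand-in for the protobuf-generated header, with placeholder enumerator values, and `JoinOptInfo` and `isNullAwareLeftAntiJoin` are hypothetical names built around the failing line; this is not the actual JoinRelParser.cpp code.

```cpp
#include <cassert>

// Hand-written stand-in for the generated Substrait code. The enumerator
// values are placeholders, not the values from the real proto file.
namespace substrait {
enum JoinRel_JoinType {
  JoinRel_JoinType_JOIN_TYPE_UNSPECIFIED,
  JoinRel_JoinType_JOIN_TYPE_INNER,
  JoinRel_JoinType_JOIN_TYPE_LEFT_ANTI  // formerly JOIN_TYPE_ANTI
};

struct JoinRel {
  JoinRel_JoinType type_value = JoinRel_JoinType_JOIN_TYPE_UNSPECIFIED;
  JoinRel_JoinType type() const { return type_value; }
};
}  // namespace substrait

// Hypothetical holder for the flag used on the failing line.
struct JoinOptInfo {
  bool is_null_aware_anti_join = false;
};

// Same condition as the failing line, written against the renamed enumerator.
bool isNullAwareLeftAntiJoin(const JoinOptInfo& join_opt_info,
                             const substrait::JoinRel& join) {
  return join_opt_info.is_null_aware_anti_join &&
         join.type() == substrait::JoinRel_JoinType_JOIN_TYPE_LEFT_ANTI;
}

// Small usage demonstration.
void demoRenamedEnumerator() {
  JoinOptInfo info;
  info.is_null_aware_anti_join = true;
  substrait::JoinRel join;
  join.type_value = substrait::JoinRel_JoinType_JOIN_TYPE_LEFT_ANTI;
  assert(isNullAwareLeftAntiJoin(info, join));
}
```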

github-actions · 2024-09-07T03:18:07Z

Run Gluten Clickhouse CI

github-actions bot added the CORE works for Gluten Core label Aug 14, 2024

github-actions bot added the VELOX label Aug 14, 2024

github-actions bot added the CLICKHOUSE label Aug 15, 2024

EpsilonPrime force-pushed the join_types branch from c4f689a to 6ce90e0 Compare August 23, 2024 00:53

EpsilonPrime added 5 commits September 5, 2024 23:57

feat: add other join types from the official Substrait

4984f41

Also update Velox code to new name for left anti join.

869bb66

Found a few more renames.

dd976b8

remove unused jointype in CrossRel

9743e51

CrossJoin's JoinType can be removed separately.

fab3250

EpsilonPrime force-pushed the join_types branch from bdd63fe to fab3250 Compare September 6, 2024 06:58

zhztheplayer changed the title ~~[GLUTEN-6834][core] feat: add other join types from the official Substrait~~ [GLUTEN-6834][CORE] feat: add other join types from the official Substrait Sep 6, 2024

zhztheplayer previously approved these changes Sep 6, 2024

rename one more JOIN_TYPE_ANTI

d2d1d17

EpsilonPrime dismissed zhztheplayer’s stale review via d2d1d17 September 7, 2024 03:17

zhztheplayer approved these changes Sep 9, 2024

zhztheplayer merged commit a4571dd into apache:main Sep 9, 2024
44 of 45 checks passed

dcoliversun pushed a commit to dcoliversun/gluten that referenced this pull request Sep 11, 2024

[GLUTEN-6834][CORE] feat: add other join types from the official Subs…

f4babab

…trait (apache#6835)

sharkdtu pushed a commit to sharkdtu/gluten that referenced this pull request Nov 11, 2024

[GLUTEN-6834][CORE] feat: add other join types from the official Subs…

2f1d6a6

…trait (apache#6835)
