[GLUTEN-3553][CH] Support bucket scan for ch backend #3618
Run Gluten Clickhouse CI
// Make sure create a new read relId for the stream side first
// before the one of the build side, when there is not shuffle on the build side
(readRel, plan.output, true, substraitContext.nextRelId())
Is it for an existing bug? Could you explain a bit what happens with the current logic?
Nit: "when there is not shuffle on the build side": not => no
When there is a shuffle on the stream side and no shuffle on the build side, the operator order passed to the backend is: 0: stream side read from iter, 1: build side scan, 2: build side filter, 4: join. However, the order created in the join transformer is: 0: build side scan, 1: build side filter, 2: stream side read from iter, 4: join. This is because the stream side read from iter is registered after the build side is created, so the two orders differ.
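The ordering problem described in the reply above can be illustrated with a minimal sketch. It assumes a hypothetical `RelIdAllocator` that stands in for the Substrait context and simply hands out increasing IDs; the class, object, and method names are illustrative and are not taken from Gluten.

```scala
// Hypothetical stand-in for the Substrait context: hands out increasing rel IDs.
final class RelIdAllocator {
  private var next = 0L
  def nextRelId(): Long = { val id = next; next += 1; id }
}

object RelIdOrderDemo {
  // Mismatched order: the build side is planned first, so the stream-side
  // read-from-iterator receives its ID only after the build-side operators.
  def buildSideFirst(): Seq[(Long, String)] = {
    val ctx = new RelIdAllocator
    val scan = ctx.nextRelId() -> "build side scan"
    val filter = ctx.nextRelId() -> "build side filter"
    val read = ctx.nextRelId() -> "stream side read from iter"
    Seq(scan, filter, read)
  }

  // Order matching what the backend receives: reserve the stream-side
  // read relId before creating the build side.
  def streamSideFirst(): Seq[(Long, String)] = {
    val ctx = new RelIdAllocator
    val read = ctx.nextRelId() -> "stream side read from iter"
    val scan = ctx.nextRelId() -> "build side scan"
    val filter = ctx.nextRelId() -> "build side filter"
    Seq(read, scan, filter)
  }

  def main(args: Array[String]): Unit = {
    println(buildSideFirst())
    println(streamSideFirst())
  }
}
```

In this simplified model, reserving the stream-side ID first is the only change needed to make the two sequences agree; the real transformer may involve more state than a single counter.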
private val bucketedScan: Boolean = {
  if (
    relation.sparkSession.sessionState.conf.bucketingEnabled && relation.bucketSpec.isDefined
    && !disableBucketedScan
  ) {
    val spec = relation.bucketSpec.get
    val bucketColumns = spec.bucketColumnNames.flatMap(n => toAttribute(n))
    bucketColumns.size == spec.bucketColumnNames.size
  } else {
    false
  }
}
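The check in the `bucketedScan` snippet above relies on `flatMap` dropping bucket column names that do not resolve, so the size comparison is true only when every bucket column is available. A self-contained sketch of that idea follows. `SimpleBucketSpec`, `BucketedScanCheck`, and the exact-match `toAttribute` are hypothetical simplifications, not Spark's or Gluten's actual types; in particular, real attribute resolution may not be a plain string comparison.

```scala
// Hypothetical, simplified model of a bucket spec (not Spark's BucketSpec).
final case class SimpleBucketSpec(numBuckets: Int, bucketColumnNames: Seq[String])

object BucketedScanCheck {
  // Resolve a column name against the scan output; None if it is absent.
  // Assumption: exact string match stands in for real attribute resolution.
  def toAttribute(name: String, output: Seq[String]): Option[String] =
    output.find(_ == name)

  def useBucketedScan(
      bucketingEnabled: Boolean,
      disableBucketedScan: Boolean,
      spec: Option[SimpleBucketSpec],
      output: Seq[String]): Boolean =
    spec match {
      case Some(s) if bucketingEnabled && !disableBucketedScan =>
        // flatMap drops names that fail to resolve, so equal sizes mean
        // every bucket column is present in the output.
        val resolved = s.bucketColumnNames.flatMap(n => toAttribute(n, output))
        resolved.size == s.bucketColumnNames.size
      case _ => false
    }
}
```

With a spec bucketed on `id`, the check passes for an output of `id, name` and fails when `id` is missing from the output or when bucketing is disabled.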
@WangGuangxin please help to review, thanks.
LGTM
I notice that some code in gluten-core has changed, but the CI jobs for the velox backend were not triggered. It may be a GitHub Actions bug that can happen for PRs with many new files.
How can this be fixed, or how can the velox CI be triggered manually?
Maybe you can try temporarily removing the paths filter:
It works, thanks. After CI passes, I will revert it.
.github/workflows/velox_be.yml
Outdated
paths:
  - '.github/**'
  - 'pom.xml'
  - 'backends-velox/**'
  - 'gluten-celeborn/**'
  - 'gluten-core/**'
  - 'gluten-data/**'
  - 'gluten-ut/**'
  - 'shims/**'
  - 'tools/gluten-it/**'
  - 'tools/gluten-te/**'
  - 'ep/build-arrow/**'
  - 'ep/build-velox/**'
  - 'cpp/*'
  - 'cpp/CMake/**'
  - 'cpp/velox/**'
  - 'cpp/core/**'
  - 'dev**'
  # - 'substrait/substrait-spark/**'
Will revert after CI passes.
This may be due to the old GitHub Action checkout being used:
.github/workflows/velox_be.yml
Outdated
  - 'cpp/velox/**'
  - 'cpp/core/**'
  - 'dev**'
  # - 'substrait/substrait-spark/**'
Do we need to revert these changes?
Do you need to revert these changes?
Yes, they need to be reverted.
Done
Support bucket scan for ch backend, including parquet format and mergetree format. Close apache#3553.
LGTM
What changes were proposed in this pull request?
Support bucket scan for ch backend, including parquet format and mergetree format.
Close #3553.
(Fixes: #3553)
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)