[GLUTEN-6067][VL] [Part 3-1] Refactor: Rename VeloxColumnarWriteFilesExec to ColumnarWriteFilesExec #6403
/**
 * This RDD is used to make sure we have injected staging write path before initializing the native
 * plan, and support Spark file commit protocol.
 */
- class VeloxColumnarWriteFilesRDD(
+ class GlutenColumnarWriteFilesRDD(
After moving VeloxColumnarWriteFilesExec from backend-velox to gluten-core, can we update the class names by renaming GlutenColumnarWriteFilesExec to ColumnarWriteFilesExec and GlutenColumnarWriteFilesRDD to ColumnarWriteFilesRDD?
…nd move it to gluten-core
1. Return GlutenColumnarWriteFilesExec at SparkPlanExecApi
2. Move SparkWriteFilesCommitProtocol to gluten-core
3. SparkWriteFilesCommitProtocol supports getFilename from the internal committer
4. Remove supportTransformWriteFiles from BackendSettingsApi
5. injectWriteFilesTempPath with fileName
…tenColumnarWriteFilesRDD to ColumnarWriteFilesRDD
What changes were proposed in this pull request?
(Fixes: #6067)
This PR refactors the Velox-side code: it renames VeloxColumnarWriteFilesExec to GlutenColumnarWriteFilesExec and moves it to gluten-core, so that the ClickHouse backend can use the same SparkPlan in a follow-up PR.
With Spark 3.4 support, Velox has a whole-stage native write pipeline, which is better than the old implementation; the ClickHouse backend adopts the same implementation.
Major change 1
The only major difference between Velox and ClickHouse is how native metrics are parsed. To handle this, I introduce a new trait called BackendWrite, which has only one member for now. Once the native write pipeline has completed, we obtain it through BackendsApiManager.getSparkPlanExecApiInstance.createBackendWrite. Please see VeloxBackendWrite for details.
Minor change 2
The other, minor difference is that the ClickHouse backend doesn't generate file names. To compute the file name per task, it uses HadoopMapReduceCommitProtocol::getFilename and then injects the result into the backend. This is OK because Velox doesn't support maxRecordsPerFile (see #4329), and the ClickHouse backend follows suit, which means each task produces only one file and no further injection is needed.
Improve
I also pass the file format to the backend.
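To make the two changes above more concrete, here is a minimal sketch of how a single-member BackendWrite trait and a per-task file name could fit together. Everything in it is an assumption for illustration: NativeWriteMetrics, WriteSummary, collectNativeWriteMetrics, KeyValueBackendWrite, and taskFileName are hypothetical names, not the actual Gluten or Spark types, and the part-<task id>-<job id><extension> pattern is only assumed to resemble what the commit protocol produces.

```scala
import scala.util.Try

object WriteFilesSketch {
  // Hypothetical stand-in for the metrics a native writer reports for one task.
  final case class NativeWriteMetrics(fields: Map[String, String])

  // Hypothetical backend-neutral summary of what one task wrote.
  final case class WriteSummary(fileName: String, numRows: Long, numBytes: Long)

  // Assumed shape of the trait: a single member that parses backend-specific metrics.
  trait BackendWrite {
    def collectNativeWriteMetrics(metrics: NativeWriteMetrics): Option[WriteSummary]
  }

  // Illustrative implementation for a backend that reports key/value metrics.
  // Returns None when a field is missing or not a valid number.
  object KeyValueBackendWrite extends BackendWrite {
    override def collectNativeWriteMetrics(metrics: NativeWriteMetrics): Option[WriteSummary] =
      for {
        name  <- metrics.fields.get("fileName")
        rows  <- metrics.fields.get("numRows").flatMap(v => Try(v.toLong).toOption)
        bytes <- metrics.fields.get("numBytes").flatMap(v => Try(v.toLong).toOption)
      } yield WriteSummary(name, rows, bytes)
  }

  // One file name per task, computed on the Spark side and then handed to the backend.
  // The naming pattern here is an assumption, not the real getFilename implementation.
  def taskFileName(taskId: Int, jobId: String, ext: String): String =
    f"part-$taskId%05d-$jobId$ext"
}
```

Because each task writes a single file in this design, one file name per task is enough, and the backend-specific part is reduced to turning its own metrics into a common result.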
How was this patch tested?
Using existing UTs.