[GLUTEN-7028][CH][Part-1] Using PushingPipelineExecutor to write merge tree #7029
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20240906)
* Fix build due to ClickHouse/ClickHouse#65832
* Fix UT due to ClickHouse/ClickHouse#65832
* Fix conflict with #7122
* Fix conflict with #7029
* Run GlutenClickHouseMergeTreeCacheDataSSuite locally

---------

Co-authored-by: kyligence-git <[email protected]>
Co-authored-by: Chang Chen <[email protected]>
…rge tree (apache#7029)
* 1. Rename Storages/Mergetree to Storages/MergeTree
  2. Move MergeTreeTool.cpp/.h from Common to Storages/MergeTree
  3. Move CustomStorageMergeTree.cpp/.h and StorageMergeTreeFactory.cpp/.h to MergeTree folder
  4. Add CustomMergeTreeDataWriter
  5. Remove TempStorageFreer
  6. Add SubstraitParserUtils
* Make query_map_ as QueryContextManager member
* EMBEDDED_PLAN and create_plan_and_executor
* minor refactor
* tmp
* SparkStorageMergeTree CustomMergeTreeDataWriter => SparkMergeTreeDataWriter
* Add SparkMergeTreeSink
* use SparkStorageMergeTree and SparkMergeTreeSink
* Introduce GlutenSettings.h
* GlutenMergeTreeWriteSettings
* Fix Test Build
* typo
* ContextPtr => const ContextPtr &
* minor refactor
* fix style
* using GlutenMergeTreeWriteSettings
* [TMP] GlutenMergeTreeWriteSettings refactor
* [TMP] StorageMergeTreeWrapper
* [TMP] StorageMergeTreeWrapper::commitPartToRemoteStorageIfNeeded
* [TMP] StorageMergeTreeWrapper::saveMetadata
* move thread pool
* tmp
* rename
* move to sparkmergetreesink.h/cpp
* MergeTreeTableInstance
* sameStructWith => sameTable
* parseStorageAndRestore => restoreStorage parseStorage => getStorage
* Sink with MergeTreeTable table;
* remvoe SparkMergeTreeWriter::writeTempPartAndFinalize
* refactor SinkHelper::writeTempPart
* Remove write_setting of SparkMergeTreeWriter
* SparkMergeTreeWriter using PushingPipelineExecutor
* SparkMergeTreeWriteSettings
* tmp
* GlutenMergeTreeWriteSettings => SparkMergeTreeWriteSettings
* make CustomStorageMergeTree constructor protected
* MergeTreeTool.cpp/.h => SparkMergeTreeMeta.cpp/.h
* CustomStorageMergeTree.cpp/.h => SparkStorageMergeTree.cpp/.h
* CustomStorageMergeTree => SparkStorageMergeTree SparkStorageMergeTree => SparkWriteStorageMergeTree
* Refactor move codes from MergeTreeRelParser to MergeTreeTable and MergeTreeTableInstance
* Refactor Make static member to normal member
What changes were proposed in this pull request?
This PR refactors `SparkMergeTreeWriter`, using `PushingPipelineExecutor` to write merge tree data instead of hand-written code.

`SparkMergeTreeWriter` did 4 different tasks:
1. `DB::Squashing` to merge blocks into one bigger block; this functionality is now done by `PlanSquashingTransform` and `ApplySquashingTransform`.
2. `SparkMergeTreeDataWriter`
3. `SinkHelper` and its derived classes.
4. `SparkMergeTreeSink` and `PushingPipelineExecutor`
The current workflow looks like:

We did this work for two reasons:
1. `WriteFilesExec`, so now we can write Parquet and ORC in one native pipeline without modifying Spark source code; see [GLUTEN-6067][CH] [Part 3-2] Basic support for Native Write in Spark 3.5 #6586.
2. `PushingPipelineExecutor`.

After this PR, we can unify writing for all formats for Spark 3.2, 3.3 and 3.5.
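To make the division of labour described above easier to picture, here is a minimal, self-contained C++ sketch of a push-style write path: a caller pushes small blocks into an executor, a squashing stage merges them into bigger blocks, and a sink turns each squashed block into a "part". Every name here (`ToyBlock`, `ToySquasher`, `ToySink`, `ToyPushingExecutor`) is hypothetical and invented for illustration; none of it is the ClickHouse or Gluten API, and the row-count threshold is an assumed squashing policy, not the one used by `PlanSquashingTransform` and `ApplySquashingTransform`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Toy stand-in for a block of rows. This is NOT DB::Block.
using ToyBlock = std::vector<int>;

// Hypothetical squashing stage: buffers small blocks and emits one bigger
// block once at least `min_rows` rows have accumulated (assumed policy).
class ToySquasher
{
public:
    explicit ToySquasher(std::size_t min_rows) : min_rows_(min_rows) {}

    // Appends a block; returns a squashed block when the threshold is reached.
    std::optional<ToyBlock> add(const ToyBlock & block)
    {
        pending_.insert(pending_.end(), block.begin(), block.end());
        if (pending_.size() < min_rows_)
            return std::nullopt;
        return takePending();
    }

    // Emits whatever is still buffered at end of input, if anything.
    std::optional<ToyBlock> flush()
    {
        if (pending_.empty())
            return std::nullopt;
        return takePending();
    }

private:
    ToyBlock takePending()
    {
        ToyBlock out = std::move(pending_);
        pending_.clear(); // leave the moved-from buffer in a known empty state
        return out;
    }

    std::size_t min_rows_;
    ToyBlock pending_;
};

// Hypothetical sink: each consumed block becomes one "part"; only the row
// count of each part is recorded here.
class ToySink
{
public:
    void consume(const ToyBlock & block) { part_sizes_.push_back(block.size()); }
    const std::vector<std::size_t> & partSizes() const { return part_sizes_; }

private:
    std::vector<std::size_t> part_sizes_;
};

// Hypothetical push-style executor: the caller drives the pipeline by
// pushing blocks in, then calls finish() to flush the remainder.
class ToyPushingExecutor
{
public:
    ToyPushingExecutor(std::size_t min_rows, ToySink & sink)
        : squasher_(min_rows), sink_(sink) {}

    void push(const ToyBlock & block)
    {
        assert(!finished_ && "push() after finish()");
        if (auto squashed = squasher_.add(block))
            sink_.consume(*squashed);
    }

    void finish()
    {
        if (auto rest = squasher_.flush())
            sink_.consume(*rest);
        finished_ = true;
    }

private:
    ToySquasher squasher_;
    ToySink & sink_;
    bool finished_ = false;
};
```

With a threshold of 4 rows, pushing blocks of 2, 1 and 2 rows yields a single 5-row part, and a trailing 1-row block is only written out when `finish()` is called. The point of the design is that the writer no longer needs its own squashing and part-writing loop: it only pushes blocks and finishes, while the stages behind the executor do the rest.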
Other refactoring:
1. Rename `Storage/Mergetree` to `Storage/MergeTree`.
2. Move MergeTree-related sources into `Storage/MergeTree`.
3. Rename `CustomStorageMergeTree` to `SparkStorageMergeTree`.
4. `SparkWriteStorageMergeTree`, which implements the `write` method to create `SparkMergeTreeSink`.
5. `MergeTreeTableInstance`, which inherits from `MergeTreeTable`.
(Fixes: #7028)
How was this patch tested?
Using existing tests.