[GLUTEN-7028][CH][Part-4] Refactor DeltaMergeTreeFileFormat to read table configuration from deltalog's metadata #7170

Merged: 8 commits, Sep 30, 2024

@baibaichen baibaichen commented Sep 9, 2024

What changes were proposed in this pull request?

DeltaMergeTreeFileFormat has many members that could be placed in the metadata's configuration. This PR moves most of these settings into the metadata's configuration; snapshotId is the exception.

DeltaMergeTreeFileFormat now creates the ReadRel.ExtensionTable itself; see prepareWrite():

    @transient val deltaMetaReader = DeltaMetaReader(metadata)

    val database = deltaMetaReader.storageDB
    val tableName = deltaMetaReader.storageTable
    val deltaPath = deltaMetaReader.storagePath

    val extensionTableBC = sparkSession.sparkContext.broadcast(
      ClickhouseMetaSerializer
        .forWrite(deltaMetaReader, metadata.schema)
        .toByteArray)

(Fixes: #7028)
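
The idea of reading table settings from the metadata's configuration map, rather than carrying them as separate members of the file format, can be sketched roughly as follows. This is a minimal, standalone illustration and not Gluten's implementation: `TableMetadata`, `MetaReader`, `serializeForWrite`, and the three key names are hypothetical stand-ins, and the byte encoding is a placeholder for whatever `ClickhouseMetaSerializer.forWrite` actually produces.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical sketch: none of these names are Gluten's or Delta's real API.
object MetaConfigSketch {
  // Stand-in for the Delta log metadata; only the configuration map matters here.
  final case class TableMetadata(configuration: Map[String, String])

  // Assumed key names, chosen for illustration only.
  val StorageDbKey = "storage_db"
  val StorageTableKey = "storage_table"
  val StoragePathKey = "storage_path"

  // Looks up table-level settings in the metadata configuration on demand.
  final case class MetaReader(metadata: TableMetadata) {
    private def required(key: String): String =
      metadata.configuration.getOrElse(
        key,
        throw new IllegalArgumentException(s"missing table configuration: $key"))

    def storageDB: String = required(StorageDbKey)
    def storageTable: String = required(StorageTableKey)
    def storagePath: String = required(StoragePathKey)
  }

  // Placeholder serializer: encode once on the driver so that only bytes
  // need to be shipped (for example via a broadcast) to the executors.
  def serializeForWrite(reader: MetaReader): Array[Byte] =
    s"${reader.storageDB}\n${reader.storageTable}\n${reader.storagePath}"
      .getBytes(StandardCharsets.UTF_8)
}
```

With this shape, the file format only needs the metadata object (plus anything that is not table-level, such as a snapshot id), and a missing setting fails early with a message naming the key.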

How was this patch tested?

Existing UTs.

github-actions bot commented Sep 9, 2024

#7028

github-actions bot commented Sep 9, 2024

Run Gluten Clickhouse CI

@baibaichen baibaichen changed the title [GLUTEN-7028][CH][Part-2] [GLUTEN-7028][CH][Part-3] Sep 13, 2024
@baibaichen baibaichen changed the title [GLUTEN-7028][CH][Part-3] [GLUTEN-7028][CH][Part-4] Sep 19, 2024
@github-actions github-actions bot added CORE works for Gluten Core VELOX labels Sep 25, 2024

call ClickhouseMetaSerializer.forWrite at driver side

org.apache.spark.sql.execution.datasources.utils.MergeTreeDeltaUtil => org.apache.spark.sql.execution.datasources.clickhouse.utils.MergeTreeDeltaUtil

ClickhouseMetaSerializer.forWrite => get parameter from clickhouseTableConfigs
Directly call ClickhouseMetaSerializer in CHMergeTreeWriterInjects
Simplify ExtensionTableNode
Minor refactor: using functional way to create collection

@baibaichen baibaichen marked this pull request as ready for review September 30, 2024 05:08
@baibaichen baibaichen changed the title [GLUTEN-7028][CH][Part-4] [GLUTEN-7028][CH][Part-4] Refactor DeltaMergeTreeFileFormat to read table configuration from deltalog's metadata Sep 30, 2024
@baibaichen baibaichen merged commit 93056f0 into apache:main Sep 30, 2024
8 checks passed
@baibaichen baibaichen deleted the feature/one-pipeline branch September 30, 2024 05:35
@GlutenPerfBot

===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====

query log/native_master_09_30_2024_time.csv log/native_master_09_29_2024_69a914f62e_time.csv difference percentage
q1 13.70 14.36 0.662 104.83%
q2 15.57 15.86 0.291 101.87%
q3 3.03 5.12 2.089 168.88%
q4 71.84 70.70 -1.138 98.42%
q5 10.55 10.48 -0.068 99.35%
q6 4.01 2.33 -1.675 58.20%
q7 6.01 6.22 0.208 103.45%
q8 3.38 5.29 1.906 156.37%
q9 25.07 26.98 1.914 107.64%
q10 10.09 9.00 -1.083 89.27%
q11 36.37 36.63 0.259 100.71%
q12 1.46 1.42 -0.038 97.38%
q13 6.76 6.28 -0.477 92.95%
q14a 46.57 49.12 2.546 105.47%
q14b 41.79 42.15 0.356 100.85%
q15 2.33 2.54 0.213 109.14%
q16 47.25 47.01 -0.239 99.49%
q17 4.88 4.69 -0.185 96.21%
q18 8.19 6.87 -1.312 83.98%
q19 1.95 2.03 0.081 104.15%
q20 1.47 1.36 -0.112 92.38%
q21 1.04 1.02 -0.019 98.14%
q22 7.81 8.65 0.844 110.80%
q23a 102.90 112.79 9.892 109.61%
q23b 125.59 126.15 0.560 100.45%
q24a 112.25 103.25 -8.999 91.98%
q24b 111.34 109.59 -1.747 98.43%
q25 4.15 4.04 -0.103 97.52%
q26 4.04 5.55 1.510 137.39%
q27 5.05 4.86 -0.190 96.23%
q28 30.61 31.38 0.769 102.51%
q29 11.24 10.51 -0.736 93.45%
q30 5.67 4.92 -0.749 86.80%
q31 6.84 6.86 0.028 100.40%
q32 1.21 1.20 -0.007 99.42%
q33 4.31 4.57 0.257 105.97%
q34 4.28 3.95 -0.330 92.28%
q35 7.79 8.47 0.680 108.73%
q36 5.68 5.58 -0.107 98.12%
q37 4.97 4.93 -0.038 99.24%
q38 13.38 15.08 1.703 112.73%
q39a 3.06 3.20 0.139 104.54%
q39b 3.13 2.91 -0.214 93.17%
q40 3.95 3.84 -0.108 97.27%
q41 0.64 0.67 0.031 104.90%
q42 0.87 0.90 0.029 103.30%
q43 4.62 4.47 -0.150 96.76%
q44 9.34 9.64 0.303 103.24%
q45 3.28 3.34 0.059 101.79%
q46 3.59 3.72 0.132 103.68%
q47 17.77 17.85 0.075 100.42%
q48 4.80 5.65 0.853 117.77%
q49 7.58 7.21 -0.374 95.07%
q50 21.74 21.91 0.164 100.76%
q51 9.62 9.95 0.334 103.48%
q52 0.98 1.00 0.022 102.24%
q53 2.15 2.17 0.019 100.87%
q54 3.55 3.68 0.124 103.50%
q55 1.02 1.07 0.046 104.55%
q56 4.07 4.05 -0.022 99.45%
q57 10.23 10.59 0.362 103.54%
q58 2.65 2.56 -0.088 96.68%
q59 11.41 10.94 -0.478 95.81%
q60 3.99 4.19 0.205 105.14%
q61 4.10 4.07 -0.033 99.19%
q62 4.55 4.60 0.055 101.22%
q63 2.17 2.24 0.071 103.29%
q64 59.35 64.24 4.885 108.23%
q65 19.95 17.71 -2.239 88.78%
q66 4.09 4.65 0.552 113.49%
q67 436.03 422.45 -13.585 96.88%
q68 4.60 4.55 -0.052 98.86%
q69 5.34 5.73 0.393 107.37%
q70 10.61 10.70 0.082 100.77%
q71 2.36 2.53 0.168 107.11%
q72 214.44 217.18 2.737 101.28%
q73 2.25 2.34 0.099 104.43%
q74 23.02 23.87 0.842 103.66%
q75 26.50 26.62 0.118 100.44%
q76 13.61 13.25 -0.368 97.30%
q77 2.19 2.03 -0.164 92.50%
q78 49.89 49.55 -0.336 99.33%
q79 3.90 4.04 0.145 103.73%
q80 11.42 11.95 0.533 104.66%
q81 4.56 4.52 -0.037 99.19%
q82 6.58 6.83 0.244 103.71%
q83 1.59 1.62 0.024 101.52%
q84 2.87 2.78 -0.087 96.98%
q85 6.75 8.18 1.427 121.12%
q86 4.20 4.08 -0.122 97.11%
q87 14.57 16.13 1.560 110.71%
q88 16.76 17.61 0.852 105.09%
q89 3.52 3.56 0.048 101.38%
q90 2.70 2.78 0.075 102.77%
q91 1.95 3.80 1.845 194.45%
q92 1.22 1.24 0.018 101.52%
q93 34.75 33.59 -1.165 96.65%
q94 25.87 25.48 -0.394 98.48%
q95 86.19 85.81 -0.377 99.56%
q96 2.61 2.65 0.039 101.49%
q97 18.33 17.81 -0.521 97.16%
q98 1.83 1.92 0.096 105.24%
q99 11.30 9.55 -1.754 84.48%
total 2215.02 2219.58 4.559 100.21%

@GlutenPerfBot

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_master_09_30_2024_time.csv log/native_master_09_29_2024_69a914f62e_time.csv difference percentage
q1 52.59 52.80 0.210 100.40%
q2 30.47 28.83 -1.632 94.64%
q3 53.51 53.92 0.411 100.77%
q4 43.69 43.08 -0.615 98.59%
q5 103.16 105.49 2.337 102.27%
q6 10.65 11.43 0.781 107.33%
q7 108.91 113.09 4.184 103.84%
q8 116.01 117.90 1.888 101.63%
q9 172.47 171.73 -0.737 99.57%
q10 67.09 66.56 -0.526 99.22%
q11 25.91 27.55 1.640 106.33%
q12 33.45 32.70 -0.755 97.74%
q13 53.01 53.07 0.063 100.12%
q14 23.44 22.78 -0.659 97.19%
q15 50.76 49.78 -0.978 98.07%
q16 18.03 18.65 0.626 103.48%
q17 126.33 126.81 0.484 100.38%
q18 198.21 198.80 0.586 100.30%
q19 28.05 27.52 -0.536 98.09%
q20 42.97 40.53 -2.439 94.32%
q21 339.40 337.42 -1.973 99.42%
q22 15.95 16.11 0.162 101.02%
total 1714.05 1716.57 2.522 100.15%

sharkdtu pushed a commit to sharkdtu/gluten that referenced this pull request Nov 11, 2024
… table configuration from deltalog's metadata (apache#7170)

* Call ClickhouseMetaSerializer.forWrite at driver side and broadcast ReadRel.ExtensionTable.

Successfully merging this pull request may close these issues.

[CH] Fully Support writing parquet and mergetree in spark 3.5.x with delta protocol