[GLUTEN-6590][CH] Support compact mergetree file on s3 #6591

Merged
baibaichen merged 1 commit into apache:main on Jul 31, 2024

Conversation

lwz9103
Contributor

@lwz9103 lwz9103 commented Jul 25, 2024

What changes were proposed in this pull request?

(Fixes: #6590)

How was this patch tested?

unit tests

#6590

Run Gluten Clickhouse CI

@zhztheplayer changed the title from "[GLUTEN-6590] Support compact mergetree file on s3" to "[GLUTEN-6590][CH] Support compact mergetree file on s3" on Jul 26, 2024
@@ -635,6 +671,7 @@ class GlutenClickHouseMergeTreeWriteOnS3Suite

withSQLConf(
"spark.databricks.delta.optimize.minFileSize" -> "200000000",
+ "spark.gluten.sql.columnar.backend.ch.runtime_settings.mergetree.insert_without_local_storage" -> "true",
Contributor

Why add this config?

Contributor Author

It makes the test merge parts directly on S3. See this code branch (screenshot).

@@ -556,7 +556,7 @@ std::vector<String> BackendInitializerUtil::wrapDiskPathConfig(
if (path_prefix.empty() && path_suffix.empty())
return changed_paths;
Poco::Util::AbstractConfiguration::Keys disks;
- std::unordered_set<String> disk_types = {"s3", "hdfs_gluten", "cache"};
+ std::unordered_set<String> disk_types = {"s3_gluten", "hdfs_gluten", "cache"};
Contributor

The disk type configured for the unit tests in GlutenClickHouseWholeStageTransformerSuite should be changed as well.

Contributor Author

The disk type is already set to s3_gluten there (screenshot).
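
To illustrate what the changed `disk_types` set does, here is a minimal, self-contained C++ sketch. It is not the actual `BackendInitializerUtil::wrapDiskPathConfig`, which reads a Poco configuration. `DiskEntry`, `wrapDiskPaths`, and the way the prefix and suffix are joined to the path are assumptions made up for this example. It shows only the filtering idea from the hunk: disks whose type is not in the set are skipped, so after this change a disk declared with the plain `s3` type would no longer be rewritten, while an `s3_gluten` disk would.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-in for one disk entry in the storage configuration.
struct DiskEntry
{
    std::string name;
    std::string type;
    std::string path;
};

// Returns the rewritten paths of all disks whose type is in the allowed set.
// Assumption: "wrapping" is modelled here as prefix + path + suffix.
std::vector<std::string> wrapDiskPaths(
    const std::vector<DiskEntry> & disks,
    const std::string & path_prefix,
    const std::string & path_suffix)
{
    std::vector<std::string> changed_paths;
    // Same early exit as in the hunk: nothing to do without a prefix or suffix.
    if (path_prefix.empty() && path_suffix.empty())
        return changed_paths;

    // The set after this PR: "s3" was replaced by "s3_gluten".
    const std::unordered_set<std::string> disk_types = {"s3_gluten", "hdfs_gluten", "cache"};

    for (const auto & disk : disks)
    {
        if (disk_types.count(disk.type) == 0)
            continue; // e.g. a plain "s3" disk is not matched any more
        changed_paths.push_back(path_prefix + disk.path + path_suffix);
    }
    return changed_paths;
}
```

Under these assumptions, calling `wrapDiskPaths` with three disks of type `s3`, `s3_gluten`, and `cache` and a non-empty prefix would return two paths, one for the `s3_gluten` disk and one for the `cache` disk.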

Contributor

@liuneng1994 liuneng1994 left a comment


Please check that the unit tests can test the current code.

Contributor Author

lwz9103 commented Jul 29, 2024

> Please check that the unit tests can test the current code.

Please see this code, which tests compact files on S3 (screenshot).

Contributor Author

lwz9103 commented Jul 31, 2024

Test Result

Compact files on s3: (screenshot)
Local UT test: (screenshot)

@baibaichen baibaichen merged commit 0ed44a4 into apache:main Jul 31, 2024
7 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====

query log/native_master_07_31_2024_time.csv log/native_master_07_30_2024_97f4eb2e0c_time.csv difference percentage
q1 14.31 13.56 -0.748 94.77%
q2 13.71 13.36 -0.355 97.41%
q3 4.56 2.52 -2.039 55.26%
q4 69.81 70.21 0.394 100.57%
q5 7.70 7.80 0.100 101.29%
q6 3.28 2.20 -1.080 67.05%
q7 4.84 6.60 1.760 136.33%
q8 5.15 5.23 0.081 101.58%
q9 23.96 25.55 1.597 106.67%
q10 9.73 10.40 0.671 106.90%
q11 36.54 37.48 0.939 102.57%
q12 1.38 1.47 0.091 106.60%
q13 6.31 6.40 0.091 101.44%
q14a 46.21 45.84 -0.366 99.21%
q14b 42.83 42.50 -0.329 99.23%
q15 2.53 2.66 0.133 105.25%
q16 45.93 45.34 -0.587 98.72%
q17 5.33 4.97 -0.366 93.14%
q18 6.62 6.85 0.224 103.38%
q19 2.12 2.24 0.121 105.72%
q20 1.52 1.44 -0.077 94.91%
q21 1.06 1.24 0.179 116.84%
q22 7.67 7.85 0.179 102.34%
q23a 101.98 101.56 -0.424 99.58%
q23b 126.41 125.77 -0.635 99.50%
q24a 95.33 96.52 1.193 101.25%
q24b 96.41 92.44 -3.968 95.88%
q25 4.06 4.14 0.072 101.77%
q26 3.30 3.22 -0.082 97.52%
q27 3.78 3.78 0.001 100.03%
q28 31.36 30.95 -0.408 98.70%
q29 11.97 11.17 -0.809 93.25%
q30 4.93 4.85 -0.078 98.42%
q31 7.26 7.25 -0.014 99.81%
q32 1.35 1.18 -0.174 87.16%
q33 4.40 4.36 -0.045 98.98%
q34 3.78 3.99 0.210 105.54%
q35 7.81 8.90 1.095 114.03%
q36 4.73 4.67 -0.057 98.79%
q37 4.44 4.74 0.304 106.85%
q38 12.99 13.61 0.615 104.74%
q39a 3.36 2.93 -0.429 87.23%
q39b 2.84 2.56 -0.278 90.18%
q40 4.41 6.20 1.798 140.80%
q41 0.59 0.65 0.058 109.94%
q42 0.94 0.97 0.027 102.89%
q43 4.35 4.42 0.073 101.67%
q44 13.72 9.95 -3.777 72.48%
q45 3.04 3.26 0.224 107.38%
q46 3.54 3.75 0.213 106.03%
q47 17.44 17.50 0.067 100.38%
q48 5.12 5.24 0.119 102.32%
q49 8.59 8.38 -0.212 97.53%
q50 21.98 21.30 -0.682 96.90%
q51 10.16 9.64 -0.516 94.92%
q52 1.06 1.05 -0.003 99.69%
q53 2.30 2.35 0.049 102.13%
q54 3.78 3.92 0.138 103.65%
q55 1.05 1.08 0.032 103.05%
q56 4.28 4.08 -0.198 95.36%
q57 10.67 10.79 0.118 101.10%
q58 2.27 2.37 0.093 104.08%
q59 10.68 10.62 -0.066 99.38%
q60 4.05 3.99 -0.063 98.45%
q61 4.09 4.19 0.099 102.41%
q62 4.20 4.11 -0.095 97.73%
q63 2.42 2.40 -0.017 99.28%
q64 59.10 66.74 7.644 112.93%
q65 19.45 18.21 -1.243 93.61%
q66 4.97 3.94 -1.028 79.33%
q67 409.37 382.09 -27.275 93.34%
q68 3.59 3.39 -0.193 94.62%
q69 5.38 4.96 -0.425 92.11%
q70 11.25 11.27 0.019 100.17%
q71 2.30 2.28 -0.021 99.10%
q72 217.00 214.05 -2.951 98.64%
q73 2.57 3.02 0.447 117.37%
q74 22.95 23.33 0.378 101.65%
q75 26.39 25.98 -0.406 98.46%
q76 11.08 11.24 0.166 101.49%
q77 2.29 2.22 -0.078 96.58%
q78 53.84 50.05 -3.785 92.97%
q79 3.79 3.78 -0.016 99.57%
q80 12.31 12.27 -0.047 99.62%
q81 4.89 4.88 -0.008 99.83%
q82 7.12 6.96 -0.155 97.82%
q83 1.60 1.69 0.099 106.20%
q84 2.81 2.80 -0.012 99.58%
q85 7.37 8.34 0.966 113.11%
q86 3.80 3.86 0.062 101.64%
q87 13.40 15.24 1.840 113.73%
q88 21.53 21.37 -0.160 99.26%
q89 3.52 3.54 0.021 100.59%
q90 3.03 3.14 0.115 103.80%
q91 2.08 2.51 0.434 120.91%
q92 1.37 1.36 -0.007 99.51%
q93 38.89 40.22 1.327 103.41%
q94 24.38 24.76 0.380 101.56%
q9 86.65 87.37 0.712 100.82%
q5 2.87 2.56 -0.312 89.16%
q96 17.50 17.11 -0.395 97.74%
q97 1.88 1.87 -0.015 99.22%
q98 10.29 9.89 -0.406 96.06%
q99 10.29 9.89 -0.406 96.06%
total 2162.96 2132.81 -30.148 98.61%

@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_master_07_31_2024_time.csv log/native_master_07_30_2024_3a5e5b1d25_time.csv difference percentage
q1 40.49 40.48 -0.018 99.96%
q2 30.38 30.82 0.433 101.43%
q3 53.36 56.32 2.963 105.55%
q4 41.93 41.60 -0.332 99.21%
q5 105.75 106.67 0.917 100.87%
q6 11.50 14.28 2.779 124.17%
q7 116.91 121.11 4.195 103.59%
q8 116.05 115.72 -0.330 99.72%
q9 168.38 171.12 2.748 101.63%
q10 65.81 66.00 0.188 100.28%
q11 26.76 28.05 1.287 104.81%
q12 29.20 30.46 1.258 104.31%
q13 51.79 52.36 0.569 101.10%
q14 25.72 25.55 -0.164 99.36%
q15 53.02 55.81 2.785 105.25%
q16 20.31 20.09 -0.226 98.89%
q17 131.74 134.70 2.960 102.25%
q18 198.71 199.98 1.266 100.64%
q19 27.35 25.41 -1.943 92.90%
q20 41.43 40.81 -0.613 98.52%
q21 379.32 381.21 1.883 100.50%
q22 15.56 18.70 3.143 120.21%
total 1751.51 1777.26 25.748 101.47%

Development

Successfully merging this pull request may close these issues.

[CH] Support compact mergetree file on s3
4 participants