[GLUTEN-5901][CH] Support CH backend parquet + delta #5902

Merged
merged 1 commit into apache:main from ch-delta-parquet
May 31, 2024

Conversation

zzcclp
Contributor

@zzcclp zzcclp commented May 29, 2024

What changes were proposed in this pull request?

Support reading and writing Parquet-format Delta tables with the CH backend:

  1. read Parquet data from the Delta catalog natively;
  2. write Parquet data to the Delta catalog through fallback (the DeltaInvariantCheckerExec operator and DeltaTaskStatisticsTracker are not supported);
  3. use ClickHouseSparkCatalog as the uniform catalog.
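As a hedged illustration of item 3, the sketch below collects Spark settings that register ClickHouseSparkCatalog as Spark's session catalog (`spark_catalog`) alongside Delta Lake's SQL extension. The fully qualified class name and the choice of keys are assumptions drawn from Gluten's ClickHouse backend documentation, not from this PR, and `delta_on_ch_conf` is a hypothetical helper; check both against the Gluten version in use.

```python
# Assumed fully qualified name of the catalog class (not stated in this PR).
CH_CATALOG = (
    "org.apache.spark.sql.execution.datasources.v2.clickhouse.ClickHouseSparkCatalog"
)
# Delta Lake's standard SQL session extension.
DELTA_EXTENSION = "io.delta.sql.DeltaSparkSessionExtension"


def delta_on_ch_conf(extra=None):
    """Hypothetical helper: Spark settings that make ClickHouseSparkCatalog the
    session catalog for Delta tables. `extra` entries are merged last and win."""
    conf = {
        # Standard Spark key for replacing the built-in session catalog.
        "spark.sql.catalog.spark_catalog": CH_CATALOG,
        # Enables Delta Lake SQL support in the session.
        "spark.sql.extensions": DELTA_EXTENSION,
    }
    if extra:
        conf.update(extra)
    return conf


if __name__ == "__main__":
    # Print the settings as spark-submit style --conf flags.
    for key, value in delta_on_ch_conf().items():
        print(f"--conf {key}={value}")
```

The `extra` parameter leaves room for backend- or cluster-specific settings without hard-coding keys this PR does not mention.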

Close #5901.

(Fixes: #5901)

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)



#5901


Run Gluten Clickhouse CI

@zzcclp zzcclp force-pushed the ch-delta-parquet branch from 1a60d89 to d2f8de7 Compare May 29, 2024 12:10

Run Gluten Clickhouse CI

@zzcclp zzcclp force-pushed the ch-delta-parquet branch from d2f8de7 to 1d75ab0 Compare May 30, 2024 10:43

Run Gluten Clickhouse CI

@zzcclp zzcclp force-pushed the ch-delta-parquet branch from 1d75ab0 to bb3fa01 Compare May 31, 2024 02:14

Run Gluten Clickhouse CI

@zzcclp zzcclp force-pushed the ch-delta-parquet branch from bb3fa01 to 1c0b490 Compare May 31, 2024 06:12

Run Gluten Clickhouse CI

@zzcclp zzcclp force-pushed the ch-delta-parquet branch from 1c0b490 to d2968cc Compare May 31, 2024 06:15

Run Gluten Clickhouse CI

Contributor

@baibaichen baibaichen left a comment


LGTM

@zzcclp zzcclp merged commit 2c89fb1 into apache:main May 31, 2024
40 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

PR #5902 = log/native_5902_time.csv; master = log/native_master_05_30_2024_73eb21db45_time.csv
difference = master - PR #5902; percentage = master / PR #5902

query   PR #5902    master  difference  percentage
q1         34.11     34.62       0.510     101.50%
q2         23.82     23.80      -0.019      99.92%
q3         39.04     37.04      -1.999      94.88%
q4         30.18     32.22       2.042     106.77%
q5         69.31     70.41       1.100     101.59%
q6          7.63      7.25      -0.379      95.03%
q7         79.86     81.31       1.448     101.81%
q8         84.17     85.87       1.704     102.02%
q9        120.42    118.14      -2.283      98.10%
q10        47.06     45.74      -1.324      97.19%
q11        20.43     21.64       1.208     105.91%
q12        30.01     22.83      -7.188      76.05%
q13        36.57     52.00      15.429     142.20%
q14        22.86     18.87      -3.991      82.54%
q15        29.24     32.69       3.445     111.78%
q16        14.33     13.77      -0.560      96.09%
q17        99.84    101.66       1.817     101.82%
q18       146.83    143.98      -2.853      98.06%
q19        13.54     14.76       1.214     108.97%
q20        29.35     28.58      -0.769      97.38%
q21       257.12    265.55       8.431     103.28%
q22        12.22     14.49       2.276     118.63%
total    1247.93   1267.19      19.260     101.54%
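To make the derived columns explicit, this sketch recomputes them from the two timing columns. It assumes that difference = master - PR and percentage = master / PR x 100, a reading that matches every listed row up to the rounding of the two-decimal timings. The total row is rebuilt from the column sums. The names are hypothetical and are not part of the GlutenPerfBot tooling.

```python
"""Re-derive the difference/percentage columns of the TPC-H SF2000 report."""

# (query, PR #5902 time, master time) as listed in the report. The report does
# not state units (presumably seconds).
TIMINGS = [
    ("q1", 34.11, 34.62), ("q2", 23.82, 23.80), ("q3", 39.04, 37.04),
    ("q4", 30.18, 32.22), ("q5", 69.31, 70.41), ("q6", 7.63, 7.25),
    ("q7", 79.86, 81.31), ("q8", 84.17, 85.87), ("q9", 120.42, 118.14),
    ("q10", 47.06, 45.74), ("q11", 20.43, 21.64), ("q12", 30.01, 22.83),
    ("q13", 36.57, 52.00), ("q14", 22.86, 18.87), ("q15", 29.24, 32.69),
    ("q16", 14.33, 13.77), ("q17", 99.84, 101.66), ("q18", 146.83, 143.98),
    ("q19", 13.54, 14.76), ("q20", 29.35, 28.58), ("q21", 257.12, 265.55),
    ("q22", 12.22, 14.49),
]


def compare(pr: float, master: float) -> tuple[float, float]:
    """Return (master - pr, master / pr * 100), the report's two derived columns."""
    return master - pr, master / pr * 100.0


def summarize(rows):
    """Compare every query, plus a 'total' row built from the column sums."""
    results = {query: compare(pr, master) for query, pr, master in rows}
    total_pr = sum(pr for _, pr, _ in rows)
    total_master = sum(master for _, _, master in rows)
    results["total"] = compare(total_pr, total_master)
    return results


if __name__ == "__main__":
    for query, (diff, pct) in summarize(TIMINGS).items():
        print(f"{query:<6}{diff:>9.3f}{pct:>9.2f}%")
```

Under this reading, a percentage above 100% means the PR run took less time than the master baseline. The bot marks these numbers "for reference only", and this run used the Velox backend, not the CH backend changed here. Large single-query swings such as q12 and q13 may therefore reflect run-to-run variation rather than this change.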

loneylee pushed a commit to loneylee/kylin that referenced this pull request Jul 31, 2024
1. Disable gluten in analyze table
2. Gluten support delta 2.3.0
- apache/incubator-gluten#5902
- apache/incubator-gluten#5945
3. Support sum0
4. Fix build
- apache/incubator-gluten#5796
- apache/incubator-gluten#5767
5. Fix SlowQueryDetectorTest.testSparderTimeoutCancelJob due to a dirty context
6. Clean up thread-local contexts
7. Fix GMT+8 not being supported
8. Fix case class test coverage
9. Add 1 gluten-disabled case in analyze table
10. Remove unsupported pushdown filters:
    10.1. When we create KylinFileSourceScanExec, we didn't remove the subquery filter.
    10.2. KylinFileSourceScanExec doesn't inherit from FileSourceScanExec, so we miss the chance to correct the pushed-down filter.
11. native support floor_datetime and ceil_datetime
12. native support kap_add_months and kap_months_between
13. native support _ymdint_between
14. native support truncate
15. native support kylin_split_part
16. native support kylin instr
pfzhan pushed a commit to Kyligence/kylin that referenced this pull request Jul 31, 2024
pfzhan pushed a commit to pfzhan/kylin that referenced this pull request Jul 31, 2024
Development

Successfully merging this pull request may close these issues.

[CH] Support CH backend parquet + delta
3 participants