[VL] parquet file metadata columns support in velox #3870

Merged: 13 commits, Mar 14, 2024

gaoyangxiaozhu
Contributor

@gaoyangxiaozhu gaoyangxiaozhu commented Nov 28, 2023

What changes were proposed in this pull request?

Support native access to file metadata columns in the Velox backend. This is a requirement for native Delta support, as requested by the Microsoft Delta team.

Fixes: #2618

How was this patch tested?

Via unit tests and manual testing.

The dependency PR in Velox, facebookincubator/velox#8800, has been merged.

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

Run Gluten Clickhouse CI

@FelixYBW FelixYBW changed the title file metadata columns support in velox [VL] file metadata columns support in velox Nov 28, 2023
@gaoyangxiaozhu gaoyangxiaozhu force-pushed the gayangya/metadatacolumns branch from d19534a to 43fc8bc Compare December 5, 2023 07:54
@gaoyangxiaozhu gaoyangxiaozhu force-pushed the gayangya/metadatacolumns branch from 5753e50 to 79ba0da Compare December 5, 2023 13:01
@gaoyangxiaozhu gaoyangxiaozhu force-pushed the gayangya/metadatacolumns branch from 273bee9 to 913d780 Compare December 6, 2023 08:40

@gaoyangxiaozhu
Contributor Author

@zhouyuan and @FelixYBW, could you help check why centos7-test is failing and share any input? Meanwhile, could you do a draft review to see whether the current implementation of native metadata column support looks good to you? If it does, I will sync with the Velox maintainers at Meta to get the Velox-side PR reviewed.

@yma11
Contributor

yma11 commented Dec 19, 2023

@gaoyangxiaozhu Thanks for providing this support. The implementation seems okay. Please do a rebase and go ahead with corresponding support in Velox first.

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale label Feb 13, 2024

This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

@github-actions github-actions bot closed this Feb 24, 2024

@gaoyangxiaozhu
Contributor Author

There are various build issues. Could you help re-trigger the jobs, @yma11 / @zhouyuan / @zhli1142015?

@gaoyangxiaozhu
Contributor Author

gaoyangxiaozhu commented Mar 13, 2024

Can we re-trigger the failed jobs again? They all failed due to the issue below. @zhouyuan / @yma11 / @zhli1142015

image

@gaoyangxiaozhu
Contributor Author

Can we merge the PR? @zhli1142015 / @yma11 / @zhouyuan

@zhouyuan
Contributor

@gaoyangxiaozhu This looks to be in good shape to me. I will try it with our internal Delta Lake Jenkins job.
CC @zzcclp, as this will also change the API for the CK backend.

@yma11
Contributor

yma11 commented Mar 14, 2024

@gaoyangxiaozhu It seems there are some conflicts, so you need to rebase. Thanks.

@gaoyangxiaozhu
Contributor Author

@gaoyangxiaozhu It seems there are some conflicts, so you need to rebase. Thanks.

Done.

Contributor

@zhouyuan zhouyuan left a comment

👍

@zhli1142015 zhli1142015 merged commit 1fbd9e6 into apache:main Mar 14, 2024
17 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_3870_time.csv log/native_master_03_13_2024_d7ed0844e_time.csv difference percentage
q1 35.41 38.81 3.399 109.60%
q2 25.71 24.06 -1.644 93.60%
q3 37.01 38.18 1.169 103.16%
q4 40.14 38.49 -1.657 95.87%
q5 68.03 69.71 1.671 102.46%
q6 7.43 7.45 0.026 100.35%
q7 83.15 82.46 -0.695 99.16%
q8 85.00 83.21 -1.785 97.90%
q9 122.38 121.83 -0.549 99.55%
q10 44.07 44.36 0.294 100.67%
q11 19.80 20.90 1.101 105.56%
q12 26.65 28.06 1.413 105.30%
q13 48.44 46.88 -1.561 96.78%
q14 22.24 21.98 -0.259 98.83%
q15 32.04 33.08 1.046 103.26%
q16 14.72 13.84 -0.883 94.00%
q17 100.17 101.80 1.629 101.63%
q18 142.74 141.05 -1.683 98.82%
q19 13.54 15.07 1.529 111.30%
q20 29.49 27.03 -2.464 91.64%
q21 227.01 229.52 2.510 101.11%
q22 15.28 13.86 -1.422 90.69%
total 1240.45 1241.63 1.182 100.10%
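
For anyone reading the report: the last two columns appear to be derived as (master time minus PR time) and (master time divided by PR time), so a percentage above 100% would mean the PR run was faster. A minimal Python sketch under that assumption, with a hypothetical helper named compare and only two rows copied from the table above:

```python
# Two rows copied from the report: (query, PR time, master time), in the
# same units as the report. The derivation below is an assumption inferred
# from the numbers, not taken from the benchmark tooling.
rows = [
    ("q1", 35.41, 38.81),
    ("total", 1240.45, 1241.63),
]


def compare(pr_time, master_time):
    """Return (difference, percentage) as the report seems to define them."""
    difference = master_time - pr_time          # positive: PR run was faster
    percentage = master_time / pr_time * 100.0  # above 100: PR run was faster
    return difference, percentage


for name, pr_time, master_time in rows:
    difference, percentage = compare(pr_time, master_time)
    print(f"{name}: difference={difference:.2f} percentage={percentage:.2f}%")
```

The reported differences carry three decimals because they were computed from unrounded timings, so recomputing from the two-decimal values only matches to about two decimals. With a total of about 100.10%, the change looks performance-neutral on this run.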

taiyang-li pushed a commit to bigo-sg/gluten that referenced this pull request Mar 25, 2024
[VL]  parquet file metadata columns support in velox.

Co-authored-by: Zhen Li <[email protected]>
taiyang-li pushed a commit to bigo-sg/gluten that referenced this pull request Oct 8, 2024
[VL]  parquet file metadata columns support in velox.

Co-authored-by: Zhen Li <[email protected]>
taiyang-li pushed a commit to bigo-sg/gluten that referenced this pull request Oct 9, 2024
[VL]  parquet file metadata columns support in velox.

Co-authored-by: Zhen Li <[email protected]>
Development

Successfully merging this pull request may close these issues.

[VL][Spark 3.3+] support return metadataColumns from native scan insteads of fallback
5 participants