[GLUTEN-6026][VL] Add Support for HiveFileFormat parquet write for Spark 3.4+ #6062

Merged: 16 commits merged into apache:main on Jun 14, 2024

@surnaik (Contributor) commented Jun 12, 2024

What changes were proposed in this pull request?

Adds support for HiveFileFormat writes when the output file format is Parquet.

(Fixes: #6026)

How was this patch tested?

Updated the existing unit tests to check whether the write is offloaded to Velox.
Also verified manually in Spark.

@surnaik (Contributor Author) commented Jun 12, 2024

Please take a look, @ulysses-you. Thanks!

@ulysses-you (Contributor) commented:

Thank you @surnaik, it seems the tests did not pass. Can you take a look?

Run Gluten Clickhouse CI

@surnaik (Contributor Author) commented Jun 13, 2024

> thank you @surnaik, it seems tests did not pass, can you take a look?

This is strange: the tests pass locally. The CI logs show that nativeUsed was false; investigating further.

@surnaik (Contributor Author) commented Jun 13, 2024

> thank you @surnaik, it seems tests did not pass, can you take a look?

Found the issue and pushed a fix; waiting for CI to pass.

@ulysses-you (Contributor) previously approved these changes Jun 14, 2024 and left a comment:

LGTM if tests pass, thank you.

@surnaik (Contributor Author) commented Jun 14, 2024

@ulysses-you, could you please take another look? All the tests are now passing. Thanks for your time!

@surnaik (Contributor Author) commented Jun 14, 2024

@PHILO-HE, could you please take a look as well? Thanks!

@PHILO-HE (Contributor) left a comment:

Looks nice! Thanks!

@PHILO-HE changed the title from "[GLUTEN-6026][VL]Add Support for HiveFileFormat Write for Spark 3.4+" to "[GLUTEN-6026][VL] Add Support for HiveFileFormat parquet write for Spark 3.4+" on Jun 14, 2024.
@ulysses-you merged commit 284b304 into apache:main on Jun 14, 2024.
39 checks passed
@GlutenPerfBot (Contributor) commented:

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_6062_time.csv | log/native_master_06_13_2024_142cf0fbc_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 34.30 | 35.57 | 1.270 | 103.70% |
| q2 | 26.52 | 23.78 | -2.740 | 89.67% |
| q3 | 36.96 | 39.61 | 2.650 | 107.17% |
| q4 | 34.43 | 32.76 | -1.670 | 95.15% |
| q5 | 70.82 | 67.54 | -3.279 | 95.37% |
| q6 | 8.13 | 9.42 | 1.285 | 115.80% |
| q7 | 80.49 | 80.69 | 0.205 | 100.25% |
| q8 | 87.27 | 84.77 | -2.498 | 97.14% |
| q9 | 121.26 | 120.70 | -0.558 | 99.54% |
| q10 | 44.03 | 45.25 | 1.219 | 102.77% |
| q11 | 22.21 | 19.79 | -2.420 | 89.10% |
| q12 | 25.08 | 27.92 | 2.835 | 111.30% |
| q13 | 38.68 | 39.43 | 0.759 | 101.96% |
| q14 | 17.88 | 18.98 | 1.101 | 106.16% |
| q15 | 31.43 | 33.04 | 1.613 | 105.13% |
| q16 | 14.30 | 13.85 | -0.454 | 96.83% |
| q17 | 103.23 | 101.57 | -1.659 | 98.39% |
| q18 | 145.29 | 144.97 | -0.325 | 99.78% |
| q19 | 13.93 | 13.79 | -0.138 | 99.01% |
| q20 | 27.94 | 26.68 | -1.262 | 95.48% |
| q21 | 260.91 | 260.83 | -0.086 | 99.97% |
| q22 | 11.99 | 13.91 | 1.923 | 116.04% |
| total | 1257.09 | 1254.85 | -2.232 | 99.82% |

Successfully merging this pull request may close this issue: [VL] Support HiveFileFormat writes using Velox backend
4 participants