Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Fix shuffle with round robin partitioning fail #5928

Merged
merged 1 commit into from
May 31, 2024

Conversation

ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented May 30, 2024

What changes were proposed in this pull request?

We should validate the project before sort rather than using ProjectExecTransformer directly. The hash expression may not support offload to native, e.g., the child output contains null type or something else.

The following query would fail:

spark.sql("select 1 as c, null as x").repartition().collect

How was this patch tested?

add test

Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@ulysses-you
Copy link
Contributor Author

cc @zhztheplayer @marin-ma thank you

@marin-ma
Copy link
Contributor

Thanks. Seems like this failure is because we creates a hash computation for all input columns. If there are any NullType in the input, those column types will be converted to UNKNOWN Type in Velox, but Velox doesn't support hash computing on UNKNOWN types. I would suggest we keep this check, meanwhile drop the NullType input columns in the hash computation as the null values doesn't affect the hash computation (hash(x) and hash(x, null) produce the same result). Then we can avoid fallback for NullTypes.

@ulysses-you
Copy link
Contributor Author

@marin-ma Is null type the only unsupported data type for hash expression ? If so, I can drop null column before going to hash expression.

@ulysses-you
Copy link
Contributor Author

@marin-ma never mind.. I will drop null type column in this pr

Copy link
Contributor

@marin-ma marin-ma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix!

@marin-ma marin-ma merged commit 4c3fbb8 into apache:main May 31, 2024
38 checks passed
@ulysses-you ulysses-you deleted the shuffle branch May 31, 2024 01:28
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_5928_time.csv log/native_master_05_30_2024_73eb21db45_time.csv difference percentage
q1 35.83 34.62 -1.208 96.63%
q2 23.77 23.80 0.032 100.13%
q3 36.59 37.04 0.442 101.21%
q4 32.06 32.22 0.157 100.49%
q5 71.18 70.41 -0.772 98.92%
q6 7.64 7.25 -0.389 94.90%
q7 81.41 81.31 -0.097 99.88%
q8 84.44 85.87 1.429 101.69%
q9 121.28 118.14 -3.147 97.41%
q10 46.62 45.74 -0.887 98.10%
q11 19.88 21.64 1.766 108.89%
q12 26.33 22.83 -3.505 86.69%
q13 52.00 52.00 -0.007 99.99%
q14 19.29 18.87 -0.429 97.78%
q15 30.87 32.69 1.815 105.88%
q16 14.89 13.77 -1.120 92.48%
q17 104.69 101.66 -3.033 97.10%
q18 144.80 143.98 -0.827 99.43%
q19 15.75 14.76 -0.994 93.69%
q20 26.41 28.58 2.177 108.24%
q21 259.28 265.55 6.271 102.42%
q22 11.90 14.49 2.588 121.74%
total 1266.93 1267.19 0.264 100.02%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants