-
Notifications
You must be signed in to change notification settings - Fork 447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-3922][CH] Fix incorrect shuffle hash id value when executing modulo #3923
Conversation
Run Gluten Clickhouse CI |
auto res = hash_int32 % parts_num_int32; | ||
if (res < 0) | ||
{ | ||
res += parts_num_int32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using pmod and cast function in CH. Do not repeat already implemented logcis.
Run Gluten Clickhouse CI |
perf tests: |
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
2 similar comments
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
perf tests: |
…modulo Fix incorrect shuffle hash id value when executing modulo. In CH Backend, the data type of the shuffle split num is a UInt32 and the returned type of the hash function is a UInt64, when the returned value of the hash function is more than 2^31 - 1, the modulo value of the hash value and the shuffle split num is different from the one of the vanilla spark. Close apache#3922.
Run Gluten Clickhouse CI |
perf tests: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…modulo (apache#3923) Fix incorrect shuffle hash id value when executing modulo. In CH Backend, the data type of the shuffle split num is a UInt32 and the returned type of the hash function is a UInt64, when the returned value of the hash function is more than 2^31 - 1, the modulo value of the hash value and the shuffle split num is different from the one of the vanilla spark. Close apache#3922.
…modulo (apache#3923) Fix incorrect shuffle hash id value when executing modulo. In CH Backend, the data type of the shuffle split num is a UInt32 and the returned type of the hash function is a UInt64, when the returned value of the hash function is more than 2^31 - 1, the modulo value of the hash value and the shuffle split num is different from the one of the vanilla spark. Close apache#3922.
…modulo (apache#3923) Fix incorrect shuffle hash id value when executing modulo. In CH Backend, the data type of the shuffle split num is a UInt32 and the returned type of the hash function is a UInt64, when the returned value of the hash function is more than 2^31 - 1, the modulo value of the hash value and the shuffle split num is different from the one of the vanilla spark. Close apache#3922.
…modulo (apache#3923) Fix incorrect shuffle hash id value when executing modulo. In CH Backend, the data type of the shuffle split num is a UInt32 and the returned type of the hash function is a UInt64, when the returned value of the hash function is more than 2^31 - 1, the modulo value of the hash value and the shuffle split num is different from the one of the vanilla spark. Close apache#3922.
…modulo (apache#3923) Fix incorrect shuffle hash id value when executing modulo. In CH Backend, the data type of the shuffle split num is a UInt32 and the returned type of the hash function is a UInt64, when the returned value of the hash function is more than 2^31 - 1, the modulo value of the hash value and the shuffle split num is different from the one of the vanilla spark. Close apache#3922.
…modulo (apache#3923) Fix incorrect shuffle hash id value when executing modulo. In CH Backend, the data type of the shuffle split num is a UInt32 and the returned type of the hash function is a UInt64, when the returned value of the hash function is more than 2^31 - 1, the modulo value of the hash value and the shuffle split num is different from the one of the vanilla spark. Close apache#3922.
…modulo (apache#3923) Fix incorrect shuffle hash id value when executing modulo. In CH Backend, the data type of the shuffle split num is a UInt32 and the returned type of the hash function is a UInt64, when the returned value of the hash function is more than 2^31 - 1, the modulo value of the hash value and the shuffle split num is different from the one of the vanilla spark. Close apache#3922.
…modulo (apache#3923) Fix incorrect shuffle hash id value when executing modulo. In CH Backend, the data type of the shuffle split num is a UInt32 and the returned type of the hash function is a UInt64, when the returned value of the hash function is more than 2^31 - 1, the modulo value of the hash value and the shuffle split num is different from the one of the vanilla spark. Close apache#3922.
…modulo (apache#3923) Fix incorrect shuffle hash id value when executing modulo. In CH Backend, the data type of the shuffle split num is a UInt32 and the returned type of the hash function is a UInt64, when the returned value of the hash function is more than 2^31 - 1, the modulo value of the hash value and the shuffle split num is different from the one of the vanilla spark. Close apache#3922.
…modulo (apache#3923) Fix incorrect shuffle hash id value when executing modulo. In CH Backend, the data type of the shuffle split num is a UInt32 and the returned type of the hash function is a UInt64, when the returned value of the hash function is more than 2^31 - 1, the modulo value of the hash value and the shuffle split num is different from the one of the vanilla spark. Close apache#3922.
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
…modulo (apache#3923) Fix incorrect shuffle hash id value when executing modulo. In CH Backend, the data type of the shuffle split num is a UInt32 and the returned type of the hash function is a UInt64, when the returned value of the hash function is more than 2^31 - 1, the modulo value of the hash value and the shuffle split num is different from the one of the vanilla spark. Close apache#3922.
What changes were proposed in this pull request?
Fix incorrect shuffle hash id value when executing modulo. In CH Backend, the data type of the shuffle split num is a UInt32 and the returned type of the hash function is a UInt64, when the returned value of the hash function is more than 2^31 - 1, the modulo value of the hash value and the shuffle split num is different from the one of the vanilla spark.
Close #3922.
(Fixes: #3922)
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)