-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes quicksort and quickselect to use template based sorting networks #61
Conversation
6830beb
to
70424a6
Compare
We will want to use
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awesome!
src/xss-network-qsort.hpp
Outdated
int64_t num_to_write | ||
= std::min((int64_t)std::max(0, N - i * vtype::numlanes), | ||
(int64_t)vtype::numlanes); | ||
typename vtype::opmask_t load_mask = ((0x1ull << num_to_write) - 0x1ull) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use vtype::get_partial_loadmask
instead. See
x86-simd-sort/src/avx512-32bit-qsort.hpp
Line 259 in 0890de5
static opmask_t get_partial_loadmask(int size) |
You also won't need num_to_write
to be a int64_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might need to update that function. It won't work when num_to_write == numlanes
src/xss-network-qsort.hpp
Outdated
int64_t num_to_write | ||
= std::min((int64_t)std::max(0, N - i * vtype::numlanes), | ||
(int64_t)vtype::numlanes); | ||
typename vtype::opmask_t load_mask = ((0x1ull << num_to_write) - 0x1ull) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto. Use vtype::get_partial_loadmask
.
src/avx512-common-qsort.h
Outdated
@@ -738,4 +819,37 @@ inline void avx512_partial_qsort_fp16(uint16_t *arr, | |||
avx512_qselect_fp16(arr, k - 1, arrsize, hasnan); | |||
avx512_qsort_fp16(arr, k - 1); | |||
} | |||
|
|||
template <typename vtype, typename type_t> | |||
X86_SIMD_SORT_INLINE type_t get_pivot_64bit(type_t *arr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets move these template specializations to their respective files.
src/avx512-common-qsort.h
Outdated
const int64_t right); | ||
|
||
template <typename vtype, typename type_t> | ||
X86_SIMD_SORT_INLINE type_t get_pivot(type_t *arr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't need this once you move the specialized functions to their own files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor changes, otherwise looks good.
numpy/numpy#24498 was merged. Could you rebase with main and update? Also, use x86-simd-sort/src/avx512-common-qsort.h Line 98 in b9f9340
|
d5d5a90
to
4ff3e1d
Compare
…sks; also a few smaller changes
4ff3e1d
to
70733ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. One last fix and it will be good to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Thanks @sterrettm2 :)
This changes quicksort and quickselect to use generic template based sorting networks, instead of the current implementations. It also changed the 32-bit qsort and qselect to use a 256 element sorting network instead of a 128 element sorting network. Here is a performance comparison for these changes: