Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add key-value file! #2

Merged
merged 19 commits into from
Mar 13, 2023
Merged

add key-value file! #2

merged 19 commits into from
Mar 13, 2023

Conversation

ruclz
Copy link
Contributor

@ruclz ruclz commented Oct 24, 2022

Oceanbase sort needs key-value struct. But the avx512-64bits-qsort only supports key sort. We need key-value sort in this lib.

@ruclz ruclz marked this pull request as draft October 24, 2022 07:32
@xtangxtang
Copy link

xtangxtang commented Nov 17, 2022

The PR is to meet the requirement from Oceanbase.

Oceanbase is a Key/Value store Database. It uses std::sort to sort the object during creating index or other jobs. Not like just sort the key, Oceanbase passes inner ObStore class object as an array to do the sort job. And the std::sort callback comparing function to determine how to get the key from the ObStore to compare. So in Oceanbase, it swaps the ObStore object index in the array according to the sort function.

Since the avx512 sort in this project do a wondful work in bitonic sort, we want to reuse the core code. But the sort API just do the sort work, doesn't return the indexes of the key after sorted, it is hard for us to integrated the code into Oceanbase. So we create the PR to modify the sort API from void avx512_qsort(uint16_t *arr, int64_t arrsize) to void avx512_qsort(uint16_t *arr,uint64_t *indexes, int64_t arrsize'). The returned indexes is used to record the sorted key indexes and help API caller to swap the its objec array outside.

Currently we have created a PR in Oceanbase to merge this project code as a 3rd party lib, now is wating for this PR approval.

We did Create index performance test for Oceanbase 3.1 version:
creating index can speed up from 20% ~ 47% according to different index type. The performance workload is SSB. The table and creating index steps is as following SQL:

create table lineitem (
l_orderkey bigint not null,
l_partkey bigint not null,
l_suppkey bigint not null,
l_linenumber bigint not null,
l_quantity bigint not null,
l_extendedprice bigint not null,
l_discount bigint not null,
l_tax bigint not null,
l_returnflag char(1) default null,
l_linestatus char(1) default null,
l_shipdate date not null,
l_commitdate date default null,
l_receiptdate date default null,
l_shipinstruct char(25) default null,
l_shipmode char(10) default null,
l_comment varchar(44) default null,
primary key(l_orderkey, l_linenumber))
tablegroup = tpch_tg_100g_lineitem_order_group
partition by key (l_orderkey) partitions 192;

create index I_L_ORDERKEY on lineitem(l_orderkey) local;
create index I_L_SHIPINS on lineitem(l_shipinstruct) local;
create index I_L_SHIPDATE_COMMITDATE on lineitem(l_shipdate, l_commitdate) local;

drop index I_L_ORDERKEY on lineitem;
drop index I_L_SHIPINS on lineitem;
drop index I_L_SHIPDATE_COMMITDATE on lineitem;    

The performance is test on Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz and the results is (unit is second):

The performance result is as following:
image

@@ -0,0 +1,57 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this file.

Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Copy link
Contributor

@r-devulap r-devulap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understood, you are essentially adding argsort. The modified indices would sort the original array. Correct? Since this is a new feature, could you add new unit tests that cover this feature?

In general, the PR needs a lot of formatting changes. The diff is pretty hard to read. Could you fix the formatting and run clang-format on the files and re-submit? The formatting specifications are in a _clang-format file.

@r-devulap
Copy link
Contributor

On a second thought, argsort seems like a special case. You are sorting both key and value arrays.

@ruclz
Copy link
Contributor Author

ruclz commented Jan 29, 2023

As far as I understood, you are essentially adding argsort. The modified indices would sort the original array. Correct? Since this is a new feature, could you add new unit tests that cover this feature?

In general, the PR needs a lot of formatting changes. The diff is pretty hard to read. Could you fix the formatting and run clang-format on the files and re-submit? The formatting specifications are in a _clang-format file.

image

Hello, I run the clang_format execution in the project, but I got the error like the figure, can you have some suggests to solve it? Also, the unit tests can't run on my environment including the unit tests that you have completed, can you share your environment to me, like OS, compiler version and so on.

@ruclz
Copy link
Contributor Author

ruclz commented Jan 29, 2023

On a second thought, argsort seems like a special case. You are sorting both key and value arrays.

OK, we are sorting key, and the value column exchange with each other according to key.

@r-devulap
Copy link
Contributor

Hello, I run the clang_format execution in the project, but I got the error like the figure, can you have some suggests to solve it?

Hmm, not sure why you see that error. I am able to run clang-format on your patch just fine.

Also, the unit tests can't run on my environment including the unit tests that you have completed, can you share your environment to me, like OS, compiler version and so on.

Do you see an error? I am using ubuntu latest, g++-10 and running on a SKX. If your processor doesnt support AVX-512, you could run on an SDE too with this command: sde-external-9.14.0-2022-10-25-lin/sde64 -icx -- ./testexe

@ruclz
Copy link
Contributor Author

ruclz commented Feb 13, 2023

OK, I added a unittest for key-value sort, and I ran clang_format on all files I modified. Please help to review. Thanks.

@r-devulap r-devulap mentioned this pull request Feb 17, 2023
@r-devulap
Copy link
Contributor

Hi @ruclz This PR modifies the existing avx512_qsort<T>() function signature by adding another argument to the function and also adds unnecessary code if someone wants to use only data sorting and not key-value sort. I think a better option would be to move your code to new header files avx512_64bit_keyvaluesort.hpp and avx512-common-keyvaluesort.h . I understand there might be some code duplication but we can work on minimizing that in a separate PR.

@r-devulap r-devulap marked this pull request as ready for review February 21, 2023 22:44
Copy link
Contributor

@r-devulap r-devulap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move code to separate files and add new API's avx512_keyvaluesort<T>().

@ruclz
Copy link
Contributor Author

ruclz commented Feb 23, 2023

Hi, @r-devulap, I have moved my code to new header files avx512-64bit-keyvaluesort.hpp and avx512-common-keyvaluesort.h. The unit test can run correctly, please help to review, thanks.

@ruclz
Copy link
Contributor Author

ruclz commented Feb 23, 2023

Sorry, I forgot to merge the test_all.cpp in the last commit.

Copy link
Contributor

@r-devulap r-devulap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still working on reviewing the code, meanwhile lets clean up all the duplicated code.

src/avx512-64bit-keyvaluesort.hpp Outdated Show resolved Hide resolved
src/avx512-64bit-keyvaluesort.hpp Show resolved Hide resolved
qsort_64bit_<vtype>(keys, indexes, pivot_index, right, max_iters - 1);
}

X86_SIMD_SORT_INLINE int64_t replace_nan_with_inf_kv(double *arr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same function as in avx512-64bit-qsort.hpp. You could move this to the common header file as well and no need to re-write it.

src/avx512-64bit-keyvaluesort.hpp Outdated Show resolved Hide resolved
#else
#define X86_SIMD_SORT_INLINE static
#define X86_SIMD_SORT_FINLINE static
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lines 36-82 is duplicated. Move them to new header file and include that here and avx512-common-qsort.h

@@ -695,7 +695,7 @@ void avx512_qsort(int16_t *arr, int64_t arrsize)
}

template <>
void avx512_qsort(uint16_t *arr, int64_t arrsize)
inline void avx512_qsort(uint16_t *arr, int64_t arrsize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same. get rid of inline please.

src/avx512-32bit-qsort.hpp Outdated Show resolved Hide resolved
@@ -691,7 +691,7 @@ void avx512_qsort<int32_t>(int32_t *arr, int64_t arrsize)
}

template <>
void avx512_qsort<uint32_t>(uint32_t *arr, int64_t arrsize)
inline void avx512_qsort<uint32_t>(uint32_t *arr, int64_t arrsize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

src/avx512-32bit-qsort.hpp Outdated Show resolved Hide resolved

template <>
inline void
avx512_qsort_kv<int64_t>(int64_t *keys, uint64_t *indexes, int64_t arrsize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are indices always expected to be a uint64_t ?

@r-devulap
Copy link
Contributor

@ruclz Could you add benchmarking to know how fast this is relative to using std::sort? From issue #8:

avx512_qsort_kv(double *keys, uint64_t *indexes, int64_t arrsize) is a little worse than my simple bubble sorting.

@ruclz
Copy link
Contributor Author

ruclz commented Feb 28, 2023

@ruclz Could you add benchmarking to know how fast this is relative to using std::sort? From issue #8:

avx512_qsort_kv(double *keys, uint64_t *indexes, int64_t arrsize) is a little worse than my simple bubble sorting.

OK, I am coding for benchmarking.

@ruclz
Copy link
Contributor Author

ruclz commented Mar 1, 2023

@ruclz Could you add benchmarking to know how fast this is relative to using std::sort? From issue #8:

avx512_qsort_kv(double *keys, uint64_t *indexes, int64_t arrsize) is a little worse than my simple bubble sorting.

image

The figure shows the performance of key-value bitonic sort and std::sort. There is no performance drop compared with std::sort.


for (uint64_t ii = 0; ii < iters; ++ii) {
start = cycles_start();
std::sort(sortedaar_bckup.begin(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if std::sort is a fair comparison. The way the key-values are arranged in memory for std::vector<sorted_t<K, V>> is very different when compared two distinct vectors std::vector<K>, std::vector<V>. I would think this can alter performance significantly because you can no longer load continuous chunk of memory into ZMM registers. Does Oceanbase have two separate vectors for key and values (I assume they do cos that's the way you implemented the sort). How does Oceanbase currently sort them?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ObStoreRow is the base unit to do the sort in Oceanbase. The key and value are all ObObject class type in the ObStoreRow. There is no sperate vector for key and value. The ObStoreRow is passed to the std::sort in Oceanbase. std::sort first get the key type from ObObject and then do the sort work.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To enable the AVX512 sort, we need extract the key from the ObStoreRow and then pass the key and ObStoreRow as the value to avx512_sort(key, value) like API. That's way we need the key-value interface

benchmarks/bench-tgl.out Outdated Show resolved Hide resolved
src/avx512-64bit-keyvaluesort.hpp Outdated Show resolved Hide resolved
…12-64bit-common.h, modify the bench-tgl.out to origin file.
@ruclz
Copy link
Contributor Author

ruclz commented Mar 9, 2023

We have moved the get_povit_64bit function and sort_zmm_64bit function to avx512-64bit-common.h, and modified the bench-tgl.out to origin file. Please help to review again, thanks.

@r-devulap
Copy link
Contributor

@ruclz please fix the author name and get rid of the unnecessary struct and we should be good to go.

@ruclz
Copy link
Contributor Author

ruclz commented Mar 10, 2023

Yeah, we have fixed the authors name in the last commit.

@r-devulap
Copy link
Contributor

LGTM, Thanks @ruclz!

@r-devulap r-devulap merged commit 28c0b3e into intel:main Mar 13, 2023
@r-devulap
Copy link
Contributor

Did a little bit of code cleanup, see #15

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants