Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request to Expose partition_unrolled Function for Public Use #160

Open
game-difficulty opened this issue Aug 7, 2024 · 4 comments
Open

Comments

@game-difficulty
Copy link

Dear x86-simd-sort maintainers,

I am currently working on a high-performance sorting algorithm to handle billions of uint64_t data entries. To optimize the sorting process, I am leveraging parallel execution and the efficient sorting capabilities provided by the x86-simd-sort library.

In my implementation, I need to partition the data efficiently based on specific pivot values. The partition_unrolled function defined in src/xss-common-qsort.h appears to be highly optimized for this purpose. However, this function is not currently exposed for public use, making it challenging to directly utilize its capabilities.

Here is a simplified version of my parallel sorting implementation where efficient partitioning is critical:

extern "C" void parallel_sort(uint64_t * arr, size_t size, const std::array<uint64_t, 3>& pivots) {

    auto partition = [](uint64_t* arr, size_t low, size_t high, uint64_t pivot) {
        size_t i = low;
        size_t j = high;
        while (true) {
            while (i <= high && arr[i] <= pivot) i++;
            while (j >= low && arr[j] > pivot) j--;
            if (i >= j) break;
            std::swap(arr[i], arr[j]);
            i++;
            j--;
        }
        return i;
    };

    size_t mid1 = partition(arr, 0, size - 1, pivots[1]);
#pragma omp parallel sections
    {
#pragma omp section
        {size_t low_mid = partition(arr, 0, mid1 - 1, pivots[0]);}
#pragma omp section
        {size_t high_mid = partition(arr, mid1, size - 1, pivots[2]);}
    }
#pragma omp parallel sections
    {
#pragma omp section
        {x86simdsort::qsort(arr, low_mid, false);}
#pragma omp section
        {x86simdsort::qsort(arr + low_mid, mid1 - low_mid, false);}
#pragma omp section
        {x86simdsort::qsort(arr + mid1, high_mid - mid1, false);}
#pragma omp section
        {x86simdsort::qsort(arr + high_mid, size - high_mid, false);}
    }
}

Unfortunately, the custom partition function I wrote is quite inefficient, accounting for more than 60% of the total sorting time. This inefficiency diminishes the overall performance gains from parallel sorting.

To address this, I believe that making the partition_unrolled function available for public use would significantly enhance the library's utility and allow developers to achieve better performance in their sorting algorithms. Would it be possible to expose this function in the public API of the x86-simd-sort library?

@game-difficulty
Copy link
Author

I've currently figured out how to modify something to call it, but the code is getting rather ugly

@r-devulap
Copy link
Contributor

r-devulap commented Aug 9, 2024

@game-difficulty could you not just add __attribute__((visibility("default"))) to the partition function and build the shared library? That should expose partition function per your requirement. EDIT: fixed incorrect attribute :)

@game-difficulty
Copy link
Author

Since the partition function is a template function, it might not work to export it with attribute((visibility("default"))) because template functions are instantiated only when used.

@r-devulap
Copy link
Contributor

Have you tried building with this line removed:

gnu_symbol_visibility : 'inlineshidden',

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants