Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Row based sort follow-up #6579

Merged
merged 13 commits into from
Jul 29, 2024
Merged

Conversation

marin-ma
Copy link
Contributor

@marin-ma marin-ma commented Jul 24, 2024

  1. Enable radix sort. TPCH sf1T 1342s -> 1300s
  2. Fix spill coredump. (using velox StlAllocator will cause coredump, replace the std::vector with raw buffer).
  3. Refine metrics. Add "shuffle wall time" to calculate the overall shuffle write time.
  4. Cut the memory overhead from sort buffer allocation in half.

Unresolved issues:

  1. Perf regression for useRadixSort=false. This is likely caused by using stdlib qsort to replace the std::sort, as the std::vector is replaced with raw buffer to avoid coredump.
  2. Less perf gain on TPCH sf3T because of spill.

Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

Run Gluten Clickhouse CI

5 similar comments
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@marin-ma marin-ma force-pushed the row-based-sort-followup branch from 0b5d400 to a9d3457 Compare July 29, 2024 03:52
Copy link

Run Gluten Clickhouse CI

namespace gluten {

template <typename Element>
class RadixSort {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the file a 1:1 porting from vanilla Spark's Java code? If so can we add some comments somewhere in code to note that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants