
Support hyper log log plus plus (HLL++) #17133

Draft · wants to merge 11 commits into base: branch-25.02
Conversation

@res-life (Contributor) commented Oct 21, 2024

closes #10652

  • Group HLL
  • Test: Group HLL
  • Compact plain sketch to Spark compatible
  • Test: Compact plain sketch to Spark compatible
  • Merge HLL
  • Test Merge HLL
  • Reduction

HLL++ description

First, compute a 64-bit hash code and generate an integer pair: register index -> register value. The register index is in [0, 512) when the precision is 9.
Second, merge within the same group: for each register index, take the maximum register value.
So HLL++ is effectively a max aggregation performed 512 times over 512 integer columns.

e.g., below are 2 sketches in a group:

| register index 0 | register index 1 | ... | register index 511 |
| --- | --- | --- | --- |
| 1 | 22 | ... | 44 |
| 11 | 2 | ... | 4 |

The aggregation result is:

| register index 0 | register index 1 | ... | register index 511 |
| --- | --- | --- | --- |
| 11 | 22 | ... | 44 |

reduce_by_key memory issue

thrust::reduce_by_key allocates n intermediate sketch values (where n is num_rows), and these values are large: each sketch is typically 512 integers (512 * 4 = 2 KB). With 1G rows, reduce_by_key would use 2 KB * 1G = 2 TB of intermediate memory.

link

    // scan the values by flag
    thrust::detail::temporary_array<ValueType,ExecutionPolicy> scanned_values(exec, n);

thrust::reduce_by_key is therefore not suitable for merging HLL sketches.
This PR uses a new approach for the aggregation.

aggregation steps in this PR

Step 1: partial merge.
Split the rows into small segments of 256 items each; one thread handles one segment.
Because the group labels are sorted, the labels within each segment are also sorted.
While scanning the items in a segment, the thread keeps a cache recording the running max values; whenever it meets a new group, it writes out the result for the previous group.
Only N/256 caches are needed, which is relatively little memory.

Step 2:
Merge the caches and the per-segment results to get the final result.

For more details, please refer to the comments in the code.

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.


copy-pr-bot bot commented Oct 21, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue Java Affects Java cuDF API. labels Oct 21, 2024
@res-life res-life requested a review from ttnghia October 21, 2024 12:42
@vyasr (Contributor) commented Oct 22, 2024

Please link to #10652 as appropriate.

MERGE_TDIGEST, ///< create a tdigest by merging multiple tdigests together
HISTOGRAM, ///< compute frequency of each element
MERGE_HISTOGRAM, ///< merge partial values of HISTOGRAM aggregation
HLLPP, ///< approximating the number of distinct items by using hyper log log plus plus (HLLPP)
Comment (Contributor):

Suggested change
HLLPP, ///< approximating the number of distinct items by using hyper log log plus plus (HLLPP)
HLLPP, ///< approximating the number of distinct items using HyperLogLogPlusPlus

Comment (Contributor Author):

done.

using hash_value_type = uint64_t;

template <typename Key>
struct XXHash_64 {
Comment (Contributor):

This seems to be moved from the source file, which was added in #13612. However, I'm confused with the other XXHash_64 added in NVIDIA/spark-rapids-jni#1248. What is the difference between them? Is this XXHash_64 enough for addressing Spark's behavior in HLLPP?

Comment on lines 816 to 817
int const precision =
dynamic_cast<cudf::detail::merge_hyper_log_log_aggregation const&>(agg).precision;

Comment (Contributor):

Suggested change
int const precision =
dynamic_cast<cudf::detail::merge_hyper_log_log_aggregation const&>(agg).precision;
int const precision =
dynamic_cast<cudf::detail::merge_hyper_log_log_aggregation const&>(agg).precision;

Comment (Contributor):

Sorry, my eyes aren't able to pick up the difference suggested here. What's the change?

Comment (Contributor):

I suspect removal of the blank line, but github has a bug that doesn't show it.

@res-life res-life changed the title [Do not review] Support hyper log log plus plus(HLL++) Support hyper log log plus plus(HLL++) Nov 5, 2024
@res-life res-life changed the base branch from branch-24.12 to branch-25.02 November 26, 2024 07:52
@res-life res-life added non-breaking Non-breaking change feature request New feature or request labels Nov 26, 2024
@res-life (Contributor Author):

/ok to test

@res-life (Contributor Author):

Ready to review except test cases.

Successfully merging this pull request may close these issues.

[FEA] Support for approx_count_distinct