strings::contains() for multiple scalar search targets #16641

Closed


res-life
Contributor

@res-life res-life commented Aug 22, 2024

Description

This is based on #15536.
Added three optimizations:

  • For short strings, handle all targets for a string in one thread to improve memory access.
    For each byte index of the string, search each target sequentially:
      for (auto str_byte_idx = 0; str_byte_idx < d_str.size_bytes(); ++str_byte_idx) {  // iterate over start indexes in the string
        for (auto target_idx = 0; target_idx < num_targets; ++target_idx) {  // iterate over the targets
  • For long strings, keep the warp-parallel approach, but handle multiple targets per warp instead of one target per warp. This also aims to improve memory access.
      for (size_t target_idx = 0; target_idx < num_targets; target_idx++) {
        for (auto i = lane_idx; ...... ; i += cudf::detail::warp_size) {
  • Index the first chars of the targets.
    This makes searching short strings (<= 64) very fast.
/**
 * Execute multi-contains for short strings.
 * First, index the first char of every target:
 *   collect the first char of each target, then sort and deduplicate,
 *   then map each first char to the targets that start with it.
 *   e.g.:
 *     targets: xa xb ac ad af
 *     first char set is: (a, x)
 *     index result is:
 *       {
 *         a: [2, 3, 4],   // target indexes of: ac ad af
 *         x: [0, 1]       // target indexes of: xa xb
 *       }
 * When searching:
 *   binary-search the `first char set` for each char in the string:
 *     if the char is not in ['a', 'x'], skip it immediately
 *     if the char is 'x', only the targets ["xa", "xb"] need to be tried
 *     if the char is 'a', only the targets ["ac", "ad", "af"] need to be tried
 */

Previously, checking one char of the string against the first char of every target took n comparisons, where n is the number of targets.
After this change, a binary search over the distinct first chars takes only O(log n) comparisons.
Original:

for c in string:
  for target in targets:
    // compare the first char
    ...
    // compare the 2nd ~ end char.

Now:

for c in string:
  // compare the first char by binary search
  first_char_matched_targets = binary_search(first_char_set_in_targets, c)

  for target in first_char_matched_targets:
    // compare the 2nd ~ end char.
  ...
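
As a rough illustration of the first-char index described above, here is a host-side C++ sketch. It is not the kernel code from this PR: the names `first_char_index`, `build_index`, and `multi_contains` are hypothetical, and treating an empty target as always matching is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical host-side index: the sorted distinct first chars and, for each
// of them, the indexes of the targets that start with that char.
struct first_char_index {
  std::vector<char> first_chars;                  // sorted, unique
  std::vector<std::vector<std::size_t>> targets;  // targets[i] belongs to first_chars[i]
};

first_char_index build_index(std::vector<std::string> const& targets)
{
  first_char_index idx;
  for (auto const& t : targets) {
    if (!t.empty()) { idx.first_chars.push_back(t.front()); }
  }
  std::sort(idx.first_chars.begin(), idx.first_chars.end());
  idx.first_chars.erase(std::unique(idx.first_chars.begin(), idx.first_chars.end()),
                        idx.first_chars.end());
  idx.targets.resize(idx.first_chars.size());
  for (std::size_t i = 0; i < targets.size(); ++i) {
    if (targets[i].empty()) { continue; }
    auto const it =
      std::lower_bound(idx.first_chars.begin(), idx.first_chars.end(), targets[i].front());
    idx.targets[it - idx.first_chars.begin()].push_back(i);
  }
  return idx;
}

// Returns one flag per target: true if that target occurs in `str`.
std::vector<bool> multi_contains(std::string_view str,
                                 std::vector<std::string> const& targets,
                                 first_char_index const& idx)
{
  std::vector<bool> found(targets.size(), false);
  for (std::size_t i = 0; i < targets.size(); ++i) {
    if (targets[i].empty()) { found[i] = true; }  // assumption: empty target always matches
  }
  for (std::size_t pos = 0; pos < str.size(); ++pos) {
    // Binary search for the current char among the distinct first chars.
    auto const it = std::lower_bound(idx.first_chars.begin(), idx.first_chars.end(), str[pos]);
    if (it == idx.first_chars.end() || *it != str[pos]) { continue; }  // fast skip
    // Only targets starting with this char need a full comparison.
    for (auto const t : idx.targets[it - idx.first_chars.begin()]) {
      if (!found[t] && str.compare(pos, targets[t].size(), targets[t]) == 0) { found[t] = true; }
    }
  }
  return found;
}
```

With the targets `xa xb ac ad af`, `build_index` produces the first char set `(a, x)` and the target lists `[2, 3, 4]` and `[0, 1]` from the comment above.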

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

mythrocks and others added 2 commits August 22, 2024 16:50
This commit adds a new `strings::contains()` overload that allows
for the search of multiple scalar search targets in the same call.

The trick here is that a new kernel has been introduced, to extend
the "string-per-warp" approach to search for multiple search keys
in the same kernel.

This approach allows cuDF to potentially reduce the number of kernels
launched for `strings::contains()` by a factor of `N`, if all the
search keys can be specified in the same call.  This helps reduce
the kernel-launch overheads for processes that do large numbers
of calls to `strings::contains()`.

Signed-off-by: MithunR <[email protected]>
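
The "string-per-warp" idea extended to several search keys can be pictured with a CPU simulation. This is only a sketch under stated assumptions, not the kernel from this commit: `warp_multi_contains` is a hypothetical name, `warp_size` is assumed to mirror `cudf::detail::warp_size`, and the 32 lanes that would run concurrently on a GPU are simulated with an ordinary loop.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

constexpr std::size_t warp_size = 32;  // assumption: mirrors cudf::detail::warp_size

// CPU simulation of one warp searching one string for every target.
// Returns one flag per target: true if that target occurs in `str`.
std::vector<bool> warp_multi_contains(std::string_view str,
                                      std::vector<std::string> const& targets)
{
  std::vector<bool> found(targets.size(), false);
  for (std::size_t target_idx = 0; target_idx < targets.size(); ++target_idx) {
    std::string_view const target = targets[target_idx];
    if (target.size() > str.size()) { continue; }  // target cannot fit in the string
    std::size_t const last_start = str.size() - target.size();
    // Each lane checks start positions lane_idx, lane_idx + 32, lane_idx + 64, ...
    for (std::size_t lane_idx = 0; lane_idx < warp_size; ++lane_idx) {
      for (std::size_t i = lane_idx; i <= last_start; i += warp_size) {
        if (str.compare(i, target.size(), target) == 0) { found[target_idx] = true; }
      }
    }
  }
  return found;
}
```

On a real device the per-lane results would have to be combined across the warp before being written out; here a shared flag per target stands in for that step. Handling all targets inside one call is what removes the need for one launch per search key.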

Changed iteration order, for better cache performance.

More optimizations:

1. Removed calls to `thrust::fill()`. The bool values are now explicitly written in the kernel.
2. Switched host-device copy to use async.

Revert "More optimizations:"

This reverts commit c0e355c.

This commit was wrong: The thrust::fill() checks for empty target strings.
If removed, we'll need to check for empty target strings for every input string
row.
This was better done the old way.

More improvements:

1. Removed thrust::fill call. Setting values explicitly in the kernel.
2. Switched from using io::hostdevice_vector to rmm::device_uvector. The string_view allocation is tiny.

This has helped reduce the time spent in strings::contains().

For small strings, delegate to thread-per-string algo.
@res-life res-life self-assigned this Aug 22, 2024
@res-life res-life requested review from a team as code owners August 22, 2024 09:17

copy-pr-bot bot commented Aug 22, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. Java Affects Java cuDF API. labels Aug 22, 2024
@res-life res-life requested a review from ttnghia August 22, 2024 09:19
@res-life res-life added feature request New feature or request non-breaking Non-breaking change cuDF (Java) labels Aug 22, 2024
@res-life
Contributor Author

res-life commented Aug 22, 2024

I got these results from the benchmark:

|  api   | row_width | num_rows | hit_rate | chars_size | Samples | CPU Time  | Noise  | GPU Time  | Noise | Elem/s | GlobalMem BW | BWUtil |
|--------|-----------|----------|----------|------------|---------|-----------|--------|-----------|-------|--------|--------------|--------|
| origin |        32 |  1953000 |       20 |   51065308 |    832x | 11.784 ms | 10.02% | 11.777 ms | 9.95% | 4.336G |   4.336 GB/s |  0.65% |
|    new |        32 |  1953000 |       20 |   51065308 |   1328x | 10.530 ms |  9.27% | 10.503 ms | 8.77% | 4.862G |   4.862 GB/s |  0.72% |
| origin |        64 |  1953000 |       20 |  102130616 |    384x | 39.089 ms |  2.48% | 39.084 ms | 2.48% | 2.613G |   2.613 GB/s |  0.39% |
|    new |        64 |  1953000 |       20 |  102130616 |    579x | 25.887 ms |  8.45% | 25.851 ms | 8.25% | 3.951G |   3.951 GB/s |  0.59% |
| origin |       128 |  1953000 |       20 |  204261232 |    383x | 39.168 ms |  8.19% | 39.143 ms | 8.17% | 5.218G |   5.218 GB/s |  0.78% |
|    new |       128 |  1953000 |       20 |  204261232 |    483x | 31.051 ms |  7.44% | 31.020 ms | 7.27% | 6.585G |   6.585 GB/s |  0.98% |
| origin |        32 |  1953000 |       80 |   59640595 |    624x | 14.563 ms |  9.22% | 14.556 ms | 9.14% | 4.097G |   4.097 GB/s |  0.61% |
|    new |        32 |  1953000 |       80 |   59640595 |    640x | 11.646 ms |  7.77% | 11.614 ms | 7.14% | 5.135G |   5.135 GB/s |  0.76% |
| origin |        64 |  1953000 |       80 |  119281190 |    378x | 39.727 ms |  4.76% | 39.710 ms | 4.72% | 3.004G |   3.004 GB/s |  0.45% |
|    new |        64 |  1953000 |       80 |  119281190 |    730x | 20.524 ms |  5.80% | 20.505 ms | 5.63% | 5.817G |   5.817 GB/s |  0.87% |
| origin |       128 |  1953000 |       80 |  238562380 |    375x | 40.057 ms |  5.65% | 40.050 ms | 5.65% | 5.957G |   5.957 GB/s |  0.89% |
|    new |       128 |  1953000 |       80 |  238562380 |    449x | 33.396 ms |  5.87% | 33.367 ms | 5.81% | 7.150G |   7.150 GB/s |  1.06% |

origin: call contains with a single target, once per target.
new: a single call that handles all targets.

Going by the GPU Time column, this is roughly a 1.1x to 1.9x speedup.
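
To make the speedup figure easy to check, here is a small sketch that derives it from the GPU Time column of the table above. The array names and the `speedup` helper are introduced only for this illustration; the values are copied from the table in row order (row_width 32, 64, 128 at hit_rate 20, then the same widths at hit_rate 80).

```cpp
#include <array>
#include <cstddef>

// GPU times in milliseconds, copied from the benchmark table above.
constexpr std::array<double, 6> origin_ms{11.777, 39.084, 39.143, 14.556, 39.710, 40.050};
constexpr std::array<double, 6> new_ms{10.503, 25.851, 31.020, 11.614, 20.505, 33.367};

// Speedup of the single multi-target call over repeated single-target calls
// for the i-th benchmark configuration.
constexpr double speedup(std::size_t i) { return origin_ms[i] / new_ms[i]; }
```

Computed this way, the smallest ratio is about 1.12 (row_width 32, hit_rate 20) and the largest is about 1.94 (row_width 64, hit_rate 80).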

@res-life
Contributor Author

/ok to test

@res-life res-life marked this pull request as draft August 22, 2024 10:08
@res-life
Contributor Author

/ok to test

@mythrocks
Contributor

mythrocks commented Aug 22, 2024

I'd better cc @davidwendt, whom I consulted as part of #15536. I'd feel a little better about 👍-ing my own work if he reviewed as well. :]

Thank you for taking this forward, @res-life.

@davidwendt
Contributor

Will the comments from #15536 be addressed here?
For example: #15536 (comment) which is also referenced here #15536 (comment)

Signed-off-by: Chong Gao <[email protected]>
@res-life
Contributor Author

res-life commented Aug 28, 2024

20 targets:

|   api    |  row_width  |  num_rows  |  hit_rate  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |           Diff |   %Diff |  Status  |
|----------|-------------|------------|------------|------------|-------------|------------|-------------|----------------|---------|----------|
| contains |     32      |   260000   |     20     |   1.985 ms |      11.18% |   1.468 ms |       4.98% |    -516.649 us | -26.03% |   FAIL   |
| contains |     64      |   260000   |     20     |   5.034 ms |       9.42% |   3.428 ms |       7.84% |   -1605.743 us | -31.90% |   FAIL   |
| contains |     128     |   260000   |     20     |   5.279 ms |       5.30% |   3.341 ms |       6.38% |   -1938.129 us | -36.71% |   FAIL   |
| contains |     256     |   260000   |     20     |   7.924 ms |      39.80% |   5.066 ms |       8.14% |   -2858.295 us | -36.07% |   FAIL   |
| contains |     512     |   260000   |     20     |  12.412 ms |       4.77% |  10.131 ms |      28.95% |   -2281.325 us | -18.38% |   FAIL   |
| contains |    1024     |   260000   |     20     |  21.000 ms |       2.64% |  15.017 ms |       5.66% |   -5983.068 us | -28.49% |   FAIL   |
| contains |     32      |  1953000   |     20     |  11.008 ms |       3.06% |  11.388 ms |       4.74% |     380.332 us |   3.46% |   FAIL   |
| contains |     64      |  1953000   |     20     |  32.544 ms |       2.02% |  25.464 ms |       2.68% |   -7080.123 us | -21.76% |   FAIL   |
| contains |     128     |  1953000   |     20     |  37.533 ms |       1.96% |  25.860 ms |       2.63% |  -11673.166 us | -31.10% |   FAIL   |
| contains |     256     |  1953000   |     20     |  55.251 ms |       1.26% |  38.541 ms |       1.12% |  -16710.174 us | -30.24% |   FAIL   |
| contains |     512     |  1953000   |     20     |  89.551 ms |       1.25% |  64.981 ms |       0.36% |  -24569.331 us | -27.44% |   FAIL   |
| contains |    1024     |  1953000   |     20     | 156.897 ms |       0.48% | 113.479 ms |       0.28% |  -43417.619 us | -27.67% |   FAIL   |
| contains |     32      |  16777216  |     20     |  92.424 ms |       0.82% |  95.961 ms |       0.35% |       3.537 ms |   3.83% |   FAIL   |
| contains |     64      |  16777216  |     20     | 281.194 ms |       0.49% | 220.198 ms |       2.10% |  -60995.860 us | -21.69% |   FAIL   |
| contains |     32      |   260000   |     80     |   2.036 ms |      11.27% |   1.155 ms |       3.22% |    -880.620 us | -43.26% |   FAIL   |
| contains |     64      |   260000   |     80     |   4.863 ms |       6.16% |   2.219 ms |       5.78% |   -2643.471 us | -54.36% |   FAIL   |
| contains |     128     |   260000   |     80     |   5.573 ms |       9.79% |   3.674 ms |       4.58% |   -1899.006 us | -34.08% |   FAIL   |
| contains |     256     |   260000   |     80     |   7.964 ms |      29.06% |   5.371 ms |       4.58% |   -2593.164 us | -32.56% |   FAIL   |
| contains |     512     |   260000   |     80     |  12.110 ms |       3.45% |   8.800 ms |       4.13% |   -3310.160 us | -27.33% |   FAIL   |
| contains |    1024     |   260000   |     80     |  21.181 ms |       7.49% |  15.685 ms |       4.33% |   -5496.013 us | -25.95% |   FAIL   |
| contains |     32      |  1953000   |     80     |  10.699 ms |       7.82% |   8.074 ms |       3.91% |   -2625.455 us | -24.54% |   FAIL   |
| contains |     64      |  1953000   |     80     |  30.158 ms |       3.32% |  15.044 ms |       4.32% |  -15114.107 us | -50.12% |   FAIL   |
| contains |     128     |  1953000   |     80     |  38.119 ms |       1.71% |  27.495 ms |       1.25% |  -10623.225 us | -27.87% |   FAIL   |
| contains |     256     |  1953000   |     80     |  55.647 ms |       2.55% |  40.471 ms |       5.45% |  -15176.148 us | -27.27% |   FAIL   |
| contains |     512     |  1953000   |     80     |  88.548 ms |       1.68% |  65.727 ms |       1.51% |  -22820.777 us | -25.77% |   FAIL   |
| contains |    1024     |  1953000   |     80     | 154.357 ms |       1.06% | 116.059 ms |       0.49% |  -38297.609 us | -24.81% |   FAIL   |
| contains |     32      |  16777216  |     80     |  85.989 ms |       2.62% |  69.670 ms |       3.14% |  -16319.696 us | -18.98% |   FAIL   |
| contains |     64      |  16777216  |     80     | 257.486 ms |       2.05% | 126.210 ms |       1.15% | -131276.699 us | -50.98% |   FAIL   |

At row_width 32 with hit_rate 20, the two larger row counts show no improvement (+3.46% and +3.83%).

Comment on lines 439 to 443
auto const idx = static_cast<size_type>(threadIdx.x + blockIdx.x * blockDim.x);
if (idx >= (num_rows * cudf::detail::warp_size)) { return; }

auto const lane_idx = idx % cudf::detail::warp_size;
auto const str_idx = idx / cudf::detail::warp_size;
Contributor

The expression num_rows * cudf::detail::warp_size can overflow size_type.

Suggested change
- auto const idx = static_cast<size_type>(threadIdx.x + blockIdx.x * blockDim.x);
- if (idx >= (num_rows * cudf::detail::warp_size)) { return; }
- auto const lane_idx = idx % cudf::detail::warp_size;
- auto const str_idx = idx / cudf::detail::warp_size;
+ auto const idx = cudf::detail::grid_1d::global_thread_id();
+ auto const str_idx = idx / cudf::detail::warp_size;
+ if (str_idx >= num_rows) { return; }
+ auto const lane_idx = idx % cudf::detail::warp_size;
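
To see when the product overflows, here is a host-side sketch. It assumes a warp size of 32 and a 64-bit global thread index; `thread_count_overflows` and `thread_is_active` are hypothetical helpers written for this illustration, not cuDF functions.

```cpp
#include <cstdint>
#include <limits>

using size_type = std::int32_t;         // cudf::size_type is a 32-bit signed integer
constexpr std::int64_t warp_size = 32;  // assumption: mirrors cudf::detail::warp_size

// True if num_rows * warp_size does not fit in size_type.
// The product is formed in 64 bits so the check itself cannot overflow.
constexpr bool thread_count_overflows(size_type num_rows)
{
  return static_cast<std::int64_t>(num_rows) * warp_size >
         std::numeric_limits<size_type>::max();
}

// Overflow-safe bounds check in the style of the suggestion: keep the global
// thread index 64-bit and compare the derived row index against num_rows.
constexpr bool thread_is_active(std::int64_t global_thread_id, size_type num_rows)
{
  return global_thread_id / warp_size < num_rows;
}
```

Under these assumptions the product first exceeds the `size_type` range at 67,108,864 rows (2^31 / 32), while dividing the thread index by the warp size never needs the product at all.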

Contributor Author

Done.

@davidwendt
Copy link
Contributor

davidwendt commented Sep 11, 2024

The benchmark run fails with combine=true with the warp_parallel kernel enabled:

$ benchmarks/STRINGS_NVBENCH -d 0 -b find_string --axis api=contains
RMM memory resource = pool
CUIO host memory resource = pinned_pool
# Devices

## [0] `Quadro GV100`
* SM Version: 700 (PTX Version: 700)
* Number of SMs: 80
* SM Default Clock Rate: 1627 MHz
* Global Memory: 15872 MiB Free / 32491 MiB Total
* Global Memory Bus Peak: 870 GB/sec (4096-bit DDR @850MHz)
* Max Shared Memory: 96 KiB/SM, 48 KiB/Block
* L2 Cache Size: 6144 KiB
* Maximum Active Blocks: 32/SM
* Maximum Active Threads: 2048/SM, 1024/Block
* Available Registers: 65536/SM, 65536/Block
* ECC Enabled: No

# Log

Run:  [1/36] find_string [Device=0 api=contains row_width=32 num_rows=260000 hit_rate=20]
Pass: Cold: 0.370707ms GPU, 0.375213ms CPU, 0.67s total GPU, 0.71s total wall, 1808x 
Run:  [2/36] find_string [Device=0 api=contains row_width=64 num_rows=260000 hit_rate=20]
Pass: Cold: 0.549474ms GPU, 0.553811ms CPU, 0.98s total GPU, 1.02s total wall, 1792x 
Run:  [3/36] find_string [Device=0 api=contains row_width=128 num_rows=260000 hit_rate=20]
/cudf/cpp/build/_deps/nvbench-src/nvbench/blocking_kernel.cu:113: Cuda API call returned error: cudaErrorIllegalAddress: an illegal memory access was encountered

@res-life
Contributor Author

/ok to test

@res-life
Contributor Author

TODO: test perf again.

@res-life
Contributor Author

/ok to test

@res-life
Contributor Author

res-life commented Sep 18, 2024

There are still bugs; I will fix them ASAP.
[Done]

@res-life
Contributor Author

/ok to test

@res-life
Contributor Author

/ok to test

@res-life
Contributor Author

res-life commented Sep 19, 2024

10 targets:

|      api       |  row_width  |  num_rows  |  hit_rate  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |           Diff |   %Diff |  Status  |
|----------------|-------------|------------|------------|------------|-------------|------------|-------------|----------------|---------|----------|
| multi-contains |     32      |   260000   |     20     |   1.218 ms |      65.51% | 268.035 us |      78.93% |    -950.452 us | -78.00% |   FAIL   |
| multi-contains |     64      |   260000   |     20     |   3.110 ms |      34.86% | 501.955 us |      55.99% |   -2608.098 us | -83.86% |   FAIL   |
| multi-contains |     128     |   260000   |     20     |   2.732 ms |      34.97% |   2.278 ms |      24.60% |    -454.906 us | -16.65% |   PASS   |
| multi-contains |     256     |   260000   |     20     |   3.858 ms |      30.00% |   2.935 ms |      19.55% |    -922.921 us | -23.92% |   FAIL   |
| multi-contains |     512     |   260000   |     20     |   6.166 ms |      23.28% |   4.234 ms |      18.92% |   -1931.388 us | -31.32% |   FAIL   |
| multi-contains |    1024     |   260000   |     20     |  10.958 ms |      17.84% |   7.056 ms |      16.60% |   -3902.223 us | -35.61% |   FAIL   |
| multi-contains |     32      |  1953000   |     20     |   6.061 ms |      23.65% |   1.503 ms |      25.71% |   -4558.334 us | -75.21% |   FAIL   |
| multi-contains |     64      |  1953000   |     20     |  19.317 ms |      12.36% |   3.178 ms |      18.26% |  -16139.825 us | -83.55% |   FAIL   |
| multi-contains |     128     |  1953000   |     20     |  19.104 ms |      12.10% |  17.400 ms |      12.08% |   -1704.042 us |  -8.92% |   PASS   |
| multi-contains |     256     |  1953000   |     20     |  28.448 ms |      12.48% |  22.440 ms |      10.52% |   -6008.533 us | -21.12% |   FAIL   |
| multi-contains |     512     |  1953000   |     20     |  46.219 ms |       9.19% |  32.867 ms |      10.45% |  -13351.815 us | -28.89% |   FAIL   |
| multi-contains |    1024     |  1953000   |     20     |  82.983 ms |       6.03% |  54.032 ms |       8.60% |  -28950.742 us | -34.89% |   FAIL   |
| multi-contains |     32      |  16777216  |     20     |  51.669 ms |       9.30% |  14.716 ms |      18.01% |  -36952.819 us | -71.52% |   FAIL   |
| multi-contains |     64      |  16777216  |     20     | 173.188 ms |       3.82% |  32.558 ms |      12.99% | -140629.639 us | -81.20% |   FAIL   |
| multi-contains |     32      |   260000   |     80     |   1.172 ms |      57.81% | 289.641 us |     125.07% |    -882.034 us | -75.28% |   FAIL   |
| multi-contains |     64      |   260000   |     80     |   2.927 ms |      35.41% | 561.831 us |      93.10% |   -2365.275 us | -80.81% |   FAIL   |
| multi-contains |     128     |   260000   |     80     |   2.928 ms |      32.18% |   2.266 ms |      46.43% |    -662.390 us | -22.62% |   PASS   |
| multi-contains |     256     |   260000   |     80     |   4.074 ms |      25.41% |   3.188 ms |      35.26% |    -886.029 us | -21.75% |   PASS   |
| multi-contains |     512     |   260000   |     80     |   6.402 ms |      19.86% |   4.671 ms |      30.59% |   -1731.671 us | -27.05% |   FAIL   |
| multi-contains |    1024     |   260000   |     80     |  11.119 ms |      14.54% |   8.237 ms |      24.30% |   -2881.433 us | -25.92% |   FAIL   |
| multi-contains |     32      |  1953000   |     80     |   6.361 ms |      23.97% |   1.588 ms |      56.19% |   -4773.543 us | -75.04% |   FAIL   |
| multi-contains |     64      |  1953000   |     80     |  18.283 ms |      12.31% |   3.546 ms |      37.89% |  -14736.689 us | -80.60% |   FAIL   |
| multi-contains |     128     |  1953000   |     80     |  20.067 ms |      10.31% |  18.718 ms |      18.45% |   -1349.610 us |  -6.73% |   PASS   |
| multi-contains |     256     |  1953000   |     80     |  29.301 ms |       9.70% |  24.991 ms |      14.36% |   -4310.135 us | -14.71% |   FAIL   |
| multi-contains |     512     |  1953000   |     80     |  50.853 ms |      16.78% |  34.222 ms |      16.97% |  -16630.364 us | -32.70% |   FAIL   |
| multi-contains |    1024     |  1953000   |     80     |  86.514 ms |      11.20% |  53.475 ms |       9.21% |  -33038.528 us | -38.19% |   FAIL   |
| multi-contains |     32      |  16777216  |     80     |  53.945 ms |      15.76% |  12.412 ms |      14.38% |  -41532.866 us | -76.99% |   FAIL   |
| multi-contains |     64      |  16777216  |     80     | 157.562 ms |       5.01% |  26.559 ms |      12.01% | -131002.626 us | -83.14% |   FAIL   |

17 targets, which triggers splitting the targets into groups for long strings:

|      api       |  row_width  |  num_rows  |  hit_rate  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |           Diff |   %Diff |  Status  |
|----------------|-------------|------------|------------|------------|-------------|------------|-------------|----------------|---------|----------|
| multi-contains |     32      |   260000   |     20     |   1.897 ms |      45.79% | 379.193 us |      80.83% |   -1517.685 us | -80.01% |   FAIL   |
| multi-contains |     64      |   260000   |     20     |   5.116 ms |      26.24% | 670.106 us |      54.93% |   -4445.432 us | -86.90% |   FAIL   |
| multi-contains |     128     |   260000   |     20     |   4.602 ms |      28.92% |   4.188 ms |      26.63% |    -414.147 us |  -9.00% |   PASS   |
| multi-contains |     256     |   260000   |     20     |   6.583 ms |      25.17% |   5.115 ms |      24.46% |   -1468.011 us | -22.30% |   PASS   |
| multi-contains |     512     |   260000   |     20     |  10.581 ms |      18.20% |   7.341 ms |      22.93% |   -3239.323 us | -30.62% |   FAIL   |
| multi-contains |    1024     |   260000   |     20     |  18.769 ms |      13.08% |  11.707 ms |      15.58% |   -7062.500 us | -37.63% |   FAIL   |
| multi-contains |     32      |  1953000   |     20     |  10.407 ms |      18.16% |   2.102 ms |      33.86% |   -8304.688 us | -79.80% |   FAIL   |
| multi-contains |     64      |  1953000   |     20     |  32.780 ms |       7.96% |   4.344 ms |      23.12% |  -28436.560 us | -86.75% |   FAIL   |
| multi-contains |     128     |  1953000   |     20     |  32.429 ms |       8.71% |  30.392 ms |      11.18% |   -2036.997 us |  -6.28% |   PASS   |
| multi-contains |     256     |  1953000   |     20     |  47.826 ms |       7.99% |  38.908 ms |      10.68% |   -8917.890 us | -18.65% |   FAIL   |
| multi-contains |     512     |  1953000   |     20     |  78.474 ms |       5.52% |  55.672 ms |       8.11% |  -22801.487 us | -29.06% |   FAIL   |
| multi-contains |    1024     |  1953000   |     20     | 147.366 ms |       9.68% |  90.115 ms |       5.53% |  -57250.488 us | -38.85% |   FAIL   |
| multi-contains |     32      |  16777216  |     20     |  87.389 ms |       6.52% |  18.063 ms |      12.45% |  -69325.973 us | -79.33% |   FAIL   |
| multi-contains |     64      |  16777216  |     20     | 295.148 ms |       3.85% |  37.715 ms |       9.51% | -257432.763 us | -87.22% |   FAIL   |
| multi-contains |     32      |   260000   |     80     |   1.897 ms |      45.14% | 332.214 us |     100.03% |   -1564.822 us | -82.49% |   FAIL   |
| multi-contains |     64      |   260000   |     80     |   4.984 ms |      35.32% | 656.064 us |      62.40% |   -4328.034 us | -86.84% |   FAIL   |
| multi-contains |     128     |   260000   |     80     |   5.002 ms |      32.97% |   3.685 ms |      27.45% |   -1316.759 us | -26.32% |   PASS   |
| multi-contains |     256     |   260000   |     80     |   7.020 ms |      27.36% |   4.751 ms |      25.48% |   -2268.563 us | -32.32% |   FAIL   |
| multi-contains |     512     |   260000   |     80     |  11.013 ms |      23.13% |   6.907 ms |      22.24% |   -4105.964 us | -37.28% |   FAIL   |
| multi-contains |    1024     |   260000   |     80     |  19.126 ms |      16.63% |  11.269 ms |      17.34% |   -7857.560 us | -41.08% |   FAIL   |
| multi-contains |     32      |  1953000   |     80     |  10.463 ms |      21.13% |   1.944 ms |      33.11% |   -8518.859 us | -81.42% |   FAIL   |
| multi-contains |     64      |  1953000   |     80     |  31.039 ms |      12.54% |   3.975 ms |      22.84% |  -27063.687 us | -87.19% |   FAIL   |
| multi-contains |     128     |  1953000   |     80     |  34.141 ms |      10.37% |  26.625 ms |      10.75% |   -7515.375 us | -22.01% |   FAIL   |
| multi-contains |     256     |  1953000   |     80     |  49.650 ms |       8.39% |  35.039 ms |      10.16% |  -14611.786 us | -29.43% |   FAIL   |
| multi-contains |     512     |  1953000   |     80     |  79.993 ms |       5.78% |  51.117 ms |       7.35% |  -28875.400 us | -36.10% |   FAIL   |
| multi-contains |    1024     |  1953000   |     80     | 141.384 ms |       2.82% |  84.648 ms |       5.97% |  -56736.002 us | -40.13% |   FAIL   |
| multi-contains |     32      |  16777216  |     80     |  84.886 ms |       5.75% |  16.247 ms |      13.63% |  -68639.457 us | -80.86% |   FAIL   |
| multi-contains |     64      |  16777216  |     80     | 262.663 ms |       2.30% |  33.554 ms |       9.06% | -229109.558 us | -87.23% |   FAIL   |

@res-life res-life changed the base branch from branch-24.10 to branch-24.12 October 9, 2024 02:10
@vyasr vyasr removed the cuDF (Java) label Oct 10, 2024
@res-life
Contributor Author

Replaced by #16900, so closing this.

@res-life res-life closed this Oct 22, 2024
rapids-bot bot pushed a commit that referenced this pull request Nov 12, 2024
Add a new `cudf::strings::contains_multiple` API to search for multiple targets within a strings column.
The output is a table with one column per target; each row of a column holds a boolean indicating whether that target was found in the corresponding input row.
This PR is intended to help in collaboration with #16641.
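
The output layout can be pictured with a plain host-side reference. This is only a sketch of the semantics described above, not the cuDF implementation or its signature: `contains_multiple_reference` is a hypothetical name, it works on `std::string` vectors rather than cuDF columns, and it ignores null rows.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// result[t][r] is true when targets[t] occurs in rows[r]:
// one boolean column per target, one entry per input row.
std::vector<std::vector<bool>> contains_multiple_reference(
  std::vector<std::string> const& rows, std::vector<std::string> const& targets)
{
  std::vector<std::vector<bool>> result(targets.size(),
                                        std::vector<bool>(rows.size(), false));
  for (std::size_t t = 0; t < targets.size(); ++t) {
    for (std::size_t r = 0; r < rows.size(); ++r) {
      result[t][r] = rows[r].find(targets[t]) != std::string::npos;
    }
  }
  return result;
}
```

For the rows `"apple pie"`, `"banana"`, `"grape"` and the targets `"ap"`, `"an"`, the first column marks the first and third rows and the second column marks only the second row.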

Authors:
  - David Wendt (https://github.com/davidwendt)
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Chong Gao (https://github.com/res-life)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Chong Gao (https://github.com/res-life)
  - Yunsong Wang (https://github.com/PointKernel)
  - MithunR (https://github.com/mythrocks)
  - Tianyu Liu (https://github.com/kingcrimsontianyu)
  - Bradley Dice (https://github.com/bdice)

URL: #16900