[RELEASE] cudf v24.12 #17406
Commits on Oct 5, 2024
Add string.convert.convert_ipv4 APIs to pylibcudf (#16994)
Contributes to #15162 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #16994
Commit: 33b8dfa
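A hedged usage sketch of the new bindings; the module path follows the PR title, and the function names are assumed to mirror libcudf's `ipv4_to_integers`/`integers_to_ipv4`:

```python
import pyarrow as pa
import pylibcudf as plc

ips = plc.interop.from_arrow(pa.array(["192.168.0.1", "10.0.0.1"]))
as_ints = plc.strings.convert.convert_ipv4.ipv4_to_integers(ips)    # assumed name
round_trip = plc.strings.convert.convert_ipv4.integers_to_ipv4(as_ints)
```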
Fix write_json to handle empty string column (#16995)
Adds an empty-string-column condition to write_json that bypasses make_strings_children for empty columns, because a zero grid size throws a CUDA error. Authors: - Karthikeyan (https://github.com/karthikeyann) - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - David Wendt (https://github.com/davidwendt) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #16995
Commit: fcff2b6
Commits on Oct 7, 2024
This PR removes an unused import in cudf which was causing errors in doc builds. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17005
Commit: bfd568b
Add release tracking to project automation scripts (#17001)
This PR adds two new jobs to the project automations. One to extract the version number from the branch name, and one to set the project `Release` field to the version found. Authors: - Ben Jarmak (https://github.com/jarmak-nv) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17001
Commit: f926a61
Address all remaining clang-tidy errors (#16956)
With this set of changes I get a clean run of clang-tidy (with one caveat that I'll explain in the follow-up PR to add clang-tidy to pre-commit/CI). Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Nghia Truong (https://github.com/ttnghia) - MithunR (https://github.com/mythrocks) - David Wendt (https://github.com/davidwendt) - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #16956
Commit: 7e1e475
Implement extract_datetime_component in libcudf/pylibcudf (#16776)
Closes #16735. Authors: - https://github.com/brandon-b-miller - Lawrence Mitchell (https://github.com/wence-) Approvers: - Matthew Murray (https://github.com/Matt711) - Lawrence Mitchell (https://github.com/wence-) - Bradley Dice (https://github.com/bdice) URL: #16776
Commit: 2d02bdc
Commits on Oct 8, 2024
Migrate nvtext generate_ngrams APIs to pylibcudf (#17006)
Part of #15162. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17006
Commit: 09ed210
Everything in the expression evaluation now operates on columns without names. DataFrame construction takes either a mapping from string-valued names to columns, or a sequence of pairs of names and columns. This removes some duplicate code in the NamedColumn class (by removing it) where we had to fight the inheritance hierarchy. - Closes #16272 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Matthew Murray (https://github.com/Matt711) URL: #16962
Commit: 219ec0e
Compute whole column variance using numerically stable approach (#16448)
We use the pairwise approach of Chan, Golub, and LeVeque (1983). - Closes #16444 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Bradley Dice (https://github.com/bdice) - Robert (Bobby) Evans (https://github.com/revans2) URL: #16448
Commit: bcf9425
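For reference, the numerically stable pairwise rule from Chan, Golub & LeVeque combines two partitions A and B (counts $n_A$, $n_B$, means $\bar{x}_A$, $\bar{x}_B$, sums of squared deviations $M_A$, $M_B$) as:

$$
\delta = \bar{x}_B - \bar{x}_A,\qquad
M_{AB} = M_A + M_B + \delta^2\,\frac{n_A n_B}{n_A + n_B},\qquad
\bar{x}_{AB} = \bar{x}_A + \delta\,\frac{n_B}{n_A + n_B},
$$

with the sample variance of the combined range given by $M_{AB}/(n_A + n_B - 1)$. Applying this rule in a reduction tree avoids the catastrophic cancellation of the naive $E[x^2] - E[x]^2$ formula.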
Turn on xfail_strict = true for all python packages (#16977)
The cudf tests already treat tests that are expected to fail but pass as errors, but at the time we introduced that change, we didn't do the same for the other packages. Do that now; it turns out there are only a few xpassing tests. While here, it turns out that having multiple different pytest configuration files does not work: `pytest.ini` takes precedence over other options, and it's "first file wins". Consequently, the merge of #16851 turned off `xfail_strict = true` (and other options) for many of the subpackages. To fix this, migrate all pytest configuration into the appropriate section of the `pyproject.toml` files, so that all tool configuration lives in the same place. We also add a section in the developer guide to document this choice. - Closes #12391 - Closes #16974 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - James Lamb (https://github.com/jameslamb) - Matthew Roeschke (https://github.com/mroeschke) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #16977
Commit: cc23474
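For context, with strict xfail semantics an unexpectedly passing test is reported as a failure rather than silently XPASS-ing. A minimal illustration (plain pytest, not cudf-specific):

```python
import pytest

@pytest.mark.xfail(strict=True, reason="known bug, tracked upstream")
def test_known_bug():
    # If this assertion ever starts passing, strict mode fails the run,
    # forcing the stale xfail marker to be removed.
    assert 1 + 1 == 3
```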
Performance optimization of JSON validation (#16996)
As part of JSON validation, field, value and string tokens are validated. Right now the code has a single transform_inclusive_scan. Since this transform functor is a heavy operation, it slows down the entire scan drastically. This PR splits transform and scan in validation. The runtime of validation went from 200ms to 20ms. Also, a few hardcoded string comparisons are moved to a trie. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vukasin Milovanovic (https://github.com/vuule) - Robert (Bobby) Evans (https://github.com/revans2) URL: #16996
Commit: 553d8ec
Migrate nvtext jaccard API to pylibcudf (#17007)
Part of #15162. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #17007
Commit: 618a93f
make conda installs in CI stricter (#17013)
Contributes to rapidsai/build-planning#106 Proposes specifying the RAPIDS version in `conda install` calls in CI that install CI artifacts, to reduce the risk of CI jobs picking up artifacts from other releases. Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17013
Commit: 349ba5d
Commits on Oct 9, 2024
Add string.convert.convert_urls APIs to pylibcudf (#17003)
Contributes to #15162 Also I believe the cpp docstrings were incorrect, but could use a second look. Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - https://github.com/brandon-b-miller - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) URL: #17003
Commit: 5b931ac
Add pinning for pyarrow in wheels (#17018)
We have recently observed a number of seg faults in our Python tests. From some investigation, the error comes from the import of pyarrow loading the bundled libarrow.so, and in particular when that library runs a jemalloc function `background_thread_entry`. We have observed similar (but not identical) errors in the past that have to do with as-yet unsolved problems in the way that arrow handles multi-threaded environments. The error is currently only observed on arm runners and with pyarrow 17.0.0. In my tests the error is highly sensitive to everything from import order to unrelated code segments, suggesting a race condition, some form of memory corruption, or perhaps symbol resolution errors at runtime. As a result, I have had limited success in drilling down further into specific causes, especially since attempts to rebuild libarrow.so also squash the error and I therefore cannot use debug symbols. From some offline discussion we decided that avoiding the problematic version is a sufficient fix for now. Due to the sensitivity, I am simply skipping 17.0.0 in this PR. I suspect that future builds of pyarrow will also usually not exhibit this bug (although it may recur occasionally on specific versions of pyarrow). Therefore, rather than lowering the upper bound I would prefer to allow us to float and see if and when this problem reappears. Since our DFG+RBB combination for wheel builds does not yet support any matrix entry other than `cuda`, I'm using environment markers to specify the constraint rather than a matrix entry in dependencies.yaml. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17018
Commit: ded4dd2
Refactor histogram reduction using cuco::static_set::insert_and_find (#16485)
Refactors `histogram` reduce and groupby aggregations using `cuco::static_set::insert_and_find`. Speed improvement results [here](#16485 (comment)) and [here](#16485 (comment)). Authors: - Srinivas Yadav (https://github.com/srinivasyadav18) - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Nghia Truong (https://github.com/ttnghia) URL: #16485
Commit: a6853f4
Disable kvikio remote I/O to avoid openssl dependencies in JNI build (#17026)
The same issue as NVIDIA/spark-rapids-jni#2475, due to rapidsai/kvikio#464. Ports the fix from NVIDIA/spark-rapids-jni#2476, verified locally. Authors: - Peixin (https://github.com/pxLi) Approvers: - Nghia Truong (https://github.com/ttnghia) URL: #17026
Commit: bfac5e5
Merge pull request #17027 from rapidsai/branch-24.10
Forward-merge branch-24.10 into branch-24.12
Commit: 9c37e1e
Use std::optional for host types (#17015)
cuda::std::optional shouldn't be used for host types such as `std::vector` as it requires the constructors of the `T` types to be host+device. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Bradley Dice (https://github.com/bdice) - MithunR (https://github.com/mythrocks) - Nghia Truong (https://github.com/ttnghia) URL: #17015
Commit: dfdae59
[DOC] Document limitation using cudf.pandas proxy arrays (#16955)
When instantiating a `cudf.pandas` proxy array, a DtoH transfer occurs so that the data buffer is set correctly. We do this because functions which utilize NumPy's C API can utilize the data buffer directly instead of going through `__array__`. This PR documents this limitation. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Vyas Ramasubramani (https://github.com/vyasr) URL: #16955
Commit: bd51a25
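A sketch of the documented limitation, assuming the usual programmatic activation via `cudf.pandas.install()`:

```python
import cudf.pandas
cudf.pandas.install()   # activate the pandas accelerator before importing pandas

import numpy as np
import pandas as pd     # now proxy-backed; data may live on the GPU

s = pd.Series([1.0, 2.0, 3.0])
arr = np.asarray(s)     # instantiating the proxy array forces a DtoH transfer
```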
Fix host_span constructor to correctly copy is_device_accessible (#17020)
One of the `host_span` constructors was not updated when we added `is_device_accessible`, so the value was not assigned. This PR fixes this simple error and adds tests that check that this property is correctly set when creating `host_span`s. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #17020
Commit: c7b5119
Add string.convert_floats APIs to pylibcudf (#16990)
Contributes to #15162 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - https://github.com/brandon-b-miller URL: #16990
Commit: 3791c8a
Commits on Oct 10, 2024
Update all rmm imports to use pylibrmm/librmm (#16913)
This PR updates all the RMM imports to use pylibrmm/librmm now that `rmm._lib` is deprecated. It should be merged after [rmm/1676](rapidsai/rmm#1676). Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Charles Blackmon-Luca (https://github.com/charlesbluca) URL: #16913
Commit: 31423d0
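An illustrative sketch of the import change, assuming the post-migration module layout (the top-level `rmm.DeviceBuffer` re-export is unaffected):

```python
# Before (deprecated):
#   from rmm._lib.device_buffer import DeviceBuffer
from rmm.pylibrmm.device_buffer import DeviceBuffer  # assumed new home

buf = DeviceBuffer(size=64)  # 64-byte device allocation
```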
Fix regex parsing logic handling of nested quantifiers (#16798)
Fixes the libcudf regex parsing logic when handling nested fixed quantifiers. The logic handles fixed quantifiers by simply repeating the previous instruction; if the previous item is a group (capture or non-capture), that group may also contain an internal fixed quantifier. Found while working on #16730. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) URL: #16798
Commit: 7173b52
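An illustrative (hypothetical) pattern of the kind this fix addresses: a fixed quantifier applied to a group whose body itself contains a fixed quantifier:

```python
import cudf

s = cudf.Series(["aabaab", "aab"])
# The outer {2} repeats a group containing the inner fixed quantifier {2};
# both must expand correctly during parsing.
print(s.str.contains(r"(?:a{2}b){2}"))  # expected: [True, False]
```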
Add string.convert.convert_lists APIs to pylibcudf (#16997)
Contributes to #15162 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #16997
Commit: 69b0f66
Add json APIs to pylibcudf (#17025)
Contributes to #15162 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - James Lamb (https://github.com/jameslamb) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17025
Commit: 7d49df7
Commits on Oct 11, 2024
Move pylibcudf/libcudf/wrappers/decimals to pylibcudf/libcudf/fixed_point (#17048)
Contributes to #15162. I don't think there are any types in this file that need to be exposed on the Python side; they're just used internally in pylibcudf. Also moves this to `libcudf/fixed_point`, matching the libcudf location more closely. Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17048
Commit: 097778e
Remove unneeded pylibcudf.libcudf.wrappers.duration usage in cudf (#17010)
Contributes to #15162. See #17010 (comment) for discussion. Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17010
Commit: 1436cac
make conda installs in CI stricter (part 2) (#17042)
Follow-up to #17013 Changes relative to that PR: * switches to pinning CI conda installs to the output of `rapids-version` (`{major}.{minor}.{patch}`) instead of `rapids-version-major-minor` (`{major}.{minor}`), to get a bit more protection in the presence of hotfix releases * restores some exporting of variables needed for docs builds I made some mistakes in #17013 (comment). Missed that this project's Doxygen setup is expecting to find `RAPIDS_VERSION` and `RAPIDS_VERSION_MAJOR_MINOR` defined in the environment. https://github.com/rapidsai/cudf/blob/7173b52fce25937bb69e22a083a5de4655078fa1/cpp/doxygen/Doxyfile#L41 https://github.com/rapidsai/cudf/blob/7173b52fce25937bb69e22a083a5de4655078fa1/cpp/doxygen/Doxyfile#L2229 Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #17042
Commit: 89a6fe5
Pylibcudf: pack and unpack (#17012)
Adding python bindings to [`cudf::pack()`](https://docs.rapids.ai/api/libcudf/legacy/group__copy__split#ga86716e7ec841541deb6edc7e91fcb9e4), [`cudf::unpack()`](https://docs.rapids.ai/api/libcudf/legacy/group__copy__split#ga1d62a18c2e6f087a92289c63693762cc), and [`cudf::packed_columns`](https://docs.rapids.ai/api/libcudf/legacy/structcudf_1_1packed__columns). This is the first step to support serialization of cudf.polars' IR. cc. @wence- @rjzamora Authors: - Mads R. B. Kristensen (https://github.com/madsbk) - Matthew Murray (https://github.com/Matt711) - Lawrence Mitchell (https://github.com/wence-) Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Lawrence Mitchell (https://github.com/wence-) URL: #17012
Commit: 7cf0a1b
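A hedged usage sketch of the new bindings; the module path is assumed to mirror libcudf's `contiguous_split.hpp`:

```python
import pyarrow as pa
import pylibcudf as plc

tbl = plc.interop.from_arrow(pa.table({"a": [1, 2, 3]}))
packed = plc.contiguous_split.pack(tbl)         # contiguous, serializable form
restored = plc.contiguous_split.unpack(packed)  # table view over the packed buffers
```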
Replace deprecated cuco APIs with updated versions (#17052)
This PR replaces the deprecated cuco APIs with the new ones, ensuring the code is up to date with the latest API changes. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Nghia Truong (https://github.com/ttnghia) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #17052
Commit: 66a94c3
Remove unused hash helper functions (#17056)
This PR removes unused hash detail implementations. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - David Wendt (https://github.com/davidwendt) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17056
Commit: 349010e
Organize parquet reader mukernel non-nullable code, introduce manual block scans (#16830)
This is a collection of a few small optimizations and tweaks for the parquet reader fixed-width mukernels (flat & nested; lists not implemented yet). The benchmark changes are negligible; this is mainly cleanup and code in preparation for the upcoming list mukernel.
1) If not reading the whole page (chunked reads), exit sooner.
2) By having each thread keep track of the current valid_count (and not saving-to or reading-from the nesting_info until the end), we don't need to synchronize the block threads as frequently, so these extra syncs are removed.
3) For (non-list) nested columns that aren't nullable, we don't need to loop over the whole nesting depth; only the last level of nesting is used. After removing this loop, the non-nullable code for nested and flat hierarchies is identical, so it's extracted and consolidated into a new function.
4) When doing block scans in the parquet reader we also need to know the per-warp results of the scan. Because cub doesn't return those, we then do an additional warp-wide ballot that is unnecessary. This introduces code that does a block scan manually, saving the intermediate results. However, using this code in the flat & nested kernels uses 8 more registers, so it isn't used yet.
5) By doing an exclusive scan instead of an inclusive scan, we don't need the extra "- 1"s that were everywhere.
Authors: - Paul Mattione (https://github.com/pmattione-nvidia) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - https://github.com/nvdbaranec URL: #16830
Commit: 891e5aa
docs: change 'CSV' to 'csv' in python/custreamz/README.md to match kafka.py (#17041)
This PR corrects a typo in the `python/custreamz/README.md` file by changing the uppercase `'CSV'` to lowercase `'csv'`. This change aligns the documentation with the `message_format` options defined in `python/custreamz/custreamz/kafka.py`, ensuring consistency across the codebase. Authors: - Hirota Akio (https://github.com/a-hirota) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Matthew Murray (https://github.com/Matt711) URL: #17041
Commit: 0b840bb
Reorganize cudf_polars expression code (#17014)
This PR seeks to break up `expr.py` into a less unwieldy monolith. Authors: - https://github.com/brandon-b-miller Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Matthew Murray (https://github.com/Matt711) URL: #17014
Commit: b8f3e21
Move flatten_single_pass_aggs to its own TU (#17053)
Part of splitting the original bulk shared memory groupby PR #16619. This PR separates `flatten_single_pass_aggs` into its own translation unit without making any code modifications. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Shruti Shivakumar (https://github.com/shrshi) - Karthikeyan (https://github.com/karthikeyann) - David Wendt (https://github.com/davidwendt) URL: #17053
Commit: fea87cb
Migrate Min Hashing APIs to pylibcudf (#17021)
Part of #15162. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #17021
Commit: c8a56a5
Add an example to demonstrate multithreaded read_parquet pipelines (#16828)
Closes #16717. This PR adds a new example to read multiple parquet files using multiple threads. Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) - Basit Ayantunde (https://github.com/lamarrr) URL: #16828
Commit: be1dd32
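The merged example is a libcudf (C++) pipeline; as a rough Python analogue of the same shape (file paths here are hypothetical), reads can be overlapped across a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

import cudf

paths = [f"data/part-{i}.parquet" for i in range(8)]  # hypothetical inputs

# Overlap file reads across threads, to the extent the underlying
# calls release the GIL during I/O-heavy work.
with ThreadPoolExecutor(max_workers=4) as pool:
    frames = list(pool.map(cudf.read_parquet, paths))

result = cudf.concat(frames, ignore_index=True)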
Commits on Oct 12, 2024
Refactor ORC dictionary encoding to migrate to the new cuco::static_map (#17049)
Part of #12261. This PR refactors ORC writer's dictionary encoding to migrate from `cuco::legacy::static_map` to the new `cuco::static_map`. No performance impact measured. Results [here](#17049 (comment)). Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Vukasin Milovanovic (https://github.com/vuule) URL: #17049
Commit: 4dbb8a3
Commits on Oct 14, 2024
Made cudftestutil header-only and removed GTest dependency (#16839)
This merge request follows up on #16658. It removes cudftestutil's dependency on GTest. It satisfies the requirement that we only need API compatibility with the GTest API, and we don't expose the GTest symbols to our consumers nor ship any binary artifact. The source files defining the symbols are late-bound to the resulting executable (via library INTERFACE sources). The user has to manually link the GTest and GMock libraries to the final executable, as illustrated below. Closes #16658
Usage (CMakeLists.txt):
```cmake
add_executable(test1 test1.cpp)
target_link_libraries(test1 PRIVATE GTest::gtest GTest::gmock GTest::gtest_main
                      cudf::cudftestutil cudf::cudftestutil_impl)
```
Authors: - Basit Ayantunde (https://github.com/lamarrr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Robert Maynard (https://github.com/robertmaynard) - David Wendt (https://github.com/davidwendt) - Mike Sarahan (https://github.com/msarahan) URL: #16839
Commit: 3bee678
Add profilers to CUDA 12 conda devcontainers (#17066)
This will make sure that profilers are available by default for everyone using our devcontainers. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - James Lamb (https://github.com/jameslamb) URL: #17066
Commit: e41dea9
Fix ORC reader when using device_read_async while the destination device buffers are not ready (#17074)
This fixes a bug in the ORC reader when `device_read_async` is called while the destination device buffers are not ready to be written to. In detail, this bug occurs because `device_read_async` does not use the user-provided stream but its own generated stream for data copying. As such, the copying ops could happen before the destination device buffers are allocated, causing data corruption. This bug only shows up in certain conditions and is also hard to reproduce. It occurs when copying buffers with small sizes (below `gds_threshold`) and is most likely to show up with `rmm_mode=async`. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - David Wendt (https://github.com/davidwendt) URL: #17074
Commit: 768fbaa
This PR adds clang-tidy checks to our CI. clang-tidy will be run in nightly CI via CMake. For now, only the parts of the code base that were already made compliant in the PRs leading up to this have been enabled, namely cudf source and test cpp files. Over time we can add more files like benchmarks and examples, add or subtract more rules, or enable linting of cu files (see https://gitlab.kitware.com/cmake/cmake/-/issues/25399). This PR is intended to be the starting point enabling systematic linting, at which point everything else should be significantly easier. Resolves #584 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Bradley Dice (https://github.com/bdice) URL: #16958
Commit: 44afc51
Clean up hash-groupby var_hash_functor (#17034)
This work is part of splitting the original bulk shared memory groupby PR #16619. This PR renames the file originally titled `multi_pass_kernels.cuh`, which contains the `var_hash_functor`, to `var_hash_functor.cuh`. It also includes cleanups such as utilizing `cuda::std::` utilities in device code and removing redundant template parameters. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - David Wendt (https://github.com/davidwendt) URL: #17034
Commit: 86db980
Adding assertion to check for regular JSON inputs of size greater than `INT_MAX` bytes (#17057)
Addresses #17017. Libcudf does not support parsing regular JSON inputs of size greater than `INT_MAX` bytes. Note that the batched reader can only be used for JSON lines inputs. Authors: - Shruti Shivakumar (https://github.com/shrshi) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - Vukasin Milovanovic (https://github.com/vuule) - Karthikeyan (https://github.com/karthikeyann) URL: #17057
Commit: 319ec3b
Commits on Oct 15, 2024
Add string.convert.convert_integers APIs to pylibcudf (#16991)
Contributes to #15162 Authors: - Matthew Roeschke (https://github.com/mroeschke) - https://github.com/brandon-b-miller Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - https://github.com/brandon-b-miller - Matthew Murray (https://github.com/Matt711) URL: #16991
Commit: c141ca5
Fix regex handling of fixed quantifier with 0 range (#17067)
Fixes regex logic handling of a pattern with a fixed quantifier that includes a zero-range. Added new gtests for this specific case. Bug was introduced in #16798 Closes #17065 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) - Vyas Ramasubramani (https://github.com/vyasr) - MithunR (https://github.com/mythrocks) - Basit Ayantunde (https://github.com/lamarrr) URL: #17067
Commit: 7bcfc87
Commits on Oct 16, 2024
Migrate remaining nvtext NGrams APIs to pylibcudf (#17070)
Part of #15162. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - https://github.com/brandon-b-miller URL: #17070
Commit: 3420c71
Remove unnecessary std::move's in pylibcudf (#16983)
This PR removes a lot of unnecessary `std::move`'s from pylibcudf. These were necessary with older versions of Cython, but newer versions appear to generate the correct C++ without needing the extra hints. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #16983
Commit: 95df62a
Reenable huge pages for arrow host copying (#17097)
It is unclear whether the performance gains here are entirely from huge pages themselves or whether invoking madvise with huge pages is primarily serving to trigger an eager population of the pages (huge or not). We attempted to provide alternate flags to `madvise` like `MADV_WILLNEED` and that was not sufficient to recover performance, so either huge pages themselves are doing something special or specifying huge pages is causing `madvise` to trigger a page migration that no other flag does. In any case, this change returns us to the performance before the switch to the C data interface, and this code is lifted straight out of our old implementation so I am comfortable making use of it and knowing that it is not problematic. We should explore further optimizations in this direction, though. Resolves #17075. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - Mark Harris (https://github.com/harrism) URL: #17097
Commit: f1cbbcc
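The change itself lives in libcudf's C++ host-copy path; as a minimal illustration of the `madvise` hint involved (Linux-only, via Python's `mmap` wrapper):

```python
import mmap

buf = mmap.mmap(-1, 1 << 22)      # anonymous 4 MiB mapping
buf.madvise(mmap.MADV_HUGEPAGE)   # hint the kernel to back it with huge pages
```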
Include timezone file path in error message (#17102)
Resolves #8795. Also needed for #16998. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - David Wendt (https://github.com/davidwendt) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17102
Commit: b513df8
bug fix: use self.ck_consumer in poll method of kafka.py to align with `__init__` (#17044)
Updated the `poll` method in `kafka.py` to use `self.ck_consumer.poll(timeout)` instead of `self.ck.poll(timeout)`. This change ensures consistency with the `__init__` method, where `self.ck_consumer` is initialized. Authors: - Hirota Akio (https://github.com/a-hirota) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17044
Commit: c9202a0
Commits on Oct 17, 2024
Implement batch construction for strings columns (#17035)
This implements batch construction of strings columns, allowing a large number of strings columns to be created at once with minimal kernel-launch and stream-synchronization overhead. There should be only one stream sync in the entire column construction process. Benchmark: #17035 (comment) Closes #16486. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - David Wendt (https://github.com/davidwendt) - Yunsong Wang (https://github.com/PointKernel) URL: #17035
Commit: 5f863a5
Add strings.combine APIs to pylibcudf (#16790)
Contributes to #15162 Authors: - Matthew Roeschke (https://github.com/mroeschke) - Matthew Murray (https://github.com/Matt711) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Matthew Murray (https://github.com/Matt711) URL: #16790
Commit: 3683e46
Make tests more deterministic (#17008)
Fixes #17045. This PR removes randomness in our pytests and switches from using `np.random.seed` to `np.random.default_rng` throughout the codebase. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Jake Awe (https://github.com/AyodeAwe) - Lawrence Mitchell (https://github.com/wence-) - Benjamin Zaitlen (https://github.com/quasiben) URL: #17008
Commit: e493340
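For reference, the NumPy-recommended pattern the codebase moved to — a seeded local `Generator` instead of mutating global state:

```python
import numpy as np

# Old, global-state style (what the PR removes):
#   np.random.seed(42); np.random.randint(0, 10, size=5)

rng = np.random.default_rng(seed=42)   # local, reproducible generator
values = rng.integers(0, 10, size=5)   # deterministic given the seed
```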
Add conda recipe for cudf-polars (#17037)
This PR adds conda recipes for `cudf-polars`. This is needed to get `cudf-polars` into RAPIDS Docker containers and the `rapids` metapackage. Closes #16816. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Matthew Murray (https://github.com/Matt711) - James Lamb (https://github.com/jameslamb) - Lawrence Mitchell (https://github.com/wence-) URL: #17037
Commit: 9980997
Fix DataFrame._from_arrays and introduce validations (#17112)
Fixes: #17111. This PR fixes `DataFrame._from_arrays` to properly access the `ndim` attribute and also corrects two validations in the `Series` & `DataFrame` constructors. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17112
Commit: 6eeb7d6
Correctly set is_device_accessible when creating host_spans from other container/span types (#17079)
Discovered that the way `host_span`s are created from `hostdevice_vector`, `hostdevice_span`, `hostdevice_2dvector` and `host_2dspan` (yes, these are all real types!) does not propagate the `is_device_accessible` flag. In most of these cases the spans use pinned memory, so we're incorrect most of the time. This PR fixes the way these conversions work. Adjusted some APIs to make it a bit harder to avoid passing the `is_device_accessible` flag. Removed a few unused functions in `span.hpp` to keep the file as light as possible (it's included EVERYWHERE). Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Nghia Truong (https://github.com/ttnghia) - Shruti Shivakumar (https://github.com/shrshi) URL: #17079
Commit: 14209c1
Remove the additional host register calls initially intended for performance improvement on Grace Hopper (#17092)
On Grace Hopper, file I/O takes a special path that calls `cudaHostRegister` to circumvent a performance issue. Recent benchmarks show that this workaround is no longer necessary. This PR cleans that up. Authors: - Tianyu Liu (https://github.com/kingcrimsontianyu) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - David Wendt (https://github.com/davidwendt) URL: #17092
Commit: 920a5f6
Limit the number of keys to calculate column sizes and page starts in PQ reader to 1B (#17059)
This PR limits the number of keys used at a time to calculate column `sizes` and `page_start_values` to 1B, averting possible OOM and UB from implicit typecasting of `size_t` iterators to `size_type` iterators in `thrust::reduce_by_key`. Closes #16985. Closes #17086.
Resolved:
- Add tests
- Debug with fingerprinting structs table for a possible bug in PQ writer (nothing seems wrong with the writer, as pyarrow is able to read the written parquet files)
Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Bradley Dice (https://github.com/bdice) - Vukasin Milovanovic (https://github.com/vuule) - Yunsong Wang (https://github.com/PointKernel) URL: #17059
Commit: 00feb82
Migrate NVText Normalizing APIs to Pylibcudf (#17072)
Part of #15162. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17072
Commit: ce93c36
Commits on Oct 18, 2024
Add device aggregators used by shared memory groupby (#17031)
This work is part of splitting the original bulk shared memory groupby PR #16619. It introduces two device-side element aggregators:
- `shmem_element_aggregator`: aggregates data from global memory sources to shared memory targets,
- `gmem_element_aggregator`: aggregates from shared memory sources to global memory targets.
These two aggregators are similar to the `elementwise_aggregator` functionality. Follow-up work is tracked via #17032. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - David Wendt (https://github.com/davidwendt) URL: #17031
Commit: 8ebf0d4
Control whether a file data source memory-maps the file with an environment variable (#17004)
Adds an environment variable, `LIBCUDF_MMAP_ENABLED`, to control whether we memory map the input file in the data source. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Nghia Truong (https://github.com/ttnghia) - Tianyu Liu (https://github.com/kingcrimsontianyu) URL: #17004
Commit: b891722
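A usage sketch; the accepted values ("ON"/"OFF") are an assumption here — check the merged docs for the exact semantics:

```python
import os

# Must be set before libcudf opens the file (value names assumed):
os.environ["LIBCUDF_MMAP_ENABLED"] = "OFF"

import cudf
df = cudf.read_csv("data.csv")  # file data source now skips memory mapping
```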
Fix the GDS read/write segfault/bus error when the cuFile policy is set to GDS or ALWAYS (#17122)
When `LIBCUDF_CUFILE_POLICY` is set to `GDS` or `ALWAYS`, cuDF uses an internal implementation to call the cuFile API and harness the GDS feature. Recent tests with these two settings were unsuccessful due to program crashes. Specifically, for the `PARQUET_READER_NVBENCH`'s `parquet_read_io_compression` benchmark:
- GDS write randomly crashed with segmentation fault (SIGSEGV).
- GDS read randomly crashed with bus error (SIGBUS).
- At the time of crash, the stack frame is randomly corrupted.
The root cause is the use of a dangling reference, which occurs when a variable is captured by reference by nested lambdas. This PR performs a hotfix that turns out to be a 1-char change. Authors: - Tianyu Liu (https://github.com/kingcrimsontianyu) Approvers: - David Wendt (https://github.com/davidwendt) - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: #17122
Commit: 6ca721c
Fix clang-tidy violations for span.hpp and hostdevice_vector.hpp (#17124)
Errors reported here: https://github.com/rapidsai/cudf/actions/runs/11398977412/job/31716929242 Just adding `[[nodiscard]]` to a few member functions. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Shruti Shivakumar (https://github.com/shrshi) URL: #17124
Commit: e1c9a5a
Disable the Parquet reader's wide lists tables GTest by default (#17120)
This PR disables Parquet reader's wide lists table gtest by default as it takes several minutes to complete with memcheck. See the discussion on PR #17059 (this [comment](#17059 (comment))) for more context. Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - David Wendt (https://github.com/davidwendt) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17120
Commit: e242dce
Add custom "fused" groupby aggregation to Dask cuDF (#17009)
The legacy Dask cuDF implementation uses a custom code path for GroupBy aggregations. However, when query-planning is enabled (the default), we use the same algorithm as the pandas backend. This PR ports the custom "fused aggregation" code path over to the dask-expr version of Dask cuDF. Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #17009
Commit: 6ad9074
Extend device_scalar to optionally use pinned bounce buffer (#16947)
Depends on #16945. Added `cudf::detail::device_scalar`, derived from `rmm::device_scalar`. The new class overrides member functions that perform copies between host and device. The new implementation uses a `cudf::detail::host_vector` as a bounce buffer to avoid performing a pageable copy. Replaced `rmm::device_scalar` with `cudf::detail::device_scalar` across libcudf. Authors: - Vukasin Milovanovic (https://github.com/vuule) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Basit Ayantunde (https://github.com/lamarrr) - Vyas Ramasubramani (https://github.com/vyasr) - David Wendt (https://github.com/davidwendt) URL: #16947
Commit: 98eef67
Commits on Oct 19, 2024
Changing developer guide int_64_t to int64_t (#17130)
Fixes #17129 Authors: - Mike Wilson (https://github.com/hyperbolic2346) Approvers: - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) - Bradley Dice (https://github.com/bdice) - Alessandro Bellina (https://github.com/abellina) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #17130
Commit: fdd2b26
Replace old host tree algorithm with new algorithm in JSON reader (#17019)
This PR replaces the old tree algorithm in the JSON reader with the experimental algorithm and removes the experimental namespace. Changes include removal of the old tree algorithm code, removal of the experimental namespace, moving the `scatter_offsets` code, and always calling the new tree algorithm. No functional change is made in this PR; all unit tests should pass with this change. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Shruti Shivakumar (https://github.com/shrshi) - Vukasin Milovanovic (https://github.com/vuule) URL: #17019
Commit: 1ce2526
Split hash-based groupby into multiple smaller files to reduce build time (#17089)
This work is part of splitting the original bulk shared memory groupby PR #16619. This PR splits the hash-based groupby file into multiple translation units and uses explicit template instantiations to help reduce build time. It also includes some minor cleanups without significant functional changes. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #17089
Commit: 074ab74
Commits on Oct 21, 2024
Ignore loud dask warnings about legacy dataframe implementation (#17137)
This PR ignores loud dask warnings that the legacy dask dataframe implementation is soon going to be removed: dask/dask#11437 Note: we only see this error for `DASK_DATAFRAME__QUERY_PLANNING=False` cases; `DASK_DATAFRAME__QUERY_PLANNING=True` cases are passing fine. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Bradley Dice (https://github.com/bdice) - Peter Andreas Entschev (https://github.com/pentschev) - Richard (Rick) Zamora (https://github.com/rjzamora) URL: #17137
Commit: 69ca387
Commits on Oct 22, 2024
Add compile time check to ensure the counting_iterator type in `counting_transform_iterator` fits in `size_type` (#17118)
This PR adds a compile time check to enforce that the `start` argument to `cudf::detail::counting_transform_iterator`, which is used to determine the type of `counting_iterator`, is of a type that fits in `int32_t` (aka `size_type`). The PR also modifies the instances of `counting_transform_iterator` that need to work with `counting_iterator`s of type > `int32_t`, replacing them with manually created counting transform iterators using thrust. More context in this [comment](https://github.com/rapidsai/cudf/pull/17059/files#r1803925659). Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - David Wendt (https://github.com/davidwendt) - Yunsong Wang (https://github.com/PointKernel) - Vukasin Milovanovic (https://github.com/vuule) - Tianyu Liu (https://github.com/kingcrimsontianyu) URL: #17118
Commit: 13de3c1
Unify treatment of Expr and IR nodes in cudf-polars DSL (#17016)
As part of in-progress multi-GPU work, we will likely want to:
1. Introduce additional nodes into the `IR` namespace;
2. Implement rewrite rules for `IR` trees to express needed communication patterns;
3. Write visitors that translate expressions into an appropriate description for whichever multi-GPU approach we end up taking.
It was already straightforward to write generic visitors for `Expr` nodes, since those uniformly have a `.children` property for their dependents. In contrast, the `IR` nodes were more ad-hoc. To solve this, pull out the generic implementation from `Expr` into an abstract `Node` class. Now `Expr` nodes just inherit from this, and `IR` nodes do so similarly. Redoing the `IR` nodes is a little painful because we want to make them hashable, so we have to provide a bunch of custom `get_hashable` implementations (the schema dict, for example, is not hashable). With these generic facilities in place, we can now implement traversal and visitor infrastructure. Specifically, we provide:
- a mechanism for pre-order traversal of an expression DAG, yielding each unique node exactly once. This is useful if one wants to know if an expression contains some particular node;
- a mechanism for writing recursive visitors and then wrapping a caching scheme around the outside. This is useful for rewrites.
Some example usages are shown in tests. Authors: - Lawrence Mitchell (https://github.com/wence-) - Richard (Rick) Zamora (https://github.com/rjzamora) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Matthew Murray (https://github.com/Matt711) - Richard (Rick) Zamora (https://github.com/rjzamora) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17016
Commit: 637e320
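A minimal sketch (not the cudf-polars implementation) of the traversal idea described above: nodes expose a uniform `children` tuple, and a pre-order walk yields each unique node exactly once.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    children: tuple["Node", ...] = ()

def traverse(root: Node):
    """Pre-order traversal yielding each unique node exactly once."""
    seen: set[int] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        yield node
        stack.extend(reversed(node.children))

# Usage: detect whether a DAG contains a node of interest.
shared = Node("literal")
expr = Node("add", (Node("mul", (shared,)), shared))
print(any(n.name == "literal" for n in traverse(expr)))  # True; visited once
```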
Add string.replace_re APIs to pylibcudf (#17023)
Contributes to #15162 Authors: - Matthew Roeschke (https://github.com/mroeschke) - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Murray (https://github.com/Matt711) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17023
Commit: 4fe338c
Migrate NVText Replacing APIs to pylibcudf (#17084)
Part of #15162. Authors: - Matthew Murray (https://github.com/Matt711) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Mark Harris (https://github.com/harrism) URL: #17084
Commit: 14cdf53
Set the default number of threads in KvikIO thread pool to 8 (#17126)
Recent benchmarks have shown that setting the environment variable `KVIKIO_NTHREADS=8` in cuDF usually leads to optimal I/O performance. This PR internally sets the default KvikIO thread pool size to 8. The env `KVIKIO_NTHREADS` will still be honored if users explicitly set it. Fixes #16718 Authors: - Tianyu Liu (https://github.com/kingcrimsontianyu) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) - Vukasin Milovanovic (https://github.com/vuule) URL: #17126
Commit: 27c0c9d
Commits on Oct 23, 2024
JSON tokenizer memory optimizations (#16978)
The full push-down automaton that tokenizes the input JSON string, as well as the bracket-brace FST, over-estimates the total buffer size required for the translated output and indices. This PR splits the `transduce` calls for both FSTs into two invocations. The first invocation estimates the size of the translated buffer and the translated indices, and the second call performs the DFA run. Authors: - Shruti Shivakumar (https://github.com/shrshi) - Karthikeyan (https://github.com/karthikeyann) Approvers: - Karthikeyan (https://github.com/karthikeyann) - Basit Ayantunde (https://github.com/lamarrr) URL: #16978
Commit: cff1296
[Bug] Fix Arrow-FS parquet reader for larger files (#17099)
Follow-up to #16684. There is currently a bug in `dask_cudf.read_parquet(..., filesystem="arrow")` when the files are larger than the `"dataframe.parquet.minimum-partition-size"` config. More specifically, when the files are not aggregated together, the output will be `pd.DataFrame` instead of `cudf.DataFrame`. Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) URL: #17099
Commit: 3126f77
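A usage sketch of the code path being fixed (the glob path and size threshold below are illustrative):

```python
import dask
import dask_cudf

# Files larger than this threshold are not aggregated together; previously
# such partitions came back as pandas objects instead of cuDF ones.
with dask.config.set({"dataframe.parquet.minimum-partition-size": 128 * 1024**2}):
    ddf = dask_cudf.read_parquet("data/*.parquet", filesystem="arrow")

print(type(ddf.head()))  # expected: a cudf-backed DataFrame
```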
Add JNI Support for Multi-line Delimiters and Include Test (#17139)
This PR introduces the necessary changes to the cuDF jni to support the issue described in [NVIDIA/spark-rapids#11554](NVIDIA/spark-rapids#11554). For further information, refer to the details in the [comment](NVIDIA/spark-rapids#11554 (comment)). Issue #15961 adds support for handling multiple line delimiters. This PR extends that functionality to JNI, which was previously missing, and also includes a test to validate the changes. Authors: - Suraj Aralihalli (https://github.com/SurajAralihalli) Approvers: - MithunR (https://github.com/mythrocks) - Robert (Bobby) Evans (https://github.com/revans2) URL: #17139
Commit: f0c6a04
Use async execution policy for true_if (#17146)
Closes #17117 Related to #12086 This PR replaces the synchronous execution policy with an asynchronous one to eliminate unnecessary synchronization. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - David Wendt (https://github.com/davidwendt) - Shruti Shivakumar (https://github.com/shrshi) - Jason Lowe (https://github.com/jlowe) - Nghia Truong (https://github.com/ttnghia) URL: #17146
Commit: 02ee819
Replace direct cudaMemcpyAsync calls with utility functions (limited to `cudf::io`) (#17132)
Issue #15620. Replaced the calls to `cudaMemcpyAsync` with the new `cuda_memcpy`/`cuda_memcpy_async` utility, which optionally avoids using the copy engine. Changes are limited to cuIO to make the PR easier to review (repetitive enough as-is!). Also took the opportunity to use `cudf::detail::host_vector` and its factories to enable wider pinned memory use. Skipped a few instances of `cudaMemcpyAsync`; a few are under `io::comp`, which we don't want to invest in further (if possible). The other `cudaMemcpyAsync` instances are D2D copies, which `cuda_memcpy`/`cuda_memcpy_async` don't support. Perhaps they should, just to make the use ubiquitous. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Paul Mattione (https://github.com/pmattione-nvidia) - Nghia Truong (https://github.com/ttnghia) URL: #17132
Commit: deb9af4
Use managed memory for NDSH benchmarks (#17039)
Fixes #16987. Uses managed memory to generate the parquet data, and writes the parquet data to a host buffer. Replaces use of `parquet_device_buffer` with `cuio_source_sink_pair`. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - David Wendt (https://github.com/davidwendt) - Tianyu Liu (https://github.com/kingcrimsontianyu) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #17039
Commit: e7653a7
Use the full ref name of `rmm.DeviceBuffer` in the sphinx config file (#17150)
This is an improvement PR that uses the full name of `rmm.DeviceBuffer` in the sphinx config file. It's a follow-up to this [comment](#16913 (comment)). Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17150
Commit: 0287972
Commits on Oct 24, 2024
Migrate NVText Stemming APIs to pylibcudf (#17085)
Part of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17085
Commit: d7cdf44
Upgrade to polars 1.11 in cudf-polars (#17154)
Polars 1.11 is out, with slight updates to the IR, so we can correctly raise for dynamic groupbys and see inequality joins. These changes adapt to that and do a first pass at supporting inequality joins (by translating to cross + filter). A followup (#17000) will use libcudf's conditional joins. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Bradley Dice (https://github.com/bdice) - Mike Sarahan (https://github.com/msarahan) URL: #17154
Commit: 3a62314
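As background, cudf-polars plugs into polars' lazy engine, so the upgrade is exercised through an ordinary `collect` call; a minimal sketch, assuming cudf-polars is installed alongside polars:

```python
import polars as pl

q = pl.LazyFrame({"a": [1, 2, 3], "b": [4, 5, 6]}).filter(pl.col("a") > 1)

# Execute the query on the GPU; unsupported plans fall back to the
# default CPU engine.
print(q.collect(engine="gpu"))
```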
Remove unused variable in internal merge_tdigests utility (#17151)
Removes an unused variable that contains a host copy of the group_offsets data. This host variable appears to have been made obsolete by a combination of #16897 and #16780. Found while working on #17149. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - Nghia Truong (https://github.com/ttnghia) URL: #17151
Commit: b75036b
Move `segmented_gather` function from the copying module to the lists module (#17148)
This PR moves `segmented_gather` out of the copying module and into the lists module. It also uses the pylibcudf `segmented_gather` implementation in cudf Python. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #17148
Commit: 7115f20
Commits on Oct 25, 2024
Fix host-to-device copy missing sync in strings/duration convert (#17149)
Fixes a missing stream sync when copying a temporary host vector to device. The host vector could be destroyed before the copy is completed. Updates the code to use the vector factory function `make_device_uvector_sync()` instead of `cudaMemcpyAsync`. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) URL: #17149
Commit: 03777f6
Deprecate current libcudf nvtext minhash functions (#17152)
Deprecates the current nvtext minhash functions, some of which will be replaced in #16756 with a different signature. The others will no longer be used and will be removed in a future release. The existing gtests and benchmarks will be retained for rework in a future release as well. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: #17152
Commit: e98e6b9
Move nvtext ngrams benchmarks to nvbench (#17173)
Moves the `nvtext::generate_ngrams` and `nvtext::generate_character_ngrams` benchmarks from google-bench to nvbench. Target parameters are exposed to help with profiling. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) URL: #17173
Commit: 0bb699e
devcontainer: replace `VAULT_HOST` with `AWS_ROLE_ARN` (#17134)
This PR replaces the `VAULT_HOST` variable with `AWS_ROLE_ARN`. This is required to use the new token service to get AWS credentials. Authors: - Jordan Jacobelli (https://github.com/jjacobelli) Approvers: - Bradley Dice (https://github.com/bdice) - Paul Taylor (https://github.com/trxcllnt) URL: #17134
Commit: 2113bd6
lint: replace `isort` with Ruff's rule I (#16685)
Since #15312 moved formatting from Black to Ruff, it makes sense to also unify import formatting under Ruff, using the built-in `I` rule instead of the additional `isort`. Authors: - Jirka Borovec (https://github.com/Borda) - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) - https://github.com/jakirkham URL: #16685
Commit: 5cba4fb
Add to_dlpack/from_dlpack APIs to pylibcudf (#17055)
Contributes to #15162 Could use some advice on how to type the input of `from_dlpack` and the output of `to_dlpack`, which are PyCapsule objects. EDIT: I notice Cython just types them as object https://github.com/cython/cython/blob/master/Cython/Includes/cpython/pycapsule.pxd. Stylistically, do we want to add `object var_name` or just leave them untyped? Authors: - Matthew Roeschke (https://github.com/mroeschke) - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Murray (https://github.com/Matt711) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17055
Commit: 8bc9f19
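These bindings back cudf's existing public DLPack interchange; a small sketch of that round trip, using CuPy as the capsule producer:

```python
import cupy as cp
import cudf

# Produce a DLPack capsule from a CuPy array...
capsule = cp.arange(5).toDlpack()

# ...and construct a cudf object from it without copying through host.
ser = cudf.from_dlpack(capsule)
print(ser)
```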
Commits on Oct 28, 2024
Use make_device_uvector instead of cudaMemcpyAsync in inplace_bitmask_binop (#17181)
Changes `cudf::detail::inplace_bitmask_binop()` to use `make_device_uvector()` instead of `cudaMemcpyAsync()`. Found while working on #17149. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) URL: #17181
Commit: 8c4d1f2
Add compute_mapping_indices used by shared memory groupby (#17147)
This work is part of splitting the original bulk shared memory groupby PR #16619. This PR introduces the `compute_mapping_indices` API, which is used by the shared memory groupby. libcudf will opt for the shared memory code path when the aggregation request is compatible with shared memory, i.e. there is enough shared memory space and no dictionary aggregation requests. Aggregating with shared memory involves two steps. The first step, introduced in this PR, calculates the offset for each input key within the shared memory aggregation storage, as well as the offset when merging the shared memory results into global memory. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Mark Harris (https://github.com/harrism) - Nghia Truong (https://github.com/ttnghia) URL: #17147
Commit: ef28cdd
Add 2-cpp approvers text to contributing guide [no ci] (#17182)
Adds text to the contributing guide mentioning that 2 cpp-codeowner approvals are required for any C++ changes. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) URL: #17182
Commit: a83e1a3
Remove java reservation (#17189)
This removes a file for a feature that we intended to use but never did. The other parts of that feature were already removed, but this was missed. Authors: - Robert (Bobby) Evans (https://github.com/revans2) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: #17189
Commit: 7b17fbe
build wheels without build isolation (#17088)
Contributes to rapidsai/build-planning#108 Contributes to rapidsai/build-planning#111 Proposes some small packaging/CI changes, matching similar changes being made across RAPIDS. * building `libcudf` wheels with `--no-build-isolation` (for better `sccache` hit rate) * printing `sccache` stats to CI logs * updating to the latest `rapids-dependency-file-generator` (v1.16.0) * always explicitly specifying `cpp` / `python` in calls to `rapids-upload-wheels-to-s3` Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17088
Commit: abecd0b
Added strings AST vs BINARY_OP benchmarks (#17128)
This merge request implements benchmarks to compare the strings AST and BINARY_OPs. It also moves the common string input generator function out to a common benchmarks header, as it is repeated across other benchmarks. Authors: - Basit Ayantunde (https://github.com/lamarrr) Approvers: - David Wendt (https://github.com/davidwendt) - Yunsong Wang (https://github.com/PointKernel) URL: #17128
Commit: 4c04b7c
Remove includes suggested by include-what-you-use (#17170)
This PR cherry-picks out the suggestions from IWYU generated in #17078. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) URL: #17170
Commit: 1ad9fc1
Commits on Oct 29, 2024
Check `num_children() == 0` in `Column.from_column_view` (#17193)
This fixes a bug where `Column.from_column_view` is not verifying the existence of a string column's offsets child column prior to accessing it, resulting in a segmentation fault when passing a `column_view` from `Column.view()` to `Column.from_column_view(...)`. The issue can be reproduced with:
```
import cudf
from cudf.core.column.column import as_column

df = cudf.DataFrame({'a': cudf.Series([[]], dtype=cudf.core.dtypes.ListDtype('string'))})
s = df['a']
col = as_column(s)
col2 = cudf._lib.column.Column.back_and_forth(col)
print(col)
print(col2)
```
where `back_and_forth` is defined as:
```
@staticmethod
def back_and_forth(Column input_column):
    cdef column_view input_column_view = input_column.view()
    return Column.from_column_view(input_column_view, input_column)
```
I don't have the expertise to write the appropriate tests for this without introducing the `back_and_forth` function as an API, which seems undesirable. Authors: - Christopher Harris (https://github.com/cwharris) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17193
Commit: bf5b778
Auto assign PR to author (#16969)
I think most PRs remain unassigned, so this PR auto assigns the PR to the PR author. I think this will help keep our project boards up-to-date. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #16969
Commit: 4b0a634
Fixed unused attribute compilation error for GCC 13 (#17188)
With `decltype(&pclose)` as the destructor type of the `unique_ptr`, gcc makes the signature inherit the attributes of `pclose`. The compiler then ignores this attribute (with a warning) since it doesn't apply in that context, and because we have `-Werror` enabled for ignored attributes, the build fails. This happens on gcc 13.2.0. Authors: - Basit Ayantunde (https://github.com/lamarrr) Approvers: - David Wendt (https://github.com/davidwendt) - Paul Mattione (https://github.com/pmattione-nvidia) - Shruti Shivakumar (https://github.com/shrshi) URL: #17188
Commit: 3775f7b
Support storing `precision` of decimal types in `Schema` class (#17176)
In Spark, the `DecimalType` has a specific number of digits to represent the numbers. However, when creating a data Schema, only the type and name of the column are stored, so we lose that precision information. As such, it would be difficult to reconstruct the original decimal types from cudf's `Schema` instance. This PR adds a `precision` member variable to the `Schema` class in cudf Java, allowing it to store the precision of the original decimal column. Partially contributes to NVIDIA/spark-rapids#11560. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) URL: #17176
Commit: ddfb284
Add in new java API for raw host memory allocation (#17197)
This is the first patch in a series of patches that should make it so that all Java host memory allocations go through the DefaultHostMemoryAllocator unless another allocator is explicitly provided. This is to make it simpler to track/control host memory usage. Authors: - Robert (Bobby) Evans (https://github.com/revans2) Approvers: - Jason Lowe (https://github.com/jlowe) - Alessandro Bellina (https://github.com/abellina) URL: #17197
Commit: 63b773e
Unified binary_ops and ast benchmarks parameter names (#17200)
This merge request unifies the parameter names of the AST and BINARYOP benchmark suites and makes it easier to perform parameter sweeps and compare the outputs of both benchmarks. Authors: - Basit Ayantunde (https://github.com/lamarrr) Approvers: - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) URL: #17200
Commit: 52d7e63
[BUG] Replace `repo_token` with `github_token` in Auto Assign PR GHA (#17203)
The Auto Assign GHA workflow fails with this [error](https://github.com/rapidsai/cudf/actions/runs/11580081781). This PR fixes this error. xref #16969 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17203
Commit: 8d7b0d8
Parquet reader list microkernel (#16538)
This PR refactors fixed-width parquet list reader decoding into its own set of micro-kernels, templatizing the existing fixed-width microkernels. When skipping rows for lists, this will skip ahead the decoding of the definition, repetition, and dictionary rle_streams as well. The list kernel uses 128 threads per block and 71 registers per thread, so I've changed the launch_bounds to enforce a minimum of 8 blocks per SM. This causes a small register spill but the benchmarks are still faster, as seen below.
DEVICE_BUFFER list benchmarks (decompress + decode, not bound by IO):
- run_length 1, cardinality 0, no byte_limit: 24.7% faster
- run_length 32, cardinality 1000, no byte_limit: 18.3% faster
- run_length 1, cardinality 0, 500kb byte_limit: 57% faster
- run_length 32, cardinality 1000, 500kb byte_limit: 53% faster
Compressed list of ints on hard drive: 5.5% faster. Sample real data on hard drive (many columns not lists): 0.5% faster. Authors: - Paul Mattione (https://github.com/pmattione-nvidia) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - https://github.com/nvdbaranec - Nghia Truong (https://github.com/ttnghia) URL: #16538
Commit: eeb4d27
Commits on Oct 30, 2024
Make ai.rapids.cudf.HostMemoryBuffer#copyFromStream public. (#17179)
This is the first PR of [a larger one](NVIDIA/spark-rapids-jni#2532) to introduce a new serialization format. It makes `ai.rapids.cudf.HostMemoryBuffer#copyFromStream` public. For more background, see NVIDIA/spark-rapids-jni#2496 Authors: - Renjie Liu (https://github.com/liurenjie1024) - Jason Lowe (https://github.com/jlowe) Approvers: - Jason Lowe (https://github.com/jlowe) - Alessandro Bellina (https://github.com/abellina) URL: #17179
Commit: 6328ad6
[no ci] Add empty-columns section to the libcudf developer guide (#17183)
Adds a section on `Empty Columns` to the libcudf DEVELOPER_GUIDE. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Basit Ayantunde (https://github.com/lamarrr) - Vukasin Milovanovic (https://github.com/vuule) URL: #17183
Commit: 5ee7d7c
Upgrade nvcomp to 4.1.0.6 (#17201)
This updates cudf to use nvcomp 4.1.0.6. The version is updated in rapids-cmake in rapidsai/rapids-cmake#709. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - James Lamb (https://github.com/jameslamb) - Jake Awe (https://github.com/AyodeAwe) URL: #17201
Commit: 6c2eb4e
Fix bug in recovering invalid lines in JSONL inputs (#17098)
Addresses #16999 Authors: - Shruti Shivakumar (https://github.com/shrshi) - Karthikeyan (https://github.com/karthikeyann) - Nghia Truong (https://github.com/ttnghia) Approvers: - Basit Ayantunde (https://github.com/lamarrr) - Nghia Truong (https://github.com/ttnghia) URL: #17098
Commit: 0b9277b
Add conversion from cudf-polars expressions to libcudf ast for parquet filters (#17141)
Previously, we always applied parquet filters by post-filtering. This negates much of the potential gain from having filters available at read time, namely discarding row groups. To fix this, implement, with the new visitor system of #17016, conversion to pylibcudf expressions. We must distinguish two types of expressions: ones that we can evaluate via `cudf::compute_column`, and the more restricted set of expressions that the parquet reader understands; this is handled by having a state that tracks the usage. The former style will be useful when we implement inequality joins. While here, extend the support in pylibcudf expressions to handle all supported literal types and expose `compute_column` so we can test the correctness of the broader (non-parquet) implementation. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17141
Commit: 7157de7
Fix `to_parquet` append behavior with global metadata file (#17198)
Closes #17177 When appending to a parquet dataset with Dask cuDF, the original metadata must be converted from `pq.FileMetaData` to `bytes` before it can be passed down to `cudf.io.merge_parquet_filemetadata`. Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) URL: #17198
Commit: 5a6d177
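The failing path is the second write below; a minimal sketch, assuming a hypothetical `out/` output directory:

```python
import cudf
import dask_cudf

ddf = dask_cudf.from_cudf(cudf.DataFrame({"a": [1, 2]}), npartitions=1)

# The first write creates the global _metadata file...
ddf.to_parquet("out/", write_metadata_file=True)

# ...and appending must merge the existing pq.FileMetaData into it,
# which is the code path this fix repairs.
ddf.to_parquet("out/", append=True, write_metadata_file=True)
```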
Commits on Oct 31, 2024
Add remaining datetime APIs to pylibcudf (#17143)
Part of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - https://github.com/brandon-b-miller URL: #17143
Commit: 3cf186c
Add compute_shared_memory_aggs used by shared memory groupby (#17162)
This work is part of splitting the original bulk shared memory groupby PR #16619. This PR introduces the `compute_shared_memory_aggs` API, which is utilized by the shared memory groupby. The shared memory groupby process consists of two main steps. The first step was introduced in #17147, and this PR implements the second step, where the actual aggregations are performed based on the offsets from the first step. Each thread block is designed to handle up to 128 unique keys. If this limit is exceeded, there won't be enough space to store temporary aggregation results in shared memory, so a flag is set to indicate that follow-up global memory aggregations are needed to complete the remaining aggregation requests. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - David Wendt (https://github.com/davidwendt) - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: #17162
Commit: 0e294b1
Migrate NVText Tokenizing APIs to pylibcudf (#17100)
Part of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Muhammad Haseeb (https://github.com/mhaseeb123) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17100
Commit: 893d0fd
Fix some documentation rendering for pylibcudf (#17217)
* Fixed/modified some title headers * Fixed/added pylibcudf section docstrings Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Matthew Murray (https://github.com/Matt711) URL: #17217
Commit: 3f66087
Migrate NVText Byte Pair Encoding APIs to pylibcudf (#17101)
Part of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17101
Commit: 0db2463
Migrate hashing operations to `pylibcudf` (#15418)
This PR creates `pylibcudf` hashing APIs and modifies the cuDF Cython to leverage them. cc @vyasr Authors: - https://github.com/brandon-b-miller Approvers: - Yunsong Wang (https://github.com/PointKernel) - Bradley Dice (https://github.com/bdice) - Lawrence Mitchell (https://github.com/wence-) - Vyas Ramasubramani (https://github.com/vyasr) URL: #15418
Commit: a69de57
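One user-facing path that now routes through these bindings is row hashing; a minimal sketch:

```python
import cudf

df = cudf.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Row-wise hash values; murmur3 is the default method.
print(df.hash_values(method="murmur3"))
```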
Migrate NVtext subword tokenizing APIs to pylibcudf (#17096)
Part of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17096
Commit: a0711d0
Remove unsanitized nulls from input strings columns in reduction gtests (#17202)
Input strings columns containing unsanitized nulls may result in undefined behavior. This PR fixes the input data to not include string characters in null rows in gtests for `REDUCTION_TESTS`. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: #17202
Commit: 01cfcff
Add jaccard_index to generated cuDF docs (#17199)
Adds the `jaccard_index` API to the generated docs. Also noticed `minhash` is not present, so it is added here as well. Also removed the duplicate `rsplit` entry from the `.rst` file. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17199
Commit: cafcf6a
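For readers looking the API up from these docs, a short sketch of the strings accessor; the `width` parameter is assumed here to denote the character-shingle length:

```python
import cudf

a = cudf.Series(["the quick brown fox"])
b = cudf.Series(["the slow brown fox"])

# Row-wise Jaccard similarity over 5-character shingles.
print(a.str.jaccard_index(b, width=5))
```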
Move strings::concatenate benchmark to nvbench (#17211)
Moves the `cudf::strings::concatenate` benchmark source from google-bench to nvbench. This also removes the restrictions on the parameters to allow specifying an arbitrary number of rows and string width. Reference #16948 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Mark Harris (https://github.com/harrism) - Nghia Truong (https://github.com/ttnghia) URL: #17211
Commit: e512258
Fix `Schema.Builder` does not propagate precision value to `Builder` instance (#17214)
When calling `Schema.Builder.build()`, the value `topLevelPrecision` should be passed into the constructor of the `Schema` class. However, it was forgotten. Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) URL: #17214
Commit: 9657c9a
Add TokenizeVocabulary to api docs (#17208)
Adds the `TokenizeVocabulary` class to the cuDF API guide. Also removes the `SubwordTokenizer` which is to be deprecated in the future. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17208
Commit: 3db6a0e
Move detail header floating_conversion.hpp to detail subdirectory (#17209)
Moves 'cudf/fixed_point/floating_conversion.hpp' to the `cudf/fixed_point/detail/` subdirectory since it only contains declarations and definitions in the `detail` namespace. It had previously been its own module: https://docs.rapids.ai/api/libcudf/stable/modules.html Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Shruti Shivakumar (https://github.com/shrshi) - Vyas Ramasubramani (https://github.com/vyasr) - Nghia Truong (https://github.com/ttnghia) URL: #17209
Commit: f99ef41
Expose stream-ordering in partitioning API (#17213)
Add stream parameter to public APIs:
```
cudf::partition
cudf::round_robin_partition
```
Added stream gtests for the above two functions and for `hash_partition`. Reference: #13744 Authors: - Shruti Shivakumar (https://github.com/shrshi) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vukasin Milovanovic (https://github.com/vuule) URL: #17213
Commit: f7020f1
Remove `nvtext::load_vocabulary` from pylibcudf (#17220)
This PR follows up #17100 to address the last review here: #17100 (review) Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17220
Commit: 02a50e8
Fix groupby.get_group with length-1 tuple with list-like grouper (#17216)
Closes #17187. Adds similar logic as implemented in pandas: https://github.com/pandas-dev/pandas/blob/main/pandas/core/groupby/groupby.py#L751-L758 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17216
Commit: a83debb
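A minimal sketch of the fixed behavior, mirroring pandas semantics:

```python
import cudf

df = cudf.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30]})
gb = df.groupby(["a"])  # list-like grouper of length 1

# The key is a length-1 tuple; previously this path misbehaved.
print(gb.get_group((1,)))
```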
Fix binop with LHS numpy datetimelike scalar (#17226)
Closes #17087. For binops, cudf tries to convert a 0D numpy array to a numpy scalar via `.dtype.type(value)`, but `.dtype.type` requires other parameters if it's a `numpy.datetime64` or `numpy.timedelta64`. Indexing via `[()]` will perform this conversion correctly. Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17226
Commit: 6055393
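The conversion trick is plain numpy; a minimal sketch:

```python
import numpy as np

arr0d = np.array("2024-01-01", dtype="datetime64[ns]")  # 0-d array

# dtype.type(value) needs extra parameters for datetime64/timedelta64;
# empty-tuple indexing performs the 0-d -> scalar conversion correctly.
scalar = arr0d[()]
assert isinstance(scalar, np.datetime64)
```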
Support for polars 1.12 in cudf-polars (#17227)
No new updates are required; we simply must no longer xfail a test when running with 1.12. Authors: - Lawrence Mitchell (https://github.com/wence-) - Matthew Murray (https://github.com/Matt711) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17227
Commit: 0929115
use rapids-generate-pip-constraints to pin to oldest dependencies in CI (#17131)
Follow-up to #16570 (comment) Proposes using the new `rapids-generate-pip-constraints` tool from `gha-tools` to generate a list of pip constraints pinning to the oldest supported versions of dependencies here. ## Notes for Reviewers ### How I tested this rapidsai/gha-tools#114 (comment) You can also see one of the most recent `wheel-tests-cudf` builds here: * oldest-deps: numpy 1.x ([build link](https://github.com/rapidsai/cudf/actions/runs/11615430314/job/32347576688?pr=17131)) * latest-deps: numpy 2.x ([build link](https://github.com/rapidsai/cudf/actions/runs/11615430314/job/32347577095?pr=17131)) Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17131
Commit: b5b47fe
Commits on Nov 1, 2024
Expose streams in public round APIs (#16925)
Contributes to #13744 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: #16925
Commit: 0a87284
Minor I/O code quality improvements (#17105)
This PR makes small improvements to the I/O code. Specifically:
- Place a type constraint on a template class to allow only rvalue arguments. In addition, replace `std::move` with `std::forward` to make the code more *apparently* consistent with the convention, i.e. use `std::move()` on rvalue references and `std::forward` on forwarding references (Effective Modern C++, Item 25).
- Alleviate (but not completely resolve) an existing cuFile driver close issue by removing the explicit driver close call. See #17121
- Minor typo fix (`struct` → `class`).
Authors: - Tianyu Liu (https://github.com/kingcrimsontianyu) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vukasin Milovanovic (https://github.com/vuule) URL: #17105
Commit: 8219d28
Change default KvikIO parameters in cuDF: set the thread pool size to 4, and compatibility mode to ON (#17185)
This PR adjusts the default KvikIO parameters in light of recent discussions.
- Set KvikIO compatibility mode to ON (previously unspecified). This avoids the overhead of KvikIO validating the cuFile library when most of the time clients are not using cuFile/GDS.
- Set the KvikIO thread pool size to 4 (previously 8); see the reason below.
In addition, this PR updates the documentation on `LIBCUDF_CUFILE_POLICY`.
It is reported that Dask-cuDF on an 8-GPU node with a Lustre file system has a major performance regression when the KvikIO thread pool size is 8.

| KVIKIO_NTHREADS | 8 | 4 | 2 | 1 |
|---|---|---|---|---|
| Dask-cuDF time [s] | 16 | 3.9 | 4.0 | 4.3 |
| cuDF time [s] | 3.4 | 3.4 | 3.8 | 4.9 |

Additional benchmarks on Grace Hopper ([Parquet](https://docs.google.com/spreadsheets/d/1ZxuFTcu67kMVpESHwT0Cr-CAeAP7YmLDrcHxNTt22aU), [CSV](https://docs.google.com/spreadsheets/d/1yFLO-cdxG6jjPwHMtoUbPGMXilRaglush2U6KdrEAvA)) indicate no performance regression from switching the thread pool size from 8 to 4. For the time being, we choose 4 as an empirical sweet spot. Closes #16512 Authors: - Tianyu Liu (https://github.com/kingcrimsontianyu) Approvers: - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) - Vukasin Milovanovic (https://github.com/vuule) URL: #17185
Commit: 6ce9ea4
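These defaults can still be overridden per process through KvikIO's environment variables; a sketch, assuming they are set before cudf is imported:

```python
import os

os.environ["KVIKIO_NTHREADS"] = "8"      # thread pool size
os.environ["KVIKIO_COMPAT_MODE"] = "ON"  # POSIX-only compatibility mode

import cudf  # noqa: E402  # picks up the settings above
```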
Commits on Nov 2, 2024
Add `num_iterations` axis to the multi-threaded Parquet benchmarks (#17231)
Added an axis that controls the number of times each thread reads its input. Running with a higher number of iterations should better show how work from different threads pipelines. The new axis, "num_iterations", is added to all multi-threaded Parquet reader benchmarks. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - Paul Mattione (https://github.com/pmattione-nvidia) URL: #17231
Commit: 3d07509
Commits on Nov 4, 2024
Expose stream-ordering in subword tokenizer API (#17206)
Add stream parameter to public APIs:
```
nvtext::subword_tokenize
nvtext::load_vocabulary_file
```
Added stream gtest. Reference: #13744 Authors: - Shruti Shivakumar (https://github.com/shrshi) - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #17206
Commit: 0d37506
Make HostMemoryBuffer call into the DefaultHostMemoryAllocator (#17204)
This is step 3 in a process of making all Java host memory allocations pluggable under a single allocation API. This is really only used for large memory allocations, which is what matters. This changes the most common Java host memory allocation API to call into the pluggable host memory allocation API. The reason this had to be done in multiple steps is that the Spark plugin code was calling into the common memory allocation API, and memory allocation would end up calling itself recursively.
- Step 1: Create a new API that will not be called recursively (#17197)
- Step 2: Have the Java plugin use that new API instead of the old one to avoid any recursive invocations (NVIDIA/spark-rapids#11671)
- Step 3: Update the common API to use the new backend (this PR)
There are likely to be more steps after this that involve cleaning up and removing APIs that are no longer needed. This is marked as breaking even though it does not break any APIs; it changes the semantics enough that it feels like a breaking change. This is blocked and should not be merged until Step 2 is merged, to avoid breaking the Spark plugin. Authors: - Robert (Bobby) Evans (https://github.com/revans2) Approvers: - Nghia Truong (https://github.com/ttnghia) - Alessandro Bellina (https://github.com/abellina) URL: #17204
Commit: e6f5c0e
Expose mixed and conditional joins in pylibcudf (#17235)
Expose these join types in pylibcudf; they will be useful for implementing inequality joins in cudf-polars. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) URL: #17235
Commit: 076ad58
Use more pylibcudf.io.types enums in cudf._libs (#17237)
If we consider the `pylibcudf.libcudf` namespace to eventually be more "private", this PR replaces that usage, specifically when accessing enums, with their public counterparts Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #17237
Commit: a2001dd
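A short sketch of the public-enum style this change standardizes on; the specific enum members shown are assumptions based on the corresponding libcudf types:

```python
import pylibcudf as plc

# Access IO enums from the public namespace instead of pylibcudf.libcudf.
compression = plc.io.types.CompressionType.SNAPPY
quoting = plc.io.types.QuoteStyle.MINIMAL
```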
Fix discoverability of submodules inside `pd.util` (#17215)
Fixes: #17166 This PR fixes the discoverability of the submodules of attributes and modules inside `pd.util`. Somehow `importlib.import_module("pandas.util").__dict__` doesn't display submodules, only root-level attributes. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17215
Commit: 1d25d14
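The underlying Python behavior is easy to demonstrate: a package's `__dict__` only contains submodules that have already been imported. A minimal sketch:

```python
import importlib
import pkgutil

pkg = importlib.import_module("pandas.util")

# Root-level attributes show up in __dict__...
print("hash_pandas_object" in pkg.__dict__)

# ...but submodules must be discovered separately, e.g. via pkgutil.
print([m.name for m in pkgutil.iter_modules(pkg.__path__)])
```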
Refactor Dask cuDF legacy code (#17205)
The "legacy" DataFrame API is now deprecated (dask/dask#11437). The main purpose of this PR is to start isolating legacy code in Dask cuDF. **Old layout**: ``` dask_cudf/ ├── expr/ │ ├── _collection.py │ ├── _expr.py │ ├── _groupby.py ├── io/ │ ├── tests/ │ ├── ... │ ├── parquet.py │ ├── ... ├── tests/ ├── accessors.py ├── backends.py ├── core.py ├── groupby.py ├── sorting.py ``` **New layout**: ``` dask_cudf/ ├── _expr/ │ ├── accessors.py │ ├── collection.py │ ├── expr.py │ ├── groupby.py ├── _legacy/ │ ├── io/ │ ├── core.py │ ├── groupby.py │ ├── sorting.py ├── io/ │ ├── tests/ │ ├── ... │ ├── parquet.py │ ├── ... ├── tests/ ├── backends.py ├── core.py ``` **Notes** - This PR adds some backward compatibility to the expr-based API that was previously missing: The user can now import collection classes from `dask_cudf.core` (previously led to a "silent" bug when query-planning was enabled). - The user can also import various IO functions from `dask_cudf.io` (and sub-modules like `dask_cudf.io.parquet`), but they will typically get a deprecation warning. - This PR is still technically "breaking" in the sense that the user can no longer import *some* functions/classes from `dask_cudf.io.*`. Also, the `groupby`, `sorting`, and `accessors` modules have simply moved. It *should* be uncommon for down-stream code to import from these modules. It's also worth noting that query-planning was already causing problems for these users if they *were* doing this. Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) URL: #17205
Commit: 45563b3
Commits on Nov 5, 2024
Separate evaluation logic from `IR` objects in cudf-polars (#17175)
Closes #17127.
- This PR implements the proposal in #17127.
- This change technically "breaks" with the existing `IR.evaluate` convention.
Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) - Lawrence Mitchell (https://github.com/wence-) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17175
Commit: 9d5041c
Deprecate single component extraction methods in libcudf (#17221)
This PR deprecates the single component extraction methods (e.g. `cudf::datetime::extract_year`) that are already covered by `cudf::datetime::extract_datetime_component`. xref #17143 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - David Wendt (https://github.com/davidwendt) - Karthikeyan (https://github.com/karthikeyann) URL: #17221
Commit: ac5b3ed
Commits on Nov 6, 2024
Search for kvikio with lowercase (#17243)
The case-sensitive name KvikIO will throw off `find_package` searches, particularly after rapidsai/devcontainers#414 makes the usage consistent in devcontainers.
Commit: adf3269
Disallow cuda-python 12.6.1 and 11.8.4 (#17253)
Due to a bug in cuda-python we must disallow cuda-python 12.6.1 and 11.8.4. This PR disallows those versions. It also silences new cuda-python deprecation warnings so that our test suite passes. See rapidsai/build-planning#116 for more information. --------- Co-authored-by: James Lamb <[email protected]>
Commit: 06b3f83
Commits on Nov 7, 2024
KvikIO shared library (#17239)
Update cudf to use the new KvikIO shared library: rapidsai/kvikio#527
#### Tasks
- [x] Wait for the [KvikIO shared library PR](rapidsai/kvikio#527) to be merged.
- [x] Revert the use of the [KvikIO shared library](rapidsai/kvikio#527) in CI: 2d8eeaf.
Authors: - Mads R. B. Kristensen (https://github.com/madsbk) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - James Lamb (https://github.com/jameslamb) URL: #17239
Commit: 57900de
Put a ceiling on cuda-python (#17264)
Follow-up to #17253 Contributes to rapidsai/build-planning#116 That PR used `!=` requirements to skip a particular version of `cuda-python` that `cudf` and `pylibcudf` were incompatible with. A newer version of `cuda-python` (12.6.2 for CUDA 12, 11.8.5 for CUDA 11) was just released, and it also causes some build issues for RAPIDS libraries: rapidsai/cuvs#445 (comment) To unblock CI across RAPIDS, this proposes **temporarily** switching to ceilings on the `cuda-python` dependency here. Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #17264
Commit: 29484cb
Fix the example in documentation for `get_dremel_data()` (#17242)
Closes #11396. Fixes the example in the documentation of `get_dremel_data()`. Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - David Wendt (https://github.com/davidwendt) - Vukasin Milovanovic (https://github.com/vuule) - Mike Wilson (https://github.com/hyperbolic2346) - MithunR (https://github.com/mythrocks) URL: #17242
Commit: bbd3b43
Move strings/numeric convert benchmarks to nvbench (#17255)
Moves the `cpp/benchmarks/string/convert_numerics.cpp` and `cpp/benchmarks/string/convert_fixed_point.cpp` benchmark implementations from google-bench to nvbench. Authors: - David Wendt (https://github.com/davidwendt) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Shruti Shivakumar (https://github.com/shrshi) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17255
Commit: e29e0ab
Added ast tree to simplify expression lifetime management (#17156)
This merge request follows up on #10744. It attempts to simplify managing expressions by adding a class called an ast tree. The ast tree manages and holds related expressions together. When the tree is destroyed, all the expressions are also destroyed. Ideally we would use a bump allocator for allocating the expressions instead of `std::vector<std::unique_ptr<expression>>`. We'd also ideally use a `cuda::std::inplace_vector` for storing the operands of the `operation` class, but that's in a newer version of CCCL. Authors: - Basit Ayantunde (https://github.com/lamarrr) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Lawrence Mitchell (https://github.com/wence-) - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: #17156
Commit: 4cbc15a
cudf-polars string/numeric casting (#17076)
Depends on #16991 Part of #17060 Implements cross-casting from string <-> numeric types in `cudf-polars`. Authors: - https://github.com/brandon-b-miller - Matthew Murray (https://github.com/Matt711) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Muhammad Haseeb (https://github.com/mhaseeb123) - Matthew Murray (https://github.com/Matt711) URL: #17076
Commit: e4c52dd
Fix extract-datetime deprecation warning in ndsh benchmark (#17254)
Fixes deprecation warning introduced by #17221:
```
[165+3+59=226] Building CXX object benchmarks/CMakeFiles/NDSH_Q09_NVBENCH.dir/ndsh/q09.cpp.o
/cudf/cpp/benchmarks/ndsh/q09.cpp: In function 'void run_ndsh_q9(nvbench::state&, std::unordered_map<std::__cxx11::basic_string<char>, cuio_source_sink_pair>&)':
/cudf/cpp/benchmarks/ndsh/q09.cpp:148:33: warning: 'std::unique_ptr<cudf::column> cudf::datetime::extract_year(const cudf::column_view&, rmm::cuda_stream_view, rmm::device_async_resource_ref)' is deprecated [-Wdeprecated-declarations]
  148 |   auto o_year = cudf::datetime::extract_year(joined_table->column("o_orderdate"));
      |                                 ^~~~~~~~~~~~
In file included from /cudf/cpp/benchmarks/ndsh/q09.cpp:21:
/cudf/cpp/include/cudf/datetime.hpp:70:46: note: declared here
   70 | [[deprecated]] std::unique_ptr<cudf::column> extract_year(
      |                                              ^~~~~~~~~~~~
/cudf/cpp/benchmarks/ndsh/q09.cpp:148:45: warning: 'std::unique_ptr<cudf::column> cudf::datetime::extract_year(const cudf::column_view&, rmm::cuda_stream_view, rmm::device_async_resource_ref)' is deprecated [-Wdeprecated-declarations]
  148 |   auto o_year = cudf::datetime::extract_year(joined_table->column("o_orderdate"));
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /cudf/cpp/benchmarks/ndsh/q09.cpp:21:
/cudf/cpp/include/cudf/datetime.hpp:70:46: note: declared here
   70 | [[deprecated]] std::unique_ptr<cudf::column> extract_year(
      |                                              ^~~~~~~~~~~~
```
Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Karthikeyan (https://github.com/karthikeyann) - Shruti Shivakumar (https://github.com/shrshi) URL: #17254
Commit: 1981445
Refactor gather/scatter benchmarks for strings (#17223)
Combines the `benchmarks/string/copy.cu` and `benchmarks/string/gather.cpp` source files which both had separate gather benchmarks for strings. The result is a new `copy.cpp` that has both gather and scatter benchmarks. Also changes the default parameters to remove the need to restrict the values. Authors: - David Wendt (https://github.com/davidwendt) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) - Basit Ayantunde (https://github.com/lamarrr) URL: #17223
Commit: 67c71e2
AWS S3 IO through KvikIO (#16499)
Implement remote IO read using KvikIO's S3 backend. For now, this is an experimental feature for parquet read only. Enable by defining `CUDF_KVIKIO_REMOTE_IO=ON`. Authors: - Mads R. B. Kristensen (https://github.com/madsbk) - Richard (Rick) Zamora (https://github.com/rjzamora) Approvers: - Paul Mattione (https://github.com/pmattione-nvidia) - Vukasin Milovanovic (https://github.com/vuule) - Shruti Shivakumar (https://github.com/shrshi) - Richard (Rick) Zamora (https://github.com/rjzamora) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #16499
Commit: 08e4853
Add io.text APIs to pylibcudf (#17232)
Contributes to #15162 Authors: - Matthew Roeschke (https://github.com/mroeschke) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #17232
Commit: c209dae
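These bindings sit underneath cudf's public `read_text`; a minimal sketch, where the file name is hypothetical:

```python
import cudf

# Split "records.txt" into one row per newline-delimited record, using
# the libcudf multibyte_split machinery exposed by these bindings.
series = cudf.read_text("records.txt", delimiter="\n")
```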
Add support for `pyarrow-18` (#17256)
This PR raises the maximum allowed `pyarrow` version to 18. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17256
Commit: 2db58d5
Process parquet bools with microkernels (#17157)
This adds support for the bool type to reading parquet microkernels. Both plain (bit-packed) and RLE-encoded bool decode is supported, using separate code paths. This PR also massively reduces boilerplate code, as most of the template info needed is already encoded in the kernel mask. Also the superfluous level_t template parameter on rle_run has been removed. And bools have been added to the parquet benchmarks. Performance: register count drops from 62 -> 56, both plain and RLE-encoded bool decoding are now 46% faster (uncompressed). Reading sample customer data shows no change. NDS tests show no change. Authors: - Paul Mattione (https://github.com/pmattione-nvidia) Approvers: - Yunsong Wang (https://github.com/PointKernel) - https://github.com/nvdbaranec - Vukasin Milovanovic (https://github.com/vuule) URL: #17157
Commit: 5147882
Move strings to date/time types benchmarks to nvbench (#17229)
Moves the `cpp/benchmarks/string/convert_datetime.cpp` and `cpp/benchmarks/string/convert_duration.cpp` benchmark implementations from google-bench to nvbench. Authors: - David Wendt (https://github.com/davidwendt) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Karthikeyan (https://github.com/karthikeyann) URL: #17229
Commit: 64c72fc
Use `pylibcudf.strings.convert.convert_integers.is_integer` in cudf python (#17270)
Part of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #17270
Commit: 773aefc
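The user-visible entry point is the strings accessor; a minimal sketch:

```python
import cudf

s = cudf.Series(["123", "-45", "1.5", "abc"])

# Now backed by the pylibcudf is_integer binding.
print(s.str.isinteger())  # [True, True, False, False]
```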
Use pylibcudf.search APIs in cudf python (#17271)
Part of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #17271
Commit: c73defd
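A small sketch of the pylibcudf usage, assuming `contains` takes haystack and needles columns, as the underlying libcudf API does:

```python
import pyarrow as pa
import pylibcudf as plc

haystack = plc.interop.from_arrow(pa.array([1, 2, 3, 4]))
needles = plc.interop.from_arrow(pa.array([2, 5]))

# Boolean column: True where each needle occurs in the haystack.
result = plc.search.contains(haystack, needles)
```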
Mark column chunks in a PQ reader `pass` as large strings when the cumulative `offsets` exceeds the large strings threshold (#17207)
This PR implements a method to correctly set the large-string property for column chunks in the Chunked Parquet Reader subpass if the cumulative string offsets have exceeded the large strings threshold. Fixes #17158 Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vukasin Milovanovic (https://github.com/vuule) - David Wendt (https://github.com/davidwendt) URL: #17207
Commit: e52ce85
Commits on Nov 8, 2024
Add optional column_order in JSON reader (#17029)
This PR adds an optional column order to enforce column order in the output. This feature is required by Spark `from_json`. An optional `column_order` is added to `schema_element`, and it is validated during reader-option creation. The column order can be specified at the root level and for any struct at any level.
- For the root level, the dtypes should be a `schema_element` with type STRUCT (`schema_element` is added to variant dtypes).
- For nested levels, `column_order` can be specified for any STRUCT type (could be a map of `schema_element`, or a `schema_element`).
If the column order is not specified, the order of columns is the same as the order in which they appear in the JSON file. Closes #17240 (metadata updated) Closes #17091 (will return all-nulls column if not present in input JSON) Closes #17090 (fixed with new schema_element as dtype) Closes #16799 (output columns are created from column_order if present) Authors: - Karthikeyan (https://github.com/karthikeyann) - Nghia Truong (https://github.com/ttnghia) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Nghia Truong (https://github.com/ttnghia) - Vukasin Milovanovic (https://github.com/vuule) URL: #17029
Commit: b3b5ce9
Allow generating large strings in benchmarks (#17224)
Updates the benchmark utility `create_random_utf8_string_column` to support large strings. Replaces the hardcoded `size_type` offsets with the offsetalator and related utilities. Reference #16948 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - MithunR (https://github.com/mythrocks) URL: #17224
Commit: 1777c29
-
Fix data_type ctor call in JSON_TEST (#17273)
Fixes call to `data_type{}` ctor in `json_test.cpp`. The 2-parameter ctor is for fixed-point-types only and will assert in a debug build if used incorrectly: https://github.com/rapidsai/cudf/blob/2db58d58b4a986c2c6fad457f291afb1609fd458/cpp/include/cudf/types.hpp#L277-L280 Partial stack trace from a gdb run ``` #5 0x000077b1530bc71b in __assert_fail_base (fmt=0x77b153271130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x58c3e4baaa98 "id == type_id::DECIMAL32 || id == type_id::DECIMAL64 || id == type_id::DECIMAL128", file=0x58c3e4baaa70 "/cudf/cpp/include/cudf/types.hpp", line=279, function=<optimized out>) at ./assert/assert.c:92 #6 0x000077b1530cde96 in __GI___assert_fail ( assertion=0x58c3e4baaa98 "id == type_id::DECIMAL32 || id == type_id::DECIMAL64 || id == type_id::DECIMAL128", file=0x58c3e4baaa70 "/cudf/cpp/include/cudf/types.hpp", line=279, function=0x58c3e4baaa38 "cudf::data_type::data_type(cudf::type_id, int32_t)") at ./assert/assert.c:101 #7 0x000058c3e48ba594 in cudf::data_type::data_type (this=0x7fffdd3f7530, id=cudf::type_id::STRING, scale=0) at /cudf/cpp/include/cudf/types.hpp:279 #8 0x000058c3e49215d9 in JsonReaderTest_MixedTypesWithSchema_Test::TestBody (this=0x58c3e5ea13a0) at /cudf/cpp/tests/io/json/json_test.cpp:2887 ``` Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Karthikeyan (https://github.com/karthikeyann) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #17273
Commit: 3c5f787
-
Plumb pylibcudf datetime APIs through cudf python (#17275)
Part of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #17275
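A small sketch of the kind of functionality now routed through pylibcudf's datetime module (public cudf API shown; the component extraction underneath is pylibcudf's `extract_datetime_component`):

```python
import cudf

s = cudf.Series(cudf.date_range("2024-01-31", periods=3, freq="D"))
# Each accessor extracts one datetime component on the GPU
print(s.dt.year)       # 2024, 2024, 2024
print(s.dt.day)        # 31, 1, 2
print(s.dt.dayofweek)  # Wed=2, Thu=3, Fri=4
```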
Commit: 18041b5
-
This PR adds [`include-what-you-use`](https://github.com/include-what-you-use/include-what-you-use/) to the CI job running clang-tidy. Like clang-tidy, IWYU runs via CMake integration and only runs on cpp files, not cu files. This should help us shrink binaries and reduce compilation times in cases where headers are being included unnecessarily, and it helps keep our include lists clean. The IWYU suggestions for additions are quite noisy and the team determined this to be unnecessary, so this PR instead post-filters the outputs to only show the removals. The final suggestions are uploaded to a file that is uploaded to the GHA page so that it can be downloaded, inspected, and easily applied locally. Resolves #581. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Mark Harris (https://github.com/harrism) - David Wendt (https://github.com/davidwendt) - Yunsong Wang (https://github.com/PointKernel) - James Lamb (https://github.com/jameslamb) - Karthikeyan (https://github.com/karthikeyann) URL: #17078
Commit: 7b80a44
-
Rewrite Java API `Table.readJSON` to return the output from libcudf `read_json` directly (#17180)
With this PR, `Table.readJSON` will return the output from libcudf `read_json` directly, without needing to reorder the columns to match the input schema or to generate all-nulls columns for the ones in the input schema that do not exist in the JSON data. This is because libcudf `read_json` already does both, so we no longer have to. Depends on: * #17029 Partially contributes to NVIDIA/spark-rapids#11560. Closes #17002 Authors: - Nghia Truong (https://github.com/ttnghia) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) URL: #17180
Commit: e8935b9
-
Implement inequality joins by translation to conditional joins (#17000)
Implement inequality joins by using the newly-exposed conditional join from pylibcudf. - Closes #16926 Authors: - Lawrence Mitchell (https://github.com/wence-) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #17000
Commit: 150d8d8
-
Wrap custom iterator result (#17251)
Fixes: #17165 Fixes: #14481 This PR properly wraps the result of the custom iterator. ```python In [2]: import pandas as pd In [3]: s = pd.Series([10, 1, 2, 3, 4, 5]*1000000) # Without custom_iter: In [4]: %timeit for i in s: True 6.34 s ± 25.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # This PR: In [4]: %timeit for i in s: True 6.16 s ± 17.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # On `branch-24.12`: 1.53 s ± 6.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` I think `custom_iter` has to exist. Here is why: invoking any sort of iteration on GPU objects will raise errors, and thus in the end we fall back to CPU. Instead of trying to move the objects from host to device memory (if the object is on host memory only), we will avoid a CPU-to-GPU transfer. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17251
Commit: 0f1ae26
-
Make constructor of DeviceMemoryBufferView public (#17265)
Make constructor of DeviceMemoryBufferView and ContiguousTable public. Authors: - Renjie Liu (https://github.com/liurenjie1024) Approvers: - Jason Lowe (https://github.com/jlowe) URL: #17265
Commit: 263a7ff
-
remove WheelHelpers.cmake (#17276)
Related to rapidsai/build-planning#33 and rapidsai/build-planning#74 The last use of CMake function `install_aliased_imported_targets()` here was removed in #16946. This proposes removing the file holding its definition. Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #17276
Commit: c46cf76
-
Switch to using `TaskSpec` (#17285)
dask/dask-expr#1159 made upstream changes in `dask-expr` to use `TaskSpec`; this PR updates `dask-cudf` to be compatible with those changes. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Richard (Rick) Zamora (https://github.com/rjzamora) URL: #17285
Commit: 990734f
-
Improve the performance of low cardinality groupby (#16619)
This PR enhances groupby performance for low-cardinality input cases. When applicable, it leverages shared memory for initial aggregation, followed by global memory aggregation to reduce atomic contention and improve performance. Authors: - Yunsong Wang (https://github.com/PointKernel) - Mike Wilson (https://github.com/hyperbolic2346) Approvers: - David Wendt (https://github.com/davidwendt) - Mike Wilson (https://github.com/hyperbolic2346) - Vyas Ramasubramani (https://github.com/vyasr) URL: #16619
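The win applies to workloads whose key column has few distinct values. A sketch of such a workload (the shared-memory staging is chosen internally by libcudf; the Python API is unchanged):

```python
import numpy as np
import cudf

n = 10_000_000
df = cudf.DataFrame({
    "key": np.random.randint(0, 64, n),  # low cardinality: 64 groups
    "val": np.random.rand(n),
})
# Aggregations are first accumulated per thread block in shared memory,
# then combined in global memory, cutting atomic contention.
out = df.groupby("key").agg({"val": ["sum", "count"]})
```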
Commit: 2e0d2d6
-
Add `cudf::calendrical_month_sequence` to pylibcudf (#17277)
Part of #15162. Also adds tests for `pylibcudf.filling`. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #17277
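For orientation, month-stepping sequences are reachable from the public cudf API as well (a sketch; whether `date_range` routes through this exact libcudf primitive is an implementation detail):

```python
import cudf

# Four timestamps stepped by one calendar month each; month stepping
# respects varying month lengths, unlike a fixed timedelta.
print(cudf.date_range(start="2024-01-31", periods=4,
                      freq=cudf.DateOffset(months=1)))
```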
Commit: d295f17
-
Add read_parquet_metadata to pylibcudf (#17245)
Contributes to #15162 Authors: - Matthew Roeschke (https://github.com/mroeschke) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #17245
Commit: fea46cd
-
Follow up making Python tests more deterministic (#17272)
Addressing comments in https://github.com/rapidsai/cudf/pull/17008/files#r1823318321 and https://github.com/rapidsai/cudf/pull/17008/files#r1823318898 Didn't touch the `_fuzz_testing` directory because maybe we don't want that to be deterministic? Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Matthew Murray (https://github.com/Matt711) - GALI PREM SAGAR (https://github.com/galipremsagar) - James Lamb (https://github.com/jameslamb) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17272
Commit: db69c52
Commits on Nov 9, 2024
-
Use numba-cuda<0.0.18 (#17280)
Numba-cuda 0.0.18 (not yet released) contains some changes that might break pynvjitlink patching. To avoid breaking RAPIDS CI while working through that after numba-cuda 0.0.18 is released but before the next pynvjitlink, this PR makes numba-cuda 0.0.17 or earlier a requirement. Authors: - Graham Markall (https://github.com/gmarkall) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - https://github.com/brandon-b-miller - Vyas Ramasubramani (https://github.com/vyasr) URL: #17280
Commit: 0fc5fab
-
Use pylibcudf enums in cudf Python quantile (#17287)
Shouldn't need to use the "private" `pylibcudf.libcudf` types anymore now that the Python side enums are exposed Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17287
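A minimal sketch of the difference (enum and type names below are the public pylibcudf spellings; treat the exact attribute paths as assumptions):

```python
import pylibcudf as plc

# Public Python-side enums, instead of reaching into pylibcudf.libcudf
dtype = plc.DataType(plc.types.TypeId.FLOAT64)
interp = plc.types.Interpolation.LINEAR
print(dtype.id() == plc.types.TypeId.FLOAT64)  # True
```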
Commit: e399e95
-
Use more pylibcudf Python enums in cudf._lib (#17288)
Similar to #17287. Also remove a `plc` naming shadowing Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17288
Commit: 7a499f6
-
Expose delimiter character in JSON reader options to JSON reader APIs (#17266)
Fixes #17261 Removes delimiter symbol group from whitespace normalization FST since it is run post-tokenization. Authors: - Shruti Shivakumar (https://github.com/shrshi) - Nghia Truong (https://github.com/ttnghia) - Karthikeyan (https://github.com/karthikeyann) Approvers: - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) - Karthikeyan (https://github.com/karthikeyann) URL: #17266
Commit: 5cbdcd0
Commits on Nov 12, 2024
-
Fix `Dataframe.__setitem__` slow-downs (#17222)
Fixes: #17140 This PR fixes slow-downs in `DataFrame.__setitem__` by properly passing in CPU objects where needed, instead of passing a GPU object and then failing and performing a GPU -> CPU transfer. The first argument of `DataFrame.__setitem__` can be a column (pd.Index); in our fast path this is converted to `cudf.Index`, so the cudf side fails and the transfer to CPU + slow path executes, which is the primary reason for the slowdown. This PR maintains a dict mapping of such special functions where we shouldn't convert the objects to the fast path. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17222
Commit: 84743c3
-
Expose streams in public quantile APIs (#17257)
Adds stream parameter to ``` cudf::quantile cudf::quantiles cudf::percentile_approx ``` Added stream gtests to verify correct stream forwarding. Reference: #13744 Authors: - Shruti Shivakumar (https://github.com/shrshi) Approvers: - Paul Mattione (https://github.com/pmattione-nvidia) - David Wendt (https://github.com/davidwendt) URL: #17257
Commit: 61031cc
-
cmake option: `CUDF_KVIKIO_REMOTE_IO` (#17291)
Compile flag to enable/disable remote IO through KvikIO: `CUDF_KVIKIO_REMOTE_IO` Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Tianyu Liu (https://github.com/kingcrimsontianyu) - Bradley Dice (https://github.com/bdice) URL: #17291
Commit: bdddab3
-
Replace workaround of JNI build with CUDF_KVIKIO_REMOTE_IO=OFF (#17293)
The JNI build does not require KvikIO; to unblock the build, use `CUDF_KVIKIO_REMOTE_IO=OFF` in the C++ build phase. This should be merged after #17291. Authors: - Peixin (https://github.com/pxLi) Approvers: - Nghia Truong (https://github.com/ttnghia) URL: #17293
Commit: 202c231
-
[FEA] Report all unsupported operations for a query in cudf.polars (#16960)
Closes #16690. The purpose of this PR is to list all of the unique operations that are unsupported by `cudf.polars` when running a query. 1. Question: How to traverse the tree to report the error nodes? Should this be done upstream in Polars? 2. Instead of traversing the query afterwards, we should probably catch each unsupported feature as we translate the IR. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Lawrence Mitchell (https://github.com/wence-) URL: #16960
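A hedged sketch of how the collected report surfaces to a user (`raise_on_fail=True` turns silent CPU fallback into an error; the specific unsupported expression and the concrete error type are assumptions for illustration):

```python
import polars as pl

q = pl.LazyFrame({"a": [1.0, 2.0, 3.0]}).select(
    pl.col("a").rolling_mean(2)  # assumed unsupported, for illustration
)
try:
    q.collect(engine=pl.GPUEngine(raise_on_fail=True))
except Exception as err:
    # With this PR the error lists every unsupported operation found
    # during IR translation, not just the first one encountered.
    print(err)
```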
Commit: 043bcbd
-
Add new nvtext minhash_permuted API (#16756)
Introduce new nvtext minhash API that takes a single seed for hashing and 2 parameter vectors to calculate the minhash results from the seed hash: ``` std::unique_ptr<cudf::column> minhash_permuted( cudf::strings_column_view const& input, uint32_t seed, cudf::device_span<uint32_t const> parameter_a, cudf::device_span<uint32_t const> parameter_b, cudf::size_type width, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr); ``` The `seed` is used to hash the `input` using rolling set of substrings `width` characters wide. The hashes are then combined with the values in `parameter_a` and `parameter_b` to calculate a set of 32-bit (or 64-bit) values for each row. Only the minimum value is returned per element of `a` and `b` when combined with all the hashes for a row. Each output row is a set of M values where `M = parameter_a.size() = parameter_b.size()` This implementation is significantly faster than the current minhash which computes hashes for multiple seeds. Included in this PR is also the `minhash64_permuted()` API that is identical but uses 64-bit values for the seed and the parameter values. Also included are new tests and a benchmark as well as the pylibcudf and cudf interfaces. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Matthew Murray (https://github.com/Matt711) - Lawrence Mitchell (https://github.com/wence-) - Karthikeyan (https://github.com/karthikeyann) - Yunsong Wang (https://github.com/PointKernel) URL: #16756
Commit: ccfc95a
-
Add type stubs for pylibcudf (#17258)
Having looked at a bunch of the automation options, I just did it by hand. A followup will add some automation to add docstrings (so we can see those via LSP integration in editors) and do some simple validation. - Closes #15190 Authors: - Lawrence Mitchell (https://github.com/wence-) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Matthew Murray (https://github.com/Matt711) URL: #17258
Commit: 7682edb
-
Add cudf::strings::contains_multiple (#16900)
Add new `cudf::strings::contains_multiple` API to search for multiple targets within a strings column. The output is a table where the number of columns equals the number of targets, and each row holds booleans indicating whether the corresponding target was found in that row. This PR is to help in collaboration with #16641 Authors: - David Wendt (https://github.com/davidwendt) - GALI PREM SAGAR (https://github.com/galipremsagar) - Chong Gao (https://github.com/res-life) - Bradley Dice (https://github.com/bdice) Approvers: - Chong Gao (https://github.com/res-life) - Yunsong Wang (https://github.com/PointKernel) - MithunR (https://github.com/mythrocks) - Tianyu Liu (https://github.com/kingcrimsontianyu) - Bradley Dice (https://github.com/bdice) URL: #16900
Commit: 796de4b
-
enforce wheel size limits, README formatting in CI (#17284)
Contributes to rapidsai/build-planning#110 Proposes adding 2 types of validation on wheels in CI, to ensure we continue to produce wheels that are suitable for PyPI. * checks on wheel size (compressed), - *to be sure they're under PyPI limits* - *and to prompt discussion on PRs that significantly increase wheel sizes* * checks on README formatting - *to ensure they'll render properly as the PyPI project homepages* - *e.g. like how https://github.com/scikit-learn/scikit-learn/blob/main/README.rst becomes https://pypi.org/project/scikit-learn/* ## Notes for Reviewers ### How I tested this Initially set the size threshold for `libcudf` to a value that I knew it'd violate (75MB compressed, when the wheels are 400+ MB compressed). Saw CI fail as expected, and print a summary with the expected contents. ```text checking 'final_dist/libcudf_cu11-24.12.0a333-py3-none-manylinux_2_28_aarch64.whl' ----- package inspection summary ----- file size * compressed size: 0.4G * uncompressed size: 0.6G * compression space saving: 34.6% contents * directories: 164 * files: 1974 (2 compiled) size by extension * .so - 0.6G (97.0%) * .h - 6.7M (1.0%) * no-extension - 4.8M (0.7%) * .cuh - 3.8M (0.6%) * .hpp - 2.2M (0.3%) * .a - 1.1M (0.2%) * .inl - 0.8M (0.1%) * .cmake - 0.1M (0.0%) * .md - 8.3K (0.0%) * .py - 4.0K (0.0%) * .pc - 0.2K (0.0%) * .txt - 34.0B (0.0%) largest files * (0.6G) libcudf/lib64/libcudf.so * (3.3M) libcudf/bin/flatc * (1.0M) libcudf/lib64/libflatbuffers.a * (0.5M) libcudf/include/libcudf/rapids/libcudacxx/cuda/std/__atomic/functions/cuda_ptx_generated.h * (0.2M) libcudf_cu11-24.12.0a333.dist-info/RECORD ------------ check results ----------- 1. [distro-too-large-compressed] Compressed size 0.4G is larger than the allowed size (75.0M). errors found while checking: 1 ``` ([build link](https://github.com/rapidsai/cudf/actions/runs/11748370606/job/32732391718?pr=17284#step:13:3062)) Updated that threshold in `python/libcudf/pyproject.toml`, and saw the build succeed (but the summary still printed). # Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17284
Commit: 1f9ad2f
-
Polars 1.13 is out, so add support for that. I needed to change some of the logic in the callback raising after @Matt711's changes; I am not sure why tests were passing previously. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Matthew Murray (https://github.com/Matt711) - GALI PREM SAGAR (https://github.com/galipremsagar) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17299
Commit: bbaa1ab
-
Always prefer `device_read`s and `device_write`s when kvikIO is enabled (#17260)
Issue #17259 Avoid checking the `_gds_read_preferred_threshold` threshold when deciding whether `device_read`/`device_write` is preferred to host IO + copy. The reasons are twofold: 1. KvikIO already has an internal threshold for GDS use, so we don't need to check on our end as well. 2. Without actual GDS use, KvikIO uses a pinned bounce buffer to efficiently copy to/from the device. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Tianyu Liu (https://github.com/kingcrimsontianyu) - Basit Ayantunde (https://github.com/lamarrr) URL: #17260
Commit: 487f97c
Commits on Nov 13, 2024
-
Raise errors on specific types of fallback in `cudf.pandas` (#17268)
Closes #14975 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17268
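A sketch of opting in (the environment variable name `CUDF_PANDAS_FAIL_ON_FALLBACK` is my reading of this change and should be treated as an assumption):

```python
import os
# Must be set before cudf.pandas is loaded; assumed knob from this PR
os.environ["CUDF_PANDAS_FAIL_ON_FALLBACK"] = "True"

import cudf.pandas
cudf.pandas.install()
import pandas as pd

s = pd.Series([1, 2, 3])
# Operations that would silently fall back to CPU pandas now raise,
# making fallback visible in performance-sensitive code paths.
```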
Commit: 76a5e32
-
Expose stream-ordering in public transpose API (#17294)
Adds stream parameter to `cudf::transpose`. Verifies correct stream forwarding with stream gtests. Reference: #13744 Authors: - Shruti Shivakumar (https://github.com/shrshi) Approvers: - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) URL: #17294
Commit: f5c0e5c
-
Exclude nanoarrow and flatbuffers from installation (#17308)
This change helps shrink RAPIDS wheels. It should not affect Spark builds since those use the build directory of cudf and statically link in those components to its final binary. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Mark Harris (https://github.com/harrism) - Bradley Dice (https://github.com/bdice) URL: #17308
Commit: 918266a
-
Add `catboost` to the third-party integration tests (#17267)
Closes #15397 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Matthew Roeschke (https://github.com/mroeschke) URL: #17267
Commit: 1b045dd
-
Fixed lifetime issue in ast transform tests (#17292)
Authors: - Basit Ayantunde (https://github.com/lamarrr) Approvers: - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) - David Wendt (https://github.com/davidwendt) URL: #17292
Commit: c4a4a91
-
Replace FindcuFile with upstream FindCUDAToolkit support (#17298)
CMake's `FindCUDAToolkit` has supported cuFile since 3.25. Use this support and remove the custom `FindcuFile` module. Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) - Yunsong Wang (https://github.com/PointKernel) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #17298
Commit: 6acd33d
-
Fix synchronization bug in bool parquet mukernels (#17302)
This fixes a synchronization bug in the parquet microkernels for plain-decoding bools. This closes [several](NVIDIA/spark-rapids#11715) timing [issues](NVIDIA/spark-rapids#11716) found during testing of spark-rapids. Authors: - Paul Mattione (https://github.com/pmattione-nvidia) Approvers: - Bradley Dice (https://github.com/bdice) - Vukasin Milovanovic (https://github.com/vuule) URL: #17302
Commit: 5e40691
-
Update CI jobs to include Polars in nightlies and improve IWYU (#17306)
This PR adds Polars tests to our nightly runs now that [we no longer only fail conditional on certain files changing in PRs](#17299). This PR also updates the IWYU jobs to use [the version released three days ago, which supports clang 19 like we need](https://github.com/include-what-you-use/include-what-you-use/releases/tag/0.23). It also fixes a couple of errors in the CMake for how we were setting compile flags for IWYU. Closes #16383 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17306
Commit: 8294953
-
Move strings filter benchmarks to nvbench (#17269)
Move `cpp/benchmarks/string/filter.cpp` from google-bench to nvbench Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #17269
Commit: 13c7115
-
Clean up misc, unneeded pylibcudf.libcudf in cudf._lib (#17309)
* Removed `ctypedef const scalar constscalar` usage * Use `dtype_to_pylibcudf_type` where appropriate * Use pylibcudf enums instead of `pylibcudf.libcudf` types Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17309
Commit: 353d2de
Commits on Nov 14, 2024
-
Add documentation for low memory readers (#17314)
Closes #16443 Authors: - Brian Tepera (https://github.com/btepera) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17314
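A sketch of turning the documented low-memory paths on from Python (the option names are assumptions based on cudf's option naming; verify against the new docs):

```python
import cudf

# Chunked, lower-peak-memory parquet read
with cudf.option_context("io.parquet.low_memory", True):
    df = cudf.read_parquet("data.parquet")  # hypothetical file

# Same idea for JSON Lines
with cudf.option_context("io.json.low_memory", True):
    jdf = cudf.read_json("data.jsonl", lines=True)  # hypothetical file
```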
Commit: 9da8eb2
-
Polars: DataFrame Serialization (#17062)
Use pylibcudf’s pack and unpack to implement Dask compatible serialization. Authors: - Mads R. B. Kristensen (https://github.com/madsbk) - Lawrence Mitchell (https://github.com/wence-) - Richard (Rick) Zamora (https://github.com/rjzamora) - Vyas Ramasubramani (https://github.com/vyasr) - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Matthew Murray (https://github.com/Matt711) - Lawrence Mitchell (https://github.com/wence-) - GALI PREM SAGAR (https://github.com/galipremsagar) - Bradley Dice (https://github.com/bdice) URL: #17062
Commit: 5d5b35d
-
Java JNI for Multiple contains (#17281)
This is the Java JNI interface for the [multiple contains PR](#16900) Authors: - Chong Gao (https://github.com/res-life) Approvers: - Alessandro Bellina (https://github.com/abellina) - Robert (Bobby) Evans (https://github.com/revans2) URL: #17281
Commit: 4cd40ee
-
Resolves #3155. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - Robert Maynard (https://github.com/robertmaynard) URL: #17312
Commit: d93c3fc
-
Fix reading of single-row unterminated CSV files (#17305)
Fixed the logic in the CSV reader that led to empty output instead of producing a table with a single column and one row. Added tests to make sure the new logic does not cause regressions. Also did some small clean up around the fix. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) URL: #17305
Commit: a7194f6
-
prefer wheel-provided libcudf.so in load_library(), use RTLD_LOCAL (#17316)
Contributes to rapidsai/build-planning#118 Modifies `libcudf.load_library()` in the following ways: * prefer wheel-provided `libcudf.so` to system installation * expose environment variable `RAPIDS_LIBCUDF_PREFER_SYSTEM_LIBRARY` for switching that preference * load `libcudf.so` with `RTLD_LOCAL`, to prevent adding symbols to the global namespace ([dlopen docs](https://linux.die.net/man/3/dlopen)) ## Notes for Reviewers ### How I tested this Locally (x86_64, CUDA 12, Python 3.12), built `libcudf`, `pylibcudf`, and `cudf` wheels from this branch, then `libcuspatial` and `cuspatial` from the corresponding cuspatial branch. Ran `cuspatial`'s unit tests, and tried setting the environment variable and inspecting `ld`'s logs to confirm that the environment variable changed the loading and search behavior. e.g. ```shell # clear ld cache to avoid cheating rm -f /etc/ld.so.cache ldconfig # try using an env variable to say "prefer the system-installed version" LD_DEBUG=libs \ LD_DEBUG_OUTPUT=/tmp/out.txt \ RAPIDS_LIBCUDF_PREFER_SYSTEM_LIBRARY=true \ python -c "import cuspatial; print(cuspatial.__version__)" cat /tmp/out.txt.* > prefer-system.txt # (then manually looked through those logs to confirm it searched places like /usr/lib64 and /lib64) ``` Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) URL: #17316
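A minimal sketch of the resulting behavior (both names below come straight from the description above):

```python
# RAPIDS_LIBCUDF_PREFER_SYSTEM_LIBRARY=true -> prefer a system libcudf.so;
# otherwise the wheel-provided library wins.
import libcudf

# Loads libcudf.so with RTLD_LOCAL, so its symbols are not added to the
# process-global namespace.
libcudf.load_library()
```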
Commit: 66c5a2d
Commits on Nov 15, 2024
-
Do not exclude nanoarrow and flatbuffers from installation if statically linked (#17322)
Had an issue crop up in spark-rapids-jni where we statically link arrow and the build started to fail due to change #17308. Authors: - Mike Wilson (https://github.com/hyperbolic2346) Approvers: - Gera Shegalov (https://github.com/gerashegalov) - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #17322
Commit: 927ae9c
-
Update java datetime APIs to match CUDF. (#17329)
This updates the java APIs related to datetime processing so that they match the CUDF APIs. Authors: - Robert (Bobby) Evans (https://github.com/revans2) Approvers: - MithunR (https://github.com/mythrocks) - Jason Lowe (https://github.com/jlowe) - Gera Shegalov (https://github.com/gerashegalov) URL: #17329
Commit: 8a9131a
-
Remove cudf._lib.avro in favor of inlining pylibcudf (#17319)
Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17319
Commit: d67d017
-
Fix various issues with `replace` API and add support in `datetime` and `timedelta` columns (#17331)
This PR: - [x] Adds support for `find_and_replace` in `DateTimeColumn` and `TimeDeltaColumn`, such that when `.replace` is called on a series or dataframe with these columns, we don't error and replace the values correctly. - [x] Fixed various type combination edge cases that were previously incorrectly handled and updated stale tests associated with them. - [x] Added a small parquet file in pytests that has multiple rows that uncovered these bugs. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17331
Commit: d475dca
-
Implement `cudf-polars` chunked parquet reading (#16944)
This PR provides access to the libcudf chunked parquet reader through the `cudf-polars` gpu engine, inspired by the cuDF python implementation. Closes #16818 Authors: - https://github.com/brandon-b-miller - GALI PREM SAGAR (https://github.com/galipremsagar) - Lawrence Mitchell (https://github.com/wence-) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Lawrence Mitchell (https://github.com/wence-) URL: #16944
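A sketch of exercising this from polars (the chunked reader is internal to the GPU scan; no new user-facing knob is involved):

```python
import polars as pl

q = (
    pl.scan_parquet("large_dataset.parquet")  # hypothetical file
    .filter(pl.col("x") > 0)
    .select(pl.col("x").sum())
)
# The GPU engine can now consume the parquet scan in chunks rather than
# materializing the entire file at once.
result = q.collect(engine="gpu")
```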
Commit: aa8c0c4
-
Remove another reference to `FindcuFile` (#17315)
The reference in JNI was missed in #17298. Replace it with `FindCUDAToolkit`. Also backport `FindCUDAToolkit` from CMake 3.31 to get https://gitlab.kitware.com/cmake/cmake/-/commit/b38a8e77cb3c8401b3022a68f07a4fd77b290524. Also add an option to statically link `cuFile`. Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) - Vyas Ramasubramani (https://github.com/vyasr) - Gera Shegalov (https://github.com/gerashegalov) URL: #17315
Commit: 81cd4a0
-
add telemetry setup to test (#16924)
This is a prototype implementation of rapidsai/build-infra#139 The work that this builds on: * rapidsai/gha-tools#118, which adds a shell wrapper that automatically creates spans for the commands that it wraps. It also uses the `opentelemetry-instrument` command to set up monkeypatching for supported Python libraries, if the command is python-based * https://github.com/rapidsai/shared-workflows/tree/add-telemetry, which installs the gha-tools work from above and sets necessary environment variables. This is only done for the conda-cpp-build.yaml shared workflow at the time of submitting this PR. The goal of this PR is to observe telemetry data sent from a GitHub Actions build triggered by this PR as a proof of concept. Once it all works, the remaining work is: * merge rapidsai/gha-tools#118 * Move the opentelemetry-related install stuff in https://github.com/rapidsai/shared-workflows/compare/add-telemetry?expand=1#diff-ca6188672785b5d214aaac2bf77ce0528a48481b2a16b35aeb78ea877b2567bcR118-R125 into https://github.com/rapidsai/ci-imgs, and rebuild ci-imgs * expand coverage to other shared workflows * Incorporate the changes from this PR to other jobs and to other repos Authors: - Mike Sarahan (https://github.com/msarahan) Approvers: - Bradley Dice (https://github.com/bdice) URL: #16924
Commit: 8664fad
-
Update cmake to 3.28.6 in JNI Dockerfile (#17342)
Updates cmake to 3.28.6 in the JNI Dockerfile used to build the cudf jar. This helps avoid a bug in older cmake where FindCUDAToolkit can fail to find cufile libraries. Authors: - Jason Lowe (https://github.com/jlowe) Approvers: - Nghia Truong (https://github.com/ttnghia) - Gera Shegalov (https://github.com/gerashegalov) URL: #17342
Commit: e683647
Commits on Nov 16, 2024
-
Use pylibcudf contiguous split APIs in cudf python (#17246)
Part of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #17246
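A rough sketch of the APIs being inlined (module and function names follow libcudf's contiguous-split vocabulary; treat the exact paths as assumptions):

```python
import pyarrow as pa
import pylibcudf as plc

tbl = plc.interop.from_arrow(pa.table({"a": [1, 2, 3]}))
# pack() serializes the table into contiguous metadata + data buffers
packed = plc.contiguous_split.pack(tbl)
# unpack() reconstructs a table view over the packed buffers
roundtrip = plc.contiguous_split.unpack(packed)
```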
Commit: 9cc9071
Commits on Nov 18, 2024
-
Move strings translate benchmarks to nvbench (#17325)
Moves `cpp/benchmarks/string/translate.cpp` implementation from google-bench to nvbench. This is benchmark for the `cudf::strings::translate` API. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Nghia Truong (https://github.com/ttnghia) URL: #17325
Commit: e4de8e4
-
Move cudf._lib.unary to cudf.core._internals (#17318)
Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17318
Commit: aeb6a30
-
Reading multi-source compressed JSONL files (#17161)
Fixes #17068 Fixes #12299 This PR introduces a new datasource for compressed inputs which enables batching and byte range reading of multi-source JSONL files using the reallocate-and-retry policy. Moreover, instead of using a 4:1 compression ratio heuristic, the device buffer size is estimated accurately for GZIP, ZIP, and SNAPPY compression types. For remaining types, the files are first decompressed then batched. ~~TODO: Reuse existing JSON tests but with an additional compression parameter to verify correctness.~~ ~~Handled by #17219, which implements the compressed JSON writer required for the above test.~~ Multi-source compressed input tests added! Authors: - Shruti Shivakumar (https://github.com/shrshi) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Karthikeyan (https://github.com/karthikeyann) URL: #17161
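The user-visible capability, sketched (file names are hypothetical; the point is several compressed JSONL sources in a single read):

```python
import cudf

# Multiple gzip-compressed JSON Lines files read as one table; batching
# and decompression buffer sizing happen inside the reader.
df = cudf.read_json(
    ["part-000.jsonl.gz", "part-001.jsonl.gz"],  # hypothetical files
    lines=True,
    compression="gzip",
)
```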
Commit: 03ac845
-
Test the full matrix for polars and dask wheels on nightlies (#17320)
This PR ensures that we have nightly coverage of more of the CUDA/Python/arch versions that we claim to support for dask-cudf and cudf-polars wheels. In addition, this PR ensures that we do not attempt to run the dbgen executable in the Polars repository on systems with too old of a glibc to support running them. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17320
Commit: d514517
-
Fix reading Parquet string cols when `nrows` and `input_pass_limit` > 0 (#17321)
This PR fixes reading string columns in Parquet using chunked parquet reader when `nrows` and `input_pass_limit` are > 0. Closes #17311 Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Ed Seidl (https://github.com/etseidl) - Lawrence Mitchell (https://github.com/wence-) - Bradley Dice (https://github.com/bdice) - https://github.com/nvdbaranec - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17321
Commit: 43f2f68
-
Remove cudf._lib.hash in favor of inlining pylibcudf (#17345)
Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Lawrence Mitchell (https://github.com/wence-) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17345
Commit: 18b40dc
-
Remove cudf._lib.concat in favor of inlining pylibcudf (#17344)
Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17344
Commit: ba21673
-
Remove cudf._lib.quantiles in favor of inlining pylibcudf (#17347)
Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17347
Commit: 02c35bf
-
Remove cudf._lib.labeling in favor of inlining pylibcudf (#17346)
Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17346
Commit: 302e625
Commits on Nov 19, 2024
-
1.13 was yanked for some reason, but 1.14 doesn't bring anything new and difficult. Authors: - Lawrence Mitchell (https://github.com/wence-) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - https://github.com/brandon-b-miller - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17355
Commit: 5f9a97f
-
Writing compressed output using JSON writer (#17323)
Depends on #17161 for implementations of compression and decompression functions (`io/comp/comp.cu`, `io/comp/comp.hpp`, `io/comp/io_uncomp.hpp` and `io/comp/uncomp.cpp`) Adds support for writing GZIP- and SNAPPY-compressed JSON to the JSON writer. Verifies correctness using a parameterized test in `tests/io/json/json_writer.cpp` Authors: - Shruti Shivakumar (https://github.com/shrshi) - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Karthikeyan (https://github.com/karthikeyann) - Vukasin Milovanovic (https://github.com/vuule) URL: #17323
Commit: 384abae
-
fix library-loading issues in editable installs (#17338)
Contributes to rapidsai/build-planning#118 The pattern introduced in #17316 breaks editable installs in devcontainers. In that type of build, `libcudf.so` is built outside of the wheel but **not installed**, so it can't be found by `ld`. Extension modules in `cudf` and `pylibcudf` are able to find it via RPATHs instead. This proposes: * try-catching the entire library-loading attempt, to silently do nothing in cases like that * ~adding imports of the `cudf` and `pylibcudf` libraries in the `devcontainers` CI job, as a smoke test to catch issues like this in the future~ *(edit: removed those, [`devcontainer` builds run on CPU nodes](https://github.com/rapidsai/shared-workflows/blob/4e84062f333ce5649bc65029d3979569e2d0a045/.github/workflows/build-in-devcontainer.yaml#L19))* ## Notes for Reviewers ### How I tested this Tested this approach on rapidsai/kvikio#553 # Authors: - James Lamb (https://github.com/jameslamb) - Matthew Murray (https://github.com/Matt711) Approvers: - Bradley Dice (https://github.com/bdice) - Matthew Murray (https://github.com/Matt711) URL: #17338
Commit: 9c5cd81
-
Fix integer overflow in compiled binaryop (#17354)
For large columns, the computed stride might end up overflowing size_type. To fix this, use the grid_1d helper. See also #10368. - Closes #17353 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) - Tianyu Liu (https://github.com/kingcrimsontianyu) - Muhammad Haseeb (https://github.com/mhaseeb123) - Nghia Truong (https://github.com/ttnghia) URL: #17354
Commit: c7bfa77
-
Move strings replace benchmarks to nvbench (#17301)
Move `cpp/benchmarks/string/replace.cpp` implementation from google-bench to nvbench This covers strings replace APIs: - `cudf::strings::replace` scalar version - `cudf::strings::replace_multiple` column version - `cudf::strings::replace_slice` Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Shruti Shivakumar (https://github.com/shrshi) URL: #17301
Commit: 03c055f
-
Optimize distinct inner join to use set `find` instead of `retrieve` (#17278)
This PR introduces a minor optimization for distinct inner joins by using the `find` results to selectively copy matches to the output. This approach eliminates the need for the costly `retrieve` operation, which relies on expensive atomic operations. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Karthikeyan (https://github.com/karthikeyann) URL: #17278
Commit: 56061bd
Commits on Nov 20, 2024
-
Add compute_column_expression to pylibcudf for transform.compute_column (#17279)
Follow up to #16760 `transform.compute_column` (backing `.eval`) requires an `Expression` object created by a private routine in cudf Python. Since this routine will be needed for any user of the public `transform.compute_column`, moving it to pylibcudf. Authors: - Matthew Roeschke (https://github.com/mroeschke) - Lawrence Mitchell (https://github.com/wence-) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17279
Commit: 7158ee0
-
Bug fix: restrict lines=True to JSON format in Kafka read_gdf method (#17333)
This pull request modifies the read_gdf method in kafka.py to pass the lines parameter only when the message_format is "json". This prevents lines from being passed to other formats (e.g., CSV, Avro, ORC, Parquet), which do not support this parameter. Authors: - Hirota Akio (https://github.com/a-hirota) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17333
Commit: 05365af
-
Adapt to KvikIO API change in the compatibility mode (#17377)
This PR adapts cuDF to a breaking API change in KvikIO (rapidsai/kvikio#547) introduced recently, which adds the `AUTO` compatibility mode to file I/O. This PR causes no behavioral changes in cuDF: If the environment variable `KVIKIO_COMPAT_MODE` is left unset, cuDF by default still enables the compatibility mode in KvikIO. This is the same with the previous behavior (#17185). Authors: - Tianyu Liu (https://github.com/kingcrimsontianyu) Approvers: - Vukasin Milovanovic (https://github.com/vuule) URL: #17377
Commit: 6f83b58
-
Benchmarking JSON reader for compressed inputs (#17219)
Depends on #17161 for implementations of compression and decompression functions (`io/comp/comp.cu`, `io/comp/comp.hpp`, `io/comp/io_uncomp.hpp` and `io/comp/uncomp.cpp`). Depends on #17323 for compressed JSON writer implementation. Adds benchmark to measure performance of the JSON reader for compressed inputs. Authors: - Shruti Shivakumar (https://github.com/shrshi) - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - MithunR (https://github.com/mythrocks) - Vukasin Milovanovic (https://github.com/vuule) - Karthikeyan (https://github.com/karthikeyann) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #17219
Commit: fc08fe8
-
Deselect failing polars tests (#17362)
Deselect `test_join_4_columns_with_validity` which is failing in nightly CI tests and is reproducible in some systems (xref pola-rs/polars#19870), but apparently not all. Deselect `test_read_web_file` as well that fails on rockylinux8 due to SSL CA issues. Authors: - Peter Andreas Entschev (https://github.com/pentschev) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #17362
Commit: a2a62a1
-
Add new `dask_cudf.read_parquet` API (#17250)
It's time to clean up the `dask_cudf.read_parquet` API and prioritize GPU-specific optimizations. To this end, it makes sense to expose our own `read_parquet` API within Dask cuDF. **Notes**: - The "new" `dask_cudf.read_parquet` API is only relevant when query-planning is enabled (the default). - Using `filesystem="arrow"` now uses `cudf.read_parquet` when reading from local storage (rather than PyArrow). - (specific to Dask cuDF): The default `blocksize` argument is now specific to the "smallest" NVIDIA device detected within the active dask cluster (or the first device visible to the client). More specifically, we use `pynvml` to find this representative device size, and we set `blocksize` to be 1/32 this size. - The user may also pass in something like `blocksize=0.125` to use `1/8` the minimum device size (or `blocksize='1GiB'` to bypass the default logic altogether). - (specific to Dask cuDF): When `blocksize` is `None`, we disable partition fusion at optimization time. - (specific to Dask cuDF): When `blocksize` is **not** `None`, we use the parquet metadata from the first few files to inform partition fusion at optimization time (instead of a rough column-count ratio). Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) - Vyas Ramasubramani (https://github.com/vyasr) - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) - Lawrence Mitchell (https://github.com/wence-) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17250
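A sketch of the new entry point under the defaults described above (paths are hypothetical):

```python
import dask_cudf

# Default: blocksize is derived from the smallest visible GPU (1/32 of it)
ddf = dask_cudf.read_parquet("dataset/")  # hypothetical path

# Explicit fraction of device memory per partition, arrow filesystem
ddf2 = dask_cudf.read_parquet(
    "dataset/",
    blocksize=0.125,     # 1/8 of the minimum device size
    filesystem="arrow",  # local reads still go through cudf.read_parquet
)
```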
Commit: 3111aa4
-
Added Arrow Interop Benchmarks (#17194)
This merge request adds benchmarks for the Arrow Interop APIs: - `from_arrow_host` - `to_arrow_host` - `from_arrow_device` - `to_arrow_device` Closes #17104 Authors: - Basit Ayantunde (https://github.com/lamarrr) Approvers: - David Wendt (https://github.com/davidwendt) URL: #17194
Commit: be9ba6c
-
Use `libcudf_exception_handler` throughout `pylibcudf.libcudf` (#17109)
Closes #17036 (WIP, generated by a quick `sed` script) Authors: - https://github.com/brandon-b-miller - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17109
brandon-b-miller authored Nov 20, 2024
Commit: 2e88835
-
Extract `GPUEngine` config options at translation time (#17339)
Follow up to #16944 That PR added `config: GPUEngine` to the arguments of every `IR.do_evaluate` function. In order to simplify future multi-GPU development, this PR extracts the necessary configuration argument at `IR` translation time instead. Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) - Lawrence Mitchell (https://github.com/wence-) Approvers: - https://github.com/brandon-b-miller - Lawrence Mitchell (https://github.com/wence-) URL: #17339
rjzamora authored Nov 20, 2024
Commit: f550ccc
-
Move strings url_decode benchmarks to nvbench (#17328)
Move `cpp/benchmarks/string/url_decode.cu` implementation from google-bench to nvbench. This benchmark is for the `cudf::strings::url_decode` API. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - Nghia Truong (https://github.com/ttnghia) URL: #17328
davidwendt authored Nov 20, 2024
Commit: 04502c8
-
Support pivot with index or column arguments as lists (#17373)
closes #17360 Technically I suppose this was more of an enhancement since the documentation suggested only a single label was supported, but I'll mark as a bug since the error message was not informative. Authors: - Matthew Roeschke (https://github.com/mroeschke) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17373
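A sketch of what now works (list-valued index/columns, mirroring pandas):

```python
import cudf

df = cudf.DataFrame({
    "a": [1, 1, 2, 2],
    "b": ["x", "y", "x", "y"],
    "c": ["p", "p", "q", "q"],
    "v": [10, 20, 30, 40],
})
# index and columns may now be lists of labels instead of single labels
out = df.pivot(index=["a", "c"], columns=["b"], values="v")
```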
mroeschke authored Nov 20, 2024
Commit: 332cc06
-
Move strings repeat benchmarks to nvbench (#17304)
Moves the `cpp/benchmarks/string/repeat_strings.cpp` implementation from google-bench to nvbench. This covers the overloads of the `cudf::strings::repeat_strings` API. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Yunsong Wang (https://github.com/PointKernel) URL: #17304
davidwendt authored Nov 20, 2024
Commit: d927992
Commits on Nov 21, 2024
-
Add `pynvml` as a dependency for `dask-cudf` (#17386)
#17250 started using `pynvml` but did not add the proper dependency; this change fixes the missing dependency. Authors: - Peter Andreas Entschev (https://github.com/pentschev) - https://github.com/jakirkham Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - https://github.com/jakirkham URL: #17386
pentschev authored Nov 21, 2024
Commit: 68c4285
-
Ignore errors when testing glibc versions (#17389)
This is likely the easiest fix for avoiding CI errors from this part of the code. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Bradley Dice (https://github.com/bdice) URL: #17389
vyasr authored Nov 21, 2024
Commit: 0d9e577
-
Migrate CSV writer to pylibcudf (#17163)
Apart of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - David Wendt (https://github.com/davidwendt) - Matthew Roeschke (https://github.com/mroeschke) - Vyas Ramasubramani (https://github.com/vyasr) - Lawrence Mitchell (https://github.com/wence-) URL: #17163
Matt711 authored Nov 21, 2024
Commit: f54c1a5
Commits on Nov 22, 2024
-
Enable unified memory by default in `cudf_polars` (#17375)
This PR enables unified memory as the default memory resource for `cudf_polars`. Co-authored-by: Vyas Ramasubramani <[email protected]> Co-authored-by: Matthew Murray <[email protected]> Co-authored-by: Lawrence Mitchell <[email protected]>
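A sketch of overriding the new default when needed (assuming `GPUEngine` accepts a `memory_resource` keyword, as the cudf-polars configuration suggests; the RMM pool shown is one reasonable alternative):

```python
import polars as pl
import rmm

q = pl.LazyFrame({"a": [1, 2, 3]}).select(pl.col("a") * 2)

# Managed (unified) memory is now the default; pass an explicit RMM
# resource to opt into, e.g., a plain device-memory pool instead.
mr = rmm.mr.PoolMemoryResource(rmm.mr.CudaMemoryResource())
result = q.collect(engine=pl.GPUEngine(memory_resource=mr))
```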
6 people authored Nov 22, 2024
Commit: 305182e