-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RELEASE] cudf v24.12 #17406
Open
raydouglass
wants to merge
334
commits into
main
Choose a base branch
from
branch-24.12
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
[RELEASE] cudf v24.12 #17406
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Contributes to #15162 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #16994
Add empty string column condition for write_json bypass make_strings_children for empty column because when grid size is zero, it throws cuda error. Authors: - Karthikeyan (https://github.com/karthikeyann) - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - David Wendt (https://github.com/davidwendt) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #16995
This PR removes an unused unused import in cudf which was causing errors in doc builds. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17005
This PR adds two new jobs to the project automations. One to extract the version number from the branch name, and one to set the project `Release` field to the version found. Authors: - Ben Jarmak (https://github.com/jarmak-nv) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17001
With this set of changes I get a clean run of clang-tidy (with one caveat that I'll explain in the follow-up PR to add clang-tidy to pre-commit/CI). Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Nghia Truong (https://github.com/ttnghia) - MithunR (https://github.com/mythrocks) - David Wendt (https://github.com/davidwendt) - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #16956
Closes #16735 Authors: - https://github.com/brandon-b-miller - Lawrence Mitchell (https://github.com/wence-) Approvers: - Matthew Murray (https://github.com/Matt711) - Lawrence Mitchell (https://github.com/wence-) - Bradley Dice (https://github.com/bdice) URL: #16776
Apart of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17006
Everything in the expression evaluation now operates on columns without names. DataFrame construction takes either a mapping from string-valued names to columns, or a sequence of pairs of names and columns. This removes some duplicate code in the NamedColumn class (by removing it) where we had to fight the inheritance hierarchy. - Closes #16272 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Matthew Murray (https://github.com/Matt711) URL: #16962
We use the pairwise approach of Chan, Golub, and LeVeque (1983). - Closes #16444 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Bradley Dice (https://github.com/bdice) - Robert (Bobby) Evans (https://github.com/revans2) URL: #16448
The cudf tests already treat tests that are expected to fail but pass as errors, but at the time we introduced that change, we didn't do the same for the other packages. Do that now, it turns out there are only a few xpassing tests. While here, it turns out that having multiple different pytest configuration files does not work. `pytest.ini` takes precedence over other options, and it's "first file wins". Consequently, the merge of #16851 turned off `xfail_strict = true` (and other options) for many of the subpackages. To fix this, migrate all pytest configuration into the appropriate section of the `pyproject.toml` files, so that all tool configuration lives in the same place. We also add a section in the developer guide to document this choice. - Closes #12391 - Closes #16974 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - James Lamb (https://github.com/jameslamb) - Matthew Roeschke (https://github.com/mroeschke) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #16977
As part of JSON validation, field, value and string tokens are validated. Right now the code has single transform_inclusive_scan. Since this transform functor is a heavy operation, it slows down the entire scan drastically. This PR splits transform and scan in validation. The runtime of validation went from 200ms to 20ms. Also, a few hardcoded string comparisons are moved to trie. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vukasin Milovanovic (https://github.com/vuule) - Robert (Bobby) Evans (https://github.com/revans2) URL: #16996
Apart of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #17007
Contributes to rapidsai/build-planning#106 Proposes specifying the RAPIDS version in `conda install` calls in CI that install CI artifacts, to reduce the risk of CI jobs picking up artifacts from other releases. Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17013
Contributes to #15162 Also I believe the cpp docstrings were incorrect, but could use a second look. Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - https://github.com/brandon-b-miller - Nghia Truong (https://github.com/ttnghia) - David Wendt (https://github.com/davidwendt) URL: #17003
We have recently observed a number of seg faults in our Python tests. From some investigation, the error comes from the import of pyarrow loading the bundled libarrow.so, and in particular when that library runs a jemalloc function `background_thread_entry`. We have observed similar (but not identical) errors in the past that have to do with as-yet unsolved problems in the way that arrow handles multi-threaded environments. The error is currently only observed on arm runners and with pyarrow 17.0.0. In my tests the error is highly sensitive to everything from import order to unrelated code segments, suggesting a race condition, some form of memory corruption, or perhaps symbol resolution errors at runtime. As a result, I have had limited success in drilling down further into specific causes, especially since attempts to rebuild libarrow.so also squash the error and I therefore cannot use debug symbols. From some offline discussion we decided that avoiding the problematic version is a sufficient fix for now. Due to the sensitivity, I am simply skipping 17.0.0 in this PR. I suspect that future builds of pyarrow will also usually not exhibit this bug (although it may recur occasionally on specific versions of pyarrow). Therefore, rather than lowering the upper bound I would prefer to allow us to float and see if and when this problem reappears. Since our DFG+RBB combination for wheel builds does not yet support any matrix entry other than `cuda`, I'm using environment markers to specify the constraint rather than a matrix entry in dependencies.yaml. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17018
…nd` (#16485) Refactors `histogram` reduce and groupby aggregations using `cuco::static_set::insert_and_find`. Speed improvement results [here](#16485 (comment)) and [here](#16485 (comment)). Authors: - Srinivas Yadav (https://github.com/srinivasyadav18) - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Nghia Truong (https://github.com/ttnghia) URL: #16485
…17026) the same issue as NVIDIA/spark-rapids-jni#2475 due to rapidsai/kvikio#464 Port the fix from NVIDIA/spark-rapids-jni#2476, verified locally Authors: - Peixin (https://github.com/pxLi) Approvers: - Nghia Truong (https://github.com/ttnghia) URL: #17026
Forward-merge branch-24.10 into branch-24.12
cuda::std::optional shouldn't be used for host types such as `std::vector` as it requires the constructors of the `T` types to be host+device. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Bradley Dice (https://github.com/bdice) - MithunR (https://github.com/mythrocks) - Nghia Truong (https://github.com/ttnghia) URL: #17015
When instantiating a `cudf.pandas` proxy array, a DtoH transfer occurs so that the data buffer is set correctly. We do this because functions which utilize NumPy's C API can utilize the data buffer directly instead of going through `__array__`. This PR documents this limitation. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Vyas Ramasubramani (https://github.com/vyasr) URL: #16955
…17020) One of the `host_span` constructors was not updated when we added `is_device_accessible`, so the value was not assigned. This PR fixes this simple error and adds tests that checks that this property is correctly set when creating `host_span`s. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #17020
Contributes to #15162 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - https://github.com/brandon-b-miller URL: #16990
This PR updates all the RMM imports to use pylibrmm/librmm now that `rmm._lib` is deprecated . It should be merged after [rmm/1676](rapidsai/rmm#1676). Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Charles Blackmon-Luca (https://github.com/charlesbluca) URL: #16913
Fixes the libcudf regex parsing logic when handling nested fixed quantifiers. The logic handles fixed quantifiers by simple repeating the previous instruction. If the previous item is a group (capture or non-capture) that group may also contain an internal fixed quantifier as well. Found while working on #16730 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) URL: #16798
Contributes to #15162 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #16997
Contributes to #15162 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - James Lamb (https://github.com/jameslamb) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17025
…oint (#17048) Contributes to #15162 I don't think there are any types in this file that needs to be exposed on the Python side; they're just used internally in pylibcudf. Also moves this to `libcudf/fixed_point` matching the libcudf location more closely Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17048
…7010) Contributes to #15162 ~I just assumed since the associated libcudf files just publicly expose C types, we just want to match the name spacing when importing from pylibcudf (avoid importing from `pylibcudf.libcudf`) and not necessary expose a Python equivalent?~ ~Let me know if I am misunderstanding how to expose these types.~ #17010 (comment) Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Matthew Murray (https://github.com/Matt711) URL: #17010
Follow-up to #17013 Changes relative to that PR: * switches to pinning CI conda installs to the output of `rapids-version` (`{major}.{minor}.{patch}`) instead of `rapids-version-major-minor` (`{major}.{minor}`), to get a bit more protection in the presence of hotfix releases * restores some exporting of variables needed for docs builds I made some mistakes in #17013 (comment). Missed that this project's Doxygen setup is expecting to find `RAPIDS_VERSION` and `RAPIDS_VERSION_MAJOR_MINOR` defined in the environment. https://github.com/rapidsai/cudf/blob/7173b52fce25937bb69e22a083a5de4655078fa1/cpp/doxygen/Doxyfile#L41 https://github.com/rapidsai/cudf/blob/7173b52fce25937bb69e22a083a5de4655078fa1/cpp/doxygen/Doxyfile#L2229 Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #17042
Adding python bindings to [`cudf::pack()`](https://docs.rapids.ai/api/libcudf/legacy/group__copy__split#ga86716e7ec841541deb6edc7e91fcb9e4), [`cudf::unpack()`](https://docs.rapids.ai/api/libcudf/legacy/group__copy__split#ga1d62a18c2e6f087a92289c63693762cc), and [`cudf::packed_columns`](https://docs.rapids.ai/api/libcudf/legacy/structcudf_1_1packed__columns). This is the first step to support serialization of cudf.polars' IR. cc. @wence- @rjzamora # Authors: - Mads R. B. Kristensen (https://github.com/madsbk) - Matthew Murray (https://github.com/Matt711) - Lawrence Mitchell (https://github.com/wence-) Approvers: - Matthew Roeschke (https://github.com/mroeschke) - Lawrence Mitchell (https://github.com/wence-) URL: #17012
Move `cpp/benchmarks/string/url_decode.cu` implementation from google-bench to nvbench. This benchmark is for the `cudf::strings::url_decode` API. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - Nghia Truong (https://github.com/ttnghia) URL: #17328
closes #17360 Technically I suppose this was more of an enhancement since the documentation suggested only a single label was supported, but I'll mark as a bug since the error message was not informative. Authors: - Matthew Roeschke (https://github.com/mroeschke) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17373
Moves the `cpp/benchmarks/string/repeat_strings.cpp` implementation from google-bench to nvbench. This covers the overloads of the `cudf::strings::repeat_strings` API. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Yunsong Wang (https://github.com/PointKernel) URL: #17304
#17250 started using `pynvml` but did not add the proper dependency, this change fixes the missing dependency. Authors: - Peter Andreas Entschev (https://github.com/pentschev) - https://github.com/jakirkham Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - https://github.com/jakirkham URL: #17386
This is likely the easiest fix for avoiding CI errors from this part of the code. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Bradley Dice (https://github.com/bdice) URL: #17389
Apart of #15162 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - David Wendt (https://github.com/davidwendt) - Matthew Roeschke (https://github.com/mroeschke) - Vyas Ramasubramani (https://github.com/vyasr) - Lawrence Mitchell (https://github.com/wence-) URL: #17163
raydouglass
requested review from
jameslamb,
wence-,
galipremsagar,
mhaseeb123 and
vuule
November 21, 2024 20:50
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
github-actions
bot
added
libcudf
Affects libcudf (C++/CUDA) code.
Python
Affects Python cuDF API.
CMake
CMake build issue
Java
Affects Java cuDF API.
cudf.pandas
Issues specific to cudf.pandas
cudf.polars
Issues specific to cudf.polars
pylibcudf
Issues specific to the pylibcudf package
labels
Nov 21, 2024
ttnghia
approved these changes
Nov 21, 2024
mhaseeb123
approved these changes
Nov 21, 2024
This PR enables Unified memory as the default memory resource for `cudf_polars` --------- Co-authored-by: Vyas Ramasubramani <[email protected]> Co-authored-by: Vyas Ramasubramani <[email protected]> Co-authored-by: Matthew Murray <[email protected]> Co-authored-by: Lawrence Mitchell <[email protected]> Co-authored-by: Matthew Murray <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
CMake
CMake build issue
cudf.pandas
Issues specific to cudf.pandas
cudf.polars
Issues specific to cudf.polars
Java
Affects Java cuDF API.
libcudf
Affects libcudf (C++/CUDA) code.
pylibcudf
Issues specific to the pylibcudf package
Python
Affects Python cuDF API.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
❄️ Code freeze for
branch-24.12
and v24.12 releaseWhat does this mean?
Only critical/hotfix level issues should be merged into
branch-24.12
until release (merging of this PR).What is the purpose of this PR?
branch-24.12
intomain
for the release