Forward-merge branch-24.12 into branch-25.02 #17343
Commits on Nov 15, 2024
-
add telemetry setup to test (#16924) [8664fad]
This is a prototype implementation of rapidsai/build-infra#139. The work that this builds on:
* rapidsai/gha-tools#118, which adds a shell wrapper that automatically creates spans for the commands that it wraps (see the sketch below). It also uses the `opentelemetry-instrument` command to set up monkeypatching for supported Python libraries, if the command is Python-based.
* https://github.com/rapidsai/shared-workflows/tree/add-telemetry, which installs the gha-tools work from above and sets necessary environment variables. This is only done for the conda-cpp-build.yaml shared workflow at the time of submitting this PR.
The goal of this PR is to observe telemetry data sent from a GitHub Actions build triggered by this PR as a proof of concept. Once it all works, the remaining work is:
* Merge rapidsai/gha-tools#118.
* Move the opentelemetry-related install steps in https://github.com/rapidsai/shared-workflows/compare/add-telemetry?expand=1#diff-ca6188672785b5d214aaac2bf77ce0528a48481b2a16b35aeb78ea877b2567bcR118-R125 into https://github.com/rapidsai/ci-imgs, and rebuild ci-imgs.
* Expand coverage to other shared workflows.
* Incorporate the changes from this PR to other jobs and to other repos.
Authors: - Mike Sarahan (https://github.com/msarahan) Approvers: - Bradley Dice (https://github.com/bdice) URL: #16924
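For illustration, a minimal sketch of the span-per-command idea in Python, using the opentelemetry-api package. This is not the gha-tools wrapper itself; the tracer name, span name, and attribute key are placeholders.

```python
import subprocess

from opentelemetry import trace

# Without an SDK configured, this tracer is a no-op; when run under
# `opentelemetry-instrument` (or with an SDK wired up via environment
# variables), each wrapped command becomes a recorded span.
tracer = trace.get_tracer("build-telemetry-sketch")


def run_wrapped(cmd: list[str]) -> int:
    """Run a build command inside a span named after the command."""
    with tracer.start_as_current_span(cmd[0]) as span:
        result = subprocess.run(cmd)
        span.set_attribute("process.exit_code", result.returncode)
        return result.returncode


if __name__ == "__main__":
    run_wrapped(["echo", "hello from a traced build step"])
```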
-
Update cmake to 3.28.6 in JNI Dockerfile (#17342) [e683647]
Updates cmake to 3.28.6 in the JNI Dockerfile used to build the cudf jar. This helps avoid a bug in older cmake where FindCUDAToolkit can fail to find cufile libraries. Authors: - Jason Lowe (https://github.com/jlowe) Approvers: - Nghia Truong (https://github.com/ttnghia) - Gera Shegalov (https://github.com/gerashegalov) URL: #17342
Commits on Nov 16, 2024
-
Use pylibcudf contiguous split APIs in cudf python (#17246) [9cc9071]
Part of #15162. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #17246
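As a rough sketch of what these APIs do, assuming `pack`/`unpack` in `pylibcudf.contiguous_split` and the `pylibcudf.interop` conversions shown; treat the exact names and signatures as assumptions rather than a documented example:

```python
import pyarrow as pa

import pylibcudf as plc

# Build a small device table from an Arrow table.
tbl = plc.interop.from_arrow(pa.table({"a": [1, 2, 3], "b": ["x", "y", "z"]}))

# pack() serializes the table into a single contiguous device buffer
# plus metadata; unpack() reconstructs a table over that buffer.
packed = plc.contiguous_split.pack(tbl)
roundtrip = plc.contiguous_split.unpack(packed)

assert roundtrip.num_columns() == tbl.num_columns()
```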
Commits on Nov 18, 2024
-
Move strings translate benchmarks to nvbench (#17325) [e4de8e4]
Moves the `cpp/benchmarks/string/translate.cpp` implementation from google-bench to nvbench. This is the benchmark for the `cudf::strings::translate` API. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Nghia Truong (https://github.com/ttnghia) URL: #17325
-
Move cudf._lib.unary to cudf.core._internals (#17318) [aeb6a30]
Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17318
-
Reading multi-source compressed JSONL files (#17161) [03ac845]
Fixes #17068 Fixes #12299 This PR introduces a new datasource for compressed inputs which enables batching and byte range reading of multi-source JSONL files using the reallocate-and-retry policy. Moreover, instead of using a 4:1 compression ratio heuristic, the device buffer size is estimated accurately for GZIP, ZIP, and SNAPPY compression types. For remaining types, the files are first decompressed then batched. ~~TODO: Reuse existing JSON tests but with an additional compression parameter to verify correctness.~~ ~~Handled by #17219, which implements the compressed JSON writer required for the above test.~~ Multi-source compressed input tests added! Authors: - Shruti Shivakumar (https://github.com/shrshi) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Karthikeyan (https://github.com/karthikeyann) URL: #17161
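A small usage sketch of what multi-source compressed JSONL reading looks like from the Python API; the file names are placeholders:

```python
import cudf

# Read several GZIP-compressed JSON Lines sources as one DataFrame;
# the reader decompresses and batches the inputs internally.
df = cudf.read_json(
    ["part-0.jsonl.gz", "part-1.jsonl.gz"],
    lines=True,
    compression="gzip",
)
print(df.head())
```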
-
Test the full matrix for polars and dask wheels on nightlies (#17320) [d514517]
This PR ensures that we have nightly coverage of more of the CUDA/Python/arch versions that we claim to support for dask-cudf and cudf-polars wheels. In addition, this PR ensures that we do not attempt to run the dbgen executable in the Polars repository on systems with a glibc too old to run it. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #17320
-
Fix reading Parquet string cols when `nrows` and `input_pass_limit` > 0 (#17321) [43f2f68]
This PR fixes reading string columns in Parquet using the chunked parquet reader when `nrows` and `input_pass_limit` are > 0. Closes #17311 Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Ed Seidl (https://github.com/etseidl) - Lawrence Mitchell (https://github.com/wence-) - Bradley Dice (https://github.com/bdice) - https://github.com/nvdbaranec - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17321
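For context, a sketch of the chunked-reading pattern involved; the constructor arguments and option names here are assumptions about pylibcudf's interface, not a verbatim API:

```python
import pylibcudf as plc

# Hypothetical chunked-read loop: stream a large Parquet file in
# passes bounded by a memory limit rather than one monolithic read.
reader = plc.io.parquet.ChunkedParquetReader(
    plc.io.SourceInfo(["large.parquet"]),
    pass_read_limit=1024**3,  # ~1 GiB per pass (assumed option name)
)
while reader.has_next():
    chunk = reader.read_chunk()  # one table chunk per iteration
```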
-
Remove cudf._lib.hash in favor of inlining pylibcudf (#17345) [18b40dc]
Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Lawrence Mitchell (https://github.com/wence-) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17345
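"Inlining pylibcudf" here means calling pylibcudf directly where a `cudf._lib` Cython shim used to sit. A rough sketch of that pattern, assuming `pylibcudf.hashing.murmurhash3_x86_32` and the interop helpers behave as shown:

```python
import pyarrow as pa

import pylibcudf as plc

# Hash each row of a device table with MurmurHash3, calling the
# pylibcudf API directly instead of going through cudf._lib.hash.
tbl = plc.interop.from_arrow(pa.table({"a": [1, 2, 3]}))
hashed = plc.hashing.murmurhash3_x86_32(tbl, 0)  # seed = 0
print(plc.interop.to_arrow(hashed))
```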
-
Remove cudf._lib.concat in favor of inlining pylibcudf (#17344) [ba21673]
Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17344
-
Remove cudf._lib.quantiles in favor of inlining pylibcudf (#17347) [02c35bf]
Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17347
-
Remove cudf._lib.labeling in favor of inlining pylibcudf (#17346) [302e625]
Contributes to #17317 Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17346
Commits on Nov 19, 2024
-
[5f9a97f]
1.13 was yanked for some reason, but 1.14 doesn't bring anything new and difficult. Authors: - Lawrence Mitchell (https://github.com/wence-) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - https://github.com/brandon-b-miller - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17355
-
Writing compressed output using JSON writer (#17323) [384abae]
Depends on #17161 for implementations of compression and decompression functions (`io/comp/comp.cu`, `io/comp/comp.hpp`, `io/comp/io_uncomp.hpp` and `io/comp/uncomp.cpp`) Adds support for writing GZIP- and SNAPPY-compressed JSON to the JSON writer. Verifies correctness using a parameterized test in `tests/io/json/json_writer.cpp` Authors: - Shruti Shivakumar (https://github.com/shrshi) - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Karthikeyan (https://github.com/karthikeyann) - Vukasin Milovanovic (https://github.com/vuule) URL: #17323
-
fix library-loading issues in editable installs (#17338) [9c5cd81]
Contributes to rapidsai/build-planning#118. The pattern introduced in #17316 breaks editable installs in devcontainers. In that type of build, `libcudf.so` is built outside of the wheel but **not installed**, so it can't be found by `ld`. Extension modules in `cudf` and `pylibcudf` are able to find it via RPATHs instead. This proposes:
* try-catching the entire library-loading attempt, to silently do nothing in cases like that (sketched below)
* ~~adding imports of the `cudf` and `pylibcudf` libraries in the `devcontainers` CI job, as a smoke test to catch issues like this in the future~~ *(edit: removed those, [`devcontainer` builds run on CPU nodes](https://github.com/rapidsai/shared-workflows/blob/4e84062f333ce5649bc65029d3979569e2d0a045/.github/workflows/build-in-devcontainer.yaml#L19))*
Notes for Reviewers: Tested this approach on rapidsai/kvikio#553.
Authors: - James Lamb (https://github.com/jameslamb) - Matthew Murray (https://github.com/Matt711) Approvers: - Bradley Dice (https://github.com/bdice) - Matthew Murray (https://github.com/Matt711) URL: #17338
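A condensed sketch of that try-catch pattern; the loader function and failure handling are illustrative, not the exact code from #17338:

```python
import ctypes


def load_cudf_library() -> None:
    """Best-effort preload of libcudf.so.

    In an editable install the library may not be installed anywhere
    the dynamic loader can see; extension modules then locate it via
    their RPATHs, so silently doing nothing here is the intent.
    """
    try:
        ctypes.CDLL("libcudf.so", mode=ctypes.RTLD_GLOBAL)
    except OSError:
        pass  # fall back to RPATH-based resolution


load_cudf_library()
```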
-
Fix integer overflow in compiled binaryop (#17354) [c7bfa77]
For large columns, the computed stride might end up overflowing size_type. To fix this, use the grid_1d helper. See also #10368. Closes #17353. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) - Tianyu Liu (https://github.com/kingcrimsontianyu) - Muhammad Haseeb (https://github.com/mhaseeb123) - Nghia Truong (https://github.com/ttnghia) URL: #17354
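To make the failure mode concrete, some illustrative arithmetic; the block and grid counts are made up, and this is not the kernel code:

```python
# A grid-stride loop advances by blockDim.x * gridDim.x each step.
# If that product is computed in a 32-bit signed size_type, a large
# enough grid overflows and the stride wraps negative.
INT32_MAX = 2**31 - 1

threads_per_block = 256
num_blocks = 9_000_000  # plausible for a multi-billion-row column

stride = threads_per_block * num_blocks
print(stride > INT32_MAX)  # True: wraps negative in 32-bit arithmetic
```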
-
Move strings replace benchmarks to nvbench (#17301) [03c055f]
Moves the `cpp/benchmarks/string/replace.cpp` implementation from google-bench to nvbench. This covers the strings replace APIs:
- `cudf::strings::replace` scalar version
- `cudf::strings::replace_multiple` column version
- `cudf::strings::replace_slice`
Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Shruti Shivakumar (https://github.com/shrshi) URL: #17301
-
Optimize distinct inner join to use set `find` instead of `retrieve` (#17278) [56061bd]
This PR introduces a minor optimization for distinct inner joins by using the `find` results to selectively copy matches to the output. This approach eliminates the need for the costly `retrieve` operation, which relies on expensive atomic operations. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Karthikeyan (https://github.com/karthikeyann) URL: #17278
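A toy CPU analogy of the optimization (plain Python, nothing like the actual device-side kernels): because the build side of a distinct join has unique keys, a simple `find` per probe row yields at most one match, so results can be copied straight to the output without the multi-match gather that `retrieve` performs:

```python
# Build side of a distinct join: every key maps to exactly one row.
build_keys = ["a", "b", "c", "d"]
build = {key: row for row, key in enumerate(build_keys)}

# Probe side: a single lookup ("find") per row; each hit is copied
# directly to the output, with no atomics-based gather step.
probe_keys = ["b", "x", "d", "b"]
matches = [
    (probe_row, build[key])
    for probe_row, key in enumerate(probe_keys)
    if key in build
]
print(matches)  # [(0, 1), (2, 3), (3, 1)]
```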
Commits on Nov 20, 2024
-
Add compute_column_expression to pylibcudf for transform.compute_column (#17279) [7158ee0]
Follow up to #16760. `transform.compute_column` (backing `.eval`) requires an `Expression` object created by a private routine in cudf Python. Since this routine will be needed for any user of the public `transform.compute_column`, this PR moves it to pylibcudf. Authors: - Matthew Roeschke (https://github.com/mroeschke) - Lawrence Mitchell (https://github.com/wence-) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Vyas Ramasubramani (https://github.com/vyasr) URL: #17279
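For context, `.eval` is the user-facing entry point this feeds; a quick usage example with the public cudf API:

```python
import cudf

df = cudf.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# .eval parses the string into an expression and evaluates it
# on-device via transform.compute_column.
df["c"] = df.eval("a + b")
print(df)
```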
-
Bug fix: restrict lines=True to JSON format in Kafka read_gdf method (#17333) [05365af]
This pull request modifies the read_gdf method in kafka.py to pass the lines parameter only when the message_format is "json". This prevents lines from being passed to other formats (e.g., CSV, Avro, ORC, Parquet), which do not support this parameter. Authors: - Hirota Akio (https://github.com/a-hirota) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #17333
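The fix pattern, sketched generically; this is not the literal kafka.py diff, and the reader table and function here are hypothetical:

```python
import cudf

# Map each message format to its cudf reader.
READERS = {
    "json": cudf.read_json,
    "csv": cudf.read_csv,
    "avro": cudf.read_avro,
}


def read_gdf_sketch(source, message_format="json", lines=True):
    """Forward `lines` only to the JSON reader, which understands it."""
    kwargs = {}
    if message_format == "json":
        kwargs["lines"] = lines
    return READERS[message_format](source, **kwargs)
```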
-
Adapt to KvikIO API change in the compatibility mode (#17377) [6f83b58]
This PR adapts cuDF to a breaking API change in KvikIO (rapidsai/kvikio#547) introduced recently, which adds the `AUTO` compatibility mode to file I/O. This PR causes no behavioral changes in cuDF: if the environment variable `KVIKIO_COMPAT_MODE` is left unset, cuDF by default still enables the compatibility mode in KvikIO. This is the same as the previous behavior (#17185). Authors: - Tianyu Liu (https://github.com/kingcrimsontianyu) Approvers: - Vukasin Milovanovic (https://github.com/vuule) URL: #17377
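A sketch of the default described above; only the `KVIKIO_COMPAT_MODE` variable name comes from the PR, and the mechanism shown is an assumption, not cuDF's actual code:

```python
import os

# If the user has not chosen a KvikIO compatibility mode, keep the
# pre-existing cuDF default of forcing compatibility mode ON rather
# than adopting KvikIO's new AUTO mode.
os.environ.setdefault("KVIKIO_COMPAT_MODE", "ON")
print(os.environ["KVIKIO_COMPAT_MODE"])
```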
-
Benchmarking JSON reader for compressed inputs (#17219) [fc08fe8]
Depends on #17161 for implementations of compression and decompression functions (`io/comp/comp.cu`, `io/comp/comp.hpp`, `io/comp/io_uncomp.hpp` and `io/comp/uncomp.cpp`). Depends on #17323 for the compressed JSON writer implementation. Adds a benchmark to measure performance of the JSON reader for compressed inputs. Authors: - Shruti Shivakumar (https://github.com/shrshi) - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - MithunR (https://github.com/mythrocks) - Vukasin Milovanovic (https://github.com/vuule) - Karthikeyan (https://github.com/karthikeyann) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: #17219
-
Deselect failing polars tests (#17362) [a2a62a1]
Deselect `test_join_4_columns_with_validity`, which is failing in nightly CI tests and is reproducible on some systems (xref pola-rs/polars#19870), but apparently not all. Also deselect `test_read_web_file`, which fails on rockylinux8 due to SSL CA issues. Authors: - Peter Andreas Entschev (https://github.com/pentschev) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #17362
-
Add new `dask_cudf.read_parquet` API (#17250) [3111aa4]
It's time to clean up the `dask_cudf.read_parquet` API and prioritize GPU-specific optimizations. To this end, it makes sense to expose our own `read_parquet` API within Dask cuDF (see the usage sketch after the notes). Notes:
- The "new" `dask_cudf.read_parquet` API is only relevant when query-planning is enabled (the default).
- Using `filesystem="arrow"` now uses `cudf.read_parquet` when reading from local storage (rather than PyArrow).
- (specific to Dask cuDF): The default `blocksize` argument is now specific to the "smallest" NVIDIA device detected within the active dask cluster (or the first device visible to the client). More specifically, we use `pynvml` to find this representative device size, and we set `blocksize` to be 1/32 of this size.
- The user may also pass in something like `blocksize=0.125` to use 1/8 of the minimum device size (or `blocksize='1GiB'` to bypass the default logic altogether).
- (specific to Dask cuDF): When `blocksize` is `None`, we disable partition fusion at optimization time.
- (specific to Dask cuDF): When `blocksize` is **not** `None`, we use the parquet metadata from the first few files to inform partition fusion at optimization time (instead of a rough column-count ratio).
Authors: - Richard (Rick) Zamora (https://github.com/rjzamora) - Vyas Ramasubramani (https://github.com/vyasr) - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) - Lawrence Mitchell (https://github.com/wence-) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #17250
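A minimal usage sketch built from the options named above; the path is a placeholder:

```python
import dask_cudf

# Dask-cuDF-specific reader: fixed 1 GiB partitions, with local
# storage read through cudf via the Arrow filesystem.
ddf = dask_cudf.read_parquet(
    "data/parquet/",
    blocksize="1GiB",
    filesystem="arrow",
)
print(ddf.head())
```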