introduce libcuvs wheels #594
endif()

# --- RAFT ---#
set(CUVS_USE_RAFT_STATIC OFF)
One of the `wheel-test` jobs (CUDA 11.8, arm64) is failing, with a few cases like this:
FAILED python/cuvs/cuvs/test/test_ivf_pq.py::test_ivf_pq_search_params[params2] - cuvs.common.exceptions.CuvsException: cuBLAS error encountered at: file=/pyenv/versions/3.10.16/lib/python3.10/site-packages/libraft/include/raft/linalg/detail/cublaslt_wrappers.hpp line=261: call='cublasLtMatmul(resource::get_cublaslt_handle(res), mm_desc->desc, alpha, a_ptr, mm_desc->a, b_ptr, mm_desc->b, beta, c_ptr, mm_desc->c, c_ptr, mm_desc->c, &(mm_desc->heuristics.algo), nullptr, 0, stream)', Reason=13:CUBLAS_STATUS_EXECUTION_FAILED
Obtained 49 stack frames
#1 in /pyenv/versions/3.12.8/lib/python3.12/site-packages/libcuvs/lib64/libcuvs.so: raft::cublas_error::cublas_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0xb0 [0xfffdca6a5360]
#2 in /pyenv/versions/3.12.8/lib/python3.12/site-packages/libcuvs/lib64/libcuvs.so: void raft::linalg::detail::legacy_matmul<false, float, float, float, float>(raft::resources const&, bool, bool, unsigned long, unsigned long, unsigned long, float const*, float const*, unsigned long, float const*, unsigned long, float const*, float*, unsigned long, CUstream_st*) +0x6c0 [0xfffdca6c4900]
All the failures are in one set of test cases:
FAILED python/cuvs/cuvs/test/test_ivf_pq.py
I don't think those are related to this PR... I see them on other PRs like #593 (which only removes whitespace):
https://github.com/rapidsai/cuvs/actions/runs/12875298461/job/35896833763?pr=593
This should be fixed by rapidsai/raft#2548... will leave this thread open until I confirm that.
SKBUILD_EXTRA_CMAKE_ARGS="${EXTRA_CMAKE_ARGS}"
if [[ "${EXTRA_CMAKE_ARGS}" != *"DFIND_CUVS_CPP"* ]]; then
    SKBUILD_EXTRA_CMAKE_ARGS="${SKBUILD_EXTRA_CMAKE_ARGS};-DFIND_CUVS_CPP=ON"
fi
Just picking a mostly-arbitrary place on the diff to start this thread... the Rust build is failing here.
warning: argument unused during compilation: '-L/opt/conda/envs/rust/targets/x86_64-linux/lib' [-Wunused-command-line-argument]
warning: argument unused during compilation: '-L/opt/conda/envs/rust/targets/x86_64-linux/lib/stubs' [-Wunused-command-line-argument]
/usr/include/limits.h:124:16: fatal error: 'limits.h' file not found
clang diag: warning: argument unused during compilation: '-L/opt/conda/envs/rust/targets/x86_64-linux/lib' [-Wunused-command-line-argument]
clang diag: warning: argument unused during compilation: '-L/opt/conda/envs/rust/targets/x86_64-linux/lib/stubs' [-Wunused-command-line-argument]
thread 'main' panicked at cuvs-sys/build.rs:97:10:
Unable to generate cagra_c bindings: ClangDiagnostic("/usr/include/limits.h:124:16: fatal error: 'limits.h' file not found\n")
I strongly suspect it's related to this PR, because I've seen it on multiple runs and because that build is succeeding on branch builds and other PRs. For example, it just passed 6 minutes ago on #596: https://github.com/rapidsai/cuvs/actions/runs/12895689316/job/35960708413?pr=596
I'm able to reproduce this locally.
docker run \
--gpus 1 \
--rm \
--env RAPIDS_BUILD_TYPE=pull-request \
--env RAPIDS_REF_NAME=pull-request/594 \
--env RAPIDS_REPOSITORY=rapidsai/cuvs \
--env RAPIDS_SHA=97c56178cd0e07e4b6b138bb0904af78379f1bb3 \
-v $(pwd):/opt/work \
-w /opt/work \
-it rapidsai/ci-conda \
bash
ci/build_rust.sh
Working on a fix.
Pushed 86648b9 updating this.
Normally I'd say that should be its own PR, but since we're so CI-constrained right now and since this change only matters for the benefit of this PR, I think we should just include it in the scope here.
Looks like `clang` is needed to build docs here too, because the Rust stuff is rebuilt in docs builds. The docs job fails with:
Unable to find libclang: "couldn't find any valid shared libraries matching: ['libclang.so', 'libclang-*.so', 'libclang.so.*', 'libclang-*.so.*'], set the `LIBCLANG_PATH` environment variable to a path where one of these files can be found (invalid: [])"
The relevant part of the docs build script:
Lines 44 to 48 in 1c91e1f
rapids-logger "Build Rust docs"
pushd rust
export LIBCLANG_PATH=$(dirname $(find /opt/conda -name libclang.so | head -n 1))
cargo doc -p cuvs --no-deps
popd
Added it in baae0a7
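A side note on the `LIBCLANG_PATH` line quoted from the docs build script: if `find` matches nothing, `dirname` receives an empty string and prints `.`, so the variable is silently set to the current directory and the failure only surfaces later as the "Unable to find libclang" error. A minimal sketch of a stricter lookup follows. The helper name `find_libclang_dir` is hypothetical and is not part of the cuvs CI scripts; it only illustrates failing early.

```shell
# Hypothetical helper (an assumption, not from the cuvs repo): print the
# directory that contains libclang.so under a search root, or return
# non-zero with a message on stderr if no such file exists.
find_libclang_dir() {
    search_root="${1:-/opt/conda}"
    # Take the first match only, mirroring the original `| head -n 1`.
    lib="$(find "${search_root}" -name 'libclang.so' 2>/dev/null | head -n 1)"
    if [ -z "${lib}" ]; then
        echo "libclang.so not found under ${search_root}" >&2
        return 1
    fi
    dirname "${lib}"
}

# Possible usage in a build script (assignment kept separate from `export`
# so that a lookup failure is not masked):
#   LIBCLANG_PATH="$(find_libclang_dir /opt/conda)" || exit 1
#   export LIBCLANG_PATH
```

Whether the extra check is worth it in CI is a judgment call; the one-liner is fine as long as `clang` is guaranteed to be in the environment.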
All looks fine to me. One nit (an extra blank line), please just wrap that into your next commit that reverts the commented C++ tests.
Contributes to rapidsai/build-planning#33
Adjusts `rapids-build-utils` manifest for release 25.02 to account for the introduction of new `libcuvs` wheels (rapidsai/cuvs#594).
## Notes for Reviewers
This shouldn't be merged still pointing at my forks. Plan:
1. see CI pass here
2. see all CI except devcontainers pass on rapidsai/cuvs#594
3. point this PR back at upstream and admin-merge it
4. re-run devcontainers CI on rapidsai/cuvs#594 and see it pass
---------
Co-authored-by: Bradley Dice <[email protected]>
Final touches look good. Re-approving.
I will pre-emptively merge this. All CI has passed except for one C++ job which we expect to take a while (see #602 for a proposed improvement to runtime).
/merge
Proposes updating the metadata here to account for new wheels added in the 25.02 release cycle.
This shouldn't be merged until all of these are:
* [x] `libcugraph` (rapidsai/cugraph#4804)
* [x] `libcuml` (rapidsai/cuml#6199)
* [x] `libcuvs` (rapidsai/cuvs#594)
Contributes to rapidsai/build-planning#33.
Proposes packaging `libcuvs` as a wheel, which is then re-used by `cuvs-cu{11,12}`.
Similar changes were recently made in RAFT: rapidsai/raft#2531
As part of this, also proposes:
* `CUVS_COMPILE_DYNAMIC_ONLY`, to allow building/installing only the dynamic shared library (i.e. skipping the static library)
* `rapids-cmake`'s preferred CMake style (similar to raft#2531 (comment))
* `clang` pins across the project, and pinning to `clang` 16 for Rust builds (#594 (comment))
Notes for Reviewers
Benefits of these changes
* `libcuvs` instead of statically linking in the pieces they need)
Wheel contents
* `libcuvs`: `libcuvs.so` and `libcuvs_c.so` (shared library)
* `cuvs`: `cuvs` Python / Cython code and compiled Cython extensions
Size changes (CUDA 12, Python 3.12, x86_64)
* `libcuvs`
* `cuvs`
NOTES: size = compressed, "before" = 2025-01-22 nightlies
how I calculated those (click me)
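The contents of that collapsed section are not reproduced here. As a rough sketch only (an assumption, not necessarily the author's method): a wheel is a zip archive, so its "compressed" size is simply the size of the `.whl` file on disk. The helper name `wheel_size_bytes` and the `dist/` directory in the usage comment are hypothetical.

```shell
# Hypothetical helper (not from the PR): print "<path><TAB><size in bytes>"
# for each file given. For a .whl file this is the compressed size.
wheel_size_bytes() {
    for whl in "$@"; do
        # Arithmetic expansion strips any padding that `wc -c` may emit.
        bytes=$(( $(wc -c < "${whl}") ))
        printf '%s\t%d\n' "${whl}" "${bytes}"
    done
}

# Possible usage, assuming wheels were downloaded or built into ./dist:
#   wheel_size_bytes dist/*.whl
```

Running this against a nightly wheel and a wheel built from the PR would give a before/after comparison in the spirit of the note above.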
How I tested this