introduce libcuvs wheels #594

Merged on Jan 23, 2025 (19 commits)

jameslamb (Member) commented Jan 21, 2025

Contributes to rapidsai/build-planning#33.

Proposes packaging libcuvs as a wheel, which is then re-used by cuvs-cu{11,12}.

Similar changes were recently made in RAFT: rapidsai/raft#2531

As part of this, also proposes:

Notes for Reviewers

Benefits of these changes

Wheel contents

libcuvs:

  • libcuvs.so and libcuvs_c.so (shared library)
  • cuVS headers
  • vendored dependencies (hnswlib)

cuvs:

  • cuvs Python / Cython code and compiled Cython extensions

Size changes (CUDA 12, Python 3.12, x86_64)

| wheel | num files (before) | num files (this PR) | size (before) | size (this PR) |
|---|---|---|---|---|
| libcuvs | --- | 67 | --- | 843M |
| cuvs | 88 | 84 | 845M | 2M |
| TOTAL | 88 | 131 | 845M | 845M |

NOTES: size = compressed, "before" = 2025-01-22 nightlies

How I calculated those:
docker run \
    --rm \
    --network host \
    --env RAPIDS_NIGHTLY_DATE=2025-01-22 \
    --env CUVS_NIGHTLY_SHA=f1de1b2 \
    --env CUVS_PR="pull-request/594" \
    --env CUVS_PR_SHA="97c56178cd0e07e4b6b138bb0904af78379f1bb3" \
    --env RAPIDS_PY_CUDA_SUFFIX=cu12 \
    --env WHEEL_DIR_BEFORE=/tmp/wheels-before \
    --env WHEEL_DIR_AFTER=/tmp/wheels-after \
    -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.12 \
    bash

# --- nightly wheels --- #
mkdir -p ./wheels-before

export RAPIDS_BUILD_TYPE=branch
export RAPIDS_REF_NAME="branch-25.02"

# cuvs
RAPIDS_PY_WHEEL_NAME="cuvs_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuvs \
RAPIDS_SHA=${CUVS_NIGHTLY_SHA} \
    rapids-download-wheels-from-s3 python ./wheels-before

# --- wheels from CI --- #
mkdir -p ./wheels-after

export RAPIDS_BUILD_TYPE="pull-request"

# libcuvs
RAPIDS_PY_WHEEL_NAME="libcuvs_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuvs \
RAPIDS_REF_NAME="${CUVS_PR}" \
RAPIDS_SHA="${CUVS_PR_SHA}" \
    rapids-download-wheels-from-s3 cpp ./wheels-after

# cuvs
RAPIDS_PY_WHEEL_NAME="cuvs_${RAPIDS_PY_CUDA_SUFFIX}" \
RAPIDS_REPOSITORY=rapidsai/cuvs \
RAPIDS_REF_NAME="${CUVS_PR}" \
RAPIDS_SHA="${CUVS_PR_SHA}" \
    rapids-download-wheels-from-s3 python ./wheels-after

pip install pydistcheck
pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-before/*.whl \
| grep -E '^checking|files: | compressed' \
> ./before.txt

# get more exact sizes
du -sh ./wheels-before/*

pydistcheck \
    --inspect \
    --select 'distro-too-large-compressed' \
    ./wheels-after/*.whl \
| grep -E '^checking|files: | compressed' \
> ./after.txt

# get more exact sizes
du -sh ./wheels-after/*
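
The `du -sh` calls above print one size per wheel. As a rough cross-check of a TOTAL row, the per-directory sum could be computed with something like the sketch below. The function name `total_wheel_bytes` is hypothetical (it is not part of the RAPIDS CI tooling), and it reports raw bytes rather than the rounded values `du -sh` prints.

```shell
# Hypothetical helper: sum the on-disk (compressed) size, in bytes,
# of every *.whl file directly inside the directory given as $1.
# Prints 0 when the directory contains no wheels.
total_wheel_bytes() {
    local dir="$1" total=0 size whl
    for whl in "${dir}"/*.whl; do
        [ -e "${whl}" ] || continue   # the glob matched nothing
        size=$(wc -c < "${whl}")      # byte count of this wheel
        total=$(( total + size ))
    done
    printf '%s\n' "${total}"
}

# Example usage (directories as created by the script above):
#   total_wheel_bytes ./wheels-before
#   total_wheel_bytes ./wheels-after
```

Comparing the two numbers gives the before/after difference in total compressed size without relying on rounded, human-readable output.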

How I tested this

jameslamb added the improvement, non-breaking, and DO NOT MERGE labels on Jan 21, 2025
copy-pr-bot bot commented Jan 21, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

endif()

# --- RAFT ---#
set(CUVS_USE_RAFT_STATIC OFF)
jameslamb (Member, Author) commented:

One of the wheel-test jobs (CUDA 11.8, arm64) is failing, with a few cases like this:

FAILED python/cuvs/cuvs/test/test_ivf_pq.py::test_ivf_pq_search_params[params2] - cuvs.common.exceptions.CuvsException: cuBLAS error encountered at: file=/pyenv/versions/3.10.16/lib/python3.10/site-packages/libraft/include/raft/linalg/detail/cublaslt_wrappers.hpp line=261: call='cublasLtMatmul(resource::get_cublaslt_handle(res), mm_desc->desc, alpha, a_ptr, mm_desc->a, b_ptr, mm_desc->b, beta, c_ptr, mm_desc->c, c_ptr, mm_desc->c, &(mm_desc->heuristics.algo), nullptr, 0, stream)', Reason=13:CUBLAS_STATUS_EXECUTION_FAILED
Obtained 49 stack frames
#1 in /pyenv/versions/3.12.8/lib/python3.12/site-packages/libcuvs/lib64/libcuvs.so: raft::cublas_error::cublas_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0xb0 [0xfffdca6a5360]
#2 in /pyenv/versions/3.12.8/lib/python3.12/site-packages/libcuvs/lib64/libcuvs.so: void raft::linalg::detail::legacy_matmul<false, float, float, float, float>(raft::resources const&, bool, bool, unsigned long, unsigned long, unsigned long, float const*, float const*, unsigned long, float const*, unsigned long, float const*, float*, unsigned long, CUstream_st*) +0x6c0 [0xfffdca6c4900]

(build link)

All the failures are in one set of test cases:

FAILED python/cuvs/cuvs/test/test_ivf_pq.py

I don't think those are related to this PR... I see them on other PRs like #593 (which only removes whitespace):
https://github.com/rapidsai/cuvs/actions/runs/12875298461/job/35896833763?pr=593
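
One way to confirm that failures are confined to a single test file is to group the `FAILED` lines of a saved pytest log by file. The sketch below assumes the log uses the `FAILED <path>::<test> - <reason>` layout shown above; the function name `failed_tests_by_file` and the log file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: read pytest output on stdin and print
# "<count> <test file>" for every file with at least one FAILED line.
failed_tests_by_file() {
    grep '^FAILED ' \
        | sed -E 's/^FAILED ([^: ]+).*/\1/' \
        | sort \
        | uniq -c \
        | awk '{ print $1, $2 }'
}

# Example usage, assuming the CI log was saved locally:
#   failed_tests_by_file < wheel-tests.log
```

If every failure comes from `test_ivf_pq.py`, the output is a single line.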

jameslamb (Member, Author) commented:

This should be fixed by rapidsai/raft#2548... will leave this thread open until I confirm that.

jameslamb (Member, Author) commented:

I want to see CUDA 11 wheel tests actually pass in CI on this PR before resolving this thread... but that requires merging #601 which requires merging #596.

SKBUILD_EXTRA_CMAKE_ARGS="${EXTRA_CMAKE_ARGS}"
if [[ "${EXTRA_CMAKE_ARGS}" != *"DFIND_CUVS_CPP"* ]]; then
    SKBUILD_EXTRA_CMAKE_ARGS="${SKBUILD_EXTRA_CMAKE_ARGS};-DFIND_CUVS_CPP=ON"
fi
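
The snippet above appends `-DFIND_CUVS_CPP=ON` only when the caller has not already set that option. The same append-if-missing pattern could be sketched as a reusable function. The name `append_cmake_arg_if_missing` is hypothetical and is not part of this PR; unlike the snippet above, this sketch also avoids a leading `;` when the existing list is empty.

```shell
# Hypothetical helper: append a CMake argument to a semicolon-separated
# list unless the list already mentions the option.
#   $1: existing arguments (may be empty)
#   $2: option name to look for, e.g. DFIND_CUVS_CPP
#   $3: full argument to append, e.g. -DFIND_CUVS_CPP=ON
append_cmake_arg_if_missing() {
    local existing="$1" name="$2" arg="$3"
    case "${existing}" in
        *"${name}"*) printf '%s\n' "${existing}" ;;         # caller's value wins
        "")          printf '%s\n' "${arg}" ;;              # nothing to join with
        *)           printf '%s\n' "${existing};${arg}" ;;
    esac
}

# Example usage:
#   SKBUILD_EXTRA_CMAKE_ARGS=$(append_cmake_arg_if_missing \
#       "${EXTRA_CMAKE_ARGS}" "DFIND_CUVS_CPP" "-DFIND_CUVS_CPP=ON")
```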
jameslamb (Member, Author) commented:

Just picking a mostly-arbitrary place on the diff to start this thread... the rust build is failing here.

  warning: argument unused during compilation: '-L/opt/conda/envs/rust/targets/x86_64-linux/lib' [-Wunused-command-line-argument]
  warning: argument unused during compilation: '-L/opt/conda/envs/rust/targets/x86_64-linux/lib/stubs' [-Wunused-command-line-argument]
  /usr/include/limits.h:124:16: fatal error: 'limits.h' file not found
  clang diag: warning: argument unused during compilation: '-L/opt/conda/envs/rust/targets/x86_64-linux/lib' [-Wunused-command-line-argument]
  clang diag: warning: argument unused during compilation: '-L/opt/conda/envs/rust/targets/x86_64-linux/lib/stubs' [-Wunused-command-line-argument]
  thread 'main' panicked at cuvs-sys/build.rs:97:10:
  Unable to generate cagra_c bindings: ClangDiagnostic("/usr/include/limits.h:124:16: fatal error: 'limits.h' file not found\n")

(build link)

I strongly suspect it's related to this PR, because I've seen it on multiple runs and because that build is succeeding on branch builds and other PRs. For example, it just passed 6 minutes ago on #596: https://github.com/rapidsai/cuvs/actions/runs/12895689316/job/35960708413?pr=596

jameslamb (Member, Author) commented:

I'm able to reproduce this locally.

docker run \
    --gpus 1 \
    --rm \
    --env RAPIDS_BUILD_TYPE=pull-request \
    --env RAPIDS_REF_NAME=pull-request/594 \
    --env RAPIDS_REPOSITORY=rapidsai/cuvs \
    --env RAPIDS_SHA=97c56178cd0e07e4b6b138bb0904af78379f1bb3 \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/ci-conda \
    bash

ci/build_rust.sh

Working on a fix.

jameslamb (Member, Author) commented:

Found it! Looks like having cython in the rapids_build group in dependencies.yaml was load-bearing 😅

With it in that group, the build environment for rust builds gets clang-19. Without it, it gets clang==8.0.0.

I'll update the pin here.
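
A silent downgrade like this one (clang 19 to clang 8.0.0) could be caught early by checking the compiler version at the start of the build script. The sketch below is only an illustration: the function names are hypothetical, and the minimum of 16 in the usage comment is an arbitrary placeholder, not the pin used in this PR.

```shell
# Hypothetical helper: extract the major version from
# `clang --version` output read on stdin.
clang_major_version() {
    sed -n -E 's/.*clang version ([0-9]+)\..*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if the major version parsed
# from stdin is at least the minimum given as $1.
clang_at_least() {
    local minimum="$1" major
    major=$(clang_major_version)
    [ -n "${major}" ] && [ "${major}" -ge "${minimum}" ]
}

# Example usage (16 is a placeholder minimum):
#   clang --version | clang_at_least 16 || { echo "clang too old" >&2; exit 1; }
```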

jameslamb (Member, Author) commented:

Pushed 86648b9 updating this.

Normally I'd say that should be its own PR, but since we're so CI-constrained right now and since this change only matters for the benefit of this PR, I think we should just include it in the scope here.

jameslamb (Member, Author) commented:

Looks like clang is needed to build the docs here too:

Unable to find libclang: "couldn't find any valid shared libraries matching: ['libclang.so', 'libclang-*.so', 'libclang.so.*', 'libclang-*.so.*'], set the LIBCLANG_PATH environment variable to a path where one of these files can be found (invalid: [])"

(failed docs build link)

That's because the Rust code is rebuilt in docs builds:

cuvs/ci/build_docs.sh

Lines 44 to 48 in 1c91e1f

rapids-logger "Build Rust docs"
pushd rust
export LIBCLANG_PATH=$(dirname $(find /opt/conda -name libclang.so | head -n 1))
cargo doc -p cuvs --no-deps
popd

Added it in baae0a7
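
In the `build_docs.sh` lines quoted above, `LIBCLANG_PATH` ends up as `.` when `find` matches nothing, so the problem only surfaces later as the "Unable to find libclang" error. A variant that fails at the lookup step might look like the sketch below. The function name `find_libclang_dir` is hypothetical and is not what the repository uses.

```shell
# Hypothetical helper: print the directory containing the first
# libclang.so found under the root given as $1.
# Returns non-zero and writes a message to stderr when none is found.
find_libclang_dir() {
    local root="$1" lib
    lib=$(find "${root}" -name 'libclang.so' 2>/dev/null | head -n 1)
    if [ -z "${lib}" ]; then
        echo "libclang.so not found under ${root}" >&2
        return 1
    fi
    dirname "${lib}"
}

# Example usage. Assigning first and exporting afterwards keeps the
# function's exit status visible; `export VAR=$(...)` would mask it.
#   LIBCLANG_PATH=$(find_libclang_dir /opt/conda) || exit 1
#   export LIBCLANG_PATH
```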

jameslamb changed the title from "WIP: [DO NOT MERGE] introduce libcuvs wheels" to "introduce libcuvs wheels" on Jan 22, 2025
jameslamb requested a review from bdice on January 22, 2025 19:38
jameslamb marked this pull request as ready for review on January 22, 2025 19:38
jameslamb requested review from a team as code owners on January 22, 2025 19:38
bdice (Contributor) left a comment:

All looks fine to me. One nit (an extra blank line), please just wrap that into your next commit that reverts the commented C++ tests.

bdice added a commit to rapidsai/devcontainers that referenced this pull request Jan 22, 2025
Contributes to rapidsai/build-planning#33

Adjusts `rapids-build-utils` manifest for release 25.02 to account for
the introduction of new `libcuvs` wheels
(rapidsai/cuvs#594).

## Notes for Reviewers

This shouldn't be merged still pointing at my forks. Plan:

1. see CI pass here
2. see all CI except devcontainers pass on
rapidsai/cuvs#594
3. point this PR back at upstream and admin-merge it
4. re-run devcontainers CI on rapidsai/cuvs#594
and see it pass

---------

Co-authored-by: Bradley Dice <[email protected]>
bdice (Contributor) left a comment:

Final touches look good. Re-approving.

bdice (Contributor) commented Jan 23, 2025

I will pre-emptively merge this. All CI has passed except for one C++ job which we expect to take a while (see #602 for a proposed improvement to runtime).

bdice (Contributor) commented Jan 23, 2025

/merge

@rapids-bot rapids-bot bot merged commit b62b11a into rapidsai:branch-25.02 Jan 23, 2025
62 checks passed
jameslamb added a commit to rapidsai/rapids-metadata that referenced this pull request Jan 24, 2025
Proposes updating the metadata here to account for new wheels added in
the 25.02 release cycle.

This shouldn't be merged until all of these are:

* [x] `libcugraph` (rapidsai/cugraph#4804)
* [x] `libcuml` (rapidsai/cuml#6199)
* [x] `libcuvs` (rapidsai/cuvs#594)
Labels: ci, CMake, cpp, improvement, non-breaking, Python