
Consistency changes #408

Merged · 1 commit merged into main from container-consistency on Nov 5, 2024

Conversation

@ericcurtin (Collaborator)

Ensure the llama.cpp version is the same across containers. Remove some duplicate actions, etc.

@ericcurtin (Collaborator, Author)

PTAL @bmahabirbu. FWIW, I didn't test the CUDA changes; I don't have the hardware.

@bmahabirbu (Collaborator) commented Nov 4, 2024

I tested it and it crashed before loading the model with a segmentation fault. I originally thought it had something to do with the linker not picking up the correct libggml.so file, but it's actually due to the second `cmake --install` line for whisper. It overwrites some llama files despite the mv commands, which causes the issues:

```
mv build/bin/main /usr/bin/whisper-main && mv build/bin/server /usr/bin/whisper-server && \
if [ -f build/lib/libwhisper.so ]; then mv build/lib/libwhisper.so /usr/lib/; fi && \
cmake --install build && \
mv build/bin/main ${INSTALL_PREFIX}/bin/whisper-main && \
```
@bmahabirbu (Collaborator) commented Nov 4, 2024

Line 34:

```
cmake --install build && \
```

Comment this out and we're good to go.

@ericcurtin (Collaborator, Author)

This line installs the libwhisper stuff:

```
$ ls /usr/lib64/libw*
/usr/lib64/libwhisper.so  /usr/lib64/libwhisper.so.1  /usr/lib64/libwhisper.so.1.7.1
```

@ericcurtin force-pushed the container-consistency branch from b1a89f7 to db9501e on November 4, 2024 at 21:43
```diff
@@ -46,16 +45,14 @@ ARG HUGGINGFACE_HUB_VERSION=0.26.2
 ARG OMLMD_VERSION=0.1.6

 # Install minimal runtime dependencies
-RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
-    dnf install -y python3 python3-pip && dnf clean all && rm -rf /var/cache/*dnf*
+RUN dnf install -y python3 python3-pip nvidia-driver-cuda-libs && \
```
@ericcurtin (Collaborator, Author) commented Nov 4, 2024

I had to add this (nvidia-driver-cuda-libs) for /lib64/libcuda.so.1:

```
$ ldd /usr/bin/llama-cli
	linux-vdso.so.1 (0x00007f219dc64000)
	libllama.so => /lib64/libllama.so (0x00007f219dae4000)
	libggml.so => /lib64/libggml.so (0x00007f2188000000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f2187c00000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f219da09000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f219d9ee000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f2187800000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f219dc66000)
	libcudart.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12 (0x00007f2187400000)
	libcublas.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.12 (0x00007f2180a00000)
	libcublasLt.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.12 (0x00007f215f000000)
	libcuda.so.1 => /lib64/libcuda.so.1 (0x00007f215c000000)
	libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f219d9a5000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f219d99e000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f219d999000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f219d994000)
```

Which makes me wonder: how was this working before, without explicitly installing that?

@bmahabirbu (Collaborator) commented Nov 4, 2024

It's due to the CUDA container toolkit passing the library through from the NVIDIA driver on the host to the container. If you install the driver inside the container, you'll get an error like this:

```
Error: OCI runtime error: error executing hook `/usr/bin/nvidia-container-toolkit`
```
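For illustration, a minimal sketch of how the injected library can be checked from inside a container; the image name is a placeholder, and this assumes podman with the nvidia-container-toolkit CDI spec generated:

```
# Hypothetical check: the nvidia-container-toolkit hook bind-mounts the host
# driver's libcuda.so.1 into the container, so the file shows up even though
# no driver package is installed in the image.
podman run --rm --device nvidia.com/gpu=all <cuda-image> ls -l /lib64/libcuda.so.1
```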

@ericcurtin (Collaborator, Author)

Should we remove nvidia-driver-cuda-libs? Is it breaking nvidia-container-toolkit?

@ericcurtin (Collaborator, Author)

I wonder whether we can CI some of this stuff even without NVIDIA hardware in the CI runners.
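A GPU-less runner could at least build the CUDA image, since compiling against the CUDA toolkit needs no GPU; only running the binaries does. A minimal sketch, where the Containerfile path and tag are assumptions:

```
# Build-only smoke test: no GPU required on the runner.
podman build -t cuda-smoke-test -f container-images/cuda/Containerfile .
```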

@ericcurtin (Collaborator, Author)

> I tested it and it crashed before loading the model with a segmentation fault. I originally thought it had something to do with the linker not picking up the correct libggml.so file, but it's actually due to the second `cmake --install` line for whisper. It overwrites some llama files despite the mv commands, which causes the issues.

Yes, I see what you mean; we have two different versions of /usr/lib64/libggml.so clashing.

@ericcurtin (Collaborator, Author)

We must somehow align them to the same version...

@ericcurtin (Collaborator, Author) commented Nov 4, 2024

I can think of at least three options to solve this:

1. Split llama.cpp and whisper.cpp into two Containerfiles.
2. Ensure at build time of the container images that the llama.cpp and whisper.cpp versions are ABI compatible.
3. Statically link libggml.so into the whisper binaries.

@ericcurtin force-pushed the container-consistency branch from db9501e to 85ccafe on November 4, 2024 at 22:28
@ericcurtin (Collaborator, Author) commented Nov 4, 2024

I fixed it by building whisper with -DBUILD_SHARED_LIBS=NO, so using static linking for whisper.cpp (but not llama.cpp) seems like an OK option for now.
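For reference, a minimal sketch of what that whisper.cpp configure/build step might look like; everything apart from -DBUILD_SHARED_LIBS=NO is an assumption:

```
# Build whisper.cpp with static libraries so libggml is linked into the
# whisper binaries instead of installing a second, conflicting libggml.so.
cmake -B build -DBUILD_SHARED_LIBS=NO -DCMAKE_INSTALL_PREFIX=/usr
cmake --build build --config Release -j "$(nproc)"
```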

@bmahabirbu (Collaborator)

> I fixed it by building whisper with -DBUILD_SHARED_LIBS=NO, so using static linking for whisper.cpp (but not llama.cpp) seems like an OK option for now.

Nice, I'll give it a test run as well

@bmahabirbu (Collaborator)

> > I fixed it by building whisper with -DBUILD_SHARED_LIBS=NO, so using static linking for whisper.cpp (but not llama.cpp) seems like an OK option for now.
>
> Nice, I'll give it a test run as well

Confirmed working! (Had to remove the dnf install of nvidia-driver-cuda-libs, but that's just because I set my environment up with the CUDA container toolkit.)

@ericcurtin (Collaborator, Author)

> > > I fixed it by building whisper with -DBUILD_SHARED_LIBS=NO, so using static linking for whisper.cpp (but not llama.cpp) seems like an OK option for now.
> >
> > Nice, I'll give it a test run as well
>
> Confirmed working! (Had to remove the dnf install of nvidia-driver-cuda-libs, but that's just because I set my environment up with the CUDA container toolkit.)

Well, let's remove it if that's the case. Makes me wonder whether somebody could run this without nvidia-container-toolkit... but anyway, I don't want to break a working setup.

@ericcurtin force-pushed the container-consistency branch from 85ccafe to 6a48e33 on November 4, 2024 at 22:49
@bmahabirbu (Collaborator)

> > > > I fixed it by building whisper with -DBUILD_SHARED_LIBS=NO, so using static linking for whisper.cpp (but not llama.cpp) seems like an OK option for now.
> > >
> > > Nice, I'll give it a test run as well
> >
> > Confirmed working! (Had to remove the dnf install of nvidia-driver-cuda-libs, but that's just because I set my environment up with the CUDA container toolkit.)
>
> Well, let's remove it if that's the case. Makes me wonder whether somebody could run this without nvidia-container-toolkit... but anyway, I don't want to break a working setup.

Sounds good to me! I'll investigate some more to see if there is a way to run it without the container toolkit.

@ericcurtin force-pushed the container-consistency branch 7 times, most recently from f897510 to e74c28e, on November 5, 2024 at 01:36
Ensure llama.cpp version is the same across containers. Removing some duplicate actions, etc.

Signed-off-by: Eric Curtin <[email protected]>
@ericcurtin force-pushed the container-consistency branch from e74c28e to 7d657f8 on November 5, 2024 at 03:48
@ericcurtin (Collaborator, Author) commented Nov 5, 2024

PTAL @rhatdan, this is ready. A lot of this PR brings consistency to the build and install approaches in the various container images.

One thing this fixes: llama.cpp and whisper.cpp use different versions of libggml, and we were sometimes getting mismatched ABIs when a function call existed in one version but not the other. The second installed version of libggml would overwrite the first.

We fix this by statically linking libggml into the two whisper.cpp binaries, so llama.cpp can use a different version of libggml, as upstream llama.cpp/whisper.cpp intends.
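A quick, illustrative way to sanity-check the result from inside the image (binary names follow the Containerfile snippets above):

```
# With libggml statically linked, the whisper binaries should show no runtime
# dependency on libggml.so, while llama-cli still links it dynamically.
ldd /usr/bin/whisper-server | grep libggml   # expect no output
ldd /usr/bin/llama-cli | grep libggml        # expect libggml.so => /lib64/libggml.so
```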

@rhatdan merged commit 0048ee3 into main on Nov 5, 2024
12 checks passed
@ericcurtin deleted the container-consistency branch on November 5, 2024 at 14:31