Cortex.cpp: Local Engines and Dependencies #1117
Replies: 10 comments
-
From my perspective, we should download the CUDA toolkit separately. We support multiple engines, cortex.llamacpp and cortex.tensorrt-llm, and both need the CUDA toolkit to run. CUDA is backward compatible, so we only need the latest CUDA toolkit version supported by the installed NVIDIA driver.
Edit: I just checked the CUDA compatibility matrix, and it is incorrect that CUDA is always backward compatible. Related ticket: https://github.com/janhq/cortex/issues/1047
Edit 2: The image above shows forward compatibility between CUDA and the NVIDIA driver version.
So yes, CUDA is backward compatible within a CUDA major release.
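The compatibility rule above can be sketched as a small check. This is illustrative only: the simplified rule (same major release, driver minor at least the toolkit minor) and the version pairs in the example are assumptions drawn from this discussion, not an authoritative NVIDIA matrix.

```python
def toolkit_runs_on_driver(toolkit: str, driver_cuda: str) -> bool:
    """Return True if a CUDA toolkit build should run on a driver whose
    maximum supported CUDA version is `driver_cuda`.

    Simplified rule from the discussion: within the same CUDA major
    release, newer drivers are backward compatible with older toolkits.
    """
    t_major, t_minor = (int(x) for x in toolkit.split(".")[:2])
    d_major, d_minor = (int(x) for x in driver_cuda.split(".")[:2])
    # Same major release, and the driver supports at least this minor.
    return t_major == d_major and (d_major, d_minor) >= (t_major, t_minor)

# Example: a driver reporting CUDA 12.4 can run a 12.2 toolkit build,
# but not an 11.8 build (different major) or a 12.6 build (too new).
print(toolkit_runs_on_driver("12.2", "12.4"))  # True
print(toolkit_runs_on_driver("11.8", "12.4"))  # False
```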
-
I'm referring to this table to check the compatibility between the driver and the toolkit.
-
Can I verify my understanding of the issue?

**Decision**

**My initial thoughts**

This will be disk-space inefficient. However, the alternative seems to be dependency hell, which I think is even worse.

**Folder Structure**

That said, I am open to all ideas, especially @vansangpfiev's.
-
If disk-space inefficiency is acceptable, I think we can go with option 1.
-
Thanks @vansangpfiev and @dan-homebrew. I'm confirming that we agree on:

Question 2: storing CUDA dependencies under the corresponding engines.

Caveats:

Additional thought
-
```
/.cortex
  /deps
    /cuda
      cuda-11.5 (or whatever versioning)
  /engines
    /cortex.llamacpp
      /bin
    /cortex.tensorrt-llm
      /bin
```
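The layout above could be resolved in code roughly like this. A minimal sketch: the directory names come from the proposed tree, while the function name and the returned dictionary shape are hypothetical.

```python
from pathlib import Path


def engine_paths(cortex_home: str, engine: str, cuda_version: str) -> dict:
    """Build the paths implied by the proposed /.cortex layout.

    Sketch only: assumes shared CUDA deps live under /deps/cuda and
    each engine keeps its binaries under /engines/<name>/bin.
    """
    root = Path(cortex_home)
    return {
        "cuda_deps": root / "deps" / "cuda" / f"cuda-{cuda_version}",
        "engine_bin": root / "engines" / engine / "bin",
    }


paths = engine_paths("/.cortex", "cortex.llamacpp", "11.5")
print(paths["cuda_deps"])   # /.cortex/deps/cuda/cuda-11.5
print(paths["engine_bin"])  # /.cortex/engines/cortex.llamacpp/bin
```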
-
@0xSage, here's my thought. Please correct me if I'm wrong, @nguyenhoangthuan99 @vansangpfiev.
-
For 3, I think we can handle maintenance and updates via versioning: generate a file (for example, version.txt) for each release, containing metadata for the engine version and CUDA version. We will update the CUDA dependencies if needed.
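Such a version file could be as simple as key-value lines. A sketch of reading it; note that the exact keys (`engine_version`, `cuda_version`) and the `key: value` format are assumptions, not a settled design:

```python
def parse_version_file(text: str) -> dict:
    """Parse release metadata of `key: value` lines.

    Hypothetical format for the version.txt idea above; blank lines
    and lines without a colon are ignored.
    """
    meta = {}
    for line in text.splitlines():
        line = line.strip()
        if line and ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta


sample = """\
engine_version: 0.1.34
cuda_version: 12.2
"""
print(parse_version_file(sample))
# {'engine_version': '0.1.34', 'cuda_version': '12.2'}
```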
-
@vansangpfiev @namchuai @0xSage Quick responses:

**Per-Engine Dependencies**

I also agree with @vansangpfiev: let's co-locate all CUDA dependencies with the engine folder. Simple > complex, especially since model files are >4 GB.

**Updating Engines**

I also think we need to think through the CLI and API commands.

**Naming**

I wonder whether it would be better for us to have clearer naming for Cortex engines. This articulates the concept of Cortex engines more clearly. Hopefully, with a clear API, the community can also step in to help build backends. We would need to reason through:
-
**Motivation**

Do we package the CUDA toolkit with the engine?

- Yes? Then we will have to do the same for `llamacpp`, `tensorrt-llm`, and `onnx`?
- No? Then we will download it separately.
Folder structures (e.g. if a user has llamacpp and tensorrt-llm at the same time)?
**Resources**

- Llamacpp release
Currently we are downloading the toolkit dependency via:
`https://catalog.jan.ai/dist/cuda-dependencies/<version>/<platform>/cuda.tar.gz`
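Filling in the `<version>` and `<platform>` placeholders is then a one-liner. A sketch; the concrete platform string (`linux-amd64`) in the example is an assumption, as the thread does not spell out the platform naming:

```python
def cuda_deps_url(version: str, platform: str) -> str:
    """Build the download URL for the packaged CUDA dependencies,
    following the pattern quoted above."""
    return (
        "https://catalog.jan.ai/dist/cuda-dependencies/"
        f"{version}/{platform}/cuda.tar.gz"
    )


print(cuda_deps_url("12.2", "linux-amd64"))
# https://catalog.jan.ai/dist/cuda-dependencies/12.2/linux-amd64/cuda.tar.gz
```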
cc @vansangpfiev @nguyenhoangthuan99 @dan-homebrew
Update sub-tasks:
**Related**

- `cortex engines` commands #1072