Eventually replace ollama with vLLM and use https://www.nvidia.com/en-us/ai/ for Nvidia #21
vLLM testing: https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/vllm

Run vLLM on Jetson via the terminal, in one of two ways (sketched below):
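A minimal sketch of the two options, assuming the jetson-containers tooling is installed and that the dustynv/vllm tag matches the JetPack/L4T release on the device (the tag below is only an example):

```bash
# Option 1: let jetson-containers pick a compatible vLLM image for this JetPack version.
jetson-containers run $(autotag vllm)

# Option 2: run the prebuilt image from Docker Hub directly
# (the tag is an example; use the one matching your L4T release).
docker run --runtime nvidia -it --rm --network host dustynv/vllm:r36.4.0
```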
-> uses the image: https://hub.docker.com/r/dustynv/vllm

Serve a model in the container:
-> models are fetched from the HF model hub. If you want to run llama3.2, you first need to create an HF account, request access to the Meta models, and use an API key (HF token) inside the vLLM container, because otherwise the model cannot be downloaded.
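A sketch of serving a gated model inside the container. The token value, the model id meta-llama/Llama-3.2-1B-Instruct, and the --gpu-memory-utilization / --max-model-len values are assumptions picked for a memory-constrained Jetson, not the exact flags from the linked issues; older containers may expose the server as `python -m vllm.entrypoints.openai.api_server` instead of `vllm serve`:

```bash
# HF token is required for gated Meta models; export it inside the container.
export HF_TOKEN=hf_xxx   # placeholder, use your own token

# Start the OpenAI-compatible server; lower the memory utilization and the
# context length to reduce vLLM's allocation overhead on limited GPU memory.
vllm serve meta-llama/Llama-3.2-1B-Instruct \
  --gpu-memory-utilization 0.5 \
  --max-model-len 2048 \
  --port 8000
```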
(will reduce utilization: dusty-nv/jetson-containers#704)

I tried to serve multiple models (llama3.2 1B, qwen2.5 0.5B, deepseek) and was not able to run any of them on the hardware. Either the model is too big to run or exceptions are thrown. There is also a memory-allocation overhead in vLLM: dusty-nv/jetson-containers#795. Here is the exception:
@jbaumgartl this is the issue with ollama: dusty-nv/jetson-containers#814 (comment). Ollama is broken on all platforms; I added some comments to the issue on how you can run the service and the models.
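For reference, the usual way to start the ollama service and run a model looks roughly like the sketch below; the actual workaround steps are the ones in the linked issue comment and may differ (the model tag llama3.2:1b is only an example):

```bash
# Start the ollama server (normally managed as a background service).
ollama serve &

# Pull and run a small model interactively; llama3.2:1b is just an example tag.
ollama run llama3.2:1b
```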
https://github.com/dusty-nv/jetson-containers/blob/master/packages/llm/vllm/test.py
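The linked test.py exercises vLLM directly; once the OpenAI-compatible server from the serve step above is running, a quick smoke test can also be done with curl. The port 8000 and the model name are assumptions matching the earlier sketch:

```bash
# Query the OpenAI-compatible completions endpoint exposed by the vLLM server.
# Port 8000 is the vLLM default; the model name must match the served model.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "prompt": "Hello, Jetson!",
        "max_tokens": 32
      }'
```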