Referring to this doc (https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/lora.md), llama-7b with two LoRAs runs successfully.
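For context, this is roughly how I invoke the example client with the LoRA weights attached to the request (a sketch based on my reading of docs/lora.md; the `--lora-path`/`--lora-task-id` flag names and all paths here are assumptions taken from that doc, not verified):

```bash
# Sketch of the request that works for LoRAs under 2 GB:
# the converted LoRA weights are serialized into the request itself.
python3 inflight_batcher_llm_client.py \
    --url localhost:8001 \
    --tokenizer-dir /path/to/llama-7b-hf \
    --text "Hello" \
    --request-output-len 64 \
    --lora-path /path/to/converted-lora \
    --lora-task-id 1
```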
But when my LoRA model is larger than 2 GB, sending the weights with the input fails with an error.
It seems that a LoRA larger than 2 GB is not supported this way, presumably because a serialized protobuf message cannot exceed 2 GB. Is that right?
When I launch tritonserver with lora_prefetch_dir set in tensorrt_llm/config.pbtxt and run inflight_batcher_llm_client.py with only the task id, I get an error (my config and client invocation are sketched below).
tritonserver also reports an error on its side.
How can I solve this? And can I load the LoRA weights once, at tritonserver launch time?
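For reference, this is the setup that fails. The parameter uses standard config.pbtxt parameter syntax; the /models/loras path is my own, and I am assuming lora_prefetch_dir is consumed as a string parameter like this:

```protobuf
# In tensorrt_llm/config.pbtxt (sketch; the path is illustrative)
parameters: {
  key: "lora_prefetch_dir"
  value: {
    string_value: "/models/loras"
  }
}
```

and the client call with only the task id (same assumed flags as above):

```bash
# No --lora-path this time: only the task id, expecting the
# prefetched weights to be found under lora_prefetch_dir.
python3 inflight_batcher_llm_client.py \
    --url localhost:8001 \
    --tokenizer-dir /path/to/llama-7b-hf \
    --text "Hello" \
    --request-output-len 64 \
    --lora-task-id 1
```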