Need to update the version of llama.cpp #3305
Hi @wapleeeeee, thanks for trying Tabby. May I ask about your interest in using this model?

As for the update of the llama.cpp server, we generally update it to a newer version along with Tabby releases, and we are currently working on that.

Please also note that Tabby supports the Model HTTP API: you can manually set up a llama.cpp or Ollama server and connect to it through the Model HTTP API. For more information, please refer to the doc: https://tabby.tabbyml.com/docs/references/models-http-api/llama.cpp/
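For reference, a minimal config.toml sketch for pointing Tabby at an externally hosted llama.cpp server via the Model HTTP API might look roughly like this; the endpoint URL and prompt template below are illustrative assumptions, and the exact field values should be checked against the linked doc:

```toml
# Hedged sketch: connect Tabby to a separately running llama.cpp server
# (e.g. one started with `llama-server`) through the Model HTTP API.
# The port and FIM prompt template are placeholder assumptions.
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8080"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>"  # adjust to the chosen model's FIM format

[model.chat.http]
kind = "openai/chat"
api_endpoint = "http://localhost:8080/v1"
```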
Thanks so much for your reply! Actually, we are going to use MiniCPM3-4B for our product. Before that, we need to test the coding ability of MiniCPM3-4B. I used the Model HTTP API successfully with vLLM, thanks for your advice. I tried to modify ~/.tabby/config.toml, but it didn't seem to work. Is there any way to solve that? Here's my request:
Here is my error:
I set the ~/.tabby/config.toml with
but the prompt_template doesn't seem to work. I checked the request received by the vLLM server and found that it was:
The "suffix" param causes the 400 Bad Request. I recheck the document completion part but there is not any cases or Instructions. How can I solve this? |
Hi @wapleeeeee, it's great that Tabby can help! We have looked into the inference backend support and found that vLLM claims to be OpenAI compatible, but it does not actually implement the `suffix` parameter. The OpenAI completion kind is marked as legacy by OpenAI, and different services have their own implementations, so we may have to look deeper into how the OpenAI completion kind is implemented and figure out a solution. I also noticed that you created a discussion about this; let's leave this issue to the update of llama.cpp and discuss the API support in #3323.
Please describe the feature you want
I want to use the local model [MiniCPM3-4B] for testing, but this error appeared:
llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:110: <chat>: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'minicpm3'
The latest version of Tabby only supports llama.cpp @ 5ef07e2, which was last updated 2 months ago.
I've noticed there was a PR in llama.cpp last month: ggml-org/llama.cpp#9322.
I wonder if you can update the llama.cpp version in the next release of Tabby.