Need to update the version of llama.cpp #3305

Closed
wapleeeeee opened this issue Oct 23, 2024 · 4 comments · Fixed by #3347

Please describe the feature you want

I want to use the local model MiniCPM3-4B for testing, but this error appeared:
llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:110: <chat>: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'minicpm3'

The latest version of Tabby only supports llama.cpp @ 5ef07e2, which was last updated two months ago.

I've noticed there was a related PR in llama.cpp last month: ggml-org/llama.cpp#9322.

I wonder if you can update the llama.cpp version in the next release of Tabby.

wapleeeeee added the enhancement (New feature or request) label on Oct 23, 2024
zwpaper (Member) commented on Oct 24, 2024

Hi @wapleeeeee, thanks for trying Tabby.

May I know what your interest is in using MiniCPM3-4B with Tabby?

As for updating the llama.cpp server: we generally bump it to a newer version along with Tabby releases. We are currently working on the v0.19.0 release, and I believe we can handle this update in the next release if necessary.

Please also note that Tabby supports a Model HTTP API: you can manually set up a llama.cpp or Ollama server and connect Tabby to it via the Model HTTP API. For more information, please refer to the doc: https://tabby.tabbyml.com/docs/references/models-http-api/llama.cpp/
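
For example, a minimal ~/.tabby/config.toml sketch for pointing code completion at a locally running llama.cpp server could look like the snippet below. The kind, port, and FIM prompt template follow the pattern in that doc and are assumptions to adapt to your own model and setup:

[model.completion.http]
# assumes a llama.cpp server is already running, e.g. `llama-server -m minicpm3-4b.gguf --port 8012`
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8012"
# fill-in-the-middle template; replace these tokens with the ones your model expects
prompt_template = "<PRE>{prefix}<SUF>{suffix}<MID>"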

wapleeeeee (Author) commented

Thanks so much for your reply!

Actually, we are going to use MiniCPM3-4B in our product. Before that, we need to test its coding ability.
We found Tabby to be a great tool that can both help us with coding and test for potential risks.

I used the Model HTTP API successfully with vLLM. Thanks for your advice.
But now there's a problem: vLLM can't accept the {"input": {"prefix": xx, "suffix": xx}} format for 'v1/completions' ('v1/chat/completions' does work).

I tried to modify ~/.tabby/config.toml, but it didn't seem to work. Is there any way to solve this?

Here's my request:

curl -X 'POST' -H 'Authorization: Bearer token-abc123' \
  'http://localhost:8015/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "language": "python",
  "segments": {
    "prefix": "def fib(n):\n    ",
    "suffix": "\n        return fib(n - 1) + fib(n - 2)"
  }
}'

Here is my error:

{"object":"error","message":"[{'type': 'missing', 'loc': ('body', 'model'), 'msg': 'Field required', 'input': {'language': 'python', 'segments': {'prefix': 'def fib(n):\\n    ', 'suffix': '\\n        return fib(n - 1) + fib(n - 2)'}}}, {'type': 'missing', 'loc': ('body', 'prompt'), 'msg': 'Field required', 'input': {'language': 'python', 'segments': {'prefix': 'def fib(n):\\n    ', 'suffix': '\\n        return fib(n - 1) + fib(n - 2)'}}}, {'type': 'extra_forbidden', 'loc': ('body', 'language'), 'msg': 'Extra inputs are not permitted', 'input': 'python'}, {'type': 'extra_forbidden', 'loc': ('body', 'segments'), 'msg': 'Extra inputs are not permitted', 'input': {'prefix': 'def fib(n):\\n    ', 'suffix': '\\n        return fib(n - 1) + fib(n - 2)'}}]","type":"BadRequestError","param":null,"code":400}

wapleeeeee (Author) commented

I set up ~/.tabby/config.toml with:

[model.completion.http]
kind = "openai/completion"
model_name = "minicpm3-4b"
api_endpoint = "http://localhost:8015/v1"
api_key = "xxx"
max_tokens = 256
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

[model.chat.http]

but the prompt_template doesn't seem to work.

I checked the request received by the vLLM server and found it was:

b'{"model":"minicpm3-4b","prompt":"def fibonacci(n):\\n    if n >=","suffix":" 1:","max_tokens":64,"temperature":0.1,"stream":true,"presence_penalty":0.0}'

The "suffix" param causes the 400 Bad Request.

I rechecked the completion part of the documentation, but there aren't any examples or instructions for this case.

How can I solve this?

zwpaper (Member) commented on Oct 25, 2024

Hi @wapleeeeee, it's great that Tabby can help!

We have looked into the inference backend support and found that vLLM claims to be OpenAI-compatible, but it does not actually implement support for the suffix field.

The OpenAI completion kind is marked as legacy by OpenAI, and different services have their own implementations. We may have to look deeper into the implementation of the OpenAI completion kind and figure out a solution for it.

I also noticed that you created a discussion about this. Let's keep this issue for the llama.cpp update and discuss the API support in #3323.
