Need to update the version of llama.cpp #3305

Closed
wapleeeeee opened this issue Oct 23, 2024 · 4 comments · Fixed by #3347

Please describe the feature you want

I want to use the local model MiniCPM3-4B for testing, but this error appeared:
llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:110: <chat>: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'minicpm3'

The latest version of Tabby only supports llama.cpp @ 5ef07e2, which was last updated two months ago.

I've noticed there was a related PR in llama.cpp last month: ggml-org/llama.cpp#9322.

I wonder if you can update the llama.cpp version in the next release of Tabby.

wapleeeeee added the enhancement (New feature or request) label on Oct 23, 2024
zwpaper (Member) commented on Oct 24, 2024

Hi @wapleeeeee, thanks for trying Tabby.

May I know what your interest is in using MiniCPM3-4B with Tabby?

As for updating the llama.cpp server: we generally bump it to a newer version along with Tabby releases. We are currently working on the v0.19.0 release, and I believe we can handle this update in the next release if necessary.

Please also note that Tabby supports a Model HTTP API: you can manually set up a llama.cpp or Ollama server and connect Tabby to it via the Model HTTP API. For more information, please refer to the doc: https://tabby.tabbyml.com/docs/references/models-http-api/llama.cpp/
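
For example, a minimal ~/.tabby/config.toml sketch for pointing code completion at a locally running llama.cpp server could look like the snippet below. The kind, port, and FIM prompt template follow the pattern in that doc and are assumptions to adapt to your own model and setup:

[model.completion.http]
# assumes a llama.cpp server is already running, e.g. `llama-server -m minicpm3-4b.gguf --port 8012`
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8012"
# fill-in-the-middle template; replace these tokens with the ones your model expects
prompt_template = "<PRE>{prefix}<SUF>{suffix}<MID>"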

wapleeeeee (Author) commented

Thanks so much for your reply!

Actually, we are going to use MiniCPM3-4B in our product. Before that, we need to test its coding ability.
We found Tabby to be a great tool that can both help us with coding and test for potential risks.

I used the Model HTTP API successfully with vLLM. Thanks for your advice.
But now there's a problem: vLLM can't accept the {"input": {"prefix": xx, "suffix": xx}} format for 'v1/completions' ('v1/chat/completions' does work).

I tried to modify ~/.tabby/config.toml, but it didn't seem to work. Is there any way to solve this?

Here's my request:

curl -X 'POST' -H 'Authorization: Bearer token-abc123' \
  'http://localhost:8015/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "language": "python",
  "segments": {
    "prefix": "def fib(n):\n    ",
    "suffix": "\n        return fib(n - 1) + fib(n - 2)"
  }
}'

Here is my error:

{"object":"error","message":"[{'type': 'missing', 'loc': ('body', 'model'), 'msg': 'Field required', 'input': {'language': 'python', 'segments': {'prefix': 'def fib(n):\\n    ', 'suffix': '\\n        return fib(n - 1) + fib(n - 2)'}}}, {'type': 'missing', 'loc': ('body', 'prompt'), 'msg': 'Field required', 'input': {'language': 'python', 'segments': {'prefix': 'def fib(n):\\n    ', 'suffix': '\\n        return fib(n - 1) + fib(n - 2)'}}}, {'type': 'extra_forbidden', 'loc': ('body', 'language'), 'msg': 'Extra inputs are not permitted', 'input': 'python'}, {'type': 'extra_forbidden', 'loc': ('body', 'segments'), 'msg': 'Extra inputs are not permitted', 'input': {'prefix': 'def fib(n):\\n    ', 'suffix': '\\n        return fib(n - 1) + fib(n - 2)'}}]","type":"BadRequestError","param":null,"code":400}

wapleeeeee (Author) commented

I set up ~/.tabby/config.toml with:

[model.completion.http]
kind = "openai/completion"
model_name = "minicpm3-4b"
api_endpoint = "http://localhost:8015/v1"
api_key = "xxx"
max_tokens = 256
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

[model.chat.http]

but the prompt_template doesn't seem to work.

I checked the request received by the vLLM server and found it was:

b'{"model":"minicpm3-4b","prompt":"def fibonacci(n):\\n    if n >=","suffix":" 1:","max_tokens":64,"temperature":0.1,"stream":true,"presence_penalty":0.0}'

The "suffix" param causes the 400 Bad Request.

I rechecked the completion part of the documentation, but there aren't any examples or instructions for this case.

How can I solve this?

zwpaper (Member) commented on Oct 25, 2024

Hi @wapleeeeee, it's great that Tabby can help!

We have looked into the inference backend support and found that vLLM claims to be OpenAI-compatible, but it does not actually implement support for the suffix field.

The OpenAI completion kind is marked as legacy by OpenAI, and different services have their own implementations. We may have to look deeper into the implementation of the OpenAI completion kind and figure out a solution for it.

I also noticed that you created a discussion about this. Let's keep this issue for the llama.cpp update and discuss the API support in #3323.
