
[Q] How do I use a base (completion only, no instruct/chat) model? #108

Open · NightMachinery opened this issue Apr 14, 2024 · 6 comments

@NightMachinery

I have set:

(setopt ellama-provider
        (make-llm-ollama
         :chat-model "deepseek-coder:6.7b-base-q8_0"
         :embedding-model "deepseek-coder:6.7b-base-q8_0"))

But ellama seems to be sending some kind of templated message to ollama when I run ellama-code-complete. I don't want any "prompt engineering"; I just want to feed the context near point into this base model and get its next N lines of prediction.

@NightMachinery (Author)

Indeed, looking at the logs, ellama is using a template:

[2024-04-15 00:13:02] [Emacs --> deepseek-coder:6.7b-base-q8_0]:
Interactions:
User: Continue the following code, only write new code in format ```language
...
```:
```
#!/usr/bin/env python3

# prints hello world
print('
```

@NightMachinery (Author)

ellama seems to be parsing the response as well and doing magic on it. (E.g., the response gets suddenly deleted when it finishes streaming, presumably because it wasn't in backticks.) I want to disable all magic.

@s-kostyaev (Owner)

You can use the ellama-complete command for that purpose. I don't think it will be useful, though.
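(For reference: ellama-complete completes the text in the current buffer, so the typical usage is to run M-x ellama-complete in the buffer whose text you want continued.)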

@NightMachinery (Author)

@s-kostyaev Thanks. Is there a way to limit the completion so that it stops on a newline?

@s-kostyaev (Owner)

@NightMachinery Sure. You need to create a custom model with ollama. Add the parameter:

PARAMETER stop "\n"

and create a custom model from that Modelfile, then use the newly created model. For example, I use https://ollama.com/sskostyaev/openchat:1l to create chat names.
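
As a concrete sketch, a minimal Modelfile for the base model discussed above could look like this (the custom model name below is just an example):

FROM deepseek-coder:6.7b-base-q8_0
PARAMETER stop "\n"

Build it with ollama create deepseek-coder-1l -f Modelfile, then point :chat-model at deepseek-coder-1l in the ellama-provider configuration.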

@s-kostyaev added the question (Further information is requested) label on Apr 15, 2024
@NightMachinery (Author)

@s-kostyaev Looking at the logs, ellama-complete still uses the chat API; we can see User: at the start of its request. I want to use the completion API directly:

import os
from openai import OpenAI

#: OpenRouter exposes an OpenAI-compatible API
openrouter_client = OpenAI(base_url="https://openrouter.ai/api/v1",
                           api_key=os.environ["OPENROUTER_API_KEY"])

res = openrouter_client.completions.create(
    model="mistralai/mixtral-8x22b",
    prompt="""...""",
    stream=True,
    echo=False,  #: Echo back the prompt in addition to the completion
    max_tokens=100,
)

This completion API works pretty well for completing text in my tests. With a reasonable max_tokens, it could be a viable alternative to Copilot IMO.
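
For what it's worth, ollama itself exposes a template-free completion path: its /api/generate endpoint accepts a raw flag that skips the model's prompt template. A minimal sketch (assuming a local ollama server on the default port, reusing the base model and prompt from earlier in the thread):

import json
import requests

# Stream a raw completion from a local ollama server.
# "raw": True bypasses the model's prompt template, so the
# base model simply continues the given text.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-coder:6.7b-base-q8_0",
        "prompt": "#!/usr/bin/env python3\n\n# prints hello world\nprint('",
        "raw": True,
        "stream": True,
        "options": {"num_predict": 100},  #: roughly max_tokens
    },
    stream=True,
)
for line in resp.iter_lines():
    if line:
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)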
