
[Q] How do I use a base (completion only, no instruct/chat) model? #108

Open · NightMachinery opened this issue Apr 14, 2024 · 6 comments

@NightMachinery

I have set:

(setopt ellama-provider
        (make-llm-ollama
         :chat-model "deepseek-coder:6.7b-base-q8_0"
         :embedding-model "deepseek-coder:6.7b-base-q8_0"))

But ellama seems to be sending some kind of templated message to ollama when I run ellama-code-complete. I don't want any "prompt engineering"; I just want to feed the context near point into this base model and get its next N lines of prediction.

@NightMachinery (Author)

Indeed, looking at the logs, ellama is using a template:

[2024-04-15 00:13:02] [Emacs --> deepseek-coder:6.7b-base-q8_0]:
Interactions:
User: Continue the following code, only write new code in format ```language
...
```:
```
#!/usr/bin/env python3

# prints hello world
print('
```

@NightMachinery (Author)

ellama seems to be parsing the response as well and doing magic on it. (E.g., the response gets suddenly deleted when it finishes streaming, presumably because it wasn't in backticks.) I want to disable all magic.

@s-kostyaev (Owner)

You can use the ellama-complete command for that purpose. I don't think it will be useful, though.
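(For reference: ellama-complete completes the text in the current buffer, so the typical usage is to run M-x ellama-complete in the buffer whose text you want continued.)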

@NightMachinery (Author)

@s-kostyaev Thanks. Is there a way to limit the completion so that it stops on a newline?

@s-kostyaev (Owner)

@NightMachinery Sure. You need to create a custom model with ollama. Add the parameter:

PARAMETER stop "\n"

and create a custom model from that Modelfile, then use the newly created model. For example, I use https://ollama.com/sskostyaev/openchat:1l to create chat names.
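
As a concrete sketch, a minimal Modelfile for the base model discussed above could look like this (the custom model name below is just an example):

FROM deepseek-coder:6.7b-base-q8_0
PARAMETER stop "\n"

Build it with ollama create deepseek-coder-1l -f Modelfile, then point :chat-model at deepseek-coder-1l in the ellama-provider configuration.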

@s-kostyaev added the question (Further information is requested) label on Apr 15, 2024
@NightMachinery (Author)

@s-kostyaev Looking at the logs, ellama-complete still uses the chat API; we can see User: at the start of its request. I want to use the completion API directly:

import os
from openai import OpenAI

#: OpenRouter exposes an OpenAI-compatible API
openrouter_client = OpenAI(base_url="https://openrouter.ai/api/v1",
                           api_key=os.environ["OPENROUTER_API_KEY"])

res = openrouter_client.completions.create(
    model="mistralai/mixtral-8x22b",
    prompt="""...""",
    stream=True,
    echo=False,  #: Echo back the prompt in addition to the completion
    max_tokens=100,
)

This completion API works pretty well for completing text in my tests. With a reasonable max_tokens, it could be a viable alternative to Copilot IMO.
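
For what it's worth, ollama itself exposes a template-free completion path: its /api/generate endpoint accepts a raw flag that skips the model's prompt template. A minimal sketch (assuming a local ollama server on the default port, reusing the base model and prompt from earlier in the thread):

import json
import requests

# Stream a raw completion from a local ollama server.
# "raw": True bypasses the model's prompt template, so the
# base model simply continues the given text.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-coder:6.7b-base-q8_0",
        "prompt": "#!/usr/bin/env python3\n\n# prints hello world\nprint('",
        "raw": True,
        "stream": True,
        "options": {"num_predict": 100},  #: roughly max_tokens
    },
    stream=True,
)
for line in resp.iter_lines():
    if line:
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)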
