I finally found proof that server output can be different (and vs Groq now) - model name: llama3 8b Instruct #6955
Comments
You are not using any seed, are you?
@x4080 Please remove RNG by enabling …
Hi, I didn't use any seed, so should I add a seed instead?
Edit: I tried using a seed on both, and the results still do not change.
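For reference, a minimal sketch of pinning the randomness on both sides by passing the same seed and a zero temperature to the CLI and to the server's `/completion` endpoint. The binary name (`./main` vs. the later `llama-cli`), the model filename, and the port are assumptions, and the two front ends can still apply different default sampler settings unless every parameter is set explicitly:

```bash
# Sketch only, not the exact commands from this thread.
# Assumes an older llama.cpp build (./main / ./server); newer releases use
# llama-cli / llama-server. Model file and port are placeholders.

# CLI: fixed seed, greedy decoding (temp 0 removes sampling randomness)
./main -m Meta-Llama-3-8B-Instruct-Q8_0.gguf -f testprompt.txt \
  --seed 42 --temp 0 -n 256

# Server: the /completion endpoint accepts the same parameters per request
curl -s http://localhost:8080/completion -d '{
  "prompt": "<contents of testprompt.txt>",
  "seed": 42,
  "temperature": 0,
  "n_predict": 256
}'
```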
Why do you expect …? Maybe try …
The model name is ignored by the llama.cpp server; I use it because this code used to call the ChatGPT API.
OK, today I tried the new GGUF fix (https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF) and updated llama.cpp.
I think this was closed prematurely, @Jeximo. Simple math questions with only one correct answer, and with sampling turned off by a zero temperature, are not a good test. Just kindly retake your test for the purpose of e.g. tweet generation instead of math operations (repeated several times with the same prompt and seed) with the exact opposite, i.e. maximized, temperature.
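A sketch of that kind of repeated test against the server, assuming it is already running on localhost:8080 and that the `/completion` endpoint's `seed` and `temperature` fields behave as documented on your build (both assumptions to verify):

```bash
# Repeat the same creative prompt several times with an identical seed but a
# high temperature; with a truly fixed seed the completions should match.
for i in 1 2 3; do
  curl -s http://localhost:8080/completion -d '{
    "prompt": "Write a short tweet about autumn.",
    "seed": 42,
    "temperature": 1.5,
    "n_predict": 64
  }' | jq -r '.content'
done
```

If the three completions differ despite the fixed seed, the divergence is coming from somewhere other than sampling noise.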
@mirekphd, I found out what makes the results differ between the server and regular llama.cpp:
Edit: and the repeat penalty.
@x4080 I think the reason for the model response randomness is even simpler here (in …)
@mirekphd, that's interesting, I didn't know that about the seed; is it a feature or a bug? What I found is about the repeat penalty: in the docs I think the default is 1.1, but in fact it is 1.0.
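One way to take the documented-vs-actual default out of the picture is to set the repeat penalty explicitly on both sides. A sketch, with flag and field names as in llama.cpp's CLI and `/completion` API and the model file and port as placeholders:

```bash
# Pin repeat_penalty (and the other samplers) explicitly so any mismatched
# defaults between the CLI and the server no longer matter.
./main -m Meta-Llama-3-8B-Instruct-Q8_0.gguf -f testprompt.txt \
  --seed 42 --temp 0 --repeat-penalty 1.1 -n 256

curl -s http://localhost:8080/completion -d '{
  "prompt": "<contents of testprompt.txt>",
  "seed": 42,
  "temperature": 0,
  "repeat_penalty": 1.1,
  "n_predict": 256
}'
```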
Yes, I can confirm your observation about the repeat penalty default. Could you file a bug report for this? And I will report the issue with the seed. On the other hand, while harder to prove, your finding is arguably more serious, because it affects all users of the high-level OpenAI API, where the …
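For the OpenAI-compatible route, a hedged sketch of what such a request looks like against llama.cpp's server (`/v1/chat/completions`). The model value is a placeholder and, as noted above, is ignored by the server; whether non-standard sampling fields such as `repeat_penalty` are honoured on this endpoint depends on the server version, so treat that as something to verify:

```bash
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3-8b-instruct",
    "messages": [{"role": "user", "content": "Write a short tweet about autumn."}],
    "temperature": 0,
    "seed": 42
  }'
# The server answers with whichever model it was started with, regardless of
# the "model" field above.
```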
I did it here: #7381
I think I filed an issue weeks ago: #7109
Hi, I don't know if this is a bug or not. Previously I noticed that the answer from the server is different from the one I get using regular llama.cpp.
Now I can prove it; here goes:
First, this is using regular llama.cpp (and also the output from Groq):
testprompt.txt
Output:
Here's using the server.
Here's the JSON used for the call:
Here's the output:
So basically the function JSON is not generated; instead it directly writes the code.
I understand that it seems impossible for the result to differ between the server and regular llama.cpp, but it did happen.
PS: I also tried Ollama, and its output is like the server one.
Is this a bug?
Thanks