Feature Request: Proper Llama 3.1 Support in llama.cpp #8650
Comments
Also, adding to this: proper function-calling support in the server would be great, since Llama 3.1 now supports tooling/function calling. |
It looks like they've added a new EOS token called <|eom_id|>, alongside the already existing <|end_of_text|> and <|eot_id|> ones, something to look out for. |
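For anyone handling stop tokens manually, here is a minimal sketch of treating <|eom_id|> as an extra stop string, assuming the llama-cpp-python bindings (the model path and prompt are placeholders, not something from this thread):

```python
# Hypothetical sketch: treat the new <|eom_id|> token as an additional stop string
# when generating with llama-cpp-python. Model path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="Meta-Llama-3.1-8B-Instruct-Q8_0.gguf", n_ctx=8192)

output = llm.create_completion(
    prompt="<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello<|eot_id|>"
           "<|start_header_id|>assistant<|end_header_id|>\n\n",
    max_tokens=256,
    # Stop on both the usual end-of-turn token and the new end-of-message token.
    stop=["<|eot_id|>", "<|eom_id|>"],
)
print(output["choices"][0]["text"])
```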
So, what does a proper template look like now? |
The new template doesn't seem to use the new EOS token, so the existing templates should work fine AFAIK. It might only be used for tool calls or something like that; not sure yet... |
IMO, support for function calling is easier (and more stable) to do in Python, for example via ... I tried implementing the same thing for the functionary model before, but the code is very hard to maintain. Edit: people seem to misunderstand my point. What I'm trying to say is: in reality, most models are trained to call tools in Python, so the tooling must be in Python from the beginning. |
Converting llama-3.1 seems to make it set the |
Yes, currently Llama 3.1 8B seems a bit dumber than Llama 3 8B. I do not know whether it is a GGUF problem or llama.cpp itself. For instance, with the same question on https://groq.com/ I always get the proper answer (36), while locally with Llama 3.1 8B (Q8) I hardly get a proper answer once in 5 attempts. |
Do you know what parameters Groq is using? Maybe they have a lower temperature? Edit: just tested with Q8_0 at temp 0.0 and it gave me the correct result each time, but it usually fails at higher temps. |
There seems to be a change in the way RoPE is used, see: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/commit/13f04ed6f85ef2aa2fd11b960a275c3e31a8069e Also, for long contexts the model isn't working unless I use a RoPE base frequency of 8000000 for a 48K context (just an example).
|
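Until the RoPE changes land properly, that workaround amounts to overriding the base frequency at load time. A minimal sketch with the llama-cpp-python bindings (model path and values are taken from the comment above as placeholders, not a verified fix):

```python
# Rough sketch of the workaround described above: override the RoPE base frequency
# when loading the model. Values are illustrative, not tuned.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct-Q8_0.gguf",
    n_ctx=49152,               # ~48K context
    rope_freq_base=8000000.0,  # raised base frequency, as suggested in the comment above
)
```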
Same observation here. Not sure if it's an issue with the model or with llama.cpp (tested a Q6_K quant with b3438), but for now 3.1 feels way worse than 3.0: temperature 0 fails with both of those. Tested with an empty system prompt and with "You're a helpful assistant." - neither of those works well. Tried with |
I did some local tests of the Q8_0 8B model in llama.cpp with a 4096 context size, and with a low temperature (0.01) it often enters generation loops, repeating the same sentences over and over. I noticed the same problem with this model when using the OpenRouter API. Attached is an example prompt causing problems: prompt-llama-3.1.txt Command line: It also happens when using the CUDA backend: Did anyone experience similar problems? |
With temp 0 I always get 34. |
Yes... Llama 3.1 8B seems dumber even than Llama 3 8B; something is off... the GGUF, llama.cpp, or both ;) |
The |
I have just converted the model from HF to GGUF and then quantized to Q8 with the following extra options: --leave-output-tensor --token-embedding-type f16. The model seems to be responding quite well, especially since I prompt in Dutch exclusively. |
Investigation has led me to figure out why the smaug-bpe pre-tokenizer was being used instead of llama-bpe. It seems to be a problem with the transformers library not prefixing a BOS token. Example:

```python
from transformers import AutoTokenizer

llama_3_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
llama_3_1_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

text = "Hello"
print(llama_3_tokenizer.encode(text))
print(llama_3_1_tokenizer.encode(text))
```

Output:
It seems like the official code prefixes the BOS token. |
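If the 3.1 tokenizer really does skip the BOS prefix, a small workaround sketch is to prepend it manually (same tokenizer as in the snippet above):

```python
# Workaround sketch: prepend the BOS token id ourselves if the tokenizer's
# post-processor does not add it.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
ids = tok.encode("Hello", add_special_tokens=False)
if not ids or ids[0] != tok.bos_token_id:
    ids = [tok.bos_token_id] + ids  # prepend <|begin_of_text|>
print(ids)
```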
ffs ...
Edit: in exchange for function calling it's worth it, I suppose |
Dangerous assumption |
I did a Q6_K quant. First I added the model to convert_hf_to_gguf_update.py and ran it, but still got the smaug pre-tokenizer, so I just replaced smaug with llama in the convert script:
Seems to be doing fine. I don't use the llama.cpp tokenizer for BOS or the chat template; I do BOS+template myself in a modified server, with the exact same template as 3.0. Tests:
Gemma 27b gave this response to the prompt:
Math also looks OK:
The goldcoin thing also works.
|
I ran some quick benches on Llama 3.1 and it does look to be giving a performance boost over 3. As far as I am aware, the long-RoPE changes should not impact these benchmarks, as my max tokens is 2500 for the test (for CoT). Based on these results I think it's running well on llama.cpp for short contexts (I am running version 3428). These benches are my own custom prompts, not the standard evaluation harness. I zero-shot everything and require the model to follow a circularly shifted answer double-check prompt to score a success on all MC (TQA2 and BOOLQ are both A/B MC in my runs). This ensures the model actually solidly knew the answer and did not luck out based on random answer positioning. Gemma 2 9B is still the smartest 8B-class model I have ever run. However, Llama 3.1 with 128k context becomes very interesting once the long-RoPE issue is sorted out. Gemma 2 9B is only 8k context and its context memory has very high overhead (the VRAM/token ratio is high).
|
For the record, I wonder if it's being recognized as smaug-bpe because smaug was the Llama 3 tokenizer but with some changes to the post_processor that match what Llama 3.1 was released with? So they actually tokenize the same way, and that's why the chksum matches. If you look at the tokenizer.json in Llama 3, there's a TemplateProcessing step that doesn't exist in smaug or Llama 3.1. That said, smaug flips the ignore_merges flag, so not sure if that would make a bigger difference... |
The more I look, the more I feel the smaug-bpe thing is a non-factor. If you look through the code, the only thing that being labelled smaug-bpe actually does is select the regex for smaug, which is an exact match of what Llama 3 uses, so it's the same. It just happens that Llama 3.1 tokenizes identically to smaug-bpe instead of Llama 3, but in the end it doesn't actually matter. |
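For context on the checksum matching mentioned above, a rough sketch of how the convert script identifies a pre-tokenizer: it hashes the token ids produced for a fixed test string and looks the digest up in a table. The test string below is a stand-in; the real script uses a much longer one.

```python
# Sketch of the checksum mechanism used to pick a pre-tokenizer name. Llama 3.1
# happened to produce the same digest as smaug-bpe at the time, hence the label.
import hashlib
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
chktxt = "Hello world"  # stand-in for the script's long, fixed test string
chktok = tok.encode(chktxt)
chkhsh = hashlib.sha256(str(chktok).encode()).hexdigest()
print(chkhsh)  # the convert script compares a digest like this against its known list
```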
@steampunque can you by chance compare to https://huggingface.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/blob/main/Meta-Llama-3.1-8B-Instruct-Q6_K.gguf to see if it's the same? I got the right answer on math and your gold coin question |
This may actually be an Ollama issue with the Modelfile, as the config.json is different than expected; per the paper, it was changed from 3.0 to 3.1 |
Great feature thank you |
I think you're right @bartowski1182! When I try to do what @m18coppola did, the results are not good. But when I just convert to GGUF without changing convert_hf_to_gguf.py, the model seems more intelligent. I think the RoPE settings, which @dranger003 pointed out, might be messing things up for the model's generations. 🤔 |
Could part of the problem be caused by wrong generation parameters? Llama-3.1-8B-Instruct's generation_config.json states: temperature: 0.6. Would this make a difference?
|
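One way to check what the model actually ships with is to load the generation config directly; a small sketch assuming the transformers library and access to the gated repo:

```python
# Print the sampling parameters published in the model's generation_config.json.
from transformers import GenerationConfig

cfg = GenerationConfig.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
print(cfg.temperature, cfg.top_p, cfg.do_sample)
```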
Hi guys, can anyone help me out? While trying to load the latest GGUF, which has the fix mentioned by @tristandruyen, I am getting the following error.
Here I have compiled the latest version of llama-server with llama.cpp commit id 01245f5. |
I believe this is expected, the changes will break compatibility forward and backwards |
As @bartowski1182 already said, this is expected: commit 01245f5 is the latest master and does not include the RoPE scaling fixes from #8676. Follow the steps from here to add the fixes into your local llama.cpp. |
I made some experiments for the 8B quantized base model:

Quantization starting from FP16
```
git lfs install
```
Perplexity:
```
./llama-perplexity -m Meta-Llama-3.1-8B.FP16.gguf -f wikitext-2-raw/wiki.test.raw
```

Quantization starting from BF16 (UPDATE)
```
git lfs install
```
Perplexity:
```
./llama-perplexity -m Meta-Llama-3.1-8B-Q6_K.gguf -f wikitext-2-raw/wiki.test.raw
```
|
Can you also add IQ1xx, IQ2xx, IQ3xx and IQ4xx? |
I will try in the next few days. Right now I am processing some additional Q3, Q4 and the Q5s. |
I think you would need to convert to bf16 or fp32 to have better precision, instead of fp16 |
You are right, but the difference should be very small; a comparison between fp16 and bf16 was done for Llama 3 and the difference was negligible. I think I will repeat the experiments. At least it will be interesting to compare the results of quantization starting from fp16 with quantization starting from fp32/bf16. |
Am I the only one who still sees it, even now? |
@bopm you're seeing it work locally or not work locally? If it's not working, can you provide your exact commands? |
@bartowski1182 never mind, it seems like an Ollama issue, in fact. |
I reported it to the Ollama repo, as it did a pretty decent job for me on a single run, but now it's not feeling good for llama.cpp either. Details are in the issue here.
|
Does your command look similar to this?
|
With these exact params, it's still hallucinating results like 17, 30, 29, and so on. |
Try with temp 0. |
Maybe try an imatrix quant? My imatrix Q4_K_M gets this right every time, even without a low temperature. |
previous comment updated with the BF16 experiments. |
|
Yep, way better; it was only mistaken on the first run, giving me 32, then a stable 36 on all subsequent retries. |
with -temp 0? |
With |
Hello, I have a little question: where is llama-quantize? Do I need to build it myself? |
You can call it from the llama.cpp directory; see #7809 |
FYI after merging #8858 it's now possible to handle |
It's also possible to use custom tool calls (https://huggingface.co/blog/llama31#built-in-tool-calling), avoiding the need for the ipython shell and the eom stuff. Ask the model to produce a Python code block based on your query, extract it, run it, and send its output back into the conversation as described in the link. Works fine: bash-5.1$ ./lmf Is it hotter in NYC, Austin, or Houston right now. |
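A rough sketch of that extract-run-feed-back loop; generate() here is a placeholder for whatever completion call you use (llama-server HTTP API, llama-cpp-python, ...), not a real llama.cpp function:

```python
# Illustrative sketch of the "extract a python block, run it, feed the output back"
# loop described above. generate() is a hypothetical completion callable.
import re
import subprocess

def run_tool_turn(generate, user_query: str) -> str:
    reply = generate(f"Write a python code block that answers: {user_query}")
    fence = "`" * 3
    match = re.search(fence + r"python\n(.*?)" + fence, reply, re.DOTALL)
    if match is None:
        return reply  # the model answered directly, no tool call needed
    code = match.group(1)
    result = subprocess.run(["python", "-c", code], capture_output=True, text=True)
    # Send the tool output back as a follow-up turn and let the model summarise it.
    return generate(
        f"The code produced this output:\n{result.stdout}\n"
        f"Answer the question: {user_query}"
    )
```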
Update the generation_config file. This is a template issue, not a model issue, with Llama 3.1 |
Prerequisites
Feature Description
Llama 3.1 was just released and it is a significant leg up from the previous series of models: https://huggingface.co/blog/llama31
Whilst the overall architecture is the same, it requires some modelling updates, primarily around RoPE scaling: https://github.com/huggingface/transformers/blob/bc2adb0112b6677b0dfb4105c74570a0f92183eb/src/transformers/modeling_rope_utils.py#L298
It'd be great to add support for those so that the generations are more coherent and make sense.
Motivation
Note: without the modelling changes, the generations might look coherent, but they are far from great and far from the true potential of the model!
Possible Implementation
Here's the corresponding transformers implementation: https://github.com/huggingface/transformers/blob/bc2adb0112b6677b0dfb4105c74570a0f92183eb/src/transformers/modeling_rope_utils.py#L298
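For reference, a minimal sketch of the frequency-scaling scheme that the linked transformers function implements; the default factors below are the values published with the Llama 3.1 config and are assumptions here rather than something taken from this repo:

```python
# Sketch of Llama 3.1's "low/high frequency" RoPE scaling, mirroring the linked
# transformers implementation. Short-wavelength frequencies are kept, long ones are
# scaled down by `factor`, and the band in between is smoothly interpolated.
import math

def llama31_scale_rope_freqs(inv_freqs, factor=8.0, low_freq_factor=1.0,
                             high_freq_factor=4.0, old_context_len=8192):
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    scaled = []
    for freq in inv_freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            scaled.append(freq)               # high-frequency band: keep as-is
        elif wavelen > low_freq_wavelen:
            scaled.append(freq / factor)      # low-frequency band: scale down fully
        else:
            # smooth interpolation between the two regimes
            smooth = (old_context_len / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled
```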