Hello,

I quantized an (uncensored) QLoRA merge of a Llama 2 model.
The 13B quant came out perfect (8.0 bpw).
But the 70B quant came out totally censored and nothing like it was supposed to be (no gibberish, just completely censored).
Can you help with two things?
For --cal_dataset, I merged the QLoRA's uncensored dataset into a single .parquet file (see the sketch after these two questions for roughly how I built it). That seems to work really well for 13B. Did I do that incorrectly and just get lucky? Should I be calibrating with wikitext-test.parquet instead?
For 13B I used the default params at 8.0 bpw. For 70B, which came out censored, I used the default params at 5.0 bpw. Do you recommend different values for the other args, like measurement_length and the rest?
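For reference, this is roughly how I built the calibration parquet. I'm assuming it just needs a single text column like wikitext-test.parquet has; the JSONL path and field names below are placeholders for my dataset, so treat this as a sketch rather than the exact script:

```python
# Rough sketch: flatten the QLoRA training data into a single-column parquet
# for --cal_dataset. The "text" column name mirrors wikitext-test.parquet;
# the input path and JSON field names are placeholders for my dataset.
import json
import pandas as pd

rows = []
with open("uncensored_dataset.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        sample = json.loads(line)
        # Concatenate prompt and response into one plain-text calibration sample.
        rows.append(sample.get("instruction", "") + "\n" + sample.get("output", ""))

df = pd.DataFrame({"text": rows})
df.to_parquet("cal_dataset.parquet", index=False)
```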
I ran both quantized models in Oobabooga on the same machine.
I'm also redoing my work from scratch to make sure I didn't mix anything up, but I'd really appreciate some input here.
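While I redo it, I'm also spot-checking the unquantized merged 70B with something like the snippet below, on the theory that if the FP16 merge is already censored, the quantization step isn't the culprit. The model path, prompt template, and prompt are placeholders for my setup:

```python
# Quick sanity check on the unquantized merged 70B before re-quantizing.
# Model path, prompt template, and prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "path/to/merged-70b-fp16"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto"
)

prompt = "### Instruction:\n<a prompt the censored base model refuses>\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Print only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```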
Thanks very much!