Hi,
thanks for sharing the code. I have tried to use your repo with bitsandbytes for model quantization. Unfortunately, the training process does not work: the layers defined in modelling_llama.py do not get trained, and after finetuning they contain only NaN values. I guess it is a data type conflict, as the hidden layers are loaded in 4/8 bit while the classifier is still kept in memory as float16... Any clue/plan on how to fix that?
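For context, a minimal sketch of the kind of setup that reproduces this dtype mismatch, and a common workaround: PEFT's `prepare_model_for_kbit_training` casts the non-quantized half-precision parameters (layer norms, classification head) to float32. This is an assumption on my side, not the repo's confirmed code path, and the checkpoint name is purely illustrative:

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# Illustrative checkpoint; substitute whatever backbone the repo actually targets.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    num_labels=2,
    quantization_config=bnb_config,
)

# Freezes the base parameters and casts the remaining float16/bfloat16
# parameters (layer norms, classification head) to float32; half-precision
# gradients on the head are a typical source of the NaNs described above.
model = prepare_model_for_kbit_training(model)
```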
Hi @ferrazzipietro, we didn't test 4/8 bit training. Which backbone do you use? If the backbone is not LLaMA, it is better to specify the target_modules explicitly.
I have tried Llama and Mistral, both resulting in NaN weights. I've seen the new repo as well, but the issue persists. I will let you know if I get the chance to dig deeper into it!
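For anyone landing here: a minimal sketch of the explicit `target_modules` suggestion above, assuming the repo fine-tunes via PEFT-style LoRA. The `r`/`lora_alpha` values are illustrative, and the projection names below match LLaMA/Mistral; other backbones name these modules differently:

```python
from peft import LoraConfig, get_peft_model

# Attention-projection names as used by LLaMA and Mistral; other backbones
# (e.g. Falcon, GPT-NeoX) name these modules differently, so listing them
# explicitly avoids silently attaching LoRA to the wrong (or no) layers.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="SEQ_CLS",  # also marks the classification head as trainable
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: adapters + head should appear
```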