Hi,
thanks for sharing the code. I have tried to use your repo with bitsandbytes for model quantization. Unfortunately, the training process does not work: the layers defined in modelling_llama.py do not get trained, and after finetuning they contain only NaN values. I guess it is a data type conflict, as the hidden layers are loaded in 4/8 bit while the classifier is still kept in memory as float16... Any clue/plan on how to fix that?
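For context, a minimal sketch of the kind of setup that reproduces this dtype mismatch, and a common workaround: PEFT's `prepare_model_for_kbit_training` casts the non-quantized half-precision parameters (layer norms, classification head) to float32. This is an assumption on my side, not the repo's confirmed code path, and the checkpoint name is purely illustrative:

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# Illustrative checkpoint; substitute whatever backbone the repo actually targets.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    num_labels=2,
    quantization_config=bnb_config,
)

# Freezes the base parameters and casts the remaining float16/bfloat16
# parameters (layer norms, classification head) to float32; half-precision
# gradients on the head are a typical source of the NaNs described above.
model = prepare_model_for_kbit_training(model)
```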
Hi @ferrazzipietro, we didn't test 4/8 bit training. Which backbone do you use? If the backbone is not LLaMA, it is better to specify the target_modules explicitly.
I have tried Llama and Mistral, both resulting in NaN weights. I've seen the new repo as well, but the issue persists. I will let you know if I get the chance to dig deeper into it!
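For anyone landing here: a minimal sketch of the explicit `target_modules` suggestion above, assuming the repo fine-tunes via PEFT-style LoRA. The `r`/`lora_alpha` values are illustrative, and the projection names below match LLaMA/Mistral; other backbones name these modules differently:

```python
from peft import LoraConfig, get_peft_model

# Attention-projection names as used by LLaMA and Mistral; other backbones
# (e.g. Falcon, GPT-NeoX) name these modules differently, so listing them
# explicitly avoids silently attaching LoRA to the wrong (or no) layers.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="SEQ_CLS",  # also marks the classification head as trainable
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: adapters + head should appear
```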