Model generating random sequence #11
Comments
Did you train the model?
No, sadly not: trying to train with your code makes my GPU run out of memory, and trying to run it with LoRA breaks the model, which fails during inference at " next_token = torch.multinomial(
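The truncated error above matches the check `torch.multinomial` performs on its input: if the probability tensor contains inf, nan, or a negative element, it raises a RuntimeError. A minimal reproduction, plus one illustrative way to sanitize the probabilities before sampling (the `nan_to_num` fallback here is my own sketch, not a fix from this repo):

```python
import torch

# An inf logit (e.g. from an exploding gate during fp16 training)
# turns the whole softmax row into nan: inf / inf = nan.
logits = torch.tensor([[1.0, float("inf"), 2.0]])
probs = torch.softmax(logits, dim=-1)  # -> [0., nan, 0.]

# torch.multinomial(probs, 1) would raise:
# "probability tensor contains either `inf`, `nan` or element < 0"

# Illustrative sanitization: zero out bad entries, fall back to
# uniform sampling if nothing valid survives.
safe = torch.nan_to_num(probs, nan=0.0, posinf=0.0)
if safe.sum() == 0:
    safe = torch.ones_like(safe)
token = torch.multinomial(safe, num_samples=1)
```

This only masks the symptom; the later comments in this thread point at the actual cause (fp16 loading and untrained gate parameters).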
Oh, I think LoRA is not compatible with this: the model has to get a chance to learn how to use the long-term memory, but if you train with LoRA, unless you explicitly declare the 'gate' params as trainable, the model might lose the chance to learn them. How about trying with:
Playing around, I managed to stop getting the "inf" error: besides adding "modules_to_save" to the LoRA config, I had been loading the model in fp16; switching back to torch_dtype="auto" fixed the training process. The loss still starts high (about 22) and drops to 8 in a few epochs with a very small test dataset, but the model still generates random tokens. I'm now trying a bigger dataset (still, sadly, split into smaller pieces...) and I'll see what happens. With LoRA I'm targeting all the basic modules: l_config = LoraConfig( I'll let you know if that works
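The exact config is truncated above, but a setup matching the description (all basic projection modules as LoRA targets, plus "modules_to_save" so the gate is trained and saved in full) might look like the following config fragment. The module names listed are the usual Gemma projection names and the name "gate" for the Infini-attention gate is an assumption; check them against `model.named_modules()` in your copy of the code:

```python
from peft import LoraConfig

l_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # assumed Gemma projection layers to adapt with LoRA
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    # train and save the memory gate in full precision,
    # assuming the parameter lives in a module named "gate"
    modules_to_save=["gate"],
    task_type="CAUSAL_LM",
)
```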
After about 40 minutes of training with 4-bit precision and a block_size of 600 (I can't train the model in 8-bit precision; there the max block size before going out of memory was 15), the loss went down from 22 to 11 and the sentences were less random (English was also more common). I guess that with enough time I could get decent output from a 4-bit model, but since I'm limited to a 3060 with 12 GB, I'll have to wait for someone to release an open model.
Oh, your loss seems pretty high.
LoRA is not compatible with nn.Parameter, so you can't train the gate with LoRA. You can switch it to nn.Embedding, which works with LoRA, but that needs a small modification to the code.
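A minimal sketch of that switch, assuming the gate is one learnable scalar per attention head (the class names and shapes here are illustrative, not the repo's actual code): the values move from a bare `nn.Parameter` into an `nn.Embedding`, which is a real submodule with a name, so PEFT can match it via `modules_to_save`.

```python
import torch
import torch.nn as nn

class GateAsParameter(nn.Module):
    # original style: a bare tensor of per-head gates;
    # PEFT's modules_to_save cannot target this directly
    def __init__(self, num_heads: int):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(num_heads))

    def forward(self) -> torch.Tensor:
        return torch.sigmoid(self.gate)

class GateAsEmbedding(nn.Module):
    # same learnable values stored in an nn.Embedding row table,
    # so the gate shows up as a named submodule for PEFT
    def __init__(self, num_heads: int):
        super().__init__()
        self.gate = nn.Embedding(num_heads, 1)
        nn.init.zeros_(self.gate.weight)

    def forward(self) -> torch.Tensor:
        idx = torch.arange(self.gate.num_embeddings)
        return torch.sigmoid(self.gate(idx)).squeeze(-1)

p = GateAsParameter(8)
e = GateAsEmbedding(8)
# both start identically: sigmoid(0) = 0.5 for every head
```

Both variants compute the same values; only the storage changes, which is why the rest of the attention code needs just a small adjustment to call the module instead of reading the raw parameter.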
I'll try, but I'm still studying deep learning and transformer models, so I'm not sure I can make it work. Any chance you'll release a trained model with 1M context? 👀
By saving the model and reloading it, I managed to get the model running, both quantized and at full precision (it still uses at most 10 GB of GPU RAM).
However, the model generates random characters. Here's the output:
When the model is printed, it correctly shows "GemmaInfiniAttention" for the self_attn layers, but it still generates random characters. What am I doing wrong?