How do I train my models? #31
Comments
Hi, you can refer to this link to learn how to train.
Thank you very much for your guidance, I made it work by following your tips.
Can you give me more information about how you trained the model? I tried, but 'CUDA error: device-side assert triggered' occurred.
Hi Professor @ridgerchu, I am trying to train an embedding model based on the pretrained MatMul-free model. I use the last token's vector from the last_hidden_states output of HGRNBitModel as the sentence vector and train on E5 synthetic data, but the loss does not drop at any of the learning rates I tried. I fine-tuned BitNet into an embedding model the same way and it succeeded. Does the MatMul-free model differ from other LLMs in some way? Do you have any insights?
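For readers unfamiliar with the pooling described above, here is a minimal sketch of last-token pooling. It is an illustration only: plain Python lists stand in for the model's hidden-state tensor and attention mask, and `last_token_pool` is a hypothetical helper, not a function from this repository.

```python
def last_token_pool(hidden_states, attention_mask):
    """Return, for each sequence, the hidden vector at its last non-padding position.

    hidden_states:  nested list shaped [batch][seq_len][hidden_dim]
    attention_mask: nested list shaped [batch][seq_len], 1 = real token, 0 = padding
    """
    pooled = []
    for vectors, mask in zip(hidden_states, attention_mask):
        # Indices of real (non-padding) tokens; works for left or right padding.
        real_positions = [i for i, m in enumerate(mask) if m == 1]
        if not real_positions:
            raise ValueError("sequence has no non-padding tokens")
        pooled.append(vectors[real_positions[-1]])
    return pooled


# Toy batch: 2 sequences, 3 positions each, hidden size 2 (made-up numbers).
hidden = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # no padding
    [[1.0, 1.1], [1.2, 1.3], [0.0, 0.0]],  # last position is padding
]
mask = [[1, 1, 1], [1, 1, 0]]
sentence_vectors = last_token_pool(hidden, mask)
```

Taking position -1 without consulting the mask would pick up a padding vector for the second sequence, which is one thing worth ruling out when a last-token embedding model fails to learn.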
Hello @huopan, when training this model we usually use a learning rate about 10x larger than usual because of the ternary weights.
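As a rough illustration of that rule of thumb, the sketch below scales a baseline learning rate by 10 and builds a small sweep around the result. The baseline value of 2e-5 and both helper names are assumptions made for the example; the thread only gives the "about 10x" guidance.

```python
def scaled_learning_rate(baseline_lr: float, factor: float = 10.0) -> float:
    """Scale a baseline learning rate by the given factor (10x by default)."""
    if baseline_lr <= 0 or factor <= 0:
        raise ValueError("baseline_lr and factor must be positive")
    return baseline_lr * factor


def sweep_around(center_lr: float, multipliers=(0.5, 1.0, 2.0)):
    """Build a few candidate learning rates around a center value."""
    return [center_lr * m for m in multipliers]


baseline_lr = 2e-5  # hypothetical rate that worked for a non-ternary model
ternary_lr = scaled_learning_rate(baseline_lr)  # roughly 2e-4
candidates = sweep_around(ternary_lr)  # roughly [1e-4, 2e-4, 4e-4]
```

Since "about 10x" is approximate, trying a few values around the scaled rate is a reasonable way to find one that makes the loss move.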
Thanks @ridgerchu, I tried a larger learning rate and it works. Another question: does this model support training on AMD GPUs such as the MI200, and can it work with DeepSpeed? I get an error when training on MI200 with DeepSpeed: "RuntimeError: Triton Error [HIP]: Code: 1, Messsage: invalid argument". Do you have any insights?
Hi @huopan, this is because Triton did not support AMD ROCm until version 3.0, and our framework has not been tested with the ROCm build of Triton...
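A quick way to check the version requirement mentioned above is to compare the installed Triton version against 3.0. The sketch below only parses a version string; `triton_meets_rocm_minimum` is a hypothetical helper, and passing the check does not mean this framework works on ROCm, since that combination is untested.

```python
import re


def triton_meets_rocm_minimum(version: str) -> bool:
    """Return True if the version string is at least 3.0."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major = int(match.group(1))
    # The thread states ROCm support starts at Triton 3.0, so major >= 3 suffices.
    return major >= 3


old_ok = triton_meets_rocm_minimum("2.1.0")
new_ok = triton_meets_rocm_minimum("3.0.0")
```

In a real environment the string would come from `triton.__version__` after `import triton`.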
OK, thanks a lot!
Hello Professor, thank you for your contribution. As a new student, I am very interested in your research. When I read the paper and tried to learn more details by running the code, I ran into problems. Your paper states that the model offers benefits during training, such as in storage, but I don't know how to train my own model. If you could provide more guidance or open-source a simple demo, it would be very helpful. Thank you very much!