How do I train my models? #31

Open
zxk907 opened this issue Jul 1, 2024 · 8 comments

Comments

@zxk907

zxk907 commented Jul 1, 2024

Hello Professor, thank you for your contribution. As a new student, I am very interested in your research. After reading the paper I tried to learn more of the details by running the code, but I ran into problems. Your paper states that the model has benefits during training, for example in storage, but I don't know how to train my own model. If you could provide more guidance or open-source a simple demo, it would be very helpful. Thank you very much!

@ridgerchu
Owner

Hi, you can refer to this link to learn how to train.

@zxk907
Author

zxk907 commented Jul 4, 2024

Hi, you can refer to this link to learn how to train.

Thank you very much for your guidance. I got it working by following your tips.

@KETTY2

KETTY2 commented Aug 12, 2024

Can you give me more information about how you trained the model? I tried, but I got 'CUDA error: device-side assert triggered'.
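
This error is often raised when an index is out of range on the GPU, for example a token id that is not smaller than the embedding table's vocabulary size. A minimal sketch for checking a batch before it reaches the GPU, assuming the token ids are available as nested Python lists. `find_out_of_range_ids`, the example batch, and the `vocab_size` value are hypothetical and not part of this repository. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` before launching training also makes CUDA report the failure synchronously, which usually gives a more useful stack trace.

```python
from typing import List, Tuple


def find_out_of_range_ids(
    input_ids: List[List[int]], vocab_size: int
) -> List[Tuple[int, int, int]]:
    """Return (row, column, token_id) for every id outside [0, vocab_size)."""
    bad = []
    for row, sequence in enumerate(input_ids):
        for col, token_id in enumerate(sequence):
            if token_id < 0 or token_id >= vocab_size:
                bad.append((row, col, token_id))
    return bad


if __name__ == "__main__":
    # Hypothetical batch and vocabulary size, for illustration only.
    batch = [[1, 5, 9], [2, 32000, 7]]
    for row, col, token_id in find_out_of_range_ids(batch, vocab_size=32000):
        print(f"row {row}, position {col}: id {token_id} is out of range")
```

If the check finds nothing, the same kind of range check can be applied to the labels passed to the loss function.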

@huopan

huopan commented Aug 26, 2024

Hi Professor @ridgerchu, I am trying to train an embedding model based on the matmul-free pretrained model. I simply use the last token vector of the HGRNBitModel last_hidden_states output as the sentence vector and train on E5 synthetic data, but the loss does not drop at any of the learning rates I tried. I have fine-tuned BitNet into an embedding model in the same way and it succeeded. Does the matmul-free model differ in some way from other LLMs? Do you have any insights?
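
The last-token pooling described above can be sketched as follows. This is only an illustration in plain Python, assuming hidden states are given as nested lists of shape [batch][sequence][hidden] together with a 0/1 attention mask. `last_token_pool` is a hypothetical helper, not a function of this repository. It selects the last unmasked position of each sequence, which matters for right-padded batches, where position -1 would be a padding token.

```python
from typing import List

Vector = List[float]


def last_token_pool(
    hidden_states: List[List[Vector]], attention_mask: List[List[int]]
) -> List[Vector]:
    """Pick, for each sequence, the hidden vector of its last unmasked token."""
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        # Positions whose mask value is 1 hold real (non-padding) tokens.
        real_positions = [i for i, m in enumerate(mask) if m == 1]
        if not real_positions:
            raise ValueError("sequence has no unmasked tokens")
        pooled.append(states[real_positions[-1]])
    return pooled


if __name__ == "__main__":
    # Two toy sequences with hidden size 2; the first is right-padded.
    hidden = [
        [[1.0, 1.0], [2.0, 2.0], [0.0, 0.0]],
        [[3.0, 3.0], [4.0, 4.0], [5.0, 5.0]],
    ]
    mask = [[1, 1, 0], [1, 1, 1]]
    print(last_token_pool(hidden, mask))
```

With tensors, the same selection would normally be done with indexing rather than Python loops.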

@ridgerchu
Owner

Hello @huopan, when training this model we usually use a learning rate about 10x larger because of the ternary weights.
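
As a rough illustration of that rule of thumb, the sketch below scales a baseline learning rate by a factor of 10 and brackets it with a small sweep. The function name `candidate_learning_rates`, the baseline value `2e-5`, and the halving/doubling sweep are assumptions made for this example, not settings taken from the repository.

```python
from typing import List


def candidate_learning_rates(base_lr: float, factor: float = 10.0) -> List[float]:
    """Return a three-point sweep centred on base_lr * factor."""
    centre = base_lr * factor
    # Bracket the scaled value, since "about 10x" is only approximate.
    return [centre / 2, centre, centre * 2]


if __name__ == "__main__":
    # Hypothetical baseline learning rate, for illustration only.
    for lr in candidate_learning_rates(2e-5):
        print(f"try learning rate {lr:.1e}")
```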

@huopan

huopan commented Sep 2, 2024

Hello @huopan, when training this model we usually use a learning rate about 10x larger because of the ternary weights.

Thanks @ridgerchu, I tried a larger learning rate and it works. Another question: does this model support training on AMD GPUs such as the MI200, and does it work with DeepSpeed? I get an error when training on MI200 with DeepSpeed: "RuntimeError: Triton Error [HIP]: Code: 1, Messsage: invalid argument". Do you have any insights?

@ridgerchu
Owner

ridgerchu commented Sep 3, 2024

Hi @huopan, this is because Triton did not support AMD ROCm until version 3.0, and our framework has not been tested on the ROCm version of Triton...
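
A minimal sketch for checking the installed Triton version against the 3.0 threshold mentioned above, using only the standard library. The helper names are hypothetical, and the distribution name `triton` is an assumption: some ROCm builds may ship Triton under a different package name, so a `None` result does not prove that Triton is missing. Passing this check also does not mean the framework works on ROCm, since it has not been tested there.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Parse the leading 'major.minor' of a version string such as '3.0.0'."""
    # Drop any local version suffix such as '+git1234' before splitting.
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def installed_triton_version() -> Optional[str]:
    """Return the installed 'triton' version string, or None if not found."""
    try:
        return metadata.version("triton")
    except metadata.PackageNotFoundError:
        return None


def triton_meets_minimum(
    version: Optional[str], minimum: Tuple[int, int] = (3, 0)
) -> bool:
    """True if a version string is present and at least the given minimum."""
    return version is not None and parse_major_minor(version) >= minimum


if __name__ == "__main__":
    found = installed_triton_version()
    print(f"triton version: {found}, meets 3.0: {triton_meets_minimum(found)}")
```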

@huopan

huopan commented Sep 4, 2024

Hi @huopan, this is because Triton did not support AMD ROCm until version 3.0, and our framework has not been tested on the ROCm version of Triton...

ok, thanks a lot!
