How do I train my models? #31

zxk907 · 2024-07-01T10:00:55Z

Hello Professor, thank you for your contribution. As a new student, I am very interested in your research. After reading the paper, I tried to find more details by running the code, but I ran into problems. Your paper states that there are benefits for model training in terms of storage, etc., but I don't know how to train my own model. If you could provide more guidance or open-source a simple demo, it would be very helpful. Thank you very much!

ridgerchu · 2024-07-02T15:11:34Z

Hi, you can refer to this link to learn how to train.

zxk907 · 2024-07-04T08:09:08Z

> Hi, you can refer to this link to learn how to train.

Thank you very much for your guidance. I got it working by following your tips.

KETTY2 · 2024-08-12T05:02:09Z

Can you give me more information about how you trained the model? I tried, but got 'CUDA error: device-side assert triggered'.
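
For readers hitting the same message: in PyTorch, `device-side assert triggered` is a generic error, and a frequent cause is an index outside the valid range, such as a token ID that is not smaller than the embedding table size. The cause in this particular case was not identified in the thread, so the following is only a minimal sketch of one sanity check. The function name `find_out_of_range_ids`, the vocabulary size, and the sample batch are hypothetical.

```python
def find_out_of_range_ids(batch, vocab_size):
    """Return (row, column, token_id) for every ID outside [0, vocab_size)."""
    bad = []
    for row, ids in enumerate(batch):
        for col, tok in enumerate(ids):
            if not 0 <= tok < vocab_size:
                bad.append((row, col, tok))
    return bad


# Hypothetical batch: 32000 is invalid when the embedding table has 32000 rows,
# because valid IDs run from 0 to 31999.
batch = [[1, 5, 31999], [2, 32000, 7]]
problems = find_out_of_range_ids(batch, vocab_size=32000)
print(problems)
```

Running the training script with the environment variable `CUDA_LAUNCH_BLOCKING=1`, or running one batch on CPU, usually makes the error surface at the call that actually failed.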

huopan · 2024-08-26T07:15:22Z

Hi Professor @ridgerchu, I am trying to train an embedding model based on the pretrained matmul-free model. I simply use the last token vector of the HGRNBitModel last_hidden_states output as the sentence vector and train on E5 synthetic data, but the loss does not drop at any of the learning rates I tried. I fine-tuned BitNet into an embedding model in the same way and it succeeded. Does the matmul-free model differ from other LLMs in some way? Do you have any insights?
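
The last-token pooling described above can be sketched as follows. This is a pure-Python illustration of the indexing only, assuming hidden states shaped `[batch][seq][dim]` and an attention mask with 1 for real tokens and 0 for padding; a real pipeline would do the same on tensors. The function name `last_token_pool` and the sample values are hypothetical.

```python
def last_token_pool(hidden_states, attention_mask):
    """Pick, for each sequence, the hidden vector of its last non-padding token."""
    pooled = []
    for vectors, mask in zip(hidden_states, attention_mask):
        real_positions = [i for i, m in enumerate(mask) if m == 1]
        if not real_positions:
            raise ValueError("sequence has no non-padding tokens")
        pooled.append(vectors[real_positions[-1]])
    return pooled


# Two sequences of length 3 with 2-dimensional hidden vectors.
# The first sequence is right-padded, so its last real token is at index 1.
hidden = [
    [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]],
    [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]],
]
mask = [
    [1, 1, 0],
    [1, 1, 1],
]
sentence_vectors = last_token_pool(hidden, mask)
print(sentence_vectors)
```

Using the mask rather than position `-1` matters when batches are right-padded, since position `-1` would then select a padding vector for the shorter sequences.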

ridgerchu · 2024-08-26T09:58:36Z

Hello @huopan, when training this model, we usually use a learning rate about 10x larger because of the ternary weights.
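
A commonly cited rationale for larger learning rates with low-bit weights is that a small update to the full-precision latent weights often leaves the quantized weights unchanged. The toy sketch below illustrates this with absmean ternary rounding in the style of BitNet b1.58. It assumes plain SGD and made-up weights and gradients, and it is not the repository's training code. The names `ternary_quantize` and `sgd_step` are hypothetical.

```python
def ternary_quantize(weights, eps=1e-8):
    """Scale by the mean absolute value, round, and clip to {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights)
    return [max(-1, min(1, round(w / (scale + eps)))) for w in weights]


def sgd_step(weights, grads, lr):
    """One plain SGD update on the latent full-precision weights."""
    return [w - lr * g for w, g in zip(weights, grads)]


# Made-up latent weights and gradients, for illustration only.
latent = [0.9, -0.2, 0.05, -1.1, 0.4]
grads = [0.5, 1.0, -1.0, 0.2, 1.0]

before = ternary_quantize(latent)
after_small = ternary_quantize(sgd_step(latent, grads, lr=0.01))
after_large = ternary_quantize(sgd_step(latent, grads, lr=0.1))

print("before:        ", before)
print("lr=0.01 update:", after_small, "changed:", after_small != before)
print("lr=0.1 update: ", after_large, "changed:", after_large != before)
```

In this toy case the smaller step leaves every ternary weight where it was, while the 10x larger step moves one of them. It is an illustration of the mechanism, not evidence for any particular learning rate.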

huopan · 2024-09-02T07:18:39Z

> Hello @huopan, when training this model, we usually use a learning rate about 10x larger because of the ternary weights.

Thanks @ridgerchu, I tried a larger learning rate and it works. Another question: does this model support training on AMD GPUs such as the MI200, and does it work with DeepSpeed? I get an error when training on MI200 with DeepSpeed: "RuntimeError: Triton Error [HIP]: Code: 1, Messsage: invalid argument". Do you have any insights?

ridgerchu · 2024-09-03T11:58:53Z

Hi @huopan, this is because Triton did not support AMD ROCm until version 3.0, and our framework has not been tested with the ROCm version of Triton...
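
A quick way to compare an installed Triton against the 3.0 threshold mentioned above might look like the sketch below. It takes the threshold from this reply as given and reads the version through the standard library only. The helper names are hypothetical, and the package names it looks up (`triton`, `pytorch-triton-rocm`) are assumptions about how Triton is distributed in a given environment.

```python
import re
from importlib import metadata


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '3.0.0+git1234'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_rocm_minimum(version, minimum=(3, 0)):
    """True if the version is at least the threshold named in the thread."""
    return parse_major_minor(version) >= minimum


def installed_triton_version(candidates=("triton", "pytorch-triton-rocm")):
    """Return the first installed candidate's version, or None if none is found."""
    for name in candidates:  # package names are assumptions, adjust as needed
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


found = installed_triton_version()
if found is None:
    print("No Triton package found in this environment.")
else:
    print(f"Triton {found}: meets 3.0 threshold = {meets_rocm_minimum(found)}")
```

Passing this check only means the version is not ruled out; as noted above, the framework itself has not been tested on the ROCm version of Triton.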

huopan · 2024-09-04T02:15:11Z

> Hi @huopan, this is because Triton did not support AMD ROCm until version 3.0, and our framework has not been tested with the ROCm version of Triton...

ok, thanks a lot!
