Hi, thank you so much for releasing this great code base!
I noticed that your LAION blog says that the pre-training of OpenLM 1B/7B took place on 128 or 256 A100s, so I'm wondering whether the current code supports multi-node training. The current training command seems to use only 4 GPUs on 1 node.
Thank you very much!
Yes, OpenLM supports multi-node training. The standard torchrun multi-node setup should work fine. If you are using something like AWS SageMaker, we also have sample code here.
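For reference, here is a minimal sketch of a standard torchrun multi-node launch (not OpenLM's official launch script). It assumes 2 nodes with 8 GPUs each and that the training entry point is `open_lm.main`; the master address and the trailing training arguments are placeholders you would replace with your own.

```bash
#!/usr/bin/env bash
# Minimal multi-node torchrun sketch (assumptions: 2 nodes x 8 GPUs,
# entry point open_lm.main). Run the same script on every node,
# changing only NODE_RANK.
NNODES=2
GPUS_PER_NODE=8
NODE_RANK=0            # 0 on the first node, 1 on the second, ...
MASTER_ADDR=10.0.0.1   # placeholder: IP/hostname of the rank-0 node
MASTER_PORT=29500

torchrun \
  --nnodes=$NNODES \
  --nproc_per_node=$GPUS_PER_NODE \
  --node_rank=$NODE_RANK \
  --master_addr=$MASTER_ADDR \
  --master_port=$MASTER_PORT \
  -m open_lm.main \
  "$@"                 # pass your usual single-node training flags here
```

On a managed cluster (e.g. SLURM or SageMaker), the node rank and master address are typically derived from the scheduler's environment variables rather than hard-coded.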