Replies: 9 comments
-
The problem with using distributed networks to train large models is the massive bandwidth needed to transfer gradients. The models would likely have billions of parameters, and sending that data back and forth between the nodes and a central server on every training step would make training infeasible because of the latency it would introduce.
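To put rough numbers on that, here is a back-of-the-envelope sketch (the model size and uplink speed are illustrative assumptions, and it assumes plain fp32 gradients with no compression or sparsification):

```python
# Rough estimate of the per-step gradient payload for a large model,
# assuming one fp32 (4-byte) gradient value per parameter and no compression.
num_parameters = 7e9          # e.g. a 7B-parameter model (illustrative figure)
bytes_per_gradient = 4        # fp32
payload_gb = num_parameters * bytes_per_gradient / 1e9
print(f"Gradient payload per node per step: {payload_gb:.0f} GB")  # ~28 GB

# On a 100 Mbit/s consumer uplink, that single transfer alone would take:
uplink_gbit_per_s = 0.1
seconds = payload_gb * 8 / uplink_gbit_per_s
print(f"Upload time per step: {seconds / 60:.0f} minutes")  # ~37 minutes
```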
-
@someone13574 In what form would these gradients be transmitted? What data sizes are we talking about? Is it possible to represent them in binary form and compress them, for example with zstd? I don't think sending huge amounts of data over the network is actually a problem; it's cheaper to buy more bandwidth than to rent another few GPUs.
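One quick way to answer the size/compression question is to try it on a synthetic gradient buffer. A minimal sketch (it assumes the `numpy` and `zstandard` packages are installed; note that raw float gradients typically compress only modestly without quantization or sparsification first):

```python
import numpy as np
import zstandard as zstd  # pip install zstandard

# Synthetic stand-in for one layer's gradients: 10M fp32 values.
grads = np.random.randn(10_000_000).astype(np.float32)
raw = grads.tobytes()

compressed = zstd.ZstdCompressor(level=3).compress(raw)
print(f"raw:        {len(raw) / 1e6:.1f} MB")
print(f"compressed: {len(compressed) / 1e6:.1f} MB "
      f"({len(compressed) / len(raw):.0%} of original)")

# Round-trip to verify the compression is lossless.
restored = np.frombuffer(zstd.ZstdDecompressor().decompress(compressed), dtype=np.float32)
assert np.array_equal(grads, restored)
```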
-
It is not super easy to do distributed training, but there are specialized groups working on it, e.g. check https://github.com/learning-at-home/hivemind
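For reference, hivemind wraps a regular PyTorch optimizer so that peers average updates over the internet. A rough sketch based on its documented quickstart (argument names may differ between hivemind versions, and `initial_peers` would need to point at a real bootstrap peer to join an existing run):

```python
import torch
import hivemind  # pip install hivemind

model = torch.nn.Linear(512, 512)                 # stand-in for the real model
base_opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Start (or join) a DHT that connects this peer with the others.
dht = hivemind.DHT(start=True)                    # pass initial_peers=[...] to join an existing swarm

opt = hivemind.Optimizer(
    dht=dht,
    run_id="open_assistant_test_run",             # all peers with the same run_id train together
    batch_size_per_step=32,                       # samples processed locally per opt.step()
    target_batch_size=10_000,                     # average weights once peers jointly reach this many samples
    optimizer=base_opt,
    use_local_updates=True,                       # step locally, average parameters in the background
)
# In the training loop, opt.step() is then used in place of base_opt.step().
```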
-
@andreaskoepf there are also BOINC-driven solutions
-
I would take inspiration from LAION's earlier successes and lean more towards distributed pre-processing of training data. There is plenty of cost in things like the reinforcement learning rollouts for RLHF, Chain of Thought, and ToolFormer.
-
I think distributed pre-processing would work great for specific elements, such as the "execution of API calls" step in ToolFormer.
Excerpt from the ToolFormer paper:
You would not be transferring millions of parameters, gradients, etc. You would be sending URLs to Docker images together with the API calls you want executed, and receiving back the responses of those API calls. Validation of responses could be done by simple majority voting.
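A minimal sketch of what such a work unit and the majority-vote validation could look like (the field names and the example image URL are hypothetical, purely to illustrate the shape of the protocol; the calculator call is the kind of tool call used in the ToolFormer paper):

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorkUnit:
    """One distributed pre-processing task: run an API call inside a pinned Docker image."""
    docker_image_url: str   # pinned image containing the tool (calculator, search, ...)
    api_call: str           # the ToolFormer-style call to execute

def majority_vote(responses: list[str]) -> Optional[str]:
    """Accept a response only if more than half of the volunteer nodes agree on it."""
    if not responses:
        return None
    value, count = Counter(responses).most_common(1)[0]
    return value if count > len(responses) / 2 else None

# Example: the same work unit was executed by three independent volunteer nodes.
unit = WorkUnit(
    docker_image_url="registry.example.org/toolformer-calculator:latest",  # hypothetical image
    api_call="Calculator(400 / 1400)",
)
responses = ["0.29", "0.29", "0.31"]        # one node returned a diverging result
print(majority_vote(responses))             # -> "0.29"
```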
-
Very interesting idea. If someone is interested in planning and prototyping this in more detail, please let us know (either here or on the OA Discord).
-
I am not really familiar with all of these, but I can implement it if I get a more in-depth explanation.
-
The amount of data in the dataset is growing at an incredible rate, so I suggest leveraging the power of the community: for example, by installing a special worker node, running in a Docker container, on users' computers, which would allow Open Assistant developers to create tasks on these nodes to help train the model (a rough sketch of such a worker follows below).
Benefits:
Cons:
Feel free to discuss
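To make the idea concrete, here is a minimal sketch of what such a volunteer node could look like: a small loop inside the Docker container that polls a hypothetical Open Assistant task coordinator, runs the task, and posts the result back. The URL, the task schema, and the `run_task` stub are all made up for illustration, and the `requests` package is assumed to be installed.

```python
import time
import requests  # pip install requests

COORDINATOR = "https://tasks.example.org"   # hypothetical Open Assistant task coordinator

def run_task(task: dict) -> dict:
    """Placeholder for the actual work, e.g. data pre-processing or an API-call rollout."""
    return {"task_id": task["id"], "result": f"processed {task['payload']!r}"}

def worker_loop(poll_interval: float = 30.0) -> None:
    """Poll for work, execute it, report the result, repeat."""
    while True:
        resp = requests.get(f"{COORDINATOR}/tasks/next", timeout=10)
        if resp.status_code == 200:
            result = run_task(resp.json())
            requests.post(f"{COORDINATOR}/results", json=result, timeout=10)
        else:
            time.sleep(poll_interval)       # no work available, back off and retry

if __name__ == "__main__":
    worker_loop()
```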