SFT Script and Hyperparameters used for DBRX-Instruct #99
Hi, I saw you mentioned that you used your fork of Megatron-LM for training. Could you please provide the scripts and hyperparameters used for the SFT of DBRX? It would mean the world for the OSS community!

At OpenChat, we'd like to fine-tune your model on our data and open-source it.

Comments
The training would be on H100s. Another question: how many do you need at minimum?
@tgale96 might have scripts for Megatron-LM integration. We will have integrations with other stacks soon. For DBRX specifically, you do not necessarily need to use MegaBlocks (though it is more efficient); ZeRO-3 + the HF model code is sufficient. For example, foundry would work with this: https://github.com/mosaicml/llm-foundry CC: @dakinggg
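As a concrete illustration of the ZeRO-3 + HF route suggested above, here is a minimal SFT sketch. Everything in it is an assumption for illustration: the `databricks/dbrx-base` starting checkpoint, the toy dataset, and all hyperparameter values are placeholders, not the configuration Databricks actually used.

```python
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL = "databricks/dbrx-base"  # assumed SFT starting point, not confirmed

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding in the collator

model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # DBRX ships custom modeling code on the Hub
)

# Minimal ZeRO-3 config: shards parameters, gradients, and optimizer state
# across all ranks, which is what lets the full model train without MegaBlocks.
ds_config = {
    "zero_optimization": {"stage": 3, "overlap_comm": True},
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

# Toy single-example dataset purely so the script is self-contained;
# substitute your real SFT data here.
raw = Dataset.from_dict({"text": ["### Instruction: say hi\n### Response: hi"]})
train_ds = raw.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="dbrx-sft",
        per_device_train_batch_size=1,  # illustrative; tune for your cluster
        gradient_accumulation_steps=8,  # illustrative
        learning_rate=1e-5,             # assumption, not a confirmed value
        bf16=True,
        deepspeed=ds_config,            # HF Trainer accepts a dict or a JSON path
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice you would launch this with the `deepspeed` or `torchrun` launcher across all nodes so ZeRO-3 can shard the model over every GPU.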
Thank you very much! Do you have insight into the hyperparameters used for DBRX Instruct? Hyperparameter exploration at this scale is very expensive and out of reach for most of the open-source community, so this would be incredibly helpful to have.
If there's any chance you could confirm: might these be the hyperparams used for DBRX Instruct?
One more question (if the above config is what was actually used): it is noted there that 8x8x80GB (i.e., 8 nodes of 8 x 80 GB GPUs, 64 GPUs total) are required for the fine-tune. Would you mind sharing approximately the number of tokens or SFT examples, the GPUs you used, and how long this took?
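For intuition about why hardware on that order is plausible, here is a rough back-of-envelope memory estimate. The parameter count is DBRX's published 132B total (MoE, all experts); everything else (mixed-precision Adam layout, even ZeRO-3 sharding) is an assumption for illustration, not an official number.

```python
# Back-of-envelope estimate of full-parameter fine-tuning memory for DBRX.
PARAMS = 132e9                     # DBRX total parameters (36B active per token)

weights_bf16 = PARAMS * 2          # bf16 weights used in forward/backward
grads_bf16 = PARAMS * 2            # bf16 gradients
adam_fp32 = PARAMS * (4 + 4 + 4)   # fp32 master weights + momentum + variance

total_gb = (weights_bf16 + grads_bf16 + adam_fp32) / 1e9
print(f"training state ~= {total_gb:,.0f} GB")          # ~2,112 GB

gpus = 8 * 8                       # 8 nodes x 8 GPUs, as in the referenced config
per_gpu = total_gb / gpus          # ZeRO-3 shards all three states across ranks
print(f"~{per_gpu:.0f} GB per GPU before activations")  # ~33 GB on an 80 GB card
```

Under these assumptions, 64 x 80 GB GPUs leaves roughly 47 GB per card for activations, buffers, and communication overhead, which is consistent with the 8x8x80GB figure.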