SFT Script and Hyperparameters used for DBRX-Instruct #99
Hi, I saw you mentioned that you used your fork of Megatron-LM for training. Could you please provide the scripts and hyperparameters used for the SFT of DBRX? It would mean the world for the OSS community!

At OpenChat, we'd like to fine-tune your model on our data and open-source it.

Comments
The training would be on H100s. Another question: how many do you need at minimum?
@tgale96 might have scripts for Megatron-LM integration. We will have integrations with other stacks soon. For DBRX specifically, you do not necessarily need to use MegaBlocks (though it is more efficient); ZeRO-3 + the HF model code is sufficient. For example, foundry would work with this: https://github.com/mosaicml/llm-foundry CC: @dakinggg
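As a concrete illustration of the ZeRO-3 + HF route suggested above, here is a minimal SFT sketch. Everything in it is an assumption for illustration: the `databricks/dbrx-base` starting checkpoint, the toy dataset, and all hyperparameter values are placeholders, not the configuration Databricks actually used.

```python
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL = "databricks/dbrx-base"  # assumed SFT starting point, not confirmed

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding in the collator

model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # DBRX ships custom modeling code on the Hub
)

# Minimal ZeRO-3 config: shards parameters, gradients, and optimizer state
# across all ranks, which is what lets the full model train without MegaBlocks.
ds_config = {
    "zero_optimization": {"stage": 3, "overlap_comm": True},
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

# Toy single-example dataset purely so the script is self-contained;
# substitute your real SFT data here.
raw = Dataset.from_dict({"text": ["### Instruction: say hi\n### Response: hi"]})
train_ds = raw.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="dbrx-sft",
        per_device_train_batch_size=1,  # illustrative; tune for your cluster
        gradient_accumulation_steps=8,  # illustrative
        learning_rate=1e-5,             # assumption, not a confirmed value
        bf16=True,
        deepspeed=ds_config,            # HF Trainer accepts a dict or a JSON path
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice you would launch this with the `deepspeed` or `torchrun` launcher across all nodes so ZeRO-3 can shard the model over every GPU.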
Thank you very much! Do you have insight into the hyperparameters used for DBRX Instruct? Hyperparameter exploration at this scale is very expensive and out of reach for most of the open-source community, so this would be incredibly helpful to have.
If there's any chance you could confirm: might these be the hyperparams used for DBRX Instruct?
One more question (if the above config is what was actually used): it is noted there that 8x8x80GB (i.e., 8 nodes of 8 x 80 GB GPUs, 64 GPUs total) are required for the fine-tune. Would you mind sharing approximately the number of tokens or SFT examples, the GPUs you used, and how long this took?
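For intuition about why hardware on that order is plausible, here is a rough back-of-envelope memory estimate. The parameter count is DBRX's published 132B total (MoE, all experts); everything else (mixed-precision Adam layout, even ZeRO-3 sharding) is an assumption for illustration, not an official number.

```python
# Back-of-envelope estimate of full-parameter fine-tuning memory for DBRX.
PARAMS = 132e9                     # DBRX total parameters (36B active per token)

weights_bf16 = PARAMS * 2          # bf16 weights used in forward/backward
grads_bf16 = PARAMS * 2            # bf16 gradients
adam_fp32 = PARAMS * (4 + 4 + 4)   # fp32 master weights + momentum + variance

total_gb = (weights_bf16 + grads_bf16 + adam_fp32) / 1e9
print(f"training state ~= {total_gb:,.0f} GB")          # ~2,112 GB

gpus = 8 * 8                       # 8 nodes x 8 GPUs, as in the referenced config
per_gpu = total_gb / gpus          # ZeRO-3 shards all three states across ranks
print(f"~{per_gpu:.0f} GB per GPU before activations")  # ~33 GB on an 80 GB card
```

Under these assumptions, 64 x 80 GB GPUs leaves roughly 47 GB per card for activations, buffers, and communication overhead, which is consistent with the 8x8x80GB figure.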