bigscience/train at master · bigscience-workshop/bigscience

tr1-13B-base
tr10-13B-ml
tr11-176B-ml
tr12-1B3-oscar
tr13-mtf
tr14-mup
tr2
tr3-1B3-baseline
tr4-1B3-rotary
tr5-1B3-multilingual
tr6-1B3-prefix-lm
tr7-alibi
tr8-104B-wide
tr8b-104B
tr9-glu
README.md
arch-and-scaling-template.slurm
fixes.md
lessons-learned.md
memory.md
sanity-checks.md
tflops_optimization.md

README.md

Training scripts

This folder gathers the training scripts for the various architecture/scaling and engineering experiments. The naming convention is tr<number>-<short-description>. The current baseline that architecture and scaling experiments are compared against is tr3d. To launch a new experiment, you should probably start from the arch-and-scaling template (arch-and-scaling-template.slurm).

Some tips:

TFlops optimization: how to make sure you train as fast as possible on a given set of hardware.
Instrumentation: how to sync with the hub.
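When optimizing training speed, a common back-of-the-envelope check is to convert the measured time per iteration into achieved TFLOPs per GPU. The sketch below is not taken from tflops_optimization.md; it assumes the widely used approximation of about 2 FLOPs per parameter per token for a forward pass, a backward pass costing about twice the forward, and one extra recomputed forward pass when activation checkpointing is enabled. Attention-score and embedding terms are ignored. The function name and its parameters are illustrative.

```python
def tflops_per_gpu(
    model_size_in_billions: float,
    seq_len: int,
    global_batch_size: int,
    seconds_per_iteration: float,
    num_gpus: int,
    activation_checkpointing: bool = True,
) -> float:
    """Approximate achieved TFLOPs per GPU for a decoder-only transformer.

    Assumes ~2 FLOPs per parameter per token for one forward pass and
    ignores attention-score FLOPs, so the result is an estimate only.
    """
    # forward (1) + backward (~2), plus one recomputed forward when
    # activation checkpointing is enabled
    passes = 4 if activation_checkpointing else 3
    tokens_per_iteration = seq_len * global_batch_size
    flops_per_iteration = (
        2 * passes * model_size_in_billions * 1e9 * tokens_per_iteration
    )
    return flops_per_iteration / (seconds_per_iteration * num_gpus * 1e12)


# Hypothetical numbers, for illustration only (not measured on these runs).
print(tflops_per_gpu(1.3, 2048, 512, seconds_per_iteration=20.0, num_gpus=64))
```

Comparing this estimate with the GPU's peak throughput gives a rough utilization figure to track while tuning parallelism and batch settings.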

Stored checkpoints

Locations of the trained models' checkpoints, plus logs and anything else of importance (e.g. eval harness results):

tr1-13B: gs://bigscience-backups/tr1-13B/
tr3m-1B3-emb-norm-pile: $six_ALL_CCFRSTORE/checkpoints/tr3m-1B3-emb-norm-pile
tr4-1B3-rotary: $six_ALL_CCFRSTORE/checkpoints/
tr4b-350M-rotary: $six_ALL_CCFRSTORE/checkpoints/
tr4c-1B3-rotary-oscar: $six_ALL_CCFRSTORE/checkpoints/tr4c-1B3-rotary-oscar
tr6-1B3-prefix-lm: $six_ALL_CCFRSTORE/checkpoints/tr6-1B3-prefix-lm
tr6-1B3-prefix-lm-unbiased-loss: $six_ALL_CCFRSTORE/checkpoints/tr6-1B3-prefix-lm-unbiased-loss
tr6b-350M-prefix-lm: $six_ALL_CCFRSTORE/checkpoints/tr6b-350M-prefix-lm
tr6b-350M-prefix-lm-PP2: $six_ALL_CCFRSTORE/checkpoints/tr6b-350M-prefix-lm-PP2
tr6b-350M-prefix-lm-unbiased-loss: $six_ALL_CCFRSTORE/checkpoints/tr6b-350M-prefix-lm-unbiased-loss
tr6c-350M-prefix-lm-reset-attention-mask: $six_ALL_CCFRSTORE/checkpoints/tr6c-350M-prefix-lm-reset-attention-mask
tr6c-350M-prefix-lm-reset-attention-mask.backup: $six_ALL_CCFRSTORE/checkpoints/tr6c-350M-prefix-lm-reset-attention-mask.backup
tr6d-350M-prefix-lm-pile: $six_ALL_CCFRSTORE/checkpoints/tr6d-350M-prefix-lm-pile
tr6e-1B3-pile: $six_ALL_CCFRSTORE/checkpoints/tr6e-1B3-pile
tr6f-1B3-oscar-no-loss-on-targets-only: $six_ALL_CCFRSTORE/checkpoints/tr6f-1B3-oscar-no-loss-on-targets-only
tr6g-1B3-oscar-loss-reweighting: $six_ALL_CCFRSTORE/checkpoints/tr6g-1B3-oscar-loss-reweighting
tr7a-1B3-alibi (not a real alibi pos embedding experiment: the alibi matrices were not used in this experiment): $six_ALL_CCFRSTORE/checkpoints/tr7a-1B3-alibi
tr7b-350M-alibi (not a real alibi pos embedding experiment: the alibi matrices were not used in this experiment): $six_ALL_CCFRSTORE/checkpoints/tr7b-350M-alibi
tr7d-1B3-alibi: $six_ALL_CCFRSTORE/checkpoints/tr7d-1B3-alibi
tr9b-350M-swiglu: $six_ALL_CCFRSTORE/checkpoints/tr9b-350M-swiglu
tr9c-1B3-swiglu-pile: $six_ALL_CCFRSTORE/checkpoints/tr9b-1B3-swiglu-pile
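Most of the locations above are relative to the `$six_ALL_CCFRSTORE` environment variable, which has to be set in the environment of the cluster where these checkpoints live. A minimal sketch for turning one of these entries into a concrete path, assuming the variable is exported; the helper name `resolve_checkpoint_path` is hypothetical, and `gs://` locations are returned unchanged because they are bucket URIs rather than local paths.

```python
import os


def resolve_checkpoint_path(location: str) -> str:
    """Expand $six_ALL_CCFRSTORE (or any other env var) in a checkpoint location."""
    if location.startswith("gs://"):
        # Google Cloud Storage URIs are not local paths; leave them as-is.
        return location
    expanded = os.path.expandvars(location)
    if "$" in expanded:
        # os.path.expandvars leaves unknown variables untouched,
        # so a leftover "$" means the variable was not set.
        raise KeyError(f"unresolved environment variable in {location!r}")
    return expanded


print(resolve_checkpoint_path("gs://bigscience-backups/tr1-13B/"))
```

Failing loudly on an unset variable avoids silently reading from a literal `$six_ALL_CCFRSTORE/...` directory.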
