Add "blocked"/"flash" cross entropy #790

Merged: 46 commits, Nov 6, 2024
Changes from 1 commit (of 46)
fa9fd25  Add llama fineweb yaml (Ivan-Zhou, Jun 25, 2024)
ac80e57  small modification (Ivan-Zhou, Jun 25, 2024)
8b9bd78  pre commit checks (Ivan-Zhou, Jun 25, 2024)
525f4a6  mypy (Ivan-Zhou, Jun 25, 2024)
92425ca  add fire (Ivan-Zhou, Jun 25, 2024)
1d9f6f8  add more html (Ivan-Zhou, Jun 25, 2024)
b1c0905  add more md urls (Ivan-Zhou, Jun 25, 2024)
89b6192  delete get_files_on_gcs.py (Ivan-Zhou, Jun 26, 2024)
0b34522  CC-MAIN-*/*/*_processed_md.jsonl.gz (Ivan-Zhou, Jun 27, 2024)
4aa2f2c  Adding configs related to DCLM (abhinavg4, Jul 18, 2024)
dde9ed0  Adding configs related to DCLM (abhinavg4, Jul 19, 2024)
b991e29  Adding Z loss (abhinavg4, Jul 19, 2024)
bb674bb  pre commit changes (abhinavg4, Jul 19, 2024)
6c99dfb  Adding z_loss as part of train_lm.py (abhinavg4, Jul 19, 2024)
24469e7  Reverting changes to llama.py for z_loss (abhinavg4, Jul 19, 2024)
e12c1b6  Merge remote-tracking branch 'origin/dclm' into dclm (dlwh, Aug 20, 2024)
c9ebc88  match specs in dclm (dlwh, Aug 20, 2024)
7727696  publish dev build (dlwh, Aug 21, 2024)
55e4d98  wip (dlwh, Aug 21, 2024)
de51236  fix imports and such (dlwh, Aug 22, 2024)
7863989  get default zone from gcloud config (dlwh, Aug 22, 2024)
a550bb5  factor out docker command, build (dlwh, Aug 22, 2024)
6341252  Merge remote-tracking branch 'origin/main' into dclm (dlwh, Aug 22, 2024)
e9ca517  Merge remote-tracking branch 'origin/main' into dclm (dlwh, Aug 28, 2024)
d674dd9  wip (dlwh, Aug 29, 2024)
06dc304  wip (dlwh, Aug 29, 2024)
f13cfde  bump equinox (dlwh, Sep 5, 2024)
8d3dfe0  wip (dlwh, Sep 6, 2024)
8ecb7ea  768 (dlwh, Sep 6, 2024)
0ea3eb4  Merge remote-tracking branch 'origin/main' into dclm (dlwh, Oct 14, 2024)
2f53923  wip (dlwh, Oct 21, 2024)
9050258  wip (dlwh, Oct 30, 2024)
b15e5d3  Merge remote-tracking branch 'origin/main' into blocked_cross_entropy (dlwh, Oct 30, 2024)
d6a3ded  wip (dlwh, Oct 30, 2024)
2e25357  it works?!? (dlwh, Nov 4, 2024)
2390058  tuning. just about there (dlwh, Nov 5, 2024)
795fd08  wip (dlwh, Nov 5, 2024)
05afef0  Merge remote-tracking branch 'origin/main' into blocked_cross_entropy (dlwh, Nov 6, 2024)
0de1482  pre-commit (dlwh, Nov 6, 2024)
fc01b9e  remove stray files (dlwh, Nov 6, 2024)
d907aa4  implement lm_head (dlwh, Nov 6, 2024)
eee3ecd  implement lm_head (dlwh, Nov 6, 2024)
879d5c0  misc test fixes (dlwh, Nov 6, 2024)
6fe0fb8  Merge remote-tracking branch 'origin/blocked_cross_entropy' into bloc… (dlwh, Nov 6, 2024)
5cf5f17  increase tolerances (dlwh, Nov 6, 2024)
db08341  increase tolerances (dlwh, Nov 6, 2024)
Adding configs related to DCLM
abhinavg4 committed Jul 19, 2024
commit dde9ed0b2557abbc3a74012d865322c779a7155d
30 changes: 30 additions & 0 deletions config/llama_1b_dclm.yaml
@@ -0,0 +1,30 @@
data: !include data/dclm_gpt_neo.yaml
model: # 1B class model
  type: llama
  seq_len: 2048
  hidden_dim: 2048
  intermediate_dim: 8192
  num_layers: 24
  num_heads: 16
  num_kv_heads: 16
  use_flash_attention: True
  flash_attention_block_size: 1024
trainer:
  tracker:
    type: wandb
    project: "marin"
    tags: ["llama", "fineweb", "markdown"]

  mp: p=f32,c=bfloat16
  train_batch_size: 256 # 2048 * 2048 = 4,194,304
  num_train_steps: 71526 # 300,000,000,000 / 4,194,304 = 71,526
  steps_per_eval: 1000
  tensor_parallel_axes: ["mlp", "heads"]
  fsdp_axis: "embed"
  batch_axis: "batch"
optimizer:
  learning_rate: 3E-3
  weight_decay: 0.033
  min_lr_ratio: 0.1
  warmup: 5000
  cooldown: 3E-5
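For context on the PR title: "blocked" cross entropy avoids materializing the full [n_tokens, vocab] logit matrix by streaming over vocabulary blocks with a running (online) logsumexp, analogous to how flash attention streams over key blocks. The sketch below is not the PR's implementation (which lives in the repo's JAX code); it is a minimal NumPy illustration of the streaming idea, and the function name `blocked_cross_entropy` and its signature are hypothetical.

```python
import numpy as np

def blocked_cross_entropy(hidden, lm_head, targets, block_size=1024):
    """Per-token cross entropy computed over vocab blocks (hypothetical sketch).

    hidden:  [n_tokens, hidden_dim] final hidden states
    lm_head: [hidden_dim, vocab] output projection
    targets: [n_tokens] integer target token ids
    Only a [n_tokens, block_size] slice of logits exists at any time.
    """
    n, vocab = hidden.shape[0], lm_head.shape[1]
    running_max = np.full(n, -np.inf)   # running max logit per token
    running_sum = np.zeros(n)           # sum of exp(logit - running_max)
    target_logit = np.zeros(n)
    for start in range(0, vocab, block_size):
        # logits for this vocab block only
        block = hidden @ lm_head[:, start:start + block_size]
        new_max = np.maximum(running_max, block.max(axis=1))
        # rescale the old sum to the new max, then add this block's terms
        running_sum = running_sum * np.exp(running_max - new_max) \
            + np.exp(block - new_max[:, None]).sum(axis=1)
        running_max = new_max
        # record target logits whose ids fall inside this block
        in_block = (targets >= start) & (targets < start + block_size)
        target_logit[in_block] = block[in_block, targets[in_block] - start]
    # cross entropy = logsumexp(logits) - logit[target]
    return (running_max + np.log(running_sum)) - target_logit
```

With a 2048-token sequence and a ~50k vocab, the full logit matrix is the dominant activation in memory; blocking trades one large matmul for several smaller ones, which is what makes the larger `train_batch_size` settings in configs like the one above feasible.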