Corrected T5x large baselines
Updated T5x-large MNLI and SQUAD baselines
terrykong committed Aug 27, 2023
1 parent 985ff36 commit 1fa57af
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/usage/gpu-usage.md
@@ -35,7 +35,7 @@ For our Pile convergence runs, we used a Global batch size of 2304 for XXL and 2
| size | GPU | Precision | #GPUs | TP | BS / GPU | Sequences/Sec | Seq/Sec/GPU | Est. Walltime | GPU-days | MNLI 2.0 - matched | SQuAD v1.1 (EM/F1) | Convergence Log | Config |
| ---- | ------------ | --------- | ----- | ----- | -------- | ------------- | ----------- | ------------- | -------- |------------------ | ------------------ | --------------- | ---- |
| [T5-v1.1-small](../t5/t5_1_1/small.gin) | A100 80G SXM | bf16 | 8 | 1 | 256 | ~5712 | 714 | 4.2 days | 33 | 83.06% | 78.33 / 86.63 | [log](https://tensorboard.dev/experiment/lWnHal7PRnOLeZuewyWVxQ/#scalars&_smoothingWeight=0) | [pile](../t5/t5_1_1/examples/small_pile_pretrain.gin)
-| [T5-v1.1-large](../t5/t5_1_1/large.gin) | A100 80G SXM | bf16 | 64 | 1 | 32 | ~4853 | 75.8 | 4.8 days | 309 | 90.50% | 87.31 / 94.04 | [log](https://tensorboard.dev/experiment/aOxJBIvTQBeTJ8XGXxaL6Q/#scalars&_smoothingWeight=0) |[pile](../t5/t5_1_1/examples/large_pile_pretrain.gin)
+| [T5-v1.1-large](../t5/t5_1_1/large.gin) | A100 80G SXM | bf16 | 64 | 1 | 32 | ~4853 | 75.8 | 4.8 days | 309 | 89.23% | 86.12 / 93.21 | [log](https://tensorboard.dev/experiment/aOxJBIvTQBeTJ8XGXxaL6Q/#scalars&_smoothingWeight=0) |[pile](../t5/t5_1_1/examples/large_pile_pretrain.gin)
| [T5-v1.1-xl](../t5/t5_1_1/xl.gin) | A100 80G SXM | bf16 | 144 | 1 | 8 | ~3021 | 21.0 | 7.9 days | 1,133 | N/A(perf test) | N/A (perf test) | |[pile](../t5/t5_1_1/examples/xl_pile_pretrain.gin)
| [T5-v1.1-xl](../t5/t5_1_1/xl.gin) | A100 80G SXM | bf16 | 256 | 1 | 8 | ~4322 | 16.9 | 5.5 days | 1,408 | 91.15% | 89.36 / 95.29 | [log](https://tensorboard.dev/experiment/vuRoEYgkRgWiEtbvgxlOqw/#scalars&_smoothingWeight=0) |[pile](../t5/t5_1_1/examples/xl_pile_pretrain.gin)
| [T5-v1.1-xxl](../t5/t5_1_1/xxl.gin) | A100 80G SXM | bf16 | 512 | 8 | 36 | ~1887 | 3.69 | 12.6 days | 6,431 |N/A(partial run) | N/A(partial run) | |[pile](../t5/t5_1_1/examples/xxl_pile_pretrain.gin)
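
The derived columns in this table appear to follow from the raw ones: Seq/Sec/GPU looks like Sequences/Sec divided by #GPUs, and GPU-days looks like #GPUs multiplied by the estimated walltime. That relationship is an assumption inferred from the numbers, not something the commit states. A minimal sketch to sanity-check it, with the row values transcribed from the table and hypothetical helper names:

```python
# Rows transcribed from the table: (size, #GPUs, sequences/sec, walltime in days).
ROWS = [
    ("small", 8, 5712, 4.2),
    ("large", 64, 4853, 4.8),
    ("xl", 144, 3021, 7.9),
    ("xl", 256, 4322, 5.5),
    ("xxl", 512, 1887, 12.6),
]


def per_gpu_throughput(seq_per_sec, num_gpus):
    """Assumed definition of the Seq/Sec/GPU column."""
    return seq_per_sec / num_gpus


def gpu_days(num_gpus, walltime_days):
    """Assumed definition of the GPU-days column."""
    return num_gpus * walltime_days


for size, n, sps, days in ROWS:
    print(
        f"{size:>5} x{n:<3} "
        f"seq/sec/GPU={per_gpu_throughput(sps, n):.2f} "
        f"GPU-days~{gpu_days(n, days):.0f}"
    )
```

The per-GPU throughput reproduces the listed values after rounding. The GPU-days figures only agree approximately in some rows (for example 64 × 4.8 = 307.2 against the listed 309), presumably because the walltime column is itself rounded.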