
Llama2 #524

Merged
xiezipeng-ML merged 22 commits into main from llama2 on Dec 18, 2023
Conversation

@xiezipeng-ML (Contributor) commented Dec 5, 2023
  • libai sft model_0000499 (based on Llama-2-7b-hf):
{
    'results': {
        'hellaswag': {'acc': 0.5746863174666401, 'acc_stderr': 0.004933800927560528, 'acc_norm': 0.759609639514041, 'acc_norm_stderr': 0.004264472071282531}
    },
    'versions': {
        'hellaswag': 0
    },
    'config': {
        'model': 'llama', 'batch_size': 1, 'device': 'cuda:0', 'num_fewshot': 0, 'limit': None, 'bootstrap_iters': 100000
    }
}
  • Llama-2-7b-chat-hf
hf-causal (pretrained=/data/home/xiezipeng/hf_models/meta-llama/Llama-2-7b-chat-hf), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|  Task   |Version| Metric |Value |   |Stderr|
|---------|------:|--------|-----:|---|-----:|
|hellaswag|      0|acc     |0.5778|±  |0.0049|
|         |       |acc_norm|0.7553|±  |0.0043|
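For reference, the Llama-2-7b-chat-hf baseline above looks like output from EleutherAI's lm-evaluation-harness. A minimal sketch of reproducing it through the harness's Python API follows; the pretrained path is the local one quoted above, and the older (v0.3-style) simple_evaluate interface is assumed:

# Hedged sketch: reproduce the hellaswag baseline with lm-evaluation-harness.
# Assumes the older (v0.3-style) evaluator.simple_evaluate interface.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=/data/home/xiezipeng/hf_models/meta-llama/Llama-2-7b-chat-hf",
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=1,
    device="cuda:0",
)
print(results["results"]["hellaswag"])  # acc / acc_norm and their stderr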

libai/models/utils/model_loader/base_loader.py (review thread resolved, outdated)
train.update(
    dict(
        output_dir="./sft_result",
        train_micro_batch_size=1,
Collaborator commented:

Can batch_size only be run at 1?

xiezipeng-ML (Contributor, Author) replied:

Tested on an A100 with 2dp + 4pp; the maximum batch_size is 2.

@Ldpe2G (Collaborator) commented Dec 11, 2023:

> Tested on an A100 with 2dp + 4pp; the maximum batch_size is 2.

Have you tried other parallelism combinations, e.g. pp + tp? Also, is activation_checkpoint turned on?

xiezipeng-ML (Contributor, Author) replied:

Tested 2dp + 2tp + 2pp; the maximum is still micro_batch_size = 2. activation_checkpoint support will be added shortly.
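For context on the exchange above, a minimal sketch of how the parallel layout and activation checkpointing would be expressed in a libai-style LazyConfig is shown below; the keys mirror libai's common train config, but the sizes and the enabled flag are illustrative, not what this PR ships:

# Hedged sketch: the parallelism knobs and activation checkpointing discussed above.
# `train` stands in for the dict imported from libai's common training config.
train = dict(
    output_dir="./sft_result",
    train_micro_batch_size=1,
    dist=dict(data_parallel_size=1, tensor_parallel_size=1, pipeline_parallel_size=1),
    activation_checkpoint=dict(enabled=False),
)

train.update(
    dict(
        train_micro_batch_size=2,
        # 2dp + 2tp + 2pp across 8 GPUs, as tried in the thread above
        dist=dict(
            data_parallel_size=2,
            tensor_parallel_size=2,
            pipeline_parallel_size=2,
        ),
        # recompute activations to save memory, so a larger micro batch may fit
        activation_checkpoint=dict(enabled=True),
    )
)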

model = build_model(cfg.model)
logger = logging.getLogger(__name__)
logger.info("Model:\n{}".format(model))
model._apply(dist.convert_to_distributed_default_setting)
Collaborator commented:

The _apply interface is generally not meant to be called from outside the module; consider switching to apply or exporting a new interface for this.
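One way to act on this suggestion is to export a small wrapper around the private call in libai's distributed utilities. A minimal sketch follows, where convert_to_distributed_model is a hypothetical helper name and not necessarily how the PR resolved the comment:

# Hedged sketch of the reviewer's suggestion: keep the private `_apply` call in one
# exported helper so call sites stay on a public interface.
# `convert_to_distributed_model` is a hypothetical name, not the PR's actual API.
import oneflow.nn as nn

from libai.utils import distributed as dist


def convert_to_distributed_model(model: nn.Module) -> nn.Module:
    """Place every parameter/buffer of `model` on libai's default distributed setting."""
    # nn.Module._apply maps a function over each tensor in the module tree;
    # wrapping it here avoids calling a private method at every call site.
    model._apply(dist.convert_to_distributed_default_setting)
    return model


# the quoted call site would then become:
# model = convert_to_distributed_model(build_model(cfg.model))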

projects/Llama/utils/eval_adapter.py (review thread resolved, outdated)
projects/Llama/utils/eval_adapter.py (review thread resolved)
projects/Llama/utils/prepare_alpaca.py (review thread resolved, outdated)
projects/Llama/utils/prepare_alpaca.py (review thread resolved, outdated)
@xiezipeng-ML requested review from oneflow-ci-bot and removed the request for oneflow-ci-bot on December 14, 2023 09:26
@xiezipeng-ML requested review from oneflow-ci-bot and removed the request for oneflow-ci-bot on December 15, 2023 07:34
@xiezipeng-ML merged commit ddb5ea1 into main on Dec 18, 2023
4 checks passed
@xiezipeng-ML deleted the llama2 branch on December 18, 2023 03:05