
Llama2 #524

Merged
xiezipeng-ML merged 22 commits into main from llama2 on Dec 18, 2023
Conversation

@xiezipeng-ML (Contributor) commented Dec 5, 2023
  • libai sft model_0000499 (based on Llama-2-7b-hf):
{
    'results': {
        'hellaswag': {'acc': 0.5746863174666401, 'acc_stderr': 0.004933800927560528, 'acc_norm': 0.759609639514041, 'acc_norm_stderr': 0.004264472071282531}
    },
    'versions': {
        'hellaswag': 0
    },
    'config': {
        'model': 'llama', 'batch_size': 1, 'device': 'cuda:0', 'num_fewshot': 0, 'limit': None, 'bootstrap_iters': 100000
    }
}
  • Llama-2-7b-chat-hf
hf-causal (pretrained=/data/home/xiezipeng/hf_models/meta-llama/Llama-2-7b-chat-hf), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|  Task   |Version| Metric |Value |   |Stderr|
|---------|------:|--------|-----:|---|-----:|
|hellaswag|      0|acc     |0.5778|±  |0.0049|
|         |       |acc_norm|0.7553|±  |0.0043|
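For reference, the Llama-2-7b-chat-hf baseline above looks like output from EleutherAI's lm-evaluation-harness. A minimal sketch of reproducing it through the harness's Python API follows; the pretrained path is the local one quoted above, and the older (v0.3-style) simple_evaluate interface is assumed:

# Hedged sketch: reproduce the hellaswag baseline with lm-evaluation-harness.
# Assumes the older (v0.3-style) evaluator.simple_evaluate interface.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=/data/home/xiezipeng/hf_models/meta-llama/Llama-2-7b-chat-hf",
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=1,
    device="cuda:0",
)
print(results["results"]["hellaswag"])  # acc / acc_norm and their stderr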

libai/models/utils/model_loader/base_loader.py (review thread resolved, outdated)
train.update(
    dict(
        output_dir="./sft_result",
        train_micro_batch_size=1,
Collaborator commented:

Can batch_size only be run at 1?

xiezipeng-ML (Contributor, Author) replied:

Tested on an A100 with 2dp + 4pp; the maximum batch_size is 2.

@Ldpe2G (Collaborator) commented Dec 11, 2023:

> Tested on an A100 with 2dp + 4pp; the maximum batch_size is 2.

Have you tried other parallelism combinations, e.g. pp + tp? Also, is activation_checkpoint turned on?

xiezipeng-ML (Contributor, Author) replied:

Tested 2dp + 2tp + 2pp; the maximum is still micro_batch_size = 2. activation_checkpoint support will be added shortly.
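For context on the exchange above, a minimal sketch of how the parallel layout and activation checkpointing would be expressed in a libai-style LazyConfig is shown below; the keys mirror libai's common train config, but the sizes and the enabled flag are illustrative, not what this PR ships:

# Hedged sketch: the parallelism knobs and activation checkpointing discussed above.
# `train` stands in for the dict imported from libai's common training config.
train = dict(
    output_dir="./sft_result",
    train_micro_batch_size=1,
    dist=dict(data_parallel_size=1, tensor_parallel_size=1, pipeline_parallel_size=1),
    activation_checkpoint=dict(enabled=False),
)

train.update(
    dict(
        train_micro_batch_size=2,
        # 2dp + 2tp + 2pp across 8 GPUs, as tried in the thread above
        dist=dict(
            data_parallel_size=2,
            tensor_parallel_size=2,
            pipeline_parallel_size=2,
        ),
        # recompute activations to save memory, so a larger micro batch may fit
        activation_checkpoint=dict(enabled=True),
    )
)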

model = build_model(cfg.model)
logger = logging.getLogger(__name__)
logger.info("Model:\n{}".format(model))
model._apply(dist.convert_to_distributed_default_setting)
Collaborator commented:

The _apply interface is generally not meant to be called from outside the module; consider switching to apply or exporting a new interface for this.
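One way to act on this suggestion is to export a small wrapper around the private call in libai's distributed utilities. A minimal sketch follows, where convert_to_distributed_model is a hypothetical helper name and not necessarily how the PR resolved the comment:

# Hedged sketch of the reviewer's suggestion: keep the private `_apply` call in one
# exported helper so call sites stay on a public interface.
# `convert_to_distributed_model` is a hypothetical name, not the PR's actual API.
import oneflow.nn as nn

from libai.utils import distributed as dist


def convert_to_distributed_model(model: nn.Module) -> nn.Module:
    """Place every parameter/buffer of `model` on libai's default distributed setting."""
    # nn.Module._apply maps a function over each tensor in the module tree;
    # wrapping it here avoids calling a private method at every call site.
    model._apply(dist.convert_to_distributed_default_setting)
    return model


# the quoted call site would then become:
# model = convert_to_distributed_model(build_model(cfg.model))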

projects/Llama/utils/eval_adapter.py (review thread resolved, outdated)
projects/Llama/utils/eval_adapter.py (review thread resolved)
projects/Llama/utils/prepare_alpaca.py (review thread resolved, outdated)
projects/Llama/utils/prepare_alpaca.py (review thread resolved, outdated)
@xiezipeng-ML requested review from oneflow-ci-bot and removed the request for oneflow-ci-bot on December 14, 2023 09:26
@xiezipeng-ML requested review from oneflow-ci-bot and removed the request for oneflow-ci-bot on December 15, 2023 07:34
@xiezipeng-ML merged commit ddb5ea1 into main on Dec 18, 2023
4 checks passed
@xiezipeng-ML deleted the llama2 branch on December 18, 2023 03:05