Llama2 #524
Conversation
xiezipeng-ML commented Dec 5, 2023 (edited)
- libai SFT model_0000499 (based on Llama-2-7b-hf):
- Llama-2-7b-chat-hf
projects/Llama/configs/llama_sft.py (Outdated)
```python
train.update(
    dict(
        output_dir="./sft_result",
        train_micro_batch_size=1,
```
Can batch_size only run at 1?
Tested 2dp + 4pp on A100; the maximum batch_size is only 2.
> Tested 2dp + 4pp on A100; the maximum batch_size is only 2.

Have you tried other parallel combinations, such as pp + tp? Also, is activation_checkpoint turned on?
Tested 2dp + 2tp + 2pp; the maximum is still micro_batch_size=2. Support for activation_checkpoint will be added shortly.
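For reference, a minimal sketch of what the layout discussed above could look like in the config. The `get_config` import, the `train.dist.*` keys, and `train.activation_checkpoint.enabled` follow libai's common training defaults; treat them as assumptions to verify against this branch:

```python
from libai.config import get_config

# Assumption: reuse libai's common training defaults, as other libai configs do.
train = get_config("common/train.py").train

train.update(
    dict(
        output_dir="./sft_result",
        train_micro_batch_size=2,  # largest size that fit in the A100 tests above
    )
)

# 2dp x 2tp x 2pp = 8 GPUs, matching the combination tested above.
train.dist.data_parallel_size = 2
train.dist.tensor_parallel_size = 2
train.dist.pipeline_parallel_size = 2

# Recompute activations in the backward pass instead of storing them,
# trading extra compute for memory so a larger micro batch may fit.
train.activation_checkpoint.enabled = True
```

Activation checkpointing frees the activation memory that scales with micro batch size, so it is typically the first lever to try before rebalancing the parallel degrees.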
```python
model = build_model(cfg.model)
logger = logging.getLogger(__name__)
logger.info("Model:\n{}".format(model))
model._apply(dist.convert_to_distributed_default_setting)
```
The `_apply` interface is generally not meant to be called from outside the module; consider changing this to `apply`, or exporting a new public interface.
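As an illustration of the suggestion, a minimal sketch that exports a public wrapper instead of calling `_apply` at the call site. The helper name `convert_to_distributed` is hypothetical, and the import paths assume libai's usual layout:

```python
import logging

from libai.models import build_model  # assumed import path
from libai.utils import distributed as dist  # assumed import path


def convert_to_distributed(model):
    """Hypothetical public wrapper around the private Module._apply call.

    Centralizing the underscore call here keeps external code off
    Module internals, as the review suggests.
    """
    return model._apply(dist.convert_to_distributed_default_setting)


model = build_model(cfg.model)  # cfg comes from the surrounding training script
logger = logging.getLogger(__name__)
logger.info("Model:\n{}".format(model))
model = convert_to_distributed(model)  # instead of model._apply(...) inline
```

Note that switching the call to `apply` directly would change semantics, since `apply(fn)` hands each submodule (not each tensor) to `fn`; exporting a dedicated interface is the safer of the two options mentioned.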