Why is the megatron_v4.patch needed? #14

Comments
Hi @hxdtest, the megatron_v4.patch is necessary for veRL for two main reasons:
For case 1, config.hidden_size should be equal to hidden_size.
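To make the case-1 point concrete, here is a minimal, purely illustrative sketch (not verl's or Megatron's actual code) of a layer that receives both a config object and an explicit hidden_size argument; the class names and sizes are placeholders:

```python
import torch
from dataclasses import dataclass


@dataclass
class ToyTransformerConfig:
    # Model-wide width carried by the config object (placeholder value).
    hidden_size: int = 4096


class ToyProjection(torch.nn.Module):
    """Toy layer that receives both a config and an explicit hidden_size."""

    def __init__(self, config: ToyTransformerConfig, hidden_size: int):
        super().__init__()
        # The invariant from the reply above: the per-layer argument should
        # equal the model-wide config value; a mismatch indicates a wiring bug.
        assert hidden_size == config.hidden_size, (
            f"hidden_size ({hidden_size}) != config.hidden_size ({config.hidden_size})"
        )
        self.proj = torch.nn.Linear(hidden_size, config.hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


config = ToyTransformerConfig(hidden_size=4096)
layer = ToyProjection(config, hidden_size=4096)  # satisfies the invariant
print(layer(torch.randn(2, 4096)).shape)
```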
Many thanks for your reply. @PeterSH6
@hxdtest, we haven't tested verl on the 405B model. I think we can try it by using a larger TP size in rollout or implementing pipeline parallelism in vLLM rollout. This is one of our plans.
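As a rough illustration of the "larger TP size in rollout" idea, a standalone vLLM engine can shard a model across more GPUs via tensor_parallel_size. The checkpoint name and parallel sizes below are placeholders, not a tested verl configuration for 405B:

```python
from vllm import LLM, SamplingParams

# Placeholder checkpoint and parallel sizes, not a tested verl setup for 405B;
# actually running this requires a machine (or machines) with enough GPUs.
llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",  # hypothetical model choice
    tensor_parallel_size=16,                     # shard weights across 16 GPUs
    # pipeline_parallel_size=2,                  # exposed by recent vLLM releases
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=8))[0].outputs[0].text)
```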
Original issue body from @hxdtest:

https://github.com/volcengine/verl/blob/main/patches/megatron_v4.patch

For example:

1. What is the difference between hidden_size and config.hidden_size?
2. Why do you need next_forward_k and backward_k?
3. Why is False needed? For the current apex, it seems that memory_efficient is set to False by default in fused_layer_norm.py (a sketch follows below).
4. Why do you need overlap_param_gather? Does it have side effects on training? (A second sketch follows below.)
Many thanks!
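On question 3: the snippet below is a hedged sketch of constructing apex's FusedLayerNorm with memory_efficient spelled out explicitly, assuming an apex build recent enough to accept that keyword (older builds do not); as the question notes, recent apex already defaults it to False, so passing False is a no-op there:

```python
import torch
from apex.normalization import FusedLayerNorm

# Illustrative only: spell out memory_efficient explicitly when building the
# fused layer norm. Recent apex builds already default it to False, matching
# the default behaviour noted in the question.
ln = FusedLayerNorm(normalized_shape=4096, eps=1e-5, memory_efficient=False).cuda()
x = torch.randn(2, 16, 4096, device="cuda")
print(ln(x).shape)
```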
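On question 4: the toy function below only illustrates the general idea behind overlapping a parameter all-gather with forward compute; it is not verl's or Megatron-LM's implementation, and the layer(x, full_weight) calling convention is made up for the example. The usual trade-off is extra memory for the prefetched buffers and the need to wait on the communication handle before the parameters are used:

```python
import torch
import torch.distributed as dist


def forward_with_overlapped_param_gather(layers, shard_params, x):
    """Toy illustration of overlapping parameter all-gather with compute.

    Each layer's full weight is rebuilt from per-rank shards with an
    asynchronous all-gather that is launched one layer ahead, so the
    communication for layer i + 1 overlaps with the forward compute of
    layer i. Assumes torch.distributed is already initialized.
    """

    def launch_gather(i):
        shard = shard_params[i]
        bucket = [torch.empty_like(shard) for _ in range(dist.get_world_size())]
        handle = dist.all_gather(bucket, shard, async_op=True)  # non-blocking
        return bucket, handle

    bucket, handle = launch_gather(0)                  # prefetch layer 0's shards
    for i, layer in enumerate(layers):
        handle.wait()                                  # layer i's shards are ready
        full_weight = torch.cat(bucket, dim=0)         # reassemble the full weight
        if i + 1 < len(layers):
            bucket, handle = launch_gather(i + 1)      # prefetch the next layer
        x = layer(x, full_weight)                      # compute overlaps the prefetch
    return x
```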