Any consideration on why 4 SP & 32 TP are used? #74

Open
ParanoidHW opened this issue Apr 28, 2024 · 0 comments

Comments

@ParanoidHW

Hi authors, great work!
I have a small question about the parallelism. It seems ring attention can hide the communication time under the local attention computation. So why still use more tensor parallelism than sequence parallelism during inference, e.g. 32 TP vs. 4 SP, rather than the opposite, given that the communication cost introduced by TP cannot be ignored or overlapped?
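To make my worry concrete, here is a rough back-of-envelope cost model I sketched (all hardware numbers, tensor shapes, and the two-all-reduces-per-layer count are illustrative assumptions on my part, not taken from the paper):

```python
# Rough back-of-envelope cost model (my own sketch, not from the authors).
# All hardware numbers and model/workload shapes are illustrative assumptions.

INTERCONNECT_BW = 300e9   # bytes/s, NVLink-class bandwidth (assumption)
FLOPS = 300e12            # effective FLOP/s per device (assumption)

batch, heads, head_dim = 1, 32, 128
hidden = heads * head_dim          # 4096
seq_len = 512 * 1024               # long-context prefill (assumption)
bytes_per_elem = 2                 # fp16/bf16 activations

def ring_attention_step(sp):
    """Per-step P2P comm vs local attention compute under sequence parallelism.

    In ring attention each device holds seq_len / sp tokens and, at every
    ring step, sends/receives one KV block while computing attention of its
    local Q block against the KV block it already holds.
    """
    local = seq_len // sp
    # KV block exchanged per ring step (K and V tensors).
    comm_bytes = 2 * batch * local * hidden * bytes_per_elem
    comm_t = comm_bytes / INTERCONNECT_BW
    # Local attention FLOPs per step: QK^T plus attention-weights @ V.
    flops = 4 * batch * heads * local * local * head_dim
    compute_t = flops / FLOPS
    return comm_t, compute_t

def tp_allreduce_per_layer(tp):
    """All-reduce time per transformer layer under tensor parallelism.

    Megatron-style TP needs two all-reduces per layer (after the attention
    out-proj and after the MLP out-proj); a ring all-reduce moves roughly
    2*(tp-1)/tp of the activation tensor, and this time sits on the
    critical path rather than overlapping with compute.
    """
    activation_bytes = batch * seq_len * hidden * bytes_per_elem
    per_allreduce = 2 * (tp - 1) / tp * activation_bytes / INTERCONNECT_BW
    return 2 * per_allreduce

if __name__ == "__main__":
    comm_t, compute_t = ring_attention_step(sp=4)
    print(f"ring step: comm {comm_t*1e3:.1f} ms vs compute {compute_t*1e3:.1f} ms "
          f"(hidden by overlap: {compute_t >= comm_t})")
    print(f"TP all-reduces per layer (tp=32): {tp_allreduce_per_layer(32)*1e3:.1f} ms "
          f"(not overlappable)")
```

With these made-up numbers, the per-step ring communication hides entirely under the local attention compute, while each layer's TP all-reduces stay on the critical path, which is exactly what motivates my question.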
Hope you can answer my question. Many thanks~
