Reset schedule earlier to allow overlap with ggml graph computation on device #6933
The change looks good, I was aware that these
Is this the reason for the performance difference between a PCIe x2 P40 and a PCIe x8 P40?
Thanks, I have now moved the reset call as suggested.
This change doesn't affect any PCIe data transfers, only CPU activity. However, if the two systems have different CPUs or CPU memory configurations, that could contribute to a difference.
…n device (ggerganov#6933) * Reset schedule earlier to allow overlap with graph computation on device
Previously, significant CPU memset calls between each token generation were on the critical path.
This change performs them earlier, while the CPU is waiting for the previous token to be
generated on the device.
Refs #6763
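
To illustrate the idea in the commit message, here is a minimal, self-contained sketch of overlapping CPU-side reset work with an asynchronous device computation. It is a simulation, not the code from this PR: `SchedState`, `launch_graph_async`, and `generate_tokens` are hypothetical names, and the "device" is a worker thread that sleeps for a few milliseconds. The only point it makes is the ordering: the reset runs after the work is launched and before the CPU blocks on the result, so it no longer sits between two tokens.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for the scheduler's per-graph bookkeeping.
struct SchedState {
    std::vector<int> assignments;

    explicit SchedState(std::size_t n) : assignments(n, 0) {}

    // CPU-only clearing work, analogous to the memset calls described above.
    void reset() {
        std::memset(assignments.data(), 0, assignments.size() * sizeof(int));
    }

    bool is_reset() const {
        return std::all_of(assignments.begin(), assignments.end(),
                           [](int a) { return a == 0; });
    }
};

// Simulated asynchronous device computation (assumption: a sleeping worker
// thread stands in for the GPU producing the next token).
std::future<int> launch_graph_async(int token) {
    return std::async(std::launch::async, [token] {
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
        return token + 1;
    });
}

// Generates n_tokens tokens, resetting the scheduler state while the
// simulated device is still busy rather than after waiting for it.
int generate_tokens(SchedState &sched, int n_tokens) {
    int token = 0;
    for (int i = 0; i < n_tokens; ++i) {
        // Building the graph dirties the bookkeeping state.
        std::fill(sched.assignments.begin(), sched.assignments.end(), i + 1);

        std::future<int> pending = launch_graph_async(token);

        // Reset early: this overlaps with the in-flight device work.
        sched.reset();

        // Only now block until the device has produced the token.
        token = pending.get();
        assert(sched.is_reset());
    }
    return token;
}
```

Calling `generate_tokens(sched, n)` on a `SchedState` of any size returns `n` in this simulation, since each simulated step increments the token by one. Whether an early reset is safe in a real scheduler depends on the launched work no longer needing the state being cleared; the sketch simply assumes that.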