
Why is GPU inference slow? #4

Open
budgetdevv opened this issue Sep 15, 2024 · 2 comments

@budgetdevv
Owner

Find out why. I suspect it's because I'm not using IOBinding.

@budgetdevv budgetdevv self-assigned this Sep 15, 2024
@budgetdevv
Owner Author

Apparently it is just slow on DirectML; I see full GPU utilization when running inference on CUDA.

It could still be faster, though, by using IOBinding.

@budgetdevv
Owner Author

Inference speed for the full, non-quantized model is around 6 seconds on my 8700K + RTX 3070 setup (the model fits nicely into VRAM).
