Is there a reason why backend couldn't be selected at runtime? #891
Comments
Backends often need to link to a shared library that may not be available on systems without the supported hardware drivers installed. E.g., you can't run the CUDA backend on systems without the CUDA driver. In the future I would like to move the backends to dynamic libraries that can be loaded at runtime, but that's a more complex change than an if statement.
You can easily have the host-side CPU inference method behind an if statement, right? It would be really convenient to switch it out and see the performance difference. For example, I found my Vulkan implementation performs about the same as my CPU with 4 threads.
Switching backends at runtime requires building all backends in the first place, which is complicated to set up, takes a lot of time, and produces a large binary. For the same reason, PyTorch offers different packages for CUDA/CPU/ROCm. Out of the box, ggml comes with CPU + a backend of your choice.
There is nothing stopping you from building ggml with multiple backends and using all of them with …
+1 for that feature.
We select the backend at build time by selecting CUDA, Vulkan, SYCL, etc. Wouldn't it be better to build with the backends you want to support and then select the backend at runtime? It's literally just one runtime if statement, and that would make it much easier to compare the performance of the different backends.