Native Intel IPEX-LLM Support #7190
Comments
@iamhumanipromise As I understand this issue, you would like to use IPEX-LLM as a backend to support Intel GPUs. If so, would it be quicker than TensorFlow/PyTorch? Why not use TensorFlow/PyTorch directly?
IPEX-LLM already supports llama.cpp, I think: https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html Also, PyTorch's IPEX and OpenXLA both use Intel oneAPI SYCL, which is what llama.cpp's SYCL backend uses. So it is already supported.
What IPEX-LLM has is a fork of llama.cpp and some other projects, with optimizations that have not been upstreamed here for one reason or another. I'm a current user of it, and typically it doubles the speed of upstream. However, it can't handle mixed GPU + CPU scenarios, which is the main issue, and new model support may take a while to filter over. Hence why I keep both upstream and the fork for my use cases.
The SYCL backend is still focusing on the missing functions needed to support more features and models.
Which issue/PR would you recommend we follow for the latest info about the SYCL branch, @NeoZhangJianyu? EDIT: I take it that [SYCL] Refactor would be it?
They seem to be keeping it reasonably up to date, as their published version of LlamaCPP-IPEX is using a week-old version as its baseline so far. I do hope they provide a bit of clarity about how to actually pull new versions of their IPEX branch, though. Also interesting about the lack of GPU overflow/partial offload capability; I was not aware of that.
Going to respond to this since the other comment, from another person at Intel, was deleted. I think it should be working, but for some reason it fails: it forces you to fully offload in the case of something like Llama 3 8B, or faults on an illegal instruction for something bigger like Llama 3 70B or Command-R, which had support added just recently from what I tested. I haven't upgraded in a while, so I'll probably recheck this before opening a ticket in the other repository to fix it, since upstream works but the fork doesn't in this situation.
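For context, "mixed GPU + CPU" here just means a partial offload, where only some of the model's layers go to the GPU via `-ngl` and the rest run on the CPU. A minimal sketch (the model path and layer count are placeholders, not the exact setup described above):

```bash
# Partial offload sketch: -ngl / --n-gpu-layers sets how many layers are offloaded
# to the GPU; the remaining layers are evaluated on the CPU.
# Model path and layer count are illustrative only.
./main -m ./models/llama-3-70b-instruct.Q4_K_M.gguf -c 4096 -ngl 40 -p "Hello"
```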
Do they have a fork of llama.cpp on GitHub? I actually haven't found it; I just installed from the readthedocs site that I linked to. Hell, I don't actually know how to go about updating the install; I just have a hypothesis about what I need to do.
Most of the stuff from IPEX-LLM has been upstreamed into llama.cpp. IPEX-LLM llama.cpp vs. upstream llama.cpp is basically the same perf at this point. I think the question shouldn't be about IPEX-LLM support, but about SYCL support using upstream llama.cpp (which the IPEX-LLM team is already upstreaming into llama.cpp). Also note that this doesn't require IPEX itself; IPEX-LLM does, but the native SYCL support does not. And yes, I work for Intel, and yes, I'm talking to the IPEX-LLM team and others :)
With a Q6_K quant of Llama 3 that had been quantized from a BF16 GGUF with the correct pre-tokenizer and EOS token, I get 30 tokens per second at the beginning of context with the IPEX branch, compared to 17 tokens per second with the llama.cpp SYCL version b2885. That's quite a stark difference in performance as I see it, and if it's possible, it'd be awesome to see the performance of the IPEX branch become generally available from the standard SYCL branch of llama.cpp, as installing the IPEX branch was troublesome. So I'll be waiting with bated breath, I guess.
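For anyone trying to reproduce this kind of comparison, a rough sketch is to run the same llama-bench invocation against both builds and compare the reported t/s (the model path and parameters below are illustrative, not the exact ones behind the numbers above):

```bash
# Illustrative benchmark run; repeat once with the upstream SYCL build and once
# with the IPEX-LLM fork's binaries, keeping the model and settings identical.
./llama-bench -m ./models/llama-3-8b-instruct.Q6_K.gguf -ngl 99 -p 512 -n 128
```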
Yeah, that's fair. It definitely depends on model size etc. Will work with the team to try to upstream as soon as we can.
I suggest using the latest code in the master branch.
Can also attest to differences between the SYCL build (as outlined in https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md) and the IPEX-LLM branch. Intel Arc A770M, Llama 3 8B Q8_0, full offload with the prompt
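For reference, the upstream SYCL build referenced above looked roughly like this on Linux at the time (a sketch based on the linked README; the exact CMake option names may have changed in newer versions):

```bash
# Sketch of the Linux SYCL build per README-sycl.md of that era; option names
# (e.g. LLAMA_SYCL) may differ in later revisions of the repository.
source /opt/intel/oneapi/setvars.sh
cmake -B build -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j
```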
This issue was closed because it has been inactive for 14 days since being marked as stale.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Feature Description
I have found this closed issue where someone manually (how?) implemented IPEX-LLM. However, I'm looking forward to native IPEX-LLM support for Intel Xe iGPUs and Intel Arc dGPUs on Windows and Linux:
#7042
TL;DR: IPEX-LLM now provides a C++ interface, which can be used as a backend for running llama.cpp on Intel GPUs. Incorporating this interface into llama.cpp would allow it to leverage the optimized performance of IPEX-LLM.
Motivation
Intel Xe graphics launched in 2020, and the Flex and Max datacenter cards and Arc consumer cards for laptop and desktop launched in 2022. That is a lot of devices in production/circulation.
This would "permit" llama.cpp users to utilize their integrated Xe GPUs, dedicated Arc GPUs, and datacenter Flex and Max cards with llama.cpp on BOTH Windows and Linux natively (without a confusing manual build).
Possible Implementation
The implementation of native Intel IPEX-LLM support would be something like... Integrate --> Test --> Document --> Release.
Full manual/guide: https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html
Full verified model list: https://ipex-llm.readthedocs.io/en/latest/#verified-models
Github: https://github.com/intel-analytics/ipex-llm
The "owners" of this process will be the devs and engineers here; in this Github (simple nerds such as myself do not have the expertise to tackle something like this... even locally)
For example, from the documentation it looks like this would be: create a new conda environment --> set up the environment --> configure oneAPI variables --> update CMakeLists.txt or the Makefile with paths to the IPEX-LLM library and headers --> then map llama.cpp functionality to IPEX APIs (which Intel has already done).
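As a rough sketch of what the user-facing side of those steps looks like today, loosely following the linked IPEX-LLM quickstart (package and command names are taken from that guide and may change, so treat this as illustrative rather than authoritative):

```bash
# Loose sketch of the IPEX-LLM llama.cpp setup per the linked quickstart;
# names and flags here are illustrative, not a definitive recipe.
conda create -n llm-cpp python=3.11
conda activate llm-cpp
pip install --pre --upgrade "ipex-llm[cpp]"

# Populate a working directory with the IPEX-LLM build of the llama.cpp binaries.
mkdir llama-cpp && cd llama-cpp
init-llama-cpp

# Configure the oneAPI environment, then run as usual on the Intel GPU.
source /opt/intel/oneapi/setvars.sh
./main -m ./models/model.gguf -ngl 33 -p "Hello"
```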
The "owners" of this step would be wide-ranging overall.
Documentation and Examples: Someone would have to "own" updating the documentation to guide users on how to enable and use the new IPEX-LLM support. Providing examples and quickstart guides would help significantly, but ultimately it will be up to independent users, and GUI and TUI/CLI frontends will need to update their own documentation.
Release: after all of this has been done, move forward to launch. Woot woot.
I'm sure there are many, many steps I am missing here. Just wanted to "kick off" the process.