llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. #8984
…LAVA CLIP model.
- The CLIP model now prioritizes the Vulkan backend over the CPU when vulkan available.
- A GGML_OP_ACC shader has been added.
- The encoding performance of the CLIP model improved from 4.2s on the CPU to 0.9s on the GPU.
Signed-off-by: Changyeon Kim <[email protected]>
Signed-off-by: Changyeon Kim <[email protected]>
Signed-off-by: Changyeon Kim <[email protected]>
This implementation isn't correct yet. You can check that with test-backend-ops -o ACC. Here are the results from my system:
» build_vk/bin/test-backend-ops -o ACC
ggml_vulkan: Found 3 Vulkan devices:
Vulkan0: AMD Radeon Pro VII (RADV VEGA20) (radv) | uma: 0 | fp16: 1 | warp size: 64
Vulkan1: Tesla P40 (NVIDIA) | uma: 0 | fp16: 0 | warp size: 32
Vulkan2: NVIDIA GeForce RTX 3090 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
Testing 4 backends
Backend 1/4 (CPU)
Skipping CPU backend
Backend 2/4 (Vulkan0)
Backend name: Vulkan0
ACC(type=f32,ne_a=[1024,577,1,1],ne_b=[1024,576,1,1]): [ACC] NMSE = 1.002194223 > 0.000000100 FAIL
1341/1342 tests passed
Backend Vulkan0: FAIL
Backend 3/4 (Vulkan1)
Backend name: Vulkan1
ACC(type=f32,ne_a=[1024,577,1,1],ne_b=[1024,576,1,1]): [ACC] NMSE = 1.000925279 > 0.000000100 FAIL
1341/1342 tests passed
Backend Vulkan1: FAIL
Backend 4/4 (Vulkan2)
Backend name: Vulkan2
ACC(type=f32,ne_a=[1024,577,1,1],ne_b=[1024,576,1,1]): [ACC] NMSE = 1.000223190 > 0.000000100 FAIL
1341/1342 tests passed
Backend Vulkan2: FAIL
1/4 backends passed
FAIL
Do you want to fix this or would you prefer me to? I don't mind, it's not a complicated operator and I have the most experience with the backend.
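For readers unfamiliar with the NMSE figure in the log above, here is a minimal sketch of a normalized mean squared error check. It assumes the common definition (squared error divided by the squared magnitude of the reference); the exact formula used by test-backend-ops is not shown in this thread, and the function name `nmse` and the constant `MAX_NMSE` are illustrative. The threshold is taken from the log output (0.000000100).

```python
def nmse(reference, candidate):
    """Normalized mean squared error of `candidate` against `reference`.

    Assumed definition: sum((r - c)^2) / sum(r^2).
    """
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same length")
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0:
        raise ValueError("reference is all zeros; NMSE is undefined")
    err = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    return err / ref_energy


# Threshold as printed in the log above.
MAX_NMSE = 1e-7

cpu_result = [1.0, 2.0, 3.0]           # stand-in for the reference backend output
good_result = [1.0, 2.0, 3.0]          # matching output passes
zero_result = [0.0, 0.0, 0.0]          # all-zero output gives exactly 1.0

print(nmse(cpu_result, good_result) <= MAX_NMSE)
print(nmse(cpu_result, zero_result) <= MAX_NMSE)
```

Under this definition, an output that is mostly zeros yields a value close to 1.0. That is only one way to land near the numbers in the log, not a diagnosis of this particular failure.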
Signed-off-by: Changyeon Kim <[email protected]>
0cc4m, thank you for letting me know about the OP test method. Your comments have greatly contributed to my growth. As you mentioned, I confirmed that the parameter was missing and have made the necessary corrections. Here are the results from the retest.
PS C:\work\llm\cyzero\llama.cpp.latest> .\build\bin\Release\test-backend-ops.exe -o ACC
Backend 1/2 (CPU)
2/2 backends passed
encode_image_with_clip: image encoded in 882.93 ms by CLIP ( 1.53 ms per image patch)
The image shows a happy golden retriever dog with a blue bandana around its neck. The dog is sitting down on the grass and looking at the camera with a smile on its face. Its tongue is hanging out, showing a playful and joyful expression. The dog appears to be a beloved and well-cared-for pet.
Looks good and passes the tests, only a small change needed and it's ready to merge.
Signed-off-by: Changyeon Kim <[email protected]>
Thank you, that looks correct.
…LAVA CLIP model. (ggerganov#8984)
* llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model.
  - The CLIP model now prioritizes the Vulkan backend over the CPU when vulkan available.
  - A GGML_OP_ACC shader has been added.
  - The encoding performance of the CLIP model improved from 4.2s on the CPU to 0.9s on the GPU.
  Signed-off-by: Changyeon Kim <[email protected]>
* fix-up coding style.
  Signed-off-by: Changyeon Kim <[email protected]>
* Fix-up the missing initial parameter to resolve the compilation warning.
  Signed-off-by: Changyeon Kim <[email protected]>
* [fix] Add missing parameters.
  Signed-off-by: Changyeon Kim <[email protected]>
* [fix] Use nb1 and nb2 for dst.
  Signed-off-by: Changyeon Kim <[email protected]>
* Fix check results ggml_acc call
---------
Signed-off-by: Changyeon Kim <[email protected]>
Co-authored-by: 0cc4m <[email protected]>
The CLIP model now prioritizes the Vulkan backend over the CPU when Vulkan is available.
A GGML_OP_ACC shader has been added.
The encoding performance of the CLIP model improved from 4.2s on the CPU to 0.9s on the GPU.
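As a rough illustration of what an accumulate (ACC) operation computes, the sketch below adds a smaller 2D block into a copy of a larger one, starting at a given row. It is a simplified model, not the shader from this PR. The real operator is parameterized by strides and an offset (the commit list mentions nb1 and nb2), while this version uses a plain row index. The name `acc_rows` and its parameters are hypothetical. The shapes in the test log (577 rows for a, 576 rows for b, 1024 values per row) fit this pattern.

```python
def acc_rows(a, b, row_offset):
    """Return a copy of `a` with `b` added element-wise starting at `row_offset`.

    `a` and `b` are lists of equal-width rows. `a` itself is not modified.
    This models the arithmetic only; strides and byte offsets are omitted.
    """
    if row_offset < 0 or row_offset + len(b) > len(a):
        raise ValueError("b does not fit inside a at this offset")
    width = len(a[0]) if a else 0
    if any(len(row) != width for row in list(a) + list(b)):
        raise ValueError("all rows must have the same width")

    dst = [row[:] for row in a]             # start from a copy of a
    for i, src_row in enumerate(b):
        dst_row = dst[row_offset + i]       # the row of dst that b's row i lands on
        for j, value in enumerate(src_row):
            dst_row[j] += value             # accumulate rather than overwrite
    return dst


# Small stand-in for the 577-row / 576-row case: b skips the first row of a.
a = [[1.0, 1.0, 1.0] for _ in range(4)]
b = [[2.0, 2.0, 2.0] for _ in range(3)]
result = acc_rows(a, b, row_offset=1)
print(result)
```

In this example the first row of the result keeps the values from a, and the remaining rows hold the sum of a and b.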
I have read the contributing guidelines
Self-reported review complexity:
Test image : https://raw.githubusercontent.com/neuralmagic/deepsparse/main/tests/deepsparse/pipelines/sample_images/buddy.jpeg
master :
PR :
Full logs: