
Vulkan Optimizations and Fixes #8959

Merged: 9 commits merged into master, Aug 14, 2024
Conversation

0cc4m (Collaborator) commented on Aug 9, 2024

I have implemented a number of Vulkan optimizations and fixes:

  • Implement REPEAT operator shader to fix low performance of Vulkan copy-based implementation
  • Use GLSL FMA instruction where possible
  • Add GGML_VULKAN_PERF option to get approximate performance data about a running model
  • Rework and fix Vulkan descriptor set handling; this improves performance in my tests on AMD RADV
  • Fix validation error on float32 concat f16 shader

I will keep this in draft while I check a few more things, but feel free to test and benchmark. Don't expect a huge difference.


@github-actions bot added the labels Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library) on Aug 9, 2024
@mofosyne added the labels Review Complexity: Medium, performance (speed related topics), and bugfix (fixes an issue or bug) on Aug 10, 2024
MaggotHATE added a commit to MaggotHATE/Llama_chat that referenced this pull request Aug 10, 2024
…ng and more

* fixed default sampling queue to include p_step
* changed sampling queue display to better reflect the actual logic
* added VK-specific settings `use_mmap_vk`, `flash_attn_vk`, `no_kv_offload_vk`
* added new presets for testing
@0cc4m 0cc4m marked this pull request as ready for review August 11, 2024 08:59
0cc4m (Collaborator, Author) commented on Aug 11, 2024

I missed a validation issue in #8943, but the fix is now in this branch. I think this should be ready for review and merge.

Excerpt from the reviewed shader change (the scale computation rewritten with nested fma calls):

const uint tmp_idx = 16 * ix + tid;
tmp[tmp_idx] = fma(dall, (fma(sx, FLOAT_TYPE(data_a[ib0 + i].scales[v_im] & 0x3f), fma(sy, FLOAT_TYPE(data_a[ib0 + i].scales[v_im + 1] & 0x3f),
fma(sz, FLOAT_TYPE((data_a[ib0 + i].scales[v_im + 4] & 0x0f) | ((data_a[ib0 + i].scales[v_im] & 0xc0) >> 2)), fma(sw, FLOAT_TYPE((data_a[ib0 + i].scales[v_im + 5] & 0x0f) | ((data_a[ib0 + i].scales[v_im + 1] & 0xc0) >> 2))))))), fma(-dmin, smin, tmp[tmp_idx]));
ggerganov (Owner) commented:
If you consider only the FMA changes, is there a measurable performance gain?

0cc4m (Collaborator, Author) replied:

It's very hard to tell. The GLSL compiler should be using FMA instructions anyway; this change just makes it certain instead of leaving it to the optimizer. Hopefully this means a few more FMA calls in the SPIR-V, which could be checked.

But afterwards the SPIR-V code gets compiled again into a device-specific, driver-internal representation, where further optimization takes place. Since there are many combinations of devices, I can't really be sure whether this helped anywhere, but at least I'm sure it doesn't cause slowdowns. I haven't seen a significant performance difference on my devices.

@0cc4m 0cc4m requested a review from ggerganov August 13, 2024 18:44
0cc4m (Collaborator, Author) commented on Aug 14, 2024

@ggerganov @slaren Can one of you review the non-Vulkan parts of this PR and approve if that's fine?

ggerganov (Owner) left a review comment:

Make sure to fix the CI before merging

@0cc4m 0cc4m merged commit 5fd89a7 into master Aug 14, 2024
53 checks passed
@0cc4m 0cc4m deleted the 0cc4m/vulkan-optimization branch August 14, 2024 16:32
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Aug 14, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* Optimize Vulkan REPEAT performance

* Use Vulkan GLSL fused multiply-add instruction where possible

* Add GGML_VULKAN_PERF option to output performance data per operator

* Rework and fix Vulkan descriptor set and descriptor pool handling

* Fix float32 concat f16 shader validation error

* Add Vulkan GROUP_NORM eps parameter

* Fix validation error with transfer queue memory barrier flags

* Remove trailing whitespaces
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024