sycl-exp : dequant q4 k improvements #7972

AidanBeltonS · 2024-06-17T09:58:09Z

This PR provides improvements to the dequantize_block_q4_K kernel. It focuses on improving the global memory accesses.

Three main changes are implemented:

- Single 32-bit load for half2 rather than two 16-bit loads
- Load all scales into local memory, then do random access on the results
- Vectorize the q load so that 32 bits are loaded each time rather than 8 bits
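As a rough illustration of the load-widening idea behind the third change (and, by analogy, the half2 change), here is a minimal host-side C++ sketch. It is not the kernel code from this PR: the function names `dequant_bytewise` and `dequant_vectorized`, the 8-value output layout, and the single scale `d` and minimum `m` are all hypothetical simplifications chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical baseline: four separate 8-bit loads, one per packed byte.
// Each byte holds two 4-bit quants; low nibbles fill out[0..3],
// high nibbles fill out[4..7].
inline std::array<float, 8> dequant_bytewise(const std::uint8_t* q, float d, float m) {
    assert(q != nullptr);
    std::array<float, 8> out{};
    for (int l = 0; l < 4; ++l) {
        const std::uint8_t b = q[l];      // one 8-bit load per iteration
        out[l]     = d * (b & 0xF) - m;   // low nibble
        out[l + 4] = d * (b >> 4) - m;    // high nibble
    }
    return out;
}

// Hypothetical vectorized variant: one 32-bit load, then unpack from the
// loaded word without touching the source buffer again.
inline std::array<float, 8> dequant_vectorized(const std::uint8_t* q, float d, float m) {
    assert(q != nullptr);
    std::uint32_t packed;
    std::memcpy(&packed, q, sizeof(packed));     // single 32-bit load
    std::uint8_t bytes[4];
    std::memcpy(bytes, &packed, sizeof(bytes));  // byte view, endian-independent
    std::array<float, 8> out{};
    for (int l = 0; l < 4; ++l) {
        out[l]     = d * (bytes[l] & 0xF) - m;   // low nibble
        out[l + 4] = d * (bytes[l] >> 4) - m;    // high nibble
    }
    return out;
}
```

Both functions produce identical values; the only difference is how many memory transactions are issued for the packed quants. On a GPU, fewer and wider global loads per work-item are generally what makes this kind of change worthwhile, although the actual gain depends on the device.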

All results below were collected on an A100 GPU.

| | | Without Changes | With Changes | % Change | |
|---|---|---|---|---|---|
| LLama-bench 70 B | PP Throughput (t/s) | 503.36 | 564.04 | -11.85 | Negative change is better |
| | NSYS Avg Kernel time (us) | 587.54 | 409.52 | 30.30 | Positive change is better |

No meaningful change in Intel GPU results has been observed.

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

joeatodd

Looks good, and I've tested it all locally 👍

* Remove double lines
* Single load for half2
* Store scales in local mem
* Vectorize q load

Aidan added 4 commits June 17, 2024 10:16

Remove double lines

4a48155

Single load for half2

cb3fb42

Store scales in local mem

604ef6b

Vectorize q load

a235b7c

AidanBeltonS requested a review from joeatodd June 17, 2024 09:58

github-actions bot added the SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language label Jun 17, 2024

mofosyne added the Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level label Jun 18, 2024

joeatodd approved these changes Jun 18, 2024

joeatodd merged commit 0e4699e into codeplay/sycl-main Jun 18, 2024
67 checks passed

Alcpz pushed a commit to Alcpz/llama.cpp that referenced this pull request Jun 20, 2024

sycl-exp : dequant q4 k improvements (ggerganov#7972)

92f007a

* Remove double lines
* Single load for half2
* Store scales in local mem
* Vectorize q load
