b3813

github-actions released this 24 Sep 01:36

116efee

cuda: add q8_0->f32 cpy operation (#9571)

llama: enable K-shift for quantized KV cache
This will fail on backends or quantization types that do not support it.
