b3813

github-actions released this 24 Sep 01:36
116efee
cuda: add q8_0->f32 cpy operation (#9571)

llama: enable K-shift for quantized KV cache
K-shift will fail on backends or quant types that do not support it.
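
For context, a q8_0->f32 copy amounts to dequantizing each block back to floats. The sketch below is plain Python, not the CUDA kernel from this commit. It assumes ggml's q8_0 layout: blocks of 32 signed 8-bit values, each block preceded by one fp16 scale. The function names `quantize_q8_0` and `dequantize_q8_0` and the little-endian byte packing are illustrative assumptions, not the library's API.

```python
import math
import struct

QK8_0 = 32               # values per q8_0 block (assumed ggml layout)
BLOCK_BYTES = 2 + QK8_0  # one fp16 scale followed by 32 int8 quants


def _round_half_away(x):
    """Round to nearest integer, ties away from zero (like C's roundf)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_q8_0(values):
    """Pack floats into q8_0 blocks (hypothetical helper for illustration)."""
    if len(values) % QK8_0 != 0:
        raise ValueError("length must be a multiple of 32")
    out = bytearray()
    for i in range(0, len(values), QK8_0):
        block = values[i:i + QK8_0]
        amax = max(abs(v) for v in block)
        d = amax / 127.0                   # per-block scale
        inv_d = 1.0 / d if d != 0.0 else 0.0
        out += struct.pack("<e", d)        # scale stored as fp16
        out += struct.pack("<32b", *(_round_half_away(v * inv_d) for v in block))
    return bytes(out)


def dequantize_q8_0(data):
    """The q8_0->f32 direction: each value is scale * int8 quant."""
    if len(data) % BLOCK_BYTES != 0:
        raise ValueError("data is not a whole number of q8_0 blocks")
    out = []
    for off in range(0, len(data), BLOCK_BYTES):
        (d,) = struct.unpack_from("<e", data, off)
        qs = struct.unpack_from("<32b", data, off + 2)
        out.extend(d * q for q in qs)
    return out


if __name__ == "__main__":
    original = [float(i - 16) for i in range(QK8_0)]
    restored = dequantize_q8_0(quantize_q8_0(original))
    print(max(abs(a - b) for a, b in zip(original, restored)))
```

The round trip is lossy: each restored value differs from the original by up to about half the block scale, plus the rounding of the scale itself to fp16.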