b2902 #122

Nexesenex · 2024-05-16T13:21:18Z

No description provided.


… MSVC (#7191)

* logging: add proper checks for clang to avoid errors and warnings with VA_ARGS
* build: add CMake Presets and toolchain files for Windows ARM64
* matmul-int8: enable matmul-int8 with MSVC and fix Clang warnings
* ci: add support for optimized Windows ARM64 builds with MSVC and LLVM
* matmul-int8: fixed typos in q8_0_q8_0 matmuls
  Co-authored-by: Georgi Gerganov <[email protected]>
* matmul-int8: remove unnecessary casts in q8_0_q8_0

---------

Co-authored-by: Georgi Gerganov <[email protected]>


…l. (#7288)

* chore: add references to the quantisation space.
* fix grammar lol.
* Update README.md
  Co-authored-by: Julien Chaumond <[email protected]>
* Update README.md
  Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Julien Chaumond <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>


* Adding q6_0_r4
  We get PP-512(LLaMA-3.1-8B) = 257 t/s on a Ryzen-7950X.
* q6_0_r4: NEON
  We get PP-512(LLaMA-3.1-8B) = 95 t/s on M2-Max. In terms of ops, q6_0_r4 is identical to q5_0_r4 except for loading the high bits being vld1q_u8_x2 instead of vld1q_u8. It is strange that this can make a 5% difference in performance, especially considering that this is amortized (re-used) over 8 columns in the right matrix. Or am I running out of vector registers?
* Fix AVX2

---------

Co-authored-by: Iwan Kawrakow <[email protected]>

danbev and others added 8 commits May 15, 2024 23:41

readme : remove stray double quote (#7310)

8f7080b

Signed-off-by: Daniel Bevenius <[email protected]>

ci: fix bin/Release path for windows-arm64 builds (#7317)

172b782

Switch to the Ninja Multi-Config CMake generator to resurrect the bin/Release path, the loss of which broke artifact packaging in CI.

grammar, json, llama: replace push on emplace if it possible (#7273)

0350f58
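The diff for this commit is not shown on this page, so the following is only a minimal sketch of the general idiom the title refers to, not the actual change. The `Token` type and `build_tokens` function are hypothetical names introduced purely for illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type, used only for illustration.
struct Token {
    int id;
    std::string text;
    Token(int id_, std::string text_) : id(id_), text(std::move(text_)) {}
};

// Builds a small vector using both idioms so they can be compared.
std::vector<Token> build_tokens() {
    std::vector<Token> tokens;
    // push_back: a temporary Token is constructed, then moved into the vector.
    tokens.push_back(Token(1, "hello"));
    // emplace_back: the constructor arguments are forwarded and the Token
    // is constructed directly in the vector's storage.
    tokens.emplace_back(2, "world");
    return tokens;
}
```

Both calls leave the vector in the same state; `emplace_back` simply avoids the intermediate temporary when the element can be built from its constructor arguments.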

convert : get general.name from model dir, not its parent (#5615)

dda64fc

Co-authored-by: Brian <[email protected]>

rpc : add command line arg for specifying backend memory

3b3963c

ref: #7293

rpc : get available mem for the CPU backend

9afdffe

This can be overridden with the -m command line option ref: #7293
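The override behaviour described in these two commits can be sketched as follows. This is a hedged illustration, not the code from the commits: `resolve_backend_mem` is a hypothetical helper, `detected_bytes` stands in for whatever the CPU backend reports as available, and treating the `-m` value as megabytes is an assumption that the page does not confirm.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: returns the backend memory size in bytes.
// detected_bytes is a stand-in for the memory reported by the backend.
// ASSUMPTION: the value after "-m" is given in megabytes.
std::size_t resolve_backend_mem(const std::vector<std::string> & args,
                                std::size_t detected_bytes) {
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == "-m") {
            // An explicit -m value overrides the detected amount.
            const unsigned long long mb =
                std::strtoull(args[i + 1].c_str(), nullptr, 10);
            return static_cast<std::size_t>(mb) * 1024 * 1024;
        }
    }
    // No override given: fall back to the detected amount.
    return detected_bytes;
}
```

With no `-m` argument the function returns the detected value unchanged; with `-m` present, the detected value is ignored.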

Nexesenex merged commit bff35ad into Nexesenex:downstream May 16, 2024
32 of 40 checks passed
