Nexesenex released this 08 Dec 19:36
New IQ_K quants by Ikawrakow are now available for inference on CUDA.
- IQ2_K, IQ3_K, IQ4_K, IQ5_K and IQ6_K.
Almost no models, if any, are quantized with them and shared on HF.
But it's a step forward.
The newer quants from IK are a bit harder for me to implement: I can't use the .c files of Llama.CPP and need to integrate IK's work (in C++) directly, so it'll take a bit longer. I'm basically learning as I go.
It works when run from Python. I'm compiling an .exe for Pascal, Turing, and beyond right now.
Edit: I can't make a working .exe right now. I'll see what's up later.
What you can try if you don't know a better way:
Download the source, put the dll in the repository folder, install the requirements with Install requirements.bat, then launch with Croco.Cpp_python_launch.bat.
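Before launching, a quick check that the pieces named in the steps above are in place can save a failed start. The following is only a minimal sketch, not part of the release: the helper name `missing_items` is hypothetical, the two .bat names are taken from the steps above, and since the dll's file name is not given here, any `*.dll` in the folder is accepted (an assumption).

```python
from pathlib import Path

# Batch file names as written in the steps above.
REQUIRED_FILES = ("Install requirements.bat", "Croco.Cpp_python_launch.bat")


def missing_items(source_dir):
    """Return the names of expected items not found in source_dir.

    Hypothetical helper: the dll name is unknown, so any *.dll
    directly inside the folder counts as "the dll" (assumption).
    """
    root = Path(source_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if not any(root.glob("*.dll")):
        missing.append("*.dll")
    return missing


if __name__ == "__main__":
    import sys

    # Folder to check: first command-line argument, else the current directory.
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    problems = missing_items(folder)
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All expected files found.")
```

Run it with the path of the extracted source folder as its only argument. It only checks that the files exist; it does not verify that the requirements were installed or that the dll matches your GPU.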
Non-CUDA users: use the previous version. No IQ_K quants there yet, though.
I attached a compiled version of IK_LLAMA_CPP, with some edits of mine. Credits go to Ikawrakow.