
Vim‐QQ

Oleksandr Kuvshynov edited this page Aug 7, 2024 · 1 revision

Just some notes.

Server on CPU (`-ngl 0` offloads no layers to the GPU; `--top_k 1` makes sampling effectively greedy):

```shell
./llama.cpp/llama-server --model ./llms/gguf/llama3.1.70b.q8.inst.gguf --chat-template llama3 --host 0.0.0.0 --top_p 0.0 --top_k 1 --port 8080 -ngl 0
```

Server on GPU (`-ngl 99` offloads all layers):

```shell
./llama.cpp/llama-server --model ./llms/gguf/llama3.1.70b.q8.inst.gguf --chat-template llama3 --host 0.0.0.0 --top_p 0.0 --top_k 1 --port 8080 -ngl 99
```
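A quick way to talk to either server from a script, assuming it exposes llama-server's OpenAI-compatible `/v1/chat/completions` endpoint on port 8080 (as in the commands above). The host URL and the `chat` helper are illustrative, not part of this setup; a minimal sketch using only the Python standard library:

```python
import json
from urllib import request


def build_payload(prompt: str) -> dict:
    # Request body for the OpenAI-compatible chat endpoint;
    # temperature 0.0 mirrors the greedy sampling flags above.
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
    }


def chat(prompt: str, host: str = "http://localhost:8080") -> str:
    # Hypothetical helper: POST the payload and return the reply text.
    body = json.dumps(build_payload(prompt)).encode()
    req = request.Request(
        host + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]
```

Example: `chat("Hello")` should return the model's reply string when a server started with one of the commands above is running.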
