204-03-18_quantization_and_pruning.md

How to apply quantization to a model with llama.cpp

url: https://medium.com/@ingridwickstevens/quantization-of-llms-with-llama-cpp-9bbf59deda35
title: "Quantization of LLMs with llama.cpp"
description: "Understanding and Implementing n-bit Quantization Techniques for Efficient Inference in LLMs"
host: medium.com
favicon: https://miro.medium.com/v2/1*m-R_BkNf1Qjr1YbyOIJY2w.png
image: https://miro.medium.com/v2/resize:fit:1024/1*MZr3VVarzQPWuZs63TdfxQ.jpeg
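
To make the idea of n-bit quantization concrete, here is a minimal sketch in plain Python of symmetric round-to-nearest quantization over one block of weights. It is an illustration only and is not taken from the linked article or from llama.cpp, whose quantization formats use their own block layouts and scale encodings. The function names `quantize_block` and `dequantize_block`, the default `bits=4`, and the sample weights are all assumptions made for this example.

```python
def quantize_block(values, bits=4):
    """Symmetric round-to-nearest quantization of one block of floats.

    Hypothetical helper for illustration; not llama.cpp's actual scheme.
    """
    qmax = 2 ** (bits - 1) - 1          # largest signed level, e.g. 7 for 4 bits
    amax = max(abs(v) for v in values)  # largest magnitude in the block
    scale = amax / qmax if amax > 0 else 1.0
    # Map each value to the nearest integer level and clamp to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize_block(q, scale):
    """Recover approximate floats from integer levels and the block scale."""
    return [qi * scale for qi in q]


# Made-up sample weights, standing in for one small block of a tensor.
weights = [0.12, -0.53, 0.88, -0.07, 0.31, -0.95, 0.44, 0.02]

q, scale = quantize_block(weights, bits=4)
restored = dequantize_block(q, scale)

# Worst-case reconstruction error; for round-to-nearest it is at most scale / 2.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("levels:   ", q)
print("scale:    ", scale)
print("max error:", max_error)
```

Each block stores only small integers plus one floating-point scale, which is where the memory saving comes from. Fewer bits give fewer levels, so the scale grows and the reconstruction error grows with it.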