Quantization is a technique that reduces the computational and memory overhead of a machine learning model by lowering the precision of the numbers used to represent its parameters. Models typically store parameters as 32-bit floating-point values; quantization converts these to lower-precision formats such as 8-bit (or even 4-bit) integers. This can significantly reduce model size and increase inference speed, especially on CPUs and other hardware with limited computational resources. While quantization can cause a slight drop in model accuracy, the trade-off is often worthwhile for faster and more efficient deployments.
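To make the idea concrete, here is a minimal sketch of affine (asymmetric) int8 quantization using NumPy. The function names and the per-tensor scheme are illustrative assumptions for this README, not the exact method used in the notebook; production toolkits (e.g. PyTorch's built-in quantization) handle this per-layer and with calibrated ranges.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Affine per-tensor quantization of float32 weights to int8.
    Illustrative helper, not part of any specific library API."""
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0          # int8 covers 256 levels
    zero_point = round(-w_min / scale) - 128  # maps w_min near -128
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127)
    return q.astype(np.int8), scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximate float32 tensor from its int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

# Quantize a random weight matrix and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
print("max abs error:", np.abs(w - w_hat).max())
```

The int8 tensor occupies a quarter of the memory of the float32 original, at the cost of a reconstruction error bounded by roughly half the quantization step `scale`.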
This GitHub repo contains the notebook from the "A Hands-On Walkthrough on Model Quantization" blog post. The notebook demonstrates how to quantize and save a Transformer model to improve inference speed on a CPU and reduce the model size.
| Description | Link |
|---|---|
| A Hands-On Walkthrough on Model Quantization | |
See our LICENSE for more details.